On Whole Genome Classifier Performance in Relation to 16S Classifiers

Johnson, James

Johnson, J. (2022). On Whole Genome Classifier Performance in Relation to 16S Classifiers. UNC Charlotte Electronic Theses and Dissertations.

Abstract

There is little consensus in the literature as to which approach to classifying Whole Genome Shotgun (WGS) sequences is most accurate, unlike 16S classifiers, which have had more time to mature. In this dissertation, two of the most popular classification algorithms, Kraken2 and Metaphlan2, were examined using four publicly available datasets. Surprisingly, Kraken2 reported not only more taxa but also many more taxa that were significantly associated with metadata. Comparing the Spearman correlation coefficients of each taxon in the dataset against more abundant taxa showed that Kraken2, but not Metaphlan2, consistently classified low-abundance taxa that were highly correlated with the more abundant taxa. Neither Metaphlan2 nor the 16S sequences that were available for two of the four datasets showed this pattern. These results suggest that Kraken2 consistently misclassified reads from high-abundance taxa into the same erroneous low-abundance taxa. These "phantom" taxa show a pattern of inference similar to that of their high-abundance source. Because of the ever-increasing sequencing depths of modern WGS cohorts, these "phantom" taxa will appear statistically significant in statistical models even when the Kraken2 classification error rate is low. These findings suggest a novel metric for evaluating classifier accuracy.
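
The correlation comparison described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's code: the function names (`ranks`, `spearman`, `best_abundant_partner`), the use of mean count as the measure of abundance, and the toy counts are all assumptions made for illustration. For each taxon, the sketch finds the more abundant taxon with which it has the highest Spearman correlation across samples. A low-abundance taxon that tracks a high-abundance one almost perfectly is a candidate "phantom" taxon.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant vector: rho is undefined, treated as 0 here
    return cov / (var_x * var_y) ** 0.5


def best_abundant_partner(table):
    """Map each taxon to (partner, rho) for its best-correlated, more abundant taxon.

    `table` maps a taxon name to its per-sample counts. Abundance is taken
    to be the mean count, which is an assumption of this sketch.
    """
    by_abundance = sorted(table, key=lambda t: mean(table[t]), reverse=True)
    result = {}
    for idx, taxon in enumerate(by_abundance):
        partners = by_abundance[:idx]  # only taxa more abundant than this one
        if not partners:
            continue  # the most abundant taxon has nothing to compare against
        rho, partner = max((spearman(table[taxon], table[p]), p) for p in partners)
        result[taxon] = (partner, rho)
    return result


# Invented toy counts for five samples; not data from the dissertation.
toy = {
    "abundant_A": [9000, 4000, 7000, 1000, 6000],
    "abundant_B": [2000, 8000, 3000, 9000, 1000],
    "rare_X": [9, 4, 7, 1, 6],  # rises and falls exactly with abundant_A
    "rare_Y": [3, 1, 5, 2, 4],
}

result = best_abundant_partner(toy)
for taxon, (partner, rho) in result.items():
    print(f"{taxon}: best partner {partner}, rho = {rho:.2f}")
```

In this toy table, `rare_X` has the same sample ordering as `abundant_A`, so its best correlation is with that taxon. What counts as "highly correlated" is not stated in the abstract, so any cutoff applied to `rho` would be a further assumption.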

Details

Author: Johnson, James
Title: On Whole Genome Classifier Performance in Relation to 16S Classifiers
Physical Description: 1 online resource (73 pages) : PDF
Date: 2022
Degree Granting Institution: University of North Carolina at Charlotte
Genre: doctoral dissertations
Subjects--Topics: Bioinformatics
Degree: Ph.D.
Keywords: Classifier; Kraken; Metaphlan; Microbiome; Whole Genome Shotgun
Subject Area: Bioinformatics
Advisor(s): Fodor, Anthony
Committee Members: Gibas, Cynthia; White, Richard; Dornburg, Alex; Steck, Todd; Rice-Boayue, Jacelyn
Degree Note: Thesis (Ph.D.)--University of North Carolina at Charlotte, 2022.
Rights Statement: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Rights Holder Information: Copyright is held by the author unless otherwise indicated.
Identifier: Johnson_uncc_0694D_13254
Permalink: http://hdl.handle.net/20.500.13093/etd:3131

J. Murrey Atkins Library
