A De Novo Clustering Method: Snowball for Assigning 16S Operational Taxonomic Units

Mehmet Can, Osman Gursoy


To analyze the complex biodiversity of microbial communities, 16S rRNA marker gene sequences are often assigned to operational taxonomic units (OTUs). The abundance of methods used to assign 16S rRNA marker gene sequences to OTUs has prompted debate over which one is better. Some researchers suggest that clustering methods should be stable, meaning that the generated OTU assignments do not change as additional sequences are added to the dataset; others contend that it is more important for a method to properly represent the distances between sequences. We add one more de novo clustering algorithm, Rolling Snowball, to the existing ones, which include single linkage, complete linkage, average linkage, abundance-based greedy clustering, distance-based greedy clustering, and Swarm, as well as the open- and closed-reference methods. We use the GreenGenes, RDP, and SILVA 16S rRNA gene databases to show the success of the method. The highest accuracy is obtained with the SILVA library.
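As a rough illustration of what de novo OTU assignment involves, the sketch below pairs a longest-common-subsequence similarity with a generic greedy centroid clustering. It is not the Rolling Snowball algorithm, whose details are not given in this abstract. The function names, the normalization by the longer sequence length, and the default 0.97 threshold (a conventional OTU cutoff) are all assumptions made for the example.

```python
from typing import List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed similarity score: LCS length divided by the longer sequence length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def greedy_cluster(seqs: List[str], threshold: float = 0.97) -> List[List[int]]:
    """Generic greedy centroid clustering (illustrative, not the authors' method).

    Each sequence joins the first cluster whose centroid it matches at or
    above the threshold; otherwise it becomes the centroid of a new cluster.
    Returns clusters as lists of indices into seqs.
    """
    centroids: List[int] = []
    clusters: List[List[int]] = []
    for i, seq in enumerate(seqs):
        for c, centroid_idx in enumerate(centroids):
            if lcs_similarity(seq, seqs[centroid_idx]) >= threshold:
                clusters[c].append(i)
                break
        else:
            # No existing centroid was similar enough: start a new OTU.
            centroids.append(i)
            clusters.append([i])
    return clusters
```

Because the result of a greedy scheme like this depends on input order, sorting sequences by abundance or length beforehand changes the OTUs it produces, which is one reason the stability of such methods is debated.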


16S rRNA gene; LongestCommonSubsequence; Taxonomic clustering; Snowball


DOI: http://dx.doi.org/10.21533/scjournal.v9i1.184


Copyright (c) 2020 Mehmet Can, Osman Gursoy

ISSN 2233-1859

Digital Object Identifier DOI: 10.21533/scjournal

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License