
Bioinformatics sequence clustering in Python

I am trying to develop a new method for clustering sequence data. I have implemented my method and measured its accuracy. Now I need to compare it with existing methods to see whether it performs as I expected.

Could you tell me which methods are the best known in the bioinformatics domain, and which Python packages correspond to them? I am an engineer and do not know which methods in this field are the most accurate and therefore the ones I should compare mine against.

Two commonly used methods are:

Both are command-line tools and are written in C++ (I think).

Which tool you need also depends on the question you are asking (data reduction, OTU clustering, building a tree, etc.). These days there is a shift toward clustering tools that use a more dynamic approach instead of a fixed similarity cutoff. Examples:

  • DADA2
  • UNOISE
  • SeekDeep

Clustering with a fixed similarity cutoff:

  • CD-HIT
  • uclust
  • vsearch
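To make the idea of a fixed similarity cutoff concrete, here is a minimal pure-Python sketch of greedy centroid clustering. It is only an illustration of the general idea, not the algorithm of any tool listed above: the `similarity` and `greedy_cluster` functions are hypothetical names, and `difflib.SequenceMatcher` is used as a crude stand-in for alignment-based sequence identity. A toy baseline like this may be useful for sanity-checking a new method before running the real command-line tools.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for sequence identity (assumption: not an alignment score)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, cutoff=0.9):
    """Greedy clustering with a fixed similarity cutoff.

    Sequences are processed longest first. Each sequence joins the first
    centroid it matches at >= cutoff; otherwise it becomes a new centroid.
    Returns a list of (centroid, members) pairs.
    """
    clusters = []
    for seq in sorted(seqs, key=len, reverse=True):
        for centroid, members in clusters:
            if similarity(centroid, seq) >= cutoff:
                members.append(seq)
                break
        else:
            # No existing centroid was close enough: start a new cluster.
            clusters.append((seq, [seq]))
    return clusters


# Made-up example reads, for illustration only.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA", "ACGTACGTAC"]
clusters = greedy_cluster(reads, cutoff=0.8)
for centroid, members in clusters:
    print(centroid, len(members))
```

Raising the cutoff splits the clusters further, which is exactly the sensitivity to a hand-picked threshold that the more dynamic tools try to avoid.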

