
Error in converting multiple FASTA files to Nexus using Biopython

I want to convert multiple FASTA format files (DNA sequences) to the NEXUS format using the Bio.SeqIO module, but I get this error:

Traceback (most recent call last):
  File "fasta2nexus.py", line 28, in <module>
    print(process(fullpath))
  File "fasta2nexus.py", line 23, in process
    alphabet=IUPAC.ambiguous_dna)
  File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 1003, in convert
    with as_handle(in_file, in_mode) as in_handle:
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/Library/Python/2.7/site-packages/Bio/File.py", line 88, in as_handle
    with open(handleish, mode, **kwargs) as fp:
IOError: [Errno 2] No such file or directory: 'c'

What am I missing?

Here is my code:

##!/usr/bin/env python

from __future__ import print_function # or just use Python 3!

import fileinput
import os
import re
import sys

from Bio import SeqIO, Nexus
from Bio.Alphabet import IUPAC


test = "/Users/teton/Desktop/test"

files = os.listdir(os.curdir)

def process(filename):
    # returns ("basename", "extension"), so [0] picks "basename"
    base = os.path.splitext(filename)[0] 
    return SeqIO.convert(filename, "fasta", 
                         base + ".nex", "nexus", 
                         alphabet=IUPAC.ambiguous_dna)

for files in os.listdir(test):
    for file in files:
        fullpath = os.path.join(file)
        print(process(fullpath))

This code should solve the majority of problems I can see.

from __future__ import print_function # or just use Python 3!

import fileinput
import os
import re
import sys

from Bio import SeqIO, Nexus
from Bio.Alphabet import IUPAC

test = "/Users/teton/Desktop"

def process(filename):
    # returns ("basename", "extension"), so [0] picks "basename"
    base = os.path.splitext(filename)[0] 
    return SeqIO.convert(filename, "fasta", 
                         base + ".nex", "nexus", 
                         alphabet=IUPAC.ambiguous_dna)

for root, dirs, files in os.walk(test):
    for file in files:
        fullpath = os.path.join(root, file)
        print(process(fullpath))

I changed a few things. First, I ordered your imports (a personal preference) and made sure to import IUPAC from Bio.Alphabet so you can actually assign the correct alphabet to your sequences. Next, in your process() function, I added a line to split the extension off the filename, then used the full filename for the first argument and just the base (without the extension) for naming the Nexus output file. Speaking of which, I assume you'll be using the Nexus module in later code? If not, you should remove it from the imports.
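Note that Bio.Alphabet was removed in Biopython 1.78, so the IUPAC import above fails on current releases. What follows is only a minimal sketch of the same process() function for newer versions, assuming Biopython 1.78 or later, where SeqIO.convert() takes a molecule_type argument in place of alphabet. The helper name nexus_path is made up for illustration, and Biopython is imported inside process() so that the path helper can be tried on its own.

```python
import os


def nexus_path(fasta_path):
    """Return fasta_path with its extension replaced by .nex (hypothetical helper)."""
    return os.path.splitext(fasta_path)[0] + ".nex"


def process(fasta_path):
    """Convert one FASTA file to Nexus and return the number of records written."""
    # Assumption: Biopython >= 1.78 is installed. Imported here so that
    # nexus_path() stays usable even where Biopython is missing.
    from Bio import SeqIO

    return SeqIO.convert(fasta_path, "fasta",
                         nexus_path(fasta_path), "nexus",
                         molecule_type="DNA")
```

On older installations that still ship Bio.Alphabet, the alphabet=IUPAC.ambiguous_dna form shown above is the one to keep.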

I wasn't sure what the point of the last snippet was, so I didn't include it. In it, though, you appear to be walking the file tree and calling process() on each file again, then referencing an undefined variable named count. Instead, just run process() once, and do whatever count refers to within that loop.

You may want to consider adding some logic to your for loop to test that the file returned by os.path.join() actually is a FASTA file. Otherwise, if any other file type is in one of the directories you search and you process() it, the conversion may fail or produce unusable output.
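One possible way to do that check, as a rough sketch only: accept a file when its extension looks like FASTA and its first non-blank line starts with ">". The function name looks_like_fasta and the extension list are assumptions for illustration, so adjust them to whatever your files actually use.

```python
import os

# Assumed set of FASTA extensions; extend or trim it for your own data.
FASTA_EXTENSIONS = {".fa", ".fas", ".fasta", ".fna"}


def looks_like_fasta(path):
    """Cheap heuristic: FASTA-like extension and a '>' header as first non-blank line."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in FASTA_EXTENSIONS:
        return False
    with open(path) as handle:
        for line in handle:
            if line.strip():
                return line.startswith(">")
    return False  # empty file
```

Inside the loop you would then call process(fullpath) only when looks_like_fasta(fullpath) is true. This does not validate the sequences themselves; it only filters out files that are obviously something else.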

EDIT

OK, based on your new code I have a few suggestions. First, the line

files = os.listdir(os.curdir)

is unnecessary because, below the definition of the process() function, you're redefining the files variable. Note that os.curdir is a string constant ('.'), not a function, so the line does run; its result is simply never used.

The code at the bottom should simply be this:

for file in os.listdir(test):
    # os.listdir() returns bare names, so prepend the directory
    print(process(os.path.join(test, file)))

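The inner for file in files loop is what triggers the error: at that point files is a single filename string, so the loop iterates over its characters and process() receives 'c'. Calling os.path.join() with a single argument does nothing; pass the directory as well so that process() gets a full path.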
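To see where the 'c' in the traceback comes from, here is a small stand-alone illustration. The filenames are invented; in the real script they come from os.listdir(test).

```python
import os

# Hypothetical directory listing standing in for os.listdir(test).
names = ["coi.fasta", "rbcl.fasta"]

passed_to_process = []
for files in names:        # despite its name, `files` is ONE filename string
    for file in files:     # so this loop walks over its characters
        # os.path.join() with a single argument returns it unchanged
        passed_to_process.append(os.path.join(file))

first_argument = passed_to_process[0]

# What the loop should build instead: directory plus filename.
full_path = os.path.join("/Users/teton/Desktop/test", names[0])
```

Here first_argument is the single character "c", which is exactly the name SeqIO.convert() then fails to open.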

  1. NameError

You imported SeqIO but are calling seqIO.convert(). Python is case-sensitive. The line should read:

return SeqIO.convert(filename + '.fa', "fasta", filename + '.nex', "nexus", alphabet=IUPAC.ambiguous_dna)

  2. IOError: for files in os.walk(test):

IOError is raised when a file cannot be opened. It often arises because the filename and/or file path provided does not exist.

os.walk(test) iterates through the directory test and all of its subdirectories. During each iteration, files will be a tuple of 3 elements. The first element is the path of the directory, the second element is a list of subdirectories in that path, and the third element is a list of files in that path. You should be passing a filename to process(), but you are passing the whole tuple in process(files).

You have implemented it correctly in the block that begins for root, dirs, files in os.walk(test): and I suggest you implement it similarly in the for loop below.

  3. You are adding .fa to your filename. Don't add .fa; the names returned by os.walk() already include the extension.
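The os.walk() behaviour described above can be checked with a throwaway directory tree. This is only an illustrative sketch for Python 3; the file and directory names are invented.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Build a tiny tree: top/a.fasta and top/sub/b.fasta
    os.mkdir(os.path.join(top, "sub"))
    for relative in ("a.fasta", os.path.join("sub", "b.fasta")):
        open(os.path.join(top, relative), "w").close()

    found = []
    # Each iteration yields (directory path, subdirectory names, file names)
    for root, dirs, files in os.walk(top):
        for name in files:
            full_path = os.path.join(root, name)  # name already has its extension
            found.append(os.path.relpath(full_path, top))

found.sort()
```

Unpacking the three values in the for statement is what gives you individual filenames to hand to process(), one full path at a time.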
