Constantly getting IndexError and am unsure why in Python

I am new to Python, and to programming in general, and am learning Python through a website called rosalind.info, which aims to teach through problem solving.

Here is the problem, wherein you're asked to calculate the percentage of guanine and cytosine in the DNA string given for each ID, then return the ID of the sample with the greatest percentage.

I'm working on the sample problem on the page and am experiencing some difficulty. I know my code is probably really inefficient and cumbersome, but I take it that's to be expected for those who are new to programming.

Anyway, here is my code.

gc = open("rosalind_gcsamp.txt","r")
biz = gc.readlines()
i = 0
gcc = 0
d = {}
for i in xrange(biz.__len__()):
    if biz[i].startswith(">"):
        biz[i] = biz[i].replace("\n","")
        biz[i+1] = biz[i+1].replace("\n","") + biz[i+2].replace("\n","")
        del biz[i+2]

What I'm trying to accomplish here is, given input such as this:

>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG

Break what's given into a list based on the lines and concatenate the two lines of DNA like so:

['>Rosalind_6404', 'CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG', 'TCCCACTAATAATTCTGAGG\n']

And delete the entry two indices after the ID line (the line that starts with >Rosalind). What I do with it later I still need to figure out.

However, I keep getting an index error and can't, for the life of me, figure out why. I'm sure it's a trivial reason; I just need some help.

I've even attempted the following, with limited success:

for i in xrange(biz.__len__()):
    if biz[i].startswith(">"):
        biz[i] = biz[i].replace("\n","")
        biz[i+1] = biz[i+1].replace("\n","") + biz[i+2].replace("\n","")
    elif biz[i].startswith("A" or "C" or "G" or "T") and biz[i+1].startswith(">"):
        del biz[i]

which still gives me an index error but at least gives me the biz value I want.

Thanks in advance.

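You are looping over the length of biz, which xrange computes once, before the loop starts. So towards the end of the list biz[i+1] and biz[i+2] don't exist: there is no item after the last. Deleting entries inside the loop makes this worse, because the list gets shorter while i still runs up to the original length, so eventually even biz[i] is out of range.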
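One way to sidestep the problem is to avoid index arithmetic altogether and build a new list instead of deleting from the one being looped over. The following is only a minimal sketch under that assumption: the function name `merge_records` is made up for illustration, and the sample lines are the ones from the question, held in memory rather than read from the file.

```python
def merge_records(lines):
    """Join the sequence lines that follow each '>' header into one string.

    A new list is built instead of deleting from the list being looped
    over, so no index can run past the end.
    """
    merged = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # start a new record: the header, then an empty sequence
            merged.append(line)
            merged.append("")
        elif merged:
            # sequence line: extend the current record's sequence
            merged[-1] += line
    return merged


# sample input from the question, as readlines() would return it
sample = [
    ">Rosalind_6404\n",
    "CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC\n",
    "TCCCACTAATAATTCTGAGG\n",
]
result = merge_records(sample)
print(result)
```

This yields a two-element list (the ID, then the joined DNA string), with no leftover third entry to delete.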

It is very easy to do with itertools.groupby, using lines that start with > as the keys and as the delimiters:

from itertools import groupby

with open("rosalind_gcsamp.txt", "r") as gc:
    # group consecutive lines, using lines that start with ">" as the delimiter
    groups = groupby(gc, key=lambda x: not x.startswith(">"))
    d = {}
    for k, v in groups:
        # k is False when the line starts with ">", i.e. it is an ID line,
        # so use that line as the key and call next on the groupby object
        # to get the group of sequence lines that follows it
        if not k:
            key, val = list(v)[0].rstrip(), "".join(map(str.rstrip, next(groups)[1]))
            d[key] = val

print(d)
{'>Rosalind_0808': 'CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT', '>Rosalind_5959': 'CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC', '>Rosalind_6404': 'CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG'}

If you need to preserve the order of the records, use a collections.OrderedDict in place of the plain dict d.
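As a rough sketch of that variant, the same groupby idea can be wrapped in a function that fills an OrderedDict, followed by the percentage step the problem asks for. The names `parse_fasta` and `gc_percent` and the toy records below are hypothetical and only serve as an illustration; they are not part of the Rosalind data set.

```python
from collections import OrderedDict
from itertools import groupby


def parse_fasta(lines):
    """Map each '>' header to its joined sequence, keeping input order."""
    records = OrderedDict()
    groups = groupby(lines, key=lambda x: not x.startswith(">"))
    for is_sequence, group in groups:
        if not is_sequence:
            # header group: use its first line as the key
            header = list(group)[0].rstrip()
            # the next group (if any) holds that record's sequence lines
            _, seq_lines = next(groups, (True, []))
            records[header] = "".join(s.rstrip() for s in seq_lines)
    return records


def gc_percent(seq):
    """Percentage of G and C characters in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


# hypothetical toy input, not taken from the Rosalind sample
toy = [">id_a\n", "GGCC\n", "AATT\n", ">id_b\n", "GGGC\n", "CA\n"]
records = parse_fasta(toy)
best_id = max(records, key=lambda k: gc_percent(records[k]))
print(best_id, gc_percent(records[best_id]))
```

Passing an open file object instead of the toy list should work the same way, since groupby only needs an iterable of lines.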
