Python find function not working. What am I doing wrong?
I'm a hobbyist programmer (my actual major is biology), so I apologize if the code is atrocious.
Anyway, I'm doing a rosalind.info exercise ( http://rosalind.info/problems/subs/ ) that wants me to find every index where a specific DNA motif is contained within a larger DNA sequence. Basically, I need to find the indexes of a substring in a string. Should be easy, right? Well, maybe you can help me.
So here's my code:
with open('rosalind_subs.txt') as f:
    seq = f.readline()
    seq.strip()
    subs = f.readline()
    subs.strip()
    break
def finder(x, y):
    index = x.find(y)
    return index
print("sequence is: " + seq)
print("subs is: " + subs)
print(finder(seq, subs))
And here's my output:
sequence is: ACCAGTCTCTTTTTTCTCTTTTCTCTTTTCTCTTTTGACCCTCTTTTCGTCACTCTTTTACCTCTTTTTCTCTTTTACTCTTTTCTCTTTTACTCTTTTACTCTTTTAGCGCAGATCTCTTTTCTCTTTTGGCTCTTTTGTCATCCTCTTTTAGACTCTTTTGGGAAGCGACGCCTCTTTTCTCTTTTCTCTTTTGCCTCTTTTTATAACCTAAAAGACTCTTTTCCCTCTTTTCCGATTTGCCAAGGGCTCTCTTTTCTCTTTTGCTCTTTTCTCTTTTCTCTTTTTACTCTTTTCTCTTTTCGCCCCAAGATTAACTCTTTTTCTCTTTTCTCTCTTTTTTCCTCTTTTCTCTTTTGAATTGACCTCTTTTTCTCTTTTTTTGGGCCGCTCTTTTCTCTTTTACTCTTTTCTCTCTTTTAACAGCTCTTTTCCTTCTCTTTTGTCTCTTTTAGTATACTCTTTTACTCTTTTCTCTTTTCTCTCTTTTACTCTTTTGCTCTTTTCTCTTTTTGTCTCTTTTGCCCTGTCTCTTTTCACGCTTCTCTTTTAGTGTACTCTTTTACTCTTTTTGGCTCTTTTCGAATTTGTTAGCTCTTTTGCTCTTTTCTCTTTTGCTCTTTTGTCTCTTTTGATCAGATTCTCTTTTTCTCTTTTCTCTTTTCCTTAAGCAGATTTCTCTTTTCTCTTTTTCTCTCTTTTGCTCTTTTACTCTTTTACTGCTTTCTCTTTTACAACCTCTTTTACTCTTTTAAGCTCTTTTCTCTTTTGCGCCTCTTTTCCTCCCCTCTTTTTAGCTCTTTTCTCTTTTTCGCTCTTTTCAGCTCTTTTCACTCTTTTGTTTTGAGCTCTTTTCAGACTCTTTTATCCTCTTTTTTCCTCTTTTAGCGCTCTTTTGTAGCCTCTTTT
motif is: CTCTTTTCT
-1
***Repl Closed***
I left the ***Repl Closed*** in there in an effort to leave no stone unturned. Maybe it has something to do with Sublime REPL?
Anyway, you probably can't tell just by looking, but the motif actually occurs MANY times in the DNA sequence; the find function just isn't picking up on it. What gives?
break is not valid inside a with block; it can only be used inside a loop. Please remove it and try again. I have tested the code below.
with open('rosalind_subs.txt') as f:
    seq = f.readline()
    seq.strip()
    subs = f.readline()
    subs.strip()
def finder(x, y):
    index = x.find(y)
    return index
print("sequence is: " + seq)
print("subs is: " + subs)
print(finder(seq, subs))
The output is:
>>>
sequence is: ACCAGTCTCTTTTTTCTCTTTTCTCTTTTCTCTTTTGACCCTCTTTTCGTCACTCTTTTACCTCTTTTTCTCTTTTACTCTTTTCTCTTTTACTCTTTTACTCTTTTAGCGCAGATCTCTTTTCTCTTTTGGCTCTTTTGTCATCCTCTTTTAGACTCTTTTGGGAAGCGACGCCTCTTTTCTCTTTTCTCTTTTGCCTCTTTTTATAACCTAAAAGACTCTTTTCCCTCTTTTCCGATTTGCCAAGGGCTCTCTTTTCTCTTTTGCTCTTTTCTCTTTTCTCTTTTTACTCTTTTCTCTTTTCGCCCCAAGATTAACTCTTTTTCTCTTTTCTCTCTTTTTTCCTCTTTTCTCTTTTGAATTGACCTCTTTTTCTCTTTTTTTGGGCCGCTCTTTTCTCTTTTACTCTTTTCTCTCTTTTAACAGCTCTTTTCCTTCTCTTTTGTCTCTTTTAGTATACTCTTTTACTCTTTTCTCTTTTCTCTCTTTTACTCTTTTGCTCTTTTCTCTTTTTGTCTCTTTTGCCCTGTCTCTTTTCACGCTTCTCTTTTAGTGTACTCTTTTACTCTTTTTGGCTCTTTTCGAATTTGTTAGCTCTTTTGCTCTTTTCTCTTTTGCTCTTTTGTCTCTTTTGATCAGATTCTCTTTTTCTCTTTTCTCTTTTCCTTAAGCAGATTTCTCTTTTCTCTTTTTCTCTCTTTTGCTCTTTTACTCTTTTACTGCTTTCTCTTTTACAACCTCTTTTACTCTTTTAAGCTCTTTTCTCTTTTGCGCCTCTTTTCCTCCCCTCTTTTTAGCTCTTTTCTCTTTTTCGCTCTTTTCAGCTCTTTTCACTCTTTTGTTTTGAGCTCTTTTCAGACTCTTTTATCCTCTTTTTTCCTCTTTTAGCGCTCTTTTGTAGCCTCTTTT
subs is: CTCTTTTCT
15
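One more thing that may be worth checking, separately from the break statement: Python strings are immutable, so a bare seq.strip() or subs.strip() returns a new string and leaves the original unchanged. If the motif line in the file ends with a newline, the search string still carries that "\n", and find would return -1. Below is a minimal sketch of that effect; the short strings are illustrative stand-ins for the file contents, not the actual data from the question.

```python
# Illustrative stand-ins for the two lines read with f.readline().
seq = "GATATATGCATATACTT\n"
subs = "ATAT\n"

# strip() returns a new string; calling it without assignment changes nothing.
subs.strip()
print(seq.find(subs))  # -1, because subs still ends with "\n"

# Assign the result back to actually drop the trailing newline.
seq = seq.strip()
subs = subs.strip()
print(seq.find(subs))  # 1 (0-based index of the first match)
```

Writing seq = f.readline().strip() (and the same for subs) in the original code would apply the same fix directly.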
Also a fellow biologist here who has done several rosalind.info exercises.
First off, your code to read in the sequence and the motif could be improved by using splitlines(), which takes care of removing the newlines. Also notice how I use tuple unpacking to assign both the seq and motif variables at once.
with open('rosalind_subs.txt') as f:
    seq, motif = f.read().splitlines()
Next, you correctly noticed that find only returns the index of the first occurrence of your motif. To find all the occurrences, it helps to know that find takes another optional argument, start. If you provide it, the search begins at that index position. Use this in a loop and you get all your indexes.
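The loop described above could look roughly like this. It is only a sketch: the function name find_all and the short test strings are made up for illustration and are not from the question.

```python
def find_all(seq, motif):
    """Return the 0-based start index of every match, including overlapping ones."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one character past the last hit so overlapping matches are kept.
        start = seq.find(motif, start + 1)
    return positions

print(find_all("GATATATGCATATACTT", "ATAT"))  # [1, 3, 9]
```

Note that find reports 0-based positions. If the exercise expects 1-based positions, add 1 to each index before printing.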
Another approach is to use regular expressions. Beware that motifs can overlap each other, so you need to make use of a lookahead assertion.
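A sketch of the regular-expression variant, using the standard re module. The helper name find_all_regex is hypothetical. Wrapping the motif in (?=...) makes the match zero-width, so the scanner advances one position at a time and overlapping occurrences are all reported.

```python
import re

def find_all_regex(seq, motif):
    """Return the 0-based start index of every match, including overlapping ones."""
    # re.escape guards against characters with special regex meaning;
    # plain DNA letters do not need it, but it makes the helper general.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(seq)]

print(find_all_regex("GATATATGCATATACTT", "ATAT"))  # [1, 3, 9]
```

Without the lookahead, finditer would consume each match and skip any occurrence that overlaps the previous one.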