Hello, I am quite new to Python and have started taking classes for biologists, but I have a problem with an assignment and just can't figure it out. In a .txt file I should find 2 restriction enzyme sites (basically just letters): "gatc" with a g or a in front and a c or t at the back, so "[ga]gatc[ct]". This occurs 2 times in the text file, and I should find the length between the two occurrences (xxxx[ga]gatc[ct] xxxxxxx [ga]gatc[ct]xxxx), i.e. how many x are between them. I tried to put it in groups but I am doing something wrong. xxxx is an unknown number of letters made up of "g", "a", "t" and "c", for example ctactatctcatcttaaccttaa.
My current code is:
import regex
file = "enzyme.txt"
f=open(file, "r")
content = f.read()
print(content)
pattern = regex.compile("[ga]gatc[ct]")
for line in open("enzyme.txt"):
    for match in regex.finditer (pattern, line):
        print(match.group())
        print(line)
for lines in f:
    m=regex.search("[ga]gatc[ct] {*} [ga]gatc[ct]", lines)
    if m:
        print(len(str(m.start(1)) + str(m.end(2))))
It shows me the correct sequence and prints the line it is in, but I don't know how to find the length between the two matches. The second part of the code doesn't do anything, but also shows no error message.
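In my view, this is a naive solution. It assumes both sites are on the same line. Note that str.split only splits on a literal string, so re.split is used here to split on the pattern:
import re
pattern = re.compile("[ga]gatc[ct]")
with open("enzyme.txt") as file:
    for line in file:
        pieces = pattern.split(line)
        if len(pieces) >= 3:  # the line contains at least two sites
            print(len(pieces[1]))
re.split divides the line into pieces at every match of the pattern [ga]gatc[ct]. Index 1 is the xxxxxxx stretch between the two sites, because index 0 is whatever comes before the first site (an empty string, '', if the line starts with a site). print(len(pieces[1])) then prints the length of that stretch.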
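An alternative is to measure the distance from the match positions directly, which is what the second loop in the question was trying to do. That loop prints nothing because f.read() has already consumed the file, so "for lines in f" has nothing left to iterate over; in addition, m.start(1) and m.end(2) would need capture groups that the pattern does not define. The sketch below is only one possible approach: it uses the standard-library re module instead of regex, the function name gap_lengths is made up for illustration, and the example sequence is invented rather than taken from enzyme.txt.

```python
import re

SITE = re.compile(r"[ga]gatc[ct]")

def gap_lengths(sequence):
    """Return the number of letters between each pair of consecutive sites."""
    # Drop whitespace so a sequence wrapped over several lines counts correctly.
    sequence = "".join(sequence.split())
    matches = list(SITE.finditer(sequence))
    # A gap runs from the end of one site to the start of the next one.
    return [nxt.start() - prev.end() for prev, nxt in zip(matches, matches[1:])]

# Invented example: two sites with 10 letters between them.
example = "ccta" + "ggatcc" + "ctactatctc" + "agatct" + "ttaa"
print(gap_lengths(example))  # [10]
```

For the real file, the whole content can be passed in at once, e.g. gap_lengths(open("enzyme.txt").read()), so it does not matter whether the two sites sit on the same line.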