I am writing code that reads a large text file line by line and finds the line that starts with UNIQUE-ID (there are many of them in the file) and comes right before a certain line (in this example, the one that starts with 'REACTION-LAYOUT -' and in which the 5th element of the string is OLEANDOMYCIN). The code is the following:
data2 = open('pathways.dat', 'r', errors = 'ignore')
pathways = data2.readlines()
PWY_ID = []
line_cont = []
L_PRMR = [] #Left primary
car = []
#i is the line number (first element of enumerate),
#while line is the line content (2nd elem of enumerate)
for i,line in enumerate(pathways):
    if 'UNIQUE-ID' in line:
        line_cont = line
        PWY_ID_line = line_cont.rstrip()
        PWY_ID_line = PWY_ID_line.split(' ')
        PWY_ID.append(PWY_ID_line[2])
    elif 'REACTION-LAYOUT -' in line:
        L_PWY = line.rstrip()
        L_PWY = L_PWY.split(' ')
        L_PRMR.append(L_PWY[4])
    elif 'OLEANDOMYCIN' in line:
        car.append(PWY_ID)
print(car)
However, the output is instead all the lines that contain PWY_ID (the output of the first if statement), as if the rest of the code were being ignored. Can anybody help?
Edit
Below is a sample of my data (there are about 1000 similar "pages" in my text file):
//
UNIQUE-ID - PWY-741
.
.
.
.
PREDECESSORS - (RXN-663 RXN-662)
REACTION-LAYOUT - (RXN-663 (:LEFT-PRIMARIES CPD-1003) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-1004))
REACTION-LAYOUT - (RXN-662 (:LEFT-PRIMARIES CPD-1002) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-1003))
REACTION-LAYOUT - (RXN-661 (:LEFT-PRIMARIES CPD-1001) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-1002))
REACTION-LIST - RXN-663
REACTION-LIST - RXN-662
REACTION-LIST - RXN-661
SPECIES - TAX-351746
SPECIES - TAX-644631
SPECIES - ORG-6335
SUPER-PATHWAYS - PWY-5266
TAXONOMIC-RANGE - TAX-1224
//
I think it would have been helpful if you'd posted some examples of data. But an approximation to what you're looking for is:
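with open('pathways.dat', 'r', errors='ignore') as infile:
    # string_to_search and number_of_chars_to_read are placeholders you define yourself
    # Note: find() returns a character index (or -1 if not found); seek() expects a
    # file offset, so the two only line up reliably for single-byte text
    i = infile.read().find(string_to_search)
    infile.seek(i + number_of_chars_to_read)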
I hope this piece of code will help you focus your script on this line.
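As a rough illustration of the same find-based idea applied to the sample data in the question, the sketch below reads the whole text, locates the target string, and then searches backwards for the nearest preceding UNIQUE-ID line. The function name id_before and the inline sample text are assumptions made for this example, not part of the original code, and it only reports the first occurrence of the target.

```python
def id_before(text, target):
    """Return the UNIQUE-ID value that most closely precedes the first
    occurrence of `target` in `text`, or None if either is missing."""
    marker = 'UNIQUE-ID - '
    pos = text.find(target)              # first occurrence of the target
    if pos == -1:
        return None
    start = text.rfind(marker, 0, pos)   # search backwards from the target
    if start == -1:
        return None
    line_end = text.find('\n', start)    # end of the UNIQUE-ID line
    if line_end == -1:
        line_end = len(text)
    return text[start + len(marker):line_end].strip()


# Hypothetical sample, shortened from the data shown in the question
sample_text = (
    "//\n"
    "UNIQUE-ID - PWY-741\n"
    "PREDECESSORS - (RXN-663 RXN-662)\n"
    "REACTION-LAYOUT - (RXN-663 (:LEFT-PRIMARIES CPD-1003) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-1004))\n"
    "//\n"
)
print(id_before(sample_text, 'CPD-1003'))
```

For the real file, the text would come from open('pathways.dat', 'r', errors='ignore').read(); this holds the whole file in memory, which is usually fine for a few thousand records.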
print(car) prints the list of all IDs added by PWY_ID.append(PWY_ID_line[2]) in the first if, because car.append(PWY_ID) appends the whole PWY_ID list to car. So, if you want to print the lines that contain OLEANDOMYCIN, you might want to just do car.append(line).
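If the goal is the UNIQUE-ID that precedes a matching REACTION-LAYOUT line, one possible approach is to remember only the most recent ID instead of accumulating all of them. Note also that in the question's code any line containing 'REACTION-LAYOUT -' is consumed by the second branch, so the OLEANDOMYCIN branch is never reached for those lines. The sketch below is an assumption-laden example: the function name pathways_with_left_primary and the sample list are made up for illustration, and it assumes the left primary is the 5th space-separated field, as in the sample data.

```python
def pathways_with_left_primary(lines, compound):
    """Return the UNIQUE-ID of each record that has a REACTION-LAYOUT line
    whose 5th space-separated field (the left primary) equals `compound`."""
    matches = []
    current_id = None  # most recent UNIQUE-ID seen so far
    for line in lines:
        fields = line.split()
        if line.startswith('UNIQUE-ID') and len(fields) >= 3:
            current_id = fields[2]
        elif line.startswith('REACTION-LAYOUT -') and len(fields) >= 5:
            left_primary = fields[4].strip('()')  # 'CPD-1003)' -> 'CPD-1003'
            if (left_primary == compound
                    and current_id is not None
                    and current_id not in matches):
                matches.append(current_id)
    return matches


# Hypothetical sample, shortened from the data shown in the question
sample = [
    "//\n",
    "UNIQUE-ID - PWY-741\n",
    "PREDECESSORS - (RXN-663 RXN-662)\n",
    "REACTION-LAYOUT - (RXN-663 (:LEFT-PRIMARIES CPD-1003) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-1004))\n",
    "REACTION-LAYOUT - (RXN-662 (:LEFT-PRIMARIES CPD-1002) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-1003))\n",
    "//\n",
]
print(pathways_with_left_primary(sample, 'CPD-1003'))
```

With the real file, the open file object can be passed directly, for example pathways_with_left_primary(infile, 'OLEANDOMYCIN') inside a with open('pathways.dat', 'r', errors='ignore') as infile: block, which reads line by line without loading everything first.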