Search file for string and copy all lines following until string2

I am writing a script in Python 3, but I can't solve the following problem.

I have a list of names with this pattern:

ZINC123456
ZINC234567
ZINC345678
ZINC456789
...

and I have a big file like this:

ZINC123456
xxx
xxx
xxx
ZINC987654
xxy
xxy
xxy
xxy
ZINC654987
...

What I want to do is loop over every item in the first list and search for it in the second file. When an item is found, that line and all the following lines, up to the next ZINCxxxxxx pattern, should be copied into a new file.

How can I do this? Thank you very much for your help!

EDIT: Thanks to Sudipta Chatterjee I found the following solution:

import sys

finZ = open(sys.argv[1], 'r')
finX = open('zinc.sdf', 'r')
fout = open(sys.argv[1][:7] + '.sdf', 'w')

names = []  # renamed from "list" to avoid shadowing the built-in
thislinehaszinc = False
zincmatching = False

# Collect the unique ZINC names from the first file
for zline in finZ:
    if zline[0:4] == "ZINC":
        name = zline.rstrip('\n')
        if name not in names:
            names.append(name)

for xline in finX:
    if xline[0:4] == "ZINC":
        # A new block starts; copy it only if its name is in the list
        thislinehaszinc = True
        zincmatching = xline.rstrip('\n') in names
        if zincmatching:
            fout.write(xline)
            print('Found: ' + xline.rstrip('\n'))
    else:
        thislinehaszinc = False

    # Lines that follow a matching ZINC header belong to that block
    if not thislinehaszinc and zincmatching:
        fout.write(xline)

finZ.close()
finX.close()
fout.close()

# Clarified from comments - the program is to act as a filter so that any lines
# which have a pattern 'ZINC' in the second file but do not belong in the first
# should stop the dump until the next matching zinc is found

fileZ = open ('file_with_zinc_only.txt', 'r').readlines()
fileX = open ('file_with_x_info.txt', 'r').readlines()
fileOutput = open ('file_for_output.txt', 'w')

thisLineHasZinc = False
zincMatching = False

for xline in fileX:
    #print "Dealing with", xline
    if len(xline.split('ZINC')) != 1:
        thisLineHasZinc = True
        zincMatching = False
        for zline in fileZ:
            #print "Trying to match",zline
            if zline == xline:
                #print "************MATCH***************"
                zincMatching = True
                fileOutput.write (zline)
                #print "**",xline
                break
    else:    
        thisLineHasZinc = False

    # If we are currently under a block where we've found a ZINC previously
    # but not yet reached another ZINC line, write to file
    #print 'thisLineHasZinc',thisLineHasZinc,'zincMatching',zincMatching
    if thisLineHasZinc == False and zincMatching == True:
        fileOutput.write (xline)
        #print "**** "+ xline

fileOutput.close()
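
The same filter can probably be written more compactly. The following is only a sketch, not the original answer's code: it assumes that every block header starts at the beginning of a line with ZINC and that the name file has one name per line. The function name filter_blocks and its prefix parameter are made up for illustration. Keeping the names in a set avoids rescanning the whole name list for every header line.

```python
import io


def filter_blocks(names, source, sink, prefix="ZINC"):
    """Copy every block of `source` whose header line is in `names` to `sink`.

    A block is a line starting with `prefix` plus all lines up to (but not
    including) the next line starting with `prefix`. Returns the number of
    blocks copied. (Hypothetical helper, not part of the original answer.)
    """
    wanted = {n.strip() for n in names if n.strip()}  # set gives fast lookups
    copying = False
    copied = 0
    for line in source:
        if line.startswith(prefix):
            # Header line: decide whether this whole block gets copied
            copying = line.strip() in wanted
            if copying:
                copied += 1
        if copying:
            sink.write(line)
    return copied


# Small in-memory demonstration based on the sample data from the question
names = ["ZINC123456\n", "ZINC234567\n"]
source = io.StringIO("ZINC123456\nxxx\nxxx\nZINC987654\nxxy\nZINC654987\n")
sink = io.StringIO()
count = filter_blocks(names, source, sink)
result = sink.getvalue()
```

With real data, the three arguments would be file objects opened in with blocks, so the files are closed automatically and the big file is read line by line instead of being loaded with readlines().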
