
If string in one file matches string in another, print line and next line

I'm having trouble finishing off some Python code I've been working on and would appreciate any suggestions. I have two files:

file1

>name1
>name3
>name4

file2

>name1 blah blah
aaaaaaaaaaaaaaaaaaaaaaaaa
>name2 blah blah
cccccccaaaaaaaaaaaaaaaaaa
>name3 blah blah
aaaaaattttttttttaaaaaaaaa
>name4 blah blah
aaaaaattttttttttggggggggg
>name5 blah blah
aaaggggcccctttttggggggggg

Each line of file1 contains a string also found in file2. For each line of file1, I would like to find the line it matches in file2, then print that line and the next line. This is my desired final result:

>name1 blah blah
aaaaaaaaaaaaaaaaaaaaaaaaa
>name3 blah blah
aaaaaattttttttttaaaaaaaaa
>name4 blah blah
aaaaaattttttttttggggggggg

So far I have the following code:

nums = set()
with open("file1.txt") as file1:
    for line in file1:
        nums.add(line.strip())

with open("file2.txt") as file2, open("out.txt", "wt") as outfile:
    for line in file2:
        if any(word in line for word in nums):
            outfile.write(line)

This code presently contains two issues:

  • Any substring in file2 that matches a string in file1 is printed to outfile (using the example here, if >name3 is in the set nums, then lines starting with >name3 as well as >name31 and >name367 will be printed)

  • I haven't figured out how to print both the line that contains the match and the next line (perhaps this can be done with islice?)

Thanks for any advice!

First issue:

Any substring in file2 that matches a string in file1 is printed to outfile (using the example here, if >name3 is in the set nums, then lines starting with >name3 as well as >name31 and >name367 will be printed)

This problem can be solved in 2 ways.

  1. Append a space.

    If you are sure the keyword is always followed by a space, append a space to it before matching.

    Example:

     if any(word + " " in line for word in nums): 
  2. Regular expression.

    To solve this you can use regular expressions. Import re and change:

     if any(word in line for word in nums): 

    to:

     if any(re.match(rf"^{re.escape(word)}\b", line) for word in nums): 

    Explanation: ^ means start of line and \b is a word boundary, so >name3 no longer matches >name31. re.escape keeps any special characters in the name literal. An online regex tester is a convenient way to check the pattern.
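Since every header in the example starts with the name followed by whitespace, another option is to compare the first whitespace-separated token against the set directly, which avoids building a regex per name. This is only a minimal sketch under that layout assumption; header_name and is_wanted are hypothetical helper names, not part of the question's code.

```python
def header_name(line):
    """Return the first whitespace-separated token of a line, or '' if blank."""
    parts = line.split(None, 1)
    return parts[0] if parts else ""


def is_wanted(line, names):
    """True if the line's first token is exactly one of the wanted names."""
    return header_name(line) in names


# Same names as file1 in the question.
nums = {">name1", ">name3", ">name4"}
print(is_wanted(">name3 blah blah\n", nums))   # True
print(is_wanted(">name31 blah blah\n", nums))  # False
```

Because nums is a set, each lookup is a single hash check rather than a scan over every name.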

Second issue:

I haven't figured out how to print both the line that contains the match and the next line (perhaps this can be done with islice?)

You iterate over the file with for line in file2:, which reads it line by line. There are a few ways to also print the next line.

  1. Boolean flag.

    Declare a boolean variable before the loop and set it to False. Inside the loop, if the variable is True, write the line to outfile and reset the variable to False. Set the variable to True inside your existing condition.

    Example:

     read_next = False
     for line in file2:
         if read_next:
             outfile.write(line)
             read_next = False
         if any(re.match(rf"^{re.escape(word)}\b", line) for word in nums):
             outfile.write(line)
             read_next = True
  2. Change the loop from for to while.

    You can use the readline() method to iterate over the file manually.

    Example:

     line = file2.readline()
     while line:
         if any(re.match(rf"^{re.escape(word)}\b", line) for word in nums):
             outfile.write(line)
             line = file2.readline()
             if line:
                 outfile.write(line)
             else:  # end of file reached
                 outfile.write("\n")  # delete this if you don't need it
                 break
         line = file2.readline()
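A third possibility, closer to the islice idea in the question, is to keep the for loop and pull the following line from the same iterator with the built-in next(). The sketch below is one way that could look, not a definitive implementation: copy_matches is a hypothetical name, and the io.StringIO objects stand in for file2.txt and out.txt so the example runs without files on disk.

```python
import io


def copy_matches(names, infile, outfile):
    """Write each line whose first token is in `names`, plus the line after it."""
    lines = iter(infile)
    for line in lines:
        parts = line.split(None, 1)
        if parts and parts[0] in names:
            outfile.write(line)
            # Take the following line from the same iterator ("" at end of input).
            outfile.write(next(lines, ""))


# In-memory stand-ins for file2.txt and out.txt (assumed sample data).
file2 = io.StringIO(
    ">name1 blah blah\naaaa\n"
    ">name2 blah blah\ncccc\n"
    ">name3 blah blah\ntttt\n"
)
out = io.StringIO()
copy_matches({">name1", ">name3"}, file2, out)
print(out.getvalue(), end="")
```

With real files, the same function can be called with the two objects opened in the with statement from the question.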

# Read the data file (file2) and group its lines into [header, sequence] pairs.
pairs = []
with open(r'C:\Users\user\RegForm.txt', 'r') as file:
    tmp = file.read().split('\n')
    for i in range(1, len(tmp), 2):
        pairs.append([tmp[i - 1], tmp[i]])

# Read the names to search for (file1) into a list.
to_find = []
with open(r'C:\Users\user\untitled0.txt', 'r') as file:
    for line in file:
        to_find.append(line.strip('\n'))

# For each name, print the first pair whose header starts with exactly that name.
# Comparing the first whitespace-separated token avoids matching >name31
# when searching for >name3.
for name in to_find:
    for header, sequence in pairs:
        if header.split()[:1] == [name]:
            print(header, sequence, sep='\n')
            break
"""
output

>name1 blah blah
aaaaaaaaaaaaaaaaaaaaaaaaa
>name3 blah blah
aaaaaattttttttttaaaaaaaaa
>name4 blah blah
aaaaaattttttttttggggggggg

"""
