
Search for content segments of one .txt file in another .txt file digit by digit and print matching lines

I have two txt files: file1.txt and file2.txt.

File1.txt contains the following:

12345678

File2.txt contains this:

34567999
23499899
13571234

I want to take the first three digits of line 1 of file1.txt ("123") and search file2.txt for them. Whenever a line contains these digits in that order (here, line 3: 1357 123 4), I want to write that line to a new file, file_new.txt.

Once all lines in file2.txt have been searched for this sequence ("123"), I want to move one digit further in file1.txt, so that the new search query is "234". Then I want to search file2.txt again for all lines containing "234" (here, line 2 (234 99899) and line 3 (13571 234)). As line 3 is already in file_new.txt, I only want to write line 2 to file_new.txt.

I want to continue this process, searching for the next three digits each time, until the whole line in file1.txt has been searched for in file2.txt.

Could someone please help me tackle this problem?

You can use readlines to read each text file into a list and then build a new list L with a while loop, as below. You can then write L to a text file.

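file1_path = 'file1.txt'
file2_path = 'file2.txt'

with open(file1_path) as file1:
    # strip the trailing newline so it never becomes part of a search window
    search_string = file1.readlines()[0].strip()

with open(file2_path) as file2:
    strings_to_search = file2.readlines()

L = []
n = 0
# stop at the last full three-digit window
while n <= len(search_string) - 3:
    for i in strings_to_search:
        if search_string[n:n+3] in i and i not in L:
            L.append(i)
    n += 1  # advance only after every line of file2 has been searched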
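
The final step mentioned in this answer, writing the list to a text file, might look like the sketch below. It assumes the list holds the lines from the question's example in the order the search finds them, and it uses the question's output name file_new.txt; the name `matched` is only an illustrative stand-in for L.

```python
# Illustrative stand-in for the list L built by the loop (values from the question's example).
matched = ["13571234\n", "23499899\n", "34567999"]

with open("file_new.txt", "w") as out:
    for line in matched:
        # Normalise line endings: the last line of file2.txt may lack a trailing newline.
        out.write(line.rstrip("\n") + "\n")
```

Normalising the line ending on write avoids two results being glued together when the last line of file2.txt was read without a newline.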

I have a small solution here:

with open('file1.txt', 'r') as f1:  # open in read mode
    first_line = f1.readline().strip()  # read the line once; drop the trailing newline

with open('file2.txt', 'r') as f2:  # open in read mode
    lines = f2.readlines()  # we read all lines

file_new = []
for digit in range(len(first_line) - 2):
    threedigits = first_line[digit:digit+3]  # the current three-digit window
    for i in lines:
        if threedigits in i and i not in file_new:
            file_new.append(i)  # add each not-yet-stored line containing the window

with open('file_new.txt', 'w') as f3:  # open in write mode
    for i in file_new:
        f3.write(i)  # write every matching line exactly once

This should do it.
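
For checking the logic without touching any files, the same sliding-window search can be sketched as a small function run on the question's sample data. The function name `find_matching_lines` and the `width` parameter are hypothetical and not part of either answer; a set is used alongside the list only to make the duplicate check cheap.

```python
def find_matching_lines(query, candidates, width=3):
    """Return candidates containing any width-character window of query, in first-match order."""
    matches = []
    seen = set()  # lines already collected, so each one is stored only once
    for start in range(len(query) - width + 1):
        window = query[start:start + width]  # "123", then "234", and so on
        for line in candidates:
            if window in line and line not in seen:
                seen.add(line)
                matches.append(line)
    return matches


query = "12345678"                                  # content of file1.txt in the question
candidates = ["34567999", "23499899", "13571234"]   # lines of file2.txt in the question
result = find_matching_lines(query, candidates)
print(result)  # ['13571234', '23499899', '34567999']
```

The output order follows the question's description: line 3 is found by "123", line 2 by "234", and line 1 by "345".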

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 