
Match text in 2 text files and get information from 1st file and append to another

I have two text files. The first contains the text together with its metadata (font size, etc.), and the second contains only the text. I need to match the text between the two files, take the metadata from the first file, and prepend it to the matching text in the second file. For example:

File A Data:

[Base Font : PSJEPX+Muli-Light, Font Size : 7.5, Font Weight : 300.0]We are not satisfied with our 2018 results. We have the global footprint, assets and team to 
[Base Font : SVTVFR+Muli-Light, Font Size : 7.5, Font Weight : 300.0] 

[Base Font : PSJEPX+Muli-Light, Font Size : 7.5, Font Weight : 300.0]perform better. We have made a number of changes to position for sustainable growth.
New line that does not start with square brackets.
[Base Font : SVTVFR+Muli-SemiBold, Font Size : 8.1, Font Weight : 600.0]Innovation

File B Data :

We are not satisfied with our 2018 results. We have the global footprint, assets and team to perform better. We have made a number of changes to position for sustainable growth.
New line that does not start with square brackets.

Innovation

Expected Output :

[Base Font : PSJEPX+Muli-Light, Font Size : 7.5, Font Weight : 300.0]We are not satisfied with our 2018 results. We have the global footprint, assets and team to perform better. We have made a number of changes to position for sustainable growth.
New line that does not start with square brackets.

[Base Font : SVTVFR+Muli-SemiBold, Font Size : 8.1, Font Weight : 600.0]Innovation

In short, the metadata from File A should be prepended to the text in File B only where the metadata changes.

My Approach :

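def readB(x):
    # Print x if it occurs in any line of File B.
    with open("fileb.txt") as resultFile:
        for line in resultFile:
            if x in line:
                print(x)

def readA():
    # Look up every line of File A (metadata included) in File B.
    with open("filea.txt") as bondNumberFile:
        for line in bondNumberFile:
            readB(line)

readA()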

My problem is that I am not sure how to take the metadata from File A and attach it to File B. Also, my code cannot handle the metadata (inside the square brackets) when matching text.

Please try the program below. It first reads filea.txt and builds a dictionary that maps each style (the bracketed metadata) to the text that carries it. It then reads fileb.txt line by line, picks the style whose text contains the line, and writes the style followed by the line to filec.txt.

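import re

# Map each metadata prefix ("style") from File A to all the text seen under it.
# Lines without a leading [...] block are stored under the empty string.
table = {}
with open("filea.txt", "r") as f:
    for line in f:
        if line.strip():
            # Group 1: optional leading "[...]" block; group 2: rest of the line.
            style, text = re.findall(r"^(\[.*?\])?(.*)$", line)[0]
            table[style] = table.get(style, "") + text

# Prefix each non-blank line of File B with the first style whose text contains it.
with open("fileb.txt", "r") as f, open("filec.txt", "w") as f1:
    for line in f:
        if line.strip():
            for style in table:
                if line.strip() in table[style]:
                    f1.write(style + line)
                    break
        else:
            # Keep blank lines as they are.
            f1.write(line)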
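
A side note on the regular expression used in the program above: the sketch below shows how it separates the optional bracketed metadata from the text, which is the part the code in the question could not handle. The helper name split_meta is hypothetical and is not part of the program.

```python
import re

# Same pattern as in the program above: an optional leading "[...]" block,
# then the rest of the line.
META_RE = re.compile(r"^(\[.*?\])?(.*)$")

def split_meta(line):
    """Return (metadata, text); metadata is "" when the line has no [...] prefix."""
    match = META_RE.match(line.rstrip("\n"))
    return match.group(1) or "", match.group(2)

print(split_meta("[Base Font : SVTVFR+Muli-SemiBold, Font Size : 8.1, Font Weight : 600.0]Innovation\n"))
print(split_meta("New line that does not start with square brackets.\n"))
```

Because the bracket group is non-greedy, it stops at the first closing bracket, so a "]" later in the text stays in the text part.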

Output (contents of filec.txt):

[Base Font : PSJEPX+Muli-Light, Font Size : 7.5, Font Weight : 300.0]We are not satisfied with our 2018 results. We have the global footprint, assets and team to perform better. We have made a number of changes to position for sustainable growth.
New line that does not start with square brackets.

[Base Font : SVTVFR+Muli-SemiBold, Font Size : 8.1, Font Weight : 600.0]Innovation
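
Note that the program writes the style in front of every matching line, so two consecutive lines of File B that share a style would both get the prefix. If the metadata should appear only where it changes, as the question asks, something like the following might work. This is a sketch under assumptions: annotate and last_style are names made up here, lines without metadata leave the current style unchanged, and lines of File B that match nothing in File A are copied as-is (the program above skips them). The sample lines are invented for illustration.

```python
import re

META_RE = re.compile(r"^(\[.*?\])?(.*)$")

def annotate(a_lines, b_lines):
    """Prefix File B lines with their File A style, only when the style changes."""
    # Collect the text seen under each style, in File A order.
    table = {}
    for line in a_lines:
        if line.strip():
            m = META_RE.match(line.rstrip("\n"))
            style, text = m.group(1) or "", m.group(2)
            table[style] = table.get(style, "") + text

    out, last_style = [], None
    for line in b_lines:
        text = line.strip()
        # First style whose collected text contains this line ("" if none).
        style = next((s for s, body in table.items() if text and text in body), "")
        if style and style != last_style:
            out.append(style + line)   # style changed: write the prefix
            last_style = style
        else:
            out.append(line)           # same style, no style, or blank line
    return out

# Invented sample data (assumption, not the files from the question).
a_lines = ["[Font Size : 7.5]first half, \n", "[Font Size : 7.5]second half.\n",
           "[Font Size : 7.5]Another paragraph.\n", "[Font Size : 8.1]Heading\n"]
b_lines = ["first half, second half.\n", "Another paragraph.\n", "\n", "Heading\n"]
print("".join(annotate(a_lines, b_lines)), end="")
```

To use it with real files, the two arguments can be the lists returned by readlines() on filea.txt and fileb.txt, and the result can be passed to writelines() on filec.txt.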
