
Python to search values in one text file, compare them with values in another text file, then replace values if there is a match

I have two files.

First file (~4 million entries) has 2 columns: [Label] [Energy]
Second file (~200,000 entries) has 2 columns: [Upper Label] [Lower Label]

For Example:

File 1:

375677 4444.5              
375678 6890.4        
375679  786.0

File 2:

375677 375679      
375678 375679

I want to replace the 'label' values in file 2 with the 'energy' values in file 1 such that file 2 becomes:

File 2(new):

4444.5 786.0   
6890.4 786.0

Or add the 'energy' values to file 2, such that file 2 becomes:

File 2(alternative):

375677 375679 4444.5 786.0  
375678 375679 6890.4 786.0

There must be a way to do this in python, but my brain is not working.

So far I have written

from sys import argv   
from scanfile import scanner   
class UnknownCommand(Exception): pass   

def processLine(line):
  # Print every line whose label starts with the hard-coded prefix.
  if line.startswith('23'):
    print(line[0:-1])  # drop the trailing newline

filename = 'test1.txt'   
if len(argv) == 2: filename = argv[1]   
scanner (filename, processLine)   

where scanfile is:

def scanner(name, function):
  # Call function on each line of the named file; the with block closes it.
  with open(name, 'r') as f:
    for line in f:
      function(line)

This lets me search for, and print, the label and value in file 1 by manually typing in a label from file 2 (e.g. 23), which is pointless and time-consuming.

I need to write a section that reads the labels from file 2 and passes them to line.startswith('label') one after another, until the end of file 2.

Any suggestions?

Thank you for your help.

Assuming that the labels in file1 are unique, I would first read that file into a dictionary:

with open('file1') as fd:
    data1 = dict(line.strip().split()
                 for line in fd if line.strip())

This gives a dictionary data1 with content like the following:

{
  '375677': '4444.5',
  '375678': '6890.4',
  '375679': '786.0',
}

Now read through file2, looking up each label as you iterate through the file:

with open('file2') as fd:
    for line in fd:
        data = line.strip().split()
        print(data1[data[0]], data1[data[1]])

Or, for your alternative:

with open('file2') as fd:
    for line in fd:
        data = line.strip().split()
        print(' '.join(data), data1[data[0]], data1[data[1]])
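
One thing to watch for: data1[...] raises a KeyError if file 2 contains a label that file 1 does not. If that can happen in your data, a variant along these lines might help. It is only a sketch: the function names, the "NA" placeholder, and the extra 999999 label are my own choices for illustration, and it works on lists of lines so you can try it without touching the real files.

```python
def load_energies(lines):
    """Build a {label: energy} dict from lines of the form 'label energy'."""
    energies = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            energies[fields[0]] = fields[1]
    return energies


def annotate_pairs(pair_lines, energies, missing="NA"):
    """Yield 'upper lower E_upper E_lower' for each pair of labels."""
    for line in pair_lines:
        fields = line.split()
        if len(fields) != 2:  # skip blank or malformed lines
            continue
        upper, lower = fields
        # dict.get returns the placeholder instead of raising KeyError
        yield " ".join([upper, lower,
                        energies.get(upper, missing),
                        energies.get(lower, missing)])


# Sample data from the question, plus one pair with an unknown label (999999).
file1 = ["375677 4444.5", "375678 6890.4", "375679  786.0"]
file2 = ["375677 375679", "375678 375679", "375677 999999"]
result = list(annotate_pairs(file2, load_energies(file1)))
```

With real files you would pass the open file objects in place of the lists, since both functions only iterate over lines.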

This approach is worth taking only if 4 million entries are too much to hold in memory:

  1. create a set from all File2 ids (upper and lower)
  2. loop over the big file (File1) and create a dict only with entries in the set
  3. loop on File2 again and build the output file

Some code to demonstrate it:

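# Step 1: collect every label that appears in File2.
s = set()
with open('File2') as file2:
    for line in file2:
        for i in line.split():
            s.add(i)
# Step 2: keep only the File1 entries whose label is in that set.
d = {}
with open('File1') as file1:
    for line in file1:
        k,v = line.split()
        if k in s:
            d[k] = v
# Step 3: write each File2 pair followed by its two energies.
with open('NewFile2', 'w') as out_file:
    with open('File2') as file2:
        for line in file2:
            k1,k2 = line.split()
            # end each record with a newline so the output stays one pair per line
            out_file.write(' '.join([k1,k2,d[k1],d[k2]]) + '\n')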
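
If you want to check the three steps on a small sample before running them on the real files, something like the following could work. It is a rough sketch, not a tuned implementation: the join_energies name is made up, the 375676 entry is an extra line I added to show the filtering, and it assumes every label in File2 does exist in File1 (otherwise the lookup in step 3 raises KeyError).

```python
import os
import tempfile


def join_energies(file1_path, file2_path, out_path):
    """Write 'k1 k2 E1 E2' lines to out_path; return the number of energies kept."""
    # Step 1: collect every label that file 2 refers to.
    wanted = set()
    with open(file2_path) as file2:
        for line in file2:
            wanted.update(line.split())

    # Step 2: keep only the energies of those labels from the big file.
    energies = {}
    with open(file1_path) as file1:
        for line in file1:
            fields = line.split()
            if len(fields) == 2 and fields[0] in wanted:
                energies[fields[0]] = fields[1]

    # Step 3: write each pair followed by its two energies.
    with open(file2_path) as file2, open(out_path, "w") as out_file:
        for line in file2:
            fields = line.split()
            if len(fields) != 2:  # skip blank or malformed lines
                continue
            k1, k2 = fields
            out_file.write(" ".join([k1, k2, energies[k1], energies[k2]]) + "\n")
    return len(energies)


# Try it on the sample data from the question, using temporary files.
with tempfile.TemporaryDirectory() as tmp:
    f1 = os.path.join(tmp, "File1")
    f2 = os.path.join(tmp, "File2")
    out = os.path.join(tmp, "NewFile2")
    with open(f1, "w") as fh:
        # 375676 is not referenced by File2, so step 2 should drop it.
        fh.write("375676 12.5\n375677 4444.5\n375678 6890.4\n375679  786.0\n")
    with open(f2, "w") as fh:
        fh.write("375677 375679\n375678 375679\n")
    kept = join_energies(f1, f2, out)
    with open(out) as fh:
        output = fh.read()
```

The dict only ever holds the labels that File2 actually uses, which is the reason for the extra pass over File2 in step 1.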
