I have two files.
First file (~4 million entries) has 2 columns: [Label] [Energy]
Second file (~200,000 entries) has 2 columns: [Upper Label] [Lower Label]
For example:
File 1:
375677 4444.5
375678 6890.4
375679 786.0
File 2:
375677 375679
375678 375679
I want to replace the 'label' values in file 2 with the 'energy' values in file 1 such that file 2 becomes:
File 2(new):
4444.5 786.0
6890.4 786.0
Or add the 'energy' values to file 2, such that file 2 becomes:
File 2(alternative):
375677 375679 4444.5 786.0
375678 375679 6890.4 786.0
There must be a way to do this in Python, but my brain is not working.
So far I have written:
from sys import argv
from scanfile import scanner

class UnknownCommand(Exception): pass

def processLine(line):
    # Print every line whose label starts with the hard-coded prefix.
    if line.startswith('23'):
        print(line[0:-1])

filename = 'test1.txt'
if len(argv) == 2: filename = argv[1]
scanner(filename, processLine)
where scanfile is:
def scanner(name, function):
    file = open(name, 'r')
    while True:
        line = file.readline()
        if not line: break
        function(line)
    file.close()
This allows me to search for, and print, the label and value in file 1 by manually inserting the label from file 2 (e.g. 23). That is pointless and time-consuming.
I need to write a section that reads the labels from file 2 and passes each one to line.startswith('label') in turn, until the end of file 2.
Any suggestions?
Thank you for your help.
Assuming that the labels in file1 are unique, I would first read that file into a dictionary:
with open('file1') as fd:
    # Each non-blank line splits into a (label, energy) pair.
    data1 = dict(line.strip().split()
                 for line in fd if line.strip())
This gives a dictionary data1 with content like the following:
{
'375677': '4444.5',
'375678': '6890.4',
'375679': '786.0',
}
Now, read through file2, performing the appropriate modifications as you iterate through the file:
with open('file2') as fd:
    for line in fd:
        data = line.strip().split()
        # Look up the energy for the upper and lower label.
        print(data1[data[0]], data1[data[1]])
Or, for your alternative:
with open('file2') as fd:
    for line in fd:
        data = line.strip().split()
        # Keep both labels and append their energies.
        print(' '.join(data), data1[data[0]], data1[data[1]])
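For reference, here is a minimal sketch of the same dictionary lookup packaged as two functions, assuming Python 3 and whitespace-separated columns. The names load_energies and annotate_pairs are hypothetical, and writing "NA" for a label that is missing from file 1 is an assumption rather than something the question asks for. The sample data is taken from the question.

```python
def load_energies(lines):
    """Build a {label: energy} dictionary from 'label energy' lines."""
    energies = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        label, energy = fields
        energies[label] = energy
    return energies


def annotate_pairs(lines, energies, keep_labels=True, missing="NA"):
    """Yield one output line per 'upper lower' pair of labels."""
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        # Assumption: unknown labels become the `missing` placeholder.
        values = [energies.get(label, missing) for label in fields]
        yield " ".join((fields + values) if keep_labels else values)


# Sample data from the question; real code would pass open file objects.
file1 = ["375677 4444.5\n", "375678 6890.4\n", "375679 786.0\n"]
file2 = ["375677 375679\n", "375678 375679\n"]

energies = load_energies(file1)
replaced = list(annotate_pairs(file2, energies, keep_labels=False))
appended = list(annotate_pairs(file2, energies))
```

Because both functions accept any iterable of lines, an open file object can be passed in directly, so neither file has to be read into a list first.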
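This approach is worth taking only if 4 million entries are too many to hold in memory. It first collects the labels that actually appear in File2, then keeps only those entries from File1.
Some code to demonstrate it: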
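# Collect every label that File2 refers to.
s = set()
with open('File2') as file2:
    for line in file2:
        for i in line.split():
            s.add(i)

# Keep only the energies for those labels.
d = {}
with open('File1') as file1:
    for line in file1:
        k,v = line.split()
        if k in s:
            d[k] = v

# Write each pair of labels followed by its two energies, one pair per line.
with open('NewFile2', 'w') as out_file:
    with open('File2') as file2:
        for line in file2:
            k1,k2 = line.split()
            out_file.write(' '.join([k1,k2,d[k1],d[k2]]) + '\n')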
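As a rough end-to-end check, the three passes can be wrapped in one function and run on the sample data from the question. This is only a sketch assuming Python 3: the function name join_filtered, the temporary directory, and the extra 999999 entry (added to show that unused labels are dropped) are all hypothetical. A label that appears in File2 but not in File1 would raise a KeyError here.

```python
import os
import tempfile


def join_filtered(file1_path, file2_path, out_path):
    """Append energies to each label pair, holding only the needed labels in memory."""
    # Pass 1: collect every label that file 2 actually refers to.
    wanted = set()
    with open(file2_path) as file2:
        for line in file2:
            wanted.update(line.split())

    # Pass 2: keep only the energies for those labels.
    energies = {}
    with open(file1_path) as file1:
        for line in file1:
            fields = line.split()
            if len(fields) == 2 and fields[0] in wanted:
                energies[fields[0]] = fields[1]

    # Pass 3: write each pair followed by its two energies.
    with open(file2_path) as file2, open(out_path, "w") as out_file:
        for line in file2:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            upper, lower = fields
            row = [upper, lower, energies[upper], energies[lower]]
            out_file.write(" ".join(row) + "\n")
    return len(energies)


# Demo on the question's sample data, written to temporary files.
workdir = tempfile.mkdtemp()
paths = {name: os.path.join(workdir, name) for name in ("File1", "File2", "NewFile2")}
with open(paths["File1"], "w") as f:
    f.write("375677 4444.5\n375678 6890.4\n375679 786.0\n999999 1.0\n")
with open(paths["File2"], "w") as f:
    f.write("375677 375679\n375678 375679\n")

kept = join_filtered(paths["File1"], paths["File2"], paths["NewFile2"])
with open(paths["NewFile2"]) as f:
    result = f.read().splitlines()
```

Only the three labels referenced by File2 end up in the dictionary, which is the point of the extra pass: memory use scales with the roughly 200,000 pairs rather than with the 4 million entries.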