Python: search for values in one text file, compare them with values in another text file, and replace the values on a match

Question

I have two files.

First file (~4 million entries) has 2 columns: [Label] [Energy]
Second file (~200,000 entries) has 2 columns: [Upper Label] [Lower Label]

For example:

File 1:

375677 4444.5              
375678 6890.4        
375679  786.0

File 2:

375677 375679      
375678 375679

I want to replace the 'label' values in file 2 with the 'energy' values in file 1 such that file 2 becomes:

File 2 (new):

4444.5 786.0   
6890.4 786.0

Or add the 'energy' values to file 2, such that file 2 becomes:

File 2 (alternative):

375677 375679 4444.5 786.0  
375678 375679 6890.4 786.0

There must be a way to do this in Python, but my brain is not working.

So far I have written:

from sys import argv
from scanfile import scanner

class UnknownCommand(Exception): pass

def processLine(line):
    # Print every line whose label starts with the hard-coded prefix.
    if line.startswith('23'):
        print(line[0:-1])  # drop the trailing newline

filename = 'test1.txt'
if len(argv) == 2: filename = argv[1]
scanner(filename, processLine)

where scanfile is:

def scanner(name, function):
    # Call function once for every line of the named file.
    with open(name, 'r') as file:
        for line in file:
            function(line)

This allows me to search for, and print, the label + value in file 1 by manually inserting the label from file 2 (e.g. 23). Pointless and time-consuming.

I need to write a section which reads the labels from file 2 and puts them into line.startswith('label') consecutively, until the end of file 2.

Any suggestions?

Thank you for your help.

Answer 1

Assuming that the labels in file1 are unique, I would first read that file into a dictionary:

with open('file1') as fd:
    data1 = dict(line.strip().split()
                 for line in fd if line.strip())

This gives a dictionary data1 with content like the following:

{
  '375677': '4444.5',
  '375678': '6890.4',
  '375679': '786.0',
}
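The uniqueness assumption can be checked while the dictionary is built. The following is only a sketch: load_energies is a hypothetical helper name, and io.StringIO stands in for open('file1') so the snippet runs on the sample data from the question. Note that the energies stay strings, as in the dictionary shown above.

```python
import io

def load_energies(lines):
    """Map each label to its energy string; fail loudly on a repeated label."""
    energies = {}
    for line in lines:
        fields = line.split()
        if not fields:          # skip blank lines
            continue
        label, energy = fields  # expects exactly two columns per line
        if label in energies:
            raise ValueError("duplicate label: " + label)
        energies[label] = energy
    return energies

# Sample data from the question; io.StringIO stands in for open('file1').
sample_file1 = io.StringIO("375677 4444.5\n375678 6890.4\n375679  786.0\n")
data1 = load_energies(sample_file1)
print(data1["375679"])  # 786.0
```

If a duplicate label does turn up, the ValueError names it, which is probably more useful than silently keeping the last value as dict() would.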

Now, read through file2, performing the appropriate modifications as you iterate through the file:

with open('file2') as fd:
    for line in fd:
        data = line.strip().split()
        # Look up the energy for the upper and lower label.
        print(data1[data[0]], data1[data[1]])

Or, for your alternative:

with open('file2') as fd:
    for line in fd:
        data = line.strip().split()
        # Keep the original labels and append both energies.
        print(' '.join(data), data1[data[0]], data1[data[1]])
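A label in file 2 that does not occur in file 1 would raise a KeyError in the lookups above. One possible way to tolerate that, sketched here under the assumption that a placeholder value is acceptable, is dict.get with a default. The names annotate_pairs and missing are hypothetical, and the third input row is made up to trigger the missing-label case.

```python
def annotate_pairs(pair_lines, energies, missing="NA"):
    """Yield each label pair followed by the two looked-up energies."""
    for line in pair_lines:
        labels = line.split()
        if not labels:  # skip blank lines
            continue
        # dict.get returns the placeholder instead of raising KeyError.
        found = [energies.get(label, missing) for label in labels]
        yield " ".join(labels + found)

energies = {"375677": "4444.5", "375678": "6890.4", "375679": "786.0"}
# The last row uses a label (999999) that is deliberately absent from energies.
pairs = ["375677 375679\n", "375678 375679\n", "375677 999999\n"]
result = list(annotate_pairs(pairs, energies))
for row in result:
    print(row)
```

Because annotate_pairs accepts any iterable of lines, an open file object for file 2 can be passed in directly.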

Answer 2

This approach is worth taking only if 4M entries is too much for your memory.

Create a set from all File2 ids (upper and lower).
Loop over the big file (File1) and create a dict containing only the entries whose label is in that set.
Loop over File2 again and build the output file.

Some code to demonstrate it:

s = set()
with open('File2') as file2:
    for line in file2:
        for i in line.split():
            s.add(i)
d = {}
with open('File1') as file1:
    for line in file1:
        k,v = line.split()
        if k in s:
            d[k] = v
with open('NewFile2', 'w') as out_file:
    with open('File2') as file2:
        for line in file2:
            k1,k2 = line.split()
            # End each output row with a newline so rows do not run together.
            out_file.write(' '.join([k1,k2,d[k1],d[k2]]) + '\n')
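The three passes can also be wrapped in one function and tried on a small sample. This is a sketch rather than a definitive implementation: join_energies is a hypothetical name, the temporary files only stand in for the real File1 and File2, and the 375680 row is an invented extra entry that shows unreferenced labels are left out of the dict.

```python
import os
import tempfile

def join_energies(file1_path, file2_path, out_path):
    """Collect the labels file 2 needs, filter file 1, then write joined rows."""
    with open(file2_path) as file2:
        wanted = {label for line in file2 for label in line.split()}
    energies = {}
    with open(file1_path) as file1:
        for line in file1:
            fields = line.split()
            if len(fields) == 2 and fields[0] in wanted:
                energies[fields[0]] = fields[1]
    with open(file2_path) as file2, open(out_path, "w") as out_file:
        for line in file2:
            labels = line.split()
            if labels:
                row = labels + [energies[label] for label in labels]
                out_file.write(" ".join(row) + "\n")
    return len(energies)  # number of file 1 entries actually kept in memory

with tempfile.TemporaryDirectory() as tmp:
    file1_path = os.path.join(tmp, "File1")
    file2_path = os.path.join(tmp, "File2")
    out_path = os.path.join(tmp, "NewFile2")
    with open(file1_path, "w") as fh:
        # 375680 is an invented entry that file 2 never references.
        fh.write("375677 4444.5\n375678 6890.4\n375679  786.0\n375680 12.0\n")
    with open(file2_path, "w") as fh:
        fh.write("375677 375679\n375678 375679\n")
    kept = join_energies(file1_path, file2_path, out_path)
    with open(out_path) as fh:
        output = fh.read()

print(kept)
print(output, end="")
```

Only the labels that appear in File2 are stored, so memory use should scale with the smaller file rather than with the 4M entries.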

Answer 1
1 accepted 2014-01-20 01:51:21

Answer 2
1 2014-01-20 01:59:15