
Python to search values in one text file, compare them with values in another text file, then replace values if there is a match

I have two files.

First file (~4 million entries) has 2 columns: [Label] [Energy]
Second file (~200,000 entries) has 2 columns: [Upper Label] [Lower Label]

For example:

File 1:

375677 4444.5              
375678 6890.4        
375679  786.0

File 2:

375677 375679      
375678 375679

I want to replace the 'label' values in file 2 with the 'energy' values from file 1, so that file 2 becomes:

File 2 (new):

4444.5 786.0   
6890.4 786.0

Or add the 'energy' values to file 2, so that file 2 becomes:

File 2 (alternative):

375677 375679 4444.5 786.0  
375678 375679 6890.4 786.0

There must be a way to do this in Python, but my brain is not working.

So far I have written:

from sys import argv
from scanfile import scanner

class UnknownCommand(Exception):
    pass

def processLine(line):
    # Print lines that start with the hard-coded label, without the trailing newline
    if line.startswith('23'):
        print(line.rstrip('\n'))

filename = 'test1.txt'
if len(argv) == 2:
    filename = argv[1]
scanner(filename, processLine)

where scanfile.py is:

def scanner(name, function):
    # Call `function` once for every line of the file `name`
    with open(name, 'r') as file:
        for line in file:
            function(line)

This allows me to search for and print the label + value in file 1 by manually inserting a label from file 2 (e.g. 23). Pointless and time-consuming.

I need to write a section that reads the labels from file 2 and puts them into line.startswith('label') one after another, until the end of file 2.
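Rather than feeding each file 2 label into line.startswith() one at a time (which would mean rescanning file 1 once per label), one option is to collect every file 2 label into a set first and then test the first column of each file 1 line against that set in a single pass. Comparing the whole first column also avoids a subtle issue with startswith('23'), which would match labels such as 2345 too. A minimal sketch that keeps the scanner callback style; the helper names collect_labels and find_matches are hypothetical:

```python
def scanner(name, function):
    # Same idea as scanfile.scanner: call `function` on every line of `name`
    with open(name) as f:
        for line in f:
            function(line)


def collect_labels(file2_path):
    # Gather every upper and lower label from file 2 into a set
    labels = set()

    def add_labels(line):
        labels.update(line.split())

    scanner(file2_path, add_labels)
    return labels


def find_matches(file1_path, labels):
    # Single pass over file 1: keep lines whose first column is a wanted label
    matches = []

    def check_line(line):
        parts = line.split()
        if parts and parts[0] in labels:
            matches.append(line.rstrip('\n'))

    scanner(file1_path, check_line)
    return matches
```

Usage, with hypothetical file names: find_matches('file1.txt', collect_labels('file2.txt')) returns the matching file 1 lines. Set membership is a constant-time lookup on average, so the cost is roughly one read of each file.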

Any suggestions?

Thank you for your help.

Assuming that the labels in file1 are unique, I would first read that file into a dictionary:

with open('file1') as fd:
    data1 = dict(line.strip().split()
                 for line in fd if line.strip())

This gives a dictionary data1 with content like the following:

{
  '375677': '4444.5',
  '375678': '6890.4',
  '375679': '786.0',
}

Now read through file2, performing the appropriate modifications as you iterate over it:

with open('file2') as fd:
    for line in fd:
        data = line.strip().split()
        # Look up both labels and print their energies
        print(data1[data[0]], data1[data[1]])
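If file2 ever contains a label that is missing from file1, data1[...] raises KeyError, and a blank line makes data[0] raise IndexError. A hedged variant, assuming you would rather see a placeholder than crash ('NA' is an arbitrary choice), uses dict.get and skips blank lines. It is written as a generator over any iterable of lines, so an open file or an in-memory list both work; the name replace_labels is hypothetical:

```python
def replace_labels(energies, lines, missing='NA'):
    # energies: dict mapping label -> energy string, as built from file1
    # lines: any iterable of file2 lines (an open file works)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Replace each label by its energy, or by `missing` if it is unknown
        yield ' '.join(energies.get(label, missing) for label in fields)


energies = {'375677': '4444.5', '375678': '6890.4', '375679': '786.0'}
result = list(replace_labels(energies, ['375677 375679\n', '375678 375679\n', '375677 999\n']))
# result == ['4444.5 786.0', '6890.4 786.0', '4444.5 NA']
```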

Or, for your alternative:

with open('file2') as fd:
    for line in fd:
        data = line.strip().split()
        # Print the original labels followed by their energies
        print(' '.join(data), data1[data[0]], data1[data[1]])
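Both loops print to the screen. To produce the new file 2 directly, a sketch like the following writes either layout to an output file. It assumes, as above, that file1 labels are unique and that every file2 label exists in file1; the function name write_new_file2, the keep_labels flag and the paths are hypothetical:

```python
def write_new_file2(file1_path, file2_path, out_path, keep_labels=False):
    # Build the label -> energy lookup from file 1 (labels assumed unique)
    with open(file1_path) as fd:
        energies = dict(line.split() for line in fd if line.strip())

    with open(file2_path) as src, open(out_path, 'w') as out:
        for line in src:
            labels = line.split()
            if not labels:
                continue  # skip blank lines
            values = [energies[label] for label in labels]
            # keep_labels=True gives the "alternative" layout: labels, then energies
            fields = labels + values if keep_labels else values
            out.write(' '.join(fields) + '\n')
```

For example, write_new_file2('file1', 'file2', 'file2_new', keep_labels=True) would write the alternative layout to file2_new. Writing to a separate file rather than overwriting file2 in place keeps the original intact if something goes wrong midway.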

This approach is worth taking only if 4M entries are too much for your memory:

  1. Create a set of all File2 IDs (upper and lower).
  2. Loop over the big file (File1) and build a dict containing only the entries whose ID is in that set.
  3. Loop over File2 again and build the output file.

Some code to demonstrate it:

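# 1. Collect every upper and lower ID that appears in File2
s = set()
with open('File2') as file2:
    for line in file2:
        s.update(line.split())
# 2. Keep only the File1 entries whose ID is needed
d = {}
with open('File1') as file1:
    for line in file1:
        parts = line.split()
        if len(parts) == 2 and parts[0] in s:
            d[parts[0]] = parts[1]
# 3. Write each File2 pair followed by its two energies, one pair per line
with open('NewFile2', 'w') as out_file, open('File2') as file2:
    for line in file2:
        if not line.strip():
            continue
        k1, k2 = line.split()
        out_file.write(' '.join([k1, k2, d[k1], d[k2]]) + '\n')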
