Python: look up values in one text file, compare them with values in another text file, and replace them on a match

Question

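I have two files.

The first file (about 4 million entries) has 2 columns: [label] [energy]
The second file (~200,000 entries) has 2 columns: [upper label] [lower label]

For example:

File 1: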

375677 4444.5              
375678 6890.4        
375679  786.0

File 2:

375677 375679      
375678 375679

I want to replace the 'label' values in file 2 with the 'energy' values from file 1, so that file 2 becomes:

File 2 (new):

4444.5 786.0   
6890.4 786.0

Or append the 'energy' values to file 2, so that file 2 becomes:

File 2 (alternative):

375677 375679 4444.5 786.0  
375678 375679 6890.4 786.0

There must be a way to do this in Python, but my brain isn't cooperating.

What I have written so far:

from sys import argv
from scanfile import scanner
class UnknownCommand(Exception): pass

def processLine(line):
  if line.startswith('23'):
    print(line[0:-1])

filename = 'test1.txt'
if len(argv) == 2: filename = argv[1]
scanner(filename, processLine)

where scanfile is:

def scanner(name, function):   
  file = open(name, 'r')   
  while True:   
    line = file.readline()   
    if not line: break   
    function(line)   
  file.close()

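This lets me search file 1 and print the label + value by manually typing in a label from file 2 (e.g. 23). Pointless and time-consuming.

I need to write a section that reads the labels from file 2 and feeds them one after another into line.startswith('label') until the end of file 2.

Any suggestions?

Thanks for your help.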

Answer 1

Assuming the labels in file1 are unique, I would first read that file into a dictionary:

with open('file1') as fd:
    data1 = dict(line.strip().split()
                 for line in fd if line.strip())

This gives a dictionary data1 that looks like this:

{
  '375677': '4444.5',
  '375678': '6890.4',
  '375679': '786.0',
}
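
Note that dict() silently keeps the last value when a label repeats, so the uniqueness assumption may be worth checking before relying on it. A minimal sketch, assuming the whitespace-separated two-column layout shown in the question; the helper name load_energies is hypothetical and not part of the original answer:

```python
def load_energies(lines):
    """Build {label: energy} from an iterable of 'label energy' lines.

    Raises ValueError on a repeated label instead of silently overwriting it.
    """
    energies = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:          # skip blank lines
            continue
        label, energy = fields  # raises ValueError unless exactly two columns
        if label in energies:
            raise ValueError("duplicate label %s on line %d" % (label, lineno))
        energies[label] = energy
    return energies


# In-memory sample standing in for file1 (same rows as the question).
sample = ["375677 4444.5", "375678 6890.4", "375679  786.0"]
data1 = load_energies(sample)
```

With a real file, the open file object can be passed directly, e.g. load_energies(fd) inside the with block.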

Now read through file2, making the appropriate substitution as you iterate over the file:

with open('file2') as fd:
    for line in fd:
        data = line.strip().split()
        print(data1[data[0]], data1[data[1]])

Or, for the alternative output:

with open('file2') as fd:
    for line in fd:
        data = line.strip().split()
        print(' '.join(data), data1[data[0]], data1[data[1]])
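
Both loops raise KeyError if file2 mentions a label that file1 lacks, and they print to the terminal rather than writing a file. A sketch that handles both, assuming a placeholder string 'NA' for missing energies; the function name annotate_pairs and the placeholder are illustrative choices, not part of the original answer:

```python
import io


def annotate_pairs(pairs, energies, out, missing="NA"):
    """Append the energies of both labels to each 'upper lower' line."""
    for line in pairs:
        labels = line.split()
        if not labels:  # skip blank lines
            continue
        # dict.get returns the placeholder instead of raising KeyError
        values = [energies.get(label, missing) for label in labels]
        out.write(" ".join(labels + values) + "\n")


energies = {"375677": "4444.5", "375678": "6890.4", "375679": "786.0"}
pairs = ["375677 375679\n", "375678 375679\n", "375677 999999\n"]
out = io.StringIO()  # stands in for an output file opened with mode 'w'
annotate_pairs(pairs, energies, out)
result = out.getvalue()
```

Any object with a write method works as out, so the same function can target a real file or sys.stdout.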

Answer 2

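This approach is only worth mentioning if 4M entries are too much for your memory:

Create a set of all File2 IDs (upper and lower)
Loop over the big file (File1) and build a dict containing only the entries whose label is in the set
Loop over File2 again and build the output file

Some code to demonstrate it: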

s = set()
with open('File2') as file2:
    for line in file2:
        for i in line.split():
            s.add(i)
d = {}
with open('File1') as file1:
    for line in file1:
        k,v = line.split()
        if k in s:
            d[k] = v
with open('NewFile2', 'w') as out_file:
    with open('File2') as file2:
        for line in file2:
            k1,k2 = line.split()
            out_file.write(' '.join([k1,k2,d[k1],d[k2]]) + '\n')
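
Whether the filtering pass is needed depends on how much memory the full File1 dictionary would take. One rough way to gauge that is to build a smaller sample dict and extrapolate. The sketch below uses synthetic labels and energies (an assumption, not real data) and the standard tracemalloc module; the function name estimate_dict_bytes is hypothetical, and the figure it gives is only an order-of-magnitude guide:

```python
import tracemalloc


def estimate_dict_bytes(sample_size=100_000, total_entries=4_000_000):
    """Extrapolate the memory of a {label: energy} dict of string pairs."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    # Synthetic entries shaped like the question's data: 6-digit label, one-decimal energy.
    sample = {str(375677 + i): "%.1f" % (i * 0.5) for i in range(sample_size)}
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    per_entry = (after - before) / len(sample)
    return per_entry * total_entries


estimated = estimate_dict_bytes()
print("roughly %.0f MiB for 4M entries" % (estimated / 2**20))
```

If the estimate fits comfortably in available RAM, the simpler approach from Answer 1 is likely sufficient.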

Answer 1
1 accepted 2014-01-20 01:51:21

Answer 2
1 2014-01-20 01:59:15
