Join Files with common columns in Python

I have a problem joining two large files on their common columns (the four columns of File 1, which appear as columns 2 to 5 of File 2) and returning the matching rows, i.e. the lines of File 2 whose columns are identical to a row of File 1. Here is exactly what I mean:

File 1:

132.227 49202 107.21 80
132.227 49202 107.21 80
132.227 49200 107.220 80
132.227 49200 107.220 80
132.227 49222 207.171 80
132.227 49339 184.730 80
132.227 49291 930.184 80
............
............
............

The file contains many more lines than just these.

File 2:

46.109498000 132.227 49200 107.220 80 17 48 
46.927339000 132.227 49291 930.184 80 17 48 
47.422919000 253.123 1985 224.300 1985 17 48
48.412761000 132.253 1985 224.078 1985 17 48
48.638454000 132.127 1985 232.123 1985 17 48
48.909658000 132.227 49291 930.184 80 17 65
48.911360000 132.227 49200 107.220 80 17 231
............
............
............

Output file:

46.109498000 132.227 49200 107.220 80 17 48 
46.927339000 132.227 49291 930.184 80 17 48 
48.909658000 132.227 49291 930.184 80 17 65
48.911360000 132.227 49200 107.220 80 17 231
............
............
............

Here is the code I wrote:

with open('log1', 'r') as fl1:
    f1 = [i.split(' ') for i in fl1.read().split('\n')]

with open('log2', 'r') as fl2:
    f2 = [i.split(' ') for i in fl2.read().split('\n')]

def merging(x,y):
    list=[]
    for i in x:
        for j in range(len(i)-1):
            while i[j]==[a[b] for a in y]:
                list.append(i)
                j=j+1
    return list

f3=merging(f1,f2)

for i in f3:
    print(i)

I think what you want is file2 filtered by file1, right?

I assume that file1 is not sorted. (If it is sorted, there is another efficient solution.)
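The sorted case is not spelled out here. As one possibility (an assumption, not necessarily the solution meant above), a sorted file1 allows binary search with the standard bisect module instead of scanning the whole filter for every line of file2. The helper names below are hypothetical, and the fields are compared as strings, just as in the list-based filter.

```python
import bisect


def make_sorted_keys(key_lines):
    """Parse file1-style lines into a sorted list of field tuples.

    Blank lines are skipped. If the lines are already in order, sorted()
    only has to confirm that, which takes linear time.
    """
    return sorted(tuple(line.split()) for line in key_lines if line.strip())


def contains(sorted_keys, key):
    """Binary search: True if key occurs in sorted_keys."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


def filter_sorted(sorted_keys, data_lines):
    """Keep the file2-style lines whose fields 2-5 are one of the keys."""
    return [line.strip() for line in data_lines
            if contains(sorted_keys, tuple(line.split()[1:5]))]
```

Each lookup then costs O(log n) comparisons in the number of file1 rows, instead of a scan over all of them.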

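with open('file1') as file1, open('file2') as file2:
    # each line of file1 becomes a list of its four fields
    my_filter = [line.strip().split() for line in file1]
    # keep a file2 line if its fields 2-5 (indices 1 to 4) match a file1 row
    f3 = [line.strip() for line in filter(lambda x: x.strip().split()[1:5] in my_filter, file2)]

# to see f3
for line in f3:
    print(line)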

First, build the filter my_filter = [line.strip().split() for line in file1], which contains:

[['132.227', '49202', '107.21', '80'], ['132.227', '49202', '107.21', '80'], ['132.227', '49200', '107.220', '80'], ['132.227', '49200', '107.220', '80'], ['132.227', '49222', '207.171', '80'], ['132.227', '49339', '184.730', '80'], ['132.227', '49291', '930.184', '80']]

Then use filter to keep only the lines of file2 whose fields 2 to 5 (x.strip().split()[1:5]) appear in my_filter. This code works on Python 2.7+.
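The test x.strip().split()[1:5] in my_filter scans the whole list for every line of file2, so the work grows with the product of the two file sizes. A minimal variation on the same idea, assuming whitespace-separated fields as in the samples, stores the file1 rows as tuples in a set so that each test is a hash lookup. The function name filter_by_keys is hypothetical.

```python
def filter_by_keys(key_lines, data_lines):
    """Return the data lines whose fields 2-5 equal some key line's fields.

    key_lines and data_lines can be any iterables of text lines, such as
    open file objects; fields are separated by whitespace.
    """
    # Tuples are hashable, so each file1 row can go into a set; duplicate
    # rows in file1 collapse into one entry.
    keys = {tuple(line.split()) for line in key_lines if line.strip()}
    result = []
    for line in data_lines:
        # [1:5] skips the leading timestamp column of file2
        if tuple(line.split()[1:5]) in keys:
            result.append(line.strip())
    return result
```

With the question's files it would be called as filter_by_keys(file1, file2) inside the same with block. Like the list version, it compares the fields as strings, so 107.21 and 107.210 would not match.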

I wrote these lines and they seem to work:

with open('file1', 'r') as fl1:
    f1 = [i.split(' ') for i in fl1.read().split('\n')]

with open('file2', 'r') as fl2:
    f2 = [i.split(' ') for i in fl2.read().split('\n')]

for i in f2:
    for j in f1:
        if i[1]==j[0] and i[2]==j[1] and i[3]==j[2] and i[4]==j[3]:
            print(i)
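These lines parse the files with read().split('\n') and split(' '). A quick check on the first File 2 sample line (which ends with a space), followed by a newline as it would be in a file, shows that both calls produce empty strings, whereas splitlines() plus a no-argument split() does not:

```python
# First File 2 sample line, with its trailing space, as read from a file
text = "46.109498000 132.227 49200 107.220 80 17 48 \n"

# split('\n') leaves an empty last element; split(' ') keeps a trailing ''
rows_split_space = [line.split(' ') for line in text.split('\n')]

# splitlines() drops the final newline; split() with no argument ignores
# leading/trailing whitespace; blank lines are skipped explicitly
rows_split_ws = [line.split() for line in text.splitlines() if line.strip()]

print(rows_split_space)
# [['46.109498000', '132.227', '49200', '107.220', '80', '17', '48', ''], ['']]
print(rows_split_ws)
# [['46.109498000', '132.227', '49200', '107.220', '80', '17', '48']]
```

A row like [''] has no index 1, so it is harmless only as long as no comparison indexes past its first element.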

I tried to replace

if i[1]==j[0] and i[2]==j[1] and i[3]==j[2] and i[4]==j[3]:

with:

for k in range(4):
    if i[k+1]==j[k]:
        print(i)

but it gave me this error:

Traceback (most recent call last):
  File "MERGE.py", line 10, in <module>
    if i[k+1]==j[k]:
IndexError: list index out of range
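The loop changes two things. It prints i as soon as any single column matches, not only when all four do. And, unlike the chained and, which stops at the first comparison that is False, it evaluates i[k+1]==j[k] for every k. The IndexError most likely (an inference from the code, not something the question confirms) comes from the empty row [''] that read().split('\n') produces when file1 ends with a newline: for that row i[1]=='' is False, so the and chain never touches j[1], but the loop goes on to k=1 and evaluates j[1]. In addition, the sample File 1 contains duplicate rows, so the nested loops print a matching File 2 line once per duplicate. A sketch that avoids all three problems, with hypothetical helper names:

```python
def parse(text):
    """Split text into rows of whitespace-separated fields, skipping blank lines."""
    return [line.split() for line in text.splitlines() if line.strip()]


def same_key(row2, row1):
    """True if fields 2-5 of a file2 row equal the four fields of a file1 row.

    When row2 has at least 5 fields and row1 exactly 4, this is the same as
    all(row2[k + 1] == row1[k] for k in range(4)), but comparing slices
    never raises IndexError on short rows.
    """
    return row2[1:5] == row1


def join_rows(text1, text2):
    """Return each file2 row (re-joined with single spaces) that matches file1."""
    rows1 = parse(text1)
    result = []
    for row2 in parse(text2):
        # any() stops at the first match, so duplicate file1 rows
        # do not make the same file2 row appear twice
        if any(same_key(row2, row1) for row1 in rows1):
            result.append(' '.join(row2))
    return result
```

Like the original nested loops, this still compares every pair of rows, so it gets slow for very large files.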
