Compare specific fields in two files - Python

I want to compare two files (file1 and file2) that have different numbers of columns but share the same first 4 columns. The output should be the lines of file2 whose first 4 columns also appear as a line in file1:

file 1:

132.227.127.170 49163   173.194.40.110  443
132.227.127.170 49164   31.13.86.65 443
132.227.127.170 49165   193.51.224.40   443
132.227.127.170 49166   193.51.224.40   443
132.227.127.170 49167   193.51.224.40   443
......

file 2:

132.227.127.170 49155 17.172.232.150 5223 3 4500.1587 106
132.227.127.170 49155 17.172.232.150 5223 3 8100.3275 106
132.227.127.170 49163 173.194.40.110 443 5 0.405 53
132.227.127.170 49164 31.13.86.65 443 7 0.018600000000000002 53
132.227.127.170 49165 193.51.224.40 443 417 42.5117 32362
132.227.127.170 49166 193.51.224.40 443 34 33.382 1236
132.227.127.170 49167 193.51.224.40 443 8 37.067099999999996 458
132.227.127.170 49168 193.51.224.40 443 5 0.0008 53
132.227.127.170 49169 193.51.224.40 443 5 0.0009 53
132.227.127.170 49170 31.13.86.65 443 937 30.7529 117540
......

Output:

132.227.127.170 49163 173.194.40.110 443 5 0.405 53
132.227.127.170 49164 31.13.86.65 443 7 0.018600000000000002 53
132.227.127.170 49165 193.51.224.40 443 417 42.5117 32362
132.227.127.170 49166 193.51.224.40 443 34 33.382 1236
132.227.127.170 49167 193.51.224.40 443 8 37.067099999999996 458
....

So I tried this code. It really should work: I have used it in other cases and it worked very well, but I don't know what went wrong this time!

import string 

tstFile1=open("output","w+")
with open('file1') as file1, open('file2') as file2:
    myf=[line.strip().split() for line in file1]
    f1=[line.strip() for line in filter(lambda x: x.strip().split()[0:3] in myf, file2)]
for i in f1:
    tstFile1.write("%s\n" %i)
tstFile1.close()

So what would you suggest I change? Any help, please! I also tried an AWK command, but I still get the same problem.

The problem is that you are trying to be too fancy. There are too many steps packed into one line, so it is easy to miss a small detail.

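file1 contains 4 columns, but you are only taking the first 3 columns of each line in file2. A 3-element list is never equal to a 4-element list, so the membership test "in myf" is always False and no line of file2 matches.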
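To see why nothing matches, you can run the question's comparison by hand on a single row. This small sketch uses one line copied from file1 and the corresponding line from file2; the variable names file1_lines, row, three_fields and four_fields are only for illustration. Note that split() with no argument splits on any run of whitespace, so the different spacing in the two files is not the problem.

```python
# A one-line stand-in for file1, and the matching row of file2 (both copied from the question)
file1_lines = ["132.227.127.170 49163   173.194.40.110  443\n"]
row = "132.227.127.170 49163 173.194.40.110 443 5 0.405 53\n"

# Same parsing as the question: each line of file1 becomes a list of 4 strings
myf = [line.strip().split() for line in file1_lines]

three_fields = row.strip().split()[0:3]  # what the question's lambda compares
four_fields = row.strip().split()[0:4]   # what it should compare

print(len(myf[0]), len(three_fields), len(four_fields))  # 4 3 4
print(three_fields in myf)  # False: a 3-item list never equals a 4-item list
print(four_fields in myf)   # True
```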

Your problem will be fixed if you change the following line:

    f1=[line.strip() for line in filter(lambda x: x.strip().split()[0:4] in myf, file2)]

and, for the list of non-matching lines:

    f2=[line.strip() for line in filter(lambda x: x.strip().split()[0:4] not in myf, file2)]
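Put together, a corrected version of the script might look like the sketch below. It keeps the original comprehension-plus-filter style and only changes the slice to [0:4]. Two things are assumptions added here: the logic is wrapped in two hypothetical helpers, split_matches and write_output, so it can be tried on in-memory lines; and file2 is read into a list first, because a file object can only be iterated once, so building both f1 and f2 from the same open handle would leave f2 empty.

```python
def split_matches(file1_lines, file2_lines):
    """Return (f1, f2): the stripped lines of file2 whose first 4 fields
    do / do not match a line of file1."""
    # Skip blank lines so that an empty key [] never ends up in myf
    myf = [line.strip().split() for line in file1_lines if line.strip()]
    # Read file2 once into a list: a file object can only be iterated once,
    # so a second filter() over the same open handle would yield nothing.
    file2_lines = list(file2_lines)
    f1 = [line.strip() for line in filter(lambda x: x.strip().split()[0:4] in myf, file2_lines)]
    f2 = [line.strip() for line in filter(lambda x: x.strip().split()[0:4] not in myf, file2_lines)]
    return f1, f2


def write_output(lines, out_path="output"):
    """Write each line to out_path followed by a newline, like the question's loop."""
    with open(out_path, "w") as out:
        for i in lines:
            out.write("%s\n" % i)
```

For example, open file1 and file2 as in the question, call f1, f2 = split_matches(file1, file2) inside the with block, and then call write_output(f1) to produce the output file.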

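The only change is [0:3] to [0:4]. Remember that Python slice indices point between elements, so [0:3] stops before the element at index 3 and returns only the first three fields, while [0:4] returns the first four.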
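A quick way to convince yourself of the "between elements" rule is to slice one row of file2 both ways. In this sketch the names fields, first_three and first_four are only illustrative; the row is copied from the question.

```python
fields = "132.227.127.170 49163 173.194.40.110 443 5 0.405 53".split()

# Think of slice bounds as positions *between* elements:
#   |0| '132.227.127.170' |1| '49163' |2| '173.194.40.110' |3| '443' |4| '5' |5| ...
# fields[a:b] returns everything between position a and position b.
first_three = fields[0:3]  # stops at position 3, so '443' is left out
first_four = fields[0:4]   # stops at position 4, so '443' is included

print(first_three)  # ['132.227.127.170', '49163', '173.194.40.110']
print(first_four)   # ['132.227.127.170', '49163', '173.194.40.110', '443']
```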

But please split this logic into separate steps; it will make debugging much easier!
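Here is one way the logic could be split into small steps that can each be checked on their own. This is a sketch, not the answerer's code: the function names read_keys, line_key, matching_lines and write_matches are hypothetical, and it swaps the list of lists for a set of tuples. Lists cannot go into a set because they are unhashable, so each key is converted to a tuple; a set lookup takes roughly constant time on average, whereas "in myf" on a list scans every line of file1 for each line of file2. It also assumes only the first 4 fields of file1 matter, even if a line there had more.

```python
def read_keys(lines, n_fields=4):
    """Step 1: collect the first n_fields fields of every non-blank line as a tuple."""
    keys = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            keys.add(tuple(fields[:n_fields]))
    return keys


def line_key(line, n_fields=4):
    """Step 2: build the same kind of key for one line of file2."""
    return tuple(line.split()[:n_fields])


def matching_lines(keys, lines, n_fields=4):
    """Step 3: yield the stripped lines whose key appears in keys."""
    for line in lines:
        stripped = line.strip()
        if stripped and line_key(stripped, n_fields) in keys:
            yield stripped


def write_matches(file1_path, file2_path, out_path, n_fields=4):
    """Step 4: wire the steps together and write one matching line per output line."""
    with open(file1_path) as file1:
        keys = read_keys(file1, n_fields)
    with open(file2_path) as file2, open(out_path, "w") as out:
        for line in matching_lines(keys, file2, n_fields):
            out.write(line + "\n")
```

Each step can be inspected separately, for example by printing len(keys) after read_keys to confirm that file1 was parsed as expected before looking at any matches. Calling write_matches("file1", "file2", "output") would then write the matching lines of file2 to the output file.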

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address.

 
© 2020-2024 STACKOOM.COM