
Comparing two columns of CSV files

I have a CSV file with two columns. The first column contains 2676 host names and the second column contains 964 host names. I want to compare these columns and print the entries that are in column 2 but not in column 1. Here is the code:

import re
from csv import DictReader

with open("devices.csv") as f:
     a1 = [row["Device Name"] for row in DictReader(f)]
#print a1
#print len(a1)

## the code below gives me the data for column 2

with open('dump_data', 'r') as f:
    for line in f:
        line = re.split(': |, |\*|\n', line)

listOdd = line[1::2]
for i in listOdd:
    print i
result = []
# print listOdd
for i in a1:
    for j in listOdd:
        if i != j:
            result.append(i)
        # print i
        break
else:
    pass
print result
print len(result)

I also tried other approaches, such as using sets and pandas.

The output is not accurate. Each element in column 2 has to be compared with each element in column 1, but I am getting a few duplicate entries reported as differences.

Sets would appear to be the obvious solution. The following approach reads each column into its own set(). It then uses the - operator (equivalent to calling col2.difference(col1)) to give you the entries that are in col2 but not in col1:

import csv

col1 = set()
col2 = set()

with open('input.csv') as f_input:
    for row in csv.reader(f_input):
        if len(row) == 2:
            col1.add(row[0])
            col2.add(row[1])
        elif len(row) == 1:
            col1.add(row[0])

print(col1)
print(col2)

# Entries present in column 2 but missing from column 1
print(sorted(col2 - col1))

So if your CSV file had the following entries:

aaa,aaa
bbb,111
ccc,bbb
ddd,222
eee
fff

The final line of output would be:

['111', '222']
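
To check this without creating a file, here is a minimal sketch that feeds the sample rows above straight into csv.reader, which accepts any iterable of lines. The helper name column_sets and the variable names are made up for illustration, and skipping empty cells is an assumption that is not part of the code above:

```python
import csv

# Sample rows from above; csv.reader accepts any iterable of lines.
sample_lines = ["aaa,aaa", "bbb,111", "ccc,bbb", "ddd,222", "eee", "fff"]


def column_sets(lines):
    """Return (col1, col2) as sets of the non-empty values in each column.

    Hypothetical helper: ignoring empty cells is an assumption, so that a
    row such as "eee," does not add an empty string to col2.
    """
    col1, col2 = set(), set()
    for row in csv.reader(lines):
        if len(row) >= 1 and row[0]:
            col1.add(row[0])
        if len(row) >= 2 and row[1]:
            col2.add(row[1])
    return col1, col2


col1, col2 = column_sets(sample_lines)
only_in_col2 = sorted(col2 - col1)

# The - operator and difference() give the same result.
assert only_in_col2 == sorted(col2.difference(col1))

print(only_in_col2)  # ['111', '222']
```

Because a set holds each value only once, a host name that appears several times in column 2 is reported a single time, which avoids the duplicate entries described in the question.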

The data in your CSV file might need sanitizing before being added to the set, for example EXAMPLE.COM and example.com would currently be considered different.
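
One possible way to sanitize, sketched under the assumption that host names should match regardless of case and surrounding whitespace. The normalize helper and the sample host names are hypothetical, and your data may need different rules:

```python
def normalize(hostname):
    """Strip surrounding whitespace and lower-case a host name."""
    return hostname.strip().lower()


# Hypothetical sample values standing in for the two CSV columns.
raw_col1 = ["EXAMPLE.COM", " host-a "]
raw_col2 = ["example.com", "Host-A", "host-b"]

col1 = {normalize(h) for h in raw_col1}
col2 = {normalize(h) for h in raw_col2}

print(sorted(col2 - col1))  # ['host-b']
```

In the code above, the same effect could be had by calling normalize(row[0]) and normalize(row[1]) before each add().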


 