I have a CSV file with two columns. The first column contains 2676 host names and the second column contains 964 host names. I want to compare these columns and print the entries that are in column 2 but not in column 1. Here is the code:
    import re
    from csv import DictReader

    with open("devices.csv") as f:
        a1 = [row["Device Name"] for row in DictReader(f)]
    # print(a1)
    # print(len(a1))

    ## the code below gives me the data for column 2
    with open('dump_data', 'r') as f:
        for line in f:
            line = re.split(r': |, |\*|\n', line)
            listOdd = line[1::2]
            for i in listOdd:
                print(i)

    result = []
    # print(listOdd)
    for i in a1:
        for j in listOdd:
            if i != j:
                result.append(i)
                # print(i)
                break
            else:
                pass

    print(result)
    print(len(result))
I also tried other approaches, such as using sets and pandas.
The output is not accurate. Each element in column 2 has to be compared with each element in column 1, but I am getting a few duplicate entries reported as differences.
Sets would appear to be the obvious solution. The following approach reads each column into its own set(). It then uses the set difference to give you the entries that are in col2 but not in col1 (calling col2.difference(col1) is the same as using the - operator):
    import csv

    col1 = set()
    col2 = set()

    with open('input.csv') as f_input:
        for row in csv.reader(f_input):
            if len(row) == 2:
                col1.add(row[0])
                col2.add(row[1])
            elif len(row) == 1:
                # Row only has a column 1 entry
                col1.add(row[0])

    print(col1)
    print(col2)
    print(sorted(col2 - col1))  # entries in column 2 but not in column 1
So if your CSV file had the following entries:
aaa,aaa
bbb,111
ccc,bbb
ddd,222
eee
fff
The last line of output (the sorted difference) would be:
['111', '222']
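If the original order of column 2 matters, or repeated names in it should be kept, a membership test against a set built from column 1 is one alternative to sorting the set difference. The sketch below is only an illustration: it uses the sample data above as in-memory lists, and the function name `missing_from` is made up. It also hints at why the nested loops in the question misfire: `if i != j` is true as soon as the first non-matching pair is seen, so nearly every name gets appended, whereas `not in` checks a name against the whole other column at once.

```python
def missing_from(candidates, reference):
    """Return the items of `candidates` that are absent from `reference`,
    keeping the original order of `candidates`."""
    reference_set = set(reference)  # set lookups are O(1) on average
    return [name for name in candidates if name not in reference_set]


# In-memory copy of the sample CSV columns shown above (for illustration only).
col1 = ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff']
col2 = ['aaa', '111', 'bbb', '222']

result = missing_from(col2, col1)
print(result)  # ['111', '222']
```

Unlike `sorted(col2 - col1)`, this keeps a name twice if it appears twice in column 2, which may or may not be what you want.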
The data in your CSV file might need sanitizing before being added to the set; for example, EXAMPLE.COM and example.com would currently be considered different.
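One possible way to do that sanitizing, as a rough sketch: strip surrounding whitespace and lower-case each name before it goes into the set. The helper name `normalize` and the decision to drop a trailing dot are assumptions made for illustration, so adjust them to whatever your host names actually look like.

```python
def normalize(hostname):
    """Hypothetical helper: trim whitespace, drop a trailing dot, lower-case."""
    return hostname.strip().rstrip('.').lower()


# Made-up sample values; in practice these would come from the CSV rows.
col1 = {normalize(name) for name in ['EXAMPLE.COM', ' host-a ']}
col2 = {normalize(name) for name in ['example.com', 'Host-B.']}

print(sorted(col2 - col1))  # ['host-b']
```

In the CSV loop this would amount to calling `col1.add(normalize(row[0]))` and `col2.add(normalize(row[1]))` instead of adding the raw values.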