I have a csv file.
table1 table2 table3 table4 table5
paper paper pen book book
pen pencil pencil charger apple
apple pen charger beatroot sandle
beatroot mobile apple pen paper
sandle book paper paper
I need to find the entries that are common to all columns. In this case the output will be:
paper
The number of columns may increase or decrease.
For two columns, the common entries can be found using:
import csv
import itertools

# read the csv file and convert it to a dictionary: column name -> list of values
with open(input_file, 'r') as csvin:
    reader = csv.DictReader(csvin)
    data = {k.strip(): [v] for k, v in reader.next().items()}
    for line in reader:
        for k, v in line.items():
            k = k.strip()
            data[k].append(v)

# iterate over every pair of columns
for a, b in itertools.combinations(data, 2):
    # entries common to columns a and b
    common = set(data[a]) & set(data[b])
But I do not understand how to get the values that are common to all columns.
You can use csv.reader with skipinitialspace=True to skip the extra spaces, then zip the rows to get the columns. Use itertools.izip_longest rather than zip because a value in the last column is missing. Convert each column to a set and take the intersection with set.intersection:
from itertools import izip_longest
import csv

with open('test') as f:
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
    cols = map(set, izip_longest(*reader))
    print set.intersection(*cols)
Note that your file is not really a CSV file: if a value is missing in any column other than the last one, the remaining values in that row shift left and the input is misinterpreted. Consider at least using a delimiter other than a space.
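As a rough illustration of the delimiter advice, here is a minimal sketch written for Python 3. It assumes a comma-delimited variant of the sample data in which a missing value is written as an empty field. The data, the position of the empty field, and the helper name common_values are all hypothetical and not part of the original file.

```python
import csv
import io
from itertools import zip_longest

# Hypothetical comma-delimited variant of the sample data. The empty
# field in the last row stands for a missing table2 value (an assumption).
SAMPLE = """table1,table2,table3,table4,table5
paper,paper,pen,book,book
pen,pencil,pencil,charger,apple
apple,pen,charger,beatroot,sandle
beatroot,mobile,apple,pen,paper
sandle,,paper,paper,book
"""


def common_values(csv_file):
    """Return the values that occur in every column of csv_file."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    # Transpose rows into columns; short rows are padded with "".
    columns = zip_longest(*reader, fillvalue="")
    # Drop empty cells so a missing value never counts as an entry.
    column_sets = [set(col) - {""} for col in columns]
    return set.intersection(*column_sets) if column_sets else set()


print(common_values(io.StringIO(SAMPLE)))  # {'paper'}
```

Because the delimiter is explicit, the empty field stays in its own column instead of shifting the rest of the row to the left.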
Using StringIO to parse a string shows that it works for the test case:
from itertools import izip_longest
import csv
import StringIO
data='''table1 table2 table3 table4 table5
paper paper pen book book
pen pencil pencil charger apple
apple pen charger beatroot sandle
beatroot mobile apple pen paper
sandle book paper paper'''
f = StringIO.StringIO(data)
reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
cols = map(set, izip_longest(*reader))
print set.intersection(*cols)
Output
set(['paper'])
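The code in this answer targets Python 2. For readers on Python 3, a possible port is sketched below: izip_longest is named zip_longest there, StringIO lives in the io module, and print is a function. The variable name common is only illustrative.

```python
import csv
import io
from itertools import zip_longest  # called izip_longest in Python 2

data = '''table1 table2 table3 table4 table5
paper paper pen book book
pen pencil pencil charger apple
apple pen charger beatroot sandle
beatroot mobile apple pen paper
sandle book paper paper'''

reader = csv.reader(io.StringIO(data), delimiter=' ', skipinitialspace=True)
# Transpose rows into columns; the short last row is padded with None.
cols = [set(col) for col in zip_longest(*reader)]
common = set.intersection(*cols)
print(common)  # {'paper'}
```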