Common values between all CSV columns

Question

I have a csv file.

table1    table2    table3  table4   table5
paper     paper     pen     book     book
pen       pencil    pencil  charger  apple
apple     pen       charger beatroot sandle
beatroot  mobile    apple   pen      paper
sandle    book      paper   paper

I need to find the entries that are common to all columns. In this case the output will be:

paper

The number of columns may increase or decrease.

For two columns, the common entries can be found using:

import csv
import itertools

# read the csv file into a dictionary mapping column name -> list of values
with open(input_file, 'r') as csvin:
    reader=csv.DictReader(csvin)
    data={k.strip():[v] for k,v in reader.next().items()}
    for line in reader:
        for k,v in line.items():
            k=k.strip()
            data[k].append(v)

# iterate over every pair of columns in the dictionary
for a, b in itertools.combinations(data, 2):
    # values common to columns a and b
    common = set(data[a]) & set(data[b])

But I do not understand how to get the values that are common to all columns.

Answer 1

You can use csv.reader with delimiter=' ' and skipinitialspace=True, so that the extra spaces after each delimiter are skipped, and then zip the rows to get the columns. itertools.izip_longest is used instead of zip because a value in the last column is missing; the gap is filled with None. Convert each column to a set and take the intersection of all of them using set.intersection:

from itertools import izip_longest
import csv

with open('test') as f:
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
    cols = map(set, izip_longest(*reader))

print set.intersection(*cols)

Be aware that your file is not really a csv. If a value is missing in a column other than the last one, the remaining values in that row shift to the left and the input will be interpreted incorrectly. Consider at least using a delimiter that is not a space.
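To illustrate the delimiter advice, here is a minimal sketch that assumes the data were saved with commas as the delimiter, so that a missing value shows up as an empty cell rather than shifting the row. The function name common_values and the sample data are hypothetical and not from the question, and it is written for Python 3, where izip_longest is named zip_longest:

```python
import csv
import io
from itertools import zip_longest


def common_values(lines):
    """Return the set of values that appear in every column (header skipped)."""
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # drop the header row, if there is one
    # Transpose rows into columns; rows that are too short are padded with ''.
    columns = zip_longest(*reader, fillvalue='')
    # Ignore empty cells so that a gap is not treated as a value.
    sets = [{cell.strip() for cell in col if cell.strip()} for col in columns]
    return set.intersection(*sets) if sets else set()


# Hypothetical sample: the second data row has an empty cell in the middle column.
sample = io.StringIO(
    "table1,table2,table3\n"
    "paper,pen,book\n"
    "pen,,paper\n"
    "book,paper,pen\n"
)
print(common_values(sample))
```

Because the empty cell stays in its own column, "paper" in the second data row is still counted for table3 and not for table2.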

Example

Using StringIO to parse a string and show that it works for the test case:

from itertools import izip_longest
import csv
import StringIO

data='''table1    table2    table3  table4   table5
paper     paper     pen     book     book
pen       pencil    pencil  charger  apple
apple     pen       charger beatroot sandle
beatroot  mobile    apple   pen      paper
sandle    book      paper   paper'''

f = StringIO.StringIO(data)
reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
cols = map(set, izip_longest(*reader))

print set.intersection(*cols)

Output

set(['paper'])
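
The code above is Python 2. As a rough sketch of the same example on Python 3 (assuming only the standard renames: izip_longest became zip_longest, StringIO moved to the io module, and print is a function), it could look like this:

```python
import csv
import io
from itertools import zip_longest

data = '''table1    table2    table3  table4   table5
paper     paper     pen     book     book
pen       pencil    pencil  charger  apple
apple     pen       charger beatroot sandle
beatroot  mobile    apple   pen      paper
sandle    book      paper   paper'''

reader = csv.reader(io.StringIO(data), delimiter=' ', skipinitialspace=True)
# Transpose the rows into columns; the short last row is padded with None.
cols = [set(col) for col in zip_longest(*reader)]

common = set.intersection(*cols)
print(common)  # {'paper'}
```

The header row is left in, as in the original; since every header name is different, it does not affect the intersection.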

solution1 4 ACCEPTED 2014-08-18 13:10:27
