
Find the values common to all CSV columns

I have a csv file.

table1    table2    table3  table4   table5
paper     paper     pen     book     book
pen       pencil    pencil  charger  apple
apple     pen       charger beatroot sandle
beatroot  mobile    apple   pen      paper
sandle    book      paper   paper

I need to find the entries that appear in every column. In this case the output will be:

paper

The number of columns may increase or decrease.

For 2 columns, the common values can be found using:

import csv
import itertools

# read the csv file into a dictionary mapping column name -> list of values
with open(input_file, 'r') as csvin:
    reader=csv.DictReader(csvin)
    data={k.strip():[v] for k,v in reader.next().items()}
    for line in reader:
        for k,v in line.items():
            k=k.strip()
            data[k].append(v)

# iterating the dictionary for each 2 columns
for a, b in itertools.combinations(data, 2):
    # values common to the two columns
    common = set(data[a]) & set(data[b])

But I do not understand how to get the values common to all columns.

You can use csv.reader with skipinitialspace=True to skip the extra spaces between fields, then zip the rows to get the columns. itertools.izip_longest is used instead of zip because the value in the last column is missing from the last row. Convert each column to a set and take the intersection using set.intersection:

from itertools import izip_longest
import csv

with open('test') as f:
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
    cols = map(set, izip_longest(*reader))

print set.intersection(*cols)

Note that your file is not really a CSV. If a value is missing from any column other than the last one, the remaining values in that row shift left and the input will be interpreted incorrectly. Consider at least using a delimiter that is not a space.
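To make the caveat concrete, here is a small sketch in Python 3. The two sample strings are made up for illustration and are not from the question. It compares how a gap in the middle column is read with a space delimiter and with a comma delimiter:

```python
import csv
import io

# Hypothetical samples: the second row has no value in the middle column.
space_text = "a b c\nx   z\n"
comma_text = "a,b,c\nx,,z\n"

space_rows = list(csv.reader(io.StringIO(space_text),
                             delimiter=' ', skipinitialspace=True))
comma_rows = list(csv.reader(io.StringIO(comma_text)))

# With spaces, the gap is swallowed and 'z' slides into the second column.
print(space_rows[1])
# With commas, the empty field keeps its position as an empty string.
print(comma_rows[1])
```

With a comma delimiter the missing value shows up as an empty string in the right column, so you would probably want to discard '' from each set before intersecting.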

Example

Using StringIO to parse a string and show that it works for the test case:

from itertools import izip_longest
import csv
import StringIO

data='''table1    table2    table3  table4   table5
paper     paper     pen     book     book
pen       pencil    pencil  charger  apple
apple     pen       charger beatroot sandle
beatroot  mobile    apple   pen      paper
sandle    book      paper   paper'''

f = StringIO.StringIO(data)
reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
cols = map(set, izip_longest(*reader))

print set.intersection(*cols)

Output

set(['paper'])
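The code above targets Python 2. As a rough Python 3 sketch of the same approach: izip_longest is named zip_longest, StringIO lives in the io module, and print is a function. Skipping the header row with next(reader) is an addition of mine, so that column names are not counted as values; the name common is also just illustrative.

```python
import csv
import io
from itertools import zip_longest

data = '''table1    table2    table3  table4   table5
paper     paper     pen     book     book
pen       pencil    pencil  charger  apple
apple     pen       charger beatroot sandle
beatroot  mobile    apple   pen      paper
sandle    book      paper   paper'''

reader = csv.reader(io.StringIO(data), delimiter=' ', skipinitialspace=True)
next(reader)  # skip the header row (assumption: the first line holds column names)

# zip_longest transposes rows into columns, padding short rows with None
cols = [set(col) for col in zip_longest(*reader)]

common = set.intersection(*cols)
print(common)  # {'paper'}
```

The None padding only ends up in the set for the last column, so it cannot appear in the intersection unless every column is short.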
