
How to use Python to read Excel column data and print column duplicates

There are two columns ("Name" and "Value") in an Excel sheet.

The Value column contains duplicates (e.g. "xxa", "xxf"). The Python script needs to find which cell values are duplicated and, for each one, put the corresponding names into an array.


The output should be:

"xxa": ["aaa","bbb","ccc","hhh"]
"xxf": ["fff","jjj"]

How can the current script be improved?

import csv

value_col = []
name_value_col = []

file = open('columnData.csv')
csvreader = csv.reader(file)
next(csvreader)  # skip the header row

for row in csvreader:
    name = row[0]
    value = row[1]
    value_col.append(value)
    name_value_col.append(name+","+value)
file.close()
count={}
names=[]

for item in value_col:
    if value_col.count(item)>1:
        count[item]=value_col.count(item)

for name,value in count.items():
    names.append(name) 
total=[]

for item in name_value_col:
    item_name=item.split(",")     
    if item_name[1] in names:  
        total.append(item_name[0])
print(total)

I'd recommend using defaultdict; while you're at it, csv.DictReader makes for more legible code:

import csv
from collections import defaultdict

data = defaultdict(list)
with open('columnData.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        data[row['Value']].append(row['Name'])
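To see what the grouping step produces, here is a minimal sketch that runs the same logic on in-memory data. The sample rows and the helper name group_names_by_value are hypothetical (the original spreadsheet is not shown); they were chosen only so that the result matches the output the question asks for.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows standing in for columnData.csv
SAMPLE_CSV = """Name,Value
aaa,xxa
bbb,xxa
ccc,xxa
ddd,xxb
eee,xxc
fff,xxf
ggg,xxg
hhh,xxa
jjj,xxf
"""

def group_names_by_value(fileobj):
    """Map each Value to the list of Names that carry it, in file order."""
    grouped = defaultdict(list)
    for row in csv.DictReader(fileobj):
        grouped[row['Value']].append(row['Name'])
    return grouped

grouped = group_names_by_value(io.StringIO(SAMPLE_CSV))
for value, names in grouped.items():
    print(value, names)
```

Every value appears as a key at this point, including those that occur only once; filtering down to the duplicates is a separate step.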

As for finding the duplicates, you can either take the destructive approach (pruning non-duplicates):

# Remove non-duplicates here
for key in list(data.keys()):  # iterate over a copy of the keys, since entries are deleted inside the loop
    if len(data[key]) == 1:  # only one name has this value, so it is not a duplicate
        del data[key]

print(dict(data))

Output: {'xxa': ['aaa', 'bbb', 'ccc', 'hhh'], 'xxf': ['fff', 'jjj']}

or if you prefer a non-destructive approach to finding duplicates:

def _filter_duplicates(data):
    for key, value in data.items():
        if len(value) > 1:
            yield key, value

def find_duplicates(data):
    return dict(_filter_duplicates(data))

print(find_duplicates(data))

Output: {'xxa': ['aaa', 'bbb', 'ccc', 'hhh'], 'xxf': ['fff', 'jjj']}
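If the generator helper feels like more machinery than needed, the non-destructive filter can also be written as a single dict comprehension. This is just an equivalent sketch, not part of the original answer; the grouped dict below is hypothetical sample data standing in for the defaultdict built from the file.

```python
def find_duplicates(grouped):
    """Return a new dict with only the values shared by more than one name."""
    return {value: names for value, names in grouped.items() if len(names) > 1}

# Hypothetical grouped data, shaped like the result of the grouping step
grouped = {
    'xxa': ['aaa', 'bbb', 'ccc', 'hhh'],
    'xxb': ['ddd'],
    'xxf': ['fff', 'jjj'],
}

duplicates = find_duplicates(grouped)
print(duplicates)
```

The input mapping is left untouched, so the full grouping is still available afterwards.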
