How to use Python to read Excel column data and print column duplicates

Question

I have two columns ("Name" and "Value") in Excel.

There are duplicates (e.g. "xxa", "xxf") in the Value column, and the Python script needs to find the duplicated cell values and put the names that share each one into an array.

The output should be:

"xxa": ["aaa","bbb","ccc","hhh"]
"xxf": ["fff","jjj"]

How can I improve the current script?

import csv

value_col = []
name_value_col = []

file = open('columnData.csv')
csvreader = csv.reader(file)
next(csvreader)  # skip the header row

for row in csvreader:
    name = row[0]
    value = row[1]
    value_col.append(value)
    name_value_col.append(name+","+value)
file.close()
count={}
names=[]

for item in value_col:
    if value_col.count(item)>1:
        count[item]=value_col.count(item)

for name,value in count.items():
    names.append(name) 
total=[]

for item in name_value_col:
    item_name=item.split(",")     
    if item_name[1] in names:  
        total.append(item_name[0])
print(total)

Answer 1

I'd recommend using defaultdict, and while you're at it, using csv.DictReader makes for more legible code:

import csv
from collections import defaultdict

data = defaultdict(list)
with open('columnData.csv', newline='') as f:  # newline='' is what the csv docs recommend
    reader = csv.DictReader(f)
    for row in reader:
        data[row['Value']].append(row['Name'])
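
To see what this grouping produces, here is a minimal sketch that runs without a file on disk. The sample rows and the helper name group_names_by_value are made up for illustration (the asker's real data was not posted); only the "Name" and "Value" headers come from the question.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for columnData.csv (an assumption,
# not the asker's real file). Only the header names come from the question.
SAMPLE_CSV = """Name,Value
aaa,xxa
bbb,xxa
ccc,xxa
ddd,xxd
fff,xxf
hhh,xxa
jjj,xxf
"""


def group_names_by_value(lines):
    """Map each Value to the list of Names that share it, in file order."""
    grouped = defaultdict(list)
    for row in csv.DictReader(lines):
        grouped[row['Value']].append(row['Name'])
    return grouped


grouped = group_names_by_value(io.StringIO(SAMPLE_CSV))
print(dict(grouped))
# {'xxa': ['aaa', 'bbb', 'ccc', 'hhh'], 'xxd': ['ddd'], 'xxf': ['fff', 'jjj']}
```

Every value is still present at this point, including "xxd", which occurs only once. Passing an open file object instead of io.StringIO should work the same way.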

Then, to find the duplicates, you can either take the destructive approach (pruning non-duplicates):

# Remove non-duplicates in place
for key in list(data.keys()):  # copy the keys, since we delete from the dict while looping
    if len(data[key]) == 1:  # only one name has this value, so it is not a duplicate
        del data[key]

print(dict(data))

# Output: {'xxa': ['aaa', 'bbb', 'ccc', 'hhh'], 'xxf': ['fff', 'jjj']}

Or, if you prefer a non-destructive approach to finding duplicates:

def _filter_duplicates(data):
    for key, value in data.items():
        if len(value) > 1:
            yield key, value

def find_duplicates(data):
    return dict(_filter_duplicates(data))

print(find_duplicates(data))

# Output: {'xxa': ['aaa', 'bbb', 'ccc', 'hhh'], 'xxf': ['fff', 'jjj']}
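
If the generator helper feels like more machinery than needed, the same non-destructive filter can probably be written as a single dict comprehension. This is only a sketch of an equivalent form, not a change to the approach; the grouped dict below is made-up data shaped like the data dict built earlier.

```python
def find_duplicates(data):
    """Return a new dict holding only the keys whose list has more than one entry."""
    return {key: names for key, names in data.items() if len(names) > 1}


# Made-up grouped data (an assumption), shaped like the `data` dict built earlier.
grouped = {
    'xxa': ['aaa', 'bbb', 'ccc', 'hhh'],
    'xxd': ['ddd'],
    'xxf': ['fff', 'jjj'],
}

duplicates = find_duplicates(grouped)
print(duplicates)
# {'xxa': ['aaa', 'bbb', 'ccc', 'hhh'], 'xxf': ['fff', 'jjj']}
```

The input dict is left untouched, so the full grouping stays available if it is needed later.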
