
How to count each item's unique sub-items in Python?

In this question I can only use numpy. I have a CSV list of drugs and their side effects, and I would like to find the number of side effects for each unique drug. My file looks like this:

  1. Drug one, something, something, backpain

  2. Drug one ,something,something , eye pain

  3. Drug two, something,something, skin disorder

  4. Drug three, something,something, backpain

  5. Drug three, something,something, shock

The key thing is to create a list that contains each drug followed by its number of side effects. I wrote code like this:

listall = [ ]

filename = "druglist.csv"

file_to_open = open(filename,"r")
counter = 0
for line in file_to_open:
    line = line.strip()
    line = line.split(",")
    if line[0] not in listall:
        listall.append(line[0])
        counter = counter + 1
        listall.append(counter)
    else:
        counter = counter + 1
        listall.append(counter)

print(listall)
print(counter)

I know that this neither counts the unique side effects for each drug nor adds them up correctly for each one. I could not find any way to create a list like this, which I need in order to plot the distribution as a histogram.

You need a suitable data structure. For this job you can use a dict, with the drug name as the key and a set of side effects as the value.

This code is an example, and you'll probably need to adjust it, but it should illustrate the idea.

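from collections import defaultdict

filename = "druglist.csv"
drug_side_effects = defaultdict(set)

# "with" closes the file automatically when the block ends
with open(filename, "r") as file_to_open:
    for line in file_to_open:
        # strip each field so "Drug one " and "Drug one" become the same key
        line = [field.strip() for field in line.strip().split(",")]
        drug = line[0]
        # adjust this slice if only some of the columns hold side effects
        side_effects = line[1:]
        # update() changes the set in place; union() would return a new set
        drug_side_effects[drug].update(side_effects)

for drug, side_effects in drug_side_effects.items():
    print(f"{drug} has {len(side_effects)}")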

So what's happening here? We create a defaultdict mapping each drug name to a set of side effects. We open the file and read in each line. Then we split each line into a list, where the first element is the drug name and the rest are the side effects. Now we access the defaultdict by drug name: if the key is already in there, we get its set of side effects; if it isn't, we get a new empty set. That set, empty or not, is merged in place with the side effects from this line, so the entry stored in the dict under the key drug holds all the unique elements of both. We end up with a dict of {drug: set_of_side_effects}. Then we can print out each key (drug name) and the number of side effects.
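Since the question is restricted to numpy, here is a rough sketch of the same idea using only np.unique. It is not part of the original answer. The rows list is made-up sample data in the layout shown in the question, and the sketch assumes that only the last column of each row is a side effect.

```python
import numpy as np

# Hypothetical sample rows in the question's layout:
# drug, something, something, side effect
rows = [
    "Drug one, something, something, backpain",
    "Drug one ,something,something , eye pain",
    "Drug two, something,something, skin disorder",
    "Drug three, something,something, backpain",
    "Drug three, something,something, shock",
    "Drug three, something,something, shock",  # duplicate row, counted once
]

# Keep only the first (drug) and last (side effect) field, stripped of spaces.
pairs = np.array(
    [[fields[0].strip(), fields[-1].strip()]
     for fields in (row.split(",") for row in rows)]
)

# Drop duplicate (drug, side effect) pairs.
unique_pairs = np.unique(pairs, axis=0)

# Count how many unique pairs each drug appears in.
drugs, counts = np.unique(unique_pairs[:, 0], return_counts=True)
side_effect_counts = dict(zip(drugs.tolist(), counts.tolist()))
print(side_effect_counts)
```

The counts array holds one number per drug, so it can be passed to a plotting or binning function to get the distribution the question asks about.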

Depending on how the lists are set up, you may need to do some cleanup on the data, since 'Back Pain' and 'backpain' would show up as two different side effects, but that's left as an exercise for the reader.
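As a starting point for that cleanup, one possible approach is to lowercase each side effect and remove all whitespace before adding it to the set. The normalize helper below is a hypothetical name, and the rule it applies is only an assumption about what should count as "the same" side effect. Real data may need more than this, for example handling misspellings.

```python
def normalize(side_effect: str) -> str:
    """Lowercase and drop all whitespace so spelling variants compare equal."""
    return "".join(side_effect.lower().split())

# Hypothetical raw values as they might appear in the file.
raw = ["Back Pain", "backpain", " back pain ", "Shock"]

# After normalizing, the three spellings of back pain collapse into one entry.
unique_effects = {normalize(s) for s in raw}
print(sorted(unique_effects))
```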

