
Counting efficiently across a list of dictionaries

I have a list of ~10,000 dictionaries. Each item has one or more labels and zero or one type. Example:

mydata = [{'item': 'item1', 'label': ['history','politics'], 'type': 'paper'},
     {'item': 'item2', 'label': ['sport','politics'], 'type': 'magazine'},
     {'item': 'item3', 'label': ['science','politics'], 'type': 'paper'},
     {'item': 'item4', 'label': ['science'], 'type': 'book'},
     {'item': 'item5', 'label': ['science','fun']}
     ]

I want to count how many items have a particular label and, within those, how many there are of each type.

For the mydata object above, my output should look like this:

{'fun': {'magazine': 0, 'paper': 0, 'book': 0}, 
'science': {'magazine': 0, 'paper': 1, 'book': 1}, 
'sport': {'magazine': 1, 'paper': 0, 'book': 0}, 
'history': {'magazine': 0, 'paper': 1, 'book': 0}, 
'politics': {'magazine': 1, 'paper': 2, 'book': 0}}

The code below works, but it is ugly and probably inefficient. Any recommendations on how to improve it? I read that collections.Counter() is relevant, and I learned how to use it for labels, but I couldn't get it to work for types within labels.
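Since the question mentions collections.Counter, here is one way it can handle types within labels. This is a sketch of an alternative technique, not the method used in the answer further down: count (label, type) tuples in one pass, then reshape them into the nested dictionary. The names pair_counts, labels and types are illustrative.

```python
from collections import Counter

mydata = [{'item': 'item1', 'label': ['history', 'politics'], 'type': 'paper'},
          {'item': 'item2', 'label': ['sport', 'politics'], 'type': 'magazine'},
          {'item': 'item3', 'label': ['science', 'politics'], 'type': 'paper'},
          {'item': 'item4', 'label': ['science'], 'type': 'book'},
          {'item': 'item5', 'label': ['science', 'fun']}]

# Count every (label, type) pair in a single pass over the data.
# Items without a 'type' key contribute no pairs.
pair_counts = Counter(
    (label, d['type'])
    for d in mydata
    if 'type' in d
    for label in d['label']
)

# Every label and every type that occurs, so absent pairs can be reported as 0.
labels = {label for d in mydata for label in d['label']}
types = {d['type'] for d in mydata if 'type' in d}

# Reshape into {label: {type: count}}; a Counter returns 0 for a missing key.
result = {label: {t: pair_counts[(label, t)] for t in types} for label in labels}
print(result)
```

Because a Counter returns 0 for missing keys, a label such as 'fun' that never appears together with a type still gets a full row of zeros. The comprehensions iterate over sets, so the key order in the printed dictionary can differ between runs; iterating over sorted(labels) and sorted(types) gives a stable order if that matters.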

### creating lists of unique labels, types
myLabelList=[]
myTypeList=[]

for myitem in mydata:
    for myCurrLabel in myitem['label']: #to account for multiple labels
        myLabelList.append(myCurrLabel) 
    if 'type' in myitem: #checking that type exists
        myTypeList.append(myitem['type']) #only append when the item actually has a type
    

myUniqueLabel=list(set(myLabelList))
myUniqueType=list(set(myTypeList))



myOutput = {}
for eachLabel in myUniqueLabel:
    myOutput[eachLabel] = {}
    for eachItem in myUniqueType:
        n = 0  # number of matches
        for k in mydata:
            if (eachLabel in k['label']) and (k.get('type') == eachItem):
                n += 1
        myOutput[eachLabel][eachItem]=n


print(myOutput)

You can build that dictionary in one go with a nested loop:

mydata = [{'item': 'item1', 'label': ['history','politics'], 'type': 'paper'},
     {'item': 'item2', 'label': ['sport','politics'], 'type': 'magazine'},
     {'item': 'item3', 'label': ['science','politics'], 'type': 'paper'},
     {'item': 'item4', 'label': ['science'], 'type': 'book'},
     {'item': 'item5', 'label': ['science','fun']}
     ]


counters = {'magazine': 0, 'paper': 0, 'book': 0}
# counters = {d['type']:0 for d in mydata if 'type' in d} # if types not fixed

result = dict()
for d in mydata:                                  # go through dictionary list
    itemType = d.get('type')                      # get the type (None if missing)
    for label in d['label']:                      # go through labels list
        labelCounts = result.setdefault(label, {**counters})  # add/get a label
        if itemType:                              # count items for type if any
            labelCounts[itemType] += 1
        
print(result)

{'history': {'magazine': 0, 'paper': 1, 'book': 0},
 'politics': {'magazine': 1, 'paper': 2, 'book': 0},
 'sport': {'magazine': 1, 'paper': 0, 'book': 0},
 'science': {'magazine': 0, 'paper': 1, 'book': 1},
 'fun': {'magazine': 0, 'paper': 0, 'book': 0}}
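If the set of types is not known in advance, the commented-out counters line can be used instead of the hard-coded dictionary. A sketch of the same loop with that change (variables renamed to snake_case, and dict(counters) used in place of {**counters}, which is an equivalent shallow copy):

```python
mydata = [{'item': 'item1', 'label': ['history', 'politics'], 'type': 'paper'},
          {'item': 'item2', 'label': ['sport', 'politics'], 'type': 'magazine'},
          {'item': 'item3', 'label': ['science', 'politics'], 'type': 'paper'},
          {'item': 'item4', 'label': ['science'], 'type': 'book'},
          {'item': 'item5', 'label': ['science', 'fun']}]

# Derive the type columns from the data instead of hard-coding them.
counters = {d['type']: 0 for d in mydata if 'type' in d}

result = {}
for d in mydata:
    item_type = d.get('type')                 # None when the item has no type
    for label in d['label']:
        # A new label gets its own copy of the zeroed counters, so labels never share counts.
        label_counts = result.setdefault(label, dict(counters))
        if item_type is not None:
            label_counts[item_type] += 1

print(result)
```

The copy matters: passing counters itself to setdefault would make every label point at the same dictionary. Note that setdefault builds the copy on every call, even when the label already exists; for ~10,000 items that overhead is small, but an explicit "if label not in result" check avoids it.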
