I have a list of ~10,000 dictionaries. Each item has (one or more) labels and (0 or 1) types. Example:
mydata = [{'item': 'item1', 'label': ['history','politics'], 'type': 'paper'},
{'item': 'item2', 'label': ['sport','politics'], 'type': 'magazine'},
{'item': 'item3', 'label': ['science','politics'], 'type': 'paper'},
{'item': 'item4', 'label': ['science'], 'type': 'book'},
{'item': 'item5', 'label': ['science','fun']}
]
I want to count how many items have a particular label and, within those, how many there are of each type. For the mydata object above, my output should look like this:
{'fun': {'magazine': 0, 'paper': 0, 'book': 0},
'science': {'magazine': 0, 'paper': 1, 'book': 1},
'sport': {'magazine': 1, 'paper': 0, 'book': 0},
'history': {'magazine': 0, 'paper': 1, 'book': 0},
'politics': {'magazine': 1, 'paper': 2, 'book': 0}}
The code below works, but it is ugly and probably inefficient. Any recommendations on how to improve it? I read that collections.Counter() is relevant, and I learned how to use it for labels, but I couldn't get it to work for types within labels.
### creating lists of unique labels, types
myLabelList=[]
myTypeList=[]
for myitem in mydata:
    for myCurrLabel in myitem['label']: #to account for multiple labels
        myLabelList.append(myCurrLabel)
    if 'type' in myitem: #checking that type exists
        myTypes = myitem['type']
        myTypeList.append(myTypes)
myUniqueLabel=list(set(myLabelList))
myUniqueType=list(set(myTypeList))
myOutput = {}
for eachLabel in myUniqueLabel:
    myOutput[eachLabel] = {}
    for eachItem in myUniqueType:
        n = 0 # number of matches
        for k in mydata:
            if (eachLabel in k['label']) and (k.get('type') == eachItem):
                n += 1
            else: n += 0
        myOutput[eachLabel][eachItem]=n
print (myOutput)
You can build that dictionary in one go with a nested loop:
mydata = [{'item': 'item1', 'label': ['history','politics'], 'type': 'paper'},
{'item': 'item2', 'label': ['sport','politics'], 'type': 'magazine'},
{'item': 'item3', 'label': ['science','politics'], 'type': 'paper'},
{'item': 'item4', 'label': ['science'], 'type': 'book'},
{'item': 'item5', 'label': ['science','fun']}
]
counters = {'magazine': 0, 'paper': 0, 'book': 0}
# counters = {d['type']:0 for d in mydata if 'type' in d} # if types not fixed
result = dict()
for d in mydata: # go through dictionary list
    itemType = d.get('type',None) # get the type
    for label in d['label']: # go through labels list
        labelCounts = result.setdefault(label,{**counters}) # add/get a label
        if itemType : labelCounts[itemType] += 1 # count items for type if any
print(result)
{'history': {'magazine': 0, 'paper': 1, 'book': 0},
'politics': {'magazine': 1, 'paper': 2, 'book': 0},
'sport': {'magazine': 1, 'paper': 0, 'book': 0},
'science': {'magazine': 0, 'paper': 1, 'book': 1},
'fun': {'magazine': 0, 'paper': 0, 'book': 0}}
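Since the question mentions collections.Counter, here is a minimal sketch of how that route might look, assuming the same mydata layout as above. The function name count_types_by_label and the helper set all_types are made up for illustration. It keeps one Counter per label in a defaultdict, and fills in the zero entries at the end so every label reports every type that was seen.

```python
from collections import Counter, defaultdict

mydata = [{'item': 'item1', 'label': ['history', 'politics'], 'type': 'paper'},
          {'item': 'item2', 'label': ['sport', 'politics'], 'type': 'magazine'},
          {'item': 'item3', 'label': ['science', 'politics'], 'type': 'paper'},
          {'item': 'item4', 'label': ['science'], 'type': 'book'},
          {'item': 'item5', 'label': ['science', 'fun']}]

def count_types_by_label(items):
    """Hypothetical helper: return {label: {type: count}} in a single pass."""
    counts = defaultdict(Counter)   # one Counter of types per label
    all_types = set()               # every type seen anywhere in the data
    for d in items:
        item_type = d.get('type')   # None when the item has no type
        if item_type is not None:
            all_types.add(item_type)
        for label in d['label']:
            label_counts = counts[label]  # registers the label even if untyped
            if item_type is not None:
                label_counts[item_type] += 1
    # A Counter returns 0 for missing keys, so this fills in the zero entries.
    return {label: {t: c[t] for t in sorted(all_types)}
            for label, c in counts.items()}

result = count_types_by_label(mydata)
print(result)
```

This does not need the list of types up front, which may help if the types are not fixed. The trade-off is the extra dictionary comprehension at the end to materialise the zeros.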