
Update Nested Counter in a Dictionary

I am going through a large CSV file line by line. What I want to do is count occurrences of the strings in a certain column. Where I am running into trouble is that I would like the counter to be nested inside a dictionary, where the keys of the outer dictionary are the values from another column. I need to do this or else the data will be processed incorrectly, as there are duplicates.

Imagine my CSV:

outerDictKey    CounterKey
apple           purple
apple           blue
pear            purple

So basically I want:

dictionary = {'apple': Counter({'blue': 1, 'purple': 1}),
              'pear': Counter({'purple': 1})}

I wasn't sure how to do this.

myCounter = Counter()
myKey = 'barbara'
counterKey = 'streisand'
largeDict = defaultdict(dict)       
largeDict[myKey] = {myCounter[counterKey] += 1}

Intuitively this looks like it wouldn't work, and of course it gives a syntax error.

I also tried

largeDict[myKey][myCounter][counterKey]+=1

Which throws a "TypeError: unhashable type: 'Counter'" error.

Finally

>>> largeDict[myKey]=Counter()
>>> largeDict[myKey][myCounter][counterKey]+=1

Which still gives a type error. So how do I increment a Counter nested in a dictionary?

This will work:

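from collections import Counter

myKey = 'barbara'
counterKey = 'streisand'
myCounter = Counter()

# The Counter is stored as a value, next to any other entries.
largedict = {myKey: {counterKey: myCounter,
                     'anotherKey': 'Value2'}}

# Index down to the Counter, then increment one of its counts.
largedict[myKey][counterKey]['somethingyouwanttocount'] += 1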

Counter is just a dict subclass with some extra functionality. Like any dict, it is mutable and therefore unhashable, so it cannot be a key in a dict or an element of a set. That explains the "unhashable type" TypeError: in your attempts the Counter object itself was being used as a key.
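To make the unhashable error concrete, here is a small sketch that reuses the names from the question. It assumes nothing beyond the standard library, and it shows that a Counter fails as a key but works as a value. The last attempt in the question only needs the extra `[myCounter]` index dropped.

```python
from collections import Counter

myKey = 'barbara'
counterKey = 'streisand'

# A Counter is mutable, so Python refuses to hash it.
try:
    hash(Counter())
    hashable = True
except TypeError:
    hashable = False

# As a *value* it is fine: index the outer dict, then the Counter.
largeDict = {}
largeDict[myKey] = Counter()
largeDict[myKey][counterKey] += 1

print(hashable)           # False
print(largeDict[myKey])   # Counter({'streisand': 1})
```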

Alternatively, if you're keeping track of information about coherent entities, then rather than using nested dicts you could store the information (including the counter) in objects, and put the objects in a dict as necessary.
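One way the object-based approach might look is sketched below. The class name `FruitRecord` and its fields are hypothetical, chosen only to match the sample CSV in the question, and the sketch assumes Python 3.7+ for `dataclasses`.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FruitRecord:
    """Hypothetical container for everything known about one outer key."""
    name: str
    colours: Counter = field(default_factory=Counter)


records = {}  # outer key -> FruitRecord

sample_rows = [('apple', 'purple'), ('apple', 'blue'), ('pear', 'purple')]
for outer_key, counter_key in sample_rows:
    # Reuse the existing record for this key, or store a new one.
    record = records.setdefault(outer_key, FruitRecord(outer_key))
    record.colours[counter_key] += 1

print(records['apple'].colours)
```

The benefit over nested dicts is that other per-entity fields can sit next to the counter with their own names instead of living under string keys.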

If every value is a counter, then just use defaultdict:

from collections import defaultdict, Counter
largedict = defaultdict(Counter)
largedict['apple']['purple']+=1
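Applied to the CSV from the question, the same idea might look like the sketch below. It assumes a comma-delimited file with the header names shown in the question, and it uses an in-memory `io.StringIO` as a stand-in for the real file object.

```python
import csv
import io
from collections import Counter, defaultdict

# Stand-in for the real file; an open file object would be used the same way.
sample = io.StringIO(
    "outerDictKey,CounterKey\n"
    "apple,purple\n"
    "apple,blue\n"
    "pear,purple\n"
)

largedict = defaultdict(Counter)
for row in csv.DictReader(sample):
    # A missing outer key gets a fresh Counter automatically.
    largedict[row['outerDictKey']][row['CounterKey']] += 1

for key, counts in largedict.items():
    print(key, dict(counts))
```

Because `csv.DictReader` yields one row at a time, this should still work line by line on a large file without loading it all into memory.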

If you just want to count occurrences of the strings in a certain column, wouldn't this be enough?

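import collections
data = "Welcome to stack overflow. To give is to get."

# split() breaks the text on whitespace; Counter tallies each token.
print(collections.Counter(data.split()))

Output (Python 3.7+; the counts are the same on older versions, but the key order may differ):

Counter({'to': 2, 'Welcome': 1, 'stack': 1, 'overflow.': 1, 'To': 1, 'give': 1, 'is': 1, 'get.': 1})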
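If a single flat Counter is preferred but the counts still need to be kept apart per outer key, one possible variation (not part of the answer above) is to key the Counter by `(outer, inner)` tuples. Tuples of strings are hashable, unlike a Counter. The `rows` list here is made-up sample data.

```python
from collections import Counter

# Hypothetical (outerDictKey, CounterKey) pairs, with one duplicate row.
rows = [('apple', 'purple'), ('apple', 'blue'),
        ('pear', 'purple'), ('apple', 'purple')]

# Each distinct pair becomes one key in a single flat Counter.
pair_counts = Counter(rows)

print(pair_counts[('apple', 'purple')])  # 2
print(pair_counts[('pear', 'blue')])     # 0 (missing keys count as zero)
```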


 