
How to account for value counts that don't exist in Python?

I have the following dataframe:

     Name
----------
0    Blue
1    Blue
2    Blue
3     Red
4     Red
5    Blue
6    Blue
7     Red
8     Red
9    Blue

I want to count the number of times "Name" = "Blue" and "Name" = "Red" and store the counts in a dictionary, which for this df would be done with:

print('Dictionary:')
dictionary = df['Name'].value_counts().to_dict()
print(dictionary)

and outputs the following:

Dictionary:
{'Blue': 6, 'Red': 4}

OK, straightforward so far. For context, I know that the only possible values of "Name" in my data are "Blue" and "Red". I want to handle other dataframes with the same "Name" column but different frequencies of "Blue" and "Red". Specifically, since the code above works fine when both values are present, I want to handle cases where there are no occurrences of "Blue" or no occurrences of "Red".

So, if the df above looked like:

     Name
----------
0    Blue
1    Blue
2    Blue
3    Blue
4    Blue
5    Blue
6    Blue
7    Blue
8    Blue
9    Blue

I would want the output dictionary via:

print('Dictionary:')
dictionary = df['Name'].value_counts().to_dict()
print(dictionary)

to produce:

Dictionary:
{'Blue': 10, 'Red': 0}

However, as the code stands, it actually produces:

Dictionary:
{'Blue': 10}

I need that 0 value in there for use in another operation. I would like the same to be true if all of the names were "Red", producing:

Dictionary:
{'Blue': 0, 'Red': 9}

and not:

Dictionary:
{'Red': 9}

The problem is that I need to count the frequency of a value (here, a string) that does not occur at all. How can I fix my Python code so that if "Blue" or "Red" never occurs, the dictionary still includes that name as a key, with its value set to 0?

In Python 3.9+ you can use PEP 584's union operator:

base = {'Blue': 0, 'Red': 0}
counts = df['Name'].value_counts().to_dict()
dictionary = base | counts

# or just
dictionary = {'Blue': 0, 'Red': 0} | df['Name'].value_counts().to_dict()
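
Note that operand order matters with the union operator: when a key appears on both sides, the value from the right-hand dictionary wins, so the zero defaults belong on the left. A minimal sketch of this, where the hard-coded `counts` dictionary is only a hypothetical stand-in for the `value_counts().to_dict()` result of an all-"Blue" column:

```python
# Hypothetical stand-in for df['Name'].value_counts().to_dict()
# when every row of the column is 'Blue'.
counts = {'Blue': 10}
base = {'Blue': 0, 'Red': 0}

# Defaults on the left: the real counts override the zeros.
merged = base | counts

# Defaults on the right: the zeros overwrite the real counts.
reversed_merge = counts | base

print(merged)
print(reversed_merge)
```

With the defaults on the left, "Blue" keeps its real count and "Red" is filled in as 0; with the operands swapped, every count is reset to 0.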

Before Python 3.9, you could use unpacking and (re)packing:

base = {'Blue': 0, 'Red': 0}
counts = df['Name'].value_counts().to_dict()
dictionary = {**base, **counts}

You could also use .update:

dictionary = {'Blue': 0, 'Red': 0}
dictionary.update(df['Name'].value_counts().to_dict())

Or iterate over the expected keys and use .setdefault:

dictionary = df['Name'].value_counts().to_dict()
for k in ['Blue', 'Red']:
    dictionary.setdefault(k, 0)

I'm sure there are other ways as well.
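
One such alternative, not part of the answer above, is to fill in the missing names on the pandas side with `Series.reindex` and `fill_value=0` before converting to a dictionary. This is a sketch under the assumption that the `expected` list (a name introduced here for illustration) contains every possible value of "Name":

```python
import pandas as pd

# Example frame in which 'Red' never occurs.
df = pd.DataFrame({'Name': ['Blue'] * 10})

# Assumed complete list of possible names (illustrative).
expected = ['Blue', 'Red']

dictionary = (
    df['Name']
    .value_counts()
    .reindex(expected, fill_value=0)  # add missing names with a count of 0
    .to_dict()
)
print(dictionary)
```

Be aware that `reindex` also drops any name that is not in `expected`, so this only suits data where the full set of names is known in advance.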

I think if you change the type of the column in the dataframe to categorical and explicitly specify the categories you expect, you will get the answer you're looking for:

import pandas as pd

df = pd.DataFrame({'Name': ['red', 'red', 'red']})
# Declare every expected category, including ones absent from the data
df['Name'] = pd.Categorical(df['Name'], categories=['red', 'blue'])
print(df['Name'].value_counts())

The resulting Series lists both categories: red with a count of 3 and blue with a count of 0.
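
To get from the categorical counts to the dictionary the question asks for, `.to_dict()` can be chained onto the result as before. A short sketch using the question's capitalised names (the three-row frame is made up for illustration):

```python
import pandas as pd

# Example frame in which 'Blue' never occurs.
df = pd.DataFrame({'Name': ['Red', 'Red', 'Red']})

# Declare both possible names so unobserved ones are still counted.
df['Name'] = pd.Categorical(df['Name'], categories=['Blue', 'Red'])

dictionary = df['Name'].value_counts().to_dict()
print(dictionary)
```

One caveat with this approach: any value not listed in `categories` becomes NaN during the conversion and is left out of the counts by default.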
