Counting occurrences in pandas dataframe with respect to a list
I am trying to create a bar chart of element frequencies using matplotlib. To do this, I need to count the occurrences in a pandas dataframe column with respect to a list of flags. Below is a rough sketch of the code I have in my notebook/data:
# list of filtered values
filtered = [200, 201, 201, 201, 201, 201,
211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211,
237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237,
237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237,
237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 254]
# list of flags to use for filtering
flags = [200, 201, 211, 237, 239, 250, 254, 255]
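# this was just a line of code for testing
flags_dict = {200:0,201:0,211:0,237:0,239:0,250:0,254:0,255:0}
# the list above stands in for the real column, so wrap it in a Series first
import pandas as pd
freq = pd.Series(filtered).value_counts()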
"""
Expected flags_dict:
200: 1
201: 5
211: 14
237: 38
239: 0
250: 41
254: 1
255: 0
"""
"""
These are the values from the real dataframe but they do not take into
account the other flags in the flags list
freq:
250.0 7682
211.0 3734
200.0 1483
239.0 180
201.0 34
"""
This can be answered fairly straightforwardly with isin, assuming filtered is a Series.
In [1]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0)
Out[1]: 200 1
201 5
211 14
237 38
239 0
250 41
254 1
255 0
dtype: int64
To get a dictionary, just add to_dict:
In [2]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0).to_dict()
Out[2]: {200: 1, 201: 5, 211: 14, 237: 38, 239: 0, 250: 41, 254: 1, 255: 0}
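As a self-contained sketch of the same idea: the sample values below are made up for illustration (they are not the question's data), and the names values, counts and flags_dict are only placeholders. It also suggests that the isin filter is optional, since reindex keeps only the labels listed in flags.

```python
import pandas as pd

# Made-up sample: 237, 239, 254 and 255 never occur, and 999 is not a flag.
values = pd.Series([200, 201, 201, 211, 250, 250, 250, 999])
flags = [200, 201, 211, 237, 239, 250, 254, 255]

# Count every value, then keep only the flag labels (in flag order),
# filling flags that never occurred with 0.
counts = values.value_counts().reindex(flags, fill_value=0)

# Convert the resulting Series to a plain dictionary.
flags_dict = counts.to_dict()
print(flags_dict)
```

The counts Series can be passed to a plotting call directly, and its index already follows the order of flags.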
I came up with this just now, but there has to be a better/faster way to accomplish this:
#column_data is a list created from a pandas Dataframe column
column_data = list(filtered['C5 Terra'])
flags_dict[200] = column_data.count(200)
flags_dict[201] = column_data.count(201)
flags_dict[211] = column_data.count(211)
flags_dict[237] = column_data.count(237)
flags_dict[239] = column_data.count(239)
flags_dict[250] = column_data.count(250)
flags_dict[254] = column_data.count(254)
flags_dict[255] = column_data.count(255)
flags_dict
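One way to avoid the eight separate count calls, sketched here with the standard library only: collections.Counter tallies the list in a single pass and returns 0 for keys it has never seen. The sample column_data is made up for illustration.

```python
from collections import Counter

# Made-up sample standing in for the list built from the dataframe column.
column_data = [200, 201, 201, 211, 250, 250, 250]
flags = [200, 201, 211, 237, 239, 250, 254, 255]

# Tally every value in one pass over the data.
counts = Counter(column_data)

# Counter returns 0 for missing keys, so absent flags need no special case.
flags_dict = {flag: counts[flag] for flag in flags}
print(flags_dict)
```

Each list.count call scans the whole list, so this version does one scan where the original does one per flag.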
If I understood correctly, this is what you need:
import pandas as pd
filtered = [200, 201, 201, 201, 201, 201, 211, 211, 211, 211, 211, 211, 211, 211, 211,
211, 211, 211, 211, 211,
237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237,
237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237,
237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 254]
filtered = pd.Series(filtered)
freq = filtered.value_counts(sort=False)
flags = [200, 201, 211, 237, 239, 250, 254, 255]
flags_dict = {}
for flag in flags:
    try:
        flags_dict[flag] = freq[flag]
    except KeyError:
        # flag never occurs in the data
        flags_dict[flag] = 0
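The same lookup can probably be written without try/except by using Series.get, which returns a default when the label is missing. This is a minimal sketch on made-up sample data, not a replacement for the loop above.

```python
import pandas as pd

# Made-up sample; only 200, 201 and 250 occur.
freq = pd.Series([200, 201, 201, 250]).value_counts(sort=False)
flags = [200, 201, 211, 237, 239, 250, 254, 255]

# Series.get falls back to 0 for flags that are not in the index;
# int() turns the NumPy integer into a plain Python int.
flags_dict = {flag: int(freq.get(flag, 0)) for flag in flags}
print(flags_dict)
```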