Counting occurrences in pandas dataframe with respect to a list
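I am trying to create a bar chart of element frequencies using matplotlib. To do that, I need to be able to count the occurrences in a pandas DataFrame column with respect to a list of flags. The code and data from my notebook are outlined below: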
# list of filtered values
filtered = [200, 201, 201, 201, 201, 201,
211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211,
237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237,
237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237,
237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 254]
# list of flags to use for filtering
flags = [200, 201, 211, 237, 239, 250, 254, 255]
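# this was just a line of code for testing
flags_dict = {200: 0, 201: 0, 211: 0, 237: 0, 239: 0, 250: 0, 254: 0, 255: 0}
# note: value_counts() is a pandas method, so filtered must be a Series
# (or a DataFrame column) at this point, not the plain list shown above
freq = filtered.value_counts()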
"""
Expected flags_dict:
200: 1
201: 5
211: 14
237: 38
239: 0
250: 41
254: 1
255: 0
"""
"""
These are the values from the real dataframe but they do not take into
account the other flags in the flags list
freq:
250.0 7682
211.0 3734
200.0 1483
239.0 180
201.0 34
"""
This can be answered fairly simply with isin, assuming filtered is a Series.
In [1]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0)
Out[1]:
200     1
201     5
211    14
237    38
239     0
250    41
254     1
255     0
dtype: int64
To get a dictionary, just append to_dict:
In [2]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0).to_dict()
Out[2]: {200: 1, 201: 5, 211: 14, 237: 38, 239: 0, 250: 41, 254: 1, 255: 0}
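As a side note, the isin filter in the answer above is probably not strictly required, because reindex already drops any label that is not in flags. A minimal sketch of that shorter variant follows; the sample values (including the non-flag value 999) are made up for illustration and are not the asker's data.

```python
import pandas as pd

flags = [200, 201, 211, 237, 239, 250, 254, 255]

# Hypothetical sample data; 999 is not a flag and should be ignored
values = pd.Series([200, 201, 201, 250, 250, 250, 999])

# reindex keeps only the labels listed in `flags`, in that order,
# and fills in 0 for flags that never occur in the data
counts = values.value_counts().reindex(flags, fill_value=0)
counts_dict = counts.to_dict()
print(counts_dict)
```

Keeping the isin step does no harm; it only shrinks the data before counting, which may matter on a very large column.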
I just came up with this approach, but there must be a better/faster way to accomplish the task:
#column_data is a list created from a pandas Dataframe column
column_data = list(filtered['C5 Terra'])
flags_dict[200] = column_data.count(200)
flags_dict[201] = column_data.count(201)
flags_dict[211] = column_data.count(211)
flags_dict[237] = column_data.count(237)
flags_dict[239] = column_data.count(239)
flags_dict[250] = column_data.count(250)
flags_dict[254] = column_data.count(254)
flags_dict[255] = column_data.count(255)
flags_dict
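The repeated count calls above scan the whole list once per flag. One possible alternative, sketched here with the standard library only, is to count everything in a single pass with collections.Counter and then look up each flag. The contents of column_data below are hypothetical stand-ins for the real column.

```python
from collections import Counter

flags = [200, 201, 211, 237, 239, 250, 254, 255]

# Hypothetical stand-in for list(filtered['C5 Terra'])
column_data = [200, 201, 201, 211, 250, 250, 250]

# One pass over the data; a Counter returns 0 for keys it has never seen
counter = Counter(column_data)
flags_dict = {flag: counter[flag] for flag in flags}
print(flags_dict)
```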
If I understand correctly, this is what you need:
import pandas as pd
filtered = [200, 201, 201, 201, 201, 201, 211, 211, 211, 211, 211, 211, 211, 211, 211,
211, 211, 211, 211, 211,
237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237,
237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237,
237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
250, 250, 250, 250, 254]
filtered = pd.Series(filtered)
freq = filtered.value_counts(sort=False)
flags = [200, 201, 211, 237, 239, 250, 254, 255]
flags_dict = {}
for flag in flags:
    try:
        # look up the count for this flag by label
        flags_dict[flag] = freq[flag]
    except KeyError:
        # the flag never occurs in the data
        flags_dict[flag] = 0
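The try/except loop above can likely be shortened with Series.get, which returns a default value when a label is missing. This is only a sketch of the same idea, using made-up sample data rather than the list from the question.

```python
import pandas as pd

flags = [200, 201, 211, 237, 239, 250, 254, 255]

# Hypothetical sample data standing in for the real column
filtered = pd.Series([200, 201, 201, 211, 250, 250, 250])

freq = filtered.value_counts(sort=False)

# Series.get returns the default (0 here) when the label is absent;
# int() converts the NumPy integer to a plain Python int
flags_dict = {flag: int(freq.get(flag, 0)) for flag in flags}
print(flags_dict)
```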