Counting occurrences in a pandas DataFrame column relative to a list

Question

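I am trying to use matplotlib to create a bar chart of element frequencies. To do that, I need to count the occurrences in a pandas DataFrame column relative to a list of flags, so that flags that never appear are still reported with a count of 0. The code and data from my notebook are outlined below: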

   # list of filtered values 
   filtered = [200, 201, 201, 201, 201, 201, 
   211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 
   237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
   237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
   237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250,
   250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 
   250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
   250, 250, 250, 250, 254]

   # list of flags to use for filtering 
   flags = [200, 201, 211, 237, 239, 250, 254, 255]
   # this was just a line of code for testing
   flags_dict = {200:0,201:0,211:0,237:0,239:0,250:0,254:0,255:0}

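   import pandas as pd

   # value_counts() is a pandas Series method, so wrap the plain list first
   freq = pd.Series(filtered).value_counts()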


   """
   Expected flags_dict:
   200: 1
   201: 5
   211: 14
   237: 38
   239: 0
   250: 41
   254: 1
   255: 0
   """

   """
   These are the values from the real dataframe but they do not take into 
   account the other flags in the flags list
   freq: 
   250.0    7682
   211.0    3734
   200.0    1483
   239.0     180
   201.0      34       
   """

Answer 1

This can be answered fairly simply with isin.

Assuming filtered is a Series:

In [1]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0)
Out[1]: 200     1
        201     5
        211    14
        237    38
        239     0
        250    41
        254     1
        255     0
        dtype: int64

To get a dictionary, just add to_dict:

In [2]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0).to_dict()

Out[2]: {200: 1, 201: 5, 211: 14, 237: 38, 239: 0, 250: 41, 254: 1, 255: 0}
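
For anyone who wants to run this end to end, here is a minimal sketch that assumes the sample list from the question is wrapped in a Series first. The list is written compactly with the same per-value counts as in the question. As far as I can tell the isin filter is optional here, because reindex already drops values that are not in flags, so this sketch leaves it out.

```python
import pandas as pd

# Sample data from the question, written compactly (same counts per value).
values = [200] + [201] * 5 + [211] * 14 + [237] * 38 + [250] * 41 + [254]
flags = [200, 201, 211, 237, 239, 250, 254, 255]

filtered = pd.Series(values)

# Count every value, then align the result to the flag list.
# Flags that never occur get fill_value=0; values not in flags are dropped.
counts = filtered.value_counts().reindex(flags, fill_value=0)

flags_dict = counts.to_dict()
print(flags_dict)
```

The resulting Series keeps the order of flags, which is convenient when the counts are passed on to a plot.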

Answer 2

I just came up with this approach, but there must be a better/faster way to accomplish the task:

      #column_data is a list created from a pandas Dataframe column 
      column_data = list(filtered['C5 Terra'])
      flags_dict[200] = column_data.count(200)
      flags_dict[201] = column_data.count(201)
      flags_dict[211] = column_data.count(211)
      flags_dict[237] = column_data.count(237)
      flags_dict[239] = column_data.count(239)
      flags_dict[250] = column_data.count(250)
      flags_dict[254] = column_data.count(254)
      flags_dict[255] = column_data.count(255)
      flags_dict
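
The repeated count calls above could probably be collapsed into a loop over flags. The sketch below shows two options using only the standard library: a dict comprehension that calls count once per flag, and collections.Counter, which walks the data a single time. The plain list column_data stands in for the real 'C5 Terra' column, and the name flags_dict_single_pass is just an illustrative choice.

```python
from collections import Counter

flags = [200, 201, 211, 237, 239, 250, 254, 255]

# Stand-in for list(filtered['C5 Terra']); same counts as the sample data.
column_data = [200] + [201] * 5 + [211] * 14 + [237] * 38 + [250] * 41 + [254]

# Option 1: same logic as the repeated .count() calls, one scan per flag.
flags_dict = {flag: column_data.count(flag) for flag in flags}

# Option 2: tally everything in one scan; Counter returns 0 for missing keys.
tally = Counter(column_data)
flags_dict_single_pass = {flag: tally[flag] for flag in flags}

print(flags_dict)
```

Option 2 should scale better when the column is long and the flag list grows, since the data is only scanned once.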

Answer 3

If I understand correctly, this is what you need:

import pandas as pd

filtered = [200, 201, 201, 201, 201, 201, 211, 211, 211, 211, 211, 211, 211, 211, 211,
            211, 211, 211, 211, 211, 
            237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
            237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
            237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250,
            250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 
            250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
            250, 250, 250, 250, 254]


filtered = pd.Series(filtered)

freq = filtered.value_counts(sort=False)
flags = [200, 201, 211, 237, 239, 250, 254, 255]
flags_dict = {}
for flag in flags:
    try:
        # flag occurs in the data: take its count
        flags_dict[flag] = freq[flag]
    except KeyError:
        # flag never occurs in the data
        flags_dict[flag] = 0
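
Since the original goal was a bar chart, here is a rough sketch of how the resulting dictionary might be plotted with matplotlib. It assumes flags_dict already holds the counts for the sample data, treats the flags as categorical labels so that the bars are evenly spaced, and selects the Agg backend only so that the snippet also runs without a display. The output file name in the commented-out line is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Counts for the sample data, as produced by the answers above.
flags_dict = {200: 1, 201: 5, 211: 14, 237: 38, 239: 0, 250: 41, 254: 1, 255: 0}

# Use the flags as string labels so they are drawn as categories, not numbers.
labels = [str(flag) for flag in flags_dict]
heights = list(flags_dict.values())

fig, ax = plt.subplots()
bars = ax.bar(labels, heights)
ax.set_xlabel("flag")
ax.set_ylabel("occurrences")
# fig.savefig("flag_counts.png")  # hypothetical file name
```

Flags with a count of 0 still get a slot on the x axis, which is the reason for counting relative to the flag list in the first place.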

3 answers

Answer 1
1 2016-12-06 00:06:55

Answer 2
0 2016-12-05 22:18:35

Answer 3
0 2016-12-05 22:19:21
