Counting occurrences in pandas dataframe with respect to a list

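I am trying to create a bar chart of element frequencies using matplotlib. To do that, I need to count the occurrences in a pandas dataframe column with respect to a list of flags. The code from my notebook and a sample of the data are outlined below: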

   # list of filtered values 
   filtered = [200, 201, 201, 201, 201, 201, 
   211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 
   237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
   237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
   237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250,
   250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 
   250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
   250, 250, 250, 250, 254]

   # list of flags to use for filtering 
   flags = [200, 201, 211, 237, 239, 250, 254, 255]
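   # this was just a line of code for testing
   flags_dict = {200:0,201:0,211:0,237:0,239:0,250:0,254:0,255:0}

   # value_counts() is a pandas method, so the plain list is wrapped in a Series first
   import pandas as pd
   freq = pd.Series(filtered).value_counts()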


   """
   Expected flags_dict:
   200: 1
   201: 5
   211: 14
   237: 38
   239: 0
   250: 41
   254: 1
   255: 0
   """

   """
   These are the values from the real dataframe but they do not take into 
   account the other flags in the flags list
   freq: 
   250.0    7682
   211.0    3734
   200.0    1483
   239.0     180
   201.0      34       
   """

This can be answered fairly simply with isin.

Assuming filtered is a Series:

In [1]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0)
Out[1]: 200     1
        201     5
        211    14
        237    38
        239     0
        250    41
        254     1
        255     0
        dtype: int64

To get a dictionary, just add to_dict:

In [2]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0).to_dict()

Out[2]: {200: 1, 201: 5, 211: 14, 237: 38, 239: 0, 250: 41, 254: 1, 255: 0}
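A side note on the one-liner above, as a minimal sketch rather than a definitive version: the isin filter looks optional, because reindex already keeps only the labels listed in flags and fills the absent ones with 0. The Series name `values` and the sample numbers (including 999 as a non-flag value) are made up for illustration and are not from the question.

```python
import pandas as pd

# Made-up sample data; 999 stands in for a value that is not a flag
values = pd.Series([200, 201, 201, 250, 250, 250, 999])
flags = [200, 201, 211, 250]

# value_counts() tallies every distinct value; reindex() then keeps only the
# labels in flags (in that order) and fills flags that never occur with 0
counts = values.value_counts().reindex(flags, fill_value=0)

# counts: 200 -> 1, 201 -> 2, 211 -> 0, 250 -> 3 (999 is dropped)
flags_dict = counts.to_dict()
```

Keeping the isin step does no harm; it just filters rows that reindex would discard anyway.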

I just came up with this approach, but there must be a better/faster way to accomplish the task:

      #column_data is a list created from a pandas Dataframe column 
      column_data = list(filtered['C5 Terra'])
      flags_dict[200] = column_data.count(200)
      flags_dict[201] = column_data.count(201)
      flags_dict[211] = column_data.count(211)
      flags_dict[237] = column_data.count(237)
      flags_dict[239] = column_data.count(239)
      flags_dict[250] = column_data.count(250)
      flags_dict[254] = column_data.count(254)
      flags_dict[255] = column_data.count(255)
      flags_dict
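One possible way to shorten the repeated count calls above, sketched with the standard library only: collections.Counter tallies the list in a single pass, whereas each list.count call scans the whole list again. The sample `column_data` here is made up; in the notebook it would come from the dataframe column.

```python
from collections import Counter

# Made-up stand-in for the list built from the dataframe column
column_data = [200, 201, 201, 250, 250, 250]
flags = [200, 201, 211, 250]

# One pass over the data instead of one list.count() scan per flag
tally = Counter(column_data)

# A Counter returns 0 for keys it has never seen, so absent flags need no special case
flags_dict = {flag: tally[flag] for flag in flags}
```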

If I understand correctly, this is what you need:

import pandas as pd

filtered = [200, 201, 201, 201, 201, 201, 211, 211, 211, 211, 211, 211, 211, 211, 211,
            211, 211, 211, 211, 211, 
            237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
            237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
            237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250,
            250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 
            250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
            250, 250, 250, 250, 254]


filtered = pd.Series(filtered)

freq = filtered.value_counts(sort=False)
flags = [200, 201, 211, 237, 239, 250, 254, 255]
flags_dict = {}
for flag in flags:
    try:
        flags_dict[flag] = freq[flag]
    except KeyError:
        # this flag never occurs in the data, so its count is 0
        flags_dict[flag] = 0
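Since the stated goal was a matplotlib bar chart, here is a rough sketch of how the finished flags_dict might be plotted. The counts are the ones shown earlier for the sample data; the axis labels and the Agg backend choice are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

flags = [200, 201, 211, 237, 239, 250, 254, 255]
flags_dict = {200: 1, 201: 5, 211: 14, 237: 38, 239: 0, 250: 41, 254: 1, 255: 0}

# String labels make matplotlib treat the flags as categories,
# so the bars are evenly spaced instead of placed at x=200, 201, ...
labels = [str(flag) for flag in flags]
heights = [flags_dict[flag] for flag in flags]

fig, ax = plt.subplots()
bars = ax.bar(labels, heights)
ax.set_xlabel("flag")
ax.set_ylabel("occurrences")
# In a notebook the figure displays inline; fig.savefig(...) would write it to a file
```

Flags with a count of 0 (239 and 255 here) still get a slot on the x axis, which is the reason for counting against the full flags list first.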
