Counting occurrences in pandas dataframe with respect to a list

I am trying to create a bar chart of element frequencies using matplotlib. To do this, I need to count the number of occurrences of each value in a pandas DataFrame column with respect to a list of flags. Below is a rough sketch of the code and data I have in my notebook:

   # list of filtered values 
   filtered = [200, 201, 201, 201, 201, 201, 
   211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 
   237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
   237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
   237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250,
   250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 
   250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
   250, 250, 250, 250, 254]

   # list of flags to use for filtering 
   flags = [200, 201, 211, 237, 239, 250, 254, 255]
   # this was just a line of code for testing
   flags_dict = {200:0,201:0,211:0,237:0,239:0,250:0,254:0,255:0}

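   # filtered is a plain list in this sketch, so wrap it in a Series first
   # (assumes import pandas as pd)
   freq = pd.Series(filtered).value_counts()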


   """
   Expected flags_dict:
   200: 1
   201: 5
   211: 14
   237: 38
   239: 0
   250: 41
   254: 1
   255: 0
   """

   """
   These are the values from the real dataframe but they do not take into 
   account the other flags in the flags list
   freq: 
   250.0    7682
   211.0    3734
   200.0    1483
   239.0     180
   201.0      34       
   """

This is fairly straightforward with isin, assuming filtered is a Series:

In [1]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0)
Out[1]: 200     1
        201     5
        211    14
        237    38
        239     0
        250    41
        254     1
        255     0
        dtype: int64

To get a dictionary, just add to_dict:

In [2]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0).to_dict()

Out[2]: {200: 1, 201: 5, 211: 14, 237: 38, 239: 0, 250: 41, 254: 1, 255: 0}
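
As a self-contained sketch of the same idea: the sample values below are made up for illustration, and 999 stands in for a value that is not one of the flags. It also suggests the isin filter is optional here, since reindex only keeps the labels passed to it.

```python
import pandas as pd

# Made-up sample data; in practice this would be a DataFrame column.
values = pd.Series([200, 201, 201, 211, 250, 250, 250, 254, 999])
flags = [200, 201, 211, 237, 239, 250, 254, 255]

# value_counts() tallies every distinct value. reindex(flags, fill_value=0)
# keeps only the flag labels, in flag order, and fills flags that never
# occur with 0. Values that are not flags (999 here) are dropped.
counts = values.value_counts().reindex(flags, fill_value=0)

flags_dict = counts.to_dict()
print(flags_dict)
```

Because counts is ordered like flags, it can be handed to a plotting call without any further sorting.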

I came up with this just now, but there has to be a better/faster way to accomplish this:

      # column_data is a list created from a pandas DataFrame column
      column_data = list(filtered['C5 Terra'])
      flags_dict[200] = column_data.count(200)
      flags_dict[201] = column_data.count(201)
      flags_dict[211] = column_data.count(211)
      flags_dict[237] = column_data.count(237)
      flags_dict[239] = column_data.count(239)
      flags_dict[250] = column_data.count(250)
      flags_dict[254] = column_data.count(254)
      flags_dict[255] = column_data.count(255)
      flags_dict
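
One way to collapse the repeated count calls above, sketched with collections.Counter from the standard library rather than pandas. The sample column_data here is made up for illustration; a Counter returns 0 for keys it has never seen, so flags that do not occur need no special handling.

```python
from collections import Counter

# Made-up sample; in the post this list comes from a DataFrame column.
column_data = [200, 201, 201, 211, 250, 250, 250, 254]
flags = [200, 201, 211, 237, 239, 250, 254, 255]

# A single pass over the data, instead of one list.count() pass per flag.
tally = Counter(column_data)

# Missing keys in a Counter yield 0, so absent flags come out as 0.
flags_dict = {flag: tally[flag] for flag in flags}
print(flags_dict)
```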

If I understood correctly, this is what you need:

import pandas as pd

filtered = [200, 201, 201, 201, 201, 201, 211, 211, 211, 211, 211, 211, 211, 211, 211,
            211, 211, 211, 211, 211, 
            237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
            237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
            237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250,
            250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 
            250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250,
            250, 250, 250, 250, 254]


filtered = pd.Series(filtered)

freq = filtered.value_counts(sort=False)
flags = [200, 201, 211, 237, 239, 250, 254, 255]
flags_dict = {}
for flag in flags:
    try:
        # freq is indexed by value, so this is a label lookup
        flags_dict[flag] = freq[flag]
    except KeyError:
        # the flag never occurs in the data
        flags_dict[flag] = 0

