Pandas DataFrame Groupby Named Aggregation Using Lambda Results in a KeyError, Dict-of-Dicts Approach Works Fine

Calling GroupBy.agg() with a dict-of-dicts argument to name the resulting columns is deprecated in favor of the new named aggregation approach. However, I am having trouble with lambda functions that previously worked fine with the dict-of-dicts form: under named aggregation they raise a KeyError.

I'm using Python 3.7.4, NumPy 1.16.4, Pandas 0.25.0

import numpy as np
import pandas as pd

data = [['tom', 10, 'blue', 1000, 'a'], ['nick', 15, 'blue', 2000, 'b'], ['julie', 14, 'green', 3000, 'a'], ['bob', 11, 'green', 4000, 'a'], ['cindy', 16, 'red', 5000, 'b']]

df = pd.DataFrame(data, columns = ['Name', 'Age', 'Color', 'Num', 'Letter'])

# Dict-style renaming seems to work fine:
df.groupby(by='Color').agg({'Num': {'SumNum' : np.sum, 'SumNumIfLetterA': lambda x: x[df.iloc[x.index].Letter=='a'].sum()}})

C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\groupby\generic.py:1455: FutureWarning: using a dict with renaming is deprecated and will be removed
in a future version.

For column-specific groupby renaming, use named aggregation

df.groupby(...).agg(name=('column', aggfunc))

  return super().aggregate(arg, *args, **kwargs)

Out[4]: 
         Num                
      SumNum SumNumIfLetterA
Color                       
blue    3000            1000
green   7000            7000
red     5000               0

# Named aggregation throws a KeyError:
df.groupby(by='Color').agg(SumNum = ('Num', np.sum), SumNumIfLetterA = ('Num', lambda x: x[df.iloc[x.index].Letter=='a'].sum()))


Traceback (most recent call last):

  File "<ipython-input-5-9be7b560a3f5>", line 2, in <module>
    df.groupby(by='Color').agg(SumNum = ('Num', np.sum), SumNumIfLetterA = ('Num', lambda x: x[df.iloc[x.index].Letter=='a'].sum()))

  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\groupby\generic.py", line 1455, in aggregate
    return super().aggregate(arg, *args, **kwargs)

  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\groupby\generic.py", line 264, in aggregate
    result = result[order]

  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\frame.py", line 2981, in __getitem__
    indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)

  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 1271, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]

  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 1078, in _get_listlike_indexer
    keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing

  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 1171, in _validate_read_indexer
    raise KeyError("{} not in index".format(not_found))

KeyError: "[('Num', '<lambda>')] not in index"

I had a very similar problem. After digging a little deeper into GitHub, I found a workaround: create a dummy column in the main DataFrame so that each aggregation targets a different column. In your code, the following should work:

data = [['tom', 10, 'blue', 1000, 'a'], ['nick', 15, 'blue', 2000, 'b'], ['julie', 14, 'green', 3000, 'a'], ['bob', 11, 'green', 4000, 'a'], ['cindy', 16, 'red', 5000, 'b']]

df = pd.DataFrame(data, columns = ['Name', 'Age', 'Color', 'Num', 'Letter'])
# Dummy column: a copy of Num, so the two aggregations use different source columns
df['Num1']=df['Num']
# Now run the groupby with named aggregation on Num and Num1
df.groupby(by='Color').agg(SumNum = ('Num', np.sum), SumNumIfLetterA = ('Num1', lambda x: x[df.iloc[x.index].Letter=='a'].sum()))

Output from the IPython console:

df['Num1']=df['Num']

df.groupby(by='Color').agg(SumNum = ('Num', np.sum), SumNumIfLetterA = ('Num1', lambda x: x[df.iloc[x.index].Letter=='a'].sum()))
Out[46]: 
       SumNum  SumNumIfLetterA
Color                         
blue     3000             1000
green    7000             7000
red      5000                0
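One caveat about the lambda itself: df.iloc[x.index] treats the group's index labels as row positions, which only lines up because df has the default 0..n-1 index. A minimal sketch of a label-based variant follows, assuming a non-default index purely for illustration; the helper name sum_if_letter_a and the index values are hypothetical and not part of the original code.

```python
import pandas as pd

data = [['tom', 10, 'blue', 1000, 'a'], ['nick', 15, 'blue', 2000, 'b'],
        ['julie', 14, 'green', 3000, 'a'], ['bob', 11, 'green', 4000, 'a'],
        ['cindy', 16, 'red', 5000, 'b']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Color', 'Num', 'Letter'])

# Assumption for illustration: a non-default index, where positional
# lookups via iloc would no longer match the row labels.
df.index = [10, 20, 30, 40, 50]

# Dummy column from the workaround above.
df['Num1'] = df['Num']

def sum_if_letter_a(x):
    # x is one group's Series; x.index holds the original row labels,
    # so look the letters up by label with loc rather than by position.
    return x[df.loc[x.index, 'Letter'] == 'a'].sum()

result = df.groupby(by='Color').agg(
    SumNum=('Num', 'sum'),
    SumNumIfLetterA=('Num1', sum_if_letter_a),
)
print(result)
```

With the default index both forms should give the same table as above; the loc version is simply less sensitive to how df is indexed.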

Hope this works!
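A related option, sketched here as an alternative rather than as the fix from GitHub, is to avoid the lambda altogether: build the conditional values as their own column first, then use plain 'sum' aggregations. The column name NumIfLetterA is a made-up name for this example.

```python
import pandas as pd

data = [['tom', 10, 'blue', 1000, 'a'], ['nick', 15, 'blue', 2000, 'b'],
        ['julie', 14, 'green', 3000, 'a'], ['bob', 11, 'green', 4000, 'a'],
        ['cindy', 16, 'red', 5000, 'b']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Color', 'Num', 'Letter'])

# Hypothetical helper column: Num where Letter is 'a', otherwise 0.
# assign() returns a new DataFrame, so df itself is left unchanged.
masked = df.assign(NumIfLetterA=df['Num'].where(df['Letter'] == 'a', 0))

# Each named aggregation now reads a different column and needs no lambda.
result = masked.groupby(by='Color').agg(
    SumNum=('Num', 'sum'),
    SumNumIfLetterA=('NumIfLetterA', 'sum'),
)
print(result)
```

Because the condition is evaluated once on the whole frame, the aggregation no longer has to reach back into df from inside each group, and the result should match the table shown in the console output above.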
