Pandas groupby Series with DataFrame

I would like to group a Series by a DataFrame and then perform a reduction as in the following example:

In [1]: from pandas import DataFrame

In [2]: df = DataFrame([['Alice', 'F', 100, 1],
                        ['Alice', 'F', 100, 3],
                        ['Drew', 'F', 100, 4],
                        ['Drew', 'M', 100, 5],
                        ['Drew', 'M', 200, 5]],
                       columns=['name', 'sex', 'amount', 'id'])

In [3]: df['amount'].groupby(df[['name', 'sex']]).count()

Unfortunately, this raises the following TypeError, which has me stumped:

TypeError: 'DataFrame' object is not callable

I know that I can use the column names directly, but my actual computation needs to be a bit more general than that, and I thought this would be doable. What is going on here? What is the proper way to group-and-reduce a Series by an arbitrary DataFrame? Or, alternatively, does no such way exist?

One solution is to turn the Series into a DataFrame, join it to the grouper DataFrame, group by the grouper's columns, and then select the original Series' columns from the grouped result, i.e.:

# Example inputs
pregrouped = df['amount']
grouper = df[['name', 'sex']]

# General computation
pregrouped = DataFrame(pregrouped)  # Series -> one-column DataFrame
grouper = DataFrame(grouper)

full = grouper.join(pregrouped)  # joins on the shared index
groups = full.groupby(list(grouper.columns))[list(pregrouped.columns)]
# some_reduction is a placeholder for any reduction, e.g. count or sum
result = groups.some_reduction().reset_index()

Is anything here very wasteful? This approach runs at about the same speed as the idiomatic computation that's available in the common case.
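A possible alternative that skips the join, offered as a sketch rather than a definitive answer: groupby accepts a list of Series as keys, so the grouper DataFrame can be split into one Series per column. This assumes the grouper's column names are unique and that it shares an index with the Series being grouped; the `keys` variable is introduced here only for illustration, and `count` stands in for whatever reduction is needed.

```python
import pandas as pd

df = pd.DataFrame([['Alice', 'F', 100, 1],
                   ['Alice', 'F', 100, 3],
                   ['Drew', 'F', 100, 4],
                   ['Drew', 'M', 100, 5],
                   ['Drew', 'M', 200, 5]],
                  columns=['name', 'sex', 'amount', 'id'])

# Example inputs, as in the question
pregrouped = df['amount']
grouper = df[['name', 'sex']]

# Split the grouper into one Series per column (assumes unique column names).
# groupby aligns each key Series with pregrouped on the shared index.
keys = [grouper[col] for col in grouper.columns]

# count is used as the example reduction; any other reduction could be swapped in.
result = pregrouped.groupby(keys).count().reset_index()
print(result)
```

The result has one row per (name, sex) combination, with the reduced values in the amount column, which is the same shape the join-based version produces.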
