Pandas groupby Series with DataFrame

Question

I would like to group a Series by a DataFrame and then perform a reduction as in the following example:

In [1]: from pandas import DataFrame

In [2]: df = DataFrame([['Alice', 'F', 100, 1],
                        ['Alice', 'F', 100, 3],
                        ['Drew', 'F', 100, 4],
                        ['Drew', 'M', 100, 5],
                        ['Drew', 'M', 200, 5]],
                       columns=['name', 'sex', 'amount', 'id'])

In [3]: df['amount'].groupby(df[['name', 'sex']]).count()

Unfortunately, this raises the following TypeError, which has me stumped:

TypeError: 'DataFrame' object is not callable

I know that I can use the column names directly, but my actual computation needs to be a bit more general than that, and I thought that this would be doable. What is going on here? What is the proper way to group-and-reduce a Series by an arbitrary DataFrame? Or, alternatively, does no such way exist?

Answer 1

One solution is to turn the Series into a DataFrame, join it to the grouper DataFrame, group by the grouper's columns, and then reselect the columns that came from the original Series, i.e.:

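from pandas import DataFrame

# Example inputs (df is the DataFrame from the question)
pregrouped = df['amount']
grouper = df[['name', 'sex']]

# General computation: coerce both inputs to DataFrames so the same code
# works whether a Series or a DataFrame is passed on either side
pregrouped = DataFrame(pregrouped)
grouper = DataFrame(grouper)

# Join on the shared index, then group by the grouper's columns and
# keep only the columns that came from the pregrouped data
full = grouper.join(pregrouped)
groups = full.groupby(list(grouper.columns))[list(pregrouped.columns)]
# count() stands in for whichever reduction is needed (sum, mean, ...)
result = groups.count()[list(pregrouped.columns)].reset_index()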

Is anything here very wasteful? This approach runs at about the same speed as the usual idiomatic computation (grouping by column names directly), which is available in the common cases.
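
A possible alternative, offered as a sketch rather than a definitive answer: groupby accepts a list of Series as keys, so the grouper DataFrame can be split into its columns and passed as a list, which avoids the join. This assumes the Series and the grouper share the same index; the names values, keys, result and flat are only illustrative, and count() again stands in for any reduction.

```python
import pandas as pd

df = pd.DataFrame([['Alice', 'F', 100, 1],
                   ['Alice', 'F', 100, 3],
                   ['Drew', 'F', 100, 4],
                   ['Drew', 'M', 100, 5],
                   ['Drew', 'M', 200, 5]],
                  columns=['name', 'sex', 'amount', 'id'])

values = df['amount']           # the Series to reduce
grouper = df[['name', 'sex']]   # an arbitrary DataFrame of grouping keys

# Split the grouper into one Series per column; groupby accepts a list
# of Series and uses each Series name as an index level name.
keys = [grouper[col] for col in grouper.columns]

# Result is a Series indexed by (name, sex):
# (Alice, F) -> 2, (Drew, F) -> 1, (Drew, M) -> 2
result = values.groupby(keys).count()

# Flatten the index into ordinary columns, matching the shape produced
# by the join-based approach above.
flat = result.reset_index()
print(flat)
```

Whether this is faster than the join-based version is not something I have measured, so it would be worth timing both on realistic data.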

solution1
0 2014-05-29 03:50:45
