Why does pandas groupby pass only the first row to apply()?

Question

I have a pandas dataframe containing a couple of blank columns to be filled in later and some actual data in the other columns. The top of the dataframe looks like this.

I'm trying to do a groupby and build an entirely new dataframe row by row, i.e. for each group I do some manipulation and return a row, and all of these rows get concatenated into a dataframe. This is the code for my manipulation function:

def get_trade_text(single_trade):
    single_trade.sort_values('Expiry', ascending=False, inplace=True)
    common_denominator = np.gcd.reduce(single_trade.Quantity)
    prem_diff = single_trade['CostBasis'].sum() / 100 / common_denominator
    ticker = single_trade.Symbol.values[0]
    exp = single_trade.Expiry.values[0]

    risk = calc_max_loss(single_trade)

    trade_text = ticker + ' ' + ' / '.join(single_trade.Expiry.dt.strftime('%b-%y')) + ' ' + \
                 ' / '.join(single_trade.Strike.astype(str)) + ' ' + ' / '.join(single_trade.Type) + ' Spread @ $' + \
                 '{:.2f}'.format(abs(prem_diff)) + ' ' + ('Debit' if prem_diff > 0 else 'Credit')

    return pd.Series([trade_text, prem_diff, exp, risk, ticker])

The calc_max_loss function returns a single float, so risk is a float value.

My issue is this: when I call this function on the table using df.groupby('ID').apply(get_trade_text), I expect one row to be returned for every ID. However, when I run this code it returns many rows, and all of them are the output for the single group where ID == 1. So the output looks like this. Those are the rows I'd expect for an ID of 1, but none of the other IDs show up.

Things I've tried:

1) I rewrote the manipulation function to simply print out whatever is passed to it. Same problem: it prints only the group for ID == 1.

2) I printed out the groups in the debugger using df.groupby('ID').groups and they show up correctly, i.e. there are 76 groups (one for each ID) and each group has exactly the right indices in its values.

3) I tried changing the column I group by, and it has exactly the same issue, i.e. if I do df.groupby('Symbol').apply(get_trade_text), it creates groups from the Symbol column, orders them alphabetically so AAPL is the first group, and then returns rows only for AAPL and not the other symbols.

I'm not sure why this is happening. I've used groupby on much more complicated dataframes and it has generally worked exactly as expected, but for this data it seems to glitch out.

Any help is appreciated.

Answer 1

The problem is this line:

single_trade.sort_values('Expiry', ascending=False, inplace=True)

You are not supposed to edit the dataframe passed to apply in any way; treat it as read-only. Sorting with inplace=True mutates the group that apply hands to your function. Simply replacing that line with:

single_trade = single_trade.sort_values('Expiry', ascending=False)

solves the problem.
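
As a rough illustration of the non-mutating pattern, here is a minimal sketch on made-up data. The dataframe contents and the summarize function are hypothetical stand-ins, not the asker's real table or get_trade_text; the only point is that the group is sorted into a new variable rather than in place, and one row comes back per ID.

```python
import pandas as pd

# Hypothetical sample data; the real table in the question has more columns.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3],
    "Symbol": ["AAPL", "AAPL", "MSFT", "MSFT", "TSLA"],
    "Expiry": pd.to_datetime(
        ["2020-04-17", "2020-06-19", "2020-05-15", "2020-04-17", "2020-04-17"]
    ),
    "Strike": [250.0, 260.0, 150.0, 155.0, 500.0],
})


def summarize(group):
    # sort_values without inplace=True returns a new frame,
    # so the group passed in by apply is left untouched.
    ordered = group.sort_values("Expiry", ascending=False)
    return pd.Series({
        "Symbol": ordered["Symbol"].iloc[0],
        "LatestExpiry": ordered["Expiry"].iloc[0],
        "Strikes": " / ".join(ordered["Strike"].astype(str)),
    })


# Selecting the working columns explicitly keeps the grouping column
# out of the frames handed to summarize.
result = df.groupby("ID")[["Symbol", "Expiry", "Strike"]].apply(summarize)
print(result)
```

Because summarize returns a Series with the same labels for every group, apply stacks the results into a dataframe indexed by ID, with one row per group.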
