I have a pandas dataframe containing a couple of blank columns to be filled in later and some actual data in the other columns.
I'm trying to do a groupby and build an entirely new dataframe row by row, i.e. for each group I do some manipulation and return a row, and all of these rows get concatenated into a dataframe. This is the code for my manipulation function:
def get_trade_text(single_trade):
    single_trade.sort_values('Expiry', ascending=False, inplace=True)
    common_denominator = np.gcd.reduce(single_trade.Quantity)
    prem_diff = single_trade['CostBasis'].sum() / 100 / common_denominator
    ticker = single_trade.Symbol.values[0]
    exp = single_trade.Expiry.values[0]
    risk = calc_max_loss(single_trade)
    trade_text = ticker + ' ' + ' / '.join(single_trade.Expiry.dt.strftime('%b-%y')) + ' ' + \
        ' / '.join(single_trade.Strike.astype(str)) + ' ' + ' / '.join(single_trade.Type) + ' Spread @ $' + \
        '{:.2f}'.format(abs(prem_diff)) + ' ' + ('Debit' if prem_diff > 0 else 'Credit')
    return pd.Series([trade_text, prem_diff, exp, risk, ticker])
The calc_max_loss function returns a single float, so risk is a float value.
My issue is this: when I call this function on the table using df.groupby('ID').apply(get_trade_text), I expect one row to be returned for every single ID. However, upon running this code, I see that it returns many rows, but all of them are the output for only the group where ID == 1. Those are the rows I'd expect for an ID of 1, but none of the other IDs show up.
Things I've tried:
1) I rewrote the manipulation function to simply print out whatever is passed to it. Same problem: it prints out only the group relating to ID == 1.
2) I printed out the groups in the debugger using df.groupby('ID').groups and it shows up correctly, i.e. it shows 76 groups (one for each ID) and each group has exactly the right indices inside its values.
3) I tried changing the column I'm using to group, and it has the exact same issue again, i.e. if I do df.groupby('Symbol').apply(get_trade_text), it creates groups from the Symbol column, arranges them alphabetically so AAPL is the first group, and then returns rows only for AAPL and not the other symbols.
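A side-effect-free way to repeat the check from item 2 is to iterate over the groupby object directly, which yields each key together with its sub-frame without going through apply at all. The snippet below is only a sketch: the toy data and the `seen` dictionary are made up for illustration and do not come from the real table.

```python
import pandas as pd

# Hypothetical toy data standing in for the real table.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "Symbol": ["AAPL", "AAPL", "MSFT", "TSLA", "TSLA", "TSLA"],
})

# Iterating a groupby yields (key, sub-frame) pairs; nothing is modified.
seen = {}
for group_id, group in df.groupby("ID"):
    seen[group_id] = len(group)
    print(group_id, group["Symbol"].tolist())
```

If every ID appears here but not in the apply output, the grouping itself is fine and the function being applied is the more likely culprit.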
I'm not sure why this could be happening. I've used groupby on much more complicated data frames and it's generally worked exactly as expected. But for this data, it seems to glitch out.
Any help is appreciated.
The problem is this line:
    single_trade.sort_values('Expiry', ascending=False, inplace=True)
You are not supposed to edit the dataframe passed to apply in any way; the function should treat its argument as read-only. Simply replacing that line with:
    single_trade = single_trade.sort_values('Expiry', ascending=False)
solves the problem. This rebinds the local name to a sorted copy and leaves the original group untouched.
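As a minimal sketch of the non-mutating pattern, the example below applies a simplified stand-in for the question's function to made-up data. The function name `summarize_trade`, the toy values, and the output field names are all hypothetical; only the column names are taken from the question. The value columns are selected explicitly before apply, which is an assumption made here so the snippet behaves the same across recent pandas versions.

```python
import pandas as pd

# Hypothetical toy data; column names follow the question, values are made up.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3],
    "Symbol": ["AAPL", "AAPL", "MSFT", "MSFT", "TSLA"],
    "Expiry": pd.to_datetime(
        ["2021-01-15", "2021-03-19", "2021-02-19", "2021-06-18", "2021-01-15"]
    ),
    "CostBasis": [100.0, -250.0, 300.0, -120.0, 75.0],
})


def summarize_trade(single_trade):
    # sort_values without inplace returns a new frame; the group passed in
    # by groupby is left unmodified.
    ordered = single_trade.sort_values("Expiry", ascending=False)
    return pd.Series({
        "ticker": ordered["Symbol"].iloc[0],
        "latest_expiry": ordered["Expiry"].iloc[0],
        "net_cost": ordered["CostBasis"].sum(),
    })


# One row per ID is expected in the result.
result = df.groupby("ID")[["Symbol", "Expiry", "CostBasis"]].apply(summarize_trade)
print(result)
```

Because the function returns a Series for each group, apply stacks those Series into a dataframe indexed by ID, with one row per group.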