Why does pandas groupby pass only the first row to apply()?

Question

I have a pandas dataframe containing a couple of blank columns to be filled in later and some actual data in the other columns. The top of the dataframe looks like this.

I'm trying to do a groupby and build an entirely new dataframe row by row, i.e. for each group I do some manipulation and return a row, and all of these rows get concatenated into a dataframe. This is the code for my manipulation function:

def get_trade_text(single_trade):
    single_trade.sort_values('Expiry', ascending=False, inplace=True)
    common_denominator = np.gcd.reduce(single_trade.Quantity)
    prem_diff = single_trade['CostBasis'].sum() / 100 / common_denominator
    ticker = single_trade.Symbol.values[0]
    exp = single_trade.Expiry.values[0]

    risk = calc_max_loss(single_trade)

    trade_text = ticker + ' ' + ' / '.join(single_trade.Expiry.dt.strftime('%b-%y')) + ' ' + \
                 ' / '.join(single_trade.Strike.astype(str)) + ' ' + ' / '.join(single_trade.Type) + ' Spread @ $' + \
                 '{:.2f}'.format(abs(prem_diff)) + ' ' + ('Debit' if prem_diff > 0 else 'Credit')

    return pd.Series([trade_text, prem_diff, exp, risk, ticker])

The calc_max_loss function returns a single float, so risk is a float value.

My issue is this: when I call this function on the table using df.groupby('ID').apply(get_trade_text), I expect one row to be returned for every ID. However, when I run this code, it returns many rows, but all of them are the output for only the group where ID == 1. So the output looks like this. Those are the rows I'd expect for an ID of 1, but none of the other IDs show up.

Things I've tried:

1) I rewrote the manipulation function to simply print out whatever is passed to it. Same problem: it prints out only the group relating to ID == 1.

2) I printed out the groups in the debugger using df.groupby('ID').groups and they show up correctly, i.e. there are 76 groups (one for each ID) and each group has exactly the right indices inside its values.

3) I tried changing the column I group on, and it has exactly the same issue, i.e. if I do df.groupby('Symbol').apply(get_trade_text), it creates groups from the Symbol column, arranges them alphabetically so AAPL is the first group, and then returns rows only for AAPL and not for the other symbols.

I'm not sure why this could be happening. I've used groupby on much more complicated data frames and it has generally worked exactly as expected. But for this data, it seems to glitch out.

Any help is appreciated.
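One more check in the same spirit as the steps above is to iterate over the groupby object directly, which shows what each group contains without involving apply() at all. This is only a minimal sketch: the small dataframe below is made-up sample data, and only the column names ID and Symbol are taken from the question.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3],
    "Symbol": ["AAPL", "AAPL", "MSFT", "TSLA", "TSLA"],
})

# Iterating a groupby yields (key, sub-dataframe) pairs, one per group.
seen = []
for group_id, group in df.groupby("ID"):
    seen.append((group_id, len(group)))
    print(group_id, group["Symbol"].tolist())
```

If every ID appears here but not in the apply() output, the grouping itself is fine and the function passed to apply() is the more likely culprit.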

Answer 1

The problem is this line:

single_trade.sort_values('Expiry', ascending=False, inplace=True)

You are not supposed to edit the dataframe passed to apply in any way. It's supposed to work like a read-only operation, and sorting with inplace=True modifies the group that apply hands to your function. Simply replacing this with:

single_trade = single_trade.sort_values('Expiry', ascending=False)

solves the problem.
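To illustrate the non-mutating pattern end to end, here is a small sketch. It is not the asker's full function: summarize_trade is a hypothetical, trimmed-down stand-in for get_trade_text (it skips calc_max_loss and the text formatting), and the data values are invented. Only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Invented sample data; column names follow the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3],
    "Symbol": ["AAPL", "AAPL", "MSFT", "MSFT", "TSLA"],
    "Expiry": pd.to_datetime(
        ["2020-04-17", "2020-05-15", "2020-04-17", "2020-06-19", "2020-05-15"]
    ),
    "Quantity": [2, -2, 4, -4, 1],
    "CostBasis": [300.0, -500.0, 800.0, -400.0, 150.0],
})

def summarize_trade(single_trade):
    # Hypothetical, simplified stand-in for get_trade_text.
    # sort_values without inplace returns a new frame; the group that
    # apply() passed in is left untouched.
    single_trade = single_trade.sort_values("Expiry", ascending=False)
    common_denominator = np.gcd.reduce(single_trade["Quantity"].to_numpy())
    prem_diff = single_trade["CostBasis"].sum() / 100 / common_denominator
    return pd.Series({
        "ticker": single_trade["Symbol"].iloc[0],
        "latest_expiry": single_trade["Expiry"].iloc[0],
        "prem_diff": prem_diff,
    })

# Selecting the data columns explicitly keeps the grouping column out of
# the frames handed to the function.
columns = ["Symbol", "Expiry", "Quantity", "CostBasis"]
result = df.groupby("ID")[columns].apply(summarize_trade)
print(result)
```

With the sort done on a copy, the result has one row per ID, and df itself keeps its original row order.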

Answer 1
0 2020-03-20 20:53:22
