Apply a function to all columns in a pandas groupby dataframe

I have the following dataframe (let's call it data):

id | type | val1 | val2 |
-------------------------
1  |  A   | 10.1 | 11.0 |
1  |  B   | 10.5 | 11.2 |
2  |  A   | 10.7 | 10.9 |
2  |  B   | 10.6 | 11.1 |
3  |  A   | 10.3 | 10.5 |
3  |  B   | 10.4 | 11.3 |

and I want to obtain the difference between types A and B (A - B) for each id and for each valX column, i.e. I want the result to be:

id | val1 | val2 |
------------------
1  | -0.4 | -0.2 |
2  |  0.1 | -0.2 |
3  | -0.1 | -0.8 |

The only way I could get this done was to define a function:

def getDelta(df, valName):
    return df[ df['type']=='A' ][valName].values[0] - df[ df['type']=='B' ][valName].values[0]

and apply it to each column separately:

data.groupby('id').apply(getDelta,valName='val1')

and then merge the results to obtain what I was looking for.

Is there a more efficient way to do it? In the end I want to apply a function to a subset of columns of the grouped dataframe, but this function has to take into account the values of another column.

With the data as it currently stands, you can use np.subtract.reduce, assuming 'A' comes before 'B' within every id and there are no duplicates:

# Restrict to the value columns: the string column 'type' cannot be subtracted.
df.groupby("id", sort=False)[["val1", "val2"]].agg(np.subtract.reduce).reset_index()

   id  val1  val2
0   1  -0.4  -0.2
1   2   0.1  -0.2
2   3  -0.1  -0.8
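
The np.subtract.reduce approach above depends on 'A' preceding 'B' inside every id. As a minimal sketch of how that assumption could be enforced rather than hoped for, the rows can be sorted by id and type before grouping. The names `shuffled`, `value_cols` and `delta` are illustrative, the shuffle only mimics unordered input, and the lambda is a small wrapper chosen here so the ufunc receives a plain NumPy array.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "id":   [1, 1, 2, 2, 3, 3],
    "type": ["A", "B", "A", "B", "A", "B"],
    "val1": [10.1, 10.5, 10.7, 10.6, 10.3, 10.4],
    "val2": [11.0, 11.2, 10.9, 11.1, 10.5, 11.3],
})

# Shuffle the rows to mimic input where 'A' does not always precede 'B'.
shuffled = data.sample(frac=1, random_state=0)

# Illustrative choice: the columns to take differences of.
value_cols = ["val1", "val2"]

# Sort so that 'A' precedes 'B' inside every id, then reduce each value
# column of each group with subtraction (first row minus second row).
delta = (
    shuffled.sort_values(["id", "type"])
    .groupby("id", sort=False)[value_cols]
    .agg(lambda s: np.subtract.reduce(s.to_numpy()))
    .reset_index()
    .round(1)  # hide floating-point noise such as -0.40000000000000036
)
print(delta)
```

The sort only fixes the ordering; the sketch still assumes exactly one 'A' row and one 'B' row per id.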

You can pivot the dataframe:

x = df.pivot(index="id", columns="type", values="val1")
y = df.pivot(index="id", columns="type", values="val2")

df = pd.concat([x["A"] - x["B"], y["A"] - y["B"]], axis=1).rename(
    columns={0: "val1", 1: "val2"}
)
print(df)

Prints:

    val1  val2
id            
1   -0.4  -0.2
2    0.1  -0.2
3   -0.1  -0.8
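
The same pivot idea can probably be extended to any number of value columns without one pivot call per column. The following is a sketch under the assumption that each (id, type) pair occurs exactly once, which pivot requires; `value_cols`, `wide` and `delta` are names introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "id":   [1, 1, 2, 2, 3, 3],
    "type": ["A", "B", "A", "B", "A", "B"],
    "val1": [10.1, 10.5, 10.7, 10.6, 10.3, 10.4],
    "val2": [11.0, 11.2, 10.9, 11.1, 10.5, 11.3],
})

# Illustrative choice: the columns to take differences of.
value_cols = ["val1", "val2"]

# One pivot for all value columns: the result has two-level columns
# (value column, type), indexed by id.
wide = data.pivot(index="id", columns="type", values=value_cols)

# Take the 'A' and 'B' cross-sections along the 'type' level and subtract.
# xs drops the selected level, so both sides have columns val1 and val2.
delta = (
    wide.xs("A", axis=1, level="type") - wide.xs("B", axis=1, level="type")
).round(1)  # hide floating-point noise
print(delta)
```

Because the subtraction selects 'A' and 'B' by label, this variant does not depend on the row order of the input.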

You can groupby() your ID column and use diff(-1) on your valX columns. Wrapping the operation in concat() will give you your desired outcome.

df.set_index('id',inplace=True)
pd.concat([df.groupby(['id'])[df.filter(like='val').columns.tolist()].diff(-1).dropna()]).reset_index()

   id  val1  val2  val3
0   1  -0.4  -0.2  -3.1
1   2   0.1  -0.2  17.0
2   3  -0.1  -0.8   1.5

I have added an extra valX column just for illustration purposes.
