Aggregating different sets of columns with different functions after groupby in Pandas
Python Pandas: efficiently aggregating different functions on different columns and combining the resulting columns together
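So far, my approach to the task described in the title has been straightforward, but it seems inefficient and hard to read. Here is an example of what I usually do:
The original Pandas DataFrame df
has 6 columns: 'open', 'high', 'low', 'close', 'volume', 'new dt'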
import pandas as pd

# group once, then aggregate each column with its own function
df_gb = df.groupby('new dt')
arr_high = df_gb['high'].max()
arr_low = df_gb['low'].min()
arr_open = df_gb['open'].first()
arr_close = df_gb['close'].last()
arr_volume = df_gb['volume'].sum()
# combine the per-column results side by side
df2 = pd.concat([arr_open,
                 arr_high,
                 arr_low,
                 arr_close,
                 arr_volume], axis='columns')
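At first glance this looks efficient enough, but once I have 20 functions to apply to 20 different columns it quickly becomes unwieldy and inefficient.
Is there a way to make it more efficient / Pythonic? Thanks in advance.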
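If you have 20 different functions, you will have to match columns to functions correctly one way or another. The term "pythonic" can be subjective, so this is not the one right answer, but it may be useful. I think your approach is already pythonic in that it spells out exactly what is happening.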
# this works as long as the columns are ordered to match the functions;
# you may have to change the ordering here
columns_to_agg = [column for column in df.columns if column != 'new dt']
# if the functions are all methods of pandas.Series, just use strings
agg_methods = ['first', 'max', 'min', 'last', 'sum']
# build a column -> function dictionary and use it as the aggregator
agg_dict = dict(zip(columns_to_agg, agg_methods))
df_gb = df.groupby('new dt', as_index=False).agg(agg_dict)
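To make the idea concrete, here is a minimal, self-contained sketch of the dictionary approach. The sample values are made up for illustration, and the mapping is written out by hand (rather than zipped from df.columns) so it does not depend on column order; that is a choice of this sketch, not something the answer above requires.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration only.
df = pd.DataFrame({
    "open":   [10.0, 11.0, 12.0, 13.0],
    "high":   [10.5, 11.5, 12.5, 13.5],
    "low":    [9.5, 10.5, 11.5, 12.5],
    "close":  [10.2, 11.2, 12.2, 13.2],
    "volume": [100, 200, 300, 400],
    "new dt": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
})

# Explicit column -> function mapping, independent of column order.
agg_dict = {
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
}

# One groupby call replaces the five separate aggregations plus concat.
result = df.groupby("new dt", as_index=False).agg(agg_dict)
print(result)
```

With 20 columns, only the dictionary grows; the groupby call stays the same.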
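If you have a custom function you want to apply (for example to the volume column), you can do
def custom_f(series):
    # count the non-null values in the group
    return pd.notnull(series).sum()
agg_methods = ['first', 'max', 'min', 'last', custom_f]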
and everything will work. You can even do this to apply both sum and custom_f to the volume column
agg_methods = ['first', 'max', 'min', 'last', ['sum', custom_f]]
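Passing a list for one column typically gives the result two-level column labels. As an alternative that is not part of the answer above, pandas 0.25 and later support named aggregation, where each output column is given as name=(input column, function). The sketch below assumes such a pandas version; the sample data and the names count_non_null and volume_count are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one missing volume value.
df = pd.DataFrame({
    "open":   [10.0, 11.0, 12.0, 13.0],
    "high":   [10.5, 11.5, 12.5, 13.5],
    "low":    [9.5, 10.5, 11.5, 12.5],
    "close":  [10.2, 11.2, 12.2, 13.2],
    "volume": [100.0, float("nan"), 300.0, 400.0],
    "new dt": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
})

def count_non_null(series):
    # Custom aggregation (hypothetical name): number of non-null values.
    return series.notna().sum()

# Named aggregation: output_name=(input_column, function).
# The same input column may appear more than once.
summary = df.groupby("new dt").agg(
    open=("open", "first"),
    high=("high", "max"),
    low=("low", "min"),
    close=("close", "last"),
    volume=("volume", "sum"),
    volume_count=("volume", count_non_null),
)
print(summary)
```

The output columns stay flat and carry the names you chose, which can be easier to read when many columns are involved.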
In [3]: import pandas as pd
In [4]: import numpy as np
In [5]: df = pd.DataFrame([[1, 2, 3],[4, 5, 6],[7, 8, 9],
...: [np.nan, np.nan, np.nan]],columns=['A', 'B', 'C'])
In [6]: df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
Out[6]:
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN
To get the functions as columns:
In [11]: df.agg({'A' : ['sum'], 'B' : ['min', 'max']}).T
Out[11]:
   max  min   sum
A  NaN  NaN  12.0
B  8.0  2.0   NaN
To use a custom function, you can do this:
In [12]: df.agg({'A' : ['sum',lambda x:x.mean()], 'B' : ['min', 'max']}).T
Out[12]:
   <lambda>  max  min   sum
A       4.0  NaN  NaN  12.0
B       NaN  8.0  2.0   NaN
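The `<lambda>` label in that output is not very descriptive. A small variation, sketched here with a hypothetical function name mean_of, is to pass a named function instead; pandas then uses the function's name as the label.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],
    columns=["A", "B", "C"],
)

def mean_of(x):
    # Hypothetical helper: mean of the column, skipping NaN (pandas default).
    return x.mean()

# A named function is labelled by its name rather than "<lambda>".
out = df.agg({"A": ["sum", mean_of], "B": ["min", "max"]}).T
print(out)
```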