So far my approach to the task described in the title is quite straightforward, yet it seems somewhat inefficient/unpythonic. An example of what I usually do is as follows:
# The original pandas DataFrame df has 6 columns:
# 'open', 'high', 'low', 'close', 'volume', 'new dt'
import pandas as pd
df_gb = df.groupby('new dt')
arr_high = df_gb['high'].max()
arr_low = df_gb['low'].min()
arr_open = df_gb['open'].first()
arr_close = df_gb['close'].last()
arr_volume = df_gb['volume'].sum()
df2 = pd.concat([arr_open,
                 arr_high,
                 arr_low,
                 arr_close,
                 arr_volume], axis='columns')
This may look fine at first glance, but with 20 functions to apply to 20 different columns it quickly becomes unwieldy.
Is there any way to make it more efficient/pythonic? Thank you in advance.
If you have 20 different functions, you will have to match columns with functions one way or another. "Pythonic" is subjective, so this is not the one correct answer, but it may be useful. In my opinion your approach is already Pythonic, and it spells out clearly what is happening. An alternative is to build a dictionary of column/function pairs and pass it to agg:
# this works only if the column order matches the order of the functions below;
# you may have to reorder one or the other
columns_to_agg = [column for column in df.columns if column != 'new dt']
# if the functions are all methods of pandas.Series, just use their names as strings
agg_methods = ['first', 'max', 'min', 'last', 'sum']
# pair each column with its function and use the dictionary as the aggregator
agg_dict = dict(zip(columns_to_agg, agg_methods))
df_gb = df.groupby('new dt', as_index=False).agg(agg_dict)
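Since the zip above depends on the column order, a variant that may be easier to maintain is to write the mapping out explicitly. The following is a minimal sketch, not the only way to do it; the sample values in df are made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'open':   [1.0, 2.0, 3.0, 4.0],
    'high':   [5.0, 6.0, 7.0, 8.0],
    'low':    [0.5, 0.4, 0.3, 0.2],
    'close':  [1.5, 2.5, 3.5, 4.5],
    'volume': [10, 20, 30, 40],
    'new dt': ['a', 'a', 'b', 'b'],
})

# Explicit column -> function mapping; it does not depend on column order.
agg_dict = {
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum',
}

# One row per 'new dt' value, one aggregated column per dictionary entry.
df2 = df.groupby('new dt').agg(agg_dict)
print(df2)
```

With 20 columns the dictionary simply grows to 20 entries, and each column stays next to the function applied to it.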
If you have a custom function that you want to apply to, say, volume, you could do:
def custom_f(series):
    return pd.notnull(series).sum()
agg_methods = ['first', 'max', 'min', 'last', custom_f]
Everything else stays the same. You can even pass a list to apply both sum and custom_f to your volume column:
agg_methods = ['first', 'max', 'min', 'last', ['sum', custom_f]]
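Passing a list for one column typically yields hierarchical (MultiIndex) column labels in the result. If flat column names are preferred, named aggregation (available since pandas 0.25) is one option. The sketch below is only an illustration under assumptions: the sample data, the output names volume_sum and volume_count, and the helper count_non_null (playing the role of custom_f) are hypothetical.

```python
import numpy as np
import pandas as pd

def count_non_null(series):
    # Hypothetical custom aggregator: number of non-missing values.
    return series.notna().sum()

# Made-up sample data with one missing volume value.
df = pd.DataFrame({
    'open':   [1.0, 2.0, 3.0, 4.0],
    'high':   [5.0, 6.0, 7.0, 8.0],
    'low':    [0.5, 0.4, 0.3, 0.2],
    'close':  [1.5, 2.5, 3.5, 4.5],
    'volume': [10.0, np.nan, 30.0, 40.0],
    'new dt': ['a', 'a', 'b', 'b'],
})

# Each keyword is an output column name; each value is (input column, function).
result = df.groupby('new dt').agg(
    open=('open', 'first'),
    high=('high', 'max'),
    low=('low', 'min'),
    close=('close', 'last'),
    volume_sum=('volume', 'sum'),
    volume_count=('volume', count_non_null),
)
print(result)
```

The same input column can appear under several output names, which is how volume gets both a sum and a custom count here.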
In [3]: import pandas as pd
In [4]: import numpy as np
In [5]: df = pd.DataFrame([[1, 2, 3],[4, 5, 6],[7, 8, 9],
...: [np.nan, np.nan, np.nan]],columns=['A', 'B', 'C'])
In [6]: df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
Out[6]:
        A    B
max   NaN  8.0
min   1.0  2.0
sum  12.0  NaN
To get the functions as columns instead, transpose the result:
In [11]: df.agg({'A' : ['sum'], 'B' : ['min', 'max']}).T
Out[11]:
   max  min   sum
A  NaN  NaN  12.0
B  8.0  2.0   NaN
To use custom functions, you can do this:
In [12]: df.agg({'A' : ['sum',lambda x:x.mean()], 'B' : ['min', 'max']}).T
Out[12]:
   <lambda>  max  min   sum
A       4.0  NaN  NaN  12.0
B       NaN  8.0  2.0   NaN
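The `<lambda>` label in the output above comes from the function's name. A possible refinement, sketched here under the assumption that a readable label is wanted, is to use a named function instead; mean_value is a hypothetical name chosen for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                   [np.nan, np.nan, np.nan]], columns=['A', 'B', 'C'])

def mean_value(series):
    # Hypothetical named replacement for the lambda; NaN is skipped by default.
    return series.mean()

# The function's name is used as the label instead of '<lambda>'.
result = df.agg({'A': ['sum', mean_value], 'B': ['min', 'max']}).T
print(result)
```

The order of the function labels in the printed table may differ between pandas versions, so it is safer to select them by name than by position.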