Pandas - passing a custom aggregation function from configParser

I have a DataFrame with data like the following:

Time,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10,Col11,Col12,Col13
05:17:55.703,,,,,,21,,3,    89,891,11,
05:17:55.703,,,,,,21,,3,   217,891,12,
05:17:55.703,,,,,,21,,3,   217,891,13,
05:17:55.703,,,,,,21,,3,   217,891,15,
05:17:55.703,,,,,,21,,3,   217,891,16,
05:17:55.703,,,,,,21,,3,   217,891,17,
05:17:55.703,,,,,,21,,3,   217,891,18,
05:17:55.707,,,,,,18,,3,   185,892,0,
05:17:55.707,,,,,,21,,3,   185,892,1,
05:17:55.707,,,,,,17,,3,    73,892,5,
05:17:55.707,,,,,,17,,3,   185,892,6,
05:17:55.707,,,,,,21,,3,    73,892,7,
05:17:55.708,268,4,28,-67.60,13,,2,,,,,2
05:17:55.711,,,,,,18,,3,    57,892,10,
05:17:55.711,,,,,,21,,3,   201,892,11,
05:17:55.711,,,,,,21,,3,    25,892,12,
05:17:55.723,,,,,,21,,3,   217,893,11,
05:17:55.723,,,,,,21,,3,   217,893,15,
05:17:55.723,,,,,,21,,3,   217,893,16,
05:17:55.726,268,4,,-67.80,,,,,,,,
05:17:55.728,,,28,,12,31,2,3,   185,894,0,1

I need to aggregate each column with a different aggregation function, which I currently do like this:

df['Time'] = pd.to_timedelta(df['Time'])
d = {'Col2':'mean', 'Col3':'max', 'Col5':'median'}
df2 = df.groupby(pd.Grouper(freq='40L', key='Time')).agg(d)

Now, for another column, say Col1, I need to pass a custom mode function like the one below:

import numpy as np
import pandas as pd

def mode1(x):
    m = pd.Series.mode(x)
    return m.values[0] if not m.empty else np.nan
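
As a quick illustration of what this helper returns, here is a small sketch. The sample values are made up for the example and are not taken from the data above. It relies on `Series.mode` returning the modes in sorted order and dropping NaN by default, so a tie resolves to the smallest value and an all-NaN group yields NaN.

```python
import numpy as np
import pandas as pd


def mode1(x):
    # First mode of the group, or NaN when the group has no non-null values
    m = pd.Series.mode(x)
    return m.values[0] if not m.empty else np.nan


# Two values tie for most frequent; Series.mode sorts them, so the smaller wins
tied = mode1(pd.Series([217, 89, 217, 89]))

# A group with only missing values has no mode, so the fallback is used
empty = mode1(pd.Series([np.nan, np.nan]))
```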

I can add mode1 to the dictionary like below, and the aggregation works:

aggDict = {'Col1': mode1, 'Col2':'mean', 'Col3':'max', 'Col5':'median'}
df2 = df.groupby(pd.Grouper(freq='40L', key='Time')).agg(aggDict)

Beyond this, I need to read the dictionary from a config file so that I can use it with different DataFrames that have different column names and aggregation methods.

So I create a config file, say config.ini, like the one below and read it with ConfigParser.

config.ini
[Config1]
# for PDSCH and CSF info Apex custom grid
Col1 = mode1
Col2 = mean
Col3 = max
Col4 = median

Reading the config file:

from configparser import ConfigParser
cfgparser = ConfigParser()
cfgparser.optionxform = str # to keep case sensitive keys
cfgparser.read('config.ini')
aggDict = dict(cfgparser.items('Config1'))

When I pass this aggDict to .agg(), as in df2 = df.groupby(pd.Grouper(freq='40L', key='Time')).agg(aggDict), it fails with the error: 'SeriesGroupBy' object has no attribute 'mode1'.

I know what the problem is: aggDict looks like this (and rightly so):

{'Col1': 'mode1',
 'Col2': 'mean',
 'Col3': 'max',
 'Col4': 'median'}

When mode1 is passed as a string, SeriesGroupBy looks for an attribute with that name and cannot find it. How can I make SeriesGroupBy find the user-defined mode1 function when its name comes from ConfigParser?

I think you need to look the function up in globals() or locals(), depending on the scope. That would mean:

aggDict = {'Col1': globals()['mode1'], 'Col2':'mean', 'Col3':'max', 'Col5':'median'}

Here globals()['mode1'] retrieves the function object by its name from the module's global namespace, so .agg() receives a callable instead of a string. This assumes the function is defined in the same module (file) as the lookup. When you build aggDict from the parsed config, apply the same lookup to the value that names your custom function.
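
Applied to a dictionary that really comes from ConfigParser, the idea might look like the sketch below. It swaps globals() for an explicit registry, which is my own variation rather than something the question requires: the name CUSTOM_AGGS and the inline CONFIG_TEXT are hypothetical, and the small DataFrame only mimics a few rows of the sample data. Names found in the registry are replaced by the function object, and every other name (mean, max, and so on) stays a string for pandas to resolve. The frequency is written as '40ms', which I believe is the currently preferred spelling of the '40L' alias.

```python
from configparser import ConfigParser

import numpy as np
import pandas as pd


def mode1(x):
    # First mode of the group, or NaN when the group has no non-null values
    m = x.mode()
    return m.iloc[0] if not m.empty else np.nan


# Hypothetical registry: the only custom names a config file may refer to
CUSTOM_AGGS = {"mode1": mode1}

# Stand-in for config.ini so the example is self-contained
CONFIG_TEXT = """
[Config1]
Col2 = mean
Col3 = max
Col9 = mode1
"""

cfgparser = ConfigParser()
cfgparser.optionxform = str  # keep column names case sensitive
cfgparser.read_string(CONFIG_TEXT)

# Replace registered names with callables; leave built-in names as strings
aggDict = {
    col: CUSTOM_AGGS.get(name, name)
    for col, name in cfgparser.items("Config1")
}

# Made-up rows shaped like the sample data in the question
df = pd.DataFrame({
    "Time": pd.to_timedelta(
        ["05:17:55.703", "05:17:55.707", "05:17:55.708", "05:17:55.726"]
    ),
    "Col2": [np.nan, np.nan, 268.0, 268.0],
    "Col3": [np.nan, np.nan, 4.0, 4.0],
    "Col9": [3.0, 3.0, np.nan, np.nan],
})

df2 = df.groupby(pd.Grouper(freq="40ms", key="Time")).agg(aggDict)
```

An explicit registry also means a typo or an unexpected name in the config file cannot pick up an arbitrary global. Such a name is passed through as a string, and pandas raises the same "no attribute" error as before.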
