Pandas - passing a custom aggregation function from configParser

I have a dataframe with data like below:

Time,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10,Col11,Col12,Col13
05:17:55.703,,,,,,21,,3,    89,891,11,
05:17:55.703,,,,,,21,,3,   217,891,12,
05:17:55.703,,,,,,21,,3,   217,891,13,
05:17:55.703,,,,,,21,,3,   217,891,15,
05:17:55.703,,,,,,21,,3,   217,891,16,
05:17:55.703,,,,,,21,,3,   217,891,17,
05:17:55.703,,,,,,21,,3,   217,891,18,
05:17:55.707,,,,,,18,,3,   185,892,0,
05:17:55.707,,,,,,21,,3,   185,892,1,
05:17:55.707,,,,,,17,,3,    73,892,5,
05:17:55.707,,,,,,17,,3,   185,892,6,
05:17:55.707,,,,,,21,,3,    73,892,7,
05:17:55.708,268,4,28,-67.60,13,,2,,,,,2
05:17:55.711,,,,,,18,,3,    57,892,10,
05:17:55.711,,,,,,21,,3,   201,892,11,
05:17:55.711,,,,,,21,,3,    25,892,12,
05:17:55.723,,,,,,21,,3,   217,893,11,
05:17:55.723,,,,,,21,,3,   217,893,15,
05:17:55.723,,,,,,21,,3,   217,893,16,
05:17:55.726,268,4,,-67.80,,,,,,,,
05:17:55.728,,,28,,12,31,2,3,   185,894,0,1

I need to aggregate each column with a different aggregation function. That is done like below.

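import numpy as np
import pandas as pd
df['Time'] = pd.to_timedelta(df['Time'])
d = {'Col2':'mean', 'Col3':'max', 'Col5':'median'}
# '40L' means 40-millisecond bins; newer pandas versions spell this alias '40ms'
df2 = df.groupby(pd.Grouper(freq='40L', key='Time')).agg(d)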

Now, for another column, say Col1, I need to pass a custom mode function like below:

def mode1(x):
    m = pd.Series.mode(x)
    return m.values[0] if not m.empty else np.nan
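
As a quick check of what mode1 returns, here is a minimal sketch with made-up values (they are not taken from the data above). It illustrates two edge cases: with tied modes, Series.mode returns them sorted, so mode1 picks the smallest; when a group holds only missing values, mode() drops them and mode1 falls back to NaN.

```python
import numpy as np
import pandas as pd


def mode1(x):
    # Most frequent value in x, or NaN when no non-missing values remain.
    m = pd.Series.mode(x)
    return m.values[0] if not m.empty else np.nan


# Illustrative inputs only.
single = mode1(pd.Series([21, 21, 18]))         # one clear mode
tied = mode1(pd.Series([217, 89, 217, 89]))     # two modes; the smaller one is returned
empty = mode1(pd.Series([np.nan, np.nan]))      # NaN values are dropped, nothing is left
```

The `not m.empty` guard matters for the time-based grouping used here, because columns such as Col2 are blank in most rows and a 40 ms bin can easily contain no values at all.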

I can add mode1 to the dictionary like below and the aggregation works.

aggDict = {'Col1': mode1, 'Col2':'mean', 'Col3':'max', 'Col5':'median'}
df2 = df.groupby(pd.Grouper(freq='40L', key='Time')).agg(aggDict)

Further to this, I need to read this dictionary from a config file, so that I can use it with different dataframes that have different column names and aggregation methods.

So I create a config file, say config.ini, like below and use it with ConfigParser.

config.ini
[Config1]
# for PDSCH and CSF info Apex custom grid
Col1 = mode1
Col2 = mean
Col3 = max
Col4 = median

Read the config file:

from configparser import ConfigParser
cfgparser = ConfigParser()
cfgparser.optionxform = str # to keep case sensitive keys
cfgparser.read('config.ini')
aggDict = dict(cfgparser.items('Config1'))

When I pass aggDict to .agg(), as in df2 = df.groupby(pd.Grouper(freq='40L', key='Time')).agg(aggDict), it complains: 'SeriesGroupBy' object has no attribute 'mode1'.

I know the problem here: aggDict looks like below (and rightly so).

{'Col1': 'mode1',
 'Col2': 'mean',
 'Col3': 'max',
 'Col4': 'median'}

When mode1 is passed as a string, SeriesGroupBy cannot find it. How do I go about this so that SeriesGroupBy can find the user-defined mode1 function when it is passed from ConfigParser?

I think you need to look it up in your globals or locals, depending on the scope. That would mean:

aggDict = {'Col1': globals()['mode1'], 'Col2':'mean', 'Col3':'max', 'Col5':'median'}

What this does is look up the custom function by its name in globals() and pass the function object itself instead of a string. This assumes the function is defined in the same module (file) as the lookup. When you build the aggDict dictionary from the parsed config, use the format in the code above.
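
To apply that idea to a whole parsed section rather than a hand-written dictionary, one option is a small resolver, sketched below. It swaps in an explicit registry for globals(), so a config file can only name functions that were deliberately exposed. The names CUSTOM_AGGS and resolve_aggs are hypothetical, the config text is inlined with read_string to keep the example self-contained, and the dataframe is made up for illustration.

```python
from configparser import ConfigParser

import numpy as np
import pandas as pd


def mode1(x):
    # Most frequent value in the group, or NaN if no values remain.
    m = pd.Series.mode(x)
    return m.values[0] if not m.empty else np.nan


# Hypothetical registry: the only custom names a config file may refer to.
CUSTOM_AGGS = {"mode1": mode1}


def resolve_aggs(raw, registry=CUSTOM_AGGS):
    # Replace registered names with their callables; leave other strings
    # ('mean', 'max', ...) for pandas to resolve as built-in aggregations.
    return {col: registry.get(name, name) for col, name in raw.items()}


cfgparser = ConfigParser()
cfgparser.optionxform = str  # keep keys case sensitive
cfgparser.read_string("[Config1]\nCol1 = mode1\nCol2 = mean\nCol3 = max\n")
aggDict = resolve_aggs(dict(cfgparser.items("Config1")))

# Made-up frame, only to exercise the resolved dictionary.
df = pd.DataFrame({
    "Time": pd.to_timedelta(
        ["05:17:55.703", "05:17:55.707", "05:17:55.711", "05:17:55.760"]
    ),
    "Col1": [21, 21, 18, 17],
    "Col2": [1.0, 2.0, 3.0, 4.0],
    "Col3": [3, 5, 4, 9],
})
df2 = df.groupby(pd.Grouper(freq="40ms", key="Time")).agg(aggDict)
```

Replacing `registry.get(name, name)` with `globals().get(name, name)` gives the same behaviour as the answer above. The explicit registry is simply a more conservative choice when the config file is edited by other people.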
