Applying different Pandas GroupBy functions to multiple lists of columns

I am looking for a way to apply different Pandas groupby functions (like "mean", "min" or "max") on columns, depending on the beginning of their names.

My current approach is as follows:

  • dynamically create lists of columns that start with X, Y or...
  • dynamically create lists of functions to apply to each group of columns
  • merge the lists of columns with their corresponding functions into dictionaries
  • merge the dictionaries and pass the result to the "agg" function:
import numpy as np
import pandas as pd

data = np.random.randint(0, 5, (4, 10))
cols = [f"X{i}" if i % 2 == 0 else f"Y{i}" for i in range(10)]

df = pd.DataFrame(data=data, columns=cols)
df["group"] = ["A", "A", "B", "B"]
print(df)

'''
    X0  Y1  X2  Y3  X4  Y5  X6  Y7  X8  Y9 group
0   2   2   1   2   0   4   2   3   0   3     A
1   0   2   1   0   4   2   3   4   4   3     A
2   4   0   1   3   1   3   0   1   2   4     B
3   0   2   1   2   4   0   0   0   4   0     B
'''

col_list_1 = df.filter(like="X").columns
col_list_2 = df.filter(like="Y").columns

list_of_functions_1 = ["mean" for i in range(len(col_list_1))]
list_of_functions_2 = ["min" for i in range(len(col_list_2))]

dict_1 = dict(zip(col_list_1, list_of_functions_1))
dict_2 = dict(zip(col_list_2, list_of_functions_2))

print(df.groupby("group").agg(dict_1 | dict_2))

'''
        X0   X2   X4   X6   X8  Y1  Y3  Y5  Y7  Y9
group
A      1.0  1.0  2.0  2.5  2.0   2   0   2   3   3
B      2.0  1.0  2.5  0.0  3.0   0   2   0   0   0
'''

Is there a more "Pythonic" way to do this? Maybe something like:

df.groupby("group").agg({col_list_1: "mean",
                         col_list_2: "min"})

Thanks,

Pierre-Louis

Your approach is already quite Pythonic, to be honest. If you want to make it more compact and automatic, you could use a nested dict comprehension:

functions_map = {"X": "mean",
                 "Y": "min"}

df.groupby("group").agg({
    variable: stat
    for prefix, stat in functions_map.items()
    for variable in df.filter(like=prefix).columns
})

'''
        X0   X2   X4   X6   X8  Y1  Y3  Y5  Y7  Y9
group
A      1.0  1.0  2.0  2.5  2.0   2   0   2   3   3
B      2.0  1.0  2.5  0.0  3.0   0   2   0   0   0
'''
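
One caveat with the comprehension above: df.filter(like=prefix) selects every column whose name contains the prefix anywhere, not only at the beginning. If that distinction matters for your data, a minimal sketch that anchors the match with str.startswith might look like the following. The helper name build_agg_map is hypothetical (it is not part of pandas), and the seeded random data is only there to make the example reproducible.

```python
import numpy as np
import pandas as pd


def build_agg_map(columns, functions_map):
    """Map each column to the stat of the prefix its name starts with.

    If several prefixes match the same column, the last one listed in
    functions_map wins, because later dict entries overwrite earlier ones.
    """
    return {
        col: stat
        for prefix, stat in functions_map.items()
        for col in columns
        if str(col).startswith(prefix)
    }


# Same layout as the question's example, with a fixed seed.
rng = np.random.default_rng(0)
cols = [f"X{i}" if i % 2 == 0 else f"Y{i}" for i in range(10)]
df = pd.DataFrame(rng.integers(0, 5, (4, 10)), columns=cols)
df["group"] = ["A", "A", "B", "B"]

functions_map = {"X": "mean", "Y": "min"}

# Exclude the grouping key so it is never treated as a value column.
agg_map = build_agg_map(df.columns.drop("group"), functions_map)
result = df.groupby("group").agg(agg_map)
print(result)
```

Keeping the mapping in a small function also makes it easy to inspect agg_map before passing it to agg, which helps when a prefix unexpectedly matches no columns.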

