I am looking for a way to apply different Pandas groupby aggregation functions (such as "mean", "min" or "max") to columns, depending on the prefix of their names.
Here is how I currently do it:
import numpy as np
import pandas as pd

data = np.random.randint(0, 5, (4, 10))
cols = [f"X{i}" if i % 2 == 0 else f"Y{i}" for i in range(10)]
df = pd.DataFrame(data=data, columns=cols)
df["group"] = ["A", "A", "B", "B"]
print(df)
'''
   X0  Y1  X2  Y3  X4  Y5  X6  Y7  X8  Y9 group
0   2   2   1   2   0   4   2   3   0   3     A
1   0   2   1   0   4   2   3   4   4   3     A
2   4   0   1   3   1   3   0   1   2   4     B
3   0   2   1   2   4   0   0   0   4   0     B
'''
col_list_1 = df.filter(like="X").columns
col_list_2 = df.filter(like="Y").columns
list_of_functions_1 = ["mean" for i in range(len(col_list_1))]
list_of_functions_2 = ["min" for i in range(len(col_list_2))]
dict_1 = dict(zip(col_list_1, list_of_functions_1))
dict_2 = dict(zip(col_list_2, list_of_functions_2))
print(df.groupby("group").agg(dict_1 | dict_2))
'''
        X0   X2   X4   X6   X8  Y1  Y3  Y5  Y7  Y9
group
A      1.0  1.0  2.0  2.5  2.0   2   0   2   3   3
B      2.0  1.0  2.5  0.0  3.0   0   2   0   0   0
'''
Is there a more "Pythonic" way to do this? Maybe something like:
df.groupby("group").agg({col_list_1: "mean",
                         col_list_2: "min"})
Thanks,
Pierre-Louis
Your approach is already quite Pythonic, to be honest. If you want something more compact and automated, you could use a nested dict comprehension:
functions_map = {"X": "mean", "Y": "min"}

# Build {column: function} for every column whose name contains a given prefix.
df.groupby("group").agg(
    {
        variable: stat
        for prefix, stat in functions_map.items()
        for variable in df.filter(like=prefix).columns
    }
)
'''
        X0   X2   X4   X6   X8  Y1  Y3  Y5  Y7  Y9
group
A      1.0  1.0  2.0  2.5  2.0   2   0   2   3   3
B      2.0  1.0  2.5  0.0  3.0   0   2   0   0   0
'''
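One caveat: `filter(like=...)` matches the substring anywhere in the column name, not only at the beginning. That makes no difference for the columns in the question. If some of your real column names contain "X" or "Y" in the middle, a variant based on `str.startswith` may be safer. Below is a minimal sketch of that idea. The helper name `build_agg_spec` is made up for illustration, and the data is a hard-coded subset of the question's frame so the result is reproducible.

```python
import pandas as pd

# Hard-coded subset of the question's data, so the result is reproducible.
df = pd.DataFrame(
    {
        "X0": [2, 0, 4, 0],
        "Y1": [2, 2, 0, 2],
        "X2": [1, 1, 1, 1],
        "Y3": [2, 0, 3, 2],
        "group": ["A", "A", "B", "B"],
    }
)

# Mapping from column-name prefix to aggregation function.
functions_map = {"X": "mean", "Y": "min"}


def build_agg_spec(columns, prefix_to_func):
    """Hypothetical helper: map each column to the function of the
    first prefix that the column name starts with."""
    spec = {}
    for col in columns:
        for prefix, func in prefix_to_func.items():
            if col.startswith(prefix):
                spec[col] = func
                break  # first matching prefix wins
    return spec


# Exclude the grouping column before building the spec.
agg_spec = build_agg_spec(df.columns.drop("group"), functions_map)
result = df.groupby("group").agg(agg_spec)
print(result)
```

Columns that match no prefix are simply left out of the spec, so they do not appear in the aggregated result.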