
How to build a statement to perform a groupby operation during runtime on a Pandas DataFrame?

I have a Pandas DataFrame dfs and a list headers.

At runtime, headers is assigned the column names of dfs.

For example, suppose headers gets assigned the following column names of dfs:

["Information_type", "Interface", "Type_of_Interface", "Connection_Mechanism"]

I want to perform the groupby and agg operation below on dfs without explicitly listing the column names in the aggregation mapping, i.e. without hard-coding "Information_type": " ".join, "Interface": " ".join, "Type_of_Interface": " ".join, "Connection_Mechanism": " ".join:

dfs[0]=dfs[0].groupby("grp").agg({"Information_type": " ".join, "Interface": " ".join, "Type_of_Interface": " ".join, "Connection_Mechanism": " ".join})

In other words, I want the mapping "Information_type": " ".join, "Interface": " ".join, "Type_of_Interface": " ".join, "Connection_Mechanism": " ".join in the line above to be generated at runtime from headers.

Is this possible? Otherwise I would have to edit the column names by hand and run the groupby and agg operation separately for each table.

Appreciate your help. Thanks in advance!

IIUC, you can build the aggregation dict from the list of column names with a dict comprehension:

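# setup: random single-digit values, stored as strings so " ".join works
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randint(0, 5, 25),
                   'b': np.random.randint(0, 5, 25),
                   'c': np.random.randint(0, 5, 25),
                   'd': np.random.randint(0, 5, 25)}, dtype=str)

# columns to aggregate, e.g. your runtime headers list
cols = ['b', 'c']

# builds {'b': " ".join, 'c': " ".join} without typing each name
df.groupby('a').agg({col: " ".join for col in cols})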

Output (the data is random, so your values will differ):

               b              c
a                              
0  0 0 3 3 4 2 3  3 3 4 0 4 3 2
1      2 4 1 2 1      3 0 2 1 3
2        0 0 4 2        1 3 1 3
3    2 2 4 1 3 0    3 1 1 1 2 0
4          4 2 0          2 0 3
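Applied to the question's setup, a minimal sketch, assuming dfs is a list of DataFrames (as dfs[0] suggests) and that every table has a "grp" key column. The sample data and the second table's "Owner" column are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for dfs: a list of tables, each with a "grp" key
# column plus text columns whose names are only known at runtime.
dfs = [
    pd.DataFrame({
        "grp": [1, 1, 2],
        "Information_type": ["a", "b", "c"],
        "Interface": ["x", "y", "z"],
        "Type_of_Interface": ["p", "q", "r"],
        "Connection_Mechanism": ["m", "n", "o"],
    }),
    pd.DataFrame({
        "grp": [1, 2, 2],
        "Owner": ["u", "v", "w"],  # hypothetical column
    }),
]

for i, table in enumerate(dfs):
    # headers is filled at runtime: every column except the grouping key
    headers = [col for col in table.columns if col != "grp"]
    # same mapping as the hand-written dict, generated from headers
    agg_spec = {col: " ".join for col in headers}
    # replace the table in place, like dfs[0] = dfs[0].groupby(...) above
    dfs[i] = table.groupby("grp").agg(agg_spec)

print(dfs[0])
print(dfs[1])
```

Because headers is recomputed per table, the same loop handles tables with different column names. If every non-key column should get the same function, passing it directly, as in table.groupby("grp").agg(" ".join), should also work, since pandas then applies it to each non-grouping column.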
