I am working with a dataframe that has some rows that need to be grouped (with a join) using a key.
Basically I have this dataframe:
d = {'process': [1, 2, 2, 3, 3], 'notes_txt': ['TESTE 1', 'TESTE A ', 'TESTE A ', 'TESTE B ', 'TESTE B '],'notes_cont': ['Process 1: 0 errors', 'Process 1:', '0 errors', 'Process 2:', '5 errors'], 'notes_cont_pt2': ['via script', 'via script 1', 'script 2', 'via script 2', 'script 5']}
df = pd.DataFrame(data=d)
And my desired output is:
I am trying this code for one column (and it works fine):
import pandas as pd
d = {'process': [1, 2, 2, 3, 3], 'notes_txt': ['TESTE 1', 'TESTE A ', 'TESTE A ', 'TESTE B ', 'TESTE B '],'notes_cont': ['Process 1: 0 errors', 'Process 1:', '0 errors', 'Process 2:', '5 errors'], 'notes_cont_pt2': ['via script', 'via script 1', 'script 2', 'via script 2', 'script 5']}
df = pd.DataFrame(data=d)
df = df.groupby(['process','notes_txt'])['notes_cont'].apply(' '.join).reset_index()
print(df)
Grouping works when I join one column, but when I try to join two columns I get an error:
Traceback (most recent call last):
df = df.groupby(['process','notes_txt'])['notes_cont']['notes_cont_pt2'].apply(' '.join).reset_index()
File "base.py", line 258, in __getitem__
.format(selection=self._selection))
IndexError: Column(s) notes_cont already selected
I've tried with this:
df = df.groupby(['process','notes_txt'])['notes_cont', 'notes_cont_pt2'].apply(' '.join).reset_index()
But it gives me this output:
IIUC, you can use GroupBy.agg with a dict that maps each column to its own join function:
df.groupby(['process', 'notes_txt'], as_index=False).agg({'notes_cont': ''.join,
                                                          'notes_cont_pt2': ','.join})
   process notes_txt           notes_cont         notes_cont_pt2
0        1   TESTE 1  Process 1: 0 errors             via script
1        2  TESTE A    Process 1:0 errors  via script 1,script 2
2        3  TESTE B    Process 2:5 errors  via script 2,script 5
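If you want the same separator for both columns (the question used ' '.join), a minimal sketch along the same lines is below. The names cols, labels and out are only illustrative. It also shows what is likely going on with the apply attempt: apply hands each group over as a whole DataFrame, and iterating a DataFrame yields its column labels, so ' '.join ends up joining the labels rather than the values.

```python
import pandas as pd

d = {'process': [1, 2, 2, 3, 3],
     'notes_txt': ['TESTE 1', 'TESTE A ', 'TESTE A ', 'TESTE B ', 'TESTE B '],
     'notes_cont': ['Process 1: 0 errors', 'Process 1:', '0 errors', 'Process 2:', '5 errors'],
     'notes_cont_pt2': ['via script', 'via script 1', 'script 2', 'via script 2', 'script 5']}
df = pd.DataFrame(data=d)

# Columns to join within each group (illustrative name).
cols = ['notes_cont', 'notes_cont_pt2']

# Iterating a DataFrame yields column labels, so this joins the labels, not the values.
labels = ' '.join(df[cols])

# Build the agg dict so every column in `cols` is joined with a single space.
out = (df.groupby(['process', 'notes_txt'], as_index=False)
         .agg({c: ' '.join for c in cols}))
print(out)
```

Since agg applies each function to one column at a time, every ' '.join here receives a Series of strings, which is the same thing the working one-column version was doing.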