colum 1, colum2 a,b,c 30 b,c,f 40 a,g,z 50 . . .
Using above dataframe with col1,2 I'd like to have dataframe as below dataframe with col3, 4. Additionally, col1 consists of values with commas. col4 consists of sum of col2 following col3. column3, column4 a 80 b 70 c 70 f 40 g 50 z 50
Use:
df = (df.set_index('colum2')['colum1']
.str.split(',', expand=True)
.stack()
.reset_index(name='column3')
.groupby('column3', as_index=False)['colum2']
.sum()
.rename(columns={'colum2':'column4'})
)
print (df)
column3 column4
0 a 80
1 b 70
2 c 70
3 f 40
4 g 50
5 z 50
Explanation :
set_index
by column colum2
DataFrame
by split
stack
reset_index
groupby
and aggregate sum
Another solution:
from itertools import chain
a = df['colum1'].str.split(',')
lens = a.str.len()
df = pd.DataFrame({
'column3' : list(chain.from_iterable(a)),
'column4' : df['colum2'].repeat(lens)
}).groupby('column3', as_index=False)['column4'].sum()
print (df)
column3 column4
0 a 80
1 b 70
2 c 70
3 f 40
4 g 50
5 z 50
Explanation :
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.