More than one comma-separated value per cell in a pandas DataFrame
colum1  colum2
a,b,c   30
b,c,f   40
a,g,z   50
...
Starting from the DataFrame above (colum1 and colum2), I'd like to get the DataFrame below (column3 and column4). Note that colum1 holds comma-separated values. column4 is the sum of colum2 over every row whose colum1 contains the value in column3.

column3  column4
a        80
b        70
c        70
f        40
g        50
z        50
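To reproduce the question, the sample frame can be built as in the sketch below. It assumes the column names colum1 and colum2 (the spelling used in the answer's code) and includes only the three rows shown, not the elided ones.

```python
import pandas as pd

# Sample data from the question; only the three visible rows are included.
# Column names follow the spelling used in the answer (colum1, colum2).
df = pd.DataFrame({
    'colum1': ['a,b,c', 'b,c,f', 'a,g,z'],
    'colum2': [30, 40, 50],
})
print(df)
```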
Use:
df = (df.set_index('colum2')['colum1']
        .str.split(',', expand=True)
        .stack()
        .reset_index(name='column3')
        .groupby('column3', as_index=False)['colum2']
        .sum()
        .rename(columns={'colum2': 'column4'})
     )
print(df)
  column3  column4
0       a       80
1       b       70
2       c       70
3       f       40
4       g       50
5       z       50
Explanation:
1. First set_index by column colum2
2. Create a DataFrame by split
3. Reshape by stack
4. Create columns from the index by reset_index
5. groupby and aggregate sum
6. Finally rename the summed column to column4
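The steps above can be followed one at a time by naming each intermediate result. This is only a sketch: the variable names (s, wide, long_s, flat, result) are illustrative and not part of the original answer, and the sample frame is rebuilt from the three rows shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'colum1': ['a,b,c', 'b,c,f', 'a,g,z'],
    'colum2': [30, 40, 50],
})

# Step 1: move colum2 into the index and keep colum1 as a Series.
s = df.set_index('colum2')['colum1']

# Step 2: split on commas; expand=True gives one column per piece.
wide = s.str.split(',', expand=True)

# Step 3: stack the wide frame into one long Series.
# The index is now a MultiIndex of (colum2, position within the row).
long_s = wide.stack()

# Step 4: turn the index levels back into columns and name the values column3.
flat = long_s.reset_index(name='column3')

# Steps 5 and 6: sum colum2 per letter, then rename the summed column.
result = (flat.groupby('column3', as_index=False)['colum2']
              .sum()
              .rename(columns={'colum2': 'column4'}))
print(result)
```

Printing flat before the groupby shows one row per (letter, colum2) pair, which makes it easy to check that each letter picks up the right values.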
Another solution:
import pandas as pd
from itertools import chain

a = df['colum1'].str.split(',')
lens = a.str.len()
df = pd.DataFrame({
    'column3': list(chain.from_iterable(a)),
    'column4': df['colum2'].repeat(lens)
}).groupby('column3', as_index=False)['column4'].sum()
print(df)
  column3  column4
0       a       80
1       b       70
2       c       70
3       f       40
4       g       50
5       z       50
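Explanation:
1. Create lists by split
2. Get the length of each list by str.len
3. Repeat each colum2 value by that length and flatten the lists with chain.from_iterable
4. groupby and aggregate sum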
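A third option, which is not part of the original answer, is DataFrame.explode. It assumes pandas 0.25 or later, where explode was introduced. The sketch below rebuilds the sample frame from the question and gives one row per list element before grouping.

```python
import pandas as pd

df = pd.DataFrame({
    'colum1': ['a,b,c', 'b,c,f', 'a,g,z'],
    'colum2': [30, 40, 50],
})

result = (
    # Split each string into a list stored in a new column3.
    df.assign(column3=df['colum1'].str.split(','))
      # One row per list element; colum2 is repeated alongside it.
      .explode('column3')
      # Sum colum2 for each distinct letter.
      .groupby('column3', as_index=False)['colum2']
      .sum()
      .rename(columns={'colum2': 'column4'})
)
print(result)
```

This avoids both the wide intermediate frame created by expand=True and the manual repeat and chain bookkeeping, at the cost of requiring a newer pandas.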