more than one value with comma in dataframe pandas

Question

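I have a dataframe with columns colum1 and colum2:

colum1  colum2
a,b,c       30
b,c,f       40
a,g,z       50
...

The values in colum1 contain several items separated by commas. From colum1 and colum2 I'd like to build the dataframe below, with column3 and column4. column3 should hold each individual value from colum1, and column4 should be the sum of colum2 over every row that contains that value:

column3  column4
a             80
b             70
c             70
f             40
g             50
z             50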
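To make the question reproducible, here is a minimal sketch of the sample input. It assumes the column names colum1/colum2 used in the answer below and includes only the three rows shown, since the rows elided with "..." are unknown. The plain loop states what column4 should contain, so it can serve as a reference for checking the pandas solutions.

```python
import pandas as pd

# Sample input from the question (the "..." rows are not included)
df = pd.DataFrame({
    'colum1': ['a,b,c', 'b,c,f', 'a,g,z'],
    'colum2': [30, 40, 50],
})

# Reference computation: for every value that appears in colum1,
# add up colum2 over all rows whose colum1 contains that value
totals = {}
for values, amount in zip(df['colum1'], df['colum2']):
    for value in values.split(','):
        totals[value] = totals.get(value, 0) + int(amount)

print(totals)
# {'a': 80, 'b': 70, 'c': 70, 'f': 40, 'g': 50, 'z': 50}
```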

Answer 1

Use:

df = (df.set_index('colum2')['colum1']
        .str.split(',', expand=True)
        .stack()
        .reset_index(name='column3')
        .groupby('column3', as_index=False)['colum2']
        .sum()
        .rename(columns={'colum2':'column4'})
      )
print (df)
  column3  column4
0       a       80
1       b       70
2       c       70
3       f       40
4       g       50
5       z       50

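Explanation:

First set_index by column colum2, so each colum2 value stays attached to its row
Split colum1 with str.split(',', expand=True) into a DataFrame with one column per comma-separated value
Reshape with stack into a single Series with one value per row
Turn the index levels back into columns with reset_index, naming the values column3
groupby column3 and aggregate with sum
Finally rename colum2 to column4 if necessary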
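To see what each step of the chain does, it can be broken into named intermediates. This is a sketch assuming the three sample rows from the question; the variable names s, wide, stacked, flat and out are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'colum1': ['a,b,c', 'b,c,f', 'a,g,z'],
    'colum2': [30, 40, 50],
})

# Step 1: move colum2 into the index so it travels with each split value
s = df.set_index('colum2')['colum1']

# Step 2: split on commas; expand=True gives one column per position (0, 1, 2)
wide = s.str.split(',', expand=True)

# Step 3: stack turns the wide frame into one long Series whose index is
# (colum2, position)
stacked = wide.stack()

# Step 4: reset_index turns both index levels into columns; the unnamed
# position level becomes "level_1" and the values are named column3
flat = stacked.reset_index(name='column3')

# Step 5: sum colum2 per distinct value, then rename to the requested header
out = (flat.groupby('column3', as_index=False)['colum2']
           .sum()
           .rename(columns={'colum2': 'column4'}))
print(out)
```

The level_1 column is simply ignored by the groupby; it only records which position in the comma-separated string each value came from.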

Another solution:

import pandas as pd
from itertools import chain

a = df['colum1'].str.split(',')
lens = a.str.len()

df = pd.DataFrame({
    'column3' : list(chain.from_iterable(a)), 
    'column4' : df['colum2'].repeat(lens)
}).groupby('column3', as_index=False)['column4'].sum()

print (df)
  column3  column4
0       a       80
1       b       70
2       c       70
3       f       40
4       g       50
5       z       50

Explanation:

Create lists with str.split
Get the lengths of the lists with str.len
Repeat colum2 by those lengths with repeat, and flatten the lists from colum1 with chain.from_iterable
groupby column3 and aggregate with sum
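Not part of the original answer: on pandas 0.25 or later, DataFrame.explode performs the split/repeat/flatten steps of the second solution in a single call. A sketch with the question's three sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    'colum1': ['a,b,c', 'b,c,f', 'a,g,z'],
    'colum2': [30, 40, 50],
})

# str.split gives a list per row; explode writes one row per list element
# and repeats the other columns (colum1, colum2) alongside it, which is what
# repeat + chain.from_iterable do by hand in the solution above
out = (df.assign(column3=df['colum1'].str.split(','))
         .explode('column3')
         .groupby('column3', as_index=False)['colum2']
         .sum()
         .rename(columns={'colum2': 'column4'}))
print(out)
```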

solution1 6 ACCEPTED 2018-07-26 05:41:31