简体   繁体   中英

more than one value with comma in dataframe pandas

colum 1, colum2 a,b,c 30 b,c,f 40 a,g,z 50 . . . Using above dataframe with col1,2 I'd like to have dataframe as below dataframe with col3, 4. Additionally, col1 consists of values with commas. col4 consists of sum of col2 following col3. column3, column4 a 80 b 70 c 70 f 40 g 50 z 50

Use:

df = (df.set_index('colum2')['colum1']
        .str.split(',', expand=True)
        .stack()
        .reset_index(name='column3')
        .groupby('column3', as_index=False)['colum2']
        .sum()
        .rename(columns={'colum2':'column4'})
      )
print (df)
  column3  column4
0       a       80
1       b       70
2       c       70
3       f       40
4       g       50
5       z       50

Explanation :

  1. First set_index by column colum2
  2. Create DataFrame by split
  3. Reshape by stack
  4. Create index by columns by reset_index
  5. groupby and aggregate sum
  6. Last rename column if necessary

Another solution:

from itertools import chain

a = df['colum1'].str.split(',')
lens = a.str.len()

df = pd.DataFrame({
    'column3' : list(chain.from_iterable(a)), 
    'column4' : df['colum2'].repeat(lens)
}).groupby('column3', as_index=False)['column4'].sum()

print (df)
  column3  column4
0       a       80
1       b       70
2       c       70
3       f       40
4       g       50
5       z       50

Explanation :

  1. Create lists by split
  2. Get lengths of lsits by len
  3. Last repeat columns and flatten colum1
  4. groupby and aggregate sum

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM