Merge columns with more than one value in pandas dataframe

Question

I've got this DataFrame in Python using pandas:

Column 1	Column 2	Column 3
hello	a,b,c	1,2,3
hi	b,c,a	4,5,6

The values in column 3 belong to the categories in column 2. Is there a way to combine columns 2 and 3 that I get this output?

Column 1	a	b	c
hello	1	2	3
hi	6	4	5

Any advise will be very helpful! Thank you!

Answer 1

You can use pd.crosstab after exploding the commas:

new_df = ( df.assign(t=df['Column 2'].str.split(','), a=df['Column 3'].str.split(',')).
                      explode(['t', 'a']) )

output = ( pd.crosstab(index=new_df['Column 1'], columns=new_df['t'], 
                     values=new_df['a'], aggfunc='sum').reset_index() )

Output:

t   Column 1    a   b   c
0   hello       1   2   3
1   hi          4   5   6

Answer 2

df.apply(lambda x: pd.Series(x['Column 3'].split(','), index=x['Column2'].split(',')), axis=1)

output:

    a   b   c
0   1   2   3
1   4   5   6

result make to df1 and concat

df1 = df.apply(lambda x: pd.Series(x['Column 3'].split(','), index=x['Column2'].split(',')), axis=1)

pd.concat([df['Column 1'], df1], axis=1)

output:

    col1    a   b   c
0   hello   1   2   3
1   hi      4   5   6

Answer 3

Efficiency wise, I'd say do all the wrangling in vanilla python and create a new dataframe:

from collections import defaultdict
outcome = defaultdict(list)
for column, row in zip(df['Column 2'], df['Column 3']):
    column = column.split(',')
    row = row.split(',')
    for first, last in zip(column, row):
        outcome[first].append(last)
pd.DataFrame(outcome).assign(Column = df['Column 1'])
   a  b  c Column
0  1  2  3  hello
1  6  4  5     hi

Merge columns with more than one value in pandas dataframe

Question

3 answers

solution1
2 2022-11-17 11:05:04

solution2
2 ACCPTED 2022-11-17 11:14:11

solution3
1 2022-11-17 12:01:53

Merge columns with more than one value in pandas dataframe

Question

3 answers

solution1 2 2022-11-17 11:05:04

solution2 2 ACCPTED 2022-11-17 11:14:11

solution3 1 2022-11-17 12:01:53

solution1
2 2022-11-17 11:05:04

solution2
2 ACCPTED 2022-11-17 11:14:11

solution3
1 2022-11-17 12:01:53