How to append a column to a DataFrame that collects values of another DataFrame in Python?

Question

I have two tables (as pandas DataFrames). The first looks like this:

name	val
name1	0
name2	1

the other is

name	tag
name1	tg1
name1	tg2
name1	tg3
name1	tg3
name2	kg1
name2	kg1
name3	other

and I want to append a column to the first DataFrame that collects all values from the second table by name, i.e.

name	val	new_column
name1	0	[tg1, tg2, tg3, tg3]
name2	1	[kg1, kg1]

I know I can do this with a row-wise operation, but is there a way to do it with built-in pandas methods? And if I also want to remove duplicates from the collected list in new_column, what method should I use?

Answer 1

Use DataFrame.join with the tags of the second DataFrame grouped by name and aggregated into lists:

df = df1.join(df2.groupby('name')['tag'].agg(list).rename('new_column'), on='name')
print(df)
    name  val            new_column
0  name1    0  [tg1, tg2, tg3, tg3]
1  name2    1            [kg1, kg1]
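
For the second part of the question (removing duplicates from the collected lists), one possible approach is to drop repeated (name, tag) pairs from the second DataFrame before grouping. The following is a minimal sketch, assuming the two tables are rebuilt from the sample data in the question; the names unique_tags and df_unique are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question.
df1 = pd.DataFrame({"name": ["name1", "name2"], "val": [0, 1]})
df2 = pd.DataFrame({
    "name": ["name1", "name1", "name1", "name1", "name2", "name2", "name3"],
    "tag": ["tg1", "tg2", "tg3", "tg3", "kg1", "kg1", "other"],
})

# Drop repeated (name, tag) pairs first, then collect the remaining tags per name.
# drop_duplicates keeps the first occurrence, so the original tag order is preserved.
unique_tags = (
    df2.drop_duplicates(["name", "tag"])
       .groupby("name")["tag"]
       .agg(list)
       .rename("new_column")
)

# Left join on the name column; name3 is ignored because it is not in df1.
df_unique = df1.join(unique_tags, on="name")
print(df_unique)
```

With the sample data, new_column should contain [tg1, tg2, tg3] for name1 and [kg1] for name2. If NumPy arrays are acceptable instead of lists, aggregating with .unique() on the grouped column is another option.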

solution1
0 ACCEPTED 2022-12-01 06:07:10