
How to append a column to a DataFrame that collects values of another DataFrame in Python?

I have two tables (as Pandas DataFrames). One looks like

name val
name1 0
name2 1

the other is

name tag
name1 tg1
name1 tg2
name1 tg3
name1 tg3
name2 kg1
name2 kg1
name3 other

and I want to append a column to the first DataFrame that collects all values of the second table by name, i.e.

name val new_column
name1 0 [tg1, tg2, tg3, tg3]
name2 1 [kg1, kg1]

I know I can do this with a row-wise operation, but is there a way to do it with built-in Pandas methods? And if I also want to remove duplicates from the collected lists in new_column, which method should I use?

Use DataFrame.join with the tags aggregated into a list per name:

df = df1.join(df2.groupby('name')['tag'].agg(list).rename('new_column'), on='name')
print (df)
    name  val            new_column
0  name1    0  [tg1, tg2, tg3, tg3]
1  name2    1            [kg1, kg1]
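
The question also asks how to remove duplicates from the collected lists. One possible approach, shown here as a minimal sketch rather than the only way, is to drop repeated (name, tag) pairs from the second table before aggregating. The variable names unique_tags and df_unique are made up for this illustration, and the two frames are rebuilt from the sample data in the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data reconstructed from the question.
df1 = pd.DataFrame({"name": ["name1", "name2"], "val": [0, 1]})
df2 = pd.DataFrame({
    "name": ["name1", "name1", "name1", "name1", "name2", "name2", "name3"],
    "tag": ["tg1", "tg2", "tg3", "tg3", "kg1", "kg1", "other"],
})

# Drop repeated (name, tag) pairs first, then collect the remaining tags
# into one list per name. First-seen order is preserved.
unique_tags = (
    df2.drop_duplicates(subset=["name", "tag"])
       .groupby("name")["tag"]
       .agg(list)
       .rename("new_column")
)

# Same join as before: match df1's "name" column against the Series index.
df_unique = df1.join(unique_tags, on="name")
print(df_unique)
# name1 -> ['tg1', 'tg2', 'tg3'], name2 -> ['kg1']
```

Because join is a left join by default, name3 (present only in the second table) does not appear in the result. If the order of tags does not matter, aggregating with a set instead of a list is another option.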

