I have two tables (as pandas DataFrames). The first looks like this:
name | val |
---|---|
name1 | 0 |
name2 | 1 |
The second is:
name | tag |
---|---|
name1 | tg1 |
name1 | tg2 |
name1 | tg3 |
name1 | tg3 |
name2 | kg1 |
name2 | kg1 |
name3 | other |
I want to append a column to the first DataFrame that collects all values from the second table by name, i.e.
name | val | new_column |
---|---|---|
name1 | 0 | [tg1, tg2, tg3, tg3] |
name2 | 1 | [kg1, kg1] |
I know I can do this with a row-wise operation, but is there a way to do it with built-in pandas methods? And if I also want to remove duplicates from the collected list in new_column, what method should I use?
Use DataFrame.join with the tags of the second DataFrame aggregated into lists per name:
df = df1.join(df2.groupby('name')['tag'].agg(list).rename('new_column'), on='name')
print(df)
    name  val            new_column
0  name1    0  [tg1, tg2, tg3, tg3]
1  name2    1            [kg1, kg1]
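For the second part of the question (removing duplicates), one possible approach is to drop duplicate (name, tag) pairs before grouping. The sketch below rebuilds the two sample tables from the question so it can be run on its own; the names `unique_tags` and `df_unique` are only illustrative and do not appear in the question.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"name": ["name1", "name2"], "val": [0, 1]})
df2 = pd.DataFrame({
    "name": ["name1", "name1", "name1", "name1", "name2", "name2", "name3"],
    "tag": ["tg1", "tg2", "tg3", "tg3", "kg1", "kg1", "other"],
})

# Drop repeated (name, tag) pairs first, then collect the remaining
# tags into one list per name. drop_duplicates keeps the first
# occurrence, so the original order of the tags is preserved.
unique_tags = (
    df2.drop_duplicates(["name", "tag"])
       .groupby("name")["tag"]
       .agg(list)
       .rename("new_column")  # hypothetical column name, as in the question
)

# Join on the 'name' column of df1; names that are missing from df1
# (here name3) are simply not picked up.
df_unique = df1.join(unique_tags, on="name")
print(df_unique)
```

If you would rather keep everything in a single aggregation, `.agg(lambda s: list(dict.fromkeys(s)))` should give the same order-preserving result, at the cost of a Python-level lambda per group.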