How to append a column to a DataFrame that collects values of another DataFrame in Python?
I have two tables (as pandas DataFrames). The first looks like this:

| name | val |
|---|---|
| name1 | 0 |
| name2 | 1 |

The other is:

| name | tag |
|---|---|
| name1 | tg1 |
| name1 | tg2 |
| name1 | tg3 |
| name1 | tg3 |
| name2 | kg1 |
| name2 | kg1 |
| name3 | other |

I want to append a column to the first DataFrame that collects all values from the second table by name, i.e.:

| name | val | new_column |
|---|---|---|
| name1 | 0 | [tg1, tg2, tg3, tg3] |
| name2 | 1 | [kg1, kg1] |

I know I can achieve this with a row-wise operation, but is there a way to do it with built-in pandas methods? And if I also want to remove duplicates from the collected lists in new_column at the same time, what method should I use?
Use DataFrame.join with the tags aggregated into lists per name:

    df = df1.join(df2.groupby('name')['tag'].agg(list).rename('new_column'), on='name')
    print(df)

        name  val            new_column
    0  name1    0  [tg1, tg2, tg3, tg3]
    1  name2    1            [kg1, kg1]
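
The second part of the question, removing duplicates from the collected lists, is not addressed by the one-liner above. One possible approach, offered here as a sketch rather than as the original answer's method, is to call drop_duplicates on the second DataFrame before grouping, so that each (name, tag) pair is collected only once. The frames are rebuilt from the tables in the question; the names unique_tags and df_unique are illustrative and do not come from the original post.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({"name": ["name1", "name2"], "val": [0, 1]})
df2 = pd.DataFrame({
    "name": ["name1", "name1", "name1", "name1", "name2", "name2", "name3"],
    "tag": ["tg1", "tg2", "tg3", "tg3", "kg1", "kg1", "other"],
})

# Drop repeated (name, tag) pairs first (the first occurrence is kept),
# then collect the remaining tags into one list per name.
unique_tags = (
    df2.drop_duplicates(["name", "tag"])
       .groupby("name")["tag"]
       .agg(list)
       .rename("new_column")
)

# join defaults to a left join, so name3 (absent from df1) is left out.
df_unique = df1.join(unique_tags, on="name")
print(df_unique)
```

With this data, new_column should hold [tg1, tg2, tg3] for name1 and [kg1] for name2. Because join defaults to a left join, a name present in df1 but missing from df2 would get NaN in new_column rather than an empty list.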