Add new column to long dataframe from another dataframe?
Say that I have two dataframes. One lists the names of soccer players, the teams that they have played for, and the number of goals that they have scored on each team. The other contains the soccer players' names and ages. How do I add a "names_age" column to the goals dataframe that holds the age of the players in the first column, "names", not those in "teammates_names"? And how do I add an additional column that holds the teammates' ages? In short, I'd like two age columns: one for the first set of players and one for the second set.
> AGE_DF
     names age
1      Sam  20
2      Jon  21
3     Adam  22
4    Jason  23
5    Jones  24
6 Jermaine  25
> GOALS_DF
   names goals      team teammates_names teammates_goals teammates_team
1    Sam     1       USA           Jason               1        HOLLAND
2    Sam     2   ENGLAND           Jason               2       PORTUGAL
3    Sam     3    BRAZIL           Jason               3          GHANA
4    Sam     4   GERMANY           Jason               4       COLOMBIA
5    Sam     5 ARGENTINA           Jason               5         CANADA
6    Jon     1       USA           Jones               1        HOLLAND
7    Jon     2   ENGLAND           Jones               2       PORTUGAL
8    Jon     3    BRAZIL           Jones               3          GHANA
9    Jon     4   GERMANY           Jones               4       COLOMBIA
10   Jon     5 ARGENTINA           Jones               5         CANADA
11  Adam     1       USA        Jermaine               1        HOLLAND
12  Adam     1   ENGLAND        Jermaine               1       PORTUGAL
13  Adam     4    BRAZIL        Jermaine               4          GHANA
14  Adam     3   GERMANY        Jermaine               3       COLOMBIA
15  Adam     2 ARGENTINA        Jermaine               2         CANADA
What I have tried: I've successfully got this to work using a for loop. The actual data that I am working with has thousands of rows, and the loop takes a long time. I would like a vectorized approach, but I'm having trouble coming up with one.
Try merge or match.
Here's merge (which is likely to screw up your row ordering and can sometimes be slow):
merge(AGE_DF, GOALS_DF, all = TRUE)
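Note that the call above joins on the one shared column, "names", so it only brings in ages for the first set of players. A rough sketch of one way to get both age columns with merge while restoring the original row order follows. The helper column row_id and the output column names names_age and teammates_age are my own assumptions for illustration, and the data is a cut-down copy of the question's example.

```r
# Cut-down copies of the question's data (team columns omitted for brevity).
AGE_DF <- data.frame(
  names = c("Sam", "Jon", "Adam", "Jason", "Jones", "Jermaine"),
  age = 20:25,
  stringsAsFactors = FALSE
)
GOALS_DF <- data.frame(
  names = rep(c("Sam", "Jon", "Adam"), each = 5),
  goals = c(1:5, 1:5, 1, 1, 4, 3, 2),
  teammates_names = rep(c("Jason", "Jones", "Jermaine"), each = 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper column so the original row order can be restored later.
GOALS_DF$row_id <- seq_len(nrow(GOALS_DF))

# First merge: ages for the players in "names".
out <- merge(GOALS_DF, AGE_DF, by = "names", all.x = TRUE)
names(out)[names(out) == "age"] <- "names_age"

# Second merge: ages for the players in "teammates_names".
out <- merge(out, AGE_DF, by.x = "teammates_names", by.y = "names", all.x = TRUE)
names(out)[names(out) == "age"] <- "teammates_age"

# merge() sorts on the key, so put the rows back in their original order.
out <- out[order(out$row_id), ]
```

Using all.x = TRUE keeps every row of the goals data even when a player has no entry in AGE_DF; those rows simply get NA for the age.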
Here's match, which makes use of basic indexing and subsetting. Assign the result to a new column, of course.
AGE_DF$age[match(GOALS_DF$names, AGE_DF$names)]
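Spelled out for both columns, the match approach might look like the sketch below. The column name teammates_age is an assumption (the question only names "names_age"), and the data is a cut-down copy of the question's example.

```r
# Cut-down copies of the question's data (team columns omitted for brevity).
AGE_DF <- data.frame(
  names = c("Sam", "Jon", "Adam", "Jason", "Jones", "Jermaine"),
  age = 20:25,
  stringsAsFactors = FALSE
)
GOALS_DF <- data.frame(
  names = rep(c("Sam", "Jon", "Adam"), each = 5),
  goals = c(1:5, 1:5, 1, 1, 4, 3, 2),
  teammates_names = rep(c("Jason", "Jones", "Jermaine"), each = 5),
  stringsAsFactors = FALSE
)

# match() returns, for each player, the row position of that player in AGE_DF;
# indexing AGE_DF$age with those positions gives one age per row of GOALS_DF.
GOALS_DF$names_age <- AGE_DF$age[match(GOALS_DF$names, AGE_DF$names)]

# Same lookup for the second set of players ("teammates_age" is an assumed name).
GOALS_DF$teammates_age <- AGE_DF$age[match(GOALS_DF$teammates_names, AGE_DF$names)]
```

This is vectorized and leaves the row order of GOALS_DF untouched. A player missing from AGE_DF gets NA, since match returns NA for no match.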
Here's another option to consider: Convert your dataset into a long format first, and then do the merge. Here, I've done it with melt and "data.table":
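library(reshape2)   # loaded for melt(); recent data.table versions also ship their own melt()
library(data.table)
# Stack "names" and "teammates_names" into a single "names" column (keeping the
# original row names as "rn"), key by that column, then join the ages onto it.
setkey(melt(as.data.table(GOALS_DF, keep.rownames = TRUE),
            measure.vars = c("names", "teammates_names"),
            value.name = "names"), names)[as.data.table(AGE_DF)]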
#     rn goals      team teammates_goals teammates_team        variable    names age
#  1:  1     1       USA               1        HOLLAND           names      Sam  20
#  2:  2     2   ENGLAND               2       PORTUGAL           names      Sam  20
#  3:  3     3    BRAZIL               3          GHANA           names      Sam  20
#  4:  4     4   GERMANY               4       COLOMBIA           names      Sam  20
#  5:  5     5 ARGENTINA               5         CANADA           names      Sam  20
#  6:  6     1       USA               1        HOLLAND           names      Jon  21
## <<SNIP>>
# 28: 13     4    BRAZIL               4          GHANA teammates_names Jermaine  25
# 29: 14     3   GERMANY               3       COLOMBIA teammates_names Jermaine  25
# 30: 15     2 ARGENTINA               2         CANADA teammates_names Jermaine  25
#     rn goals      team teammates_goals teammates_team        variable    names age
I've added the rownames so that you can use dcast to get back to the wide format and retain the row ordering if it's important.
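For what it's worth, the trip back to the wide format might look roughly like the sketch below. It is not from the original answer: it assumes a reasonably recent data.table (one that provides its own melt and dcast), uses merge rather than the keyed join, works on a cut-down copy of the data, and the column names names_age and teammates_age are assumptions.

```r
library(data.table)

# Cut-down copies of the question's data (team columns omitted for brevity).
AGE_DF <- data.frame(
  names = c("Sam", "Jon", "Adam", "Jason", "Jones", "Jermaine"),
  age = 20:25,
  stringsAsFactors = FALSE
)
GOALS_DF <- data.frame(
  names = rep(c("Sam", "Jon", "Adam"), each = 5),
  goals = c(1:5, 1:5, 1, 1, 4, 3, 2),
  teammates_names = rep(c("Jason", "Jones", "Jermaine"), each = 5),
  stringsAsFactors = FALSE
)

# Long format: one row per (original row, player column), row names kept as "rn".
long <- melt(as.data.table(GOALS_DF, keep.rownames = TRUE),
             measure.vars = c("names", "teammates_names"),
             value.name = "names")

# Attach the ages to every player, whichever column they came from.
long <- merge(long, as.data.table(AGE_DF), by = "names")

# Back to wide: one age column per original player column.
ages_wide <- dcast(long, rn ~ variable, value.var = "age")

# "rn" is character, so sort numerically to recover the original row order.
ages_wide <- ages_wide[order(as.integer(rn))]

GOALS_DF$names_age <- ages_wide$names
GOALS_DF$teammates_age <- ages_wide$teammates_names
```

The sort on as.integer(rn) matters because dcast orders its result by the character row names, which would otherwise put "10" before "2".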