Add new columns to a long data frame from another data frame?

Question

Say that I have two dataframes. One lists the names of soccer players, the teams they have played for, and the number of goals they have scored on each team. The other contains the players' names and ages. How do I add a "names_age" column to the goals dataframe that holds the ages of the players in the first column, "names", not those in "teammates_names"? And how do I add a second column that holds the teammates' ages? In short, I'd like two age columns: one for the first set of players and one for the second set.

> AGE_DF
     names age
1      Sam  20
2      Jon  21
3     Adam  22
4    Jason  23
5    Jones  24
6 Jermaine  25

> GOALS_DF
   names goals      team teammates_names teammates_goals teammates_team
1    Sam     1       USA           Jason               1        HOLLAND
2    Sam     2   ENGLAND           Jason               2       PORTUGAL
3    Sam     3    BRAZIL           Jason               3          GHANA
4    Sam     4   GERMANY           Jason               4       COLOMBIA
5    Sam     5 ARGENTINA           Jason               5         CANADA
6    Jon     1       USA           Jones               1        HOLLAND
7    Jon     2   ENGLAND           Jones               2       PORTUGAL
8    Jon     3    BRAZIL           Jones               3          GHANA
9    Jon     4   GERMANY           Jones               4       COLOMBIA
10   Jon     5 ARGENTINA           Jones               5         CANADA
11  Adam     1       USA        Jermaine               1        HOLLAND
12  Adam     1   ENGLAND        Jermaine               1       PORTUGAL
13  Adam     4    BRAZIL        Jermaine               4          GHANA
14  Adam     3   GERMANY        Jermaine               3       COLOMBIA
15  Adam     2 ARGENTINA        Jermaine               2         CANADA

What I have tried: I've successfully got this to work using a for loop. The actual data that I am working with have thousands of rows, and the loop takes a long time. I would like a vectorized approach, but I'm having trouble coming up with one.

Answer 1

Try `merge` or `match`.

Here's `merge` (which is likely to screw up your row ordering and can sometimes be slow):
```
 merge(AGE_DF, GOALS_DF, all = TRUE) 
```
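If both age columns are wanted from `merge`, one possible sketch is to merge twice, once per name column, and then restore the original row order. The `row_id` helper column, the `names_lookup` and `mates_lookup` tables, and the `teammates_age` column name are assumptions made for illustration; the data are a cut-down stand-in for the data frames in the question.

```r
# Cut-down stand-ins for the question's data frames (illustrative only)
AGE_DF <- data.frame(
  names = c("Sam", "Jon", "Adam", "Jason", "Jones", "Jermaine"),
  age   = 20:25,
  stringsAsFactors = FALSE
)
GOALS_DF <- data.frame(
  names           = c("Sam", "Sam", "Jon", "Adam"),
  goals           = c(1, 2, 1, 1),
  teammates_names = c("Jason", "Jason", "Jones", "Jermaine"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column that remembers the original row order
GOALS_DF$row_id <- seq_len(nrow(GOALS_DF))

# Two copies of the lookup table, renamed so each merge adds a distinct column
names_lookup <- setNames(AGE_DF, c("names", "names_age"))
mates_lookup <- setNames(AGE_DF, c("teammates_names", "teammates_age"))

# Left joins: all.x = TRUE keeps every row of the goals data
out <- merge(GOALS_DF, names_lookup, by = "names", all.x = TRUE)
out <- merge(out, mates_lookup, by = "teammates_names", all.x = TRUE)

# merge() may reorder rows, so sort back on the helper column
out <- out[order(out$row_id), ]
```

Note that `merge` also moves the key column to the front, so the column order of `out` differs from that of `GOALS_DF`.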
Here's `match`, which makes use of basic indexing and subsetting. Assign the result to a new column, of course.
```
 AGE_DF$age[match(GOALS_DF$names, AGE_DF$names)] 
```
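Applied to both name columns, the `match` lookup might look like the sketch below. The data are a cut-down stand-in for the data frames in the question, and the column name `teammates_age` is an assumption (the question only names `names_age`).

```r
# Cut-down stand-ins for the question's data frames (illustrative only)
AGE_DF <- data.frame(
  names = c("Sam", "Jon", "Adam", "Jason", "Jones", "Jermaine"),
  age   = c(20, 21, 22, 23, 24, 25),
  stringsAsFactors = FALSE
)
GOALS_DF <- data.frame(
  names           = c("Sam", "Jon", "Adam"),
  teammates_names = c("Jason", "Jones", "Jermaine"),
  stringsAsFactors = FALSE
)

# match() gives, for each name, its row position in AGE_DF (NA if not found);
# indexing AGE_DF$age with those positions pulls out the matching ages
GOALS_DF$names_age     <- AGE_DF$age[match(GOALS_DF$names, AGE_DF$names)]
GOALS_DF$teammates_age <- AGE_DF$age[match(GOALS_DF$teammates_names, AGE_DF$names)]
```

Because this only indexes into existing vectors, the rows of `GOALS_DF` stay in their original order, and names missing from `AGE_DF` get `NA` rather than being dropped.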

Here's another option to consider: convert your dataset into a long format first, and then do the merge. Here, I've done it with `melt` and "data.table":

library(reshape2)
library(data.table)
setkey(melt(as.data.table(GOALS_DF, keep.rownames = TRUE), 
            measure.vars = c("names", "teammates_names"), 
            value.name = "names"), names)[as.data.table(AGE_DF)]
#     rn goals      team teammates_goals teammates_team        variable    names age
#  1:  1     1       USA               1        HOLLAND           names      Sam  20
#  2:  2     2   ENGLAND               2       PORTUGAL           names      Sam  20
#  3:  3     3    BRAZIL               3          GHANA           names      Sam  20
#  4:  4     4   GERMANY               4       COLOMBIA           names      Sam  20
#  5:  5     5 ARGENTINA               5         CANADA           names      Sam  20
#  6:  6     1       USA               1        HOLLAND           names      Jon  21
## <<SNIP>>
# 28: 13     4    BRAZIL               4          GHANA teammates_names Jermaine  25
# 29: 14     3   GERMANY               3       COLOMBIA teammates_names Jermaine  25
# 30: 15     2 ARGENTINA               2         CANADA teammates_names Jermaine  25
#     rn goals      team teammates_goals teammates_team        variable    names age

I've added the rownames so that you can use `dcast` to get back to the wide format and retain the row ordering, if that's important.
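As a rough sketch of that round trip, assuming a recent version of "data.table" (whose `dcast` accepts several `value.var` columns) and a tiny made-up stand-in for the data, with an integer `rn` column playing the role of the saved rownames:

```r
library(data.table)

# Tiny stand-ins for the question's data (illustrative only)
AGE_DT   <- data.table(names = c("Sam", "Jason"), age = c(20, 23))
GOALS_DT <- data.table(rn = 1:2,
                       names = c("Sam", "Sam"),
                       goals = c(1, 2),
                       teammates_names = c("Jason", "Jason"))

# Long format: one row per (original row, name column) pair
long <- melt(GOALS_DT,
             measure.vars = c("names", "teammates_names"),
             value.name = "names")

# Attach the ages, then cast back to one row per original row;
# rn on the left-hand side keeps the original row identity
joined <- merge(long, AGE_DT, by = "names", all.x = TRUE)
wide   <- dcast(joined, rn + goals ~ variable, value.var = c("names", "age"))
```

With several `value.var` columns, `dcast` builds the new column names by pasting each value column together with the levels of `variable`, so the results may need renaming afterwards.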
