
Add new column to long dataframe from another dataframe?

Say that I have two dataframes. One lists the names of soccer players, the teams that they have played for, and the number of goals that they have scored on each team. The other contains the soccer players' names and ages. How do I add a "names_age" column to the goals dataframe that holds the ages of the players in the first column, "names", and not those in "teammates_names"? And how do I add a second column that holds the teammates' ages? In short, I'd like two age columns: one for the first set of players and one for the second set.

> AGE_DF

     names age
1      Sam  20
2      Jon  21
3     Adam  22
4    Jason  23
5    Jones  24
6 Jermaine  25

> GOALS_DF
   names goals      team teammates_names teammates_goals teammates_team
1    Sam     1       USA           Jason               1        HOLLAND
2    Sam     2   ENGLAND           Jason               2       PORTUGAL
3    Sam     3    BRAZIL           Jason               3          GHANA
4    Sam     4   GERMANY           Jason               4       COLOMBIA
5    Sam     5 ARGENTINA           Jason               5         CANADA
6    Jon     1       USA           Jones               1        HOLLAND
7    Jon     2   ENGLAND           Jones               2       PORTUGAL
8    Jon     3    BRAZIL           Jones               3          GHANA
9    Jon     4   GERMANY           Jones               4       COLOMBIA
10   Jon     5 ARGENTINA           Jones               5         CANADA
11  Adam     1       USA        Jermaine               1        HOLLAND
12  Adam     1   ENGLAND        Jermaine               1       PORTUGAL
13  Adam     4    BRAZIL        Jermaine               4          GHANA
14  Adam     3   GERMANY        Jermaine               3       COLOMBIA
15  Adam     2 ARGENTINA        Jermaine               2         CANADA

What I have tried: I've successfully got this to work using a for loop. The actual data that I am working with have thousands of rows, and this takes a long time. I would like a vectorized approach but I'm having trouble coming up with a way to do that.

Try merge or match.

  1. Here's merge (which is likely to screw up your row ordering and can sometimes be slow):

     merge(AGE_DF, GOALS_DF, all = TRUE) 
  2. Here's match, which makes use of basic indexing and subsetting. Assign the result to a new column, of course; swapping GOALS_DF$names for GOALS_DF$teammates_names gives the teammates' ages in the same way.

     AGE_DF$age[match(GOALS_DF$names, AGE_DF$names)] 
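
To make the match approach concrete for both columns, here is a minimal sketch. It assumes the columns are character vectors (not factors) and uses only rows 1, 6 and 11 of GOALS_DF to keep it short. The name "names_age" comes from the question; "teammates_age" is just a name I picked for the second column.

```r
# AGE_DF as shown in the question
AGE_DF <- data.frame(
  names = c("Sam", "Jon", "Adam", "Jason", "Jones", "Jermaine"),
  age   = 20:25,
  stringsAsFactors = FALSE
)

# A shortened GOALS_DF: rows 1, 6 and 11 from the question
GOALS_DF <- data.frame(
  names           = c("Sam", "Jon", "Adam"),
  goals           = c(1, 1, 1),
  team            = c("USA", "USA", "USA"),
  teammates_names = c("Jason", "Jones", "Jermaine"),
  teammates_goals = c(1, 1, 1),
  teammates_team  = c("HOLLAND", "HOLLAND", "HOLLAND"),
  stringsAsFactors = FALSE
)

# match() returns, for each name in GOALS_DF, its row position in AGE_DF;
# indexing AGE_DF$age with those positions looks up the ages in one step.
GOALS_DF$names_age <- AGE_DF$age[match(GOALS_DF$names, AGE_DF$names)]

# Same lookup for the teammates ("teammates_age" is an assumed column name)
GOALS_DF$teammates_age <- AGE_DF$age[match(GOALS_DF$teammates_names, AGE_DF$names)]

print(GOALS_DF)
```

Since match only indexes into AGE_DF, the rows of GOALS_DF stay in their original order, and a name that is missing from AGE_DF simply gets NA.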

Here's another option to consider: Convert your dataset into a long format first, and then do the merge. Here, I've done it with melt and "data.table":

library(reshape2)
library(data.table)
setkey(melt(as.data.table(GOALS_DF, keep.rownames = TRUE), 
            measure.vars = c("names", "teammates_names"), 
            value.name = "names"), names)[as.data.table(AGE_DF)]
#     rn goals      team teammates_goals teammates_team        variable    names age
#  1:  1     1       USA               1        HOLLAND           names      Sam  20
#  2:  2     2   ENGLAND               2       PORTUGAL           names      Sam  20
#  3:  3     3    BRAZIL               3          GHANA           names      Sam  20
#  4:  4     4   GERMANY               4       COLOMBIA           names      Sam  20
#  5:  5     5 ARGENTINA               5         CANADA           names      Sam  20
#  6:  6     1       USA               1        HOLLAND           names      Jon  21
## <<SNIP>>
# 28: 13     4    BRAZIL               4          GHANA teammates_names Jermaine  25
# 29: 14     3   GERMANY               3       COLOMBIA teammates_names Jermaine  25
# 30: 15     2 ARGENTINA               2         CANADA teammates_names Jermaine  25
#     rn goals      team teammates_goals teammates_team        variable    names age

I've added the rownames so that you can use dcast to get back to the wide format and retain the row ordering if it's important.
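
For the trip back to wide format, something along these lines should work. Treat it as a sketch under a few assumptions: it uses data.table's own melt and dcast (so reshape2 is not loaded), it joins with on = instead of setkey, it again uses only rows 1, 6 and 11 of GOALS_DF, and the object names long and wide are mine.

```r
library(data.table)

AGE_DF <- data.frame(
  names = c("Sam", "Jon", "Adam", "Jason", "Jones", "Jermaine"),
  age   = 20:25,
  stringsAsFactors = FALSE
)

# Rows 1, 6 and 11 of GOALS_DF from the question
GOALS_DF <- data.frame(
  names           = c("Sam", "Jon", "Adam"),
  goals           = c(1, 1, 1),
  team            = c("USA", "USA", "USA"),
  teammates_names = c("Jason", "Jones", "Jermaine"),
  teammates_goals = c(1, 1, 1),
  teammates_team  = c("HOLLAND", "HOLLAND", "HOLLAND"),
  stringsAsFactors = FALSE
)

# Long format: one row per (original row, name column) pair,
# with the original row names kept in the "rn" column
long <- melt(as.data.table(GOALS_DF, keep.rownames = TRUE),
             measure.vars = c("names", "teammates_names"),
             value.name = "names")

# Attach the ages by joining on the player name
long <- long[as.data.table(AGE_DF), on = "names"]

# Back to wide: one names column and one age column per original name column
wide <- dcast(long,
              rn + goals + team + teammates_goals + teammates_team ~ variable,
              value.var = c("names", "age"))

# "rn" is stored as character, so sort it numerically to restore row order
wide <- wide[order(as.integer(rn))]
print(wide)
```

With several value.var columns, dcast prefixes each new column with the value column's name, so the ages end up in age_names and age_teammates_names; rename them if you prefer something tidier.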
