
Add new column to long dataframe from another dataframe?

Say that I have two dataframes. One lists the names of soccer players, the teams that they have played for, and the number of goals that they have scored on each team. The other contains the soccer players' names and ages. How do I add a "names_age" column to the goals dataframe that holds the age of the players in the first column, "names", not of those in "teammates_names"? How do I then add an additional column that holds the teammates' ages? In short, I'd like two age columns: one for the first set of players and one for the second set.

> AGE_DF

     names age
1      Sam  20
2      Jon  21
3     Adam  22
4    Jason  23
5    Jones  24
6 Jermaine  25

> GOALS_DF
   names goals      team teammates_names teammates_goals teammates_team
1    Sam     1       USA           Jason               1        HOLLAND
2    Sam     2   ENGLAND           Jason               2       PORTUGAL
3    Sam     3    BRAZIL           Jason               3          GHANA
4    Sam     4   GERMANY           Jason               4       COLOMBIA
5    Sam     5 ARGENTINA           Jason               5         CANADA
6    Jon     1       USA           Jones               1        HOLLAND
7    Jon     2   ENGLAND           Jones               2       PORTUGAL
8    Jon     3    BRAZIL           Jones               3          GHANA
9    Jon     4   GERMANY           Jones               4       COLOMBIA
10   Jon     5 ARGENTINA           Jones               5         CANADA
11  Adam     1       USA        Jermaine               1        HOLLAND
12  Adam     1   ENGLAND        Jermaine               1       PORTUGAL
13  Adam     4    BRAZIL        Jermaine               4          GHANA
14  Adam     3   GERMANY        Jermaine               3       COLOMBIA
15  Adam     2 ARGENTINA        Jermaine               2         CANADA

What I have tried: I've successfully got this to work using a for loop. The actual data that I am working with has thousands of rows, and the loop takes a long time. I would like a vectorized approach, but I'm having trouble coming up with one.

Try merge or match.

  1. Here's merge (which is likely to screw up your row ordering and can sometimes be slow):

     merge(AGE_DF, GOALS_DF, all = TRUE) 
  2. Here's match, which makes use of basic indexing and subsetting. Assign the result to a new column, of course.

     AGE_DF$age[match(GOALS_DF$names, AGE_DF$names)] 
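To get both age columns the question asks for, the match lookup can be run once per name column. The following is a minimal sketch on a cut-down copy of the question's data; the column name "teammates_age" is my own choice (the question only names "names_age"), and the toy data frames keep only the columns needed for the lookup.

```r
# Cut-down copies of the question's data: only the columns the lookup needs
AGE_DF <- data.frame(
  names = c("Sam", "Jon", "Adam", "Jason", "Jones", "Jermaine"),
  age   = 20:25,
  stringsAsFactors = FALSE
)
GOALS_DF <- data.frame(
  names           = rep(c("Sam", "Jon", "Adam"), each = 5),
  teammates_names = rep(c("Jason", "Jones", "Jermaine"), each = 5),
  stringsAsFactors = FALSE
)

# match() gives, for each name in GOALS_DF, its row position in AGE_DF
# (NA if the name is missing), so indexing AGE_DF$age by it is a lookup
# that keeps GOALS_DF's row order and row count unchanged.
GOALS_DF$names_age     <- AGE_DF$age[match(GOALS_DF$names, AGE_DF$names)]
# "teammates_age" is an assumed column name, not one from the question
GOALS_DF$teammates_age <- AGE_DF$age[match(GOALS_DF$teammates_names, AGE_DF$names)]

head(GOALS_DF, 3)
```

Because match returns only the first matching position, this assumes each player appears once in AGE_DF; players missing from AGE_DF end up with NA ages rather than being dropped.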

Here's another option to consider: Convert your dataset into a long format first, and then do the merge. Here, I've done it with melt and "data.table":

library(reshape2)
library(data.table)
setkey(melt(as.data.table(GOALS_DF, keep.rownames = TRUE), 
            measure.vars = c("names", "teammates_names"), 
            value.name = "names"), names)[as.data.table(AGE_DF)]
#     rn goals      team teammates_goals teammates_team        variable    names age
#  1:  1     1       USA               1        HOLLAND           names      Sam  20
#  2:  2     2   ENGLAND               2       PORTUGAL           names      Sam  20
#  3:  3     3    BRAZIL               3          GHANA           names      Sam  20
#  4:  4     4   GERMANY               4       COLOMBIA           names      Sam  20
#  5:  5     5 ARGENTINA               5         CANADA           names      Sam  20
#  6:  6     1       USA               1        HOLLAND           names      Jon  21
## <<SNIP>>
# 28: 13     4    BRAZIL               4          GHANA teammates_names Jermaine  25
# 29: 14     3   GERMANY               3       COLOMBIA teammates_names Jermaine  25
# 30: 15     2 ARGENTINA               2         CANADA teammates_names Jermaine  25
#     rn goals      team teammates_goals teammates_team        variable    names age

I've added the rownames so that you can use dcast to get back to the wide format and retain the row ordering if it's important.
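The round trip back to wide format might look roughly like the sketch below. It is not the answer's exact code: it uses a cut-down copy of the data, stores the row id "rn" as an integer (as.data.table with keep.rownames = TRUE would give a character column, which sorts "1", "10", "11", ...), names the melted value column "player" instead of "names", and attaches the ages with an update join. All of those choices are assumptions made for illustration.

```r
library(data.table)

# Cut-down copies of the question's data; "rn" is an explicit integer row id
AGE_DT <- data.table(
  names = c("Sam", "Jon", "Adam", "Jason", "Jones", "Jermaine"),
  age   = 20:25
)
GOALS_DT <- data.table(
  rn              = seq_len(15),
  names           = rep(c("Sam", "Jon", "Adam"), each = 5),
  goals           = c(1:5, 1:5, 1, 1, 4, 3, 2),
  teammates_names = rep(c("Jason", "Jones", "Jermaine"), each = 5)
)

# Wide -> long: one row per (original row, name column)
long <- melt(GOALS_DT,
             id.vars       = c("rn", "goals"),
             measure.vars  = c("names", "teammates_names"),
             variable.name = "variable",
             value.name    = "player")   # "player" is an assumed name

# Update join: add the age of each player by reference
long[AGE_DT, age := i.age, on = c(player = "names")]

# Long -> wide: one player column and one age column per original name column;
# the result is sorted by rn, which restores the original row order
wide <- dcast(long, rn + goals ~ variable, value.var = c("player", "age"))
wide
```

With two value columns, dcast prefixes the new columns with the value column name, so the ages come back as age_names and age_teammates_names, which can be renamed with setnames if preferred.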
