How to create a new column in R which matches multiple values from two different data frames

I have 2 data frames with thousands of variables.

One has students of different ages and the different teachers that evaluated them. All teachers evaluated multiple different students but not every student.

Teacher Student Age
0123    1       7
0145    1       7
0163    1       7
0175    2       8
0123    2       8
0194    2       8
0123    3       7 
0145    3       7

Then I have the teachers' ratings for specific stereotypes regarding the different ages. Each teacher made one rating for each age group stereotype. The data frame looks like this.

Teacher Age 7   Age 8  Age 9
0123    1       1      1
0145    5       7      3
0163    4       7      1
0175    6       8      1
0183    3       8      1
0194    2       8      1
0120    3       7      4

I want to create a new column in the first data frame where the teachers in each row are matched, and the values are their stereotype response depending on the age of each student. For example, in this new column, the value in the first row would be teacher 0123's stereotype response for 7-year-olds. In this case that is a 1.

Thank you so much for your help. I'm new to R and I have no idea where to start with this.

Edit: I would like the output to look like this:

Teacher Student Age AgeStereotype
0123    1       7   1
0145    1       7   5
0163    1       7   4
0175    2       8   8
0123    2       8   1
0194    2       8   8
0123    3       7   1
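0145    3       7   5

# For each row of DF1, x[1] is the teacher and x[2] is the student's age.
AS <- apply(DF1[,c("Teacher", "Age")], 1, function(x) {
    # Row: the matching teacher in DF2; column: the one whose name contains the age.
    DF2[which(DF2$Teacher == x[1]), which(grepl(x[2], names(DF2)))]
    })
DF1["AgeStereotype"] <- AS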

where DF1 and DF2 are your first and second data frames, respectively.

Output:

  Teacher Student Age AgeStereotype
1     123       1   7             1
2     145       1   7             5
3     163       1   7             4
4     175       2   8             8
5     123       2   8             1
6     194       2   8             8
7     123       3   7             1
8     145       3   7             5
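
If a teacher in DF1 might have no row in DF2, the lookup above returns a zero-length result for that row. Here is a minimal sketch of one way to get NA instead. It assumes the rating columns are named Age.7, Age.8 and so on; the helper lookup_rating() and teacher 999 are made up for illustration and are not part of the question's data.

```r
# Tiny stand-in data; teacher 999 is hypothetical and has no ratings in DF2.
DF1 <- data.frame(Teacher = c(123, 145, 999), Student = c(1, 1, 2), Age = c(7, 7, 8))
DF2 <- data.frame(Teacher = c(123, 145), Age.7 = c(1, 5), Age.8 = c(1, 7))

# Hypothetical helper: one teacher/age pair in, one rating (or NA) out.
lookup_rating <- function(teacher, age) {
  # Exact column-name match, assuming the "Age.<n>" naming convention.
  value <- DF2[DF2$Teacher == teacher, names(DF2) == paste0("Age.", age)]
  if (length(value) == 1) value else NA_real_
}

# mapply() walks over the two columns in parallel, so no matrix coercion is involved.
DF1$AgeStereotype <- mapply(lookup_rating, DF1$Teacher, DF1$Age)
print(DF1)
```

Matching the column name exactly, rather than with grepl(), also avoids picking up several columns when one age is a substring of another.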

You could use [ with a two-column matrix of row and column names as the index, i.e.:

transform(df1, AgeStereotype = `rownames<-`(df2, df2$Teacher)[cbind(Teacher, paste("Age", Age))])

  Teacher Student Age AgeStereotype
1     123       1   7             1
2     145       1   7             5
3     163       1   7             4
4     175       2   8             8
5     123       2   8             1
6     194       2   8             8
7     123       3   7             1
8     145       3   7             5
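
To unpack what the matrix index is doing, here is a rough sketch of the same idea written out step by step with match(). The data are copied from the question. The names ratings, row_idx and col_idx are mine, and I am assuming the rating columns are literally named "Age 7", "Age 8" and "Age 9".

```r
# Data from the question; check.names = FALSE keeps the space in "Age 7" etc.
df1 <- data.frame(
  Teacher = c("0123", "0145", "0163", "0175", "0123", "0194", "0123", "0145"),
  Student = c(1, 1, 1, 2, 2, 2, 3, 3),
  Age     = c(7, 7, 7, 8, 8, 8, 7, 7),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Teacher = c("0123", "0145", "0163", "0175", "0183", "0194", "0120"),
  "Age 7" = c(1, 5, 4, 6, 3, 2, 3),
  "Age 8" = c(1, 7, 7, 8, 8, 8, 7),
  "Age 9" = c(1, 3, 1, 1, 1, 1, 4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Numeric matrix of ratings: one row per teacher, one column per age group.
ratings <- as.matrix(df2[, -1])

# Position of each df1 teacher among the rows, and of each age among the columns.
row_idx <- match(df1$Teacher, df2$Teacher)
col_idx <- match(paste("Age", df1$Age), colnames(ratings))

# A two-column index matrix picks exactly one cell per row of df1.
df1$AgeStereotype <- ratings[cbind(row_idx, col_idx)]
print(df1)
```

A teacher or age that is not found gives NA in the index, and that row of the new column then comes out as NA as well.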

This task is best solved by transforming your second data frame to long format and then joining it to your first data frame. There are many ways to do this in R; here is a clean way within the tidyverse, specifically with dplyr and tidyr functions.

# Recreating your data
df1 <- tibble::tribble(
  ~Teacher, ~Student, ~Age,
   "0123",    1,       7,
   "0145",    1,       7,
   "0163",    1,       7,
   "0175",    2,       8,
   "0123",    2,       8,
   "0194",    2,       8,
   "0123",    3,       7,
   "0145",    3,       7
  )

df2 <- tibble::tribble(
  ~Teacher, ~Age.7, ~Age.8, ~Age.9,
     "0123",    1,       1,      1,
     "0145",    5,       7,      3,
     "0163",    4,       7,      1,
     "0175",    6,       8,      1,
     "0183",    3,       8,      1,
     "0194",    2,       8,      1,
     "0120",    3,       7,      4
  )

# Load necessary libs
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

tidyr::pivot_longer() transforms df2 to a long format; dplyr::mutate() with gsub() and as.numeric() then strips the "Age." prefix left over from the column names and converts the result to a dbl.

df2_long <-
  df2 %>%
  pivot_longer(Age.7:Age.9,
               names_to = "Age",
               values_to = "AgeStereotype") %>%
  mutate(Age = as.numeric(gsub("Age.", "", Age)))

dplyr::left_join() combines the datasets, keeping every row of df1 and only those teacher ratings that match one.

left_join(df1, df2_long)
#> Joining, by = c("Teacher", "Age")
#> # A tibble: 8 x 4
#>   Teacher Student   Age AgeStereotype
#>   <chr>     <dbl> <dbl>         <dbl>
#> 1 0123          1     7             1
#> 2 0145          1     7             5
#> 3 0163          1     7             4
#> 4 0175          2     8             8
#> 5 0123          2     8             1
#> 6 0194          2     8             8
#> 7 0123          3     7             1
#> 8 0145          3     7             5
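
As a variation on the steps above, here is a sketch that lets pivot_longer() strip the prefix and convert the type itself, and that names the join keys explicitly so the "Joining, by" message goes away. It assumes a tidyr version that supports the names_prefix and names_transform arguments. Teacher "0999" is made up, to show what happens to a row with no match.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Cut-down data; "0999" is a hypothetical teacher with no ratings in df2.
df1 <- tibble::tribble(
  ~Teacher, ~Student, ~Age,
  "0123",   1,        7,
  "0175",   2,        8,
  "0999",   3,        7
)
df2 <- tibble::tribble(
  ~Teacher, ~Age.7, ~Age.8, ~Age.9,
  "0123",   1,      1,      1,
  "0175",   6,      8,      1
)

df2_long <- pivot_longer(
  df2,
  cols = starts_with("Age."),
  names_to = "Age",
  names_prefix = "Age\\.",                    # regex: drop the literal "Age." prefix
  names_transform = list(Age = as.numeric),   # "7" -> 7 so it matches df1$Age
  values_to = "AgeStereotype"
)

# Explicit keys; rows of df1 without a match keep NA in AgeStereotype.
result <- left_join(df1, df2_long, by = c("Teacher", "Age"))
print(result)
```

Spelling out by = also guards against the join silently picking up an extra shared column if the data frames grow.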

Another base R way:

merge(
  df1,
  data.frame(Teacher = df2$Teacher,                              # recycled once per age column
             Age = gsub("[^0-9]", "", stack(df2[,-1])[,2]),      # keep only the digits of the column name
             AgeStereotype = stack(df2[,-1])[,1]
  )
)

Output:

  Teacher Age Student AgeStereotype
1     123   7       1             1
2     123   7       3             1
3     123   8       2             1
4     145   7       1             5
5     145   7       3             5
6     163   7       1             4
7     175   8       2             8
8     194   8       2             8

This does change the original row order, though. It is possible to restore it without additional packages, but if the order matters, the easiest option is probably to let dplyr do the join:

dplyr::left_join(
  df1,
  data.frame(Teacher = df2$Teacher,
             Age = as.integer(gsub("[^0-9]", "", stack(df2[,-1])[,2])),
             AgeStereotype = as.integer(stack(df2[,-1])[,1]),
             stringsAsFactors = FALSE
  )
)
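
If you would rather stay in base R and still keep the original row order, one possible approach is to tag each row with its position before merging and sort on that afterwards. This is only a sketch: the column name row_id is an arbitrary choice of mine, and the data are retyped from the question with Age.7-style column names.

```r
# Data from the question, retyped as plain data frames.
df1 <- data.frame(
  Teacher = c("0123", "0145", "0163", "0175", "0123", "0194", "0123", "0145"),
  Student = c(1, 1, 1, 2, 2, 2, 3, 3),
  Age     = c(7, 7, 7, 8, 8, 8, 7, 7),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Teacher = c("0123", "0145", "0163", "0175", "0183", "0194", "0120"),
  Age.7 = c(1, 5, 4, 6, 3, 2, 3),
  Age.8 = c(1, 7, 7, 8, 8, 8, 7),
  Age.9 = c(1, 3, 1, 1, 1, 1, 4),
  stringsAsFactors = FALSE
)

# Wide to long: stack() returns the ratings in `values` and the column names in `ind`.
long <- stack(df2[, -1])
df2_long <- data.frame(
  Teacher = rep(df2$Teacher, times = ncol(df2) - 1),
  Age = as.numeric(gsub("[^0-9]", "", as.character(long$ind))),
  AgeStereotype = long$values,
  stringsAsFactors = FALSE
)

# Remember the original position, merge, then sort back and drop the helper column.
df1$row_id <- seq_len(nrow(df1))
merged <- merge(df1, df2_long, by = c("Teacher", "Age"), all.x = TRUE)
merged <- merged[order(merged$row_id), c("Teacher", "Student", "Age", "AgeStereotype")]
rownames(merged) <- NULL
print(merged)
```

all.x = TRUE keeps df1 rows that have no matching rating, which makes merge() behave like the left join above.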
