How to create a new column in R which matches multiple values from two different data frames
I have 2 data frames with thousands of variables.
One has students of different ages and the different teachers that evaluated them. All teachers evaluated multiple different students but not every student.
Teacher Student Age
0123 1 7
0145 1 7
0163 1 7
0175 2 8
0123 2 8
0194 2 8
0123 3 7
0145 3 7
Then I have the teachers' ratings for specific stereotypes regarding the different ages. Each teacher made one rating for each age group stereotype. The data frame looks like this.
Teacher Age 7 Age 8 Age 9
0123 1 1 1
0145 5 7 3
0163 4 7 1
0175 6 8 1
0183 3 8 1
0194 2 8 1
0120 3 7 4
I want to create a new column in the first data frame where the teachers in each row are matched, and the values are their stereotype response depending on the age of each student. For example, in this new column, the value in the first row would be teacher 123's stereotype response for 7 year olds. In this case that is a 1.
Thank you so much for your help. I'm new to R and I have no idea where to start with this.
Edit: I would like the output to look like this:
Teacher Student Age AgeStereotype
0123 1 7 1
0145 1 7 5
0163 1 7 4
0175 2 8 8
0123 2 8 1
0194 2 8 8
0123 3 7 1
0145 3 7 5
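# For each row of DF1, pick the DF2 row of that teacher and the
# DF2 column whose name contains the student's age.
AS <- apply(DF1[, c("Teacher", "Age")], 1, function(x) {
  DF2[which(DF2$Teacher == x[1]), which(grepl(x[2], names(DF2)))]
})
DF1["AgeStereotype"] <- AS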
with DF1 and DF2 being your first and second data frames, respectively.
Output:
Teacher Student Age AgeStereotype
1 123 1 7 1
2 145 1 7 5
3 163 1 7 4
4 175 2 8 8
5 123 2 8 1
6 194 2 8 8
7 123 3 7 1
8 145 3 7 5
You could use [ with a two-column matrix index, i.e.:
transform(df1,AgeStereotype = `rownames<-`(df2,df2$Teacher)[cbind(Teacher,paste("Age",Age))])
Teacher Student Age AgeStereotype
1 123 1 7 1
2 145 1 7 5
3 163 1 7 4
4 175 2 8 8
5 123 2 8 1
6 194 2 8 8
7 123 3 7 1
8 145 3 7 5
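The one-liner above relies on matrix indexing: when a matrix with row and column names is indexed by a two-column character matrix, each row of the index is read as a (row name, column name) pair. Here is a minimal sketch that spells out the steps. It assumes the second data frame keeps column names of the form "Age 7" (for example, read with check.names = FALSE); the names ratings and idx are only illustrative.

```r
# Recreate the example data; check.names = FALSE keeps "Age 7" style names
df1 <- data.frame(
  Teacher = c(123, 145, 163, 175, 123, 194, 123, 145),
  Student = c(1, 1, 1, 2, 2, 2, 3, 3),
  Age     = c(7, 7, 7, 8, 8, 8, 7, 7)
)
df2 <- data.frame(
  Teacher = c(123, 145, 163, 175, 183, 194, 120),
  `Age 7` = c(1, 5, 4, 6, 3, 2, 3),
  `Age 8` = c(1, 7, 7, 8, 8, 8, 7),
  `Age 9` = c(1, 3, 1, 1, 1, 1, 4),
  check.names = FALSE
)

# Ratings as a matrix: teacher IDs as row names, age groups as column names
ratings <- as.matrix(df2[, -1])
rownames(ratings) <- df2$Teacher

# One (row name, column name) pair per row of df1
idx <- cbind(as.character(df1$Teacher), paste("Age", df1$Age))

# Matrix indexing returns one rating per pair
df1$AgeStereotype <- ratings[idx]
df1
```

Note that this kind of lookup expects every teacher and age in df1 to be present in df2; a pair that is not found would most likely raise an error rather than return NA.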
This task is best solved by transforming your second data frame to a long data frame and then joining it to your first data frame. Many ways exist to accomplish this in R; here is a clean way to do it within the tidyverse, specifically with dplyr and tidyr functions.
# Recreating your data
df1 <- tibble::tribble(
~Teacher, ~Student, ~Age,
"0123", 1, 7,
"0145", 1, 7,
"0163", 1, 7,
"0175", 2, 8,
"0123", 2, 8,
"0194", 2, 8,
"0123", 3, 7,
"0145", 3, 7
)
df2 <- tibble::tribble(
~Teacher, ~Age.7, ~Age.8, ~Age.9,
"0123", 1, 1, 1,
"0145", 5, 7, 3,
"0163", 4, 7, 1,
"0175", 6, 8, 1,
"0183", 3, 8, 1,
"0194", 2, 8, 1,
"0120", 3, 7, 4
)
# Load necessary libs
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
tidyr::pivot_longer() transforms df2 to a long format, and dplyr::mutate() with gsub() and as.numeric() is used to strip the "Age." prefix from the variable names and convert the result to a dbl.
df2_long <-
df2 %>%
pivot_longer(Age.7:Age.9,
names_to = "Age",
values_to = "AgeStereotype") %>%
mutate(Age = as.numeric(gsub("Age.", "", Age)))
dplyr::left_join() combines the datasets, keeping only those teachers that have a row in df1.
left_join(df1, df2_long)
#> Joining, by = c("Teacher", "Age")
#> # A tibble: 8 x 4
#> Teacher Student Age AgeStereotype
#> <chr> <dbl> <dbl> <dbl>
#> 1 0123 1 7 1
#> 2 0145 1 7 5
#> 3 0163 1 7 4
#> 4 0175 2 8 8
#> 5 0123 2 8 1
#> 6 0194 2 8 8
#> 7 0123 3 7 1
#> 8 0145 3 7 5
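As a possible variation on the steps above, pivot_longer() can strip the prefix and convert the type itself through its names_prefix and names_transform arguments, which makes the separate mutate() step unnecessary. The sketch below assumes a reasonably recent tidyr (1.1.0 or later) and column names of the form Age.7; it also passes by = explicitly so the join keys are visible.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Recreate the example data with plain data frames
df1 <- data.frame(
  Teacher = c("0123", "0145", "0163", "0175", "0123", "0194", "0123", "0145"),
  Student = c(1, 1, 1, 2, 2, 2, 3, 3),
  Age     = c(7, 7, 7, 8, 8, 8, 7, 7)
)
df2 <- data.frame(
  Teacher = c("0123", "0145", "0163", "0175", "0183", "0194", "0120"),
  Age.7   = c(1, 5, 4, 6, 3, 2, 3),
  Age.8   = c(1, 7, 7, 8, 8, 8, 7),
  Age.9   = c(1, 3, 1, 1, 1, 1, 4)
)

# Reshape to long format, dropping the "Age." prefix and converting to numeric
df2_long <- df2 %>%
  pivot_longer(
    cols            = starts_with("Age."),
    names_to        = "Age",
    names_prefix    = "Age\\.",
    names_transform = list(Age = as.numeric),
    values_to       = "AgeStereotype"
  )

# Keep every row of df1 and attach the matching rating
result <- left_join(df1, df2_long, by = c("Teacher", "Age"))
result
```

Because this is a left join, a teacher/age combination in df1 with no rating in df2 would end up with NA in AgeStereotype instead of being dropped.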
Another base way:
merge(
  df1,
  data.frame(Teacher = df2$Teacher,
             # keep only the digits of the column names, e.g. "Age.7" -> "7"
             Age = gsub("[^0-9]", "", stack(df2[,-1])[,2]),
             AgeStereotype = stack(df2[,-1])[,1]
  )
)
Output:
Teacher Age Student AgeStereotype
1 123 7 1 1
2 123 7 3 1
3 123 8 2 1
4 145 7 1 5
5 145 7 3 5
6 163 7 1 4
7 175 8 2 8
8 194 8 2 8
This does change the original order though. It is possible to fix this without additional packages, but if the order matters, perhaps the easiest option is to just let dplyr do the join:
dplyr::left_join(
  df1,
  data.frame(Teacher = df2$Teacher,
             # keep only the digits of the column names, e.g. "Age.7" -> 7
             Age = as.integer(gsub("[^0-9]", "", stack(df2[,-1])[,2])),
             AgeStereotype = as.integer(stack(df2[,-1])[,1]), stringsAsFactors = FALSE
  )
)
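For completeness, here is one way the original row order could be kept with base R only, as hinted at above. This is a sketch under stated assumptions, not the answerer's code: it tags each row of df1 with a helper column (row_id, a name chosen here for illustration), merges, and then sorts by that column again. The column names Age.7 to Age.9 are also assumed.

```r
# Recreate the example data
df1 <- data.frame(
  Teacher = c("0123", "0145", "0163", "0175", "0123", "0194", "0123", "0145"),
  Student = c(1, 1, 1, 2, 2, 2, 3, 3),
  Age     = c(7, 7, 7, 8, 8, 8, 7, 7)
)
df2 <- data.frame(
  Teacher = c("0123", "0145", "0163", "0175", "0183", "0194", "0120"),
  Age.7   = c(1, 5, 4, 6, 3, 2, 3),
  Age.8   = c(1, 7, 7, 8, 8, 8, 7),
  Age.9   = c(1, 3, 1, 1, 1, 1, 4)
)

# Long version of df2: stack() puts the rating columns on top of each other
long <- stack(df2[, -1])  # columns: values (ratings), ind (source column name)
df2_long <- data.frame(
  Teacher       = rep(df2$Teacher, times = ncol(df2) - 1),
  Age           = as.integer(gsub("[^0-9]", "", as.character(long$ind))),
  AgeStereotype = long$values
)

# Remember the original position of each row before merging
df1$row_id <- seq_len(nrow(df1))

# all.x = TRUE keeps rows of df1 that have no match in df2_long
merged <- merge(df1, df2_long, by = c("Teacher", "Age"), all.x = TRUE)

# Restore the original order and drop the helper column
merged <- merged[order(merged$row_id),
                 c("Teacher", "Student", "Age", "AgeStereotype")]
rownames(merged) <- NULL
merged
```

The helper column costs one extra line before and after the merge, which may be preferable when adding a package dependency is not an option.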