Match a value to a column name to create a new variable in R

I have a dataset that looks like this:

    students <- data.frame(name = c("student1", "student2", "student3", "student4"),
                   test1 = c(50, 30, 20, 6),
                   test2 = c(30, 20, 15, 10),
                   select = c("test2", "test1", "test2", "test1"))

Is there a way to create a new variable called 'grade' that will contain the test score of whatever variable appears in 'select'?

Sample output here:

    students <- data.frame(name = c("student1", "student2", "student3", "student4"),
                   test1 = c(50, 30, 20, 6),
                   test2 = c(30, 20, 15, 10),
                   select = c("test2", "test1", "test2", "test1"),
                   grade = c(30, 30, 15, 6))

Here is a base R solution:

students$value = with(students, ifelse(select == 'test1', test1, test2))

Or with case_when from dplyr:

library(dplyr)

students %>%
  mutate(value = case_when(
    select == 'test1' ~ test1, 
    TRUE ~ test2))

Both versions extend to n tests by adding one condition per additional test (a nested ifelse in base R, or one more branch in case_when).

Result:

      name test1 test2 select value
1 student1    50    30  test2    30
2 student2    30    20  test1    30
3 student3    20    15  test2    15
4 student4     6    10  test1     6
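With many tests, writing one condition per test gets tedious. A possible branch-free alternative, sketched here in base R, is two-column matrix indexing: pair each row number with the position of the column named in select and pull out one cell per row. The test3 column and its scores are hypothetical, added only to show a third test; test_cols, scores and idx are illustrative names.

```r
# Example data from the question, plus a hypothetical third test (test3)
students <- data.frame(name = c("student1", "student2", "student3", "student4"),
                       test1 = c(50, 30, 20, 6),
                       test2 = c(30, 20, 15, 10),
                       test3 = c(40, 25, 35, 5),
                       select = c("test2", "test1", "test3", "test1"),
                       stringsAsFactors = FALSE)

# Candidate score columns; any number of tests can be listed here
test_cols <- c("test1", "test2", "test3")

# Numeric matrix of scores, one column per test
scores <- as.matrix(students[test_cols])

# Pair each row number with the column position named in 'select',
# then extract one cell per row using a two-column index matrix
idx <- cbind(seq_len(nrow(students)), match(students$select, test_cols))
students$value <- scores[idx]

students$value
# [1] 30 30 35  6
```

A select entry that does not match any name in test_cols yields NA in match, and therefore NA in value, rather than an error.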

If you want to do this in base R and only have a small number of tests to select from, you can fill in the grade for each test separately:

students$Grade[students$select=="test1"] <- as.numeric(students$test1[students$select=="test1"])
students$Grade[students$select=="test2"] <- as.numeric(students$test2[students$select=="test2"])

Result:

      name test1 test2 select Grade
1 student1    50    30  test2    30
2 student2    30    20  test1    30
3 student3    20    15  test2    15
4 student4     6    10  test1     6
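The same idea can be written once and looped over the test names, which avoids repeating a line per test. This is only a sketch under the assumption that every value in select is the name of a numeric column; the loop variable test and the helper rows are illustrative names.

```r
# Example data from the question
students <- data.frame(name = c("student1", "student2", "student3", "student4"),
                       test1 = c(50, 30, 20, 6),
                       test2 = c(30, 20, 15, 10),
                       select = c("test2", "test1", "test2", "test1"),
                       stringsAsFactors = FALSE)

# Start with an all-NA numeric column so unmatched rows stay NA
students$Grade <- NA_real_

# For each test, copy its scores into Grade for the rows that selected it
for (test in c("test1", "test2")) {
  rows <- students$select == test
  students$Grade[rows] <- students[[test]][rows]
}

students$Grade
# [1] 30 30 15  6
```

Replacing the hard-coded vector with unique(students$select) would cover whichever tests actually appear in the data.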

Here is a simple base-R solution...

students$grade <- sapply(1:nrow(students),
                         function(i) students[i, as.character(students$select[i])])

students
      name test1 test2 select grade
1 student1    50    30  test2    30
2 student2    30    20  test1    30
3 student3    20    15  test2    15
4 student4     6    10  test1     6

Or, to answer the follow-up question in the comments: to cope with entries like "test1, test2", you could do

students$grade <- sapply(1:nrow(students),
                         function(i) paste(students[i,
                                                    trimws(unlist(strsplit(as.character(students$select[i]), ",")))],
                                           collapse = ", "))

This basically takes each row and splits select at commas, trims whitespace, and then pastes the resulting grade values together.

So, if students$select[1] is "test1, test2" in the above, this produces

students
      name test1 test2       select  grade
1 student1    50    30 test1, test2 50, 30
2 student2    30    20        test1     30
3 student3    20    15        test2     15
4 student4     6    10        test1      6

Note that the grade column is now coerced to character.
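If the numeric type matters, one possible alternative (not part of the original answer) is to store the selected scores in a list column instead of pasting them into a string. The sketch below assumes select is a character column and that every name in it is a numeric column; cols is an illustrative name.

```r
# Example data where the first student selects two tests
students <- data.frame(name = c("student1", "student2", "student3", "student4"),
                       test1 = c(50, 30, 20, 6),
                       test2 = c(30, 20, 15, 10),
                       select = c("test1, test2", "test1", "test2", "test1"),
                       stringsAsFactors = FALSE)

# Build one numeric vector per row and store the vectors as a list column
students$grade <- lapply(seq_len(nrow(students)), function(i) {
  # Split 'select' at commas and trim whitespace to get the column names
  cols <- trimws(strsplit(students$select[i], ",")[[1]])
  # drop = FALSE keeps a one-row data frame; unlist turns it into a plain vector
  unlist(students[i, cols, drop = FALSE], use.names = FALSE)
})

students$grade[[1]]
# [1] 50 30

# The values stay numeric, so they can still be summarised per student
sapply(students$grade, max)
# [1] 50 30 15  6
```

The trade-off is that list columns print less compactly and are not handled by every function that expects atomic columns.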
