Assign value to different columns based on lookup values in R

I'm trying to assign values to different columns, separately for each row, based on lookup values. I'm working in R. Here's a minimal working example:

#Item scores
item1 <- c(NA, 1, NA, 4)
item2 <- c(NA, 2, NA, 3)
item3 <- c(NA, 3, NA, NA)
item57 <- c(NA, 4, 4, 1)

mydata <- data.frame(item1, item2, item3, item57)

#Lookup values based on item score
lookup <- data.frame(score = 1:4, value=c(6, 7, 8, 10))

I have many participants (i.e., rows), each assessed with a score on each of many items (i.e., columns). I'd like to create variables in my data frame for the values that are tied to the item scores (based on the lookup table). Here's my desired output:

#Desired output (adding value that is tied to item score to the original data)
desiredOutput <- cbind(mydata,
                   value1 = c(NA, 6, NA, 10),
                   value2 = c(NA, 7, NA, 8),
                   value3 = c(NA, 8, NA, NA),
                   value57 = c(NA, 10, 10, 6))

I have a fairly large dataset and would like to stay away from loops, if possible. Also, rows that are entirely NA can be skipped if that makes processing faster.

Here's a tidyverse method. The basis of it is that you first gather the score columns and left_join the lookup table, so that each score is matched to its value. The rest is just manipulation to get back to the desired output format. To do this, we create the column names we want with gather and unite, and then finally spread back out. Note that you need rowid_to_column at the beginning so that spread knows which observations to place on which rows. If you want to get exactly your output column names, you can mix in some stringr.

item1 <- c(NA, 1, NA, 4)
item2 <- c(NA, 2, NA, 3)
item3 <- c(NA, 3, NA, NA)
item57 <- c(NA, 4, 4, 1)

mydata <- data.frame(item1, item2, item3, item57)

#Lookup values based on item score
lookup <- data.frame(score = 1:4, value=c(6, 7, 8, 10))

library(tidyverse)
mydata %>%
  rowid_to_column(var = "participant") %>%
  gather(items, score, starts_with("item")) %>%
  left_join(lookup) %>%
  gather(coltype, val, score:value) %>%
  unite(colname, coltype, items) %>%
  spread(colname, val)
#> Joining, by = "score"
#>   participant score_item1 score_item2 score_item3 score_item57 value_item1
#> 1           1          NA          NA          NA           NA          NA
#> 2           2           1           2           3            4           6
#> 3           3          NA          NA          NA            4          NA
#> 4           4           4           3          NA            1          10
#>   value_item2 value_item3 value_item57
#> 1          NA          NA           NA
#> 2           7           8           10
#> 3          NA          NA           10
#> 4           8          NA            6
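
The column names above (score_item1, value_item1, ...) differ from the ones requested in the question. As a rough sketch of one way to get item1 ... value57 directly, the same reshape can be written with pivot_longer and pivot_wider, which assumes tidyr 1.0.0 or later. It uses the names_prefix argument in place of the stringr step mentioned above, and the intermediate column name num is only illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)

mydata <- data.frame(
  item1  = c(NA, 1, NA, 4),
  item2  = c(NA, 2, NA, 3),
  item3  = c(NA, 3, NA, NA),
  item57 = c(NA, 4, 4, 1)
)
lookup <- data.frame(score = 1:4, value = c(6, 7, 8, 10))

result <- mydata %>%
  # Keep track of which row each observation came from
  rowid_to_column(var = "participant") %>%
  # Long format: one row per participant/item; "num" holds 1, 2, 3, 57
  pivot_longer(starts_with("item"),
               names_to = "num", names_prefix = "item",
               values_to = "item") %>%
  # Attach the looked-up value to each score (NA scores stay NA)
  left_join(lookup, by = c("item" = "score")) %>%
  # Wide format again: item1..item57 followed by value1..value57
  pivot_wider(names_from = num,
              values_from = c(item, value),
              names_sep = "") %>%
  select(-participant)

result
```

The result is a tibble; wrap it in as.data.frame() if a plain data frame is needed.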

Created on 2018-06-19 by the reprex package (v0.2.0).
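
Since the question asks to avoid loops on a large dataset, here is a minimal base R sketch of an alternative, not part of the tidyverse answer above. It assumes every item column starts with "item" and relies on match() being vectorized over the whole score matrix. The names item_cols, scores and values are illustrative.

```r
mydata <- data.frame(
  item1  = c(NA, 1, NA, 4),
  item2  = c(NA, 2, NA, 3),
  item3  = c(NA, 3, NA, NA),
  item57 = c(NA, 4, 4, 1)
)
lookup <- data.frame(score = 1:4, value = c(6, 7, 8, 10))

# Assumption: all score columns share the "item" prefix
item_cols <- grep("^item", names(mydata), value = TRUE)

# Look up every score at once; match() returns NA for NA scores,
# and indexing with NA yields NA, so all-NA rows need no special handling
scores <- as.matrix(mydata[item_cols])
values <- lookup$value[match(scores, lookup$score)]

# match() drops the matrix shape, so restore it and name the new columns
dim(values) <- dim(scores)
colnames(values) <- sub("^item", "value", item_cols)

result <- cbind(mydata, values)
result
```

Because NA scores simply map to NA values, skipping all-NA rows is optional here; whether filtering them first is actually faster would need to be measured on the real data.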
