
Conditional recode based on lookup vector

I need to conditionally recode my dataframe d according to a lookup vector.

dput(lookup)
structure(c("Apple", "Apple", "Banana", "Carrot"), .Names = c("101", "102", "102", "103"))
dput(d)
structure(list(pat = c(101, 101, 101, 102, 102, 103), gene = structure(1:6, .Label = c("a", 
"b", "c", "d", "e", "f"), class = "factor"), Apple = c(0.1, 0.2, 
0.3, 0.4, NA, NA), Banana = c(NA, NA, NA, NA, 0.55, NA), Carrot = c(NA, 
NA, NA, NA, NA, 0.6)), .Names = c("pat", "gene", "Apple", "Banana", 
"Carrot"), row.names = c(NA, -6L), class = "data.frame")

d is a wide dataframe that I got through reshape. I need to recode any NAs in the columns Apple, Banana and Carrot to 0 if pat matches that column according to the lookup table. In this case, d$Apple[5] and d$Banana[4] would be recoded to 0.

I've been toying with recode from dplyr, but I have no idea how to get it to look up and recode, not to mention that it has to be done on multiple columns. There was another related post on recoding variables in R with a lookup table, but I can't seem to apply it to my problem. Can anyone help me, please? Thank you!

Edit

I tried the following:

e <- melt(d, id.vars=c("pat", "gene"))
e %>% mutate(test=ifelse(lookup[as.character(pat)] == variable, replace(value, is.na(value), 0), value))

My code works partially. It succeeded in recoding the NA in d$Apple[5] but not in d$Banana[4], because indexing the lookup by name only returns the first matching value:

lookup["102"]
    102 
"Apple" 

whereas I need my lookup to output both "Apple" and "Banana", and to convert the NAs that fulfil each condition accordingly. Any ideas?

Sorry, no dplyr here, but the code is rather straightforward.

for(i in unique(lookup)){
    need_to_replace = is.na(d[[i]]) & (d$pat %in% names(lookup[lookup %in% i]))
    d[[i]][need_to_replace] = 0
}

d

   pat gene Apple Banana Carrot
1 101    a   0.1     NA     NA
2 101    b   0.2     NA     NA
3 101    c   0.3     NA     NA
4 102    d   0.4   0.00     NA
5 102    e   0.0   0.55     NA
6 103    f    NA     NA    0.6
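
To see why the loop above handles pat 102 while the name-based lookup in the question does not, it may help to write out a single iteration. This is only a sketch on the question's data; the intermediate names (first_only, pats_for_banana, is_missing, pat_matches) are made up for illustration.

```r
# Data from the question, rebuilt by hand
lookup <- c("101" = "Apple", "102" = "Apple", "102" = "Banana", "103" = "Carrot")
d <- data.frame(
  pat    = c(101, 101, 101, 102, 102, 103),
  gene   = factor(c("a", "b", "c", "d", "e", "f")),
  Apple  = c(0.1, 0.2, 0.3, 0.4, NA, NA),
  Banana = c(NA, NA, NA, NA, 0.55, NA),
  Carrot = c(NA, NA, NA, NA, NA, 0.6)
)

# Indexing by name stops at the first match (the problem in the question)
first_only <- lookup["102"]                           # only "Apple"

# Going from fruit to pat instead keeps every match
pats_for_apple  <- names(lookup)[lookup == "Apple"]   # "101" "102"
pats_for_banana <- names(lookup)[lookup == "Banana"]  # "102"

# One iteration of the loop, written out for the Banana column
is_missing      <- is.na(d$Banana)
pat_matches     <- d$pat %in% pats_for_banana  # %in% compares 102 with "102" as character
need_to_replace <- is_missing & pat_matches    # TRUE only for row 4
```

Because the loop runs over fruits rather than over pat values, duplicated names in lookup are not a problem.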

Maybe a bit patchy, but I've managed to put together a possible solution by looping over the rows:

for (i in seq_len(nrow(d))) {
  mtch <- lookup[which(d$pat[i] == names(lookup))]        # lookup matches for row i
  colnum <- which(colnames(d) %in% mtch)                  # column numbers matching the lookup values
  newval <- ifelse(is.na(d[i, colnum]), 0, d[i, colnum])  # replace NA with 0, keep other values
  d[i, colnum] <- unlist(newval)                          # write the values back
}

Output

  pat gene Apple Banana Carrot
1 101    a   0.1     NA     NA
2 101    b   0.2     NA     NA
3 101    c   0.3     NA     NA
4 102    d   0.4   0.00     NA
5 102    e   0.0   0.55     NA
6 103    f    NA     NA    0.6

Hope it helps
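
If the row-wise loop gets slow on a larger table, the same idea can probably be written without a loop by using matrix indexing. The following is a sketch of that swapped-in technique, not the method above; fruit_cols, hit, cells and na_cells are illustrative names, and it assumes every value in lookup is a numeric column of d.

```r
# Data from the question, rebuilt by hand
lookup <- c("101" = "Apple", "102" = "Apple", "102" = "Banana", "103" = "Carrot")
d <- data.frame(
  pat    = c(101, 101, 101, 102, 102, 103),
  gene   = factor(c("a", "b", "c", "d", "e", "f")),
  Apple  = c(0.1, 0.2, 0.3, 0.4, NA, NA),
  Banana = c(NA, NA, NA, NA, 0.55, NA),
  Carrot = c(NA, NA, NA, NA, NA, 0.6)
)

fruit_cols <- unique(unname(lookup))   # "Apple" "Banana" "Carrot"
m <- as.matrix(d[fruit_cols])          # numeric matrix of the fruit columns

# hit[i, j] is TRUE where row i of d has the same pat as lookup entry j
hit <- outer(as.character(d$pat), names(lookup), "==")
idx <- which(hit, arr.ind = TRUE)      # one (row, col) pair per TRUE cell

# (row, fruit column) pairs that the lookup declares relevant
cells <- cbind(idx[, "row"], match(lookup[idx[, "col"]], fruit_cols))

# Set only the relevant cells that are missing to 0, then write back
na_cells <- cells[is.na(m[cells]), , drop = FALSE]
m[na_cells] <- 0
d[fruit_cols] <- as.data.frame(m)
```

On the example data this touches exactly two cells, d$Apple[5] and d$Banana[4].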

I would work with the long format and use joins from dplyr.

I'd first go back to long format, like this:

library(tidyverse)
long_format <- d %>% 
  gather(fruit, value, -pat, -gene) 

Then I would create the lookup as a data_frame, so we can use joins.

lookup <- tribble(~pat, ~fruit,
                  101, "Apple",
                  102, "Apple",
                  102, "Banana",
                  103, "Carrot")

Using right_join keeps every pat/fruit combination that appears in the lookup and drops the combinations that do not. We then replace missing values with 0 and spread back to wide format, in case you need that.

long_format %>% 
  right_join(lookup) %>% 
  replace_na(replace = list(value = 0)) %>%
  spread(fruit, value)
#> Joining, by = c("pat", "fruit")
#>   pat gene Apple Banana Carrot
#> 1 101    a   0.1     NA     NA
#> 2 101    b   0.2     NA     NA
#> 3 101    c   0.3     NA     NA
#> 4 102    d   0.4   0.00     NA
#> 5 102    e   0.0   0.55     NA
#> 6 103    f    NA     NA    0.6
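
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough equivalent of the pipeline above with the newer verbs might look like the sketch below. It assumes dplyr and tidyr are installed, uses inner_join() instead of right_join() (the two give the same rows here because every lookup pair occurs in the data), and lookup_tbl and result are illustrative names.

```r
library(dplyr)
library(tidyr)

# Data from the question, rebuilt by hand
d <- data.frame(
  pat    = c(101, 101, 101, 102, 102, 103),
  gene   = factor(c("a", "b", "c", "d", "e", "f")),
  Apple  = c(0.1, 0.2, 0.3, 0.4, NA, NA),
  Banana = c(NA, NA, NA, NA, 0.55, NA),
  Carrot = c(NA, NA, NA, NA, NA, 0.6)
)

# Lookup as a two-column table so it can be joined on pat and fruit
lookup_tbl <- data.frame(
  pat   = c(101, 102, 102, 103),
  fruit = c("Apple", "Apple", "Banana", "Carrot")
)

result <- d %>%
  pivot_longer(c(Apple, Banana, Carrot), names_to = "fruit", values_to = "value") %>%
  inner_join(lookup_tbl, by = c("pat", "fruit")) %>%  # keep only pairs listed in the lookup
  mutate(value = replace_na(value, 0)) %>%            # NA -> 0 for those pairs only
  pivot_wider(names_from = fruit, values_from = value)
```

Cells that are not listed in the lookup come back as NA after pivot_wider(), so they are left alone as required.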
