
Conditional recode based on lookup vector

I need to conditionally recode my dataframe d according to a lookup vector.

dput(lookup)
structure(c("Apple", "Apple", "Banana", "Carrot"), .Names = c("101", "102", "102", "103"))
dput(d)
structure(list(pat = c(101, 101, 101, 102, 102, 103), gene = structure(1:6, .Label = c("a", 
"b", "c", "d", "e", "f"), class = "factor"), Apple = c(0.1, 0.2, 
0.3, 0.4, NA, NA), Banana = c(NA, NA, NA, NA, 0.55, NA), Carrot = c(NA, 
NA, NA, NA, NA, 0.6)), .Names = c("pat", "gene", "Apple", "Banana", 
"Carrot"), row.names = c(NA, -6L), class = "data.frame")

d is a wide dataframe that I got through reshape. I need to recode any NAs within each of the columns Apple, Banana and Carrot to 0 if pat matches that column according to the lookup table. In this case, d$Apple[5] and d$Banana[4] would be recoded to 0.

I've been toying with recode from dplyr, but I have no idea how to get it to look up and recode, not to mention that it has to be done on multiple columns. There was another related post on recoding variables in R with a lookup table, but I can't seem to apply it to my problem. Can anyone help me, please? Thank you!

Edit

I tried the following:

library(reshape2)  # melt()
library(dplyr)     # %>% and mutate()
e <- melt(d, id.vars=c("pat", "gene"))
e %>% mutate(test=ifelse(lookup[as.character(pat)] == variable, replace(value, is.na(value), 0), value))

My code works partially. It succeeded in recoding the NA in d$Apple[5] but not in d$Banana[4] because the lookup can only give the first value:

lookup["102"]
    102 
"Apple" 

whereas I need my lookup to be able to output both "Apple" and "Banana" and be able to convert NAs fulfilling each condition accordingly. Any ideas?
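For what it's worth, the first-match behaviour comes from indexing by name; comparing against names(lookup) instead returns every entry. A minimal sketch, rebuilding the lookup vector from the dput above (all_102 is just an illustrative name):

```r
# Same named vector as in the dput(lookup) output, with "102" appearing twice
lookup <- c("101" = "Apple", "102" = "Apple", "102" = "Banana", "103" = "Carrot")

# Indexing by name stops at the first element called "102"
first_102 <- lookup["102"]

# A logical comparison on the names keeps every element called "102"
all_102 <- lookup[names(lookup) == "102"]
unname(all_102)  # both "Apple" and "Banana"
```

This only shows how to retrieve both values; it does not by itself do the recoding.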

Sorry, no dplyr here, but the code is rather straightforward.

for(i in unique(lookup)){
    need_to_replace = is.na(d[[i]]) & (d$pat %in% names(lookup[lookup %in% i]))
    d[[i]][need_to_replace] = 0
}

d

   pat gene Apple Banana Carrot
1 101    a   0.1     NA     NA
2 101    b   0.2     NA     NA
3 101    c   0.3     NA     NA
4 102    d   0.4   0.00     NA
5 102    e   0.0   0.55     NA
6 103    f    NA     NA    0.6

Maybe a bit patchy, but I've managed to create a possible solution by looping over the rows:

for (i in seq_len(nrow(d))) {
  mtch <- lookup[which(d$pat[i] == names(lookup))]       # lookup matches for row i
  colnum <- which(colnames(d) %in% mtch)                 # column numbers matching the lookup values
  newval <- ifelse(is.na(d[i, colnum]), 0, d[i, colnum]) # replace NA with 0 in those columns
  d[i, colnum] <- unlist(newval)                         # write the values back
}

Output

  pat gene Apple Banana Carrot
1 101    a   0.1     NA     NA
2 101    b   0.2     NA     NA
3 101    c   0.3     NA     NA
4 102    d   0.4   0.00     NA
5 102    e   0.0   0.55     NA
6 103    f    NA     NA    0.6

Hope it helps
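The same idea can probably be written without an explicit loop by building a logical mask of "allowed" cells and combining it with is.na(). This is only a sketch in base R; d and lookup are rebuilt from the question's dput output, and cols, valid_pairs, allowed and fill are names made up for illustration.

```r
d <- data.frame(
  pat    = c(101, 101, 101, 102, 102, 103),
  gene   = factor(c("a", "b", "c", "d", "e", "f")),
  Apple  = c(0.1, 0.2, 0.3, 0.4, NA, NA),
  Banana = c(NA, NA, NA, NA, 0.55, NA),
  Carrot = c(NA, NA, NA, NA, NA, 0.6)
)
lookup <- c("101" = "Apple", "102" = "Apple", "102" = "Banana", "103" = "Carrot")

cols <- unique(lookup)                       # columns that may be recoded
valid_pairs <- paste(names(lookup), lookup)  # e.g. "102 Banana"

# allowed[i, j] is TRUE when row i's pat is mapped to column cols[j] in the lookup
allowed <- outer(
  as.character(d$pat), cols,
  function(p, f) paste(p, f) %in% valid_pairs
)

# Recode only the NAs that sit in an allowed cell
fill <- is.na(d[cols]) & allowed
d[cols][fill] <- 0
```

With the example data this should set d$Apple[5] and d$Banana[4] to 0 and leave the other NAs alone.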

I would work with the long format and use joins from dplyr.

I'd first get back to long format like the following:

library(tidyverse)
long_format <- d %>% 
  gather(fruit, value, -pat, -gene) 

Then I would create the lookup as a tibble (here with tribble), so we can use joins.

lookup <- tribble(~pat, ~fruit,
                  101, "Apple",
                  102, "Apple",
                  102, "Banana",
                  103, "Carrot")

Using right_join means we keep every pat/fruit combination listed in the lookup and drop the rest. We then replace missing values with 0 and spread back to wide format, in case you need that; the dropped combinations reappear as NA.

long_format %>% 
  right_join(lookup) %>% 
  replace_na(replace = list(value = 0)) %>%
  spread(fruit, value)
#> Joining, by = c("pat", "fruit")
#>   pat gene Apple Banana Carrot
#> 1 101    a   0.1     NA     NA
#> 2 101    b   0.2     NA     NA
#> 3 101    c   0.3     NA     NA
#> 4 102    d   0.4   0.00     NA
#> 5 102    e   0.0   0.55     NA
#> 6 103    f    NA     NA    0.6
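As a side note, gather() and spread() have since been superseded in tidyr by pivot_longer() and pivot_wider(). Below is a hedged sketch of the same long-format idea with those functions. It swaps the right_join for a left_join plus a flag column, on the assumption that you want to keep rows whose pat does not appear in the lookup at all (the right_join would drop them). lookup_tbl, in_lookup and result are illustrative names, not anything from the original post.

```r
library(dplyr)
library(tidyr)

d <- data.frame(
  pat    = c(101, 101, 101, 102, 102, 103),
  gene   = factor(c("a", "b", "c", "d", "e", "f")),
  Apple  = c(0.1, 0.2, 0.3, 0.4, NA, NA),
  Banana = c(NA, NA, NA, NA, 0.55, NA),
  Carrot = c(NA, NA, NA, NA, NA, 0.6)
)

# Lookup as a two-column table, flagged so matches can be detected after the join
lookup_tbl <- tribble(
  ~pat, ~fruit,
  101, "Apple",
  102, "Apple",
  102, "Banana",
  103, "Carrot"
) %>%
  mutate(in_lookup = TRUE)

result <- d %>%
  pivot_longer(c(Apple, Banana, Carrot), names_to = "fruit", values_to = "value") %>%
  left_join(lookup_tbl, by = c("pat", "fruit")) %>%
  # Recode an NA to 0 only where the pat/fruit pair exists in the lookup
  mutate(value = if_else(is.na(value) & !is.na(in_lookup), 0, value)) %>%
  select(-in_lookup) %>%
  pivot_wider(names_from = fruit, values_from = value)
```

On the example data this should give the same table as above, returned as a tibble.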
