Recode multiple values in a shared column across many data frames with a grouping variable

EDIT: I have slightly changed the reprex from the original question because it did not produce an example analogous to my real use case.

This is an extension of a previous question, recode/replace multiple values in a shared data column to a single value across data frames, whose answer worked well for a simpler application. I have tried, to no avail, to extend that solution to a slightly more complicated case. I have many different data frames, all of which share a few columns ('site' and 'grp' in the reprex below). Each data frame contains multiple errors in the 'grp' variable; some errors are shared across data frames and some are not. In the previous question this was solved with the tidyverse recode functions, by creating a named vector of key/value pairs and recoding with it:

keyval <- setNames(rep(good_values, lengths(bad_values)), unlist(bad_values))
out <- map(df_list, ~ .x %>% 
                  mutate(grp = recode_factor(grp, !!! keyval)))
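
As a minimal sketch of how such a key/value vector behaves (the toy values `good` and `bad` are hypothetical and not taken from the original data): the names of the vector are the bad values and its elements are the replacements, and `!!!` splices the pairs into `recode()` as named arguments.

```r
library(dplyr)

# hypothetical toy values, for illustration only
good <- c("a", "b")
bad  <- list(c("a1", "a."), c("b2", "b-"))

# names = bad values, elements = the good value each one maps to
keyval <- setNames(rep(good, lengths(bad)), unlist(bad))

# values without a matching key are left unchanged
recoded <- recode(c("a1", "b-", "a", "zz"), !!!keyval)
recoded
```

This only covers the case where one mapping applies to every row, which is the limitation described next.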

However, I want to do this when the key/value list depends on the value of the other shared variable, 'site'. For example, grp = a1 should be recoded to grp = a when site = s1, but to grp = f when site = s2. I have tried extending the above code using map() with a nested call to pmap() in the example below:

#example data frames
library(tidyverse)
df1 = data.frame(site = c(rep("s1",5), rep("s2",5), rep("s3",5)),grp = c("a1","a.","a.",rep("b",4),"b2","b-","bq",rep("a1",5)), measure = rnorm(15))

df2 = data.frame(site = c(rep("s1",10), rep("s2",16), rep("s3",5)), grp = c(rep("as", 3), "b2",rep("a",22),rep("a1",5)), measure2 = rnorm(31))

df3 = data.frame(site = c(rep("s1",3), rep("s2",6), rep("s3",5)),grp = c(rep("b-",3),rep("bq",2),"a", rep("a.", 3),rep("a1",5)), measure3 = 1:14)

site_list = c("s1","s2","s3")
bad_values = list(c("a1","a.","as", "b2", "b-", "bq"),
                  c("a1","a.","as","b", "b2", "b-", "bq"),
                  c("a1"))
good_values = list(c("a", "a1","a2","b","b1","b2"),
                   c("f","f1","f2","g","g","g1","g2"),
                   c("t"))
#put dfs into list to `map` over
df_list = list(df1, df2, df3)

#what I tried.
#nested pmap() within map()
dfs_mod = map(df_list, ~.x %>%
              pmap(list(site_list,bad_values,good_values),
                   ~mutate(.x, grp = ifelse(site == ..1,recode(grp, !!!setNames(as.list(..2),..3)),grp))))

This currently throws the error "Error: Don't know how to pluck from a language". Searching for this error, I have not been able to work out what it means or how to accomplish the task.

EDIT: I have also tried

keyval = map2(good_values, bad_values, ~setNames(as.list(..1),unlist(..2)))
#this creates 3 lists of key/val elements to recode grp on for each site

dfs_mod = map(df_list, function(x){
  map2(site_list, keyval, ~mutate(x, grp = ifelse(site == ..1, recode_factor(grp, !!!..2), grp)))
})

This doesn't throw an error, but it doesn't accomplish what I want either. It has two undesirable side effects: 1) it creates 3 lists of 3 data frames each, one data frame recoded for each of the key/value lists, and 2) it converts the factor 'grp' to an integer (which baffles me). It is becoming clear that I misunderstand what map*() is meant to do, and I am not wedded to using it, so any other way to accomplish this iteratively would be welcome.
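
The integer conversion in side effect 2 can be reproduced in isolation. The sketch below uses hypothetical toy vectors (`grp`, `site`) rather than the data above, and assumes 'grp' is stored as a factor: base `ifelse()` drops the attributes of its `yes`/`no` arguments, so a factor comes back as its underlying integer codes. Converting to character first and using `dplyr::if_else()` is one way to keep the labels.

```r
library(dplyr)

# hypothetical toy vectors, for illustration only
grp  <- factor(c("a1", "b", "a1"))
site <- c("s1", "s1", "s2")

# base ifelse() strips the factor attributes and returns integer codes
res_base <- ifelse(site == "s1", grp, grp)

# working on character vectors keeps the labels
grp_chr <- as.character(grp)
res_chr <- if_else(site == "s1", recode(grp_chr, a1 = "a"), grp_chr)
```

Here `res_base` is an integer vector of level codes, while `res_chr` stays a character vector in which only the "s1" rows were recoded.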

I imagine the expected output would be a list of the same length as df_list (3 in this case). Values of 'grp' that match 'bad_values' should be recoded to the corresponding 'good_values', depending on the position in the list and on the 'site' (e.g., bad_values[[1]][1] -> good_values[[1]][1], bad_values[[1]][2] -> good_values[[1]][2], etc. for site = site_list[[1]]). The first data frame in the 'dfs_mod' list should look something like:

dfs_mod[[1]]

   site grp     measure
1    s1   a  -1.2169476
2    s1  a1   1.0644877
3    s1  a1   0.2007733
4    s1   b   0.8613291
5    s1   b  -0.3682463
6    s2   g   1.2535321
7    s2   g   0.7622614
8    s2   g   1.4022664
9    s2  g1  -0.8234464
10   s2  g2  -1.0000354
11   s3   t  1.34320583
12   s3   t  1.33950010
13   s3   t -1.12670074
14   s3   t  1.59890652
15   s3   t  0.23932814

Thanks for any help.

#old reprex data from the original question, before the edit above
library(tidyverse)
#create example dfs
df1 = data.frame(site = c(rep("s1",5), rep("s2",5)),grp = c("a1","a.","a.",rep("b",4),"b2","b-","bq"), measure = rnorm(10))

df2 = data.frame(site = c(rep("s1",10), rep("s2",16)), grp = c(rep("as", 3), "b2",rep("a",22)), measure2 = rnorm(26))

df3 = data.frame(site = c(rep("s1",3), rep("s2",6)),grp = c(rep("b-",3),rep("bq",2),"a", rep("a.", 3)), measure3 = 1:9)

site_list = list("s1","s2")
bad_values = list(c("a1","a.","as", "b2", "b-", "bq"),
                   c("a1","a.","as","b", "b2", "b-", "bq"))
good_values = list(c("a", "a1","a2","b","b1","b2"),
                   c("f","f1","f2","g","g","g1","g2"))

I have found a way to accomplish this task that runs quickly enough for my use, with a pair of nested for loops and the answer to the previous (linked) question. It is so simple that I am embarrassed it took me so long.

library(tidyverse)
keys = map2(good_values, bad_values, ~setNames(as.list(..1),unlist(..2)))

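# how to accomplish it: loop over sites (outer) and data frames (inner);
# on each pass only the rows of site i are recoded, all other rows keep their current value
for (i in seq_along(site_list)) {
  for (df in seq_along(df_list)) {
    df_list[[df]] <- pluck(df_list, df) %>%
      mutate(grp = if_else(site == pluck(site_list, i), recode(grp, !!!pluck(keys, i)), grp))
  }
}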

df_list[[1]]
   site grp     measure
1    s1   a  0.60083152
2    s1  a1 -0.56181835
3    s1  a1  1.31789556
4    s1   b -2.06659322
5    s1   b  1.21575623
6    s2   g -1.05263188
7    s2   g  1.68731655
8    s2   g -0.59827489
9    s2  g1 -2.22322604
10   s2  g2  0.22577945
11   s3   t -0.08614122
12   s3   t  0.74511934
13   s3   t  1.29782596
14   s3   t -1.87684060
15   s3   t -0.90672568
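
A loop-free alternative, offered only as a sketch and not as the method used above, is to flatten the three lists into one lookup table and join it onto each data frame. The names `lookup`, `recode_by_site` and the temporary column `good` are hypothetical; the sketch assumes each (site, bad value) pair occurs only once and that no data frame already has a column called `good`.

```r
library(dplyr)

site_list   <- c("s1", "s2", "s3")
bad_values  <- list(c("a1", "a.", "as", "b2", "b-", "bq"),
                    c("a1", "a.", "as", "b", "b2", "b-", "bq"),
                    c("a1"))
good_values <- list(c("a", "a1", "a2", "b", "b1", "b2"),
                    c("f", "f1", "f2", "g", "g", "g1", "g2"),
                    c("t"))

# one row per (site, bad value) pair; 'good' is a hypothetical helper column
lookup <- data.frame(site = rep(site_list, lengths(bad_values)),
                     grp  = unlist(bad_values),
                     good = unlist(good_values),
                     stringsAsFactors = FALSE)

# hypothetical helper: replace 'grp' wherever the lookup has a match
recode_by_site <- function(df, lookup) {
  df %>%
    mutate(site = as.character(site), grp = as.character(grp)) %>%
    left_join(lookup, by = c("site", "grp")) %>%
    mutate(grp = coalesce(good, grp)) %>%  # unmatched rows keep their value
    select(-good)
}

df1 <- data.frame(site = c(rep("s1", 5), rep("s2", 5), rep("s3", 5)),
                  grp = c("a1", "a.", "a.", rep("b", 4), "b2", "b-", "bq", rep("a1", 5)),
                  measure = rnorm(15))

out <- lapply(list(df1), recode_by_site, lookup = lookup)
out[[1]]$grp
```

Because the join matches on 'site' and 'grp' together, each row is recoded once using only its own site's mapping, and the result is a list of the same length as the input. Converting both columns to character first sidesteps the factor handling discussed in the question.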
