R - How to convert this nested for loop into an lapply function that can mutate a list

I have data that looks like this

aList <- list(a1 = c("apple", "banana", "orange", "strawberry", "cherry"),
              a2 = c("banana", "cherry", "apple"),
              a3 = c("apple", "strawberry", "pineapple"),
              a4 = c("raspberry", "strawberry", "apple"),
              a5 = c("pineapple", "lemon", "orange", "banana", "apple"),
              a6 = c("lemon", "apple", "blueberry"),
              a7 = c("watermelon", "apple", "banana", "mango"),
              a8 = c("mango", "cherry", "apple", "lemon"),
              a9 = c("orange", "banana", "strawberry"),
              a10 = c("mango", "strawberry"))

I'd like to get it into a vertical format, like what happens when you run this code:

vertical_data <- list()
for (x in names(aList)) {
  for (y in aList[[x]]) {
    if (is.null(vertical_data[[y]])) {
      vertical_data[[y]] <- x
    } else {
      vertical_data[[y]] <- c(x, vertical_data[[y]])
    }
  }
}
vertical_data

I'd like each entry to tell me where the particular fruit occurs.

This was easy enough to do with a double for loop. But when I do the same thing with nested lapply calls, it doesn't seem to modify the list (i.e. vertical_data) at all. Why is that? I'd like to use an apply function because I expect it to be faster: my actual dataset will have thousands of items and "fruits", and the for loops take far too long.

I'd really appreciate the help.

Thanks

We can use split on the unlisted data:

split(rep(names(aList), lengths(aList)), unlist(aList))
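
To see why this works, it may help to look at the two intermediate vectors on a smaller input. The following is only a sketch; the list `small` and the variable names `keys`, `fruits`, and `result` are hypothetical and not part of the original data.

```r
# Hypothetical toy input, smaller than aList, for illustration only
small <- list(a1 = c("apple", "banana"),
              a2 = "banana",
              a3 = c("cherry", "apple"))

# One list name per fruit, repeated according to each element's length
keys <- rep(names(small), lengths(small))

# All fruits flattened into one vector, in the same order as keys
fruits <- unlist(small, use.names = FALSE)

# Group the names by fruit; split() orders the groups by sorted fruit name
result <- split(keys, fruits)
result
```

Because `keys` and `fruits` are aligned position by position, `split` only has to group one by the other, with no explicit loop.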

Another option is to stack the list into a two-column data.frame and then split:

with(stack(aList), split(as.character(ind), values))
#$apple
#[1] "a1" "a2" "a3" "a4" "a5" "a6" "a7" "a8"

#$banana
#[1] "a1" "a2" "a5" "a7" "a9"

#$blueberry
#[1] "a6"

#$cherry
#[1] "a1" "a2" "a8"

#$lemon
#[1] "a5" "a6" "a8"

#$mango
#[1] "a7"  "a8"  "a10"

#$orange
#[1] "a1" "a5" "a9"

#$pineapple
#[1] "a3" "a5"

#$raspberry
#[1] "a4"

#$strawberry
#[1] "a1"  "a3"  "a4"  "a9"  "a10"

#$watermelon
#[1] "a7"

Or, as @rawr mentioned:

unstack(stack(aList)[2:1])
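
One way to check that these one-liners agree with the nested for loop from the question is to compare the results after sorting, since the loop prepends each name and therefore builds every vector in reverse order. This is a sketch on a hypothetical toy list `small`; the helper `normalise` is also an illustrative name, not something from the original post.

```r
# Hypothetical toy input for illustration only
small <- list(a1 = c("apple", "banana"),
              a2 = "banana",
              a3 = c("cherry", "apple"))

# Reference result from the nested for loop in the question
loop_result <- list()
for (x in names(small)) {
  for (y in small[[x]]) {
    loop_result[[y]] <- c(x, loop_result[[y]])  # c(x, NULL) is just x
  }
}

split_result   <- split(rep(names(small), lengths(small)), unlist(small))
stack_result   <- with(stack(small), split(as.character(ind), values))
unstack_result <- unstack(stack(small)[2:1])

# Put list elements and their contents in a canonical order before comparing
normalise <- function(l) lapply(l[order(names(l))], sort)

same_split   <- identical(normalise(loop_result), normalise(split_result))
same_stack   <- identical(normalise(loop_result), normalise(stack_result))
same_unstack <- identical(normalise(loop_result), normalise(unstack_result))
c(same_split, same_stack, same_unstack)
```

Note that `unstack` returns a data.frame rather than a list when every group happens to have the same length, so the `split` versions are the safer choice if a list is always required.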

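Regarding the assignment within lapply versus the for loop: the difference comes down to environments. A for loop runs in the calling environment, so the assignment modifies vertical_data in the global environment. The function passed to lapply is evaluated in its own environment, so a plain <- only creates a local copy that is discarded when the function returns. To modify the outer object you would have to use <<- (not advisable) or explicitly assign into the global environment: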

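vertical_data <- list()
# `<<-` assigns into the enclosing (here global) environment; not advisable in general
invisible(lapply(names(aList), function(x) {
  lapply(aList[[x]], function(y) {
    if (is.null(vertical_data[[y]])) {
      vertical_data[[y]] <<- x
    } else {
      vertical_data[[y]] <<- c(x, vertical_data[[y]])
    }
  })
}))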
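
The scoping difference can be seen in isolation with two counters. This is a minimal sketch; `counter_local` and `counter_super` are hypothetical names used only for this illustration.

```r
counter_local <- 0
counter_super <- 0

invisible(lapply(1:3, function(i) {
  # Plain `<-` creates a new variable local to this call; it is discarded on return
  counter_local <- counter_local + i
  # `<<-` finds the existing variable in the enclosing environment and modifies it
  counter_super <<- counter_super + i
}))

counter_local  # unchanged: 0
counter_super  # 1 + 2 + 3 = 6
```

In most cases it is cleaner to have the function return a value and use the list that lapply gives back, rather than modifying an outer object from inside it.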

We can use enframe to convert the named list to a data frame, and then split name based on value.

library(magrittr)  # provides %>%
tibble::enframe(aList) %>% tidyr::unnest(value) %>% {split(.$name, .$value)}

#$apple
#[1] "a1" "a2" "a3" "a4" "a5" "a6" "a7" "a8"

#$banana
#[1] "a1" "a2" "a5" "a7" "a9"

#$blueberry
#[1] "a6"

#$cherry
#[1] "a1" "a2" "a8"

#$lemon
#[1] "a5" "a6" "a8"

#$mango
#[1] "a7"  "a8"  "a10"

#$orange
#[1] "a1" "a5" "a9"

#$pineapple
#[1] "a3" "a5"

#$raspberry
#[1] "a4"

#$strawberry
#[1] "a1"  "a3"  "a4"  "a9"  "a10"

#$watermelon
#[1] "a7"
