
Is it possible to return the name of a data frame in a list when passed to lapply?

I have a data frame with factors that I want to split, and then apply a function to each piece (ultimately using purrr::map(), but simplified in the reprex with lapply()). When the data frame is split on multiple factors, some elements of the resulting list have <0 rows>. In those cases I would like to capture the name of the list element so that I can return it. The behavior can be replicated by filtering a data frame to remove every row with one of the factor levels. In the reproducible example below I want to capture "fizz" when its element has <0 rows>, and pass "fizz" to the message so that it reads data frame fizz has 0 rows.

# create data frame
A = c(rep("foo", 3), rep("bar", 5), rep("fizz", 1))
B = 1:9
C = LETTERS[11:19]
df <- data.frame(A = A, B = B, C = C)
df$A <- as.factor(df$A)

# show expected outcome on full data set 
mylist <- split(df, df$A)
names(mylist)
#> [1] "bar"  "fizz" "foo"

# desired outcome
myresult <- lapply(mylist, FUN = function(x) {
  value_to_save <- unique(x$A)
  cat(paste0("data frame ", value_to_save, " has ", nrow(x), " rows \n"))
}
)
#> data frame bar has 5 rows 
#> data frame fizz has 1 rows 
#> data frame foo has 3 rows

# now subset to remove row with factor level == fizz
df <- df[df$A != "fizz", ]

# new list still has 3 elements but one has <0 rows>
(mylist <- split(df, df$A))
#> $bar
#>     A B C
#> 4 bar 4 N
#> 5 bar 5 O
#> 6 bar 6 P
#> 7 bar 7 Q
#> 8 bar 8 R
#> 
#> $fizz
#> [1] A B C
#> <0 rows> (or 0-length row.names)
#> 
#> $foo
#>     A B C
#> 1 foo 1 K
#> 2 foo 2 L
#> 3 foo 3 M

# names still exist in the list
names(mylist)
#> [1] "bar"  "fizz" "foo"

# same function obviously doesn't return a vector with "fizz"
# as mylist$fizz has no values to pass to unique()
# In this example I want "data frame fizz has 0 rows"
myresult <- lapply(mylist, FUN = function(x) {
  value_to_save <- unique(x$A)
  cat(paste0("data frame ", value_to_save, " has ", nrow(x), " rows \n"))
  }
  )
#> data frame bar has 5 rows 
#> data frame  has 0 rows 
#> data frame foo has 3 rows


# is there a function that I can use that is similar to 
# `.id =` option in `bind_rows` that appends the list item name to the data
# such that "fizz" could still be captured when <0 rows>?

newlist <- split(df, list(df$A, df$C))
# when returning to a dataframe with dplyr
df2 <- dplyr::bind_rows(newlist, .id = "id")
levels(df2$A)
#> [1] "bar"  "fizz" "foo"
df2
#>      id   A B C
#> 1 foo.K foo 1 K
#> 2 foo.L foo 2 L
#> 3 foo.M foo 3 M
#> 4 bar.N bar 4 N
#> 5 bar.O bar 5 O
#> 6 bar.P bar 6 P
#> 7 bar.Q bar 7 Q
#> 8 bar.R bar 8 R

# still no "fizz" results using this method either despite:
names(newlist)
#>  [1] "bar.K"  "fizz.K" "foo.K"  "bar.L"  "fizz.L" "foo.L"  "bar.M"  "fizz.M"
#>  [9] "foo.M"  "bar.N"  "fizz.N" "foo.N"  "bar.O"  "fizz.O" "foo.O"  "bar.P" 
#> [17] "fizz.P" "foo.P"  "bar.Q"  "fizz.Q" "foo.Q"  "bar.R"  "fizz.R" "foo.R"

# message output now has many empty names
myresult <- lapply(newlist, FUN = function(x) {
  value_to_save <- unique(x$A)
  cat(paste0("data frame ", value_to_save, " has ", nrow(x), " rows \n"))
}
)
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame foo has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame foo has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame foo has 1 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows

Created on 2022-04-29 by the reprex package (v2.0.1)

Updated with additional example

The above reprex was a simplified example, in the hope that there was a simple function I was missing (like the suggestion of deparse(substitute(x))). The following example shows a more realistic application, where a function (e.g. lm) is applied to the data and gives the user a message indicating that there were no values for the factor level.

suppressMessages(library(dplyr))
library(purrr)
# create data frame
A = c(rep("foo", 3), rep("bar", 5), rep("fizz", 1))
B = 1:9
C = c("X", "X", "Y", rep("Y", 5), "Y")
df <- data.frame(A = A, B = B, C = C)
df %>% mutate(A = as.factor(A), C = as.factor(C))
#>      A B C
#> 1  foo 1 X
#> 2  foo 2 X
#> 3  foo 3 Y
#> 4  bar 4 Y
#> 5  bar 5 Y
#> 6  bar 6 Y
#> 7  bar 7 Y
#> 8  bar 8 Y
#> 9 fizz 9 Y
complicated_function <- function(x) {
  if (nrow(x) >= 1) {
    value_to_print <-
      ifelse(length(unique(x$A)) == 1, 
             as.character(paste(unique(x$A), unique(x$C))), 
             "with multiple factors")
    cat(paste0("data frame ", value_to_print, " has ", nrow(x), " rows \n"))
    x$D <- x$B * 2 # imagine as a complicated process
    return(x)
  } else {
    cat("No data for this factor \n")
    return(x)
  }
}

df_processed <- complicated_function(df)
#> data frame with multiple factors has 9 rows

df %>% 
  split(list(.$A, .$C)) %>% 
  map(complicated_function) %>% 
  bind_rows() -> newdf
#> No data for this factor 
#> No data for this factor 
#> data frame foo X has 2 rows 
#> data frame bar Y has 5 rows 
#> data frame fizz Y has 1 rows 
#> data frame foo Y has 1 rows

Created on 2022-05-01 by the reprex package (v2.0.1)

For factor combinations with <0 rows>, I would like the message to follow the same pattern as the others, so that the generic "No data for this factor" is replaced with "data frame fizz X has 0 rows".

Because the function is more complicated than the original cat() example, it is also possible (and possibly both necessary and faster) to provide a summary message at the end of the function that reports the factors that have <0 rows>.

One approach would be to use mapply() instead of lapply() to pass both the list and its names. This still does not derive the name from x, but it might work for your application:

myresult <- mapply(function(x, y) {cat(paste0("data frame ", y,
         " has ", nrow(x), " rows \n"))}, mylist, names(mylist))
# data frame bar has 5 rows 
# data frame fizz has 0 rows 
# data frame foo has 3 rows
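
A variation on the same idea, offered only as a sketch: iterate over names(mylist) and index back into the list, so the name is available even when the element has no rows. The helper name describe_df is hypothetical, and the data frame is a cut-down copy of the one in the question.

```r
# Rebuild the question's list: "fizz" remains a factor level but has no rows
df <- data.frame(A = factor(c(rep("foo", 3), rep("bar", 5), "fizz")),
                 B = 1:9)
df <- df[df$A != "fizz", ]
mylist <- split(df, df$A)

# Hypothetical helper: builds the message from the list name, not from x$A
describe_df <- function(nm, lst) {
  sprintf("data frame %s has %d rows", nm, nrow(lst[[nm]]))
}

# vapply() over the names returns one message per list element
msgs <- vapply(names(mylist), describe_df, character(1), lst = mylist)
cat(msgs, sep = "\n")
```

Because the messages are collected in a character vector rather than printed inside the function, they can also be filtered or logged afterwards.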

Finally, since lapply does not pass the name along with x, you could attach it as an attribute to each list element:

for (i in seq_along(mylist)) attr(mylist[[i]], "name") <- names(mylist)[i]
myresult <- lapply(mylist, FUN = function(x) {
   cat(paste0("data frame ", attr(x, "name"), " has ", nrow(x), " rows \n"))
   }
   )
# data frame bar has 5 rows 
# data frame fizz has 0 rows 
# data frame foo has 3 rows 
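
For the updated example in the question, the same name-passing idea might look like the sketch below. It uses base Map(); since the question mentions purrr, purrr::imap(newlist, report_and_process) should behave the same way, because imap() passes each element together with its name. The function report_and_process is a hypothetical stand-in for complicated_function, and turning "fizz.X" into "fizz X" assumes the factor levels themselves contain no dots.

```r
# Data from the updated example in the question
df <- data.frame(A = c(rep("foo", 3), rep("bar", 5), "fizz"),
                 B = 1:9,
                 C = c("X", "X", rep("Y", 7)))
newlist <- split(df, list(df$A, df$C))   # names look like "fizz.X"

# Hypothetical variant of complicated_function() that also receives the name
report_and_process <- function(x, name) {
  label <- sub(".", " ", name, fixed = TRUE)   # "fizz.X" -> "fizz X"
  cat(sprintf("data frame %s has %d rows\n", label, nrow(x)))
  if (nrow(x) >= 1) x$D <- x$B * 2             # placeholder for the real work
  x
}

# Map() walks the list and its names in parallel and keeps the list names
processed <- Map(report_and_process, newlist, names(newlist))
```

The result is still a named list, so it can be handed to dplyr::bind_rows() as in the question.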

I have exactly the same problem when trying to write a more advanced loop with the apply family. I have been using for loops instead: I keep the data frame names in a vector and use the current loop index i to look up the appropriate name. It is not an elegant solution, though, since people keep saying you don't need for loops.
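
A minimal sketch of that for-loop approach, assuming the list from the question; the variable names empty and nm are only illustrative. It also collects the names of the empty elements so that a single summary message can be issued at the end.

```r
# Rebuild the question's list: "fizz" remains a factor level but has no rows
df <- data.frame(A = factor(c(rep("foo", 3), rep("bar", 5), "fizz")),
                 B = 1:9)
df <- df[df$A != "fizz", ]
mylist <- split(df, df$A)

empty <- character(0)                 # names of elements with no rows
for (i in seq_along(mylist)) {
  nm <- names(mylist)[i]              # name for the current iteration
  if (nrow(mylist[[i]]) == 0) {
    empty <- c(empty, nm)             # remember it for the summary
  }
}

# One summary message at the end instead of one per empty element
if (length(empty) > 0) {
  message("No rows for: ", paste(empty, collapse = ", "))
}
```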
