简体   繁体   English

传递给 lapply 时是否可以返回列表中数据框的名称?

[英]Is it possible to return the name of a data frame in a list when passed to lapply?

I have a data frame with factors that I want to split and then apply a function to the data frame (ultimately using purrr::map() , by simplified in reprex with lapply() ).我有一个数据框,其中包含我想要拆分的因素,然后将 function 应用于数据框(最终使用purrr::map() ,通过在 reprex 中使用lapply()进行简化)。 When the data frame is split using multiple factors there are times when the data frame in the list is <0 rows>.当使用多个因素拆分数据框时,有时列表中的数据框为 <0 行>。 In such instances, I would like to store the name of the list item so that I can return it.在这种情况下,我想存储列表项的名称,以便我可以返回它。 The behavior can be replicated by filtering a data frame to remove values with one of the factor levels.可以通过过滤数据框以删除具有其中一个因子级别的值来复制该行为。 In the reproducible example below I want to capture "fizz" when <0 rows> ultimately to pass "fizz" to the message such that I can get a message that has data frame fizz has 0 rows .在下面的可重现示例中,我想在 <0 rows> 最终将“fizz”传递给消息时捕获“fizz”,这样我就可以得到一条消息,该消息的data frame fizz has 0 rows

# create data frame
A = c(rep("foo", 3), rep("bar", 5), rep("fizz", 1))
B = 1:9
C = LETTERS[11:19]
df <- data.frame(A = A, B = B, C = C)
df$A <- as.factor(df$A)

# show expected outcome on full data set 
mylist <- split(df, df$A)
names(mylist)
#> [1] "bar"  "fizz" "foo"

# desired outcome
myresult <- lapply(mylist, FUN = function(x) {
  value_to_save <- unique(x$A)
  cat(paste0("data frame ", value_to_save, " has ", nrow(x), " rows \n"))
}
)
#> data frame bar has 5 rows 
#> data frame fizz has 1 rows 
#> data frame foo has 3 rows

# now subset to remove row with factor level == fizz
df <- df[df$A != "fizz", ]

# new list still has 3 elements but one has <0 rows>
(mylist <- split(df, df$A))
#> $bar
#>     A B C
#> 4 bar 4 N
#> 5 bar 5 O
#> 6 bar 6 P
#> 7 bar 7 Q
#> 8 bar 8 R
#> 
#> $fizz
#> [1] A B C
#> <0 rows> (or 0-length row.names)
#> 
#> $foo
#>     A B C
#> 1 foo 1 K
#> 2 foo 2 L
#> 3 foo 3 M

# names still exist in the list
names(mylist)
#> [1] "bar"  "fizz" "foo"

# same function obviously doesn't return a vector with "fizz"
# as mylist$fizz has no values to pass to unique()
# In this example I want "data frame fizz has 0 rows"
myresult <- lapply(mylist, FUN = function(x) {
  value_to_save <- unique(x$A)
  cat(paste0("data frame ", value_to_save, " has ", nrow(x), " rows \n"))
  }
  )
#> data frame bar has 5 rows 
#> data frame  has 0 rows 
#> data frame foo has 3 rows


# is there a function that I can use that is similar to 
# `.id =` option in `bind_rows` that appends the list item name to the data
# such that "fizz" could still be captured when <0 rows>?

newlist <- split(df, list(df$A, df$C))
# when returning to a dataframe with dplyr
df2 <- dplyr::bind_rows(newlist, .id = "id")
levels(df2$A)
#> [1] "bar"  "fizz" "foo"
df2
#>      id   A B C
#> 1 foo.K foo 1 K
#> 2 foo.L foo 2 L
#> 3 foo.M foo 3 M
#> 4 bar.N bar 4 N
#> 5 bar.O bar 5 O
#> 6 bar.P bar 6 P
#> 7 bar.Q bar 7 Q
#> 8 bar.R bar 8 R

# still no "fizz" results using this method either despite:
names(newlist)
#>  [1] "bar.K"  "fizz.K" "foo.K"  "bar.L"  "fizz.L" "foo.L"  "bar.M"  "fizz.M"
#>  [9] "foo.M"  "bar.N"  "fizz.N" "foo.N"  "bar.O"  "fizz.O" "foo.O"  "bar.P" 
#> [17] "fizz.P" "foo.P"  "bar.Q"  "fizz.Q" "foo.Q"  "bar.R"  "fizz.R" "foo.R"

# message output now has many empty names
myresult <- lapply(newlist, FUN = function(x) {
  value_to_save <- unique(x$A)
  cat(paste0("data frame ", value_to_save, " has ", nrow(x), " rows \n"))
}
)
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame foo has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame foo has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame foo has 1 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows

Created on 2022-04-29 by the reprex package (v2.0.1)reprex package (v2.0.1) 创建于 2022-04-29

Updated with additional example更新了额外的例子

The above reprex was a simplified example in hopes that there was a simple function (like the suggestion for deparse(substitute(x)) that I was missing. The following example provides a more realistic application where there is a function (ie lm ) applied to the data and the function provides the user with a message indicating there were no values for the factor level.上面的 reprex 是一个简化的例子,希望有一个简单的 function(就像我遗漏的 deparse(substitute(x)) 的建议。下面的例子提供了一个更现实的应用程序,其中应用了 function(即lm )到数据和 function 向用户提供一条消息,指示因子水平没有值。

suppressMessages(library(dplyr))
library(purrr)
# create data frame
A = c(rep("foo", 3), rep("bar", 5), rep("fizz", 1))
B = 1:9
C = c("X", "X", "Y", rep("Y", 5), "Y")
df <- data.frame(A = A, B = B, C = C)
df %>% mutate(A = as.factor(A), C = as.factor(C))
#>      A B C
#> 1  foo 1 X
#> 2  foo 2 X
#> 3  foo 3 Y
#> 4  bar 4 Y
#> 5  bar 5 Y
#> 6  bar 6 Y
#> 7  bar 7 Y
#> 8  bar 8 Y
#> 9 fizz 9 Y
complicated_function <- function(x) {
  if (nrow(x) >= 1) {
    value_to_print <-
      ifelse(length(unique(x$A)) == 1, 
             as.character(paste(unique(x$A), unique(x$C))), 
             "with multiple factors")
    cat(paste0("data frame ", value_to_print, " has ", nrow(x), " rows \n"))
    x$D <- x$B * 2 # imagine as a complicated process
    return(x)
  } else {
    cat("No data for this factor \n")
    return(x)
  }
}

df_processed <- complicated_function(df)
#> data frame with multiple factors has 9 rows

df %>% 
  split(list(.$A, .$C)) %>% 
  map(complicated_function) %>% 
  bind_rows() -> newdf
#> No data for this factor 
#> No data for this factor 
#> data frame foo X has 2 rows 
#> data frame bar Y has 5 rows 
#> data frame fizz Y has 1 rows 
#> data frame foo Y has 1 rows

Created on 2022-05-01 by the reprex package (v2.0.1)reprex package (v2.0.1) 创建于 2022-05-01

I would like the output to replicate the output for factor combinations where <0 rows> such that rather than a generic "No data for this factor" is replaced with "data frame fizz X has 0 rows".我希望 output 为因子组合复制 output,其中<0 rows>这样,而不是通用的“此因子无数据”被替换为“数据框 fizz X 有 0 行”。

Because the function is more complicated than just the original cat() example it is possible (and possibly both necessary and faster) to provide a summary message at the end of the function to report factors that has <0 rows> .因为 function 比原始的cat()示例更复杂,所以可以(并且可能既必要又更快)在 function 的末尾提供摘要消息以报告具有<0 rows>的因素。

One approach would be to use mapply() instead of lapply to pass the names and the list.一种方法是使用mapply()而不是lapply来传递名称和列表。 Still not deriving the names from x, but it might work for your application:仍未从 x 派生名称,但它可能适用于您的应用程序:

myresult <- mapply(function(x, y) {cat(paste0("data frame ", y,
         " has ", nrow(x), " rows \n"))}, mylist, names(mylist))
# data frame bar has 5 rows 
# data frame fizz has 0 rows 
# data frame foo has 3 rows

Finally, since lapply does not pass the name along with x, you could attach it as an attribute to each list element:最后,由于lapply不会将名称与 x 一起传递,您可以将其作为属性附加到每个列表元素:

for (i in seq(mylist)) attr(mylist[[i]], "name") <- names(mylist)[i]
myresult <- lapply(mylist, FUN = function(x) {
   cat(paste0("data frame ", attr(x, "name"), " has ", nrow(x), " rows \n"))
   }
   )
# data frame bar has 5 rows 
# data frame fizz has 0 rows 
# data frame foo has 3 rows 

I have the same exact problem, trying to do a more advanced loop using apply.我有同样的问题,试图使用 apply 做一个更高级的循环。 I've been using for loops instead and pass dataframe names when needed to a list vector that has all the names.我一直在使用for循环,并在需要时将 dataframe 名称传递给包含所有名称的列表向量。 Use the current loop iteration i to get the appropriate file name.使用当前循环迭代i获取适当的文件名。 Though it is not an elegant solution as people have been saying you don't need for loops.尽管这不是一个优雅的解决方案,因为人们一直在说您不需要循环。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM