
Is it possible to return the name of a data frame in a list when passed to lapply?

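I have a data frame containing factors that I want to split by, and then apply a function to the resulting data frames (eventually using purrr::map(), simplified to lapply() in the reprex). When the data frame is split by multiple factors, some data frames in the list have <0 rows>. In that case I want to store the name of the list item so that I can return it. The behaviour can be reproduced by filtering the data frame to remove the rows with one of the factor levels. In the reproducible example below, I would like to capture "fizz" when the element has <0 rows> and pass it to a message, so that I get something along the lines of: data frame fizz has 0 rows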

# create data frame
A = c(rep("foo", 3), rep("bar", 5), rep("fizz", 1))
B = 1:9
C = LETTERS[11:19]
df <- data.frame(A = A, B = B, C = C)
df$A <- as.factor(df$A)

# show expected outcome on full data set 
mylist <- split(df, df$A)
names(mylist)
#> [1] "bar"  "fizz" "foo"

# desired outcome
myresult <- lapply(mylist, FUN = function(x) {
  value_to_save <- unique(x$A)
  cat(paste0("data frame ", value_to_save, " has ", nrow(x), " rows \n"))
}
)
#> data frame bar has 5 rows 
#> data frame fizz has 1 rows 
#> data frame foo has 3 rows

# now subset to remove row with factor level == fizz
df <- df[df$A != "fizz", ]

# new list still has 3 elements but one has <0 rows>
(mylist <- split(df, df$A))
#> $bar
#>     A B C
#> 4 bar 4 N
#> 5 bar 5 O
#> 6 bar 6 P
#> 7 bar 7 Q
#> 8 bar 8 R
#> 
#> $fizz
#> [1] A B C
#> <0 rows> (or 0-length row.names)
#> 
#> $foo
#>     A B C
#> 1 foo 1 K
#> 2 foo 2 L
#> 3 foo 3 M

# names still exist in the list
names(mylist)
#> [1] "bar"  "fizz" "foo"

# same function obviously doesn't return a vector with "fizz"
# as mylist$fizz has no values to pass to unique()
# In this example I want "data frame fizz has 0 rows"
myresult <- lapply(mylist, FUN = function(x) {
  value_to_save <- unique(x$A)
  cat(paste0("data frame ", value_to_save, " has ", nrow(x), " rows \n"))
  }
  )
#> data frame bar has 5 rows 
#> data frame  has 0 rows 
#> data frame foo has 3 rows


# is there a function that I can use that is similar to 
# `.id =` option in `bind_rows` that appends the list item name to the data
# such that "fizz" could still be captured when <0 rows>?

newlist <- split(df, list(df$A, df$C))
# when returning to a dataframe with dplyr
df2 <- dplyr::bind_rows(newlist, .id = "id")
levels(df2$A)
#> [1] "bar"  "fizz" "foo"
df2
#>      id   A B C
#> 1 foo.K foo 1 K
#> 2 foo.L foo 2 L
#> 3 foo.M foo 3 M
#> 4 bar.N bar 4 N
#> 5 bar.O bar 5 O
#> 6 bar.P bar 6 P
#> 7 bar.Q bar 7 Q
#> 8 bar.R bar 8 R

# still no "fizz" results using this method either despite:
names(newlist)
#>  [1] "bar.K"  "fizz.K" "foo.K"  "bar.L"  "fizz.L" "foo.L"  "bar.M"  "fizz.M"
#>  [9] "foo.M"  "bar.N"  "fizz.N" "foo.N"  "bar.O"  "fizz.O" "foo.O"  "bar.P" 
#> [17] "fizz.P" "foo.P"  "bar.Q"  "fizz.Q" "foo.Q"  "bar.R"  "fizz.R" "foo.R"

# message output now has many empty names
myresult <- lapply(newlist, FUN = function(x) {
  value_to_save <- unique(x$A)
  cat(paste0("data frame ", value_to_save, " has ", nrow(x), " rows \n"))
}
)
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame foo has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame foo has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame foo has 1 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows 
#> data frame bar has 1 rows 
#> data frame  has 0 rows 
#> data frame  has 0 rows

Created on 2022-04-29 by the reprex package (v2.0.1)

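Updated with an additional example

The reprex above was a simplified example, written in the hope that there was a simple function I was missing (something like the suggested deparse(substitute(x))). The example below is a more realistic application, in which a function (e.g. lm) is applied to the data and the function gives the user a message indicating that a factor level has no values.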

suppressMessages(library(dplyr))
library(purrr)
# create data frame
A = c(rep("foo", 3), rep("bar", 5), rep("fizz", 1))
B = 1:9
C = c("X", "X", "Y", rep("Y", 5), "Y")
df <- data.frame(A = A, B = B, C = C)
df %>% mutate(A = as.factor(A), C = as.factor(C))
#>      A B C
#> 1  foo 1 X
#> 2  foo 2 X
#> 3  foo 3 Y
#> 4  bar 4 Y
#> 5  bar 5 Y
#> 6  bar 6 Y
#> 7  bar 7 Y
#> 8  bar 8 Y
#> 9 fizz 9 Y
complicated_function <- function(x) {
  if (nrow(x) >= 1) {
    value_to_print <-
      ifelse(length(unique(x$A)) == 1, 
             as.character(paste(unique(x$A), unique(x$C))), 
             "with multiple factors")
    cat(paste0("data frame ", value_to_print, " has ", nrow(x), " rows \n"))
    x$D <- x$B * 2 # imagine as a complicated process
    return(x)
  } else {
    cat("No data for this factor \n")
    return(x)
  }
}

df_processed <- complicated_function(df)
#> data frame with multiple factors has 9 rows

df %>% 
  split(list(.$A, .$C)) %>% 
  map(complicated_function) %>% 
  bind_rows() -> newdf
#> No data for this factor 
#> No data for this factor 
#> data frame foo X has 2 rows 
#> data frame bar Y has 5 rows 
#> data frame fizz Y has 1 rows 
#> data frame foo Y has 1 rows

Created on 2022-05-01 by the reprex package (v2.0.1)

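I would like the output for factor combinations with <0 rows> to follow the same pattern as the others, so that the generic "No data for this factor" is replaced with "data frame fizz X has 0 rows".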

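Because the function is more complicated than the original cat() example, it would also be acceptable (and may be both necessary and quicker) to emit a summary message at the end of the function that reports the factor combinations with <0 rows>.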
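The summary idea described above can be sketched outside the processing function, since the list names survive split() even for empty elements. This is only an illustration under assumptions: the data are rebuilt from the second example, and the variable names `groups`, `n_rows` and `empty_groups` are made up for the sketch.

```r
# Rebuild the data from the second example (same values as in the question).
df <- data.frame(
  A = c(rep("foo", 3), rep("bar", 5), "fizz"),
  B = 1:9,
  C = c("X", "X", rep("Y", 7))
)

# Split by both factors; empty combinations are kept as 0-row data frames.
groups <- split(df, list(df$A, df$C))

# Row count per list element; vapply() carries the list names over.
n_rows <- vapply(groups, nrow, integer(1))

# Names of the combinations that ended up with no rows.
empty_groups <- names(n_rows)[n_rows == 0]

# One summary message instead of one message per empty element.
if (length(empty_groups) > 0) {
  message("No rows for: ", paste(empty_groups, collapse = ", "))
}
```

Note that split() joins the factor levels with "." by default, so the names read "fizz.X" rather than "fizz X".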

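One way is to use mapply() instead of lapply() so that both the list and its names are passed in. The name is still not derived from x itself, but it may work for your application: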

myresult <- mapply(function(x, y) {cat(paste0("data frame ", y,
         " has ", nrow(x), " rows \n"))}, mylist, names(mylist))
# data frame bar has 5 rows 
# data frame fizz has 0 rows 
# data frame foo has 3 rows
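The same idea can be written with Map(), which always returns a list and keeps the element names. The sketch below is an assumption-laden variant, not the answer's code: `report_rows()` is a hypothetical helper that takes the name as a second argument and returns the message text so it can be reused later.

```r
# Rebuild the filtered list from the question.
df <- data.frame(
  A = factor(c(rep("foo", 3), rep("bar", 5), "fizz")),
  B = 1:9
)
df <- df[df$A != "fizz", ]   # "fizz" remains as an unused factor level
mylist <- split(df, df$A)

# Hypothetical helper: receives the element and its name explicitly.
report_rows <- function(x, name) {
  msg <- paste0("data frame ", name, " has ", nrow(x), " rows")
  message(msg)
  invisible(msg)
}

# Map() walks the list and its names in parallel.
msgs <- Map(report_rows, mylist, names(mylist))
```

If purrr is already in use, purrr::imap() follows the same pattern: it calls the function with each element and its name, so a two-argument function like the one above should fit without changes.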

Finally, since lapply() does not pass the name along with x, you can attach the name to each list element as an attribute:

# seq_along() is safe even when the list has length 0 or 1
for (i in seq_along(mylist)) attr(mylist[[i]], "name") <- names(mylist)[i]
myresult <- lapply(mylist, FUN = function(x) {
   cat(paste0("data frame ", attr(x, "name"), " has ", nrow(x), " rows \n"))
   }
   )
# data frame bar has 5 rows 
# data frame fizz has 0 rows 
# data frame foo has 3 rows 
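The attribute approach can also be wrapped in a small function so the original list is left untouched. This is a sketch only: `tag_with_name()` and the attribute name "group_name" are invented for the example. A distinctive attribute name is used together with exact = TRUE because attr() partially matches by default, and a data frame already carries a "names" attribute.

```r
# Rebuild the filtered list from the question.
df <- data.frame(
  A = factor(c(rep("foo", 3), rep("bar", 5), "fizz")),
  B = 1:9
)
df <- df[df$A != "fizz", ]
mylist <- split(df, df$A)

# Hypothetical helper: returns a copy of the list in which every element
# carries its own list name in the "group_name" attribute.
tag_with_name <- function(lst) {
  Map(function(x, nm) {
    attr(x, "group_name") <- nm
    x
  }, lst, names(lst))
}

tagged <- tag_with_name(mylist)

# A function that only sees x can now recover the name.
labels <- vapply(tagged, function(x) {
  paste0(attr(x, "group_name", exact = TRUE), ": ", nrow(x), " rows")
}, character(1))
```

Custom attributes are not guaranteed to survive later transformations of the data frame, so it is probably safest to read the attribute at the start of the processing function.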

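I had the same problem when trying to write a more advanced loop with apply. I have been using a for loop and, whenever the data frame name is needed, looking it up in a vector that holds all the names, indexed by the current loop iteration i. It is not an elegant solution, though, since people keep saying you don't need loops.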
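A minimal sketch of that loop, assuming the names come from the list itself (the variable names `list_names` and `messages` are made up here):

```r
# Rebuild the filtered list from the question.
df <- data.frame(
  A = factor(c(rep("foo", 3), rep("bar", 5), "fizz")),
  B = 1:9
)
df <- df[df$A != "fizz", ]
mylist <- split(df, df$A)

list_names <- names(mylist)               # vector holding all the names
messages <- character(length(mylist))     # preallocated result vector

for (i in seq_along(mylist)) {
  # The loop index i picks both the element and its name.
  messages[i] <- paste0("data frame ", list_names[i], " has ",
                        nrow(mylist[[i]]), " rows")
  message(messages[i])
}
```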

