
Calling summary function inside lapply returning NaN values

Suppose I have this data frame:

df <- structure(list(q1 = structure(c(2L, 2L, 4L, 
3L, 1L, 4L), .Label = c("I dont like\na thing", 
"I really dont like\nthat thing", "I like a\nthing", 
"Ambivalent\nabout the thing"), class = "factor"), q2 = structure(c(3L, 
2L, 1L, 1L, 4L, 1L), .Label = c("Neither like\nnor dislike", 
"Somewhat\ndislike", "Somewhat\nlike", "Strongly\ndislike", "Strongly\nlike"
), class = "factor")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

I can run the following dplyr chunk without any problem:

library(dplyr)

df %>%
    summarise(question = 'q1',
              n = sum(!is.na(q1)), 
              mean = mean(as.numeric(q1), na.rm = TRUE), 
              sd = sd(as.numeric(q1), na.rm = TRUE), 
              se = sd/sqrt(n), 
              ci_lo = mean - qnorm(1 - (.05/2))*se,  # qnorm() provides the specified Z-score
              ci_hi = mean + qnorm(1 - (.05/2))*se,
              min = min(as.integer(q1)),
              max = max(as.integer(q1)))


# A tibble: 1 x 9
  question     n  mean    sd    se ci_lo ci_hi   min   max
  <chr>    <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 q1           6  2.67  1.21 0.494  1.70  3.64     1     4

But if I wrap it in a function and call that function with lapply() over a list of all the column names, it returns a bunch of NaN and NA values.

summary_stats <- function(question){
  df %>%
    summarise(question = question,
              n = sum(!is.na(question)), 
              mean = mean(as.numeric(question), na.rm = T), 
              sd = sd(as.numeric(question), na.rm = T), 
              se = sd/sqrt(n), 
              ci_lo = mean - qnorm(1 - (.05 / 2)) * se,  # qnorm() provides the specified Z-score
              ci_hi = mean + qnorm(1 - (.05 / 2)) * se,
              min = min(as.numeric(question)),
              max = max(as.numeric(question))) 
}

colnames <- 
  df %>% 
  select(starts_with("q")) %>% 
  colnames

lapply(colnames, summary_stats)


[[1]]
# A tibble: 1 x 9
  question     n  mean    sd    se ci_lo ci_hi   min   max
  <chr>    <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 q1           1   NaN    NA    NA   NaN   NaN    NA    NA

[[2]]
# A tibble: 1 x 9
  question     n  mean    sd    se ci_lo ci_hi   min   max
  <chr>    <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 q2           1   NaN    NA    NA   NaN   NaN    NA    NA

Warning messages:
1: In mean(as.integer(question), na.rm = T) : NAs introduced by coercion
2: In is.data.frame(x) : NAs introduced by coercion
3: In mask$eval_all_summarise(quo) : NAs introduced by coercion
4: In mask$eval_all_summarise(quo) : NAs introduced by coercion
5: In mean(as.integer(question), na.rm = T) : NAs introduced by coercion
6: In is.data.frame(x) : NAs introduced by coercion
7: In mask$eval_all_summarise(quo) : NAs introduced by coercion
8: In mask$eval_all_summarise(quo) : NAs introduced by coercion

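Does anyone know where I'm going wrong? I would also like to get back a single result with one row for each column fed to lapply, rather than one tbl_df per column. Is that possible?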

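You are passing the column name to the function, whereas the function needs the column data. Inside summarise(), question is just the string "q1", so as.numeric(question) tries to coerce that string to a number and yields NA instead of reading the q1 column.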
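To make the failure concrete, here is a minimal sketch in base R of what each expression sees when it is handed the name rather than the column. The variable `question` simply stands in for the function argument; the warnings are suppressed only to keep the output readable.

```r
question <- "q1"  # what lapply() hands to the function: a name, not a column

# A length-one character vector has exactly one non-missing element,
# which is why n comes out as 1 rather than 6
n_seen <- sum(!is.na(question))

# "q1" cannot be parsed as a number, so coercion gives NA (with a warning)
coerced <- suppressWarnings(as.numeric(question))

# After na.rm = TRUE drops that NA there is nothing left to average: NaN
mean_seen <- mean(coerced, na.rm = TRUE)

# min() without na.rm simply propagates the NA
min_seen <- min(coerced)

print(c(n = n_seen, mean = mean_seen, min = min_seen))
```

This lines up with the n = 1, mean = NaN and min = NA columns in the output shown in the question.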

Here is another way to do it:

library(dplyr)
library(purrr)

# Takes one column's data (not its name) and returns a one-row tibble
summary_stats <- function(data) {
  tibble(n = sum(!is.na(data)),
         mean = mean(as.numeric(data), na.rm = TRUE),
         sd = sd(as.numeric(data), na.rm = TRUE),
         se = sd / sqrt(n),
         ci_lo = mean - qnorm(1 - (.05 / 2)) * se,
         ci_hi = mean + qnorm(1 - (.05 / 2)) * se,
         min = min(as.numeric(data)),
         max = max(as.numeric(data)))
}

# Apply to every q* column and row-bind; .id stores the column name in `question`
map_df(df %>% select(starts_with('q')), summary_stats, .id = 'question')

#  question     n  mean    sd    se ci_lo ci_hi   min   max
#  <chr>    <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 q1           6  2.67  1.21 0.494 1.70   3.64     1     4
#2 q2           6  2     1.26 0.516 0.988  3.01     1     4
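If you would rather keep passing column names through lapply(), one option is to look the column up by name with dplyr's `.data` pronoun and then row-bind the pieces. The following is only a sketch under some assumptions: the function name `summary_stats_by_name` and its arguments `data` and `col` are made up for illustration, and the stand-in `df` uses shortened factor labels that produce the same integer codes as the data frame in the question.

```r
library(dplyr)

# Stand-in data with the same integer codes as the question's df
# (shortened, hypothetical factor labels)
df <- tibble(
  q1 = factor(c("b", "b", "d", "c", "a", "d"), levels = c("a", "b", "c", "d")),
  q2 = factor(c("c", "b", "a", "a", "d", "a"), levels = c("a", "b", "c", "d", "e"))
)

# Hypothetical variant: takes a column *name* and fetches the column via .data
summary_stats_by_name <- function(data, col) {
  z <- qnorm(1 - (.05 / 2))  # Z-score for a 95% interval
  data %>%
    summarise(
      question = col,
      n     = sum(!is.na(.data[[col]])),
      mean  = mean(as.numeric(.data[[col]]), na.rm = TRUE),
      sd    = sd(as.numeric(.data[[col]]), na.rm = TRUE),
      se    = sd / sqrt(n),
      ci_lo = mean - z * se,
      ci_hi = mean + z * se,
      min   = min(as.numeric(.data[[col]])),
      max   = max(as.numeric(.data[[col]]))
    )
}

# One single-row tibble per column name, stacked into one table
result <- bind_rows(lapply(names(df), summary_stats_by_name, data = df))
print(result)
```

Passing the data frame as an argument instead of relying on a global `df` is a design choice made here so the function can be reused on other data; it is not something the original code does.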
