Calling summary function inside lapply returning NaN values
Suppose I have this data frame:
df <- structure(list(q1 = structure(c(2L, 2L, 4L,
3L, 1L, 4L), .Label = c("I dont like\na thing",
"I really dont like\nthat thing", "I like a\nthing",
"Ambivalent\nabout the thing"), class = "factor"), q2 = structure(c(3L,
2L, 1L, 1L, 4L, 1L), .Label = c("Neither like\nnor dislike",
"Somewhat\ndislike", "Somewhat\nlike", "Strongly\ndislike", "Strongly\nlike"
), class = "factor")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I can run the following dplyr chunk without any problem:
df %>%
  summarise(question = 'q1',
            n = sum(!is.na(q1)),
            mean = mean(as.numeric(q1), na.rm = T),
            sd = sd(as.numeric(q1), na.rm = T),
            se = sd/sqrt(n),
            ci_lo = mean - qnorm(1 - (.05/2))*se, # qnorm() provides the specified Z-score
            ci_hi = mean + qnorm(1 - (.05/2))*se,
            min = min(as.integer(q1)),
            max = max(as.integer(q1)))
# A tibble: 1 x 9
question n mean sd se ci_lo ci_hi min max
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 q1 6 2.67 1.21 0.494 1.70 3.64 1 4
But if I try to put it inside a function and call it with lapply() on all the column names in a list, it returns a bunch of NaN and NA values.
summary_stats <- function(question){
  df %>%
    summarise(question = question,
              n = sum(!is.na(question)),
              mean = mean(as.numeric(question), na.rm = T),
              sd = sd(as.numeric(question), na.rm = T),
              se = sd/sqrt(n),
              ci_lo = mean - qnorm(1 - (.05 / 2)) * se, # qnorm() provides the specified Z-score
              ci_hi = mean + qnorm(1 - (.05 / 2)) * se,
              min = min(as.numeric(question)),
              max = max(as.numeric(question)))
}
colnames <-
  df %>%
  select(starts_with("q")) %>%
  colnames
lapply(colnames, summary_stats)
[[1]]
# A tibble: 1 x 9
question n mean sd se ci_lo ci_hi min max
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 q1 1 NaN NA NA NaN NaN NA NA
[[2]]
# A tibble: 1 x 9
question n mean sd se ci_lo ci_hi min max
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 q2 1 NaN NA NA NaN NaN NA NA
Warning messages:
1: In mean(as.integer(question), na.rm = T) : NAs introduced by coercion
2: In is.data.frame(x) : NAs introduced by coercion
3: In mask$eval_all_summarise(quo) : NAs introduced by coercion
4: In mask$eval_all_summarise(quo) : NAs introduced by coercion
5: In mean(as.integer(question), na.rm = T) : NAs introduced by coercion
6: In is.data.frame(x) : NAs introduced by coercion
7: In mask$eval_all_summarise(quo) : NAs introduced by coercion
8: In mask$eval_all_summarise(quo) : NAs introduced by coercion
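Does anyone know where I went wrong? I would also like to get back a single result with one row for each column fed to lapply, rather than one tbl_df per column. Is that possible?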
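You are passing the column names to the function, whereas the function needs the column data. Inside summarise(), question is just the string "q1" or "q2", so n counts one non-missing string and as.numeric() coerces that string to NA.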
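To see where the NaN and NA values come from, here is a minimal base R sketch of what each expression receives when the argument is a column name rather than the column itself. The variable names n_val, num and m are only illustrative, and suppressWarnings() is used here just to hide the "NAs introduced by coercion" warning shown in the question.

```r
# What lapply() hands to the function: a length-one character vector
question <- "q1"

# A single non-missing string, so the count is 1 rather than 6
n_val <- sum(!is.na(question))

# "q1" cannot be parsed as a number, so coercion yields NA
num <- suppressWarnings(as.numeric(question))

# With na.rm = TRUE nothing is left to average, which gives NaN
m <- mean(num, na.rm = TRUE)

print(n_val)
print(num)
print(m)
```

The same reasoning explains the NA in sd, min and max: each of them operates on that single NA value.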
Here is another way, which maps over the columns themselves:
library(dplyr)
library(purrr)
summary_stats <- function(data){
  tibble(n = sum(!is.na(data)),
         mean = mean(as.numeric(data), na.rm = TRUE),
         sd = sd(as.numeric(data), na.rm = TRUE),
         se = sd/sqrt(n),
         ci_lo = mean - qnorm(1 - (.05 / 2)) * se,
         ci_hi = mean + qnorm(1 - (.05 / 2)) * se,
         min = min(as.numeric(data)),
         max = max(as.numeric(data)))
}
map_df(df %>% select(starts_with('q')), summary_stats, .id = 'question')
# question n mean sd se ci_lo ci_hi min max
# <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 q1 6 2.67 1.21 0.494 1.70 3.64 1 4
#2 q2 6 2 1.26 0.516 0.988 3.01 1 4
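If you would rather keep lapply() and the vector of column names, one option is to look each column up by name inside the function and then bind the one-row results together. The following is only a base R sketch under some assumptions: the data frame is a small stand-in with the same factor codes as the question's df (labels shortened), summary_stats_by_name is a hypothetical helper name, and na.rm = TRUE has been added to min() and max(), which the original code does not do.

```r
# Stand-in for the question's df: same underlying factor codes, short labels
df <- data.frame(
  q1 = factor(c(2, 2, 4, 3, 1, 4)),
  q2 = factor(c(3, 2, 1, 1, 4, 1), levels = 1:5)
)

# Hypothetical helper: takes a column NAME and fetches the data with [[ ]]
summary_stats_by_name <- function(question, data) {
  x  <- as.numeric(data[[question]])  # factor codes of the named column
  n  <- sum(!is.na(x))
  m  <- mean(x, na.rm = TRUE)
  s  <- sd(x, na.rm = TRUE)
  se <- s / sqrt(n)
  z  <- qnorm(1 - 0.05 / 2)           # Z-score for a 95% interval
  data.frame(question = question, n = n, mean = m, sd = s, se = se,
             ci_lo = m - z * se, ci_hi = m + z * se,
             min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE))
}

# One row per column name, stacked into a single data frame
result <- do.call(rbind, lapply(names(df), summary_stats_by_name, data = df))
print(result)
```

Within a dplyr pipeline, the `.data[[question]]` pronoun plays the same role as `data[[question]]` here, and bind_rows() can replace the do.call(rbind, ...) step.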