Strange behavior of summarise in DPLYR

Question

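I have two large tables (~12k x 6) from a survey of children and their parents. The tables are identical in size and column types/classes, and were read into R in the same way. After some wrangling (again, done identically for children and parents), I run the following code: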

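Update: it turns out the root of my problem is that variable C only takes the values 0 and 1 in the Children dataset. Is there any way to work around this error when using summarise with table?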

library(dplyr)

Parents %>% 
  summarise(across(A, ~ table(.x)),
            across(B, ~ table(.x)),
            across(C, ~ table(.x)),
            across(D, ~ table(.x)),
            across(E, ~ table(.x)))

Children %>%  
  summarise(across(A, ~ table(.x)),
            across(B, ~ table(.x)),
            across(C, ~ table(.x)),
            across(D, ~ table(.x)),
            across(E, ~ table(.x)))

For Parents, I get the following output (frequencies of the unique values: 1, 2, 3 for D and 0, 1, 2 for the other variables):

        A          B      C           D      E
1   11840      11835  11409       11363    519
2      35         42    436         473   4912
3       3          1     33          42   6447

For Children, I get the following error:

Error: Problem with `summarise()` input `..5`.
x Input `..5` must be size 4 or 1, not 3.
ℹ An earlier column had size 4.
ℹ Input `..5` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
Run `rlang::last_error()` to see where the error occurred.

Running rlang::last_error() returns:

<error/dplyr_error>
Problem with `summarise()` input `..5`.
x Input `..5` must be size 4 or 1, not 3.
ℹ An earlier column had size 4.
ℹ Input `..5` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
Backtrace:
Run `rlang::last_trace()` to see the full context.

Running rlang::last_trace() returns:

<error/dplyr_error>
Problem with `summarise()` input `..5`.
x Input `..5` must be size 4 or 1, not 3.
ℹ An earlier column had size 4.
ℹ Input `..5` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
Backtrace:
     █
  1. ├─`%>%`(...)
  2. ├─dplyr::summarise(...)
  3. ├─dplyr:::summarise.data.frame(...)
  4. │ └─dplyr:::summarise_cols(.data, ...)
  5. │   └─base::withCallingHandlers(...)
  6. ├─dplyr:::abort_glue(...)
  7. │ ├─rlang::exec(abort, class = class, !!!data)
  8. │ └─(function (message = NULL, class = NULL, ..., trace = NULL, parent = NULL, ...
  9. │   └─rlang:::signal_abort(cnd)
 10. │     └─base::signalCondition(cnd)
 11. └─(function (e) ...

Does anyone know what is going on?

For sanity's sake, here is the str() output:

> str(Parents)
'data.frame':   11878 obs. of  6 variables:
 $ ID         : chr  "Parent 1" "Parent 2" "Parent 3" "Parent 4" ...
 $ A          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ B          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ C          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ D          : num  2 2 1 2 3 3 2 3 3 2 ...
 $ E          : num  0 0 0 0 0 0 0 0 0 0 ...
> str(Children)
'data.frame':   11878 obs. of  6 variables:
 $ ID         : chr  "Child 1" "Child 2" "Child 3" "Child 4" ...
 $ A          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ B          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ C          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ D          : num  2 2 1 2 3 3 2 3 3 2 ...
 $ E          : num  0 0 0 0 0 0 0 0 0 0 ...
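The update above points at the cause: table() only counts values that actually occur in a column, while summarise() needs all the outputs it combines to have the same length (or length 1). A minimal base-R sketch with hypothetical toy data (not the survey tables) makes the length difference visible:

```r
# Hypothetical toy data: C contains only 0 and 1,
# while A and E contain 0, 1 and 2.
toy <- data.frame(
  A = c(0, 1, 2, 0),
  C = c(0, 1, 0, 1),
  E = c(2, 1, 0, 2)
)

# table() returns one count per value that occurs, so the
# number of counts differs between columns.
sizes <- sapply(toy, function(x) length(table(x)))
print(sizes)
# sizes: A = 3, C = 2, E = 3
```

Columns whose tables have different lengths are the kind of mismatch summarise() rejects with the "must be size ... or 1" error.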

Answer 1

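table does not always fit well into tidyverse pipes, because it returns an unequal number of values: one count per distinct value, so columns with different numbers of distinct values give vectors of different lengths, which summarise cannot combine. I think it is better to get the data into long format and use count. You get the same information, just in long format.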

library(dplyr)
library(tidyr)

Parents %>%  pivot_longer(cols = A:E) %>% count(name, value)

The same works for the Children data.
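If you want to keep summarise() and table() together, one workaround is to turn each column into a factor with a fixed set of levels before tabulating: values that never occur then get a count of zero, and every column yields the same number of rows. A minimal sketch, assuming all responses are coded 0 to 3 (the union of the codes mentioned in the question) and using hypothetical toy data in place of the real tables:

```r
library(dplyr)

# Hypothetical toy data: C only takes 0 and 1, D takes 1 to 3,
# the other columns take 0 to 2.
children <- data.frame(
  A = c(0, 1, 2, 0, 0),
  B = c(0, 0, 1, 2, 0),
  C = c(0, 1, 0, 0, 1),
  D = c(1, 2, 3, 3, 2),
  E = c(2, 2, 1, 0, 2)
)

# Assumption: every response is one of these codes.
all_levels <- 0:3

counts <- children %>%
  summarise(
    # Fixed factor levels make table() return one count per level,
    # including zeros, so every column has length 4.
    across(A:E, ~ as.integer(table(factor(.x, levels = all_levels)))),
    value = all_levels
  ) %>%
  relocate(value)  # put the response code in the first column

print(counts)
```

Fixing the levels also lines the rows up: in the Parents output above, row 1 of D counts the value 1 while row 1 of the other columns counts 0, whereas here each row corresponds to one response code. Any value outside all_levels becomes NA in factor() and is silently left out of the counts. In dplyr 1.1.0 and later, summarise() returning more than one row per group is deprecated in favour of reframe(), so this call emits a warning there.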

1 vote, accepted 2021-04-28 02:41:30