Why do I not get counts over two numerical columns grouped by other categorical variables using only tidyverse?
I have tried to compute counts over two numerical variables and did not succeed. Without the counts I cannot compute the percentages, which I hope to get with your help. I am trying to do this with tidyverse only.
This is the code I ran and the error I got:
test_sum <- test_data_3 %>%
dplyr::group_by(across(where(is.factor))) %>%
dplyr::summarise(across(where(is.numeric())))
Error: Problem with `summarise()` input `..1`.
ℹ `..1 = across(where(is.numeric()))`.
x 0 arguments passed to 'is.numeric' which requires 1
Run `rlang::last_error()` to see where the error occurred.
Then I tried another version:
test_sum <- test_data_3 %>%
dplyr::group_by(provider_name, type, st_nst) %>%
dplyr::summarise(across(where(is.numeric())))
Error: Problem with `summarise()` input `..1`.
ℹ `..1 = across(where(is.numeric()))`.
x 0 arguments passed to 'is.numeric' which requires 1
ℹ The error occurred in group 1: provider_name = "BLACKB", type = "stri", st_nst = "NST".
This is the Stack Overflow question that inspired the code above: Group by multiple columns and sum other multiple columns
And this is the type of data I have:
dput(test_data_3)
structure(list(financial_year = c(1920, 1920, 1920, 1920, 1920,
1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920,
1920, 1920, 1920, 1920), provider_name = c("LIVEW", "MANCHE",
"MANCHE", "MANCHE", "MANCHE", "MANCHE", "MANCHE", "MANCHE", "MANCHE",
"SOUTH", "LANCA", "COUNTY", "BUCKINGT", "BLACKB", "BURNLEY",
"ROYAL", "THE", "LOUTH", "IMPERIAL", "WESTERN"), type = c("non_stringent",
"non_stringent", "non_stringent", "non_stringent", "non_stringent",
"non_stringent", "non_stringent", "non_stringent", "non_stringent",
"non_stringent", "stri", "stri", "stri", "stri", "stri", "stri",
"stri", "stri", "stri", "stri"), eld = c(0, 326, 343, 43, 61,
46, 1, 3, 3, 1, 313, 671, 329, 389, 3, 376, 306, 0, 409, 589),
ed = c(1, 23, 23, 0, 2, 0, 1, 0, 0, 0, 7, 3, 4, 4, 0, 0,
2, 1, 3, 1), st_nst = c("ST", "STI", "ST", "ST", "ST", "ST",
"ST", "ST", "ST", "ST", "NST", "NST", "NSt", "NST", "NST",
"NST", "NST", "NST", "NST", "NST")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), spec = structure(list(
cols = list(financial_year = structure(list(), class = c("collector_double",
"collector")), trust_code = structure(list(), class = c("collector_character",
"collector")), provider_name = structure(list(), class = c("collector_character",
"collector")), prim_diag = structure(list(), class = c("collector_character",
"collector")), type = structure(list(), class = c("collector_character",
"collector")), elective_discharge = structure(list(), class = c("collector_double",
"collector")), emergency_admission = structure(list(), class = c("collector_double",
"collector")), st_nst = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
Or, another way of visualising it:
test_data_3
# A tibble: 20 x 6
financial_year provider_name type eld ed st_nst
<dbl> <chr> <chr> <dbl> <dbl> <chr>
1 1920 LIVEW non_stringent 0 1 ST
2 1920 MANCHE non_stringent 326 23 STI
3 1920 MANCHE non_stringent 343 23 ST
4 1920 MANCHE non_stringent 43 0 ST
5 1920 MANCHE non_stringent 61 2 ST
6 1920 MANCHE non_stringent 46 0 ST
7 1920 MANCHE non_stringent 1 1 ST
8 1920 MANCHE non_stringent 3 0 ST
9 1920 MANCHE non_stringent 3 0 ST
10 1920 SOUTH non_stringent 1 0 ST
11 1920 LANCA stri 313 7 NST
12 1920 COUNTY stri 671 3 NST
13 1920 BUCKINGT stri 329 4 NSt
14 1920 BLACKB stri 389 4 NST
15 1920 BURNLEY stri 3 0 NST
16 1920 ROYAL stri 376 0 NST
17 1920 THE stri 306 2 NST
18 1920 LOUTH stri 0 1 NST
19 1920 IMPERIAL stri 409 3 NST
20 1920 WESTERN stri 589 1 NST
Can someone explain the mistakes I made? Is there a way to get the counts first and then the percentages of the two numerical columns, namely eld and ed, grouped by provider_name, type, st_nst? I mean for these two columns to be added together into a new one, based on the group-by variables.
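The error comes from where(is.numeric()): is.numeric is called there with no arguments, while where() expects the function itself, i.e. where(is.numeric). In addition, there was no function passed into across. If the intention is to select the columns: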
library(dplyr)
test_data_3 %>%
dplyr::group_by(across(where(is.factor))) %>%
dplyr::select(where(is.numeric))
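A small check of what each selector actually picks up may help here. This is only a sketch: df is a hypothetical stand-in for test_data_3, built from a few of its rows. Note that in the posted data the grouping columns are character (<chr>), not factor, so where(is.factor) matches no columns at all.

```r
library(dplyr)

# Hypothetical stand-in for test_data_3 (values copied from a few of its rows)
df <- tibble(
  provider_name = c("MANCHE", "MANCHE", "LANCA"),
  type          = c("non_stringent", "non_stringent", "stri"),
  eld           = c(326, 343, 313),
  ed            = c(23, 23, 7)
)

# where() takes the predicate function itself, without parentheses
numeric_cols <- df %>% select(where(is.numeric)) %>% names()

# The grouping columns are character, so where(is.factor) matches nothing
factor_cols <- df %>% select(where(is.factor)) %>% names()

# where(is.character) picks up the intended grouping columns instead
character_cols <- df %>% select(where(is.character)) %>% names()
```

If the grouping variables stay as character columns, group_by(across(where(is.character))) or naming them explicitly in group_by() would be the closer match to what the question intends.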
Suppose we want to get the sum of those numeric columns:
test_data_3 %>%
dplyr::group_by(across(where(is.factor))) %>%
dplyr::summarise(across(where(is.numeric), sum))
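For the counts-then-percentages part of the question, one possible reading is sketched below. It assumes that "percentage" means each column's share of the group's combined eld + ed total, which the question does not state explicitly. The data frame df and the column names total, eld_pct and ed_pct are hypothetical; the grouping columns are named directly because they are character rather than factor.

```r
library(dplyr)

# Hypothetical stand-in for test_data_3 (values copied from a few of its rows)
df <- tibble(
  provider_name = c("MANCHE", "MANCHE", "MANCHE", "LANCA", "COUNTY"),
  type   = c("non_stringent", "non_stringent", "non_stringent", "stri", "stri"),
  st_nst = c("ST", "ST", "ST", "NST", "NST"),
  eld    = c(343, 43, 61, 313, 671),
  ed     = c(23, 0, 2, 7, 3)
)

group_totals <- df %>%
  group_by(provider_name, type, st_nst) %>%
  # Sum only the two columns of interest, so financial_year-like columns stay out
  summarise(across(c(eld, ed), sum), .groups = "drop") %>%
  mutate(
    total   = eld + ed,            # combined count per group
    eld_pct = 100 * eld / total,   # assumed definition: share of the group total
    ed_pct  = 100 * ed / total
  )
```

A group whose total is 0 would give NaN percentages under this definition, so such groups may need separate handling.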
If we want to get the sum per row of the numeric columns, select the numeric columns (where(is.numeric)) from the data and get the row-wise sum with rowSums. Using cur_data() for the data is more correct than using . because it also works when there are group attributes:
test_data_3 %>%
mutate(count = select(cur_data(), where(is.numeric)) %>%
rowSums(na.rm = TRUE))
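Note that where(is.numeric) also matches financial_year in the posted data, so the row sum above would include 1920 in every row. A minimal sketch that adds only eld and ed, assuming that is the intended total, could look like this (df is a hypothetical stand-in for test_data_3). It uses across() rather than cur_data(), which is deprecated in favour of pick() from dplyr 1.1.0 onwards.

```r
library(dplyr)

# Hypothetical stand-in for test_data_3 (values copied from a few of its rows)
df <- tibble(
  financial_year = c(1920, 1920, 1920),
  provider_name  = c("LIVEW", "MANCHE", "LANCA"),
  eld            = c(0, 326, 313),
  ed             = c(1, 23, 7)
)

row_totals <- df %>%
  # across() returns a data frame of just eld and ed; rowSums adds them per row
  mutate(count = rowSums(across(c(eld, ed)), na.rm = TRUE))
```

With only two columns, mutate(count = eld + ed) gives the same result when there are no missing values; rowSums with na.rm = TRUE treats an NA as 0 instead of propagating it.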