
Why do I not get counts over two numerical columns grouped by other categorical variables, using tidyverse only?

I tried to compute counts over two numerical variables and did not succeed. Without the counts I cannot compute the percentages, which I hope to get with your help. I am trying to do this with tidyverse only.

This is the code I ran and the error I got:

 test_sum <- test_data_3 %>%
    dplyr::group_by(across(where(is.factor))) %>% 
    dplyr::summarise(across(where(is.numeric())))


Error: Problem with `summarise()` input `..1`.
ℹ `..1 = across(where(is.numeric()))`.
x 0 arguments passed to 'is.numeric' which requires 1
Run `rlang::last_error()` to see where the error occurred.

Then I tried another version:

test_sum <- test_data_3 %>%
    dplyr::group_by(provider_name, type, st_nst) %>% 
    dplyr::summarise(across(where(is.numeric())))

Error: Problem with `summarise()` input `..1`.
ℹ `..1 = across(where(is.numeric()))`.
x 0 arguments passed to 'is.numeric' which requires 1
ℹ The error occurred in group 1: provider_name = "BLACKB", type = "stri", st_nst = "NST".

This is the Stack Overflow question that inspired the code above: Group by multiple columns and sum other multiple columns

And this is the data I have:

dput(test_data_3)
structure(list(financial_year = c(1920, 1920, 1920, 1920, 1920, 
1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 
1920, 1920, 1920, 1920), provider_name = c("LIVEW", "MANCHE", 
"MANCHE", "MANCHE", "MANCHE", "MANCHE", "MANCHE", "MANCHE", "MANCHE", 
"SOUTH", "LANCA", "COUNTY", "BUCKINGT", "BLACKB", "BURNLEY", 
"ROYAL", "THE", "LOUTH", "IMPERIAL", "WESTERN"), type = c("non_stringent", 
"non_stringent", "non_stringent", "non_stringent", "non_stringent", 
"non_stringent", "non_stringent", "non_stringent", "non_stringent", 
"non_stringent", "stri", "stri", "stri", "stri", "stri", "stri", 
"stri", "stri", "stri", "stri"), eld = c(0, 326, 343, 43, 61, 
46, 1, 3, 3, 1, 313, 671, 329, 389, 3, 376, 306, 0, 409, 589), 
    ed = c(1, 23, 23, 0, 2, 0, 1, 0, 0, 0, 7, 3, 4, 4, 0, 0, 
    2, 1, 3, 1), st_nst = c("ST", "STI", "ST", "ST", "ST", "ST", 
    "ST", "ST", "ST", "ST", "NST", "NST", "NSt", "NST", "NST", 
    "NST", "NST", "NST", "NST", "NST")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), spec = structure(list(
    cols = list(financial_year = structure(list(), class = c("collector_double", 
    "collector")), trust_code = structure(list(), class = c("collector_character", 
    "collector")), provider_name = structure(list(), class = c("collector_character", 
    "collector")), prim_diag = structure(list(), class = c("collector_character", 
    "collector")), type = structure(list(), class = c("collector_character", 
    "collector")), elective_discharge = structure(list(), class = c("collector_double", 
    "collector")), emergency_admission = structure(list(), class = c("collector_double", 
    "collector")), st_nst = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

Printed as a tibble, it looks like this:

test_data_3
# A tibble: 20 x 6
   financial_year provider_name type            eld    ed st_nst
            <dbl> <chr>         <chr>         <dbl> <dbl> <chr> 
 1           1920 LIVEW         non_stringent     0     1 ST    
 2           1920 MANCHE        non_stringent   326    23 STI   
 3           1920 MANCHE        non_stringent   343    23 ST    
 4           1920 MANCHE        non_stringent    43     0 ST    
 5           1920 MANCHE        non_stringent    61     2 ST    
 6           1920 MANCHE        non_stringent    46     0 ST    
 7           1920 MANCHE        non_stringent     1     1 ST    
 8           1920 MANCHE        non_stringent     3     0 ST    
 9           1920 MANCHE        non_stringent     3     0 ST    
10           1920 SOUTH         non_stringent     1     0 ST    
11           1920 LANCA         stri            313     7 NST   
12           1920 COUNTY        stri            671     3 NST   
13           1920 BUCKINGT      stri            329     4 NSt   
14           1920 BLACKB        stri            389     4 NST   
15           1920 BURNLEY       stri              3     0 NST   
16           1920 ROYAL         stri            376     0 NST   
17           1920 THE           stri            306     2 NST   
18           1920 LOUTH         stri              0     1 NST   
19           1920 IMPERIAL      stri            409     3 NST   
20           1920 WESTERN       stri            589     1 NST   

Can someone explain my mistakes? Is there a way to get the counts first and then the percentages of the two numerical columns, eld and ed, grouped by provider_name, type, st_nst? I mean that these two columns should be added together into a new column, based on the group-by variables.

The error arises because is.numeric() is called with no arguments inside where(), which expects the function itself, written as is.numeric without parentheses. In addition, no function was passed to across, so nothing is computed on the selected columns. If the intention is only to select the numeric columns:

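library(dplyr)
# Note: in the sample data the grouping columns are character, not factor,
# so where(is.character) would be needed to pick them up.
test_data_3 %>%
    dplyr::group_by(across(where(is.factor))) %>% 
    # pass is.numeric itself (no parentheses) to where()
    dplyr::select(where(is.numeric))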
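
To make the difference concrete, here is a minimal sketch on a small made-up tibble (df and its columns g, x, y are hypothetical and not the asker's data). It contrasts passing the predicate as a function with calling it, and shows that where(is.factor) matches nothing when the columns are character.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(10, 20, 30))

# Works: is.numeric is passed as a function object to where()
num_cols <- df %>% select(where(is.numeric))

# Fails: is.numeric() is evaluated immediately with zero arguments;
# the error is caught here so the script keeps running
res <- tryCatch(
  df %>% select(where(is.numeric())),
  error = function(e) "error"
)

# g is character, not factor, so this selects no columns at all
fct_cols <- df %>% select(where(is.factor))
```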

Suppose we want the sum of those numeric columns within each group:

test_data_3 %>%
    dplyr::group_by(across(where(is.factor))) %>% 
    dplyr::summarise(across(where(is.numeric), sum))
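
For the counts and percentages asked about in the question, one possible reading is sketched below. It assumes that "counts" means the per-group sums of eld and ed and that "percentages" means each column's share of their combined total. The object test_small is a hypothetical four-row subset of the posted data, and the names total, pct_eld and pct_ed are made up for illustration.

```r
library(dplyr)

# Hypothetical subset of the posted data, kept small for illustration
test_small <- tibble(
  provider_name = c("MANCHE", "MANCHE", "BLACKB", "LOUTH"),
  type          = c("non_stringent", "non_stringent", "stri", "stri"),
  st_nst        = c("ST", "ST", "NST", "NST"),
  eld           = c(343, 43, 389, 0),
  ed            = c(23, 0, 4, 1)
)

test_sum <- test_small %>%
  # group by the character columns explicitly
  group_by(provider_name, type, st_nst) %>%
  # sum only eld and ed, so financial_year-like columns are not touched
  summarise(across(c(eld, ed), sum), .groups = "drop") %>%
  mutate(
    total   = eld + ed,           # the two columns added together
    pct_eld = 100 * eld / total,  # share of eld in the group total
    pct_ed  = 100 * ed / total    # share of ed in the group total
  )
```

Naming the columns in across(c(eld, ed), ...) instead of using where(is.numeric) avoids summing financial_year as well. A group whose total is 0 would give NaN for both percentages.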

Update

If we want the sum per row of the numeric columns, select the numeric columns ( where(is.numeric) ) from the data and get the row-wise sum with rowSums. Using cur_data() for the data is more correct than using . because it also works when there are group attributes.

test_data_3 %>% 
      mutate(count = select(cur_data(), where(is.numeric)) %>% 
                  rowSums(na.rm = TRUE))
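
Note that where(is.numeric) also picks up financial_year in the posted data, so the row sum above would include the year. A minimal sketch that restricts the sum to eld and ed is shown below; test_small is a hypothetical two-row subset of the posted data. As far as I know, cur_data() has been deprecated since dplyr 1.1.0 in favour of pick(), so this sketch uses across(), which is available in both older and newer versions.

```r
library(dplyr)

# Hypothetical two-row subset of the posted data
test_small <- tibble(
  financial_year = c(1920, 1920),
  provider_name  = c("LIVEW", "MANCHE"),
  eld            = c(0, 326),
  ed             = c(1, 23)
)

with_count <- test_small %>%
  # across() returns a data frame of the chosen columns; rowSums adds them per row
  mutate(count = rowSums(across(c(eld, ed)), na.rm = TRUE))
```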
