How to summarise across different types of variables with dplyr::c_across()

I have data with different types of variables. Some are character, some are factors, and some are numeric, like below:

df <- data.frame(a = c("tt", "ss", "ss", NA), b=c(2,3,NA,1), c=c(1,2,NA, NA), d=c("tt", "ss", "ss", NA))

I'm trying to count the number of missing values per observation using c_across in dplyr. However, c_across doesn't seem to be able to combine different types of values, as the error message below suggests:

df %>%
  rowwise() %>%
  summarise(NAs = sum(is.na(c_across())))

Error: Problem with summarise() input NAs.
x Can't combine a <factor> and b <double>.
ℹ Input NAs is sum(is.na(c_across())).
ℹ The error occurred in row 1.

Indeed, if I include only numeric variables, it works:

df %>%
  rowwise() %>%
  summarise(NAs = sum(is.na(c_across(b:c))))

The same is true if I include only character variables:

df %>%
  rowwise() %>%
  summarise(NAs = sum(is.na(c_across(c(a,d)))))

I could solve the issue without c_across, as below, but I have lots of variables, so it's not very practical.

df %>%
  rowwise() %>%
  summarise(NAs = is.na(a)+is.na(b)+is.na(c)+is.na(d))

I could use the traditional apply approach, as below, but I'd like to solve this using dplyr.

apply(df, 1, function(x)sum(is.na(x)))

Any suggestions on how to compute the number of missing values row-wise, efficiently, and using dplyr?

I would suggest this approach. The issue comes from two things. First, your data frame has variables of different types. Second, a key variable is useful for row-wise tasks, since rowwise(id) keeps that identifier in the output. So, in the code below we first convert the variables to a common type, then we create an id based on the row number. We pass that id to rowwise(), and then we can use the c_across() function. Here is the code (I have used your df data):

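library(tidyverse)
#Code
# Convert every column to character so that c_across() can combine them
# (a bare function replaces the deprecated funs() helper)
df %>% 
  mutate_at(vars(everything()), as.character) %>%
  mutate(id=1:n()) %>%
  rowwise(id) %>%
  mutate(NAs = sum(is.na(c_across(a:d))))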

Output:

# A tibble: 4 x 6
# Rowwise:  id
  a     b     c     d        id   NAs
  <chr> <chr> <chr> <chr> <int> <int>
1 tt    2     1     tt        1     0
2 ss    3     2     ss        2     0
3 ss    NA    NA    ss        3     2
4 NA    1     NA    NA        4     3

We can also avoid the mutate_at() function by using the newer across() with mutate() to bring the variables to a common type:

#Code 2
df %>% 
  mutate(across(a:d,~as.character(.))) %>%
  mutate(id=1:n()) %>%
  rowwise(id) %>%
  mutate(NAs = sum(is.na(c_across(a:d))))

Output:

# A tibble: 4 x 6
# Rowwise:  id
  a     b     c     d        id   NAs
  <chr> <chr> <chr> <chr> <int> <int>
1 tt    2     1     tt        1     0
2 ss    3     2     ss        2     0
3 ss    NA    NA    ss        3     2
4 NA    1     NA    NA        4     3
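
To see why the conversion step in the code above is needed: according to the dplyr documentation, c_across() combines the selected values with vctrs::vec_c(), which refuses to merge a factor with a double. The following is only a minimal sketch of that behaviour, assuming the vctrs package is installed (it is a dependency of dplyr); the names res and ok are illustrative.

```r
library(vctrs)

# A factor and a double have no common type under vctrs rules,
# so vec_c() signals an error; we catch it and record that fact
res <- tryCatch(vec_c(factor("tt"), 2), error = function(e) "error")

# Once both values are converted to character, they combine cleanly
ok <- vec_c(as.character(factor("tt")), as.character(2))
```

Because is.na() gives the same answer before and after the conversion to character, converting first does not change the count of missing values.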

A much faster option is to skip rowwise and c_across and use rowSums instead:

library(dplyr)
df %>% 
     mutate(NAs = rowSums(is.na(.)))
#     a  b  c    d NAs
#1   tt  2  1   tt   0
#2   ss  3  2   ss   0
#3   ss NA NA   ss   2
#4 <NA>  1 NA <NA>   3
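
The . placeholder in the code above relies on the magrittr pipe. As a hedged sketch, assuming dplyr 1.1.0 or later is available, pick() can select the columns inside mutate() without the placeholder. The df below is copied from the question; the name res is illustrative.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(a = c("tt", "ss", "ss", NA), b = c(2, 3, NA, 1),
                 c = c(1, 2, NA, NA), d = c("tt", "ss", "ss", NA))

# pick(everything()) returns the current columns as a tibble;
# is.na() turns it into a logical matrix and rowSums() counts TRUE per row
res <- df %>%
  mutate(NAs = rowSums(is.na(pick(everything()))))
```

Replacing everything() with where(is.numeric) restricts the count to the numeric columns in the same way.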

If we want to select only certain columns, e.g. the numeric ones:

df %>%
   mutate(NAs = rowSums(is.na(select(., where(is.numeric)))))
#     a  b  c    d NAs
#1   tt  2  1   tt   0
#2   ss  3  2   ss   0
#3   ss NA NA   ss   2
#4 <NA>  1 NA <NA>   1
