How to use the factor(f) syntax in the dplyr/forcats packages in R?

I am trying to do something very simple: use the forcats package in R to work with factors. I have a data frame with some factor variables, one of which is gender, and I'm simply trying to count the occurrences of each level using fct_count. The syntax is shown in the documentation as fct_count(f) (what could be easier?).

I'm trying to do this the dplyr way, using the pipe operator instead of the $ syntax to access the variables, but it doesn't seem to work. Am I just fundamentally misunderstanding the syntax?

library(dplyr)
library(forcats)

pid <- c('id1','id2','id3','id4','id5','id6')
gender <- c('Male','Female','Other','Male','Female','Female')
df <- data.frame(pid, gender)
df <- as_tibble(df)  # as.tibble() is deprecated in favour of as_tibble()
df
#> # A tibble: 6 x 2
#>   pid   gender
#>   <chr> <fct> 
#> 1 id1   Male  
#> 2 id2   Female
#> 3 id3   Other 
#> 4 id4   Male  
#> 5 id5   Female
#> 6 id6   Female

# This throws an error
df %>%
  mutate(gender = as.factor(gender)) %>%
  fct_count(gender) # Error: `f` must be a factor (or character vector).

# This works but doesn't use the nice dplyr select syntax
fct_count(df$gender)
#> # A tibble: 3 x 2
#>   f          n
#>   <fct>  <int>
#> 1 Female     3
#> 2 Male       2
#> 3 Other      1

Where am I going wrong? I'm new to dplyr, and sorry for such a daft question, but I can't seem to find a basic example anywhere!

fct_count takes a vector of type factor or character; it is not aware of tibbles or data frames. In your pipeline, %>% passes the whole tibble as the first argument, so the call becomes fct_count(df, gender) and f is a data frame, hence the error. So the simplest pipe would be:

library(dplyr)
library(forcats)

df %>%
  pull(gender) %>%
  fct_count()
#> # A tibble: 3 x 2
#>   f          n
#>   <fct>  <int>
#> 1 Female     3
#> 2 Male       2
#> 3 Other      1
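
To make the failure mode concrete, here is a minimal sketch that rebuilds the question's data inline so it is self-contained. The names `failed`, `counts` and `counts_sorted` are made up for illustration, and the `tryCatch()` wrapper is only there so the script keeps running past the error.

```r
library(dplyr)
library(forcats)

# Same data as in the question, built directly as a tibble.
df <- tibble::tibble(
  pid    = c("id1", "id2", "id3", "id4", "id5", "id6"),
  gender = c("Male", "Female", "Other", "Male", "Female", "Female")
)

# df %>% fct_count(gender) is the same call as fct_count(df, gender):
# the whole tibble ends up in `f`, so the call fails.
failed <- tryCatch(fct_count(df, gender), error = function(e) "error")

# pull() extracts the column as a plain vector first, so `f` is valid.
counts <- df %>% pull(gender) %>% fct_count()

# fct_count() also accepts `sort` and `prop` for sorted counts and proportions.
counts_sorted <- df %>% pull(gender) %>% fct_count(sort = TRUE, prop = TRUE)
```

With `prop = TRUE` the result gains a `p` column next to `n`, which can save a separate mutate() step if proportions are what you are after.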

Your data

pid <- c('id1','id2','id3','id4','id5','id6')
gender <- c('Male','Female','Other','Male','Female','Female')
df <- data.frame(pid, gender)
df <- tibble::as_tibble(df)
df

You could just use group_by() and n():

library(dplyr)  # needed for %>%

pid <- c('id1','id2','id3','id4','id5','id6')
gender <- c('Male','Female','Other','Male','Female','Female')
df <- data.frame(pid, gender)
df <- tibble::as_tibble(df)

# Count the rows in each gender group
df %>%
  dplyr::group_by(gender) %>%
  dplyr::summarise(cnt_gender = dplyr::n()) %>%
  dplyr::ungroup()
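
As a shorter variant of the same idea, dplyr's count() wraps the group, tally and ungroup steps in one call. This is a sketch rather than part of the original answer; `gender_counts` is a name chosen for illustration, and the data is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)

# Same data as in the question, built directly as a tibble.
df <- tibble::tibble(
  pid    = c("id1", "id2", "id3", "id4", "id5", "id6"),
  gender = c("Male", "Female", "Other", "Male", "Female", "Female")
)

# count() groups by gender, tallies the rows, and returns an ungrouped tibble.
# `name` sets the count column's name (the default is "n").
gender_counts <- df %>% count(gender, name = "cnt_gender")
```

Unlike fct_count, count() takes the data frame as its first argument, so it drops straight into a pipe without pull().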

