
dplyr: apply function table() to each column of a data.frame

I often apply the table() function to each column of a data frame using plyr, like this:

library(plyr)
ldply( mtcars, function(x) data.frame( table(x), prop.table( table(x) ) )  )

Is it possible to do this in dplyr as well?

My attempts fail:

mtcars %>%  do( table %>% data.frame() )
melt( mtcars ) %>%  do( table %>% data.frame() )

You can try the following, which does not rely on the tidyr package.

library(dplyr)

mtcars %>% 
   lapply(table) %>% 
   lapply(as.data.frame) %>% 
   Map(cbind, var = names(mtcars), .) %>% 
   bind_rows() %>%   # rbind_all() has been removed from dplyr; bind_rows() replaces it
   group_by(var) %>% 
   mutate(pct = Freq / sum(Freq))

Using tidyverse (dplyr and purrr):

library(tidyverse)

mtcars %>%
    map( function(x) table(x) )

Or simply:

library(tidyverse)

mtcars %>%
    map( table )
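map(table) returns a named list of tables rather than a single data frame. If one long data frame with proportions is wanted, as in the plyr version from the question, a minimal sketch could look like the following. The column names value, var and pct are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(purrr)

freq <- mtcars %>%
  # one frequency table per column, converted to a data frame (columns: value, Freq)
  map(~ as.data.frame(table(value = .x), stringsAsFactors = FALSE)) %>%
  # stack the per-column results; the list names become the `var` column
  bind_rows(.id = "var") %>%
  group_by(var) %>%
  # proportion of each value within its own column
  mutate(pct = Freq / sum(Freq)) %>%
  ungroup()

head(freq)
```

The values are stored as character here, because the columns of a data frame can have different types and need a common type once they are stacked.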

In general you probably would not want to run table() on every column of a data frame, because a data frame often has at least one column of unique values (an id field, say), which produces a very long output. However, you can use group_by() and tally() to obtain frequency tables in a dplyr chain. Or you can use count(), which does the group_by() for you.

> mtcars %>% 
    group_by(cyl) %>% 
    tally()
> # mtcars %>% count(cyl)

Source: local data frame [3 x 2]

  cyl  n
1   4 11
2   6  7
3   8 14

If you want a two-way frequency table, group by more than one variable.

> mtcars %>% 
    group_by(gear, cyl) %>% 
    tally()
> # mtcars %>% count(gear, cyl)

You can use spread() from the tidyr package to turn that long two-way output into the wide layout one is used to getting from table() when two variables are passed in.
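As a sketch of that reshaping step: the example below uses pivot_wider(), which tidyr documents as the successor to spread(); spread(cyl, n, fill = 0) should give the same layout. The object name two_way is only for illustration.

```r
library(dplyr)
library(tidyr)

two_way <- mtcars %>%
  count(gear, cyl) %>%                  # long format: one row per observed gear/cyl pair
  pivot_wider(names_from = cyl,         # one column per cyl value
              values_from = n,
              values_fill = 0L)         # combinations that never occur become 0

two_way
```

Rows are gear values and columns are cyl values, which mirrors table(mtcars$gear, mtcars$cyl).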

The solution by Caner did not work for me, but the one from commenter akrun (credit goes to him) worked great. I also use a much larger tibble to demo it, and I added sorting by percent in descending order.

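library(nycflights13)
library(dplyr)
library(tidyr)

dim(flights)

# Stack all columns into Var/Val pairs (values are coerced to a common type)
tte <- gather(flights, Var, Val) %>% 
  group_by(Var) %>% dplyr::mutate(n = n()) %>% 
  group_by(Var, Val) %>% dplyr::mutate(n1 = n(), Percent = n1 / n) %>%
  arrange(Var, desc(n1)) %>% unique()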
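The same idea can be written more compactly with count(), which collapses duplicates directly so no unique() step is needed. This is a sketch on mtcars rather than flights to keep it quick to run, and it uses pivot_longer(), which tidyr documents as the successor to gather(). It assumes all columns share a type (true for mtcars); for mixed-type data such as flights, the columns would first have to be converted to a common type such as character.

```r
library(dplyr)
library(tidyr)

tte_small <- mtcars %>%
  # stack every column into Var/Val pairs
  pivot_longer(everything(), names_to = "Var", values_to = "Val") %>%
  # one row per distinct Var/Val combination, with its frequency
  count(Var, Val, name = "n1") %>%
  group_by(Var) %>%
  mutate(Percent = n1 / sum(n1)) %>%
  ungroup() %>%
  arrange(Var, desc(n1))

head(tte_small)
```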
