
dplyr: how can I write an equivalent of the base table() function and preserve column names?

dplyr is fast, and I would like to use %.% piping a lot. I want a table()-like function (counts by frequency) that preserves the column name and returns a data.frame.

How can I achieve the same result as the code below using only dplyr functions? (Imagine a huge data.table, BIGiris, with 6M rows.)

> out<-as.data.frame(table(iris$Species))
> names(out)[1]<-'Species'
> names(out)[2]<-'my_cnt1'
> out

The output is shown below. Notice that I have to rename column 1 back to its original name. Also, in a dplyr mutate or other call, I would like to specify the name of my new count column.

     Species my_cnt1
1     setosa      50
2 versicolor      50
3  virginica      50

Imagine joining to a table like this (assume the iris data.frame has 6M rows and Species is more like a "species_ID"):

> habitat<-data.frame(Species=c('setosa'),lives_in='sea')

Final join and output (for joining, I need to preserve column names at all times):

> left_join(out,habitat)
Joining by: "Species"
     Species my_cnt1 lives_in
1     setosa      50      sea
2 versicolor      50     <NA>
3  virginica      50     <NA>
> 

For the first part, you can use dplyr like this:

library(dplyr)
out <- iris %>% group_by(Species) %>% summarise(my_cnt1 = n())
out

Source: local data frame [3 x 2]

     Species my_cnt1
1     setosa      50
2 versicolor      50
3  virginica      50

To continue in one chain do this:

out <- iris %>% group_by(Species) %>% summarise(my_cnt1 = n()) %>% left_join(habitat)
out

Source: local data frame [3 x 3]

     Species my_cnt1 lives_in
1     setosa      50      sea
2 versicolor      50       NA
3  virginica      50       NA
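
The chain above lets left_join() guess the join column. As a minimal sketch, assuming a recent dplyr and the habitat table from the question, the key can be named explicitly with by = "Species", which avoids the "Joining by" message. The as.character() step is an added assumption, not part of the original answer: Species is a factor in iris but may be a character column in habitat, so converting first keeps the key types aligned.

```r
library(dplyr)

# Lookup table from the question; only one species has a habitat entry.
habitat <- data.frame(Species = "setosa", lives_in = "sea")

out <- iris %>%
  group_by(Species) %>%
  summarise(my_cnt1 = n()) %>%
  # Align key types before joining (factor in iris, possibly character in habitat).
  mutate(Species = as.character(Species)) %>%
  # Name the join column explicitly instead of relying on auto-detection.
  left_join(habitat, by = "Species")

out
```

Species without a matching row in habitat keep their count and get NA in lives_in.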

By the way, dplyr now uses %>% in place of %.%. It does the same thing; the operator comes from the magrittr package, and dplyr re-exports it.

Or you can simply attach the data frame and then run the table function. This will display the column name too.

> attach(iris)
> table(Species)
 Species
    setosa versicolor  virginica 
        50         50         50
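
If the goal is a data.frame with both column names preserved, base R can also do it without attach(). The following is a sketch rather than part of the original answer: passing a named argument to table() sets the name of the first column, and the responseName argument of as.data.frame() names the count column. with() is shown as a way to get the labelled table without attaching anything.

```r
# Named argument -> first column is called "Species";
# responseName -> count column is called "my_cnt1" instead of "Freq".
out <- as.data.frame(table(Species = iris$Species), responseName = "my_cnt1")
out

# Labelled table without attach(): with() evaluates table() inside iris.
tab <- with(iris, table(Species))
tab
```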

count() may be a convenient option to get behavior similar to table():

iris %>% 
  group_by(Species) %>% 
  count(name="my_cnt1")
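
count() can also take the grouping column directly, which makes the group_by() step unnecessary. A short sketch, assuming a dplyr version in which count() has the name argument (it is used the same way in the snippet above):

```r
library(dplyr)

# count() groups by Species, counts rows, and ungroups in one call;
# name sets the name of the count column (the default is "n").
out <- iris %>%
  count(Species, name = "my_cnt1")

out
```

Because the result is not left grouped, it can be passed straight into a join.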

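For table()-like output with two factors (pivot_wider() comes from the tidyr package, so load it as well). Combinations that never occur show up as NA here, whereas table() reports 0:

library(tidyr)

iris %>% 
  group_by(Species) %>% 
  count(Petal.Width) %>% 
  pivot_wider(names_from = Petal.Width, values_from = n)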
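
To get zeros for combinations that never occur, as table() does, pivot_wider() accepts a values_fill argument. This is a sketch assuming tidyr 1.1.0 or later, where values_fill may be a single value; the name wide is only illustrative. Note that the columns come out in order of first appearance, not sorted as in table().

```r
library(dplyr)
library(tidyr)

wide <- iris %>%
  count(Species, Petal.Width) %>%
  # values_fill = 0L puts an integer zero where a Species/Petal.Width pair has no rows.
  pivot_wider(names_from = Petal.Width, values_from = n, values_fill = 0L)

wide

# Base R cross-tabulation for comparison.
tab <- table(iris$Species, iris$Petal.Width)
```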


 