简体   繁体   中英

How to aggregate categorical data in R?

I have a dataframe which consists of two columns with categorical variables (Better, Similar, Worse). I would like to come up with a table which counts the number of times that these categories appear in the two columns. The dataframe I am using is as follows:

       Category.x  Category.y
1      Better      Better
2      Better      Better
3      Similar     Similar
4      Worse       Similar

I would like to come up with a table like this:

           Category.x    Category.y
Better     2             2
Similar    1             2
Worse      1             0

How would you go about it?

As mentioned in the comments, table is standard for this, like

table(stack(DT))

         ind
values    Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

or

table(value = unlist(DT), cat = names(DT)[col(DT)])

         cat
value     Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

or

with(reshape(DT, direction = "long", varying = 1:2), 
  table(value = Category, cat = time)
)

         cat
value     x y
  Better  2 2
  Similar 1 2
  Worse   1 0
sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
#        Category.x Category.y
#Better           2          2
#Similar          1          2
#Worse            1          0

One dplyr and tidyr possibility could be:

df %>%
 gather(var, val) %>%
 count(var, val) %>%
 spread(var, n, fill = 0)

  val     Category.x Category.y
  <chr>        <dbl>      <dbl>
1 Better           2          2
2 Similar          1          2
3 Worse            1          0

It, first, transforms the data from wide to long format, with column "var" including the variable names and column "val" the corresponding values. Second, it counts per "var" and "val". Finally, it spreads the data into the desired format.

Or with dplyr and reshape2 you can do:

df %>%
 mutate(rowid = row_number()) %>%
 melt(., id.vars = "rowid") %>%
 count(variable, value) %>%
 dcast(value ~ variable, value.var = "n", fill = 0)

    value Category.x Category.y
1  Better          2          2
2 Similar          1          2
3   Worse          1          0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM