How to aggregate categorical data in R?

Question

I have a data frame with two columns of categorical variables (Better, Similar, Worse). I would like to produce a table that counts how many times each category appears in each of the two columns. The data frame I am using is as follows:

       Category.x  Category.y
1      Better      Better
2      Better      Better
3      Similar     Similar
4      Worse       Similar

I would like to end up with a table like this:

           Category.x    Category.y
Better     2             2
Similar    1             2
Worse      1             0

How would you go about it?

Answer 1

As mentioned in the comments, table() is the standard tool for this, for example:

table(stack(DT))

         ind
values    Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

or

table(value = unlist(DT), cat = names(DT)[col(DT)])

         cat
value     Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

or

with(reshape(DT, direction = "long", varying = 1:2), 
  table(value = Category, cat = time)
)

         cat
value     x y
  Better  2 2
  Similar 1 2
  Worse   1 0
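
To reproduce the calls above, the example data has to be rebuilt first. The following is a minimal sketch, not part of the original answer: the name DT is taken from the answer, and stringsAsFactors = FALSE is an assumption, made because stack() appears to keep only plain vector columns and would skip factor columns.

```r
# Rebuild the example data from the question (character columns assumed)
DT <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# stack() reshapes the wide data frame into two columns:
# "values" (the category) and "ind" (the source column name)
long <- stack(DT)

# table() cross-tabulates category against source column
counts <- table(long)
counts
```

If the columns are factors in your data, converting them with as.character() before calling stack() is one way around this.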

Answer 2

# For each column, count how many entries equal each distinct value found anywhere in df1
sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
#        Category.x Category.y
#Better           2          2
#Similar          1          2
#Worse            1          0

Answer 3

One possibility with dplyr and tidyr:

df %>%
 gather(var, val) %>%
 count(var, val) %>%
 spread(var, n, fill = 0)

  val     Category.x Category.y
  <chr>        <dbl>      <dbl>
1 Better           2          2
2 Similar          1          2
3 Worse            1          0

First, it transforms the data from wide to long format, with column "var" holding the variable names and column "val" the corresponding values. Second, it counts the rows for each combination of "var" and "val". Finally, it spreads the counts back into the desired wide format, filling missing combinations with 0.
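
The tidyr documentation marks gather() and spread() as superseded by pivot_longer() and pivot_wider(), so the same three steps can also be sketched with the newer functions. This is an illustrative variant, not the original answer's code; it assumes tidyr 1.1.0 or later and rebuilds the question's data under the name df.

```r
library(dplyr)
library(tidyr)

# Example data from the question (character columns assumed)
df <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Step 1: wide to long, one row per (column name, value) pair
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  # Step 2: count the rows for each combination of var and val
  count(var, val) %>%
  # Step 3: back to wide, filling absent combinations with 0
  pivot_wider(names_from = var, values_from = n, values_fill = 0)

result
```

The column names "var" and "val" are kept from the answer so the two versions can be compared line by line.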

Or, with dplyr and reshape2:

df %>%
 mutate(rowid = row_number()) %>%
 melt(., id.vars = "rowid") %>%
 count(variable, value) %>%
 dcast(value ~ variable, value.var = "n", fill = 0)

    value Category.x Category.y
1  Better          2          2
2 Similar          1          2
3   Worse          1          0
