How to aggregate categorical data in R?

I have a data frame consisting of two columns of categorical variables (Better, Similar, Worse). I would like to produce a table that counts how many times each category appears in each of the two columns. The data frame I am using is as follows:

       Category.x  Category.y
1      Better      Better
2      Better      Better
3      Similar     Similar
4      Worse       Similar

I would like to end up with a table like this:

           Category.x    Category.y
Better     2             2
Similar    1             2
Worse      1             0

How would you go about it?

As mentioned in the comments, table() is the standard tool for this, for example:

table(stack(DT))

         ind
values    Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0
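
For anyone who wants to run the call above, here is a minimal self-contained sketch. The name DT and the data come from the question; stringsAsFactors = FALSE and the variable name counts are assumptions of this sketch. The columns are kept as character vectors because stack() only picks up vector columns, so factor columns would be dropped.

```r
# Rebuild the example data from the question (character columns, not factors)
DT <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# stack() turns the two columns into long form: 'values' and 'ind'
# table() then cross-tabulates value against source column
counts <- table(stack(DT))
print(counts)

# Optional: convert the 2-D table into a plain data frame, one row per category
counts_df <- as.data.frame.matrix(counts)
print(counts_df)
```

The as.data.frame.matrix() step is only needed if a data frame is wanted later on; the table object itself already has the requested layout.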

or

table(value = unlist(DT), cat = names(DT)[col(DT)])

         cat
value     Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

or

with(reshape(DT, direction = "long", varying = 1:2), 
  table(value = Category, cat = time)
)

         cat
value     x y
  Better  2 2
  Similar 1 2
  Worse   1 0

Or, counting the matches for each value directly:

sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
#        Category.x Category.y
#Better           2          2
#Similar          1          2
#Worse            1          0

One dplyr and tidyr possibility could be:

df %>%
 gather(var, val) %>%
 count(var, val) %>%
 spread(var, n, fill = 0)

  val     Category.x Category.y
  <chr>        <dbl>      <dbl>
1 Better           2          2
2 Similar          1          2
3 Worse            1          0

First, it transforms the data from wide to long format, with column "var" holding the variable names and column "val" the corresponding values. Second, it counts the rows for each combination of "var" and "val". Finally, it spreads the counts back into the desired wide format, filling missing combinations with 0.
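
The same three steps can also be sketched with pivot_longer() and pivot_wider(), which tidyr (version 1.0.0 and later) offers as the successors to gather() and spread(). This is a variant, not the code from the answer; the name wide and the explicit data.frame() call are assumptions added to make the example self-contained.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # Step 1: wide to long, one row per (column name, value) pair
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  # Step 2: count rows for each combination of var and val
  count(var, val) %>%
  # Step 3: back to wide, one column per original variable, missing counts as 0
  pivot_wider(names_from = var, values_from = n, values_fill = 0L)

print(wide)
```

Using 0L rather than 0 for values_fill keeps the count columns as integers.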

Or with dplyr and reshape2 you can do:

df %>%
 mutate(rowid = row_number()) %>%
 melt(., id.vars = "rowid") %>%
 count(variable, value) %>%
 dcast(value ~ variable, value.var = "n", fill = 0)

    value Category.x Category.y
1  Better          2          2
2 Similar          1          2
3   Worse          1          0

