Prevent dcast from aggregating data

I have data like this:

rating       title
5            Bean
5            Bean
4            Bean
5            Bean
5            Egg
4            Egg
3            Bacon
2            Bacon

And I want to dcast it like this:

dcast(data, rating ~ title, value.var="rating")

So the titles become the column headers, and the ratings for each title are listed below them. However, every time I run it, dcast aggregates the ratings instead, which I don't want.
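To see why the cast aggregates, it can help to count how many rows fall into each rating/title cell. The following is a minimal base R sketch (the names `data`, `counts`, and `dup_cells` are illustrative and not part of the question): any cell that holds more than one row cannot be shown as a single value, so a cast on `rating ~ title` has to fall back to an aggregate.

```r
# Sample data from the question
data <- data.frame(
  rating = c(5, 5, 4, 5, 5, 4, 3, 2),
  title  = c("Bean", "Bean", "Bean", "Bean", "Egg", "Egg", "Bacon", "Bacon")
)

# Number of rows per (rating, title) combination
counts <- table(rating = data$rating, title = data$title)
print(counts)

# Cells containing more than one row are the ones that force aggregation
dup_cells <- sum(counts > 1)
```

Here the combination of rating 5 and title Bean occurs three times, which is why a unique id is needed on the left-hand side of the formula.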

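library(magrittr)  # provides the %>% pipe used below

read.table(text="rating       title
5            Bean
5            Bean
4            Bean
5            Bean
5            Egg
4            Egg
3            Bacon
2            Bacon", header=TRUE, stringsAsFactors=FALSE) %>%
  # a unique row id keeps spread() from collapsing duplicate ratings
  dplyr::mutate(id = dplyr::row_number()) %>%
  tidyr::spread(title, rating, fill = 0) %>%
  dplyr::select(-id)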
##   Bacon Bean Egg
## 1     0    5   0
## 2     0    5   0
## 3     0    4   0
## 4     0    5   0
## 5     0    0   5
## 6     0    0   4
## 7     3    0   0
## 8     2    0   0
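As a possible variation on the approach above, `tidyr::pivot_wider()` can stand in for `spread()`. This is only a sketch under the assumption that tidyr 1.0 or later is installed; the names `data` and `wide` are illustrative. Note that `pivot_wider()` orders the new columns by first appearance rather than alphabetically.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  rating = c(5, 5, 4, 5, 5, 4, 3, 2),
  title  = c("Bean", "Bean", "Bean", "Bean", "Egg", "Egg", "Bacon", "Bacon")
)

wide <- data %>%
  mutate(id = row_number()) %>%               # unique id so no rows are merged
  pivot_wider(names_from = title,
              values_from = rating,
              values_fill = list(rating = 0)) %>%  # fill empty cells with 0
  select(-id)

print(wide)
```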

It can be done with the dplyr and tidyr packages:

library(dplyr)
library(tidyr)

data<-data.frame(rating=c(5,5,4,5,5,4,3,2),
                 title=c("Bean","Bean","Bean","Bean","Egg","Egg","Bacon","Bacon"))

Code:

data %>%
  mutate(dummy = row_number()) %>%     # unique id per row prevents aggregation
  spread(title, rating, fill = 0) %>%
  select(-dummy) %>%
  t()                                  # transpose so that each title is a row

Output:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
Bacon    0    0    0    0    0    0    3    2
Bean     5    5    4    5    0    0    0    0
Egg      0    0    0    0    5    4    0    0
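For comparison, the same title-by-row matrix can be built without any packages. This is a base R sketch, not the method used in the answer; `titles` and `m` are illustrative names. It fills a zero matrix by (row, column) index pairs.

```r
data <- data.frame(
  rating = c(5, 5, 4, 5, 5, 4, 3, 2),
  title  = c("Bean", "Bean", "Bean", "Bean", "Egg", "Egg", "Bacon", "Bacon")
)

titles <- sort(unique(data$title))

# One row per title, one column per original observation, zeros elsewhere
m <- matrix(0, nrow = length(titles), ncol = nrow(data),
            dimnames = list(titles, NULL))

# Place each rating at (its title's row, its own observation column)
m[cbind(match(data$title, titles), seq_len(nrow(data)))] <- data$rating

print(m)
```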

I'll provide an alternative using data.table, on the chance that your use of dcast is meaningful. In slight contrast to the other answers, I wonder if you intend this for presentation rather than an actual summary, since there is no apparent contextual correlation between the different ratings.

library(data.table)
DT <- fread('rating       title
5            Bean
5            Bean
4            Bean
5            Bean
5            Egg
4            Egg
3            Bacon
2            Bacon')

First we need to assign some "id" that is preserved in the pivot. Also, since this is for presentation (and we likely want blanks in the unused spaces rather than 0 or NA), I'll convert the rating column to character.

DT$rating <- as.character(DT$rating)
DT[, id := seq_len(.N), by="title"]
DT
#    rating title id
# 1:      5  Bean  1
# 2:      5  Bean  2
# 3:      4  Bean  3
# 4:      5  Bean  4
# 5:      5   Egg  1
# 6:      4   Egg  2
# 7:      3 Bacon  1
# 8:      2 Bacon  2


dcast(DT, id ~ title, value.var = "rating", fill = "")[,id := NULL,][]
#    Bacon Bean Egg
# 1:     3    5   5
# 2:     2    5   4
# 3:          4    
# 4:          5    
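The per-title counter used above can also be written with data.table's `rowid()` helper, which numbers rows within each group. The following is a sketch on a freshly built table (the names `DT` and `wide` are reused here for illustration only), keeping the ratings numeric so the empty cells come out as NA.

```r
library(data.table)

DT <- data.table(
  rating = c(5, 5, 4, 5, 5, 4, 3, 2),
  title  = c("Bean", "Bean", "Bean", "Bean", "Egg", "Egg", "Bacon", "Bacon")
)

# rowid(title) counts 1, 2, ... within each title, like seq_len(.N) by title
DT[, id := rowid(title)]

# id ~ title now identifies every row uniquely, so nothing is aggregated
wide <- dcast(DT, id ~ title, value.var = "rating")
wide[, id := NULL]

print(wide)
```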

Note that this is not intended for calculations and analysis, merely for presentation. If you want to keep everything numeric, you'll end up with:

# starting with fresh `DT`, no as.character done
DT[, id := seq_len(.N), by="title"]
dcast(DT, id ~ title, value.var = "rating")[,id := NULL,][]
#    Bacon Bean Egg
# 1:     3    5   5
# 2:     2    5   4
# 3:    NA    4  NA
# 4:    NA    5  NA

Optionally, use dcast(..., fill=0) to replace the NAs with 0s.
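A short sketch of the `fill = 0` variant follows, again on a freshly built numeric table (the names `DT` and `wide0` are illustrative).

```r
library(data.table)

DT <- data.table(
  rating = c(5, 5, 4, 5, 5, 4, 3, 2),
  title  = c("Bean", "Bean", "Bean", "Bean", "Egg", "Egg", "Bacon", "Bacon")
)
DT[, id := seq_len(.N), by = "title"]

# fill = 0 replaces the cells that have no matching row
wide0 <- dcast(DT, id ~ title, value.var = "rating", fill = 0)
wide0[, id := NULL]

print(wide0)
```

Keep in mind that a filled-in 0 is indistinguishable from a real rating of 0, so NA may be the safer choice if the result feeds further calculations.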

(In this case, it is still not clear how the three values on any individual row relate to each other, but perhaps there's meaning in your real data/analysis.)
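One way to make that point concrete: the wide table is really just three independent vectors padded to a common length. A base R sketch under that reading (the names `by_title`, `n_max`, and `padded` are illustrative):

```r
data <- data.frame(
  rating = c(5, 5, 4, 5, 5, 4, 3, 2),
  title  = c("Bean", "Bean", "Bean", "Bean", "Egg", "Egg", "Bacon", "Bacon")
)

# One vector of ratings per title
by_title <- split(data$rating, data$title)

# Indexing past the end of a vector yields NA, which pads the short ones
n_max  <- max(lengths(by_title))
padded <- sapply(by_title, `[`, seq_len(n_max))

print(padded)
```

Each column can be reordered on its own without changing the meaning, which is why the rows carry no shared interpretation.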
