Dcast aggregate from melted data to long format
I did this successfully with one subset of a data frame, but I can't get it to work with my other subset. There is info for about 4000 orders, with a range of 0-8 months and a sentiment score of 0-5.
The goal is to melt the data with ids of 'order' and 'month.of.service' and aggregate the mean sentiment for each month. The data frame looks like this:
order | month | sentiment
123 | 0 | 3
123 | 0 | 4
123 | 1 | 3
124 | 0 | 2
I want it to look like this:
123 | 0 | 3.5
123 | 1 | 3
124 | 0 | 2
Here's the actual code I've used:
sentiment.md <- melt(sentiment, id = c('Related.order', 'Lifespan'))
sentiment.dc <- dcast(sentiment.md, Related.order + Lifespan ~ value, sum)
> head(sentiment.md)
Related.order Lifespan variable value
1 12771 0 Sentiment 5
2 11188 1 Sentiment 3
3 12236 3 Sentiment 5
4 12925 0 Sentiment 5
5 12151 3 Sentiment 5
6 12338 0 Sentiment 5
> head(sentiment.dc)
Related.order Lifespan 0 1 2 3 4 5
1 4976 0 NaN NaN NaN 3 NaN NaN
2 4976 1 NaN NaN NaN 3 NaN NaN
3 4976 2 NaN NaN NaN NaN 4 NaN
4 4976 3 NaN NaN NaN NaN 4 NaN
5 4976 4 NaN NaN NaN NaN 4 NaN
6 4976 5 NaN NaN NaN NaN 4 NaN
To further demonstrate what I want, here's the exact same thing, in the format that I want, using the only other column in the data frame, interactions:
interactions.md <- melt(interactions, id = c('Related.order', 'Lifespan'))
interactions.dc <- dcast(interactions.md, Related.order + Lifespan ~ value, sum)
> head(interactions.md)
Related.order Lifespan variable value
1 12771 0 Event 1
2 11188 1 Event 1
3 12236 3 Event 1
4 12925 0 Event 1
5 12151 3 Event 1
6 12338 0 Event 1
> head(interactions.dc)
Related.order Lifespan 1
1 4976 0 6
2 4976 1 3
3 4976 2 3
4 4976 3 1
5 4976 4 2
6 4976 5 2
I thought maybe I was using the wrong structures or something, but I haven't been able to identify anything. For reference, here's a screenshot from RStudio:
Perhaps you want some sort of aggregation/collapsing rather than a dcast?
library(data.table)
setDT(df)[, .(sentiment = mean(sentiment)), by = .(order, month)]
#    order month sentiment
#1:   123     0       3.5
#2:   123     1       3.0
#3:   124     0       2.0
If you do want to do it with dcast, you could try:
dcast(df, order + month ~ ., mean, value.var = "sentiment")
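The wide result in the question comes from putting value on the right-hand side of the formula, which creates one column per distinct sentiment score. As a rough sketch of the alternative, assuming the reshape2 package is installed and reusing the question's column names with made-up values, casting on variable and aggregating with mean keeps the melted data in long format:

```r
library(reshape2)

# Toy data using the column names from the question (values are made up)
sentiment <- data.frame(
  Related.order = c(123, 123, 123, 124),
  Lifespan      = c(0, 0, 1, 0),
  Sentiment     = c(3, 4, 3, 2)
)

sentiment.md <- melt(sentiment, id.vars = c("Related.order", "Lifespan"))

# `~ variable` yields one column per measured variable (here: Sentiment);
# `~ value` would instead yield one column per distinct score (0, 1, ..., 5)
sentiment.dc <- dcast(sentiment.md, Related.order + Lifespan ~ variable,
                      fun.aggregate = mean)
print(sentiment.dc)
```

The sum-based call happened to look right for interactions only because every value there is 1, so casting on value produced a single column.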
Or with dplyr:
library(dplyr)
df %>% group_by(order, month) %>% summarise(sentiment = mean(sentiment))
These are just some of the many ways to aggregate in R.
Data:
df <- structure(list(order = c(123L, 123L, 123L, 124L), month = c(0L,
0L, 1L, 0L), sentiment = c(3L, 4L, 3L, 2L)), .Names = c("order",
"month", "sentiment"), row.names = c(NA, -4L), class = "data.frame")
With base R, use aggregate.
aggregate(sentiment ~ month + order, sentiment, mean, na.rm = TRUE)[c(2, 1, 3)]
# order month sentiment
#1 123 0 3.5
#2 123 1 3.0
#3 124 0 2.0
Data:
sentiment <- read.table(text = "
order | month | sentiment
123 | 0 | 3
123 | 0 | 4
123 | 1 | 3
124 | 0 | 2
", header = TRUE, sep = "|")
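The [c(2, 1, 3)] in the aggregate call only reorders the columns, since the formula lists month before order. A possible variant, sketched here in base R on the same toy values, puts order first in the formula and then sorts the rows explicitly; the name res is just an illustrative placeholder:

```r
# Same toy values as above, built directly so the snippet stands alone
sentiment <- data.frame(
  order     = c(123L, 123L, 123L, 124L),
  month     = c(0L, 0L, 1L, 0L),
  sentiment = c(3L, 4L, 3L, 2L)
)

# Grouping columns appear in the order given in the formula
res <- aggregate(sentiment ~ order + month, data = sentiment, FUN = mean)

# Sort rows by order, then month, and reset the row names
res <- res[order(res$order, res$month), ]
rownames(res) <- NULL
print(res)
```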