
Dcast aggregate from melted data to long format

I did this successfully with one subset of a data frame, but I can't get it to work with my other subset. The data covers about 4000 orders, with a month of service ranging from 0 to 8 and a sentiment score from 0 to 5.

The goal is to melt the data with ids of 'order' and 'month.of.service' and aggregate the mean sentiment for each order and month. The data frame looks like this:

order | month | sentiment
123   |   0   |     3
123   |   0   |     4
123   |   1   |     3
124   |   0   |     2

I want it to look like this:

123   |   0   |    3.5
123   |   1   |    3
124   |   0   |    2

Here's the actual code I've used:

sentiment.md <- melt(sentiment, id = c('Related.order', 'Lifespan'))
sentiment.dc <- dcast(sentiment.md, Related.order + Lifespan ~ value, sum)

> head(sentiment.md)
  Related.order Lifespan  variable value
1         12771        0 Sentiment     5
2         11188        1 Sentiment     3
3         12236        3 Sentiment     5
4         12925        0 Sentiment     5
5         12151        3 Sentiment     5
6         12338        0 Sentiment     5

> head(sentiment.dc)
  Related.order Lifespan   0   1   2   3   4   5
1          4976        0 NaN NaN NaN   3 NaN NaN
2          4976        1 NaN NaN NaN   3 NaN NaN
3          4976        2 NaN NaN NaN NaN   4 NaN
4          4976        3 NaN NaN NaN NaN   4 NaN
5          4976        4 NaN NaN NaN NaN   4 NaN
6          4976        5 NaN NaN NaN NaN   4 NaN

To demonstrate what I want it to look like further, here's the exact same thing using the only other column in the data frame in the format that I want, interactions:

interactions.md <- melt(interactions, id = c('Related.order', 'Lifespan'))
interactions.dc <- dcast(interactions.md, Related.order + Lifespan ~ value, sum)

> head(interactions.md)
  Related.order Lifespan variable value
1         12771        0    Event     1
2         11188        1    Event     1
3         12236        3    Event     1
4         12925        0    Event     1
5         12151        3    Event     1
6         12338        0    Event     1
> head(interactions.dc)
  Related.order Lifespan 1
1          4976        0 6
2          4976        1 3
3          4976        2 3
4          4976        3 1
5          4976        4 2
6          4976        5 2

I thought maybe I was using the wrong structures, but I haven't been able to identify anything. For reference, here's a screenshot from RStudio:

[screenshot]

Thanks in advance for your help.

Perhaps you want some sort of aggregation / collapsing rather than a dcast?

library(data.table)
setDT(df)[, .(sentiment = mean(sentiment)), by = .(order, month)]
#   order month sentiment
#1:   123     0       3.5
#2:   123     1       3.0
#3:   124     0       2.0

If you do want to do it with dcast, you could try:

dcast(df, order + month ~ ., mean, value.var = "sentiment")
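The same idea can be applied to the already-melted frame from the question. The sketch below assumes reshape2 and uses toy data with the question's column names (Related.order, Lifespan, Sentiment); the values are made up for illustration. Putting `value` on the right-hand side of the formula makes one column per distinct sentiment score, which seems to be why the sentiment output spreads into columns 0 to 5 while the interactions output, where every value is 1, does not. Using `variable` there instead keeps a single Sentiment column:

```r
library(reshape2)

# Toy data shaped like the question's frame (values are illustrative only).
sentiment <- data.frame(
  Related.order = c(123L, 123L, 123L, 124L),
  Lifespan      = c(0L, 0L, 1L, 0L),
  Sentiment     = c(3, 4, 3, 2)
)

# Long format: one row per (order, month, variable) observation.
sentiment.md <- melt(sentiment, id.vars = c("Related.order", "Lifespan"))

# `~ variable` yields one column per measured variable (here only Sentiment);
# duplicate order/month rows are collapsed with mean().
sentiment.dc <- dcast(sentiment.md, Related.order + Lifespan ~ variable, mean)
sentiment.dc
```

With this toy data the result has one row per order and month, and the Sentiment column holds the per-group mean.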

Or with dplyr:

library(dplyr)
df %>% group_by(order, month) %>% summarise(sentiment = mean(sentiment))

These are just some of the many examples of aggregating in R.


Data:

df <- structure(list(order = c(123L, 123L, 123L, 124L), month = c(0L, 
0L, 1L, 0L), sentiment = c(3L, 4L, 3L, 2L)), .Names = c("order", 
"month", "sentiment"), row.names = c(NA, -4L), class = "data.frame")

With base R, use aggregate.

aggregate(sentiment ~ month + order, sentiment, mean, na.rm = TRUE)[c(2, 1, 3)]
#  order month sentiment
#1   123     0       3.5
#2   123     1       3.0
#3   124     0       2.0

DATA.

sentiment <- read.table(text = "
order | month | sentiment
123   |   0   |     3
123   |   0   |     4
123   |   1   |     3
124   |   0   |     2
", header = TRUE, sep = "|")
