Merge Time Series Data by user and a specific date

I have a data frame like the table below. It is a time series per user.

User Date Age SentimentScore
a 9.19 20 1
a 11.20 20 2
a 12.10 20 3
b 9.30 19 1
b 10.1 19 4
c 12.1 21 5

I hope to generate a table like this one. Trial 1 is the mean sentiment score before a certain date (e.g. Nov 7th). Trial 2 is the mean sentiment score after that date.

User Age trial Mean Sentiment Score
a    20  1     1   --> (mean SentimentScore before 11.7)
a    20  2     2.5 --> (mean SentimentScore after 11.7)
b    19  1     2.5 --> (mean SentimentScore before 11.7)
c    21  1     NA  --> (mean SentimentScore before 11.7)

library(data.table)

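# Label each row: trial 1 up to and including the cut-off date, trial 2 after it
dt[, trial := fcase(Date <= as.Date("2021-11-07"), 1,
                    Date >  as.Date("2021-11-07"), 2)]

# Mean sentiment score per user, age and trial
dt[, .(Mean.Sentiment.Score = mean(SentimentScore)),
   by = .(User, Age, trial)]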

Result:

   User Age trial Mean.Sentiment.Score
1:    a  20     1                  1.0
2:    a  20     2                  2.5
3:    b  19     1                  2.5
4:    c  21     2                  5.0
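
The expected table in the question shows an NA for user c in trial 1, whereas the grouped result simply has no row for combinations without data. If a row for every user/trial pair is wanted, one possible sketch is to build the full grid and join the means onto it. The names `cutoff`, `res`, `grid` and `full` are made up for this illustration, and the sample data is the hand-typed data from this answer:

```r
library(data.table)

# Hand-typed sample data, same values as in this answer
dt <- data.table(
    User = c("a", "a", "a", "b", "b", "c"),
    Date = as.Date(c("2021-09-19", "2021-11-20", "2021-12-10", "2021-09-30",
                     "2021-10-01", "2021-12-01")),
    Age = c(20, 20, 20, 19, 19, 21),
    SentimentScore = c(1, 2, 3, 1, 4, 5)
)

cutoff <- as.Date("2021-11-07")  # assumed cut-off date

# Trial 1 = on or before the cut-off, trial 2 = after it
dt[, trial := fifelse(Date <= cutoff, 1L, 2L)]

# Means for the user/trial pairs that actually have data
res <- dt[, .(Mean.Sentiment.Score = mean(SentimentScore)),
          by = .(User, Age, trial)]

# One row per user and trial, whether or not data exists
grid <- unique(dt[, .(User, Age)])[, .(trial = 1:2), by = .(User, Age)]

# Join the means onto the grid; pairs without data get NA
full <- res[grid, on = .(User, Age, trial)]
full
```

With this data, `full` should have six rows, with NA for user b in trial 2 and user c in trial 1.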

Data (I typed it by hand; you should provide it with dput in your question):

library(data.table)
dt <- data.table(
    User = c("a", "a", "a", "b", "b", "c"),
    Date = as.Date(c("2021-09-19", "2021-11-20", "2021-12-10", "2021-09-30",
                     "2021-10-01", "2021-12-01")),
    Age = c(20, 20, 20, 19, 19, 21),
    SentimentScore = c(1, 2, 3, 1, 4, 5)
)
dt
#>    User       Date Age SentimentScore
#> 1:    a 2021-09-19  20              1
#> 2:    a 2021-11-20  20              2
#> 3:    a 2021-12-10  20              3
#> 4:    b 2021-09-30  19              1
#> 5:    b 2021-10-01  19              4
#> 6:    c 2021-12-01  21              5

Created on 2021-04-28 by the reprex package (v2.0.0)
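
The sample data above uses real Date values, while the table in the question shows dates written as month.day (for example 9.19). If the actual column looks like that, it would need converting first. A minimal base R sketch, assuming the values are stored as character strings and that every row falls in 2021 (the year is an assumption taken from the sample data above; `raw_date` and `parsed` are illustrative names):

```r
# Assumption: dates are character strings in "month.day" form, all in 2021.
# If the column were numeric, "11.20" would already have collapsed to 11.2
# and could not be told apart from November 2nd.
raw_date <- c("9.19", "11.20", "12.10", "9.30", "10.1", "12.1")

# Prepend the assumed year and parse as year.month.day
parsed <- as.Date(paste0("2021.", raw_date), format = "%Y.%m.%d")
parsed
```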

Is this what you are trying to do?

library(dplyr)
# Trial 2 = after the cut-off date (Nov 7th), Trial 1 = on or before it
df %>% mutate(Date = as.Date(Date)) %>%
  group_by(User, Trial = ifelse(Date > as.Date("2021-11-07"), 2, 1)) %>%
  summarise(Age = mean(Age),
            SentimentScore = mean(SentimentScore), .groups = 'drop')

# A tibble: 4 x 4
  User  Trial   Age SentimentScore
  <chr> <dbl> <dbl>          <dbl>
1 a         1    20            1  
2 a         2    20            2.5
3 b         1    19            2.5
4 c         2    21            5  

data used

df <- read.table(text = "User   Date    Age SentimentScore
a   2021-09-19  20  1
a   2021-11-20  20  2
a   2021-12-10  20  3
b   2021-09-30  19  1
b   2021-10-01  19  4
c   2021-12-01  21  5", header = TRUE)
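
For comparison, the same grouping can probably be done without any packages. This is only a sketch using base R's `aggregate`, comparing the full date against an assumed cut-off of 2021-11-07; `cutoff` and `res` are illustrative names:

```r
# Same sample data as above
df <- read.table(text = "User   Date    Age SentimentScore
a   2021-09-19  20  1
a   2021-11-20  20  2
a   2021-12-10  20  3
b   2021-09-30  19  1
b   2021-10-01  19  4
c   2021-12-01  21  5", header = TRUE)

cutoff <- as.Date("2021-11-07")  # assumed cut-off date

# Convert to Date and label rows: 1 = on or before the cut-off, 2 = after it
df$Date <- as.Date(df$Date)
df$Trial <- ifelse(df$Date > cutoff, 2, 1)

# Mean sentiment score per user, age and trial
res <- aggregate(SentimentScore ~ User + Age + Trial, data = df, FUN = mean)

# Sort by user and trial for readability
res <- res[order(res$User, res$Trial), ]
res
```

Like the grouped summaries above, `aggregate` only returns user/trial combinations that have at least one row.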
