
Merge Time Series Data by user and a specific date

I have a data frame like the table below. It is a time series per user.

User Date  Age SentimentScore
a    9.19  20  1
a    11.20 20  2
a    12.10 20  3
b    9.30  19  1
b    10.1  19  4
c    12.1  21  5

I would like to generate a table like this. Trial 1 is the mean sentiment score before a given date (for example, November 7), and Trial 2 is the mean sentiment score after that date.

User Age Trial    Mean Sentiment Score
a    20  1          1   --> (mean SentimentScore before 11.7)
a    20  2          2.5 --> (mean SentimentScore after 11.7)
b    19  1          2.5 --> (mean SentimentScore before 11.7)
c    21  1          NA  --> (mean SentimentScore before 11.7)

library(data.table)

dt[, trial := fcase(Date <= as.Date("2021-11-07"), 1,
                    Date >  as.Date("2021-11-07"), 2)]

dt[,.( Mean.Sentiment.Score = mean(SentimentScore) ),
   by = .(User,Age,trial)]

Result:

   User Age trial Mean.Sentiment.Score
1:    a  20     1                  1.0
2:    a  20     2                  2.5
3:    b  19     1                  2.5
4:    c  21     2                  5.0
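Note that the result above has no row for a user/trial combination without observations, whereas the expected table in the question shows an NA row for user c in trial 1. If you need those rows, here is a minimal base R sketch. It assumes the same data and the same cutoff of 2021-11-07; the names `cutoff`, `grid` and `full` are just illustrative, and it fills every missing user/trial pair, which may be more than the question intends.

```r
# Same data as in this answer, as a plain data.frame
df <- data.frame(
  User = c("a", "a", "a", "b", "b", "c"),
  Date = as.Date(c("2021-09-19", "2021-11-20", "2021-12-10",
                   "2021-09-30", "2021-10-01", "2021-12-01")),
  Age = c(20, 20, 20, 19, 19, 21),
  SentimentScore = c(1, 2, 3, 1, 4, 5)
)

# Assumed cutoff: trial 1 is on or before it, trial 2 is after it
cutoff <- as.Date("2021-11-07")
df$trial <- ifelse(df$Date <= cutoff, 1L, 2L)

# Mean score per user and trial (only combinations that occur)
means <- aggregate(SentimentScore ~ User + Age + trial, data = df, FUN = mean)

# Cross join: every user paired with both trials
grid <- merge(unique(df[c("User", "Age")]), data.frame(trial = 1:2), by = NULL)

# Left join so that missing combinations show up as NA
full <- merge(grid, means, by = c("User", "Age", "trial"), all.x = TRUE)
full <- full[order(full$User, full$trial), ]
full
```

Combinations with no data (here b in trial 2 and c in trial 1) come back with `NA` in `SentimentScore`.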

Data (I typed it in manually; you should provide dput output in your question):

library(data.table)
dt <- data.table(
    User = c("a", "a", "a", "b", "b", "c"),
    Date = as.Date(c("2021-09-19", "2021-11-20", "2021-12-10", "2021-09-30",
                     "2021-10-01", "2021-12-01")),
    Age = c(20, 20, 20, 19, 19, 21),
    SentimentScore = c(1, 2, 3, 1, 4, 5)
)
dt
#>    User       Date Age SentimentScore
#> 1:    a 2021-09-19  20              1
#> 2:    a 2021-11-20  20              2
#> 3:    a 2021-12-10  20              3
#> 4:    b 2021-09-30  19              1
#> 5:    b 2021-10-01  19              4
#> 6:    c 2021-12-01  21              5

Created on 2021-04-28 by the reprex package (v2.0.0)
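The question writes its dates as month.day values such as 9.19, with no year, so they have to be converted before any comparison with a cutoff date. A small sketch of that conversion, assuming the values are stored as character and that the year is 2021 (the question does not state one; 2021 is simply what the data above uses):

```r
# Dates as written in the question, kept as character so that a
# trailing zero (e.g. "10.10") cannot be lost by numeric conversion
raw <- c("9.19", "11.20", "12.10", "9.30", "10.1", "12.1")

# Assumed year: not given in the question
year <- 2021

# Build "2021.9.19"-style strings and parse them as year.month.day
parsed <- as.Date(paste(year, raw, sep = "."), format = "%Y.%m.%d")
parsed
```

If the column was read in as numeric instead, a value like 10.10 would already have collapsed to 10.1, so it is worth checking how the dates were imported.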

Is this what you are trying to do?

library(dplyr)
# Trial 1: on or before 2021-11-07; Trial 2: after it.
# Compare whole dates rather than day and month separately.
df %>% mutate(Date = as.Date(Date)) %>%
  group_by(User, Trial = ifelse(Date > as.Date("2021-11-07"), 2, 1)) %>%
  summarise(Age = mean(Age),
            SentimentScore = mean(SentimentScore), .groups = 'drop')

# A tibble: 4 x 4
  User  Trial   Age SentimentScore
  <chr> <dbl> <dbl>          <dbl>
1 a         1    20            1  
2 a         2    20            2.5
3 b         1    19            2.5
4 c         2    21            5  

Data used

df <- read.table(text = "User   Date    Age SentimentScore
a   2021-09-19  20  1
a   2021-11-20  20  2
a   2021-12-10  20  3
b   2021-09-30  19  1
b   2021-10-01  19  4
c   2021-12-01  21  5", header = TRUE)

