Merge Time Series Data by user and a specific date
I have a data frame like the table below. It is a time series for each user.
User | Date | Age | SentimentScore |
---|---|---|---|
a | 9.19 | 20 | 1 |
a | 11.20 | 20 | 2 |
a | 12.10 | 20 | 3 |
b | 9.30 | 19 | 1 |
b | 10.1 | 19 | 4 |
c | 12.1 | 21 | 5 |
I would like to generate a table like this. Trial 1 is the mean sentiment score before a given date (for example, November 7). Trial 2 is the mean sentiment score after that date.
User  Age  trial  Mean Sentiment Score
a     20   1      1    --> (mean SentimentScore before 11.7)
a     20   2      2.5  --> (mean SentimentScore after 11.7)
b     19   1      2.5  --> (mean SentimentScore before 11.7)
c     21   1      NA   --> (mean SentimentScore before 11.7)
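library(data.table)
# trial 1: on or before the cutoff date; trial 2: after it
dt[, trial := fcase(Date <= as.Date("2021-11-07"), 1,
                    Date >  as.Date("2021-11-07"), 2)]
# mean sentiment score per user, age and trial
dt[, .(Mean.Sentiment.Score = mean(SentimentScore)),
   by = .(User, Age, trial)]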
Result:
   User Age trial Mean.Sentiment.Score
1:    a  20     1                  1.0
2:    a  20     2                  2.5
3:    b  19     1                  2.5
4:    c  21     2                  5.0
Data (I typed it in by hand; you should provide `dput` output in your question):
library(data.table)
dt <- data.table(
  User = c("a", "a", "a", "b", "b", "c"),
  Date = as.Date(c("2021-09-19", "2021-11-20", "2021-12-10", "2021-09-30",
                   "2021-10-01", "2021-12-01")),
  Age = c(20, 20, 20, 19, 19, 21),
  SentimentScore = c(1, 2, 3, 1, 4, 5)
)
dt
#>    User       Date Age SentimentScore
#> 1:    a 2021-09-19  20              1
#> 2:    a 2021-11-20  20              2
#> 3:    a 2021-12-10  20              3
#> 4:    b 2021-09-30  19              1
#> 5:    b 2021-10-01  19              4
#> 6:    c 2021-12-01  21              5
Created on 2021-04-28 by the reprex package (v2.0.0)
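The table requested in the question lists an `NA` row for user c in trial 1, whereas a grouped mean simply drops user/trial combinations that have no observations. The following is a minimal sketch of one way to keep such rows, assuming every user should get a row for both trials (the question only shows this for user c, so that is a guess). The names `cutoff`, `grid` and `full` are illustrative and not part of the answer above.

```r
library(data.table)

dt <- data.table(
  User = c("a", "a", "a", "b", "b", "c"),
  Date = as.Date(c("2021-09-19", "2021-11-20", "2021-12-10",
                   "2021-09-30", "2021-10-01", "2021-12-01")),
  Age = c(20, 20, 20, 19, 19, 21),
  SentimentScore = c(1, 2, 3, 1, 4, 5)
)

# Assumed cutoff date: 7 November 2021
cutoff <- as.Date("2021-11-07")

# Trial 1 = on or before the cutoff, trial 2 = after it
dt[, trial := fifelse(Date <= cutoff, 1L, 2L)]

# Mean score for the user/trial combinations that actually occur
means <- dt[, .(Mean.Sentiment.Score = mean(SentimentScore)),
            by = .(User, Age, trial)]

# Every user paired with both trials (hypothetical helper table)
grid <- unique(dt[, .(User, Age)])[, .(trial = 1:2), by = .(User, Age)]

# Joining onto the grid keeps empty combinations and fills them with NA
full <- means[grid, on = .(User, Age, trial)]
full
```

With the sample data this should give six rows: the four means from the grouped result, plus `NA` for user b in trial 2 and user c in trial 1.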
Is this what you are trying to do?
library(dplyr)
df %>% mutate(Date = as.Date(Date)) %>%
  # compare the full date with the cutoff, not its day and month parts separately
  group_by(User, Trial = ifelse(Date > as.Date("2021-11-07"), 2, 1)) %>%
  summarise(Age = mean(Age),
            SentimentScore = mean(SentimentScore), .groups = 'drop')
# A tibble: 4 x 4
  User  Trial   Age SentimentScore
  <chr> <dbl> <dbl>          <dbl>
1 a         1    20            1
2 a         2    20            2.5
3 b         1    19            2.5
4 c         2    21            5
Data used:
df <- read.table(text = "User Date Age SentimentScore
a 2021-09-19 20 1
a 2021-11-20 20 2
a 2021-12-10 20 3
b 2021-09-30 19 1
b 2021-10-01 19 4
c 2021-12-01 21 5", header = TRUE)
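For readers who would rather not depend on an extra package, the same before/after split can probably be done in base R as well. This is only a sketch under the assumption that the cutoff is 7 November 2021 and that rows on the cutoff date itself belong to trial 1; `cutoff` and `res` are illustrative names.

```r
df <- read.table(text = "User Date Age SentimentScore
a 2021-09-19 20 1
a 2021-11-20 20 2
a 2021-12-10 20 3
b 2021-09-30 19 1
b 2021-10-01 19 4
c 2021-12-01 21 5", header = TRUE)

# Parse the date column so it can be compared with the cutoff
df$Date <- as.Date(df$Date)

# Assumed cutoff date
cutoff <- as.Date("2021-11-07")

# Trial 2 = strictly after the cutoff, trial 1 = everything else
df$Trial <- ifelse(df$Date > cutoff, 2, 1)

# Mean sentiment score per user, age and trial
res <- aggregate(SentimentScore ~ User + Age + Trial, data = df, FUN = mean)

# Sort for readability
res <- res[order(res$User, res$Trial), ]
res
```

Like the grouped summaries above, `aggregate()` only returns the user/trial combinations that occur in the data, so user c appears once, under trial 2.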