I need to convert a column of dates into a time series
I have the following kind of dataset. There are three areas, and in each area it rained on exactly one date and hailed on exactly one date.
area<-c("A","B","C")
rain<-c("1994/08/01","1994/08/01","1994/08/03")
hail<-c("1994/08/03","1994/08/04","1994/08/05")
data1<-as.data.frame(cbind(area,rain,hail))
data1
The data I have looks like this:
+------+------------+------------+
| area | rain       | hail       |
+------+------------+------------+
| A    | 1994/08/01 | 1994/08/03 |
| B    | 1994/08/01 | 1994/08/04 |
| C    | 1994/08/03 | 1994/08/05 |
+------+------------+------------+
I want to convert this into a time series for each area, something like long-format data:
date<-as.Date(c("1994/08/01","1994/08/02","1994/08/03","1994/08/04","1994/08/05"))
date<-c(date,date,date)
area<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C")
rain<-c(1,0,0,0,0,1,0,0,0,0,0,0,1,0,0)
hail<-c(0,0,1,0,0,0,0,0,1,0,0,0,0,0,1)
data2<-as.data.frame(date)
data2<-cbind(data2,area,rain,hail)
data2
The data I want looks something like this:
+------------+------+------+------+
| date       | area | rain | hail |
+------------+------+------+------+
| 1994-08-01 | A    | 1    | 0    |
| 1994-08-02 | A    | 0    | 0    |
| 1994-08-03 | A    | 0    | 1    |
| 1994-08-04 | A    | 0    | 0    |
| 1994-08-05 | A    | 0    | 0    |
| 1994-08-01 | B    | 1    | 0    |
| 1994-08-02 | B    | 0    | 0    |
| 1994-08-03 | B    | 0    | 0    |
| 1994-08-04 | B    | 0    | 1    |
| 1994-08-05 | B    | 0    | 0    |
+------------+------+------+------+
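This is a very unconventional layout, and I doubt there is a ready-made dplyr function that does it, but any help is greatly appreciated. If any other details are needed, please ask.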
You can do this with tidyverse functions:
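library(dplyr)

data1 %>%
  # parse rain and hail as Dates, then attach the full daily range to every row
  mutate(across(c(rain, hail), lubridate::ymd),
         date = list(seq(min(rain, hail), max(rain, hail), 'day'))) %>%
  # one row per area and day
  tidyr::unnest(date) %>%
  # 1 where the event date equals the row's date, 0 otherwise
  mutate(across(c(rain, hail), ~+(. == date)))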
# A tibble: 15 x 4
# area rain hail date
# <chr> <int> <int> <date>
# 1 A 1 0 1994-08-01
# 2 A 0 0 1994-08-02
# 3 A 0 1 1994-08-03
# 4 A 0 0 1994-08-04
# 5 A 0 0 1994-08-05
# 6 B 1 0 1994-08-01
# 7 B 0 0 1994-08-02
# 8 B 0 0 1994-08-03
# 9 B 0 1 1994-08-04
#10 B 0 0 1994-08-05
#11 C 0 0 1994-08-01
#12 C 0 0 1994-08-02
#13 C 1 0 1994-08-03
#14 C 0 0 1994-08-04
#15 C 0 1 1994-08-05
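Convert the rain and hail columns to Date, create a daily sequence between the minimum and maximum dates, unnest it to get the data in long format, and assign 1/0 depending on whether each event falls on that row's date.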
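The 1/0 step relies on unary plus coercing a logical vector to integer. A minimal base R sketch of just that step, using a made-up date range and event date rather than data1:

```r
# Illustrative values only (hypothetical, not taken from data1).
dates <- seq(as.Date("1994-08-01"), as.Date("1994-08-05"), by = "day")
event <- as.Date("1994-08-03")

# dates == event is a logical vector; unary plus turns TRUE/FALSE into 1/0
flag <- +(dates == event)
flag
```

Here flag is 1 only in the position whose date matches event, which is what the last mutate() does for both rain and hail.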
Use pivot_longer and then pivot_wider from tidyr.
You can use complete to fill in the full range of dates for each area.
library(dplyr)
library(tidyr)

data1 %>%
  pivot_longer(c(rain, hail), names_to = "weather", values_to = "date",
               values_transform = list(date = as.Date)) %>%
  mutate(min_date = min(date),
         max_date = max(date)) %>%
  group_by(area) %>%
  complete(date = seq.Date(first(min_date), last(max_date), by = "day")) %>%
  pivot_wider(names_from = weather,
              values_from = weather,
              values_fn = length,
              values_fill = 0) %>%
  select(date, area, rain, hail)
# A tibble: 15 x 4
# Groups: area [3]
date area rain hail
<date> <fct> <int> <int>
1 1994-08-01 A 1 0
2 1994-08-02 A 0 0
3 1994-08-03 A 0 1
4 1994-08-04 A 0 0
5 1994-08-05 A 0 0
6 1994-08-01 B 1 0
7 1994-08-02 B 0 0
8 1994-08-03 B 0 0
9 1994-08-04 B 0 1
10 1994-08-05 B 0 0
11 1994-08-01 C 0 0
12 1994-08-02 C 0 0
13 1994-08-03 C 1 0
14 1994-08-04 C 0 0
15 1994-08-05 C 0 1
If you prefer base R and want to learn more about what the base functions can do, you could try:
data1[-1] <- lapply(data1[-1], as.Date) # convert the rain and hail columns to Date
data2 <- merge(data1, do.call(seq,c(as.list(do.call(range,data1[-1])), by = 1))) # cross join data1 with every day in the overall date range
names(data2)[4] <- "date" # rename the new column, which merge() calls y
data2[2:3] <- +sapply(data2[2:3],`==`,data2[,4]) # 1 where the event date equals the row's date, 0 otherwise
data.frame(data2[order(data2$area),], row.names = NULL) # sort by area and reset the row names
area rain hail date
1 A 1 0 1994-08-01
2 A 0 0 1994-08-02
3 A 0 1 1994-08-03
4 A 0 0 1994-08-04
5 A 0 0 1994-08-05
6 B 1 0 1994-08-01
7 B 0 0 1994-08-02
8 B 0 0 1994-08-03
9 B 0 1 1994-08-04
10 B 0 0 1994-08-05
11 C 0 0 1994-08-01
12 C 0 0 1994-08-02
13 C 1 0 1994-08-03
14 C 0 0 1994-08-04
15 C 0 1 1994-08-05
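The merge() call packs a lot into one line. As a rough sketch of the idea, assuming a small made-up input (areas and days below are hypothetical names, not part of the answer): merging a data frame with something that shares no column names pairs every row with every value.

```r
# Hypothetical toy input, only to show the cross join behaviour of merge().
areas <- data.frame(area = c("A", "B"))
days  <- seq(as.Date("1994-08-01"), as.Date("1994-08-02"), by = "day")

# No common column names, so merge() returns every area/day combination
grid <- merge(areas, days)
nrow(grid)
```

With two areas and two days, grid has four rows. The answer does the same thing with data1 and the full daily sequence, which is why 3 areas and 5 days give 15 rows.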