
Split a data frame by date in R

I have a dataset that has a date column and a data column.

The number of rows per date is not always the same: some dates have only 10 rows instead of 24. The dataset looks like this:

Date hour value
10-06-2000 1 4
2 5
3 7
4 7
5 8
6 1
7 7
8 2
9 3
10 4
11 5
12 7
13 8
14 9
15 10
16 12
17 1
18 4
19 7
20 9
21 10
22 7
23 8
24 9
11-06-2000 9 1
10 4
11 5
12 7
13 8
14 9
15 10
16 12
17 1
18 4
19 7
20 9
21 10
22 7
23 8
24 9

I want to split the dataset into multiple data frames by date. However, the date variable is filled in only on the first row of each date; the elements between two dates are empty. When I tried the split function in base R, it returned only the first row of each date:

$`2000-06-11`
            V1 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
264 2000-06-11 2  7  8  3   2   3   4   7   4   5   8

$`2000-06-12`
            V1 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
278 2000-06-12 2  2  9  6   2   3   1   0   1   4   4

Sorry for asking such a simple question. I tried using a for loop to handle this problem, but the dataset is so large that it runs very slowly.

If you know for certain that the data are sorted in the right order, you could use tidyr::fill():

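library(tidyr)

# Example data: the date appears only on the first row of each day
df <- data.frame(
  Date = c("10-06-2000", rep(NA, 5), "11-06-2000", rep(NA, 12)),
  hour = c(4:9, 1:13),
  value = 1:19
  )

# Carry each date down into the NA rows below it
df_filled <- fill(df, Date, .direction = "down")

# Split into a named list with one data frame per date
split(df_filled, df_filled$Date)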


$`10-06-2000`
        Date hour value
1 10-06-2000    4     1
2 10-06-2000    5     2
3 10-06-2000    6     3
4 10-06-2000    7     4
5 10-06-2000    8     5
6 10-06-2000    9     6

$`11-06-2000`
         Date hour value
7  11-06-2000    1     7
8  11-06-2000    2     8
9  11-06-2000    3     9
10 11-06-2000    4    10
11 11-06-2000    5    11
12 11-06-2000    6    12
13 11-06-2000    7    13
14 11-06-2000    8    14
15 11-06-2000    9    15
16 11-06-2000   10    16
17 11-06-2000   11    17
18 11-06-2000   12    18
19 11-06-2000   13    19
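
Note that fill() only fills NA values, so if the blank dates were read in as empty strings rather than NA, they need to be converted first. As a minimal base R sketch of the same fill-then-split idea without tidyr, assuming the rows are in the right order and the first row carries a date (the names `idx` and `by_date` are only illustrative):

```r
# Example data where the blank dates are empty strings (an assumption
# about how the file might have been read in)
df <- data.frame(
  Date = c("10-06-2000", rep("", 5), "11-06-2000", rep("", 12)),
  hour = c(4:9, 1:13),
  value = 1:19,
  stringsAsFactors = FALSE
)

# Turn empty strings into NA so they count as missing
df$Date[which(df$Date == "")] <- NA

# For each row, count how many non-missing dates have been seen so far;
# that count is the position of the date the row belongs to
idx <- cumsum(!is.na(df$Date))

# Look up the matching date for every row (fills downwards)
df$Date <- df$Date[!is.na(df$Date)][idx]

# Split into a named list with one data frame per date
by_date <- split(df, df$Date)
```

This is vectorised, so it should avoid the slowdown of a row-by-row for loop. It would fail if the very first row had no date, because `idx` would then start at 0.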

You could also use group_split() in combination with fill():

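library(tidyr)
library(dplyr)

df <- data.frame(
  Date = c("10-06-2000", rep(NA, 5), "11-06-2000", rep(NA, 12)),
  hour = c(4:9, 1:13),
  value = 1:19
)

# Fill the missing dates, then split into a list with one tibble per date
df_filled <- df |> 
  fill(Date, .direction = "down") |> 
  group_split(Date) |> 
  # group_split() returns an unnamed list, so add the dates as names;
  # this assumes the dates first appear in the same order in which
  # group_split() sorts the groups
  purrr::set_names(unique(df$Date)[!is.na(unique(df$Date))])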
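
One thing to watch with either approach: while Date is a character column in day-month-year form, the groups are sorted as text, not chronologically. A small sketch of converting to the Date class before splitting, using made-up dates and an already filled column (both are assumptions for illustration):

```r
# Illustrative data with the Date column already filled down
df <- data.frame(
  Date = rep(c("10-06-2000", "02-07-2000"), times = c(3, 4)),
  hour = c(1:3, 1:4),
  value = 1:7,
  stringsAsFactors = FALSE
)

# As text, "02-07-2000" sorts before "10-06-2000", so July would come first
names(split(df, df$Date))

# Parse the day-month-year strings into Date objects
df$Date <- as.Date(df$Date, format = "%d-%m-%Y")

# Now the list is ordered chronologically and named in ISO format
by_date <- split(df, df$Date)
names(by_date)
```

After the conversion the list names take the form `2000-06-10`, the same style as the names in the output shown in the question.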

