
Fill zeros for missing values in R

I have a data frame df with a date column, and I want to count the occurrences per hour. Here is what I've done:

x <- df %>% 
  mutate(hora = hour(date)) %>% 
  select(hora) %>% 
  count(hora)

That gives this result:

> x
# A tibble: 19 x 2
    hora     n
   <int> <int>
 1     0     1
 2     1     1
 3     3     1
 4     8     4
 5     9     7
 6    10    10
 7    11    14
 8    12    10
 9    13     8
10    14     4
11    15     5
12    16    12
13    17     4
14    18    12
15    19     9
16    20     5
17    21     2
18    22     4
19    23     4

As you can see, some hours that should have n = 0, such as 2 or 4:7, do not show up. I want to add the hours that are missing from x with n = 0 so that the table is complete.

The expected output should be something like this:

   hora  n
1     0 12
2     1  3
3     2  5
4     3  7
5     4  8
6     5  1
7     6  0
8     7 11
9     8  6
10    9 10
11   10  9
12   11  0
13   12  0
14   13  3
15   14  0
16   15  7
17   16  8
18   17  1
19   18  2
20   19 11
21   20  6
22   21 10
23   22  9
24   23  4

I tried creating a table with hours 0:23 and all n = 0 and summing the two tables, but that didn't work. I also tried x$hour <- 0:23, hoping the missing values would be added, but that didn't work either.

You could convert hora to a factor with levels 0:23 and use .drop = FALSE in count, so that levels with no observations are kept with n = 0:

library(dplyr)
library(lubridate)

df %>%  
   mutate(hora = factor(hour(date), levels = 0:23)) %>%  
   count(hora, .drop = FALSE)
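
To see why declaring the levels matters, here is a minimal base R sketch on made-up timestamps (the `dates` vector is hypothetical, not the asker's data). It uses as.POSIXlt instead of lubridate::hour only to stay dependency-free; table() reports every declared factor level, including the ones that never occur.

```r
# Hypothetical sample timestamps (assumption: not the asker's data)
dates <- as.POSIXct(c("2020-01-01 00:15:00", "2020-01-01 03:40:00",
                      "2020-01-01 03:55:00", "2020-01-01 08:05:00"),
                    tz = "UTC")

# Extract the hour of day (0-23) with base R
hora <- as.POSIXlt(dates, tz = "UTC")$hour

# Declaring all 24 levels makes table() report unseen hours as 0
counts <- table(factor(hora, levels = 0:23))

res <- data.frame(hora = 0:23, n = as.vector(counts))
head(res, 5)
```

Without the levels argument, the factor would only contain the hours actually present, and the zero rows would still be missing.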

Another option is to use complete from tidyr, which adds the missing hours and fills their counts with 0:

df %>%  
  mutate(hora = hour(date)) %>%  
  count(hora) %>%
  tidyr::complete(hora = 0:23, fill = list(n = 0))
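
As a small illustration of what complete does, the sketch below applies it to a hypothetical three-row count table (the values are invented, and the range is shortened to 0:4 to keep the result readable). It assumes the tidyr package is installed.

```r
library(tidyr)

# Hypothetical counts: hours 2 and 4 are absent (assumption, not the asker's data)
x <- data.frame(hora = c(0L, 1L, 3L), n = c(1L, 1L, 2L))

# Expand hora to the full range of interest; newly added rows get n = 0
full <- complete(x, hora = 0:4, fill = list(n = 0L))
full
```

Using 0L rather than 0 in fill is a choice made here to keep n an integer column explicitly.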

A solution in base R merges a vector of all hours with the summarized data and sets the missing counts to 0.

textFile <- "row hour count
 1     0     1
 2     1     1
 3     3     1
 4     8     4
 5     9     7
 6    10    10
 7    11    14
 8    12    10
 9    13     8
10    14     4
11    15     5
12    16    12
13    17     4
14    18    12
15    19     9
16    20     5
17    21     2
18    22     4
19    23     4"

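# Read the sample data and drop the row-number column
data <- read.table(text = textFile, header = TRUE)[-1]
# All hours that should appear in the result
hours <- data.frame(hour = 0:23)
# all.y = TRUE keeps every hour; hours without data get count = NA
merged <- merge(data, hours, all.y = TRUE)
merged[is.na(merged$count), "count"] <- 0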

...and the output:

> head(merged)
  hour count
1    0     1
2    1     1
3    2     0
4    3     1
5    4     0
6    5     0
> 
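
The zero-filled table attempted in the question can also be made to work in base R: rather than summing two tables of different lengths, start from the full table and overwrite the counts at the matching hours. A minimal sketch on hypothetical counts (the names `observed` and `full` are illustrative, not from the original posts):

```r
# Hypothetical summarized counts; hours 2 and 4:7 are missing
observed <- data.frame(hour = c(0, 1, 3, 8), count = c(1, 1, 1, 4))

# Start with every hour at count 0
full <- data.frame(hour = 0:23, count = 0)

# match() gives the row of full that corresponds to each observed hour
idx <- match(observed$hour, full$hour)
full$count[idx] <- observed$count

head(full)
```

This keeps the rows in hour order and avoids the NA replacement step, at the cost of assuming every observed hour exists in the full table.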

