How do I fill NA values from the next row in R?
I want to fill NA values from the next row. Here is the dataset.
df <- structure(list(timestamp = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("2019-07-07 00:00:00", "2019-07-07 00:00:01", "2019-07-07 00:00:02", "2019-07-07 00:00:03", "2019-07-07 00:00:04", "2019-07-07 00:00:05", "2019-07-07 00:00:06", "2019-07-07 00:00:07", "2019-07-07 00:00:08", "2019-07-07 00:00:09", "2019-07-07 00:00:10"), class = "factor"), source = structure(c(NA, NA, NA, 1L, NA, NA, 1L, NA, NA, NA, NA, NA, 2L, NA, 2L, NA, NA, 2L, NA, NA, 2L, NA), .Label = c("USER_A", "USER_B"), class = "factor"), value = c(NA, NA, NA, 1L, NA, NA, 1L, NA, NA, NA, NA, NA, 1L, NA, 1L, NA, NA, 2L, NA, NA, 3L, NA)), class = "data.frame", row.names = c(NA, -22L))
timestamp source value
1 2019-07-07 00:00:00 <NA> NA
2 2019-07-07 00:00:01 <NA> NA
3 2019-07-07 00:00:02 <NA> NA
4 2019-07-07 00:00:03 USER_A 1
5 2019-07-07 00:00:04 <NA> NA
6 2019-07-07 00:00:05 <NA> NA
7 2019-07-07 00:00:06 USER_A 1
8 2019-07-07 00:00:07 <NA> NA
9 2019-07-07 00:00:08 <NA> NA
10 2019-07-07 00:00:09 <NA> NA
11 2019-07-07 00:00:10 <NA> NA
12 2019-07-07 00:00:00 <NA> NA
13 2019-07-07 00:00:01 USER_B 1
14 2019-07-07 00:00:02 <NA> NA
15 2019-07-07 00:00:03 USER_B 1
16 2019-07-07 00:00:04 <NA> NA
17 2019-07-07 00:00:05 <NA> NA
18 2019-07-07 00:00:06 USER_B 2
19 2019-07-07 00:00:07 <NA> NA
20 2019-07-07 00:00:08 <NA> NA
21 2019-07-07 00:00:09 USER_B 3
22 2019-07-07 00:00:10 <NA> NA
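The table cycles through the same timestamps for each source. Each source (A and B) has a fixed number of rows (here, 00:00:00 to 00:00:10).
Here is the expected result table.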
timestamp source value
1 2019-07-07 00:00:00 <NA> NA
2 2019-07-07 00:00:01 <NA> NA
3 2019-07-07 00:00:02 <NA> NA
4 2019-07-07 00:00:03 USER_A 1
5 2019-07-07 00:00:04 USER_A 1
6 2019-07-07 00:00:05 USER_A 1
7 2019-07-07 00:00:06 USER_A 1
8 2019-07-07 00:00:07 <NA> NA
9 2019-07-07 00:00:08 <NA> NA
10 2019-07-07 00:00:09 <NA> NA
11 2019-07-07 00:00:10 <NA> NA
12 2019-07-07 00:00:00 <NA> NA
13 2019-07-07 00:00:01 USER_B 1
14 2019-07-07 00:00:02 USER_B 1
15 2019-07-07 00:00:03 USER_B 1
16 2019-07-07 00:00:04 USER_B 2
17 2019-07-07 00:00:05 USER_B 2
18 2019-07-07 00:00:06 USER_B 2
19 2019-07-07 00:00:07 USER_B 3
20 2019-07-07 00:00:08 USER_B 3
21 2019-07-07 00:00:09 USER_B 3
22 2019-07-07 00:00:10 <NA> NA
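For USER_A, the value and source in rows 5 and 6 are replaced with the value and source of row 7. The USER_B rows are replaced in the same way, based on the next non-NA row.
How can I do this in R?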
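Here is one way using dplyr, which works because every source has the same fixed number of rows. We first create a group for every n rows and add a new column, group2, that is 1 only between the min and max positions of the non-NA values in that group. We then group_by group2 and use fill to fill the NA values within each group. Because the expected output takes each value from the following row (rows 16 and 17 become 2, from row 18), fill needs .direction = "up"; its default, "down", would carry the previous value forward instead.
n <- 11  # number of rows per source block
library(dplyr)

df %>%
  # group1: one group per block of n consecutive rows
  group_by(group1 = gl(n() / n, n)) %>%
  # group2: 1 between the first and last non-NA row of each block, else 0
  mutate(group2 = 0,
         group2 = replace(group2, min(which(!is.na(source))):
                                  max(which(!is.na(source))), 1)) %>%
  group_by(group2) %>%
  # fill each NA from the next non-NA row
  tidyr::fill(source, value, .direction = "up") %>%
  ungroup() %>%
  select(-group1, -group2)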
# A tibble: 22 x 3
# timestamp source value
# <fct> <fct> <int>
# 1 2019-07-07 00:00:00 NA NA
# 2 2019-07-07 00:00:01 NA NA
# 3 2019-07-07 00:00:02 NA NA
# 4 2019-07-07 00:00:03 USER_A 1
# 5 2019-07-07 00:00:04 USER_A 1
# 6 2019-07-07 00:00:05 USER_A 1
# 7 2019-07-07 00:00:06 USER_A 1
# 8 2019-07-07 00:00:07 NA NA
# 9 2019-07-07 00:00:08 NA NA
#10 2019-07-07 00:00:09 NA NA
# … with 12 more rows
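The truncated printout above only shows the USER_A rows, so it may help to check the USER_B block as well. The following is a minimal base R sketch of the same idea, not the answer's dplyr method: it rebuilds the example data, splits it into blocks of n rows, and fills each NA from the next non-NA row, but only between the first and last non-NA row of a block. The helper name fill_from_next and the variables block and res are made up for this illustration, and the sketch assumes the number of rows is an exact multiple of n.

```r
# Rebuild the example data with the same values as the question's data frame.
df <- data.frame(
  timestamp = factor(rep(sprintf("2019-07-07 00:00:%02d", 0:10), 2)),
  source = factor(
    c(NA, NA, NA, "USER_A", NA, NA, "USER_A", NA, NA, NA, NA,
      NA, "USER_B", NA, "USER_B", NA, NA, "USER_B", NA, NA, "USER_B", NA),
    levels = c("USER_A", "USER_B")
  ),
  value = c(NA, NA, NA, 1L, NA, NA, 1L, NA, NA, NA, NA,
            NA, 1L, NA, 1L, NA, NA, 2L, NA, NA, 3L, NA)
)

n <- 11  # rows per source block (assumed fixed, as in the question)

# Hypothetical helper: within one block, replace each NA that lies between
# the first and last non-NA entry with the next non-NA entry.
fill_from_next <- function(x) {
  idx <- which(!is.na(x))
  if (length(idx) == 0) return(x)  # nothing to fill from
  rng <- min(idx):max(idx)
  # For every position in the range, the first non-NA index at or after it.
  nxt <- vapply(rng, function(i) idx[idx >= i][1], integer(1))
  x[rng] <- x[nxt]
  x
}

# Block id for every row: 1 for the first n rows, 2 for the next n, and so on.
block <- rep(seq_len(nrow(df) / n), each = n)

res <- df
for (rows in split(seq_len(nrow(df)), block)) {
  res$source[rows] <- fill_from_next(df$source[rows])
  res$value[rows]  <- fill_from_next(df$value[rows])
}

# Inspect the USER_B block; rows 16 and 17 take value 2 from row 18.
print(res[12:22, ])
```

Rows outside the first-to-last non-NA range of each block (for example rows 1 to 3 and 8 to 11) are left as NA, which matches the expected result table in the question.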