How to fill NA values using a known formula from another data frame in R
I have a data frame called "test1", shown below (here "day" is a "POSIXt" object):
day Rain SWC_11 SWC_12 SWC_13 SWC_14 SWC_21
01/01/2019 00:00:00 0.0 51 60 63 60 64
02/01/2019 00:00:00 0.2 51.5 60.3 63.4 60.8 64.4
03/01/2019 00:00:00 0.0 51.3 60.3 63.3 60.6 64.1
04/01/2019 00:00:00 0.4 NA NA NA NA NA
05/01/2019 00:00:00 0.0 NA NA NA NA NA
06/01/2019 00:00:00 0.0 NA NA NA NA NA
07/01/2019 00:00:00 0.0 NA NA NA NA NA
08/01/2019 00:00:00 0.0 NA NA NA NA NA
09/01/2019 00:00:00 0.0 NA NA NA NA NA
10/01/2019 00:00:00 0.0 NA NA NA NA NA
Another data frame, called "test2", looks like this:
SWC_11_(Intercept) SWC_11_slope SWC_12_(Intercept) SWC_12_slope SWC_13_(Intercept) SWC_13_slope SWC_14_(Intercept) SWC_14_slope SWC_21(Intercept) SWC_21_slope
10471.95 -6.563423e-06 4063.32 -2.525118e-06 75040.76 -4.726106e-05 7742.763 -4.842427e-06 22965.85 -1.443707e-05
What I want to do now is fill the missing (NA) values using the corresponding coefficients. The model would look like this:
missing values of SWC_11 = SWC_11_(Intercept) + SWC_11_slope * day
missing values of SWC_12 = SWC_12_(Intercept) + SWC_12_slope * day
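The same goes for the other columns. I think the sapply function should help here:
test1 <- data.frame(sapply(test2, function(x) { }))  # function body still to be written
But I'm a bit confused about how to write the function part. I hope someone can help. Thanks.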
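Since the question asks what the function body could look like, here is a minimal base-R sketch that loops with `lapply()` over the SWC columns of `test1` (rather than `sapply()` over `test2`). It assumes, as both answers below do, that "day" in the formula means the day of the month, and that the coefficient columns are named like those in the `test2` data further down; the function name `fill_na_with_model` is hypothetical.

```r
# Hypothetical helper (not from the question or answers): fills NAs in every
# SWC_* column of `test1` with Intercept + slope * day, ASSUMING "day" means
# the day of the month.
fill_na_with_model <- function(test1, test2) {
  # Accept either a POSIXct column or "dd/mm/yyyy HH:MM:SS" strings
  d <- test1$day
  if (!inherits(d, "POSIXt")) {
    d <- as.POSIXct(d, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
  }
  day_num <- as.numeric(format(d, "%d"))

  swc_cols <- grep("^SWC_", names(test1), value = TRUE)
  test1[swc_cols] <- lapply(swc_cols, function(col) {
    # Coefficient columns for this id, e.g. "SWC_11_.Intercept." and
    # "SWC_11_slope"; "[_.]" also matches "SWC_21.Intercept."
    coef_cols <- grep(paste0("^", col, "[_.]"), names(test2), value = TRUE)
    intercept <- test2[[grep("Intercept", coef_cols, value = TRUE)]]
    slope <- test2[[grep("slope", coef_cols, value = TRUE)]]
    x <- test1[[col]]
    # Keep observed values; apply the formula only where x is NA
    ifelse(is.na(x), intercept + slope * day_num, x)
  })
  test1
}
```

Usage would be `test1 <- fill_na_with_model(test1, test2)`. Looping over the columns of `test1` rather than `test2` keeps each intercept paired with its slope by name instead of by position.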
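I would suggest a tidyverse approach: reshape the data, then join the two data frames to compute the missing values. I wasn't sure what "day" means in your formula, so I extracted the day of the month from your date variable; you can change that if necessary. The variable names need a few cleaning steps, but everything is in the code. Here is the solution: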
library(tidyverse)
# First reshape test2: one row per SWC id, with Intercept and slope columns
test2 %>% pivot_longer(everything()) %>%
  # Label each coefficient as Intercept or slope
  mutate(name2=ifelse(grepl('Intercept',name),'Intercept','slope')) %>%
  # Strip the coefficient type so only the SWC id (e.g. "SWC_11") remains
  mutate(name=gsub('Intercept|slope','',name),name=substr(name,1,6)) %>%
  # Back to wide format: columns name, Intercept, slope
  pivot_wider(names_from = name2,values_from=value) %>%
  # Left join with test1 in long format
  left_join(
    test1 %>% pivot_longer(-c(day,Rain)) %>%
      # Parse the date and extract the day of the month
      mutate(Day=as.numeric(format(as.Date(day,'%d/%m/%Y'),'%d'))),
    by = 'name') %>%
  # Use the formula only where the observed value is NA
  mutate(value2=ifelse(is.na(value),Intercept+slope*Day,value)) %>%
  select(name,day,Rain,value2) %>%
  pivot_wider(names_from = name,values_from=value2)
Output:
# A tibble: 10 x 7
day Rain SWC_11 SWC_12 SWC_13 SWC_14 SWC_21
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 01/01/2019 00:00:00 0 51 60 63 60 64
2 02/01/2019 00:00:00 0.2 51.5 60.3 63.4 60.8 64.4
3 03/01/2019 00:00:00 0 51.3 60.3 63.3 60.6 64.1
4 04/01/2019 00:00:00 0.4 10472. 4063. 75041. 7743. 22966.
5 05/01/2019 00:00:00 0 10472. 4063. 75041. 7743. 22966.
6 06/01/2019 00:00:00 0 10472. 4063. 75041. 7743. 22966.
7 07/01/2019 00:00:00 0 10472. 4063. 75041. 7743. 22966.
8 08/01/2019 00:00:00 0 10472. 4063. 75041. 7743. 22966.
9 09/01/2019 00:00:00 0 10472. 4063. 75041. 7743. 22966.
10 10/01/2019 00:00:00 0 10472. 4063. 75041. 7743. 22966.
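Every filled value prints as (roughly) the intercept because, with `Day` between 4 and 10, the slope term is on the order of 1e-5, far below the printed precision. A quick check for SWC_11, using the coefficients from `test2`:

```r
# Coefficients for SWC_11, taken from test2
intercept <- 10471.95
slope <- -6.563423e-06

# Day of the month for the rows that were NA (04/01 to 10/01)
day_of_month <- 4:10
fitted <- intercept + slope * day_of_month

# Largest contribution of the slope term over these days (well below 1e-4)
max_shift <- max(abs(fitted - intercept))
print(max_shift)
```

The filled values (about 10472) are also far from the observed ones (about 51), which suggests the original model may have used a different time variable for `day`; the question does not say which, so this is worth confirming before trusting the filled values.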
Data used:
#Data 1
test1 <- structure(list(day = c("01/01/2019 00:00:00", "02/01/2019 00:00:00",
"03/01/2019 00:00:00", "04/01/2019 00:00:00", "05/01/2019 00:00:00",
"06/01/2019 00:00:00", "07/01/2019 00:00:00", "08/01/2019 00:00:00",
"09/01/2019 00:00:00", "10/01/2019 00:00:00"), Rain = c(0, 0.2,
0, 0.4, 0, 0, 0, 0, 0, 0), SWC_11 = c(51, 51.5, 51.3, NA, NA,
NA, NA, NA, NA, NA), SWC_12 = c(60, 60.3, 60.3, NA, NA, NA, NA,
NA, NA, NA), SWC_13 = c(63, 63.4, 63.3, NA, NA, NA, NA, NA, NA,
NA), SWC_14 = c(60, 60.8, 60.6, NA, NA, NA, NA, NA, NA, NA),
SWC_21 = c(64, 64.4, 64.1, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-10L), class = "data.frame")
#Data2
test2 <- structure(list(SWC_11_.Intercept. = 10471.95, SWC_11_slope = -6.563423e-06,
SWC_12_.Intercept. = 4063.32, SWC_12_slope = -2.525118e-06,
SWC_13_.Intercept. = 75040.76, SWC_13_slope = -4.726106e-05,
SWC_14_.Intercept. = 7742.763, SWC_14_slope = -4.842427e-06,
SWC_21.Intercept. = 22965.85, SWC_21_slope = -1.443707e-05), class = "data.frame", row.names = c(NA,
-1L))
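This answer leaves the meaning of `day` open, and only the `Day` line would need to change if the coefficients in `test2` were fitted against something other than the day of the month. Below is a hypothetical helper, `day_term`, covering two plausible definitions: the day of the month, or the numeric POSIXct value (seconds since 1970-01-01 UTC). Which one matches `test2` is an assumption you would need to confirm against how the model was fitted.

```r
# Hypothetical helper: the numeric "day" term used in Intercept + slope * day.
# Which definition matches test2 depends on how its coefficients were fitted,
# which the question does not state.
day_term <- function(d, definition = c("day_of_month", "epoch_seconds")) {
  definition <- match.arg(definition)
  # Accept POSIXct values or "dd/mm/yyyy HH:MM:SS" strings (parsed as UTC)
  if (!inherits(d, "POSIXt")) {
    d <- as.POSIXct(d, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
  }
  switch(definition,
    day_of_month  = as.numeric(format(d, "%d")),
    epoch_seconds = as.numeric(d)
  )
}
```

For example, the `Day` line in the pipeline above could become `mutate(Day = day_term(day, "epoch_seconds"))`.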
Conceptually this is similar to @Duck's solution, but possibly with fewer steps.
library(dplyr)
library(tidyr)
library(lubridate)

# This assumes test1$day is POSIXct, as stated in the question. If it is
# still character (as in the data posted above), convert it first:
# test1$day <- dmy_hms(test1$day)

test2 %>%
  # Get the data in long format: one row per SWC id with Intercept and slope
  pivot_longer(cols = everything(), names_to = c('name', '.value'),
               names_pattern = '(SWC_\\d+).*(slope|Intercept)') %>%
  # Join the coefficients with test1 in long format
  right_join(test1 %>% pivot_longer(cols = contains('SWC')), by = 'name') %>%
  # Keep the observed value; where it is NA, use the fitted value
  mutate(value = coalesce(value, Intercept + slope * day(day))) %>%
  select(-Intercept, -slope) %>%
  # Get the data in wide format
  pivot_wider()
# A tibble: 10 x 7
# day Rain SWC_11 SWC_12 SWC_13 SWC_14 SWC_21
# <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2019-01-01 00:00:00 0 51 60 63 60 64
# 2 2019-01-02 00:00:00 0.2 51.5 60.3 63.4 60.8 64.4
# 3 2019-01-03 00:00:00 0 51.3 60.3 63.3 60.6 64.1
# 4 2019-01-04 00:00:00 0.4 10472. 4063. 75041. 7743. 22966.
# 5 2019-01-05 00:00:00 0 10472. 4063. 75041. 7743. 22966.
# 6 2019-01-06 00:00:00 0 10472. 4063. 75041. 7743. 22966.
# 7 2019-01-07 00:00:00 0 10472. 4063. 75041. 7743. 22966.
# 8 2019-01-08 00:00:00 0 10472. 4063. 75041. 7743. 22966.
# 9 2019-01-09 00:00:00 0 10472. 4063. 75041. 7743. 22966.
#10 2019-01-10 00:00:00 0 10472. 4063. 75041. 7743. 22966.
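The `names_pattern` regex in this answer splits each coefficient name into the SWC id and the coefficient type. Running the same pattern through base R's `regexec()` shows the split, including for `SWC_21.Intercept.`, which lacks the underscore the other names have. Unlike `substr(name, 1, 6)` in the first answer, it does not rely on every id being exactly six characters long.

```r
# Coefficient names as they appear in test2 (including the SWC_21 naming quirk)
coef_names <- c("SWC_11_.Intercept.", "SWC_11_slope",
                "SWC_21.Intercept.", "SWC_21_slope")

# Same pattern as names_pattern: group 1 = SWC id, group 2 = coefficient type
m <- regmatches(coef_names,
                regexec("(SWC_\\d+).*(slope|Intercept)", coef_names))
parts <- do.call(rbind, m)[, 2:3]
colnames(parts) <- c("name", "coef")
print(parts)
```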