
How to fill NA values using a known formula from another data frame in R

I have a data frame named "test1", shown below (here "day" is a "POSIXt" object):

day                    Rain   SWC_11   SWC_12   SWC_13   SWC_14   SWC_21
01/01/2019 00:00:00    0.0    51       60       63       60       64
02/01/2019 00:00:00    0.2    51.5     60.3     63.4     60.8     64.4
03/01/2019 00:00:00    0.0    51.3     60.3     63.3     60.6     64.1
04/01/2019 00:00:00    0.4    NA       NA       NA       NA       NA
05/01/2019 00:00:00    0.0    NA       NA       NA       NA       NA
06/01/2019 00:00:00    0.0    NA       NA       NA       NA       NA
07/01/2019 00:00:00    0.0    NA       NA       NA       NA       NA
08/01/2019 00:00:00    0.0    NA       NA       NA       NA       NA
09/01/2019 00:00:00    0.0    NA       NA       NA       NA       NA
10/01/2019 00:00:00    0.0    NA       NA       NA       NA       NA

And another data frame named "test2", shown below:

    SWC_11_(Intercept)  SWC_11_slope  SWC_12_(Intercept)  SWC_12_slope  SWC_13_(Intercept)  SWC_13_slope  SWC_14_(Intercept)  SWC_14_slope  SWC_21(Intercept)  SWC_21_slope
    10471.95            -6.563423e-06    4063.32          -2.525118e-06     75040.76        -4.726106e-05        7742.763    -4.842427e-06     22965.85       -1.443707e-05

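What I want to do now is fill the missing (NA) values using the corresponding coefficients. I would have a model like this:

missing values of SWC_11 = SWC_11_(Intercept) + SWC_11_slope * day
missing values of SWC_12 = SWC_12_(Intercept) + SWC_12_slope * day

The same goes for the other columns. I think the sapply function should help here,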

 test1 <- data.frame(sapply(test2, function(x) { }))

but I am a bit confused about how to write the function part. I hope someone can help. Thanks.

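I would suggest a tidyverse approach in which you reshape the data and then join it to compute the values for the missing entries. It was not clear to me what "day" should be, so I extracted the day of the month from your date variable, but you can change that if necessary. The variable names need a few cleaning steps, but everything is in the code. Here is the solution: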

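library(tidyverse)
# First reshape test2 to long format: one row per coefficient
test2 %>% pivot_longer(everything()) %>%
  # Label each coefficient as Intercept or slope
  mutate(name2=ifelse(grepl('Intercept',name),'Intercept','slope')) %>%
  # Keep only the first 6 characters of the name (e.g. SWC_11) so it matches the columns of test1
  mutate(name=gsub('Intercept|slope','',name),name=substr(name,1,6)) %>%
  # Back to wide format: one row per SWC column, with Intercept and slope
  pivot_wider(names_from = name2,values_from=value) %>%
  # Left join with the original test1 in long format
  left_join(
    test1 %>% pivot_longer(-c(day,Rain)) %>%
      # Extract the day of the month from the date
      mutate(Day=as.numeric(format(as.Date(day,'%d/%m/%Y'),'%d'))),
    by = 'name') %>%
  # Fill NA values with Intercept + slope * Day; keep observed values as they are
  mutate(value2=ifelse(is.na(value),Intercept+slope*Day,value)) %>%
  select(name,day,Rain,value2) %>%
  # Return to the original wide layout
  pivot_wider(names_from = name,values_from=value2)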

Output:

# A tibble: 10 x 7
   day                  Rain  SWC_11 SWC_12  SWC_13 SWC_14  SWC_21
   <chr>               <dbl>   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
 1 01/01/2019 00:00:00   0      51     60      63     60      64  
 2 02/01/2019 00:00:00   0.2    51.5   60.3    63.4   60.8    64.4
 3 03/01/2019 00:00:00   0      51.3   60.3    63.3   60.6    64.1
 4 04/01/2019 00:00:00   0.4 10472.  4063.  75041.  7743.  22966. 
 5 05/01/2019 00:00:00   0   10472.  4063.  75041.  7743.  22966. 
 6 06/01/2019 00:00:00   0   10472.  4063.  75041.  7743.  22966. 
 7 07/01/2019 00:00:00   0   10472.  4063.  75041.  7743.  22966. 
 8 08/01/2019 00:00:00   0   10472.  4063.  75041.  7743.  22966. 
 9 09/01/2019 00:00:00   0   10472.  4063.  75041.  7743.  22966. 
10 10/01/2019 00:00:00   0   10472.  4063.  75041.  7743.  22966. 

The data used:

#Data 1
test1 <- structure(list(day = c("01/01/2019 00:00:00", "02/01/2019 00:00:00", 
"03/01/2019 00:00:00", "04/01/2019 00:00:00", "05/01/2019 00:00:00", 
"06/01/2019 00:00:00", "07/01/2019 00:00:00", "08/01/2019 00:00:00", 
"09/01/2019 00:00:00", "10/01/2019 00:00:00"), Rain = c(0, 0.2, 
0, 0.4, 0, 0, 0, 0, 0, 0), SWC_11 = c(51, 51.5, 51.3, NA, NA, 
NA, NA, NA, NA, NA), SWC_12 = c(60, 60.3, 60.3, NA, NA, NA, NA, 
NA, NA, NA), SWC_13 = c(63, 63.4, 63.3, NA, NA, NA, NA, NA, NA, 
NA), SWC_14 = c(60, 60.8, 60.6, NA, NA, NA, NA, NA, NA, NA), 
    SWC_21 = c(64, 64.4, 64.1, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, 
-10L), class = "data.frame")

#Data2
test2 <- structure(list(SWC_11_.Intercept. = 10471.95, SWC_11_slope = -6.563423e-06, 
    SWC_12_.Intercept. = 4063.32, SWC_12_slope = -2.525118e-06, 
    SWC_13_.Intercept. = 75040.76, SWC_13_slope = -4.726106e-05, 
    SWC_14_.Intercept. = 7742.763, SWC_14_slope = -4.842427e-06, 
    SWC_21.Intercept. = 22965.85, SWC_21_slope = -1.443707e-05), class = "data.frame", row.names = c(NA, 
-1L))
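
Since the question asks about an sapply-style solution, here is a minimal base R sketch of the same fill without reshaping. It uses lapply (the list-returning relative of sapply) so the result can be assigned straight back to the columns. The names swc_cols, day_num and filled are illustrative and not from the original post, and the predictor is the day of the month, which is the same assumption made in the answer above.

```r
# Example data rebuilt from the post (day stored as character, dd/mm/yyyy)
test1 <- data.frame(
  day    = sprintf("%02d/01/2019 00:00:00", 1:10),
  Rain   = c(0, 0.2, 0, 0.4, 0, 0, 0, 0, 0, 0),
  SWC_11 = c(51, 51.5, 51.3, rep(NA, 7)),
  SWC_12 = c(60, 60.3, 60.3, rep(NA, 7)),
  SWC_13 = c(63, 63.4, 63.3, rep(NA, 7)),
  SWC_14 = c(60, 60.8, 60.6, rep(NA, 7)),
  SWC_21 = c(64, 64.4, 64.1, rep(NA, 7))
)
test2 <- data.frame(
  SWC_11_.Intercept. = 10471.95, SWC_11_slope = -6.563423e-06,
  SWC_12_.Intercept. = 4063.32,  SWC_12_slope = -2.525118e-06,
  SWC_13_.Intercept. = 75040.76, SWC_13_slope = -4.726106e-05,
  SWC_14_.Intercept. = 7742.763, SWC_14_slope = -4.842427e-06,
  SWC_21.Intercept.  = 22965.85, SWC_21_slope = -1.443707e-05
)

# Columns to fill, and the predictor (ASSUMPTION: day of the month)
swc_cols <- grep("^SWC_", names(test1), value = TRUE)
day_num  <- as.numeric(format(as.Date(test1$day, format = "%d/%m/%Y"), "%d"))

filled <- test1
filled[swc_cols] <- lapply(swc_cols, function(col) {
  # Look up the two coefficients whose names start with this column name
  intercept <- test2[[grep(paste0("^", col, ".*Intercept"), names(test2))]]
  slope     <- test2[[grep(paste0("^", col, ".*slope"), names(test2))]]
  # Keep observed values; replace only the NA entries with the fitted line
  ifelse(is.na(test1[[col]]), intercept + slope * day_num, test1[[col]])
})

print(filled)
```

The coefficient lookup matches on the column prefix rather than on an exact name, so it tolerates the inconsistent SWC_21.Intercept. name in test2.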

Conceptually this is similar to @Duck's solution, but possibly with fewer steps.

library(dplyr)
library(tidyr)
library(lubridate)

test2 %>%
  # Long format: one row per SWC column, with Intercept and slope as columns
  pivot_longer(cols = everything(), names_to = c('name', '.value'), 
               names_pattern = '(SWC_\\d+).*(slope|Intercept)') %>%
  # Join with test1 in long format
  right_join(test1 %>% pivot_longer(cols = contains('SWC')), by = 'name') %>% 
  # Keep the observed value where present, otherwise use Intercept + slope * day of month
  # (day() assumes that day is a date-time (POSIXt) column, as stated in the question)
  mutate(value = coalesce(value, Intercept + slope * day(day))) %>%
  select(-Intercept, -slope) %>%
  # Back to wide format
  pivot_wider()

# A tibble: 10 x 7
#   day                  Rain  SWC_11 SWC_12  SWC_13 SWC_14  SWC_21
#   <dttm>              <dbl>   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
# 1 2019-01-01 00:00:00   0      51     60      63     60      64  
# 2 2019-01-02 00:00:00   0.2    51.5   60.3    63.4   60.8    64.4
# 3 2019-01-03 00:00:00   0      51.3   60.3    63.3   60.6    64.1
# 4 2019-01-04 00:00:00   0.4 10472.  4063.  75041.  7743.  22966. 
# 5 2019-01-05 00:00:00   0   10472.  4063.  75041.  7743.  22966. 
# 6 2019-01-06 00:00:00   0   10472.  4063.  75041.  7743.  22966. 
# 7 2019-01-07 00:00:00   0   10472.  4063.  75041.  7743.  22966. 
# 8 2019-01-08 00:00:00   0   10472.  4063.  75041.  7743.  22966. 
# 9 2019-01-09 00:00:00   0   10472.  4063.  75041.  7743.  22966. 
#10 2019-01-10 00:00:00   0   10472.  4063.  75041.  7743.  22966.
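
Both answers plug the day of the month into the formula, and the post does not say what scale "day" had when the coefficients were estimated. The sketch below shows that only the predictor needs to change if another scale applies. The helper fill_from_line is hypothetical, and the two predictors (day of month, and seconds since 1970-01-01 for a POSIXct in UTC) are assumptions for illustration only; the time zone in particular is a guess.

```r
# Hypothetical helper: replace NA entries with intercept + slope * x
fill_from_line <- function(values, x, intercept, slope) {
  ifelse(is.na(values), intercept + slope * x, values)
}

# One column of the example data from the post
day_chr <- sprintf("%02d/01/2019 00:00:00", 1:10)
swc_11  <- c(51, 51.5, 51.3, rep(NA, 7))

# Parse the timestamps (ASSUMPTION: UTC; the post does not state a time zone)
day_time <- as.POSIXct(day_chr, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Candidate predictor 1: day of the month, as used in the answers
x_dom <- as.numeric(format(day_time, "%d"))
# Candidate predictor 2: seconds since the epoch, i.e. as.numeric() of a POSIXct
x_sec <- as.numeric(day_time)

by_dom <- fill_from_line(swc_11, x_dom, 10471.95, -6.563423e-06)
by_sec <- fill_from_line(swc_11, x_sec, 10471.95, -6.563423e-06)

print(data.frame(day = day_chr, by_dom = by_dom, by_sec = by_sec))
```

Comparing the filled values with the observed range of the column is a quick check on whether the chosen predictor matches the scale the model was fitted on.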
