How to fill in missing values of a data.frame in R?

I have multiple columns that have missing values. I want to fill the missing data in each column with the mean of the same day across all years. For example, DF is my fake data, in which two columns (A and X) have missing values:

library(lubridate)
library(tidyverse)
library(naniar)

set.seed(123)

DF <- data.frame(Date = seq(as.Date("1985-01-01"), to = as.Date("1987-12-31"), by = "day"),
                 A = sample(1:10, 1095, replace = TRUE),
                 X = sample(5:15, 1095, replace = TRUE)) %>% 
  replace_with_na(replace = list(A = 2, X = 5))

To fill in column A, I use the following code:

Fill_DF_A <- DF %>% 
  mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>% 
  group_by(Year, Day) %>% 
  mutate(A = ifelse(is.na(A), mean(A, na.rm = TRUE), A))

I have many columns in my data.frame, and I would like to generalize this so that the missing values are filled in for all of them. How can I do that?

We can use na.aggregate from zoo:

library(dplyr)
library(lubridate)  # year(), month(), day()
library(zoo)        # na.aggregate()
DF %>% 
  mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>% 
  group_by(Year, Day) %>%
  mutate(across(A:X, na.aggregate))
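
To see what na.aggregate does on its own, here is a minimal sketch on a toy vector (the vector `x` is made up for illustration and is not part of the question's data). With its default arguments, na.aggregate replaces each NA with the mean of the non-missing values; inside a grouped mutate, that mean is taken per group.

```r
library(zoo)

# Toy vector with one missing value (illustrative data, not from the question)
x <- c(1, NA, 3)

# With default arguments, each NA is replaced by the mean of the non-missing values
filled <- na.aggregate(x)
print(filled)
```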

Or, if we prefer to use conditional statements:

DF %>% 
  mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>% 
  group_by(Year, Day) %>%
  mutate(across(A:X, ~ case_when(is.na(.) ~ mean(., na.rm = TRUE),
                                 TRUE ~ as.numeric(.))))
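
One thing to note: grouping by Year and Day pools every row that shares a day of the month within a single year. If the goal is the mean of the same calendar day across all years, as the question describes, grouping by Month and Day may be closer to that intent. The following is only a sketch under that assumption, using a small made-up data set (`toy`) so the result can be checked by hand.

```r
library(dplyr)
library(lubridate)

# Small illustrative data set: the same two calendar days in three years
toy <- data.frame(
  Date = as.Date(c("1985-01-01", "1986-01-01", "1987-01-01",
                   "1985-01-02", "1986-01-02", "1987-01-02")),
  A = c(2, NA, 4, 10, 20, NA),
  X = c(NA, 6, 8, 5, NA, 7)
)

filled <- toy %>%
  mutate(Month = month(Date), Day = day(Date)) %>%
  # Group by calendar day only, so each group spans all years
  group_by(Month, Day) %>%
  # Replace each NA with the group mean of the non-missing values
  mutate(across(c(A, X), ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))) %>%
  ungroup()

print(filled)
```

Here the missing A on 1986-01-01 becomes 3, the mean of the 1985 and 1987 values for January 1. A calendar day with no observed value in any year would come out as NaN and would need separate handling.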
