
How to populate missing observations (N/As) in columns from different data frames with some elegant and efficient code in R?

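Problem and goal

There are three R data frames with the same structure but at three different time frequencies (quarterly "_q", semi-annual "_sa", and yearly "_y"). The goal is to populate the missing observations (N/As) of each variable (column) in the quarterly data frame only ("data_q"), using the values for the same date from the semi-annual ("data_sa") and yearly ("data_y") data frames.

The original data frames are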

data_q <-data.frame(date=as.Date(c('2010-03-31','2010-06-30','2010-09-30','2010-12-31','2011-03-31','2011-06-30','2011-09-30','2011-12-31','2012-03-31','2012-06-30','2012-09-30','2012-12-31')),
                cost_q=c('20','N/A','4','7','9','43','N/A','2','5','N/A','N/A','N/A'),
                rate_q=c('500','N/A','600','50','830','260','N/A','560','800','N/A','N/A','N/A'));

data_sa <- data.frame(date=as.Date(c('2010-06-30','2010-12-31','2011-06-30','2011-12-31','2012-06-30','2012-12-31')),
                  cost_sa=c('100','N/A','N/A','N/A','100','N/A'),
                  rate_sa=c('1000','N/A','N/A','N/A','1000','N/A'));
data_y <- data.frame(date=as.Date(c('2010-12-31','2011-12-31','2012-12-31')),
                 cost_y=c('100','100','100'),
                 rate_y=c('1000','1000','1000'));

The desired output is as follows

data_q_desired <-data.frame(date=as.Date(c('2010-03-31','2010-06-30','2010-09-30','2010-12-31','2011-03-31','2011-06-30','2011-09-30','2011-12-31','2012-03-31','2012-06-30','2012-09-30','2012-12-31')),
                        cost_q=c('20','100','4','7','9','43','N/A','2','5','100','N/A','100'),
                        rate_q=c('500','1000','600','50','830','260','N/A','560','800','1000','N/A','1000'));


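I have search/replaced "N/A" with NA in the data. The lapply part is needed because you have encoded the numbers as characters ('500'). The filled values end up in the new columns cost_qx and rate_qx.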
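The search/replace step mentioned above can also be done in code rather than by editing the data by hand. The following is a minimal sketch, assuming the missing values are always the literal string "N/A"; the helper name `na_string_to_numeric` is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): convert a character
# or factor vector that uses the literal string "N/A" into a numeric vector.
na_string_to_numeric <- function(x, na_string = "N/A") {
  x <- as.character(x)                   # also handles factor columns
  x[x %in% na_string] <- NA_character_   # mark the placeholder as missing
  as.numeric(x)                          # no coercion warning for "N/A"
}

cost_q <- c('20', 'N/A', '4', '7')
na_string_to_numeric(cost_q)
# 20 NA 4 7
```

Applied to every value column before merging, this avoids the "NAs introduced by coercion" warning that a plain as.numeric() would give on "N/A".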

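library(data.table)

# Full outer join of the three data frames on the shared `date` column
df = as.data.table(
  Reduce(function(...) merge(..., all=TRUE), list(data_q, data_sa, data_y)))

# Convert the value columns to numeric. as.character() guards against
# factor columns (the data.frame default before R 4.0).
mynames = c("cost_q",  "rate_q",  "cost_sa", "rate_sa", "cost_y",  "rate_y" )
df[, (mynames) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = mynames]

# Where the quarterly value is missing, fall back to the semi-annual or
# yearly value for the same date (the larger one if both are present)
df[,  `:=` (
  cost_qx = ifelse(is.na(cost_q), pmax(cost_sa, cost_y, na.rm = TRUE), cost_q),
  rate_qx = ifelse(is.na(rate_q), pmax(rate_sa, rate_y, na.rm = TRUE), rate_q)
)]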
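As an alternative to the ifelse/pmax combination, the fill rule can be written with `fcoalesce()` from data.table (available in version 1.12.4 and later), which returns the first non-missing value across its arguments. This is a sketch rather than the answerer's method, and it assumes the data have already been converted to numeric with NA for missing values.

```r
library(data.table)

# Same data as in the question, already numeric, with NA for missing values
data_q <- data.table(
  date   = as.Date(c('2010-03-31','2010-06-30','2010-09-30','2010-12-31',
                     '2011-03-31','2011-06-30','2011-09-30','2011-12-31',
                     '2012-03-31','2012-06-30','2012-09-30','2012-12-31')),
  cost_q = c(20, NA, 4, 7, 9, 43, NA, 2, 5, NA, NA, NA),
  rate_q = c(500, NA, 600, 50, 830, 260, NA, 560, 800, NA, NA, NA))
data_sa <- data.table(
  date    = as.Date(c('2010-06-30','2010-12-31','2011-06-30',
                      '2011-12-31','2012-06-30','2012-12-31')),
  cost_sa = c(100, NA, NA, NA, 100, NA),
  rate_sa = c(1000, NA, NA, NA, 1000, NA))
data_y <- data.table(
  date   = as.Date(c('2010-12-31','2011-12-31','2012-12-31')),
  cost_y = c(100, 100, 100),
  rate_y = c(1000, 1000, 1000))

# Full outer join on date, then take the first non-missing value per row,
# in the order quarterly, semi-annual, yearly
df <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE),
             list(data_q, data_sa, data_y))
df[, `:=`(cost_qx = fcoalesce(cost_q, cost_sa, cost_y),
          rate_qx = fcoalesce(rate_q, rate_sa, rate_y))]
df[, .(date, cost_qx, rate_qx)]
```

One difference in behaviour: when both the semi-annual and the yearly value exist for a date, pmax picks the larger of the two, while fcoalesce picks the semi-annual one. For the sample data the two approaches give the same result, since those values coincide wherever both are present.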
