How to populate missing observations (N/As) in columns from different data frames with some elegant and efficient code in R?
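I have three R data frames with the same structure but at three different time frequencies (quarterly "_q", semiannual "_sa", and yearly "_y"). The goal is to fill the missing observations (N/As) of each variable (column) in the quarterly data frame only ("data_q"), using the values from the semiannual ("data_sa") and yearly ("data_y") data frames.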
data_q <-data.frame(date=as.Date(c('2010-03-31','2010-06-30','2010-09-30','2010-12-31','2011-03-31','2011-06-30','2011-09-30','2011-12-31','2012-03-31','2012-06-30','2012-09-30','2012-12-31')),
cost_q=c('20','N/A','4','7','9','43','N/A','2','5','N/A','N/A','N/A'),
rate_q=c('500','N/A','600','50','830','260','N/A','560','800','N/A','N/A','N/A'));
data_sa <- data.frame(date=as.Date(c('2010-06-30','2010-12-31','2011-06-30','2011-12-31','2012-06-30','2012-12-31')),
cost_sa=c('100','N/A','N/A','N/A','100','N/A'),
rate_sa=c('1000','N/A','N/A','N/A','1000','N/A'));
data_y <- data.frame(date=as.Date(c('2010-12-31','2011-12-31','2012-12-31')),
cost_y=c('100','100','100'),
rate_y=c('1000','1000','1000'));
data_q_desired <-data.frame(date=as.Date(c('2010-03-31','2010-06-30','2010-09-30','2010-12-31','2011-03-31','2011-06-30','2011-09-30','2011-12-31','2012-03-31','2012-06-30','2012-09-30','2012-12-31')),
cost_q=c('20','100','4','7','9','43','N/A','2','5','100','N/A','100'),
rate_q=c('500','1000','600','50','830','260','N/A','560','800','1000','N/A','1000'));
I have search-and-replaced "N/A" with NA. The lapply part is needed because you have encoded the numbers as character strings ('500').
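library(data.table)
# Full outer join of the three frames on their shared "date" column
df = as.data.table(
Reduce(function(...) merge(..., all=TRUE), list(data_q, data_sa, data_y)))
mynames = c("cost_q", "rate_q", "cost_sa", "rate_sa", "cost_y", "rate_y" )
# Convert to numeric; as.character() first so this also works if the columns were created as factors
df[, (mynames) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = mynames]
# Where the quarterly value is NA, take the semiannual/yearly value (the larger one if both exist)
df[, `:=` (
cost_qx = ifelse(is.na(cost_q), pmax(cost_sa, cost_y, na.rm = TRUE), cost_q),
rate_qx = ifelse(is.na(rate_q), pmax(rate_sa, rate_y, na.rm = TRUE), rate_q)
)]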
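If you would rather avoid the data.table dependency, the same idea can be sketched in base R. This is only an illustration under a couple of assumptions: the "N/A" strings have already been replaced with NA and the columns are stored as numeric, and `coalesce_first` is a hypothetical helper name of my own, not something from a package. Unlike the `pmax` version, it takes the semiannual value first and falls back to the yearly one only if the semiannual value is also missing.

```r
# Sample data from the question, with "N/A" already replaced by NA
# and the values stored as numeric (an assumption of this sketch).
data_q <- data.frame(
  date = as.Date(c("2010-03-31", "2010-06-30", "2010-09-30", "2010-12-31",
                   "2011-03-31", "2011-06-30", "2011-09-30", "2011-12-31",
                   "2012-03-31", "2012-06-30", "2012-09-30", "2012-12-31")),
  cost_q = c(20, NA, 4, 7, 9, 43, NA, 2, 5, NA, NA, NA),
  rate_q = c(500, NA, 600, 50, 830, 260, NA, 560, 800, NA, NA, NA)
)
data_sa <- data.frame(
  date = as.Date(c("2010-06-30", "2010-12-31", "2011-06-30",
                   "2011-12-31", "2012-06-30", "2012-12-31")),
  cost_sa = c(100, NA, NA, NA, 100, NA),
  rate_sa = c(1000, NA, NA, NA, 1000, NA)
)
data_y <- data.frame(
  date = as.Date(c("2010-12-31", "2011-12-31", "2012-12-31")),
  cost_y = c(100, 100, 100),
  rate_y = c(1000, 1000, 1000)
)

# Hypothetical helper: element-wise, return the first non-NA value
# among the vectors passed in (all must have the same length).
coalesce_first <- function(...) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), list(...))
}

# Full outer join on date; merge() sorts the result by date by default.
merged <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE),
                 list(data_q, data_sa, data_y))

# Quarterly value first, then semiannual, then yearly.
filled <- data.frame(
  date   = merged$date,
  cost_q = coalesce_first(merged$cost_q, merged$cost_sa, merged$cost_y),
  rate_q = coalesce_first(merged$rate_q, merged$rate_sa, merged$rate_y)
)
```

With the sample data, `filled` should line up with `data_q_desired` from the question (as numbers rather than strings). Dates that have neither a semiannual nor a yearly observation, such as 2011-09-30, stay NA.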