R: filling partially NA rows in a data.table
I have the following data.table:
library(data.table)
dt <- data.table(date=rep(c(2014,2013), each=4), price=c(3.14, 1.45, 3.4, 5.1, 1, 2.3, 2.79, 3), brand=rep(c("Mercedes", "Audi"), each=4), num=c(3,6,7,8,3,5,9,12), seller=rep(c("gregory", "dan"), each=4))
Resulting in (shown sorted by date and num):
date price brand num seller
1: 2013 1.00 Audi 3 dan
2: 2013 2.30 Audi 5 dan
3: 2013 2.79 Audi 9 dan
4: 2013 3.00 Audi 12 dan
5: 2014 3.14 Mercedes 3 gregory
6: 2014 1.45 Mercedes 6 gregory
7: 2014 3.40 Mercedes 7 gregory
8: 2014 5.10 Mercedes 8 gregory
My goal now is to get this:
date num price brand seller
1: 2013 3 1.00 Audi dan
2: 2013 5 2.30 Audi dan
3: 2013 6 NA Audi dan
4: 2013 7 NA Audi dan
5: 2013 8 NA Audi dan
6: 2013 9 2.79 Audi dan
7: 2013 12 3.00 Audi dan
8: 2014 3 3.14 Mercedes gregory
9: 2014 5 NA Mercedes gregory
10: 2014 6 1.45 Mercedes gregory
11: 2014 7 3.40 Mercedes gregory
12: 2014 8 5.10 Mercedes gregory
13: 2014 9 NA Mercedes gregory
14: 2014 12 NA Mercedes gregory
First, I add rows for the missing num values for every date:
setkey(dt, date, num)
dtt <- dt[CJ(unique(date), unique(dt[, num]))]
This gives, as a first step:
date num price brand seller
1: 2013 3 1.00 Audi dan
2: 2013 5 2.30 Audi dan
3: 2013 6 NA NA NA
4: 2013 7 NA NA NA
5: 2013 8 NA NA NA
6: 2013 9 2.79 Audi dan
7: 2013 12 3.00 Audi dan
8: 2014 3 3.14 Mercedes gregory
9: 2014 5 NA NA NA
10: 2014 6 1.45 Mercedes gregory
11: 2014 7 3.40 Mercedes gregory
12: 2014 8 5.10 Mercedes gregory
13: 2014 9 NA NA NA
14: 2014 12 NA NA NA
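On newer data.table versions that support the on= argument (an assumption about the installed version; the 1.9.x releases discussed in this thread predate it), the same first step can be sketched without calling setkey() on dt. The name grid is just for illustration:

```r
library(data.table)

dt <- data.table(date = rep(c(2014, 2013), each = 4),
                 price = c(3.14, 1.45, 3.4, 5.1, 1, 2.3, 2.79, 3),
                 brand = rep(c("Mercedes", "Audi"), each = 4),
                 num = c(3, 6, 7, 8, 3, 5, 9, 12),
                 seller = rep(c("gregory", "dan"), each = 4))

# every (date, num) combination; CJ() returns it sorted by date, then num
grid <- CJ(date = unique(dt$date), num = unique(dt$num))

# join on the named columns instead of setting a key on dt;
# combinations absent from dt get NA in price, brand and seller
dtt <- dt[grid, on = c("date", "num")]
print(dtt)
```

The rows come back in the order of grid, so the result matches the table above.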
And then:
dtt[date==2013, c("brand","seller"):=list("Audi","dan")]
dtt[date==2014, c("brand","seller"):=list("Mercedes","gregory")]
This gives the desired result.
However:
1 - the last piece of code is awful.
2 - I would like to write a generic function (or a join), because my real data.table has lots of different dates, and lots of columns in which the NAs must either be replaced or kept.
It seems simple, but I am stuck!
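One possible generic replacement for the hardcoded := lines is to drop the partially NA columns and join them back by date. This is a minimal sketch assuming that brand and seller are constant within each date (as in the example data); fill_cols and lookup are hypothetical names, and on= requires a newer data.table than 1.9.2:

```r
library(data.table)

dt <- data.table(date = rep(c(2014, 2013), each = 4),
                 price = c(3.14, 1.45, 3.4, 5.1, 1, 2.3, 2.79, 3),
                 brand = rep(c("Mercedes", "Audi"), each = 4),
                 num = c(3, 6, 7, 8, 3, 5, 9, 12),
                 seller = rep(c("gregory", "dan"), each = 4))

# columns assumed to be constant within each date
fill_cols <- c("brand", "seller")

# first step: one row per (date, num) combination, NA where missing
grid <- CJ(date = unique(dt$date), num = unique(dt$num))
dtt <- dt[grid, on = c("date", "num")]

# one row per date holding the values to copy into the added rows
lookup <- unique(dt[, c("date", fill_cols), with = FALSE])

# drop the partially NA columns, then join them back by date;
# price stays NA in the added rows because it is not in fill_cols
dtt <- lookup[dtt[, setdiff(names(dtt), fill_cols), with = FALSE], on = "date"]
print(dtt)
```

Handling more columns should then only require extending fill_cols.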
How about:
require(data.table) ## 1.9.2
setkey(dt, num)
nums = unique(dt$num)
dt[, list(price=.SD[J(nums)]$price, brand=brand[1L],
num=nums, seller=seller[1L]), by=date]
# date price brand num seller
# 1: 2014 3.14 Mercedes 3 gregory
# 2: 2014 NA Mercedes 5 gregory
# 3: 2014 1.45 Mercedes 6 gregory
# 4: 2014 3.40 Mercedes 7 gregory
# 5: 2014 5.10 Mercedes 8 gregory
# 6: 2014 NA Mercedes 9 gregory
# 7: 2014 NA Mercedes 12 gregory
# 8: 2013 1.00 Audi 3 dan
# 9: 2013 2.30 Audi 5 dan
# 10: 2013 NA Audi 6 dan
# 11: 2013 NA Audi 7 dan
# 12: 2013 NA Audi 8 dan
# 13: 2013 2.79 Audi 9 dan
# 14: 2013 3.00 Audi 12 dan
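The J() join inside .SD relies on the key set by setkey(dt, num). As an alternative sketch that avoids the inner join entirely, the lookup can be swapped for base R's match(), which returns NA for every value of nums that a given date does not have. This is a substitute technique, not the code above:

```r
library(data.table)

dt <- data.table(date = rep(c(2014, 2013), each = 4),
                 price = c(3.14, 1.45, 3.4, 5.1, 1, 2.3, 2.79, 3),
                 brand = rep(c("Mercedes", "Audi"), each = 4),
                 num = c(3, 6, 7, 8, 3, 5, 9, 12),
                 seller = rep(c("gregory", "dan"), each = 4))

nums <- unique(dt$num)

# for each date, look up the price of every value in nums;
# match() gives NA where that date has no row for a num,
# and indexing price with NA yields NA
res1 <- dt[, list(num = nums,
                  price = price[match(nums, num)],
                  brand = brand[1L],
                  seller = seller[1L]),
           by = date]

# sort by date, then num, to match the desired output
setorder(res1, date, num)
print(res1)
```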
Or alternatively:
dt[, c(.SD[J(nums), list(price=price)], brand=brand[1L],
seller=seller[1L]), by=date]
in which case the column order will be different.
In 1.9.3, this will be much more efficient (in terms of both syntax and speed), because we don't have to join and return all the columns:
## 1.9.3
dt[, list(price=.SD[J(nums), price], brand=brand[1L],
num=nums, seller=seller[1L]), by=date]
.SD[J(nums), price] will return a vector, instead of a data.table as in previous versions, and will not perform an implicit by (by-without-by), so it will be faster as well.
For details, have a look at points 1 and 2 under the new FRs implemented for v1.9.3 here.
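A small illustration of that behaviour, assuming a data.table version from 1.9.3 onwards; the toy table x is hypothetical data:

```r
library(data.table)

# toy table keyed on num (hypothetical data)
x <- data.table(num = c(3, 6, 7), price = c(3.14, 1.45, 3.4), key = "num")

# join on the key and select a single column: the result is a plain
# numeric vector, with NA for the key value 5 that x does not contain
v <- x[J(c(3, 5, 7)), price]
print(v)
```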
HTH
You could use the roll argument to fill the NAs with the nearest values. The problem is that this will also fill price, but that's easy to remedy:
setkey(dt, date, num)
dt[CJ(unique(date), unique(num)), roll = 'nearest'][!dt, price := NA][]
# date price brand num seller
# 1: 2013 1.00 Audi 3 dan
# 2: 2013 2.30 Audi 5 dan
# 3: 2013 NA Audi 6 dan
# 4: 2013 NA Audi 7 dan
# 5: 2013 NA Audi 8 dan
# 6: 2013 2.79 Audi 9 dan
# 7: 2013 3.00 Audi 12 dan
# 8: 2014 3.14 Mercedes 3 gregory
# 9: 2014 NA Mercedes 5 gregory
#10: 2014 1.45 Mercedes 6 gregory
#11: 2014 3.40 Mercedes 7 gregory
#12: 2014 5.10 Mercedes 8 gregory
#13: 2014 NA Mercedes 9 gregory
#14: 2014 NA Mercedes 12 gregory
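The same idea can be sketched with on= instead of a key (assuming a newer data.table that supports on=). A rolling join only rolls on the last join column, here num, so values are never carried across dates:

```r
library(data.table)

dt <- data.table(date = rep(c(2014, 2013), each = 4),
                 price = c(3.14, 1.45, 3.4, 5.1, 1, 2.3, 2.79, 3),
                 brand = rep(c("Mercedes", "Audi"), each = 4),
                 num = c(3, 6, 7, 8, 3, 5, 9, 12),
                 seller = rep(c("gregory", "dan"), each = 4))

grid <- CJ(date = unique(dt$date), num = unique(dt$num))

# join on (date, num); for a num missing within a date, roll = "nearest"
# copies the row with the closest num from the same date
res2 <- dt[grid, on = c("date", "num"), roll = "nearest"]

# anti-join: rows of res2 without an exact (date, num) match in dt
# were filled by rolling, so reset their price to NA
res2[!dt, on = c("date", "num"), price := NA_real_]
print(res2)
```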
I think this should be much faster than the .SD[...] solution.