
In R: setting new values in a data.table fast

I am trying to set values in a data.table efficiently. The following code does what I want, but it is too slow for large datasets:

library(data.table)

DTcars <- as.data.table(mtcars)
# Slow: one data.table subset and one set() call per cell
for (i in 1:(nrow(DTcars) - 1)) {
  for (j in 1:ncol(DTcars)) {
    if (DTcars[i, j, with = FALSE] > 10) {
      set(DTcars,
          i = as.integer(i),
          j = as.integer(j),
          value = DTcars[nrow(DTcars), j, with = FALSE])
    }
  }
}

What I want is something like the following. It is not valid code, but it expresses what I need, and I think an approach like this would be faster: subset the data.table, assign the same value to the matching rows of one column, and repeat for each column.

DTcars<-as.data.table(mtcars)
ns<-names(DTcars)
for(j in 1:length(ns)){
  DTcars[ns[j]>10]<-DTcars[20,ns[j]]
}

I think you're looking for:

for (j in names(DTcars)) set(DTcars,
  i     = which(DTcars[[j]]>10),
  j     = j,
  value = tail(DTcars[[j]],1)
)

Either column numbers or column names can be used as the for iterator here.
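As a rough sketch of the column-number variant (same logic as the loop above, only the iterator changes; `last_row` and `col` are names introduced here for illustration):

```r
library(data.table)

DTcars <- as.data.table(mtcars)
last_row <- nrow(DTcars)  # illustrative helper, not from the original answer

# Iterate over column positions instead of column names
for (j in seq_along(DTcars)) {
  col <- DTcars[[j]]              # snapshot of the column before it is modified
  set(DTcars,
      i     = which(col > 10),    # rows whose value exceeds 10
      j     = j,                  # integer column position
      value = col[last_row])      # value taken from the last row
}
```

`seq_along(DTcars)` yields integer positions, so no `as.integer()` conversion is needed for `j`.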

The replacement value differs between the two snippets in the question (the last row in the first, row 20 in the second), so I'm not sure which one is intended.
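If the row-20 value from the question's second snippet is what is wanted, a sketch along these lines might work. `ref_row` and `ref_value` are names made up for this example:

```r
library(data.table)

DTcars <- as.data.table(mtcars)
ref_row <- 20L  # assumption: row 20, as in the question's second snippet

for (j in names(DTcars)) {
  ref_value <- DTcars[[j]][ref_row]  # read the reference value before modifying the column
  set(DTcars,
      i     = which(DTcars[[j]] > 10),
      j     = j,
      value = ref_value)
}
```

Only the source of `value` changes; the row selection is the same as before.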

IMO set should be used sparingly; regular := is almost always sufficient:

for (col in names(DTcars))
  DTcars[get(col) > 10, (col) := get(col)[.N]]
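One thing to watch, as far as I can tell: inside `DT[i, j]`, `get(col)` and `.N` refer to the rows selected by `i`, so `get(col)[.N]` is the last value among the rows above 10, not necessarily the last row of the table. For `mtcars` the two coincide, because the last row exceeds 10 in every column that has any match. The toy table below (made up for illustration) shows where they diverge, plus a variant that reads the value from the full column first; `last_val` is a hypothetical helper name.

```r
library(data.table)

# Toy data: the last row does NOT exceed 10
A <- data.table(x = c(12, 15, 3))
B <- copy(A)
C <- copy(A)

# set(): value comes from the last row of the full column
for (j in names(A))
  set(A, i = which(A[[j]] > 10), j = j, value = tail(A[[j]], 1))

# := with get(col)[.N]: value comes from the last row of the subset
for (col in names(B))
  B[get(col) > 10, (col) := get(col)[.N]]

# := with the value read from the full column beforehand
for (col in names(C)) {
  last_val <- C[[col]][nrow(C)]
  C[get(col) > 10, (col) := last_val]
}

print(A$x)
print(B$x)
print(C$x)
```

If the last row always satisfies the condition, the shorter `get(col)[.N]` form gives the same result as `set`.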
