
In R: setting new values in a data.table fast

I am trying to set values in a data.table efficiently. The following code does what I want, but it is too slow for large datasets:

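library(data.table)
DTcars <- as.data.table(mtcars)
# For every cell outside the last row: if its value exceeds 10,
# overwrite it with the value in the last row of the same column.
for (i in 1:(nrow(DTcars) - 1)) {
  for (j in 1:ncol(DTcars)) {
    if (DTcars[i, j, with = FALSE] > 10) {
      set(DTcars,
          i     = as.integer(i),
          j     = as.integer(j),
          value = DTcars[nrow(DTcars), j, with = FALSE])
    }
  }
}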

What I'd like is something like the code below. It is not valid code, but it expresses what I need, and I think it would be faster: for each column, subset the rows where the value exceeds 10 and assign them one single value, then repeat for every column.

DTcars<-as.data.table(mtcars)
ns<-names(DTcars)
for(j in 1:length(ns)){
  DTcars[ns[j]>10]<-DTcars[20,ns[j]]
}

I think you're looking for

for (j in names(DTcars)) set(DTcars,
  i     = which(DTcars[[j]] > 10),  # rows where column j exceeds 10
  j     = j,
  value = tail(DTcars[[j]], 1)      # value in the last row of column j
)

Either column numbers or column names can be used as the for-loop iterator here.
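To illustrate, here is a sketch (assuming data.table is installed; the table names by_name and by_pos are just for illustration) that runs the loop once over column names and once over the positions returned by seq_along(), then checks that both give the same result:

```r
library(data.table)

by_name <- as.data.table(mtcars)
by_pos  <- as.data.table(mtcars)

# Iterate over column names
for (j in names(by_name)) set(by_name,
  i     = which(by_name[[j]] > 10),
  j     = j,
  value = tail(by_name[[j]], 1)
)

# Iterate over column positions; both [[ ]] and set() accept an integer index
for (j in seq_along(by_pos)) set(by_pos,
  i     = which(by_pos[[j]] > 10),
  j     = j,
  value = tail(by_pos[[j]], 1)
)

# Both loops should have modified exactly the same cells
all.equal(by_name, by_pos)
```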

The replacement value differs between the OP's two snippets (the last row in the first, row 20 in the second), so I'm not sure which one is intended; the code above uses the last row.
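If the source row needs to vary, one option is to make it a parameter. The sketch below uses a hypothetical helper, replace_above(), which is not a data.table function; src_row = nrow(DT) reproduces the first snippet's choice (last row) and src_row = 20 the second's:

```r
library(data.table)

# Hypothetical helper (not part of data.table): for each column, overwrite
# values greater than `threshold` with that column's value in row `src_row`.
replace_above <- function(DT, threshold = 10, src_row = nrow(DT)) {
  for (j in names(DT)) {
    src_val <- DT[[j]][src_row]  # read before modifying column j
    set(DT, i = which(DT[[j]] > threshold), j = j, value = src_val)
  }
  invisible(DT)  # DT has been modified by reference
}

last_row <- replace_above(as.data.table(mtcars))                # first snippet
row_20   <- replace_above(as.data.table(mtcars), src_row = 20L) # second snippet
```

Because set() modifies DT by reference, the helper changes its input in place; pass copy(DT) if the original table must be preserved.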

IMO, set should be used sparingly; regular := is almost always sufficient:

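# Take the last value of the full column; inside [i, j], .N sees only the rows selected by i
for (col in names(DTcars))
  DTcars[get(col) > 10, (col) := tail(DTcars[[col]], 1)]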
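One scoping detail matters with :=: in DT[i, j], both .N and column references inside j are evaluated on the rows selected by i, not on the whole table. The toy table below (made up for illustration) shows that get(col)[.N] returns the last matching value, whereas indexing the full column returns the value in the table's last row:

```r
library(data.table)

toy <- data.table(x = c(20, 30, 5))
col <- "x"

# .N is the number of rows selected by i (here 2), so this is the last value > 10
toy[get(col) > 10, get(col)[.N]]   # 30

# Indexing the full column gives the value in the table's last row
toy[[col]][nrow(toy)]              # 5
```

With mtcars the two happen to coincide, because in every column that has values above 10 the last row's value is also above 10; on other data they can differ, so a value meant to come from the last row should be taken from the full column.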
