why is python so much faster at editing column properties than r

Question

I have been working with spotfire and realized that my python codes edit the column properties much faster than the r codes. The r code takes about 24 seconds, while the python code takes about 4 to do the same thing. Is my r code just written poorly that it makes this happen.

Here is an example of my python code:

start=time.time()
count=0
names=[]
for i in olddt.Columns: #getting columns from old data table
    names.append(i)

for i in dt.Columns: #assigning new values
    if count<=4:
        i.Properties["Limits.Whatif.Upper"]=1.0
        i.Properties["Limits.Whatif.Lower"]=1.0
        i.Properties["Limits.Prod.Upper"]=1.0
        i.Properties["Limits.Prod.Lower"]=1.0
        count=count+1
    else:
        i.Properties["Limits.Whatif.Upper"]=float(count-4)+26.0
        i.Properties["Limits.Whatif.Lower"]=float(count-4)-39.0
        i.Properties["Limits.Prod.Upper"]=names[count-4].Properties["Limits.Whatif.Upper"]+5.0
        i.Properties["Limits.Prod.Lower"]=names[count-4].Properties["Limits.Whatif.Lower"]-4.0
        count=count+1

print time.time()-start

Here is my R code:

for(col in 1:ncol(temp2)){
    if (col<=4){
        attributes(temp2[,col])$SpotfireColumnMetaData$upper=Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower=-1*Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2=Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2=-1*Inf
    }
    else{
        names(attributes(dt[,col-4])$SpotfireColumnMetaData)<- lapply( names( attributes(dt[ ,col-4] )$SpotfireColumnMetaData), tolower)
        attributes(temp2[,col])$SpotfireColumnMetaData$upper=2
        attributes(temp2[,col])$SpotfireColumnMetaDatalower=1
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2=attributes(dt[,col-4])$SpotfireColumnMetaData$upper
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2=attributes(dt[,col-4])$SpotfireColumnMetaData$lower
    }
}

I also used an lapply function seen here:

applyLimits <- function(col){
    if (count<4){
        attributes(temp2[,col])$SpotfireColumnMetaData$upper<<-Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower<<- (-1*Inf)
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2<<-Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2<<- (-1*Inf)
        count<<-count+1
    }
    else{
        attributes(temp2[,col])$SpotfireColumnMetaData$upper<<-2
        attributes(temp2[,col])$SpotfireColumnMetaData$lower<<-1
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2<<-attributes(dt[,col-4])$SpotfireColumnMetaData$upper2
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2<<-attributes(dt[,col-4])$SpotfireColumnMetaData$lower2
        count<<-count+1
    }
}

lapply(1:ncol(temp),applyLimits)

If there is some way to improve my r code please tell me, but I haven't seen a better way of adjust the properties of it. According to some research I have done temp2 and dt both should be data.frame

Answer 1

Remember R is a vectorised language, your lapply function is not vectorised. To get good performance you need lapply to return a vector and update the whole vector in one go. Your function updates one row and one column at a time which is why you are getting poor performance.

The vectorised approach would be four lapply calls, each updating one whole column. Should look a little like this:

applyLimits1 <- function(col){
  count <<- count+1
  if (count<4) Inf else 2 
}
applyLimits2 <- function(col){
  count <<- count+1
  if (count<4) (-1*Inf) else 1 
}
applyLimits3 <- function(col){
  count <<- count+1
  if (count<4) Inf else attributes(dt[,col-4])$SpotfireColumnMetaData$upper2
}
applyLimits4 <- function(col){
  count <<- count+1
  if (count<4) (-1*Inf) else attributes(dt[,col-4])$SpotfireColumnMetaData$lower2
}

count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$upper <- lapply(1:ncol(temp),applyLimits1)
count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$lower <- lapply(1:ncol(temp),applyLimits2)
count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$upper2 <- lapply(1:ncol(temp),applyLimits3)
count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$lower2 <- lapply(1:ncol(temp),applyLimits4)

I don't have the data to test, I've just pasted your code around. You may be better with sapply or vapply. And of course some languages are better for certain tasks than others...

why is python so much faster at editing column properties than r

Question

1 answers

solution1
0 2019-10-30 17:15:38

why is python so much faster at editing column properties than r

Question

1 answers

solution1 0 2019-10-30 17:15:38

solution1
0 2019-10-30 17:15:38