Why is Python so much faster than R at editing column properties?

Question

I have been working with Spotfire and noticed that my Python code edits column properties much faster than my R code. The R code takes about 24 seconds, while the Python code takes about 4 seconds to do the same thing. Is my R code just poorly written?

Here is an example of my Python code:

import time

start=time.time()
count=0
names=[]
for i in olddt.Columns: #getting columns from old data table
    names.append(i)

for i in dt.Columns: #assigning new values
    if count<=4:
        i.Properties["Limits.Whatif.Upper"]=1.0
        i.Properties["Limits.Whatif.Lower"]=1.0
        i.Properties["Limits.Prod.Upper"]=1.0
        i.Properties["Limits.Prod.Lower"]=1.0
        count=count+1
    else:
        i.Properties["Limits.Whatif.Upper"]=float(count-4)+26.0
        i.Properties["Limits.Whatif.Lower"]=float(count-4)-39.0
        i.Properties["Limits.Prod.Upper"]=names[count-4].Properties["Limits.Whatif.Upper"]+5.0
        i.Properties["Limits.Prod.Lower"]=names[count-4].Properties["Limits.Whatif.Lower"]-4.0
        count=count+1

print time.time()-start

Here is my R code:

for(col in 1:ncol(temp2)){
    if (col<=4){
        attributes(temp2[,col])$SpotfireColumnMetaData$upper=Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower=-1*Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2=Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2=-1*Inf
    }
    else{
        names(attributes(dt[,col-4])$SpotfireColumnMetaData)<- lapply( names( attributes(dt[ ,col-4] )$SpotfireColumnMetaData), tolower)
        attributes(temp2[,col])$SpotfireColumnMetaData$upper=2
        attributes(temp2[,col])$SpotfireColumnMetaData$lower=1
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2=attributes(dt[,col-4])$SpotfireColumnMetaData$upper
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2=attributes(dt[,col-4])$SpotfireColumnMetaData$lower
    }
}

I also tried a version using lapply, shown here:

applyLimits <- function(col){
    if (count<4){
        attributes(temp2[,col])$SpotfireColumnMetaData$upper<<-Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower<<- (-1*Inf)
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2<<-Inf
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2<<- (-1*Inf)
        count<<-count+1
    }
    else{
        attributes(temp2[,col])$SpotfireColumnMetaData$upper<<-2
        attributes(temp2[,col])$SpotfireColumnMetaData$lower<<-1
        attributes(temp2[,col])$SpotfireColumnMetaData$upper2<<-attributes(dt[,col-4])$SpotfireColumnMetaData$upper2
        attributes(temp2[,col])$SpotfireColumnMetaData$lower2<<-attributes(dt[,col-4])$SpotfireColumnMetaData$lower2
        count<<-count+1
    }
}

lapply(1:ncol(temp),applyLimits)

If there is a way to improve my R code, please tell me, but I haven't seen a better way of adjusting these properties. According to some research I have done, temp2 and dt should both be data.frames.

Answer 1

Remember that R is a vectorised language, and your lapply function is not vectorised. To get good performance, you need lapply to return a vector and then update the whole vector in one go. Your function updates one row and one column at a time, which is why you are getting poor performance.

The vectorised approach would be four lapply calls, each updating one whole column. It should look a little like this:

applyLimits1 <- function(col){
  count <<- count+1
  if (count<4) Inf else 2 
}
applyLimits2 <- function(col){
  count <<- count+1
  if (count<4) (-1*Inf) else 1 
}
applyLimits3 <- function(col){
  count <<- count+1
  if (count<4) Inf else attributes(dt[,col-4])$SpotfireColumnMetaData$upper2
}
applyLimits4 <- function(col){
  count <<- count+1
  if (count<4) (-1*Inf) else attributes(dt[,col-4])$SpotfireColumnMetaData$lower2
}

count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$upper <- lapply(1:ncol(temp),applyLimits1)
count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$lower <- lapply(1:ncol(temp),applyLimits2)
count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$upper2 <- lapply(1:ncol(temp),applyLimits3)
count <- -1
attributes(temp2[,col])$SpotfireColumnMetaData$lower2 <- lapply(1:ncol(temp),applyLimits4)

I don't have the data to test this; I've just moved your code around. You may be better off with sapply or vapply. And of course, some languages are better suited to certain tasks than others...
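
For comparison, here is a rough Python sketch of the same "compute each property for every column in one pass, then assign" layout. It does not use Spotfire's API: plain dicts stand in for each column's metadata, and the names `build_limits`, `old_meta` and `new_meta` are made up for illustration. It follows the rule from the R loop in the question (the first four columns get infinite limits, later columns borrow from the old table), so treat it as an outline of the idea rather than a drop-in fix.

```python
def build_limits(old_meta, n_cols, n_fixed=4):
    """Return one list per property, each covering all n_cols columns.

    old_meta is a hypothetical stand-in for the old table's column metadata:
    a list of dicts with "upper" and "lower" keys.
    """
    inf = float("inf")
    idx = range(n_cols)
    # One pass per property, like the four lapply calls above.
    upper = [inf if i < n_fixed else 2.0 for i in idx]
    lower = [-inf if i < n_fixed else 1.0 for i in idx]
    upper2 = [inf if i < n_fixed else old_meta[i - n_fixed]["upper"] for i in idx]
    lower2 = [-inf if i < n_fixed else old_meta[i - n_fixed]["lower"] for i in idx]
    return {"upper": upper, "lower": lower, "upper2": upper2, "lower2": lower2}


# Made-up data: two old columns, six new columns.
old_meta = [{"upper": 10.0, "lower": -10.0}, {"upper": 20.0, "lower": -20.0}]
limits = build_limits(old_meta, n_cols=6)

# Assign the precomputed values column by column.
new_meta = [{name: values[i] for name, values in limits.items()} for i in range(6)]
```

Whether this layout is actually faster inside Spotfire depends on how the metadata is written back to the table, which this sketch does not model.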
