Make For Loop and Spatial Computing Faster?

I am working with a large dataset (~1.5 million rows x 21 columns) that includes the longitude and latitude of each transaction. I am computing the distance of each transaction from a couple of target locations and appending it as a new column to the main dataset:

TargetLocation1<-data.frame(Long=XX.XXX,Lat=XX.XXX, Name="TargetLocation1", Size=ZZZZ)
TargetLocation2<-data.frame(Long=XX.XXX,Lat=XX.XXX, Name="TargetLocation2", Size=YYYY)

## MainData[6:7] are long and lat columns

MainData$DistanceFromTarget1<-distVincentyEllipsoid(MainData[6:7], TargetLocation1[1:2]) 
MainData$DistanceFromTarget2<-distVincentyEllipsoid(MainData[6:7], TargetLocation2[1:2]) 

I am using the geosphere package's distVincentyEllipsoid function to compute the distances. As you can imagine, distVincentyEllipsoid is computationally intensive, but it is more accurate than the other distance functions in the same package (distHaversine(), distMeeus(), distRhumb(), distVincentySphere()).

Q1) It takes about 5-10 minutes to compute the distances for each target location [I have 16 GB RAM and an Intel i7 6600U 2.81 GHz CPU], and I have multiple target locations. Is there a faster way to do this?

Q2) I then create a new column holding a categorical variable that marks each transaction if it falls within the market definition of a target location. I do this with a for loop containing 2 if statements. Is there a way to make this computation faster?

  MainData$TransactionOrigin<-"Other"

  for (x in 1:nrow(MainData)){
  if (MainData$DistanceFromTarget1[x]<=7000)
  MainData$TransactionOrigin[x]="Target1"
  if (MainData$DistanceFromTarget2[x]<=4000)
  MainData$TransactionOrigin[x]="Target2"
}

Thanks

Regarding Q2:
This will run much faster if you drop the loop and use vectorised indexing instead.

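    # Default label for every row, then overwrite the rows inside each radius.
    # No loop index is needed: the comparison runs on the whole column at once.
    MainData$TransactionOrigin <- "Other"
    MainData$TransactionOrigin[which(MainData$DistanceFromTarget1 <= 7000)] <- "Target1"
    MainData$TransactionOrigin[which(MainData$DistanceFromTarget2 <= 4000)] <- "Target2"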
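
To see how the vectorised assignment behaves, here is a minimal sketch on a toy data frame. The five distance values are made up for illustration and only stand in for the real MainData columns; the thresholds (7000 and 4000) are the ones from the question.

```r
# Toy stand-in for MainData; the distances (in metres) are invented.
MainData <- data.frame(
  DistanceFromTarget1 = c(5000, 8000, 6500, 12000, NA),
  DistanceFromTarget2 = c(9000, 3000, 3500, 15000, 2000)
)

# Start from the default label, then overwrite rows inside each radius.
MainData$TransactionOrigin <- "Other"
MainData$TransactionOrigin[which(MainData$DistanceFromTarget1 <= 7000)] <- "Target1"
MainData$TransactionOrigin[which(MainData$DistanceFromTarget2 <= 4000)] <- "Target2"

print(MainData$TransactionOrigin)
```

The order of the two assignments matters: a row that lies within both radii (the third row here) ends up as "Target2", which matches what the original loop does. Wrapping the comparison in which() also means rows with a missing distance are simply skipped, whereas `if (NA <= 7000)` inside a loop would stop with an error.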

