Make For Loop and Spatial Computing Faster?
I am working with a large dataset (~1.5 million rows x 21 columns) that includes the longitude and latitude of each transaction. I am computing the distance of each transaction from a couple of target locations and appending it as a new column to the main dataset:
TargetLocation1<-data.frame(Long=XX.XXX,Lat=XX.XXX, Name="TargetLocation1", Size=ZZZZ)
TargetLocation2<-data.frame(Long=XX.XXX,Lat=XX.XXX, Name="TargetLocation2", Size=YYYY)
## MainData[6:7] are long and lat columns
MainData$DistanceFromTarget1<-distVincentyEllipsoid(MainData[6:7], TargetLocation1[1:2])
MainData$DistanceFromTarget2<-distVincentyEllipsoid(MainData[6:7], TargetLocation2[1:2])
I am using the geosphere package's distVincentyEllipsoid function to compute the distances. As you can imagine, distVincentyEllipsoid is computationally intensive, but it is more accurate than the other functions in the same package (distHaversine(), distMeeus(), distRhumb(), distVincentySphere()).
Q1) It takes me about 5-10 minutes to compute the distances for each target location (I have 16 GB RAM and an Intel i7 6600U 2.81 GHz CPU), and I have multiple target locations. Is there a faster way to do this?
Q2) I then create a new column holding a categorical variable that marks each transaction according to whether it falls within the market definition of a target location. I do this with a for loop containing two if statements. Is there a faster way to do this computation?
MainData$TransactionOrigin<-"Other"
for (x in 1:nrow(MainData)){
if (MainData$DistanceFromTarget1[x]<=7000)
MainData$TransactionOrigin[x]="Target1"
if (MainData$DistanceFromTarget2[x]<=4000)
MainData$TransactionOrigin[x]="Target2"
}
Thanks
Regarding Q2
This will run much faster if you drop the loop and assign to the whole column at once.
MainData$TransactionOrigin <- "Other"
MainData$TransactionOrigin[which(MainData$DistanceFromTarget1<=7000)] <- "Target1"
MainData$TransactionOrigin[which(MainData$DistanceFromTarget2<=4000)] <- "Target2"
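To check that the vectorized assignment gives the same labels as the original loop, here is a minimal sketch on toy data. The four-row data frame and its distance values are made up for illustration, and LoopOrigin is a hypothetical helper vector that is not part of the original code. The thresholds are the ones from the question, and each comparison is applied to the whole column rather than to one row at a time.

```r
# Toy stand-in for MainData; the distances are made-up values in metres.
MainData <- data.frame(
  DistanceFromTarget1 = c(5000, 8000, 6500, 12000),
  DistanceFromTarget2 = c(9000, 3500, 3000, 15000)
)

# Vectorized version: one assignment per target, no explicit loop.
MainData$TransactionOrigin <- "Other"
MainData$TransactionOrigin[which(MainData$DistanceFromTarget1 <= 7000)] <- "Target1"
MainData$TransactionOrigin[which(MainData$DistanceFromTarget2 <= 4000)] <- "Target2"

# Reference result computed with the loop from the question, for comparison.
LoopOrigin <- rep("Other", nrow(MainData))
for (x in seq_len(nrow(MainData))) {
  if (MainData$DistanceFromTarget1[x] <= 7000) LoopOrigin[x] <- "Target1"
  if (MainData$DistanceFromTarget2[x] <= 4000) LoopOrigin[x] <- "Target2"
}

# TRUE when both approaches label every row the same way.
identical(MainData$TransactionOrigin, LoopOrigin)
```

As in the loop, a row that satisfies both thresholds ends up labelled "Target2", because the second assignment overwrites the first. Swap the order of the two assignments if Target1 should take precedence.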