I am working with a large dataset (~1.5m rows x 21 columns) that includes the longitude and latitude of each transaction. I am computing the distance of each transaction from a couple of target locations and appending it as a new column to the main dataset:
TargetLocation1<-data.frame(Long=XX.XXX,Lat=XX.XXX, Name="TargetLocation1", Size=ZZZZ)
TargetLocation2<-data.frame(Long=XX.XXX,Lat=XX.XXX, Name="TargetLocation2", Size=YYYY)
## MainData[6:7] are long and lat columns
MainData$DistanceFromTarget1<-distVincentyEllipsoid(MainData[6:7], TargetLocation1[1:2])
MainData$DistanceFromTarget2<-distVincentyEllipsoid(MainData[6:7], TargetLocation2[1:2])
I am using the geosphere package's distVincentyEllipsoid function to compute the distances. As you can imagine, distVincentyEllipsoid is computationally intensive, but it is more accurate than the other functions in the same package (distHaversine(), distMeeus(), distRhumb(), distVincentySphere()).
Q1) It takes about 5-10 minutes to compute the distances for each target location (I have 16 GB of RAM and an Intel i7-6600U 2.81 GHz CPU), and I have multiple target locations. Is there a faster way to do this?
Q2) I then create a new column holding a categorical variable that marks whether each transaction falls within the market definition of a target location. I do this with a for loop containing 2 if statements. Is there a way to make this computation faster?
MainData$TransactionOrigin <- "Other"
for (x in 1:nrow(MainData)) {
  if (MainData$DistanceFromTarget1[x] <= 7000)
    MainData$TransactionOrigin[x] <- "Target1"
  if (MainData$DistanceFromTarget2[x] <= 4000)
    MainData$TransactionOrigin[x] <- "Target2"
}
Thanks
Regarding Q2
This will run much faster if you lose the loop.
MainData$TransactionOrigin <- "Other"
MainData$TransactionOrigin[which(MainData$DistanceFromTarget1 <= 7000)] <- "Target1"
MainData$TransactionOrigin[which(MainData$DistanceFromTarget2 <= 4000)] <- "Target2"
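To see the vectorized assignment at work, here is a minimal sketch on a toy data frame. The four rows and their distance values are invented for illustration and merely stand in for the real MainData. Each comparison is evaluated once over the whole column, and because the Target2 assignment runs last, a row within both radii ends up as "Target2", just as in the original loop.

```r
# Toy stand-in for MainData; the distances (in metres) are made up for illustration
MainData <- data.frame(
  DistanceFromTarget1 = c(5000, 8000, 6500, 12000),
  DistanceFromTarget2 = c(9000, 3500, 3000, 4500)
)

# Default label for every row
MainData$TransactionOrigin <- "Other"

# Compare the whole column at once; which() returns the matching row
# indices and silently skips rows where the distance is NA
MainData$TransactionOrigin[which(MainData$DistanceFromTarget1 <= 7000)] <- "Target1"

# Assigned last, so Target2 overrides Target1 where both conditions hold
MainData$TransactionOrigin[which(MainData$DistanceFromTarget2 <= 4000)] <- "Target2"

print(MainData$TransactionOrigin)
```

The third row is within both radii and is labelled "Target2", which matches what the loop would produce since its second if statement also runs after the first.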