I'm currently working on a data transformation. The data is not very large, about 190k rows.
I wrote a for loop like this:
for (i in 1:nrow(df2)){
#a
record.a <- df[which(df$first_lat==df2[i,"third_lat"]
& df$first_lon==df2[i,"third_lon"]
& df$sixth_lat==df2[i,"fourth_lat"]
& df$sixth_lon==df2[i,"fourth_lon"]
& df[,4]==df2[i,4]
& df[,3]==df2[i,5]),]
df2[i,18] <- ifelse(nrow(record.a) != 0,record.a$order_cnt,NA)
#b
record.b <- df[which(df$fifth_lat==df2[i,"third_lat"]
& df$fifth_lon==df2[i,"third_lon"]
& df$sixth_lat==df2[i,"second_lat"]
& df$sixth_lon==df2[i,"second_lon"]
& df[,4]==df2[i,4]
& df[,3]==df2[i,5]),]
df2[i,19] <- ifelse(nrow(record.b) != 0,record.b$order_cnt,NA)
#c
record.c <- df[which(df$fifth_lat==df2[i,"first_lat"]
& df$fifth_lon==df2[i,"first_lon"]
& df$fourth_lat==df2[i,"second_lat"]
& df$fourth_lon==df2[i,"second_lon"]
& df[,4]==df2[i,4]
& df[,3]==df2[i,5]),]
df2[i,20] <- ifelse(nrow(record.c) != 0,record.c$order_cnt,NA)
#d
record.d <- df[which(df$third_lat==df2[i,"first_lat"]
& df$third_lon==df2[i,"first_lon"]
& df$fourth_lat==df2[i,"sixth_lat"]
& df$fourth_lon==df2[i,"sixth_lon"]
& df[,4]==df2[i,4]
& df[,3]==df2[i,5]),]
df2[i,21] <- ifelse(nrow(record.d) != 0,record.d$order_cnt,NA)
#e
record.e <- df[which(df$third_lat==df2[i,"fifth_lat"]
& df$third_lon==df2[i,"fifth_lon"]
& df$second_lat==df2[i,"sixth_lat"]
& df$second_lon==df2[i,"sixth_lon"]
& df[,4]==df2[i,4]
& df[,3]==df2[i,5]),]
df2[i,22] <- ifelse(nrow(record.e) != 0,record.e$order_cnt,NA)
#f
record.f <- df[which(df$first_lat==df2[i,"fifth_lat"]
& df$first_lon==df2[i,"fifth_lon"]
& df$second_lat==df2[i,"fourth_lat"]
& df$second_lon==df2[i,"fourth_lon"]
& df[,4]==df2[i,4]
& df[,3]==df2[i,5]),]
df2[i,23] <- ifelse(nrow(record.f) != 0,record.f$order_cnt,NA)
}
So, basically, I need to fill in 6 columns of df2 from df, each using its own set of matching criteria. In the for loop, nrow(df2) is about 190k, and it runs very slowly. I used view(df2) to check the result and it looks correct. Is there any way to make it faster? I may apply the same data transformation to a much larger dataset in the future.
[Screenshots of df and df2 were attached here in the original post.]
The data describes grids on a map. df2 is basically a subset of df with 6 additional columns. Both df and df2 have the same lon and lat information.
Each grid_id stands for a hexagonal area on the map. Each hexagon is connected to six other hexagons, and each connection is identified by two pairs of lon and lat. What I want to do is find a particular value (order_cnt) from each of the six surrounding hexagons (in df) and fill it into columns a, b, c, d, e, f in df2. I also need two other conditions to match, hours and ten_mins_interval: (df[,4]==df2[i,4] & df[,3]==df2[i,5]).
So I think the logic is:
If you start with your current df2[,1:17], you can add df2[,18] with the merge command:
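# x = df (key columns plus order_cnt), y = df2; all.y=TRUE keeps unmatched df2 rows with NA
# by.x uses df's 4th and 3rd columns; by.y uses df2's 4th and 5th columns, as in the loop
df2 <- merge(df[,c("first_lat","first_lon","sixth_lat","sixth_lon","col4name","col3name","order_cnt")],
df2,
by.x=c("first_lat","first_lon","sixth_lat","sixth_lon","col4name","col3name"),
by.y=c("third_lat","third_lon","fourth_lat","fourth_lon","col4name","col5name"),
all.y=TRUE)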
You'll need to replace col4name with the name of the fourth column and so on; I can't see from the screenshot what that might be. Five more versions of this command can easily be generated to add the other five columns. As the operation works on whole vectors at a time, it's likely to be faster than looping. As the data wasn't provided in a suitable format, this isn't tested.
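To make the idea concrete, here is a minimal sketch on toy data. It is a variant of the approach above rather than the merge call itself: it builds a text key from the matching columns and uses match(), which keeps df2's row order and returns NA where nothing matches. The column names hour and ten_mins_interval, the toy values, and the helpers make_key and lookup_order_cnt are all assumptions made for illustration. Because the keys are pasted text, coordinates must be stored identically in both data frames, the same requirement the original == comparison has.

```r
# Toy stand-ins for df and df2 (values and the hour / ten_mins_interval
# column names are assumptions, not taken from the real data).
df <- data.frame(
  hour = c(8, 8, 8),
  ten_mins_interval = c(1, 1, 1),
  first_lat = c(1.0, 2.0, 3.0),
  first_lon = c(10.0, 20.0, 30.0),
  sixth_lat = c(1.5, 2.5, 3.5),
  sixth_lon = c(10.5, 20.5, 30.5),
  order_cnt = c(5, 7, 9)
)
df2 <- data.frame(
  hour = c(8, 8),
  ten_mins_interval = c(1, 2),
  third_lat = c(2.0, 3.0),
  third_lon = c(20.0, 30.0),
  fourth_lat = c(2.5, 3.5),
  fourth_lon = c(20.5, 30.5)
)

# Hypothetical helper: paste the chosen columns into one key string per row.
make_key <- function(data, cols) {
  do.call(paste, c(unname(as.list(data[cols])), sep = "|"))
}

# Hypothetical helper: for each row of df2, fetch order_cnt from the first
# row of df whose key matches; unmatched rows get NA.
lookup_order_cnt <- function(df, df2, df_cols, df2_cols) {
  idx <- match(make_key(df2, df2_cols), make_key(df, df_cols))
  df$order_cnt[idx]
}

# Column "a": df's first/sixth corners against df2's third/fourth corners.
df2$a <- lookup_order_cnt(
  df, df2,
  df_cols  = c("first_lat", "first_lon", "sixth_lat", "sixth_lon",
               "hour", "ten_mins_interval"),
  df2_cols = c("third_lat", "third_lon", "fourth_lat", "fourth_lon",
               "hour", "ten_mins_interval")
)
```

The other five columns would follow by calling lookup_order_cnt again with the corner pairs from the corresponding block of the loop. Unlike merge, this never adds or reorders rows in df2; if df contains several matching rows, only the first is used.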