
How to make this loop more efficient?

I have a data frame that looks like this:

user1,product1,0
user1,product2,2
user1,product3,1
user1,product4,2
user2,product3,0
user2,product2,2
user3,product4,0
user3,product5,3

The data frame has millions of rows. I need to go through each row: if the value in the last column is 0, keep that product name as is; otherwise, prepend the most recent product whose value was 0. The result is then written to a new data frame.

For example, the resulting matrix should be

user1,product1
user1,product1product2
user1,product1product3
user1,product1product4
user2,product3
user2,product3product2
user3,product4
user3,product4product5

I wrote a for loop that goes through each row. It works, but it is very slow. How can I speed it up? I tried to vectorize it, but I'm not sure how, because each row depends on the value in an earlier row.

Note that you don't really have a matrix. A matrix can only contain one atomic type (numeric, integer, character, etc.). You really have a data.frame.

What you want to do is easily done with na.locf from the zoo package and the ifelse function.

x <- structure(list(V1 = c("user1", "user1", "user1", "user1", "user2", 
"user2", "user3", "user3"), V2 = c("product1", "product2", "product3", 
"product4", "product3", "product2", "product4", "product5"), 
    V3 = c("0", "2", "1", "2", "0", "2", "0", "3")), .Names = c("V1", 
"V2", "V3"), class = "data.frame", row.names = c(NA, 8L))

library(zoo)
# First, create a column that contains the value from the 2nd column
# when the 3rd column is zero.
x$V4 <- ifelse(x$V3==0,x$V2,NA)
# Next, replace each NA with the previous non-NA value.
# na.rm = FALSE keeps any leading NA, so the result always has nrow(x) elements.
x$V4 <- na.locf(x$V4, na.rm = FALSE)
# Finally, create a column that contains the concatenated strings
x$V5 <- ifelse(x$V2==x$V4,x$V2,paste(x$V4,x$V2,sep=""))
# Desired output
x[,c(1,5)]
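
If you would rather not depend on zoo, the same forward fill can be sketched in base R with cumsum. This is only an illustration under two assumptions: the first row has a value of 0, and the column names V1, V2, V3 match the example above. The names is_zero, grp, anchor and result are made up for this sketch.

```r
# Toy data mirroring the question (V3 stored as numeric here)
x <- data.frame(
  V1 = c("user1", "user1", "user1", "user1", "user2", "user2", "user3", "user3"),
  V2 = c("product1", "product2", "product3", "product4",
         "product3", "product2", "product4", "product5"),
  V3 = c(0, 2, 1, 2, 0, 2, 0, 3),
  stringsAsFactors = FALSE
)

# TRUE on the rows that start a new run
is_zero <- x$V3 == 0
# cumsum() numbers the runs; assumes the first row is a zero row,
# otherwise grp would contain 0 and the lookup below would drop rows
grp <- cumsum(is_zero)
# For every row, look up the product of the zero row that opened its run
anchor <- x$V2[is_zero][grp]

result <- data.frame(
  V1 = x$V1,
  V2 = ifelse(is_zero, x$V2, paste0(anchor, x$V2)),
  stringsAsFactors = FALSE
)
result
```

Testing x$V3 == 0 directly, rather than comparing V2 with the filled column, also avoids a mix-up when a non-zero row happens to carry the same product name as its anchor.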

Since you're using a data.frame, make sure the "product" column is character and not factor. The code above gives odd results on a factor, because ifelse drops the factor attributes and returns the underlying integer codes instead of the labels.
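
One way to guard against that, as a rough sketch: convert every factor column to character before running the steps above. The small data frame here is invented for illustration, and is_factor is just a helper name for this example.

```r
# Example data where the product column was read in as a factor
x <- data.frame(
  V1 = c("user1", "user1", "user2"),
  V2 = factor(c("product1", "product2", "product3")),
  V3 = c(0, 2, 0),
  stringsAsFactors = FALSE
)

# Find the factor columns and convert them to character in place
is_factor <- vapply(x, is.factor, logical(1))
x[is_factor] <- lapply(x[is_factor], as.character)

str(x)
```

If the data comes from a file, passing stringsAsFactors = FALSE to read.csv or read.table should avoid the conversion in the first place.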
