I would like to perform an operation across a column of a data frame wherein the output is dependent on a comparison between two values.
My data frame dat is arranged like this:
region value1
a 0
a 0
a 6
a 7
a 3
a 0
a 4
b 5
b 1
b 0
I want to create a factor vector based on integer values. The value should increment every time region changes or every time value1 is 0. So in this case the vector I want would be equivalent to c(1, 2, 2, 2, 2, 3, 3, 4, 4, 5).
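To make the example reproducible, the sample data can be rebuilt with a short snippet. This is only a sketch: the name expected is introduced here for illustration and does not appear elsewhere in the post.

```r
# Rebuild the sample data frame shown above
dat <- data.frame(
  region = c("a", "a", "a", "a", "a", "a", "a", "b", "b", "b"),
  value1 = c(0, 0, 6, 7, 3, 0, 4, 5, 1, 0)
)

# Desired grouping vector (`expected` is a name introduced here for illustration)
expected <- c(1, 2, 2, 2, 2, 3, 3, 4, 4, 5)
```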
I have code to make a factor vector that increments ONLY when value1 is 0:
fac <- as.factor(cumsum(dat[,2]==0))
and I have C-style code that gets roughly the vector I want, but it runs extremely slowly on my full data set and is just plain ugly:
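p <- 1
facint <- 1
for (i in 2:nrow(dat)) {
  # Start a new group when value1 is 0 or the region differs from the previous row.
  # The check runs before appending so that row i itself gets the new group number.
  if (dat[i, 2] == 0 || dat[i, 1] != dat[i - 1, 1])
    p <- p + 1
  facint <- c(facint, p)
}
fac <- as.factor(facint)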
So how can I do this kind of row-by-row comparison in idiomatic, vectorized R instead of an explicit loop?
Try
cumsum(dat[,2]==0|c(FALSE,dat$region[-1]!=dat$region[-nrow(dat)]))
# [1] 1 2 2 2 2 3 3 4 4 5
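The one-liner above combines two logical vectors and takes a running sum. Here is a sketch of the same computation in separate steps, assuming dat has the region and value1 columns shown in the question. The intermediate names is_zero, region_changed, new_group and grp are illustrative only.

```r
dat <- data.frame(
  region = c(rep("a", 7), rep("b", 3)),
  value1 = c(0, 0, 6, 7, 3, 0, 4, 5, 1, 0)
)
n <- nrow(dat)

# TRUE wherever value1 is 0
is_zero <- dat$value1 == 0

# TRUE wherever the region differs from the previous row;
# the first row has no predecessor, so it is padded with FALSE
region_changed <- c(FALSE, dat$region[-1] != dat$region[-n])

# A new group starts when either condition holds
new_group <- is_zero | region_changed

# The running count of group starts is the group id
grp <- cumsum(new_group)
fac <- as.factor(grp)
```

Because every step works on whole vectors, no explicit loop over rows is needed, which is why this tends to be much faster than growing a vector inside a for loop.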
Or
cumsum(!duplicated(dat[,1]) | dat[,2]==0)
# [1] 1 2 2 2 2 3 3 4 4 5
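The two expressions agree on the sample data but can differ on other inputs. duplicated() marks only the first occurrence of each region, so a region that reappears later (a, b, a) does not start a new group in the second version. The first version starts counting at 0 when the first value1 is not 0. The sketch below illustrates both cases, along with a variant that pads the neighbour comparison with TRUE so that numbering always starts at 1. The data frame d and the names changed, by_neighbour, by_duplicated and robust are made up for illustration.

```r
# Hypothetical data: first value1 is non-zero and region "a" reappears after "b"
d <- data.frame(
  region = c("a", "a", "b", "b", "a", "a"),
  value1 = c(3, 0, 2, 0, 5, 1)
)
n <- nrow(d)

# TRUE where a row's region differs from the row before it (length n - 1)
changed <- d$region[-1] != d$region[-n]

# First approach: neighbour comparison padded with FALSE
by_neighbour <- cumsum(d$value1 == 0 | c(FALSE, changed))
# 0 1 2 3 4 4  (starts at 0 because row 1 triggers neither condition)

# Second approach: first occurrence of each region
by_duplicated <- cumsum(!duplicated(d$region) | d$value1 == 0)
# 1 2 3 4 4 4  (the return to "a" in row 5 is not detected)

# Variant: pad with TRUE so the first row always opens group 1
robust <- cumsum(d$value1 == 0 | c(TRUE, changed))
# 1 2 3 4 5 5
```

If regions are always contiguous and the first value1 is 0, as in the question's data, all three give the same result.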