I have the following data frame in R:
ID bay row tier
1 1 2 80
2 3 2 80
3 2 5 06
4 4 5 06
5 23 6 82
6 25 6 82
7 24 6 82
8 4 12 08
I want to find pairs of entries whose row and tier values are equal, where bay is an odd number in both entries and the difference between the two bay values is 2.
For example:
ID bay row tier
1 1 2 80
2 3 2 80
The two rows above meet my condition: row and tier are the same, both bay values are odd, and the difference between the two bay numbers is 2.
I need to generate a flag (say 1, 2, 3, ...) that is assigned to both rows and uniquely identifies each pair.
My desired dataframe would be
ID bay row tier flag
1 1 2 80 1
2 3 2 80 1
3 2 5 06 NA
4 4 5 06 NA
5 23 6 82 2
6 25 6 82 2
7 24 6 82 NA
8 4 12 08 NA
How can I do this in R?
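For reproducibility, here is a minimal sketch of the example data as a data frame named df, the name most of the answers use (one answer calls it df1). It assumes all columns are integers, so the tiers written as 06 and 08 are stored as 6 and 8, which matches the printed outputs further down.

```r
# Example data from the question; integer columns are an assumption,
# so tier "06" / "08" become 6 / 8
df <- data.frame(
  ID   = 1:8,
  bay  = c(1L, 3L, 2L, 4L, 23L, 25L, 24L, 4L),
  row  = c(2L, 2L, 5L, 5L, 6L, 6L, 6L, 12L),
  tier = c(80L, 80L, 6L, 6L, 82L, 82L, 82L, 8L)
)
```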
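You can get the subset as follows:
# rows whose (row, tier) combination occurs more than once and whose bay is odd
ind <- (duplicated(df[c('row', 'tier')]) & df$bay %% 2 == 1) |
  (duplicated(df[c('row', 'tier')], fromLast = TRUE) & df$bay %% 2 == 1)
df1 <- df[ind, ]
# within each (row, tier) group, keep the first row and every row whose bay is 2 above the previous one
df1 <- df1[!!with(df1, ave(bay, row, tier, FUN = function(i) c(TRUE, diff(i) == 2))), ]
df1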
Which gives
  ID bay row tier
1  1   1   2   80
2  2   3   2   80
5  5  23   6   82
6  6  25   6   82
To get the flag,
df$flag <- cumsum(c(1, diff(which(ind)) != 1))[match(df$ID, df1$ID)]
df
Which gives,
  ID bay row tier flag
1  1   1   2   80    1
2  2   3   2   80    1
3  3   2   5    6   NA
4  4   4   5    6   NA
5  5  23   6   82    2
6  6  25   6   82    2
7  7  24   6   82   NA
8  8   4  12    8   NA
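To see why the cumsum/diff expression numbers the pairs, it may help to run it step by step. The sketch below hard-codes the logical index that the filter produces for the example data (an assumption made only to keep the snippet self-contained). Note that this numbering treats consecutive row positions as one pair, so it relies on the two rows of a pair sitting next to each other in df.

```r
# Logical index for the example data: rows 1, 2, 5 and 6 qualify
ind <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)

pos     <- which(ind)               # positions of qualifying rows: 1 2 5 6
new_run <- c(1, diff(pos) != 1)     # 1 where a new block of adjacent rows starts: 1 0 1 0
run_id  <- cumsum(new_run)          # running block number: 1 1 2 2

# Write the block numbers back to their original positions
flag <- rep(NA_real_, length(ind))
flag[pos] <- run_id
flag
# 1 1 NA NA 2 2 NA NA
```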
Using tidyverse, you can try something like this:
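library(dplyr)
df %>%
  group_by(row, tier) %>%
  # keep only odd bays
  mutate(flg = if_else(bay %% 2 > 0, 1, 0)) %>%
  filter(flg == 1) %>%
  # keep the first row of each pair: the next bay in the group is exactly 2 higher
  mutate(df2 = lead(bay, 1) - bay) %>%
  filter(df2 == 2) %>%
  select(-df2) %>%
  ungroup() %>%
  mutate(flg = 1:n()) %>%
  right_join(df) %>%
  # restore the original row order so that lag() picks up the pair's first row
  arrange(ID) %>%
  mutate(flg = coalesce(flg, lag(flg, 1)))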
which gives:
ID bay row tier flg
<int> <int> <int> <int> <int>
1 1 1 2 80 1
2 2 3 2 80 1
3 3 2 5 6 NA
4 4 4 5 6 NA
5 5 23 6 82 2
6 6 25 6 82 2
7 7 24 6 82 NA
8 8 4 12 8 NA
We can use data.table:
library(data.table)
i1 <- setDT(df1)[, .I[all(bay%%2 == 1) & diff(bay)==2], .(grp = rleid(bay%%2),row, tier)]$V1
df1[i1, flag := 1
][!is.na(flag), flag := as.numeric(.GRP), .(row, tier)]
df1
# ID bay row tier flag
#1: 1 1 2 80 1
#2: 2 3 2 80 1
#3: 3 2 5 6 NA
#4: 4 4 5 6 NA
#5: 5 23 6 82 2
#6: 6 25 6 82 2
#7: 7 24 6 82 NA
#8: 8 4 12 8 NA
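The grouping above hinges on rleid(bay %% 2), which gives each run of consecutive rows with the same parity its own id. As a rough illustration of what that id looks like, here is a base R sketch built on rle(); the variable names are made up for the example.

```r
bay    <- c(1, 3, 2, 4, 23, 25, 24, 4)
parity <- bay %% 2                   # 1 1 0 0 1 1 0 0

# Number each run of equal consecutive values, as rleid() would
r      <- rle(parity)
run_id <- rep(seq_along(r$lengths), r$lengths)
run_id
# 1 1 2 2 3 3 4 4
```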
A different approach. You mention you just need a unique identifier. If the numbers don't have to be sequential, it can be achieved like this:
library(dplyr)
df$flag=NA
group = df %>% group_indices(row,tier)
idx = which(df$bay %% 2==1 & (df$bay - lag(df$bay,default=-1)==2 | group != lag(group,default=-1)))
df$flag[idx]=group[idx]
Output:
ID bay row tier flag
1 1 1 2 80 1
2 2 3 2 80 1
3 3 2 5 6 NA
4 4 4 5 6 NA
5 5 23 6 82 3
6 6 25 6 82 3
7 7 24 6 82 NA
8 8 4 12 8 NA
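If sequential numbers are wanted after all, the group ids can be renumbered afterwards. A minimal sketch, assuming the flag column looks like the output above (the vector is hard-coded here to keep the snippet self-contained):

```r
flag <- c(1, 1, NA, NA, 3, 3, NA, NA)   # group ids from the output above

# Map each distinct non-NA id to its order of first appearance; NA stays NA
seq_flag <- match(flag, unique(flag[!is.na(flag)]))
seq_flag
# 1 1 NA NA 2 2 NA NA
```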
Hope this helps!
I wrote this crappy for loop, but it works:
df$flag = NA
for (i in 1:nrow(df)) {
  for (j in 2:nrow(df)) {
    # same row and tier
    if (df$row[i] == df$row[j]) {
      if (df$tier[i] == df$tier[j]) {
        # both bays odd
        if (df$bay[i] %% 2 != 0) {
          if (df$bay[j] %% 2 != 0) {
            # bays exactly 2 apart: flag both rows with the index i
            if (abs(df$bay[i] - df$bay[j]) == 2) {
              df$flag[i] = i
              df$flag[j] = i
            }
          }
        }
      }
    }
  }
}
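The same pairwise comparison can also be written without explicit nested loops over all rows. The following is only a sketch, not the method used above: it self-joins the odd-bay rows on row and tier with merge() and keeps the combinations whose bays differ by exactly 2. The function name flag_pairs is made up for illustration, and if one group contains overlapping pairs (for example bays 1, 3 and 5) the later pair overwrites the earlier flag.

```r
flag_pairs <- function(df) {
  df$flag <- NA_integer_
  odd <- df[df$bay %% 2 == 1, ]                     # only odd bays can qualify
  # every combination of two odd-bay rows sharing row and tier
  pairs <- merge(odd, odd, by = c("row", "tier"), suffixes = c(".a", ".b"))
  pairs <- pairs[pairs$bay.b - pairs$bay.a == 2, ]  # second bay exactly 2 higher
  pairs <- pairs[order(pairs$ID.a), ]               # number pairs in ID order
  for (k in seq_len(nrow(pairs))) {
    df$flag[df$ID %in% c(pairs$ID.a[k], pairs$ID.b[k])] <- k
  }
  df
}

# Example data from the question (integer columns assumed)
df <- data.frame(
  ID   = 1:8,
  bay  = c(1L, 3L, 2L, 4L, 23L, 25L, 24L, 4L),
  row  = c(2L, 2L, 5L, 5L, 6L, 6L, 6L, 12L),
  tier = c(80L, 80L, 6L, 6L, 82L, 82L, 82L, 8L)
)
out <- flag_pairs(df)
out$flag
# 1 1 NA NA 2 2 NA NA
```

Unlike the approaches that compare neighbouring rows, this one does not depend on the two rows of a pair being adjacent in the data frame.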