简体   繁体   中英

Compare one row to all other rows

I have following dataframe in R

  ID     bay    row    tier
  1       1      2      80
  2       3      2      80
  3       2      5      06
  4       4      5      06
  5       23     6      82
  6       25     6      82
  7       24     6      82
  8       4      12     08

What I want to find is row and tier values are equal and at the same time bay should be an odd number and bay difference between two same row and tier entries should be 2 .

Eg

 ID     bay    row    tier
 1       1      2      80
 2       3      2      80

above two rows qualifies my condition row and tier are same with bay as odd number and difference between two bay numebers is 2 and I need to generate a flag which will get generated for both rows, lets say 1,2,3 which uniquely identifies the pairs

My desired dataframe would be

 ID     bay    row    tier   flag
 1       1      2      80     1
 2       3      2      80     1
 3       2      5      06     NA
 4       4      5      06     NA
 5       23     6      82     2
 6       25     6      82     2
 7       24     6      82     NA
 8       4      12     08     NA

How can I do it in r?

You can get the subset as follows,

ind <- duplicated(df[c('row', 'tier')]) & df$bay%%2 == 1|
       duplicated(df[c('row', 'tier')], fromLast = TRUE) & df$bay%%2 == 1
df1 <- df[ind,]
df1 <- df1[!!with(df1, ave(bay, new, FUN = function(i) c(TRUE, diff(i) == 2))),]
df1

Which gives

  ID bay row tier 1 1 1 2 80 2 2 3 2 80 5 5 23 6 82 6 6 25 6 82 

To get the flag,

df$flag <- cumsum(c(1, diff(which(ind)) != 1))[match(df$ID, df1$ID)]
df

Which gives,

 ID bay row tier flag 1 1 1 2 80 1 2 2 3 2 80 1 3 3 2 5 6 NA 4 4 4 5 6 NA 5 5 23 6 82 2 6 6 25 6 82 2 7 7 24 6 82 NA 8 8 4 12 8 NA 

using tidyverse , you can try something like this:

df %>%
  group_by(row,tier) %>%
  mutate(flg = if_else(bay %%2 >0, 1, 0)) %>%
  filter(flg == 1) %>%
  mutate(df2 = lead(bay,1) - bay) %>%
  filter(df2 == 2) %>%
  select(-df2) %>%
  ungroup()%>%
  mutate(flg = 1:n()) %>%
  right_join(df) %>%
  mutate(flg = coalesce(flg,lag(flg,1)))

which gives:

     ID   bay   row  tier   flg
  <int> <int> <int> <int> <int>
1     1     1     2    80     1
2     2     3     2    80     1
3     3     2     5     6    NA
4     4     4     5     6    NA
5     5    23     6    82     2
6     6    25     6    82     2
7     7    24     6    82    NA
8     8     4    12     8    NA

We can use

library(data.table)
i1 <- setDT(df1)[, .I[all(bay%%2 == 1) & diff(bay)==2], .(grp = rleid(bay%%2),row, tier)]$V1
df1[i1, flag := 1
  ][!is.na(flag), flag := as.numeric(.GRP), .(row, tier)]
df1
#    ID bay row tier flag
#1:  1   1   2   80    1
#2:  2   3   2   80    1
#3:  3   2   5    6   NA
#4:  4   4   5    6   NA
#5:  5  23   6   82    2
#6:  6  25   6   82    2
#7:  7  24   6   82   NA
#8:  8   4  12    8   NA

A different approach. You mention you just need a unique identifier. If the numbers don't have to be sequential, it can be achieved like this:

library(dplyr)
df$flag=NA
group = df %>% group_indices(row,tier)
idx = which(df$bay %% 2==1 & (df$bay - lag(df$bay,default=-1)==2 | group != lag(group,default=-1)))
df$flag[idx]=group[idx]

Output:

  ID bay row tier flag
1  1   1   2   80    1
2  2   3   2   80    1
3  3   2   5    6   NA
4  4   4   5    6   NA
5  5  23   6   82    3
6  6  25   6   82    3
7  7  24   6   82   NA
8  8   4  12    8   NA

Hope this helps!

I wrote this crappy for loop,but it works

df$flag = NA

for(i in 1:nrow(df)) {
  for(j in 2:nrow(df)) {
    if(df$row[i] == df$row[j]){
      if(df$tier[i] == df$tier[j]){
        if(df$bay[i] %% 2 != 0){
          if(df$bay[j] %% 2 != 0){
            if(abs(df$bay[i] - df$bay[j]) == 2){
              df$flag[i] = i
              df$flag[j] = i
         }
       }

      }
    }
   }
  }
 }

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM