Compare one row to all other rows

I have the following data frame in R:

  ID     bay    row    tier
  1       1      2      80
  2       3      2      80
  3       2      5      06
  4       4      5      06
  5       23     6      82
  6       25     6      82
  7       24     6      82
  8       4      12     08
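
For reproducibility, the table above could be built roughly as follows. This is only a sketch: it assumes all columns are integers, so the tier values shown as 06 and 08 are stored as 6 and 8, which matches the printed output in the answers.

```r
# Sample data; integer columns are an assumption (06 and 08 become 6 and 8)
df <- data.frame(
  ID   = 1:8,
  bay  = c(1L, 3L, 2L, 4L, 23L, 25L, 24L, 4L),
  row  = c(2L, 2L, 5L, 5L, 6L, 6L, 6L, 12L),
  tier = c(80L, 80L, 6L, 6L, 82L, 82L, 82L, 8L)
)
str(df)
```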

I want to find rows whose row and tier values are equal, where bay is an odd number and the bay values of the two entries with the same row and tier differ by 2.

For example:

 ID     bay    row    tier
 1       1      2      80
 2       3      2      80

The two rows above satisfy my condition: row and tier are the same, both bay values are odd, and the difference between the two bay numbers is 2. I need to generate a flag that is set on both rows, say 1, 2, 3 and so on, which uniquely identifies each pair.

My desired data frame would be:

 ID     bay    row    tier   flag
 1       1      2      80     1
 2       3      2      80     1
 3       2      5      06     NA
 4       4      5      06     NA
 5       23     6      82     2
 6       25     6      82     2
 7       24     6      82     NA
 8       4      12     08     NA

How can I do this in R?

You can get the subset as follows:

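# Rows whose (row, tier) combination occurs more than once and whose bay is odd
ind <- duplicated(df[c('row', 'tier')]) & df$bay %% 2 == 1 |
       duplicated(df[c('row', 'tier')], fromLast = TRUE) & df$bay %% 2 == 1
df1 <- df[ind, ]
# Within each (row, tier) group, keep the first row and any row whose bay is 2 above the previous one
# (grouping by row and tier here; the original grouped by an undefined variable `new`)
df1 <- df1[!!with(df1, ave(bay, row, tier, FUN = function(i) c(TRUE, diff(i) == 2))), ]
df1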

which gives:

  ID bay row tier
1  1   1   2   80
2  2   3   2   80
5  5  23   6   82
6  6  25   6   82

To get the flag:

df$flag <- cumsum(c(1, diff(which(ind)) != 1))[match(df$ID, df1$ID)]
df

which gives:

  ID bay row tier flag
1  1   1   2   80    1
2  2   3   2   80    1
3  3   2   5    6   NA
4  4   4   5    6   NA
5  5  23   6   82    2
6  6  25   6   82    2
7  7  24   6   82   NA
8  8   4  12    8   NA
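
To see why the cumsum expression in this answer numbers the pairs, here is a small standalone sketch. The vector pos is a hypothetical stand-in for which(ind) on the sample data. A new id starts whenever the gap to the previous position is not 1, so this relies on the two rows of each pair being adjacent in the data frame.

```r
# Hypothetical stand-in for which(ind): positions of the rows that passed the filter
pos <- c(1, 2, 5, 6)

# diff(pos) != 1 marks the start of each new run of consecutive positions;
# the leading 1 opens the first run, and cumsum turns the marks into run ids
run_id <- cumsum(c(1, diff(pos) != 1))
run_id
```

Note that a run of three or more consecutive positions would receive a single id under this scheme.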

Using the tidyverse, you can try something like this:

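library(dplyr)

df %>%
  group_by(row, tier) %>%
  mutate(flg = if_else(bay %% 2 > 0, 1, 0)) %>%
  filter(flg == 1) %>%
  mutate(df2 = lead(bay, 1) - bay) %>%
  filter(df2 == 2) %>%
  select(-df2) %>%
  ungroup() %>%
  mutate(flg = 1:n()) %>%
  right_join(df, by = c("ID", "bay", "row", "tier")) %>%
  arrange(ID) %>%  # restore the original row order so lag() sees the preceding row
  mutate(flg = coalesce(flg, lag(flg, 1)))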

which gives:

     ID   bay   row  tier   flg
  <int> <int> <int> <int> <int>
1     1     1     2    80     1
2     2     3     2    80     1
3     3     2     5     6    NA
4     4     4     5     6    NA
5     5    23     6    82     2
6     6    25     6    82     2
7     7    24     6    82    NA
8     8     4    12     8    NA

We can use data.table (here df1 is the original data frame):

library(data.table)
i1 <- setDT(df1)[, .I[all(bay%%2 == 1) & diff(bay)==2], .(grp = rleid(bay%%2),row, tier)]$V1
df1[i1, flag := 1
  ][!is.na(flag), flag := as.numeric(.GRP), .(row, tier)]
df1
#    ID bay row tier flag
#1:  1   1   2   80    1
#2:  2   3   2   80    1
#3:  3   2   5    6   NA
#4:  4   4   5    6   NA
#5:  5  23   6   82    2
#6:  6  25   6   82    2
#7:  7  24   6   82   NA
#8:  8   4  12    8   NA

A different approach. You mention you just need a unique identifier. If the numbers don't have to be sequential, it can be achieved like this:

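library(dplyr)
df$flag <- NA
# Group index for each (row, tier) combination
group <- df %>% group_by(row, tier) %>% group_indices()
# Odd bay, and either 2 above the previous bay or the first row of a new group
idx <- which(df$bay %% 2 == 1 & (df$bay - lag(df$bay, default = -1) == 2 | group != lag(group, default = -1)))
df$flag[idx] <- group[idx]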

Output:

  ID bay row tier flag
1  1   1   2   80    1
2  2   3   2   80    1
3  3   2   5    6   NA
4  4   4   5    6   NA
5  5  23   6   82    3
6  6  25   6   82    3
7  7  24   6   82   NA
8  8   4  12    8   NA

Hope this helps!

I wrote this crappy for loop, but it works:

df$flag = NA

for(i in 1:nrow(df)) {
  for(j in 2:nrow(df)) {
    if(df$row[i] == df$row[j]){
      if(df$tier[i] == df$tier[j]){
        if(df$bay[i] %% 2 != 0){
          if(df$bay[j] %% 2 != 0){
            if(abs(df$bay[i] - df$bay[j]) == 2){
              df$flag[i] = i
              df$flag[j] = i
            }
          }
        }
      }
    }
  }
}
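
The same pairwise comparison can probably be written without explicit loops by joining the odd-bay rows to themselves. The following is a minimal base R sketch, not taken from any of the answers above; the names odd and pairs and the suffixes .a and .b are made up for illustration, and the sample data is rebuilt with integer columns as an assumption.

```r
# Sample data (integer columns assumed)
df <- data.frame(
  ID   = 1:8,
  bay  = c(1L, 3L, 2L, 4L, 23L, 25L, 24L, 4L),
  row  = c(2L, 2L, 5L, 5L, 6L, 6L, 6L, 12L),
  tier = c(80L, 80L, 6L, 6L, 82L, 82L, 82L, 8L)
)

# Keep only odd bays, then pair every such row with every other one
# that shares the same row and tier
odd   <- df[df$bay %% 2 == 1, ]
pairs <- merge(odd, odd, by = c("row", "tier"), suffixes = c(".a", ".b"))

# Keep ordered pairs whose bays differ by exactly 2 (this also drops self-pairs)
pairs <- pairs[pairs$bay.b - pairs$bay.a == 2, ]
pairs <- pairs[order(pairs$ID.a), ]

# Number the pairs 1, 2, ... and write the number onto both member rows
df$flag <- NA_integer_
for (k in seq_len(nrow(pairs))) {
  df$flag[df$ID %in% c(pairs$ID.a[k], pairs$ID.b[k])] <- k
}
df
```

Unlike the loop, this does not depend on row order. If a row belongs to two pairs (for example bays 1, 3 and 5 in the same row and tier), the later pair overwrites the earlier flag, so chains like that would need extra handling.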
