Compare one row to all other rows
I have the following data frame in R:
ID bay row tier
1 1 2 80
2 3 2 80
3 2 5 06
4 4 5 06
5 23 6 82
6 25 6 82
7 24 6 82
8 4 12 08
I want to find pairs of entries where the row and tier values are equal, the bay values are odd numbers, and the bay difference between the two entries with the same row and tier is exactly 2.
For example:
ID bay row tier
1 1 2 80
2 3 2 80
The two rows above meet my condition: row and tier are the same, both bay values are odd, and the difference between the two bay numbers is 2. I need to generate a flag (say 1, 2, 3, and so on) that is assigned to both rows and uniquely identifies each pair.
My desired data frame would be:
ID bay row tier flag
1 1 2 80 1
2 3 2 80 1
3 2 5 06 NA
4 4 5 06 NA
5 23 6 82 2
6 25 6 82 2
7 24 6 82 NA
8 4 12 08 NA
How can I do this in R?
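To make the question reproducible, the sample data can be built as shown in this sketch. It assumes all columns are integers, so the tier values printed as 06 and 08 are stored as 6 and 8; if the leading zeros matter, tier would need to be a character column instead.

```r
# Sample data from the question.
# Assumption: tier is numeric, so "06" and "08" become 6 and 8.
df <- data.frame(
  ID   = 1:8,
  bay  = c(1L, 3L, 2L, 4L, 23L, 25L, 24L, 4L),
  row  = c(2L, 2L, 5L, 5L, 6L, 6L, 6L, 12L),
  tier = c(80L, 80L, 6L, 6L, 82L, 82L, 82L, 8L)
)

# Inspect the column types before trying any of the approaches.
str(df)
```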
You can get the subset as follows:
ind <- duplicated(df[c('row', 'tier')]) & df$bay%%2 == 1|
duplicated(df[c('row', 'tier')], fromLast = TRUE) & df$bay%%2 == 1
df1 <- df[ind,]
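# within each (row, tier) group, keep the first bay and any bay exactly 2 above the previous one
df1 <- df1[!!with(df1, ave(bay, row, tier, FUN = function(i) c(TRUE, diff(i) == 2))),]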
df1
Which gives:
  ID bay row tier
1  1   1   2   80
2  2   3   2   80
5  5  23   6   82
6  6  25   6   82
To get the flag:
df$flag <- cumsum(c(1, diff(which(ind)) != 1))[match(df$ID, df1$ID)]
df
Which gives:
  ID bay row tier flag
1  1   1   2   80    1
2  2   3   2   80    1
3  3   2   5    6   NA
4  4   4   5    6   NA
5  5  23   6   82    2
6  6  25   6   82    2
7  7  24   6   82   NA
8  8   4  12    8   NA
Using tidyverse, you can try something like this:
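library(dplyr)

df %>%
  group_by(row, tier) %>%
  mutate(flg = if_else(bay %% 2 > 0, 1, 0)) %>%  # 1 for odd bays
  filter(flg == 1) %>%
  mutate(df2 = lead(bay, 1) - bay) %>%           # gap to the next odd bay in the group
  filter(df2 == 2) %>%
  select(-df2) %>%
  ungroup() %>%
  mutate(flg = 1:n()) %>%                        # number the first row of each pair
  right_join(df) %>%
  arrange(ID) %>%                                # restore the original row order before lag()
  mutate(flg = coalesce(flg, lag(flg, 1)))       # copy the flag to the second row of the pair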
which gives:
ID bay row tier flg
<int> <int> <int> <int> <int>
1 1 1 2 80 1
2 2 3 2 80 1
3 3 2 5 6 NA
4 4 4 5 6 NA
5 5 23 6 82 2
6 6 25 6 82 2
7 7 24 6 82 NA
8 8 4 12 8 NA
We can use data.table:
library(data.table)
i1 <- setDT(df1)[, .I[all(bay%%2 == 1) & diff(bay)==2], .(grp = rleid(bay%%2),row, tier)]$V1
df1[i1, flag := 1
][!is.na(flag), flag := as.numeric(.GRP), .(row, tier)]
df1
# ID bay row tier flag
#1: 1 1 2 80 1
#2: 2 3 2 80 1
#3: 3 2 5 6 NA
#4: 4 4 5 6 NA
#5: 5 23 6 82 2
#6: 6 25 6 82 2
#7: 7 24 6 82 NA
#8: 8 4 12 8 NA
A different approach. You mention you just need a unique identifier. If the numbers don't have to be sequential, it can be achieved like this:
library(dplyr)
df$flag=NA
group = df %>% group_indices(row,tier)
idx = which(df$bay %% 2==1 & (df$bay - lag(df$bay,default=-1)==2 | group != lag(group,default=-1)))
df$flag[idx]=group[idx]
Output:
ID bay row tier flag
1 1 1 2 80 1
2 2 3 2 80 1
3 3 2 5 6 NA
4 4 4 5 6 NA
5 5 23 6 82 3
6 6 25 6 82 3
7 7 24 6 82 NA
8 8 4 12 8 NA
Hope this helps!
I wrote this crude for loop, but it works:
df$flag = NA
for (i in 1:nrow(df)) {
  for (j in 2:nrow(df)) {
    # same row and tier, both bays odd, bays exactly 2 apart
    if (df$row[i] == df$row[j] &&
        df$tier[i] == df$tier[j] &&
        df$bay[i] %% 2 != 0 &&
        df$bay[j] %% 2 != 0 &&
        abs(df$bay[i] - df$bay[j]) == 2) {
      # the loop index i serves as the pair identifier
      df$flag[i] = i
      df$flag[j] = i
    }
  }
}
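The nested loop above compares every row with every other row. The same comparison can be sketched in base R as a self-join with merge(), which may be easier to read and yields sequential flags. This is only a sketch under some assumptions: the sample data is rebuilt with integer columns, and the names pairs, ID.a, ID.b, bay.a and bay.b are introduced here for illustration and do not appear in the answers above.

```r
# Sample data from the question (tier assumed numeric, so 06 -> 6 and 08 -> 8).
df <- data.frame(
  ID   = 1:8,
  bay  = c(1L, 3L, 2L, 4L, 23L, 25L, 24L, 4L),
  row  = c(2L, 2L, 5L, 5L, 6L, 6L, 6L, 12L),
  tier = c(80L, 80L, 6L, 6L, 82L, 82L, 82L, 8L)
)

# Self-join on row and tier: every row is paired with every row
# (including itself) that shares the same row and tier.
pairs <- merge(df, df, by = c("row", "tier"), suffixes = c(".a", ".b"))

# Keep only pairs where both bays are odd and the second bay is exactly
# 2 larger than the first, so each pair appears once.
pairs <- pairs[pairs$bay.a %% 2 == 1 &
               pairs$bay.b %% 2 == 1 &
               pairs$bay.b - pairs$bay.a == 2, ]

# Number the pairs sequentially, ordered by the ID of their first row.
pairs <- pairs[order(pairs$ID.a), ]
pairs$flag <- seq_len(nrow(pairs))

# Write each pair's flag back to both of its rows; all other rows stay NA.
df$flag <- NA_integer_
df$flag[match(pairs$ID.a, df$ID)] <- pairs$flag
df$flag[match(pairs$ID.b, df$ID)] <- pairs$flag

df
```

One caveat: if a group contained a chain such as bays 1, 3 and 5, the middle row would belong to two pairs and would end up with the flag of whichever pair is written last. The question does not say how such chains should be handled, so this sketch leaves that open.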