
Find identical observations in one data frame column that differ in another column

In R, I have a data frame that includes an ID column. I need to find all the rows that share the same ID but differ in the X1 variable.

For example:

d

ID    X1     X2
a    19      F
b    19      F
c    16      T
a    16      T 
a    19      T
d    17      T 
b    15      F 
b    19      F
c    17      T
c    17      T
d    17      T
e    15      T
f    14      T
g    16      T

The result will be:

df1

ID    X1     X2
a    19      F
b    19      F
c    16      T
a    16      T 
b    15      F 
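c    17      T

# Cross-tabulate X1 against ID: one row per X1 value, one column per ID
tab <- table(d$X1, d$ID)
# Cap counts at 1 so each cell only flags whether that X1 value occurs for the ID
tab[tab > 1] <- 1
# Column sums now give the number of distinct X1 values per ID
tab <- apply(tab, 2, sum)
# Keep only the IDs with more than one distinct X1 value
tab <- tab[tab > 1]

d1 <- data.frame(ID = names(tab))
d1 <- merge(d1, d, by = "ID", all.x = TRUE, all.y = FALSE)
# Keep one row per (ID, X1) pair
d1 <- unique(d1[, 1:2])
d1

  ID X1
1  a 19
2  a 16
4  b 15
5  b 19
7  c 16
8  c 17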

We can include the third column as well, but you would need to supply a rule for picking which of its values to retain. For instance, there are two rows for a where X1 is 19, one with X2 equal to T and one with X2 equal to F. To choose between the two, you could keep the first matching row, keep the last, prefer T over F, and so on.
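As an illustration of the "keep the first matching row" rule, here is a minimal sketch. It is not the code from the answer above: it swaps in ave() and duplicated(), and it assumes the question's data is stored in d with X2 as a logical column. The names n_distinct_x1, multi, and res are made up for this example.

```r
# Sample data from the question (assumed to be stored in `d`).
d <- data.frame(
  ID = c("a", "b", "c", "a", "a", "d", "b", "b", "c", "c", "d", "e", "f", "g"),
  X1 = c(19, 19, 16, 16, 19, 17, 15, 19, 17, 17, 17, 15, 14, 16),
  X2 = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE,
         FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Number of distinct X1 values within each ID, repeated on every row.
n_distinct_x1 <- ave(d$X1, d$ID, FUN = function(x) length(unique(x)))

# Keep only rows whose ID has more than one distinct X1 value.
multi <- d[n_distinct_x1 > 1, ]

# Keep the first row of each (ID, X1) pair, so X2 is taken from the
# first matching row.
res <- multi[!duplicated(multi[, c("ID", "X1")]), ]
res
```

Passing fromLast = TRUE to duplicated() would keep the last matching row instead. Preferring T over F would need the rows sorted by X2 before this step.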

We can first drop the last occurrence of each ID, which also removes the IDs that occur only once. Then we count the rows left for each ID, and if an ID is left with a single row we remove it (here df1 holds the input data):

newdf <- df1[duplicated(df1$ID, fromLast=TRUE),]
tbl <- table(newdf$ID)
newdf[!newdf$ID %in% names(tbl[tbl < 2]),]
#   ID X1    X2
# 1  a 19 FALSE
# 2  b 19 FALSE
# 3  c 16  TRUE
# 4  a 16  TRUE
# 7  b 15 FALSE
# 9  c 17  TRUE

Does this work?

df1[rownames(unique(df1[,c("ID","X1")])),]
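One way to check this one-liner is to run it on the question's sample data. The sketch below assumes df1 holds the input rows. The names candidate, n_x1, and single_ids are made up for the check. On this data the one-liner drops repeated (ID, X1) pairs but still returns the IDs that have only one X1 value.

```r
# Sample data from the question (assumed to be stored in `df1`).
df1 <- data.frame(
  ID = c("a", "b", "c", "a", "a", "d", "b", "b", "c", "c", "d", "e", "f", "g"),
  X1 = c(19, 19, 16, 16, 19, 17, 15, 19, 17, 17, 17, 15, 14, 16),
  X2 = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE,
         FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# The proposed one-liner: first row of every distinct (ID, X1) pair.
candidate <- df1[rownames(unique(df1[, c("ID", "X1")])), ]

# IDs that have exactly one distinct X1 value in the input.
n_x1 <- tapply(df1$X1, df1$ID, function(x) length(unique(x)))
single_ids <- names(n_x1)[n_x1 == 1]

# Single-value IDs that are still present in the candidate result.
intersect(candidate$ID, single_ids)
# "d" "e" "f" "g"
```

An extra filtering step on the number of distinct X1 values per ID would be needed to reach the result requested in the question.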
