
R: comparing rows in a group across multiple columns

I have a data frame containing samples (in rows) and their values in multiple columns. In some cases a sample has been repeated. What I want to do is compare the values in each column across these repeats and put the output in a new df. If the values match I want to indicate this with a 1, and if they do not match, with a 0. NAs should result in NA.

What I am trying to do is similar to the linked question. However, I only want to compare repeated samples, not all combinations of all rows as in the example there, and I cannot find a way to adapt the solution given there to my problem.

Example data:

Sample  x.1  x.2  y.1  y.2  z.1  z.2
------------------------------------
ID1     66   66   102  104  33   37
ID2     66   72   100  104  31   35
ID2     66   72   100  104  NA   NA
ID3     64   66   104  104  35   37
ID4     72   72   100  102  31   37
ID4     72   72   NA   NA   31   37
ID4     72   72   100  102  31   31
ID5     66   66   102  102  35   35
ID5     66   72   100  100  31   37
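
For anyone who wants to reproduce this, the table above can be entered as a data frame roughly as follows. This is only a sketch: the object name `df` is assumed to match the code further down, and the numeric column types are an assumption, since the table does not state them.

```r
# Example data from the table above, typed in by hand.
# `df` is the name assumed by the code later in the thread.
df <- data.frame(
  Sample = c("ID1", "ID2", "ID2", "ID3", "ID4", "ID4", "ID4", "ID5", "ID5"),
  x.1 = c(66, 66, 66, 64, 72, 72, 72, 66, 66),
  x.2 = c(66, 72, 72, 66, 72, 72, 72, 66, 72),
  y.1 = c(102, 100, 100, 104, 100, NA, 100, 102, 100),
  y.2 = c(104, 104, 104, 104, 102, NA, 102, 102, 100),
  z.1 = c(33, 31, NA, 35, 31, 31, 31, 35, 31),
  z.2 = c(37, 35, NA, 37, 37, 37, 31, 35, 37),
  stringsAsFactors = FALSE
)

# Samples that occur more than once are the ones to compare.
repeated_ids <- names(which(table(df$Sample) > 1))
```

Here `repeated_ids` is a hypothetical helper name; it should contain ID2, ID4 and ID5, the three samples that appear in the desired result.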

Result I am looking for in a new df:

Sample  x.1  x.2  y.1  y.2  z.1  z.2
------------------------------------
ID2     1    1    1    1    NA   NA
ID4     1    1    NA   NA   1    0
ID5     1    0    0    0    0    0

I tried something along these lines, but it did not work: it only gives me 1 as output, which is definitely incorrect.

test <- df %>% 
  group_by(Sample) %>%
  mutate(across(1:6, funs(ifelse(.[1,]==.[2,], 1, 0))))

You can first remove the groups that have only one row, and then summarise the remaining columns by group. Within each group, a column gets NA if any of its values is missing, 1 if all of its values are identical, and 0 otherwise.

library(dplyr)

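df %>%
  group_by(Sample) %>%
  # keep only samples that occur more than once
  filter(n() > 1) %>%
  # per column: NA if any value is missing, 1 if all values agree, 0 otherwise
  summarise(across(x.1:z.2,
                   ~ if (anyNA(.x)) NA_integer_ else as.integer(n_distinct(.x) == 1)))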

#  Sample   x.1   x.2   y.1   y.2   z.1   z.2
#  <chr>  <int> <int> <int> <int> <int> <int>
#1 ID2        1     1     1     1    NA    NA
#2 ID4        1     1    NA    NA     1     0
#3 ID5        1     0     0     0     0     0
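
If dplyr is not available, the same idea can be sketched in base R. This is not part of the answer above, just one possible equivalent: `all_equal_flag`, `repeated` and `result` are hypothetical names, and the data is re-entered with `read.table()` so the block runs on its own.

```r
# Example data from the question, re-entered so this block is self-contained.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Sample x.1 x.2 y.1 y.2 z.1 z.2
ID1 66 66 102 104 33 37
ID2 66 72 100 104 31 35
ID2 66 72 100 104 NA NA
ID3 64 66 104 104 35 37
ID4 72 72 100 102 31 37
ID4 72 72 NA NA 31 37
ID4 72 72 100 102 31 31
ID5 66 66 102 102 35 35
ID5 66 72 100 100 31 37
")

# Hypothetical helper: NA if any value is missing,
# otherwise 1 if all values are identical and 0 if they differ.
all_equal_flag <- function(v) {
  if (anyNA(v)) NA_integer_ else as.integer(length(unique(v)) == 1L)
}

# Keep only the samples that occur more than once.
repeated <- df[df$Sample %in% names(which(table(df$Sample) > 1)), ]

# Apply the helper to every value column within each sample.
value_cols <- setdiff(names(repeated), "Sample")
result <- aggregate(repeated[value_cols],
                    by = list(Sample = repeated$Sample),
                    FUN = all_equal_flag)
```

With the example data, `result` should have one row each for ID2, ID4 and ID5, matching the output shown above. The data-frame method of `aggregate()` is used deliberately here, because the formula method drops rows containing NA by default and would hide the missing values.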


 