Compare all rows to current in R grouped dataframe

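Hi, I am trying to determine whether any row in a group has a Version that is exactly 1 less than the Version of any other row in the same group, and to label it in another column. I have looked at lag and lead, but the problem is that the rows whose values differ by 1 may or may not be adjacent to each other.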

Here is a reproducible example.

Data:

library(dplyr)

df <- tibble('Plate' = c("A1","A1","A1","A1","A1","A2","A2","A2","A2","A2","A3","A3","A3","A3","A3","A3"),
             'Sample' = c("a", "a","a","b","b","a","a","b","b","b","a","b","b","c","c","c"),
             'Location' = c("x","x","x","y","y","y","y","x","x","x","x","y","y","x","x","x"),
             'Version' = c(1,1.2,2,22,26,9,9.3,11,11.3,12,19,32.2,33.2,14,15,15))

The last iteration I tried was adapted from "How to compare the current row with all previous rows in R" (and others):

df_test <- df  %>%
  group_by(Plate,Sample,Location) %>% 
  arrange(desc(Version)) %>% 
  mutate(diff = sapply(seq_along(Version), function(i){
    if_else(any(.[1:(i-1),'Version'] - .[[i,'Version']] == 1.0), -1.0, 0)})
    )

Expected output:

   Plate Sample Location Version  diff
   <chr> <chr>  <chr>      <dbl> <dbl>
 1 A3    b      y           33.2     0
 2 A3    b      y           32.2    -1
 3 A1    b      y           26       0
 4 A1    b      y           22       0
 5 A3    a      x           19       0
 6 A3    c      x           15       0
 7 A3    c      x           15       0
 8 A3    c      x           14      -1
 9 A2    b      x           12       0
10 A2    b      x           11.3     0
11 A2    b      x           11      -1
12 A2    a      y            9.3     0
13 A2    a      y            9       0
14 A1    a      x            2       0
15 A1    a      x            1.2     0
16 A1    a      x            1      -1

Actual output:

   Plate Sample Location Version  diff
   <chr> <chr>  <chr>      <dbl> <dbl>
 1 A3    b      y           33.2     0
 2 A3    b      y           32.2    -1
 3 A1    b      y           26       0
 4 A1    b      y           22      -1
 5 A3    a      x           19       0
 6 A3    c      x           15       0
 7 A3    c      x           15      -1
 8 A3    c      x           14       0
 9 A2    b      x           12       0
10 A2    b      x           11.3    -1
11 A2    b      x           11       0
12 A2    a      y            9.3     0
13 A2    a      y            9      -1
14 A1    a      x            2       0
15 A1    a      x            1.2    -1
16 A1    a      x            1       0

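It seems to be looking at the row index for the comparison (or ignoring the groups?). How do I get it to look at the values instead? I feel like I'm close. I would prefer a dplyr answer, but data.table is acceptable if necessary. Apologies if I missed a related post that already answers this.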

I would try this:

df  %>%
  group_by(Plate,Sample,Location) %>%
  mutate(diff = if_else((Version + 1) %in% Version, -1, 0))
# # A tibble: 16 x 5
# # Groups:   Plate, Sample, Location [7]
#    Plate Sample Location Version  diff
#    <chr> <chr>  <chr>      <dbl> <dbl>
#  1 A1    a      x            1      -1
#  2 A1    a      x            1.2     0
#  3 A1    a      x            2       0
#  4 A1    b      y           22       0
#  5 A1    b      y           26       0
#  6 A2    a      y            9       0
#  7 A2    a      y            9.3     0
#  8 A2    b      x           11      -1
#  9 A2    b      x           11.3     0
# 10 A2    b      x           12       0
# 11 A3    a      x           19       0
# 12 A3    b      y           32.2    -1
# 13 A3    b      y           33.2     0
# 14 A3    c      x           14      -1
# 15 A3    c      x           15       0
# 16 A3    c      x           15       0

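Since your versions are not all integers, there is some risk of numerical precision problems, because %in% compares floating-point values exactly. It does appear to work on your example, where the numbers are relatively small.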
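To illustrate the precision risk, here is a minimal sketch in base R. The values 0.1, 0.2 and 0.3 are illustrative only and do not come from the question's data; the tolerance 1e-10 is an arbitrary choice.

```r
# Illustrative values only (not from the question's data).
a <- 0.1 + 0.2

# Exact comparison fails because 0.1 + 0.2 is not stored as exactly 0.3.
exact_equal <- (a == 0.3)

# %in% also matches doubles exactly, so it fails in the same way.
found_in_set <- (a %in% 0.3)

# Comparing within a small tolerance succeeds.
within_tol <- abs(a - 0.3) < 1e-10

print(c(exact_equal = exact_equal,
        found_in_set = found_in_set,
        within_tol = within_tol))
```

The same effect could make (Version + 1) %in% Version miss a match when the versions carry fractional parts.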

A numerically stable version might look like this:

df  %>%
  group_by(Plate,Sample,Location) %>%
  mutate(diff = if_else(apply(abs(outer(Version, Version, "-") + 1) < 1e-10, 1, any), -1, 0))

(Same result as above.)

To understand how and why it works, start with a literal vector such as x = c(1, 1.2, 2) and run pieces of the code on it: first outer(x, x, "-"), then add + 1, and so on.
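That walk-through could be sketched as follows, using only base R. The intermediate names d, is_zero and hit are hypothetical, introduced here for illustration, and base ifelse stands in for dplyr's if_else so the snippet runs without loading any package.

```r
x <- c(1, 1.2, 2)

# d[i, j] = x[i] - x[j]: every pairwise difference within the vector.
d <- outer(x, x, "-")

# d + 1 is (numerically) zero exactly where x[j] equals x[i] + 1.
is_zero <- abs(d + 1) < 1e-10

# For each row i: does any element j sit exactly 1 above x[i]?
hit <- apply(is_zero, 1, any)
print(hit)
# [1]  TRUE FALSE FALSE

# Label the matches the same way as the diff column.
diff_col <- ifelse(hit, -1, 0)
print(diff_col)
# [1] -1  0  0
```

Only the first element is flagged, because 2 is the only value that lies exactly 1 above another element (1). Inside group_by() and mutate(), the same computation runs once per group on that group's Version vector.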
