Compare all rows to current in R grouped dataframe
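Hi, I'm trying to determine whether any row in a group has a Version that is exactly 1 less than the Version of any other row in the same group, and to label it in another column. I've looked at lag and lead, but the problem is that the rows whose values differ by 1 may or may not be adjacent to each other.
Here is a reproducible example.
Data: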
library(dplyr)
df <- tibble('Plate' = c("A1","A1","A1","A1","A1","A2","A2","A2","A2","A2","A3","A3","A3","A3","A3","A3"),
             'Sample' = c("a", "a","a","b","b","a","a","b","b","b","a","b","b","c","c","c"),
             'Location' = c("x","x","x","y","y","y","y","x","x","x","x","y","y","x","x","x"),
             'Version' = c(1,1.2,2,22,26,9,9.3,11,11.3,12,19,32.2,33.2,14,15,15))
The latest iteration I tried was adapted from a post on how to compare the current row with all previous rows in R (and from others):
df_test <- df %>%
  group_by(Plate,Sample,Location) %>%
  arrange(desc(Version)) %>%
  mutate(diff = sapply(seq_along(Version), function(i){
    if_else(any(.[1:(i-1),'Version'] - .[[i,'Version']] == 1.0), -1.0, 0)})
  )
Expected output:
Plate Sample Location Version diff
<chr> <chr> <chr> <dbl> <dbl>
1 A3 b y 33.2 0
2 A3 b y 32.2 -1
3 A1 b y 26 0
4 A1 b y 22 0
5 A3 a x 19 0
6 A3 c x 15 0
7 A3 c x 15 0
8 A3 c x 14 -1
9 A2 b x 12 0
10 A2 b x 11.3 0
11 A2 b x 11 -1
12 A2 a y 9.3 0
13 A2 a y 9 0
14 A1 a x 2 0
15 A1 a x 1.2 0
16 A1 a x 1 -1
Actual output:
Plate Sample Location Version diff
<chr> <chr> <chr> <dbl> <dbl>
1 A3 b y 33.2 0
2 A3 b y 32.2 -1
3 A1 b y 26 0
4 A1 b y 22 -1
5 A3 a x 19 0
6 A3 c x 15 0
7 A3 c x 15 -1
8 A3 c x 14 0
9 A2 b x 12 0
10 A2 b x 11.3 -1
11 A2 b x 11 0
12 A2 a y 9.3 0
13 A2 a y 9 -1
14 A1 a x 2 0
15 A1 a x 1.2 -1
16 A1 a x 1 0
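It seems to be comparing by row index (or ignoring the groups?). How do I get it to compare the values instead? It feels like I'm close. I'd prefer a dplyr answer, but data.table is acceptable if necessary. Apologies if I missed a related post that already answers this.
I would try this: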
df %>%
  group_by(Plate,Sample,Location) %>%
  mutate(diff = if_else((Version + 1) %in% Version, -1, 0))
# # A tibble: 16 x 5
# # Groups: Plate, Sample, Location [7]
# Plate Sample Location Version diff
# <chr> <chr> <chr> <dbl> <dbl>
# 1 A1 a x 1 -1
# 2 A1 a x 1.2 0
# 3 A1 a x 2 0
# 4 A1 b y 22 0
# 5 A1 b y 26 0
# 6 A2 a y 9 0
# 7 A2 a y 9.3 0
# 8 A2 b x 11 -1
# 9 A2 b x 11.3 0
# 10 A2 b x 12 0
# 11 A3 a x 19 0
# 12 A3 b y 32.2 -1
# 13 A3 b y 33.2 0
# 14 A3 c x 14 -1
# 15 A3 c x 15 0
# 16 A3 c x 15 0
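Because your versions are not all integers, there is some risk of floating-point precision problems, but it seems to work for your example, where the numbers are relatively small.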
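As a rough illustration of both points, here is a small base-R sketch. The first part shows how the `%in%` test flags a value whose successor (value + 1) is present in the same vector. The second part shows how an exact comparison can fail on decimals. The values 0.1, 0.2 and 0.3 are chosen only for illustration and do not come from the question's data.

```r
# How the %in% test works on one group's versions
x <- c(1, 1.2, 2)
has_next <- (x + 1) %in% x   # is x[i] + 1 present anywhere in x?
print(has_next)              # TRUE FALSE FALSE

# Where exact comparison can go wrong (illustrative values, not from the data)
a <- 0.1 + 0.2
exact_equal <- a == 0.3                 # FALSE: a is not stored as exactly 0.3
exact_in <- a %in% 0.3                  # FALSE: %in% also matches exactly
tolerant_equal <- abs(a - 0.3) < 1e-10  # TRUE: compare within a small tolerance
print(c(exact_equal, exact_in, tolerant_equal))
```

The tolerance 1e-10 mirrors the one used in this answer; a different threshold may suit data on a different scale.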
A numerically more robust version might look like this:
df %>%
  group_by(Plate,Sample,Location) %>%
  mutate(diff = if_else(apply(abs(outer(Version, Version, "-") + 1) < 1e-10, 1, any), -1, 0))
(Same result as above.)
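To understand how and why it works, start with a small test vector such as x = c(1, 1.2, 2) and run the pieces of the code on it one at a time: first outer(x, x, "-"), then add + 1, and so on.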
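That step-by-step walk-through could look roughly like the sketch below. The intermediate names `d`, `hit`, `flag` and `diff_col` are made up here for readability and are not part of the answer's code. Base R's `ifelse` stands in for dplyr's `if_else` so the snippet runs without any packages.

```r
x <- c(1, 1.2, 2)

# Pairwise differences: d[i, j] = x[i] - x[j]
d <- outer(x, x, "-")

# d + 1 is (near) zero exactly where x[j] equals x[i] + 1
hit <- abs(d + 1) < 1e-10

# A row has a hit if any other element is 1 greater than x[i]
flag <- apply(hit, 1, any)
print(flag)      # TRUE FALSE FALSE

# Same labelling as in the answer: -1 when a successor exists, 0 otherwise
diff_col <- ifelse(flag, -1, 0)
print(diff_col)  # -1 0 0
```

Note that `outer` builds an n-by-n matrix per group, which is fine for small groups like these but grows quadratically with group size.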