Efficient way to find a pair of values in adjacent columns (Python/R/SQL)
Here is a base R solution that defines a function isAdjacent:
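isAdjacent <- function(df, p) {
  # matrix of the same shape as df; each cell holds its column index
  colnum <- col(df)
  # column index of each value in p; TRUE if the second is exactly one column to the right
  diff(sapply(p, function(x) colnum[df == x], USE.NAMES = FALSE)) == 1
}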
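where df is a data.frame and p is the pair of values to test. The function returns TRUE only when p[2] sits in the column immediately to the right of p[1].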
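To make the one-liner easier to follow, here is a minimal sketch that runs the same steps one at a time. The small data frame and the intermediate names (hit_b1, col_b1, col_c2) are assumptions made for illustration only.

```r
# Small illustrative data frame (an assumption, shaped like the data in this answer)
df <- data.frame(A = c("a1", "a2"), B = c("b1", "b2"), C = c("c1", "c2"),
                 stringsAsFactors = FALSE)

colnum <- col(df)             # integer matrix: every cell holds its column index
hit_b1 <- df == "b1"          # logical matrix: TRUE where the cell equals "b1"
col_b1 <- colnum[hit_b1]      # column index of "b1"
col_c2 <- colnum[df == "c2"]  # column index of "c2"

# TRUE when "c2" is exactly one column to the right of "b1"
is_adj <- (col_c2 - col_b1) == 1
print(is_adj)
```

Only the column positions are compared, so the two values do not need to be in the same row.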
Example
p1 <- c("b1","c2")
p2 <- rev(p1)
p3 <- c("a1","c3")
> isAdjacent(df,p1)
[1] TRUE
> isAdjacent(df,p2)
[1] FALSE
> isAdjacent(df,p3)
[1] FALSE
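The examples above use values that each occur exactly once. As a hedged sketch, the variant below (isAdjacentAny is a hypothetical name, not part of the original answer) keeps the same column-index idea but also copes with values that are missing, repeated, or NA, assuming "adjacent" still means "any occurrence of p[2] is one column to the right of any occurrence of p[1]".

```r
# Hypothetical variant of isAdjacent that tolerates missing, repeated, or NA values
isAdjacentAny <- function(df, p) {
  colnum <- col(df)
  # which() drops NA comparisons, so NA cells never count as matches
  left  <- unique(colnum[which(df == p[1])])
  right <- unique(colnum[which(df == p[2])])
  # TRUE if some column holding p[2] is immediately right of a column holding p[1]
  any((left + 1L) %in% right)
}

# Illustrative data (an assumption): "b1" is repeated and one cell is NA
df <- data.frame(A = c("a1", "a2"), B = c("b1", "b1"), C = c("c1", NA),
                 stringsAsFactors = FALSE)

r1 <- isAdjacentAny(df, c("b1", "c1"))  # "c1" is one column right of "b1"
r2 <- isAdjacentAny(df, c("c1", "b1"))  # reversed order
r3 <- isAdjacentAny(df, c("zz", "b1"))  # "zz" does not occur
```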
Data
> dput(df)
structure(list(A = c("a1", "a2", "a3", "a4"), B = c("b1", "b2",
"b3", "b4"), C = c("c1", "c2", "c3", "c4"), D = c("d1", "d2",
"d3", "d4"), E = c("e1", "e2", "e3", "e4"), F = c("f1", "f2",
"f3", "f4"), G = c("g1", "g2", "g3", "g4")), class = "data.frame", row.names = c(NA,
-4L))
Large-data example (benchmark)
df <- setNames(as.data.frame(sapply(letters[1:20], paste0, 1:1e6)), LETTERS[1:20])
p <- c("a1", "c3")
system.time({
  isAdjacent <- function(df, p) {
    colnum <- col(df)
    diff(sapply(p, function(x) colnum[df == x], USE.NAMES = FALSE)) == 1
  }
  isAdjacent(df, p)
})
# user system elapsed
# 1.03 0.07 1.11
library(data.table)
system.time({
  DT <- data.table(VAL = unlist(df), COL = rep(1L:ncol(df), each = nrow(df)), key = "VAL")
  isadj <- function(left, right) {
    DT[.(left), .(COL = COL + 1L)][DT[.(right)], on = .(COL), nomatch = 0L, .N > 0L]
  }
  isadj(p[1], p[2])
})
# user system elapsed
# 35.79 1.91 36.24
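Note that the data.table timing above includes building the keyed table DT inside system.time(). When many pairs are checked against the same data, a lookup structure can be built once and reused. The following is a minimal base R sketch of that build-once, query-many idea. The helpers build_index and is_adjacent_idx are hypothetical names, and no claim is made about how this compares in speed with either approach above.

```r
# Hypothetical helper: map each distinct value to the column indices where it occurs
build_index <- function(df) {
  vals <- unlist(df, use.names = FALSE)
  cols <- rep(seq_along(df), each = nrow(df))
  split(cols, vals)
}

# Hypothetical helper: TRUE if `right` occurs in the column immediately after `left`
is_adjacent_idx <- function(idx, left, right) {
  # idx[[name]] is NULL for an unknown value, which makes the result FALSE
  any((idx[[left]] + 1L) %in% idx[[right]])
}

# Small illustrative data (an assumption, shaped like the data in this answer)
df <- data.frame(A = c("a1", "a2", "a3"), B = c("b1", "b2", "b3"),
                 C = c("c1", "c2", "c3"), stringsAsFactors = FALSE)

idx <- build_index(df)                    # built once
q1 <- is_adjacent_idx(idx, "a3", "b2")    # reuses idx
q2 <- is_adjacent_idx(idx, "b2", "a3")    # reuses idx
q3 <- is_adjacent_idx(idx, "a1", "c3")    # two columns apart
```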
Using df from ThomasIsCoding's post, here is an option in R using data.table:
library(data.table)
DT <- data.table(VAL = unlist(df), COL = rep(1L:ncol(df), each = nrow(df)), key = "VAL")
isadj <- function(left, right) {
  DT[.(left), .(COL = COL + 1L)][DT[.(right)], on = .(COL), nomatch = 0L, .N > 0L]
}
isadj("a3", "b2")
#[1] TRUE
isadj("b2", "a3")
#[1] FALSE