How to sort a data.table based on a set of inequality constraints?
I have a set of "x < y" inequality constraints and I would like to sort the rows of a data.table based on these.
For example,
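library(data.table)
set.seed(0)
# Note: sample() changed its default algorithm in R 3.6.0, so newer R
# versions may draw different letters than the table printed below
ineqs <- unique(data.table(
  X = sample(letters, 10, replace = TRUE),
  Rel = "<",
  Y = sample(letters, 10, replace = TRUE)
))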
ineqs
X Rel Y
1: x < b
2: g < f
3: j < e
4: o < r
5: x < j
6: f < u
7: x < m
8: y < s
9: r < z
10: q < j
So, if I start with a table of sorted letters,
dt <- data.table(Foo = letters)
Foo
1: a
2: b
3: c
---
24: x
25: y
26: z
How can I adjust the row order to satisfy my constraints? Also, I am certain that my constraints are valid (i.e., none of the constraints contradict each other).
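library(igraph)
# Build a directed graph with an edge X -> Y for every "X < Y" constraint
g = ineqs[, graph_from_edgelist(cbind(X, Y), directed = TRUE)]
# A topological order lists every vertex before all of its successors
o = names(topo_sort(g))
# Use that order as the levels of an ordered factor, then sort by it
dt[, v := factor(Foo, levels = o, ordered = TRUE)]
dt[order(v)]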
Foo v
1: x x
2: g g
3: o o
4: y y
5: q q
6: b b
7: m m
8: f f
9: r r
10: s s
11: j j
12: u u
13: z z
14: e e
15: a <NA>
16: c <NA>
17: d <NA>
18: h <NA>
19: i <NA>
20: k <NA>
21: l <NA>
22: n <NA>
23: p <NA>
24: t <NA>
25: v <NA>
26: w <NA>
Foo v
All of the terms that aren't in ineqs are sorted to the end: factor() turns values that are not among its levels into NA, and order() places NA values last by default.
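If you would rather keep letters that appear in no constraint at their original (alphabetical) positions instead of pushing them to the end, one option is a stable variant of Kahn's topological-sort algorithm run over all rows. The sketch below is base R only; `stable_topo_sort()` is a hypothetical helper (not part of igraph or data.table), it assumes the items are unique, and the constraint vectors are copied by hand from the ineqs table printed in the question.

```r
# Hypothetical helper, not from igraph/data.table. Assumes `items` are unique
# and every letter in X and Y appears in `items`.
# Kahn's algorithm: repeatedly emit the earliest item (in the original order
# of `items`) that has no unplaced predecessor left.
stable_topo_sort <- function(items, X, Y) {
  # in_deg[k]: number of constraints "something < items[k]" still pending
  in_deg <- setNames(integer(length(items)), items)
  for (y in Y) in_deg[y] <- in_deg[y] + 1L
  placed <- setNames(logical(length(items)), items)
  result <- character(0)
  while (length(result) < length(items)) {
    ready <- items[!placed & in_deg == 0L]
    if (length(ready) == 0L) stop("constraints contain a cycle")
    nxt <- ready[1]  # earliest available item in the original order
    result <- c(result, nxt)
    placed[nxt] <- TRUE
    # every item that had to come after `nxt` loses one pending predecessor
    for (y in Y[X == nxt]) in_deg[y] <- in_deg[y] - 1L
  }
  result
}

# Constraints as printed in the question's ineqs table
X <- c("x", "g", "j", "o", "x", "f", "x", "y", "r", "q")
Y <- c("b", "f", "e", "r", "j", "u", "m", "s", "z", "j")
res <- stable_topo_sort(letters, X, Y)
res
```

With the question's objects, something like dt[match(stable_topo_sort(Foo, ineqs$X, ineqs$Y), Foo)] should reorder the rows accordingly. Always taking the earliest ready item means ties between unconstrained rows are broken by the original order.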
If the graph of your relation has cycles, you should get a warning from topo_sort. This tells you that your task is not well defined for some terms in ineqs.
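To find out which terms are affected before sorting, a cycle check can be run on the constraints themselves. This is a base-R sketch using Kahn's algorithm; `unsortable()` is a hypothetical helper name, and it does not reproduce igraph's own warning. It returns the letters that can never be placed, i.e. those on a cycle or ordered after one.

```r
# Hypothetical helper: letters that Kahn's algorithm can never remove from
# the constraint graph (they lie on a cycle or depend on one).
unsortable <- function(X, Y) {
  nodes <- unique(c(X, Y))
  in_deg <- setNames(integer(length(nodes)), nodes)
  for (y in Y) in_deg[y] <- in_deg[y] + 1L
  queue <- nodes[in_deg == 0L]
  while (length(queue) > 0L) {
    v <- queue[1]
    queue <- queue[-1]
    # removing v satisfies every constraint "v < y"
    for (y in Y[X == v]) {
      in_deg[y] <- in_deg[y] - 1L
      if (in_deg[y] == 0L) queue <- c(queue, y)
    }
  }
  # any node whose in-degree never reached zero was never removed
  names(in_deg)[in_deg > 0L]
}

unsortable(c("x", "g"), c("b", "f"))                      # character(0)
unsortable(c("a", "b", "c", "x"), c("b", "c", "a", "a"))  # [1] "a" "b" "c"
```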
Perhaps I misunderstood, but this is not a trivial sort, and there doesn't necessarily exist one unique order.
Let me give you an example. Consider the conditions
X Rel Y
1: x < b
2: g < f
Various orders are conceivable:
x < g < f < b
g < x < b < f
g < x < f < b
g < f < x < b
x < g < b < f
x < b < g < f
all of which satisfy the two conditions above.
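A quick way to confirm that several different orders are all acceptable is to check each one against the constraints by position. The sketch below is base R; `satisfies()` is a hypothetical helper, and it assumes every constrained letter appears in the order being checked (otherwise match() returns NA).

```r
# Hypothetical helper: TRUE if each constraint "before[k] < after[k]" holds,
# i.e. before[k] appears at an earlier position in `ord` than after[k].
satisfies <- function(ord, before, after) {
  all(match(before, ord) < match(after, ord))
}

before <- c("x", "g")
after  <- c("b", "f")
orders <- list(
  c("x", "g", "f", "b"),
  c("g", "x", "b", "f"),
  c("g", "x", "f", "b"),
  c("g", "f", "x", "b"),
  c("x", "g", "b", "f"),
  c("x", "b", "g", "f")
)
ok <- vapply(orders, satisfies, logical(1), before = before, after = after)
ok                                               # all TRUE
satisfies(c("b", "x", "g", "f"), before, after)  # FALSE: b precedes x
```

Since the two constraints involve disjoint pairs of letters, each one rules out exactly half of the 4! = 24 permutations, so these six orders are the only valid ones.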
I was interested in seeing how an exhaustive and crude implementation would do, where we pre-calculate all possible permutations and then eliminate those that do not fulfil the pairwise conditions.
To illustrate, we will use only 4 letters and the first two lines of the pairwise condition data.
Here are my results:
To start, we define the four letters and calculate all permutations using gtools::permutations.
library(gtools)
char <- c("b", "f", "g", "x")
perm <- as.data.frame(permutations(length(char), length(char), char))
There are 24 possible permutations. 有24种可能的排列。
We now read in the pairwise condition data:
df <- read.table(text = "X Rel Y
x < b
g < f", header = TRUE)
# Convert factors to character vectors
df[] <- lapply(df, as.character)
We now loop through the permutations and the pairwise conditions, and flag those rows in the permutation data that violate at least one of the pairwise conditions.
rmv <- c()
for (i in 1:nrow(perm)) {
  # Loop through all possible permutations and eliminate those that
  # do not fulfil the pairwise conditions
  for (j in 1:nrow(df)) {
    # Loop through the pairwise conditions: look up the comparison named
    # in Rel (here `<`) and apply it to the positions of X and Y
    cond <- match.fun(df[j, "Rel"])(
      which(perm[i, ] == df[j, "X"]),
      which(perm[i, ] == df[j, "Y"]))
    if (!cond) {
      rmv <- c(rmv, i)
      break
    }
  }
}
The remaining permutations that satisfy the conditions are then:
perm[-rmv, ]
#    V1 V2 V3 V4
# 16  g  f  x  b
# 17  g  x  b  f
# 18  g  x  f  b
# 20  x  b  g  f
# 23  x  g  b  f
# 24  x  g  f  b
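The double loop can also be replaced by one vectorised check per permutation: look up the position of every X and Y letter with match() and keep the rows where all X positions come first. The sketch below is base R only; `all_perms()` is a hypothetical stand-in for gtools::permutations(), and `cons` holds the same two conditions as df. Keeping a logical vector also sidesteps an edge case in perm[-rmv, ]: if no row were ever removed, rmv would still be NULL, and -NULL is an error.

```r
# Hypothetical helper standing in for gtools::permutations(): all orderings
# of `v`, one per row (lexicographic if `v` is sorted).
all_perms <- function(v) {
  if (length(v) <= 1L) return(matrix(v, nrow = 1L))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], all_perms(v[-i]))))
}

char <- c("b", "f", "g", "x")
cons <- data.frame(X = c("x", "g"), Y = c("b", "f"), stringsAsFactors = FALSE)
perm <- all_perms(char)

# For each permutation, every X must sit at an earlier position than its Y
keep <- apply(perm, 1, function(p) all(match(cons$X, p) < match(cons$Y, p)))
perm[keep, , drop = FALSE]
```

Either way the approach enumerates n! permutations, which is fine for 4 letters but hopeless for all 26 (26! is roughly 4e26), so for the full problem a topological sort is the practical route.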