Difference between an interval and a set of intervals?

I have a set of non-overlapping, non-adjacent intervals, e.g. [{10,15}, {30,35}, {20,25}]. They are not sorted, but I can sort them if necessary.

Now I am given some new interval, e.g. {5,32}, and want to generate a new set of intervals describing the difference: the ranges covered by this new interval that aren't in the set. In this example the answer would be: [{5,9}, {16,19}, {26,29}].

What's a fast algorithm for calculating this? Note that the set will typically have 1, sometimes 2, rarely 3 or more items in it, so I want to optimise for this case.

For context, here's the code for initially creating the set from an input stream of start+end data, where I merge as I go:

type Interval struct {
    start int
    end   int
}

func (i *Interval) OverlapsOrAdjacent(j Interval) bool {
    return i.end+1 >= j.start && j.end+1 >= i.start
}

func (i *Interval) Merge(j Interval) bool {
    if !i.OverlapsOrAdjacent(j) {
        return false
    }
    if j.start < i.start {
        i.start = j.start
    }
    if j.end > i.end {
        i.end = j.end
    }
    return true
}

type Intervals []Interval

func (ivs Intervals) Len() int           { return len(ivs) }
func (ivs Intervals) Swap(i, j int)      { ivs[i], ivs[j] = ivs[j], ivs[i] }
func (ivs Intervals) Less(i, j int) bool { return ivs[i].start < ivs[j].start }

func (ivs Intervals) Merge(iv Interval) Intervals {
    ivs = append(ivs, iv)
    merged := make(Intervals, 0, len(ivs))
    for _, iv := range ivs {
        for i := 0; i < len(merged); {
            if iv.Merge(merged[i]) {
                merged = append(merged[:i], merged[i+1:]...)
            } else {
                i++
            }
        }
        merged = append(merged, iv)
    }
    return merged
}

func (ivs Intervals) MergeUsingSort(iv Interval) Intervals {
    ivs = append(ivs, iv)
    sort.Sort(ivs)
    merged := make(Intervals, 0, len(ivs))
    merged = append(merged, ivs[0])
    for i := 1; i < len(ivs); i++ {
        last := len(merged) - 1
        if !merged[last].Merge(ivs[i]) {
            merged = append(merged, ivs[i])
        }
    }
    return merged
}

func (ivs Intervals) Difference(iv Interval) Intervals {
    // ???
    return ivs
}

func main() {
    var ivs Intervals
    for _, input := range inputsFromSomewhere { // in reality, I don't have all these inputs at once, they come in one at a time
        iv := Interval{input.start, input.end}
        diffs := ivs.Difference(iv) // not yet implemented...
        // do something with diffs
        ivs = ivs.Merge(iv)
    }
}

I find that the above Intervals.Merge() is 2x faster than MergeUsingSort(), so I wonder if there's also a simple non-sorting way of answering my question.

The question and answer code is incomplete and doesn't compile. There are no benchmarks. From a quick glance at the code, it's likely inefficient.

Usable code for interval.go and interval_test.go was obtained from https://github.com/VertebrateResequencing/wr/tree/develop/minfys.

Let's start by writing a benchmark for the interval difference example.

package minfys

import (
    "fmt"
    "testing"
)

// Example
var (
    xA = Intervals{{10, 15}, {30, 35}, {20, 25}}
    xB = Interval{5, 32}
    xD = Intervals{{5, 9}, {16, 19}, {26, 29}}
    xR = Intervals{}
)

func BenchmarkExample(b *testing.B) {
    b.ReportAllocs()
    a := make(Intervals, len(xA))
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        copy(a, xA)
        xR = a.Difference(xB)
    }
    b.StopTimer()
    if fmt.Sprint(xD) != fmt.Sprint(xR) {
        b.Fatal(xD, xR)
    }
}

Next, write a Difference method.

package minfys

func (a Intervals) Difference(b Interval) Intervals {
    // If A and B are sets, then the relative complement of A in B
    // is the set of elements in B but not in A.
    // The relative complement of A in B is denoted B ∖ A:
    //     B \ A = {x ∈ B | x ∉ A}
    //     B \ A = B ∩ A'
    //
    // For example, d = b \ a:
    //     a: [{10, 15}, {30, 35}, {20, 25}]
    //     b: {5, 32}
    //     d: [{5, 9}, {16, 19}, {26, 29}]
    // The elements of set a are non-overlapping, non-adjacent,
    // and unsorted intervals. Note that a is reordered in place.

    if len(a) <= 0 {
        return Intervals{b}
    }

    d := make(Intervals, 0, 3)
    for ; len(a) > 0; a = a[1:] {
        for i := 1; i < len(a); i++ {
            if a[i].Start < a[0].Start {
                a[i], a[0] = a[0], a[i]
            }
        }

        if b.Start < a[0].Start {
            if b.End < a[0].Start {
                d = append(d, b)
                break
            }
            d = append(d, Interval{b.Start, a[0].Start - 1})
            b.Start = a[0].Start
        }
        if b.End <= a[0].End {
            break
        }
        if b.Start <= a[0].End {
            b.Start = a[0].End + 1
        }
        if len(a) == 1 {
            if b.Start <= a[0].End {
                b.Start = a[0].End + 1
            }
            d = append(d, b)
            break
        }
    }
    return d
}

Now, benchmark the Difference method.

BenchmarkExample-4     20000000     62.4 ns/op    48 B/op      1 allocs/op

sbs wrote a Difference method.

// Interval struct is used to describe something with a start and end. End must
// be greater than start.
type Interval struct {
    Start int64
    End   int64
}

// Overlaps returns true if this interval overlaps with the supplied one.
func (i *Interval) Overlaps(j Interval) bool {
    // https://nedbatchelder.com/blog/201310/range_overlap_in_two_compares.html
    return i.End >= j.Start && j.End >= i.Start
}

// Intervals type is a slice of Interval.
type Intervals []Interval

// Difference returns any portions of iv that do not overlap with any of our
// intervals. Assumes that all of our intervals have been Merge()d in.
func (ivs Intervals) Difference(iv Interval) (diffs Intervals) {
    diffs = append(diffs, iv)
    for _, prior := range ivs {
        for i := 0; i < len(diffs); {
            if left, right, overlapped := prior.Difference(diffs[i]); overlapped {
                if len(diffs) == 1 {
                    diffs = nil
                } else {
                    diffs = append(diffs[:i], diffs[i+1:]...)
                }

                if left != nil {
                    diffs = append(diffs, *left)
                }
                if right != nil {
                    diffs = append(diffs, *right)
                }
            } else {
                i++
            }
        }
        if len(diffs) == 0 {
            break
        }
    }

    return
}

Benchmark sbs's Difference method.

BenchmarkExample-4      5000000    365 ns/op     128 B/op      4 allocs/op

peterSO's Difference method is significantly faster.

old.txt (sbs) versus new.txt (peterSO):

benchmark              old ns/op     new ns/op     delta
BenchmarkExample-4     365           62.4          -82.90%

benchmark              old allocs     new allocs   delta
BenchmarkExample-4     4              1            -75.00%

benchmark              old bytes     new bytes     delta
BenchmarkExample-4     128           48            -62.50%

This is just a beginning. There are likely other improvements that can be made.

There were some errors in interval_test.go. ShouldBeNil is for pointers; ShouldBeEmpty is for collections. ShouldResemble does not handle set equality (two sets which contain the same elements are the same set). The expected values passed to ShouldResemble were reordered to match the implementation-dependent order.

$ go test
..........................................................................................................................x......................................................x................x
Failures:

  * interval_test.go 
  Line 247:
  Expected: nil
  Actual:   '[]'

  * interval_test.go 
  Line 375:
  Expected: 'minfys.Intervals{minfys.Interval{Start:5, End:6}, minfys.Interval{Start:31, End:32}, minfys.Interval{Start:11, End:14}, minfys.Interval{Start:19, End:19}}'
  Actual:   'minfys.Intervals{minfys.Interval{Start:5, End:6}, minfys.Interval{Start:11, End:14}, minfys.Interval{Start:19, End:19}, minfys.Interval{Start:31, End:32}}'
  (Should resemble)!

  * interval_test.go 
  Line 413:
  Expected: 'minfys.Intervals{minfys.Interval{Start:7, End:10}, minfys.Interval{Start:1, End:3}, minfys.Interval{Start:15, End:17}}'
  Actual:   'minfys.Intervals{minfys.Interval{Start:1, End:3}, minfys.Interval{Start:7, End:10}, minfys.Interval{Start:15, End:17}}'
  (Should resemble)!


195 total assertions

...
198 total assertions

--- FAIL: TestIntervals (0.04s)
FAIL

.

$ diff -a -u ../interval_test.go interval_test.go
--- ../interval_test.go 2017-04-29 20:23:29.365344008 -0400
+++ interval_test.go    2017-04-29 20:54:14.349344903 -0400
@@ -244,19 +244,19 @@
            So(len(ivs), ShouldEqual, 1)

            newIvs = ivs.Difference(twoSix)
-           So(newIvs, ShouldBeNil)
+           So(newIvs, ShouldBeEmpty)
            ivs = ivs.Merge(twoSix)
            So(len(ivs), ShouldEqual, 1)

            newIvs = ivs.Difference(oneThree)
-           So(newIvs, ShouldBeNil)
+           So(newIvs, ShouldBeEmpty)
            ivs = ivs.Merge(oneThree)
            So(len(ivs), ShouldEqual, 1)

            oneSeven := Interval{1, 7}

            newIvs = ivs.Difference(oneSix)
-           So(newIvs, ShouldBeNil)
+           So(newIvs, ShouldBeEmpty)
            ivs = ivs.Merge(oneSix)
            So(len(ivs), ShouldEqual, 1)

@@ -372,7 +372,7 @@

            fiveThirtyTwo := Interval{5, 32}
            newIvs = ivs.Difference(fiveThirtyTwo)
-           So(newIvs, ShouldResemble, Intervals{Interval{5, 6}, Interval{31, 32}, Interval{11, 14}, Interval{19, 19}})
+           So(newIvs, ShouldResemble, Intervals{Interval{5, 6}, Interval{11, 14}, Interval{19, 19}, Interval{31, 32}})
            ivs = ivs.Merge(fiveThirtyTwo)
            So(len(ivs), ShouldEqual, 3)

@@ -409,7 +409,7 @@

            ivs = ivs.Truncate(17)

-           expected := Intervals{sevenTen, oneThree, Interval{15, 17}}
+           expected := Intervals{oneThree, sevenTen, Interval{15, 17}}
            So(ivs, ShouldResemble, expected)
        })
    })

.

$ go test
.............................................................................................................................................................................................................
205 total assertions

...
208 total assertions

PASS
$ 

I [@sbs] confirm it's faster than my solution. Though if you just measure the wall time that using Difference() takes (put a before := time.Now() before the last Difference() call in interval_test.go, a time.Since(before) after it, and sum those durations over the loop), it seems to make surprisingly little difference (on my machine it takes ~31ms with my solution and ~29ms with yours).

As requested, interval_test.go was modified to measure wall time:

$ diff -a -u ../interval_test.go walltime_test.go
--- ../interval_test.go 2017-04-29 20:23:29.365344008 -0400
+++ walltime_test.go    2017-04-30 13:39:29.000000000 -0400
@@ -24,6 +24,7 @@
    "math/rand"
    "testing"
    "time"
+   "fmt"
 )

 func TestIntervals(t *testing.T) {
@@ -459,16 +460,20 @@

        var ivs Intervals
        errors := 0
+       var diffTime time.Duration
        t := time.Now()
        for i, input := range inputs {
            iv := NewInterval(int64(input), int64(readSize))
+           before := time.Now()
            newIvs := ivs.Difference(iv)
+           diffTime += time.Since(before)
            if (len(newIvs) == 1) != exepectedNew[i] {
                errors++
            }
            ivs = ivs.Merge(iv)
        }
-       // fmt.Printf("\ntook %s\n", time.Since(t))
+       fmt.Printf("took %s\n", time.Since(t))
+       fmt.Printf("\n  Difference took %s\n", diffTime)
        So(errors, ShouldEqual, 0)
        So(len(ivs), ShouldEqual, 1)
        So(time.Since(t).Seconds(), ShouldBeLessThan, 1) // 42ms on my machine
$ 

The interval_test.go benchmark input sizes and frequencies were:

size    frequency
0       1
1       94929
2       50072
3       4998

Output sizes and frequencies were:

size    frequency
0       50000
1       100000

Tuning peterSO's Difference method for this distribution gives

package minfys

func (a Intervals) Difference(b Interval) Intervals {
    // If A and B are sets, then the relative complement of A in B
    // is the set of elements in B but not in A.
    // The relative complement of A in B is denoted B ∖ A:
    //     B \ A = {x ∈ B | x ∉ A}
    //     B \ A = B ∩ A'
    //
    // For example, d = b \ a:
    //     a: [{10, 15}, {30, 35}, {20, 25}]
    //     b: {5, 32}
    //     d: [{5, 9}, {16, 19}, {26, 29}]
    // The elements of set a are non-overlapping, non-adjacent,
    // and unsorted intervals. Note that a is reordered in place.

    if len(a) <= 0 {
        return Intervals{b}
    }

    var d Intervals
    for ; len(a) > 0; a = a[1:] {
        for i := 1; i < len(a); i++ {
            if a[i].Start < a[0].Start {
                a[i], a[0] = a[0], a[i]
            }
        }

        if b.Start < a[0].Start {
            if b.End < a[0].Start {
                d = append(d, b)
                break
            }
            d = append(d, Interval{b.Start, a[0].Start - 1})
            b.Start = a[0].Start
        }
        if b.End <= a[0].End {
            break
        }
        if b.Start <= a[0].End {
            b.Start = a[0].End + 1
        }
        if len(a) == 1 {
            if b.Start <= a[0].End {
                b.Start = a[0].End + 1
            }
            d = append(d, b)
            break
        }
    }
    return d
}

Running the interval_test.go benchmark for peterSO's and sbs's Difference methods, respectively, gives

$ go test -v

  Merging many intervals is fast took 26.208614ms

  Difference took 10.706858ms

and

$ go test -v

  Merging many intervals is fast took 30.799216ms

  Difference took 14.414488ms

peterSO's Difference method is significantly faster than sbs's: 10.706858ms versus 14.414488ms, or minus 25.7 percent.

Updating the earlier example benchmark results for peterSO's revised Difference method gives

old.txt (sbs) versus new.txt (peterSO):

benchmark              old ns/op     new ns/op     delta
BenchmarkExample-4     365           221           -39.45%

benchmark              old allocs     new allocs   delta
BenchmarkExample-4     4              3            -25.00%

benchmark              old bytes     new bytes     delta
BenchmarkExample-4     128           112           -12.50%

To answer my own question, here's my implementation of Difference() that is faster (on my input data) than, e.g., JimB's suggestion, which required a sort.

func (i *Interval) Overlaps(j Interval) bool {
    return i.End >= j.Start && j.End >= i.Start
}

func (i *Interval) Difference(j Interval) (left *Interval, right *Interval, overlapped bool) {
    if !i.Overlaps(j) {
        return
    }

    overlapped = true
    if j.Start < i.Start {
        left = &Interval{j.Start, i.Start - 1}
    }
    if j.End > i.End {
        right = &Interval{i.End + 1, j.End}
    }
    return
}

func (ivs Intervals) Difference(iv Interval) (diffs Intervals) {
    diffs = append(diffs, iv)
    for _, prior := range ivs {
        for i := 0; i < len(diffs); {
            if left, right, overlapped := prior.Difference(diffs[i]); overlapped {
                if len(diffs) == 1 {
                    diffs = nil
                } else {
                    diffs = append(diffs[:i], diffs[i+1:]...)
                }

                if left != nil {
                    diffs = append(diffs, *left)
                }
                if right != nil {
                    diffs = append(diffs, *right)
                }
            } else {
                i++
            }
        }
        if len(diffs) == 0 {
            break
        }
    }
    return
}

It works on the data I've tried, though I'm a little worried I might have missed an edge case where it gets the wrong answer.
