
Given K sorted lists of up to N elements in each list, return a sorted iterator over all the items

Example:
    List 1: [1, 4, 5, 8, 9]
    List 2: [3, 4, 4, 6]
    List 3: [0, 2, 8]

would yield the following result:

    Iterator -> [0, 1, 2, 3, 4, 4, 4, 5, 6, 8, 8, 9]

I am reluctant to create a "merge" method that accepts the k lists and copies their contents into another List, because of the extra space that would take. Is this a k-way merge problem that can be implemented with a min-heap? Any pointers would be very helpful.

```java
import java.util.Comparator;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class CustomListIterator<E> implements Iterator<E> {

    private boolean canAddIterators = true;
    private boolean balanceTreeIteratorFlag = false;
    private E f_element;   // current item from the left side (null = exhausted)
    private E s_element;   // current item from the right side (null = exhausted)
    private Iterator<E> left;
    private Iterator<E> right;
    private final Comparator<E> comparator;

    public CustomListIterator(Comparator<E> comparator) {
        this.comparator = comparator;
    }

    public CustomListIterator(Iterator<E> left, Iterator<E> right, Comparator<E> comparator) {
        this.left = left;
        this.right = right;
        this.comparator = comparator;
    }

    public void addIterator(Iterator<E> iterator) {
        if (!canAddIterators)
            throw new ConcurrentModificationException();

        if (right == null) {
            right = iterator;
            return;
        } else if (left == null) {
            left = iterator;
            return;
        }

        // Alternate sides so the tree of iterators stays roughly balanced.
        if (!balanceTreeIteratorFlag) {
            right = balanceTreeOfIterators(iterator, right);
        } else {
            left = balanceTreeOfIterators(iterator, left);
        }

        balanceTreeIteratorFlag = !balanceTreeIteratorFlag;
    }

    private Iterator<E> balanceTreeOfIterators(Iterator<E> iterator_1, Iterator<E> iterator_2) {
        if (iterator_2 instanceof CustomListIterator) {
            ((CustomListIterator<E>) iterator_2).addIterator(iterator_1);
        } else {
            iterator_2 = new CustomListIterator<E>(iterator_1, iterator_2, comparator);
        }
        return iterator_2;
    }

    public boolean hasNext() {
        if (canAddIterators) {
            if (left != null && left.hasNext()) {
                f_element = left.next();
            }
            if (right != null && right.hasNext()) {
                s_element = right.next();
            }
        }
        canAddIterators = false;
        return f_element != null || s_element != null;
    }

    public E next() {
        E next;
        if (canAddIterators) {
            // Either side may still be null if fewer than two iterators were added.
            if (left != null && left.hasNext()) {
                f_element = left.next();
            }
            if (right != null && right.hasNext()) {
                s_element = right.next();
            }
        }

        canAddIterators = false;

        if (s_element == null && f_element == null) {
            throw new NoSuchElementException();
        }

        if (f_element == null) {
            next = s_element;
            s_element = right.hasNext() ? right.next() : null;
            return next;
        }

        if (s_element == null) {
            next = f_element;
            f_element = left.hasNext() ? left.next() : null;
            return next;
        }

        return findNext();
    }

    public void remove() {
        throw new UnsupportedOperationException("remove");
    }

    // Returns the smaller of the two current items and refills that side.
    private E findNext() {
        E next;
        if (comparator.compare(f_element, s_element) < 0) {
            next = f_element;
            f_element = left.hasNext() ? left.next() : null;
            return next;
        }
        next = s_element;
        s_element = right.hasNext() ? right.next() : null;
        return next;
    }

}
```

I don't think this is the most optimal way of doing it (using a tree). Any suggestions on how this can be implemented by only overriding next(), hasNext() and remove()?

There are basically three different ways to merge multiple sorted lists:

  1. Successive two-way merges
  2. Divide and conquer
  3. Priority queue based

In the discussion below, n refers to the total number of items in all lists combined, and k refers to the number of lists.

Case 1 is the easiest to envision, but also the least efficient. Imagine you're given four lists, A, B, C, and D. With this method, you merge A and B to create AB. Then you merge AB and C to create ABC. Finally, you merge ABC with D to create ABCD. You iterate over A and B three times, C two times, and D one time, so the complexity of this algorithm approaches O(n*k).
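To make the cost of case 1 concrete, here is a minimal sketch that assumes the inputs are sorted `List<Integer>` values. The names `SuccessiveMerge`, `mergeTwo` and `mergeAll` are illustrative only, not part of the original answer. Every call to `mergeTwo` copies the whole result built so far, which is where the O(n*k) behavior comes from.

```java
import java.util.ArrayList;
import java.util.List;

// Successive two-way merges: A + B -> AB, AB + C -> ABC, ...
final class SuccessiveMerge {
    // Standard two-way merge of two sorted lists into a new list.
    static List<Integer> mergeTwo(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            // Prefer 'a' on ties so equal items keep their original list order.
            if (a.get(i) <= b.get(j)) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // Folds each list into a growing result; items from the earliest
    // lists are copied again on every later merge.
    static List<Integer> mergeAll(List<List<Integer>> lists) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> list : lists) {
            result = mergeTwo(result, list);
        }
        return result;
    }
}
```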

The divide and conquer solution is to merge A and B to create AB. Then merge C and D to create CD. Then merge AB and CD to create ABCD. Every item takes part in at most ⌈log2(k)⌉ merges, one per level of the merge tree, so this method is O(n * log(k)). It works best when the lists have similar numbers of items; when the lists' lengths vary widely, the fixed pairing can copy long lists more often than necessary, although it does not degrade to the O(n*k) of successive two-way merges.
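A matching sketch of the divide-and-conquer variant, under the same assumptions (sorted `List<Integer>` inputs; `DivideAndConquerMerge`, `mergeRange` and `mergeAll` are hypothetical names). It splits the range of lists in half, merges each half recursively, and then merges the two results, so for four lists it performs the AB, CD, ABCD sequence described above.

```java
import java.util.ArrayList;
import java.util.List;

// Divide and conquer: merge each half of the lists recursively,
// then merge the two partial results.
final class DivideAndConquerMerge {
    // Standard two-way merge of two sorted lists into a new list.
    static List<Integer> mergeTwo(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            // Prefer 'a' on ties so equal items keep their original list order.
            if (a.get(i) <= b.get(j)) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // Merges lists[lo, hi). The recursion is about log2(k) levels deep
    // and each item is copied once per level.
    static List<Integer> mergeRange(List<List<Integer>> lists, int lo, int hi) {
        if (hi - lo == 0) {
            return new ArrayList<>();
        }
        if (hi - lo == 1) {
            return new ArrayList<>(lists.get(lo));
        }
        int mid = (lo + hi) >>> 1;
        return mergeTwo(mergeRange(lists, lo, mid), mergeRange(lists, mid, hi));
    }

    static List<Integer> mergeAll(List<List<Integer>> lists) {
        return mergeRange(lists, 0, lists.size());
    }
}
```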

For more information about these two algorithms, see my blog entry, "A closer look at pairwise merging". For more details about the divide and conquer approach specifically, see "A different way to merge multiple lists".

The priority queue based merge works as follows:

```
Create a priority queue holding the iterator for each non-empty list,
    ordered by each iterator's current item
while the priority queue is not empty
    Remove the iterator that references the smallest current item
    Output the referenced item and advance the iterator
    If the iterator is not at its end
        Add the iterator back to the queue
```

This algorithm is O(n * log(k)) in the worst case. Every item in every list enters the priority queue exactly once (as the current item of its iterator) and leaves it exactly once, and each insertion or removal costs O(log(k)) because the queue never holds more than k iterators. For the same reason, the memory requirements are very small: O(k) beyond the input lists themselves.
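The pseudocode above can be sketched with `java.util.PriorityQueue` acting as the min-heap. This is a simplification for illustration only: it assumes the inputs are sorted `int[]` arrays, collects the output into a list eagerly instead of exposing an iterator, and the class name `HeapMerge` is hypothetical. Each heap entry is a small cursor (list index, position), so the heap never holds more than k entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Priority-queue (min-heap) k-way merge over sorted int arrays.
final class HeapMerge {
    static List<Integer> mergeAll(int[][] lists) {
        // Each entry is a cursor {listIndex, position}; the heap orders
        // cursors by the value they currently point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (x, y) -> Integer.compare(lists[x[0]][x[1]], lists[y[0]][y[1]]));
        for (int i = 0; i < lists.length; i++) {
            if (lists[i].length > 0) {
                heap.offer(new int[] {i, 0});   // only non-empty lists
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] cursor = heap.poll();                 // smallest current value
            out.add(lists[cursor[0]][cursor[1]]);       // output it
            cursor[1]++;                                // advance while out of the heap
            if (cursor[1] < lists[cursor[0]].length) {
                heap.offer(cursor);                     // re-insert if not exhausted
            }
        }
        return out;
    }
}
```

Because the comparator reads each value through its cursor, a cursor may only be advanced while it is outside the heap; that is why the increment happens between `poll` and `offer`.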

The implementation of iterators in Java makes the priority queue implementation slightly inconvenient, but it's easily fixed with some helper classes. Most importantly, we need an iterator that lets us peek at the next item without consuming it. I call this a PeekableIterator, which looks like this:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// PeekableIterator is an iterator that lets us peek at the next item
// without consuming it.
public class PeekableIterator<E> implements Iterator<E> {
    private final Iterator<E> iterator;
    private E current;
    private boolean hasCurrent;

    public PeekableIterator(Iterator<E> iterator) {
        this.iterator = iterator;
        advance();
    }

    // Pulls the next item from the underlying iterator, if there is one.
    private void advance() {
        if (iterator.hasNext()) {
            current = iterator.next();
            hasCurrent = true;
        } else {
            current = null;
            hasCurrent = false;
        }
    }

    // Returns the next item without consuming it.
    public E getCurrent() {
        if (!hasCurrent) {
            throw new NoSuchElementException();
        }
        return current;
    }

    public boolean hasNext() {
        return hasCurrent;
    }

    public E next() {
        E rslt = getCurrent();   // throws if there is no current item
        advance();
        return rslt;
    }

    // The underlying iterator has already moved past the item returned by
    // next(), so delegating remove() to it would delete the wrong element.
    public void remove() {
        throw new UnsupportedOperationException("remove");
    }
}
```

Then, since the priority queue will hold iterators rather than individual items, we need a comparator that will compare the current items of two PeekableIterator instances. That's easy enough to create:

```java
import java.util.Comparator;

// IteratorComparator lets us compare the next items for two PeekableIterator instances.
public class IteratorComparator<E> implements Comparator<PeekableIterator<E>> {
    private final Comparator<E> comparator;

    public IteratorComparator(Comparator<E> comparator) {
        this.comparator = comparator;
    }

    // Orders iterators by their current (not yet consumed) items.
    public int compare(PeekableIterator<E> t1, PeekableIterator<E> t2) {
        return comparator.compare(t1.getCurrent(), t2.getCurrent());
    }
}
```

Those two classes are more formal implementations of the code you wrote to get and compare the next items for individual iterators.

Finally, MergeIterator initializes a PriorityQueue<PeekableIterator<E>> so that you can call its hasNext and next methods to iterate over the merged lists:

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// MergeIterator merges items from multiple sorted iterators
// to produce a single sorted sequence.
public class MergeIterator<E> implements Iterator<E> {
    private final IteratorComparator<E> comparator;
    private final PriorityQueue<PeekableIterator<E>> pqueue;

    // call with a list of sorted iterators to merge
    public MergeIterator(List<Iterator<E>> iterators, Comparator<E> comparator) {
        this.comparator = new IteratorComparator<E>(comparator);

        // PriorityQueue(Comparator) was added in Java 8; on older versions,
        // use PriorityQueue(initialCapacity, comparator) instead.
        pqueue = new PriorityQueue<PeekableIterator<E>>(this.comparator);

        // add iterators to the priority queue
        for (Iterator<E> iterator : iterators) {
            // but only if the iterator actually has items
            if (iterator.hasNext()) {
                pqueue.offer(new PeekableIterator<E>(iterator));
            }
        }
    }

    public boolean hasNext() {
        return !pqueue.isEmpty();
    }

    public E next() {
        // the iterator whose current item is smallest
        PeekableIterator<E> iterator = pqueue.poll();
        if (iterator == null) {
            throw new NoSuchElementException();
        }
        E rslt = iterator.next();
        // re-insert it, keyed on its new current item, unless it is exhausted
        if (iterator.hasNext()) {
            pqueue.offer(iterator);
        }
        return rslt;
    }

    public void remove() {
        throw new UnsupportedOperationException("remove");
    }
}
```

I've created a little test program to demonstrate how this works:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MergeDemo {
    public static void main(String[] args) {
        String[] a1 = new String[] {"apple", "cherry", "grape", "peach", "strawberry"};
        String[] a2 = new String[] {"banana", "fig", "orange"};
        String[] a3 = new String[] {"cherry", "kumquat", "pear", "pineapple"};

        // create a list of iterators that we can pass to the
        // MergeIterator constructor.
        List<Iterator<String>> iterators = Arrays.asList(
                Arrays.asList(a1).iterator(),
                Arrays.asList(a2).iterator(),
                Arrays.asList(a3).iterator());

        // String.CASE_INSENSITIVE_ORDER is a ready-made Comparator<String>.
        // Comparator.naturalOrder() (Java 8+) would also work here,
        // since every word is lowercase.
        MergeIterator<String> merger = new MergeIterator<String>(iterators, String.CASE_INSENSITIVE_ORDER);
        while (merger.hasNext()) {
            String s = merger.next();
            System.out.println(s);
        }
    }
}
```

My performance comparisons of the divide-and-conquer and priority queue merges show that the divide-and-conquer approach can be faster than using the priority queue, depending on the cost of comparisons. When comparisons are cheap (primitive types, for example), the pairwise merge is faster even though it does more work. As key comparisons become more expensive (like comparing strings), the priority queue merge has the advantage because it performs fewer comparisons.

More importantly, the pairwise merge requires twice the memory of the priority queue approach. My implementation used a FIFO queue, but even if I built a tree, the pairwise merge would require more memory. Also, as your code shows, you still need the PeekableIterator and IteratorComparator classes (or something similar) if you want to implement the pairwise merge.

See "Testing merge performance" for more details about the relative performance of these two methods.

For the reasons I detailed above, I conclude that the priority queue merge is the best way to go.
