
Getting the indices of an unsorted double array after sorting

This question is a companion to an earlier one about the fastest way to sort a double array.

Now I want to get the top-k indices, i.e. the positions in the original, unsorted array of the k largest values.

I have implemented this version, which (unfortunately) uses autoboxing and a HashMap, as proposed in some answers elsewhere:

HashMap<Double, Integer> map = new HashMap<Double, Integer>();
for(int i = 0; i < numClusters; i++) {
    map.put(scores[i], i);
}
Arrays.sort(scores);
HashSet<Integer> topPossibleClusters = new HashSet<Integer>();
for(int i = 0; i < numClusters; i++) {
    topPossibleClusters.add(map.get(scores[numClusters - (i+1)]));
}

As you can see, this uses a HashMap whose keys are the Double values of the original array and whose values are the corresponding indices. After sorting the original array, I look up each sorted value in the map to retrieve its original index.

I also use a HashSet because I want to test whether an int is included in this set, using the .contains() method. (I don't know whether this makes a difference since, as I mentioned in the other question, my arrays are small: 50 elements.) If it makes no difference, please point that out.

I am not interested in the values per se, only the indices.

My question is whether there is a faster approach.

This sort of interlinked, interlocking collections lends itself to fragile, easily broken, hard-to-debug, unmaintainable code.

Instead, create an object:

class Data {
    double value;
    int originalIndex;
}

Create an array of Data objects storing the original value and index.

Sort them using a custom comparator that looks at data.value and sorts in descending order.

Now the top X items in your array are the ones you want, and you can read value and originalIndex from them as you need them.
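A minimal sketch of the approach described above, assuming the scores live in a double[] and that ties should keep their original order. The class name TopKByObject, the method topKIndices and the sample values are illustrative and not part of the original answer.

```java
import java.util.Arrays;

class TopKByObject {
    // Pairs a score with its position in the original array.
    static final class Data {
        final double value;
        final int originalIndex;

        Data(double value, int originalIndex) {
            this.value = value;
            this.originalIndex = originalIndex;
        }
    }

    // Hypothetical helper: returns the original indices of the k largest
    // scores, largest first. The input array is left untouched.
    static int[] topKIndices(double[] scores, int k) {
        Data[] items = new Data[scores.length];
        for (int i = 0; i < scores.length; i++) {
            items[i] = new Data(scores[i], i);
        }
        // Descending by value; Double.compare works on primitives, so the
        // comparator itself does no boxing. Arrays.sort on objects is stable.
        Arrays.sort(items, (a, b) -> Double.compare(b.value, a.value));

        int n = Math.min(k, items.length);
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = items[i].originalIndex;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.3, 0.9, 0.1, 0.9, 0.5};
        // prints [1, 3, 4]
        System.out.println(Arrays.toString(topKIndices(scores, 3)));
    }
}
```

Because each Data object carries its own index, duplicate scores (0.9 appears twice here) keep distinct indices, which a map keyed on the score cannot do.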

As Tim points out, linking multiple collections is rather error-prone. I would suggest using a TreeMap, as this allows for a standalone solution.

Let's say you have double[] data; first copy it into a TreeMap:

final TreeMap<Double, Integer> dataWithIndex = new TreeMap<>();
for(int i = 0; i < data.length; ++i) {
    dataWithIndex.put(data[i], i);
}

NB: You can declare dataWithIndex as a NavigableMap to be less specific, but the name is so much longer and it doesn't really add much, as the JDK has very few implementations (TreeMap and the concurrent ConcurrentSkipListMap).

This populates the Map in O(n lg n) time, as each put is O(lg n); this is the same complexity as sorting. In practice it will probably be a little slower, but it will scale in the same way.

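Now, say you need the first k elements. You first need to find the kth element, which is O(k). Note that a TreeMap iterates in ascending key order, so these are the k smallest values; apply the same steps to dataWithIndex.descendingMap() if you want the k largest: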

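final Iterator<Double> keyIter = dataWithIndex.keySet().iterator();
// Initialised so the compiler accepts the later read of kthKey;
// this assumes 1 <= k <= dataWithIndex.size().
double kthKey = Double.NaN;
for (int i = 0; i < k; ++i) {
    kthKey = keyIter.next();
}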

Now you just need to get the sub-map that contains all the entries up to and including the kth entry:

final Map<Double, Integer> topK = dataWithIndex.headMap(kthKey, true);
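Put together, the TreeMap steps might look like the sketch below. It is only an illustration: the class name TopKByTreeMap and the method topKIndices are made up here, it works on descendingMap() so that the "first k" entries are the k largest values, and it assumes the values are distinct, because a repeated key overwrites the index stored earlier.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class TopKByTreeMap {
    // Hypothetical helper: returns the original indices of the k largest
    // values, largest first. Assumes distinct values in data.
    static List<Integer> topKIndices(double[] data, int k) {
        TreeMap<Double, Integer> dataWithIndex = new TreeMap<>();
        for (int i = 0; i < data.length; ++i) {
            dataWithIndex.put(data[i], i); // a duplicate value replaces the earlier index
        }
        if (k <= 0 || dataWithIndex.isEmpty()) {
            return new ArrayList<>();
        }

        // View of the same map that iterates from the largest key down.
        NavigableMap<Double, Integer> descending = dataWithIndex.descendingMap();

        // Walk to the kth key (or the last key if the map holds fewer than k).
        Iterator<Double> keyIter = descending.keySet().iterator();
        double kthKey = keyIter.next();
        for (int i = 1; i < k && keyIter.hasNext(); ++i) {
            kthKey = keyIter.next();
        }

        // headMap on the descending view keeps every entry up to and including kthKey.
        return new ArrayList<>(descending.headMap(kthKey, true).values());
    }
}
```

For example, topKIndices(new double[]{0.3, 0.9, 0.1, 0.5}, 2) yields the indices 1 and 3. With {0.9, 0.9, 0.1} index 0 is lost, which is the duplicate problem the map-based approaches share.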

If you only need to do this once, then with Java 8 you can do something like this:

List<Entry<Double, Integer>> topK = IntStream.range(0, data.length).
        mapToObj(i -> new SimpleEntry<>(data[i], i)).
        sorted(comparing(Entry::getKey)).
        limit(k).
        collect(toList());

I.e. take an IntStream over the indices of data and mapToObj each index to an Entry of data[i] => i (using the AbstractMap.SimpleEntry implementation). Now sort by Entry::getKey and limit the Stream to k entries. Finally, collect the result into a List. This has the advantage of not clobbering duplicate values in the data array.

It is almost exactly what Tim suggests in his answer, but using an existing JDK class.
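For reference, here is a self-contained variant of the stream pipeline with its imports spelled out. It is a sketch rather than the answer's exact code: the class name TopKByStream and the method topKIndices are illustrative, the sort is reversed so that the k largest values come first, and the entries are mapped down to their indices at the end.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map.Entry;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class TopKByStream {
    // Hypothetical helper: returns the original indices of the k largest
    // values, largest first. Assumes k >= 0.
    static List<Integer> topKIndices(double[] data, int k) {
        return IntStream.range(0, data.length)
                // Pair each value with its index: data[i] => i.
                .mapToObj(i -> new SimpleEntry<>(data[i], i))
                // Largest value first; the sort is stable, so ties keep index order.
                .sorted(Entry.<Double, Integer>comparingByKey().reversed())
                .limit(k)
                // Only the indices are of interest.
                .map(Entry::getValue)
                .collect(Collectors.toList());
    }
}
```

Calling topKIndices(new double[]{0.3, 0.9, 0.1, 0.9, 0.5}, 3) gives the indices 1, 3 and 4, so both occurrences of 0.9 survive.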

This method is also O(n lg n). The catch is that the TreeMap approach can be reused: it is O(n lg n) to build the Map but only O(k) for each later query. If you want to use the Java 8 solution with reuse, then you can do:

List<Entry<Double, Integer>> sorted = IntStream.range(0, data.length).
        mapToObj(i -> new SimpleEntry<>(data[i], i)).
        sorted(comparing(Entry::getKey)).
        collect(toList());

I.e. don't limit the size to k elements. Now, to get the first k elements you just need to do:

List<Entry<Double, Integer>> subList = sorted.subList(0, k);

The magic of this is that it's O(1): subList returns a view of the sorted list rather than a copy.
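The sketch below illustrates why the call is cheap: List.subList returns a view backed by the original list, so no elements are copied. The class and method names (SubListView, firstK, firstKCopy) are hypothetical and only serve the example.

```java
import java.util.ArrayList;
import java.util.List;

class SubListView {
    // Returns the first k elements as a view backed by the given list.
    // No elements are copied; k is capped at the list size.
    static <T> List<T> firstK(List<T> sorted, int k) {
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    // Returns an independent copy of the first k elements, at O(k) cost.
    // Useful if the sorted list may be structurally modified later.
    static <T> List<T> firstKCopy(List<T> sorted, int k) {
        return new ArrayList<>(firstK(sorted, k));
    }
}
```

The trade-off is that a view is only safe while the backing list is not structurally modified (no adds or removes); take the copy if the sorted list will change.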
