Getting the indices of an unsorted double array after sorting
This question is a companion to this one, which asked about the fastest way to sort a double array.
Now I want to get the top-k indices corresponding to the unsorted array.
I have implemented this version, which (unfortunately) uses autoboxing and a HashMap, as proposed in some answers, including this one:
HashMap<Double, Integer> map = new HashMap<Double, Integer>();
for (int i = 0; i < numClusters; i++) {
    map.put(scores[i], i);
}
Arrays.sort(scores);
HashSet<Integer> topPossibleClusters = new HashSet<Integer>();
for (int i = 0; i < numClusters; i++) {
    topPossibleClusters.add(map.get(scores[numClusters - (i + 1)]));
}
As you can see, this uses a HashMap whose keys are the Double values of the original array and whose values are the corresponding indices in the original array. So, after sorting the original array, I just look each index up in the map.
I also use a HashSet because I want to check whether an int is included in this set, using the .contains() method. (I don't know if this makes a difference since, as I mentioned in the other question, my arrays are small: 50 elements.) If it does not make a difference, please point that out.
I am not interested in the values per se, only the indices.
My question is whether there is a faster approach.
Interlinked, interlocking collections of this sort lend themselves to fragile, easily broken, hard-to-debug, unmaintainable code.
Instead, create an object:
class Data {
    double value;
    int originalIndex;
}
Create an array of Data objects storing the original value and index.
Sort them using a custom comparator that looks at data.value and sorts descending.
Now the top X items in your array are the ones you want, and you can read value and originalIndex from each as you need them.
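The steps above can be sketched as follows. This is a minimal illustration, not the answerer's exact code: the class name TopKByObject, the method topKIndices, and the constructor on Data are hypothetical names introduced here, and the sketch assumes 0 <= k <= scores.length.

```java
import java.util.Arrays;
import java.util.Comparator;

class TopKByObject {
    // Pairs a score with its position in the original, unsorted array.
    static final class Data {
        final double value;
        final int originalIndex;

        Data(double value, int originalIndex) {
            this.value = value;
            this.originalIndex = originalIndex;
        }
    }

    // Returns the original indices of the k largest scores, largest first.
    // Assumes 0 <= k <= scores.length; the input array is not modified.
    static int[] topKIndices(double[] scores, int k) {
        Data[] items = new Data[scores.length];
        for (int i = 0; i < scores.length; i++) {
            items[i] = new Data(scores[i], i);
        }
        // Sort descending by value. Arrays.sort on objects is stable,
        // so equal scores keep their original relative order.
        Arrays.sort(items, Comparator.comparingDouble((Data d) -> d.value).reversed());

        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = items[i].originalIndex;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.3, 0.9, 0.1, 0.9, 0.5};
        System.out.println(Arrays.toString(topKIndices(scores, 3))); // prints [1, 3, 4]
    }
}
```

Because each index travels with its own value, duplicate scores (0.9 appears twice above) each keep their own index, which the HashMap keyed on the score cannot do.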
As Tim points out, linking multiple collections is rather error-prone. I would suggest using a TreeMap, as this would allow for a standalone solution.
Let's say you have double[] data; first copy it to a TreeMap:
final TreeMap<Double, Integer> dataWithIndex = new TreeMap<>();
for (int i = 0; i < data.length; ++i) {
    dataWithIndex.put(data[i], i);
}
NB: You can declare dataWithIndex as a NavigableMap to be less specific, but the type name is much longer and it doesn't really add much, as the JDK ships very few implementations (TreeMap and ConcurrentSkipListMap).
This will populate the Map in O(n lg n) time, as each put is O(lg n). This is the same complexity as sorting. In reality it will probably be a little slower, but it will scale in the same way.
Now, say you need the first k elements. You first need to find the kth element, which is O(k):
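final Iterator<Double> keyIter = dataWithIndex.keySet().iterator();
// Initialised so the variable is definitely assigned for the compiler;
// assumes 1 <= k <= dataWithIndex.size().
Double kthKey = null;
for (int i = 0; i < k; ++i) {
    kthKey = keyIter.next();
}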
Now you just need to get the sub-map that has all the entries up to and including the kth entry:
final Map<Double, Integer> topK = dataWithIndex.headMap(kthKey, true);
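Put together, the TreeMap steps might look like the sketch below. Two things in it are assumptions rather than part of the answer: the names TopKByTreeMap and topKIndices are hypothetical, and the sketch walks the map through descendingMap() so that it returns the k largest values (what the question asks for) instead of the k smallest. It assumes 1 <= k <= the number of distinct values in data.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class TopKByTreeMap {
    // Returns the original indices of the k largest values, largest first.
    // Assumes 1 <= k <= number of distinct values in data.
    static List<Integer> topKIndices(double[] data, int k) {
        TreeMap<Double, Integer> dataWithIndex = new TreeMap<>();
        for (int i = 0; i < data.length; ++i) {
            // A repeated value overwrites the index stored for it earlier.
            dataWithIndex.put(data[i], i);
        }
        // View of the same map ordered from largest key to smallest.
        NavigableMap<Double, Integer> descending = dataWithIndex.descendingMap();

        // Walk to the k-th key of the descending view in O(k).
        Iterator<Double> keyIter = descending.keySet().iterator();
        Double kthKey = null;
        for (int i = 0; i < k; ++i) {
            kthKey = keyIter.next();
        }
        // On the descending view, headMap covers the k largest keys.
        return new ArrayList<>(descending.headMap(kthKey, true).values());
    }

    public static void main(String[] args) {
        double[] data = {0.3, 0.9, 0.1, 0.7, 0.5};
        System.out.println(topKIndices(data, 2)); // prints [1, 3]
    }
}
```

Note that a map keyed on the value keeps only one index per distinct value, so this variant only suits data without duplicates.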
If you only need to do this once, then with Java 8 you can do something like this:
List<Entry<Double, Integer>> topK = IntStream.range(0, data.length).
        mapToObj(i -> new SimpleEntry<>(data[i], i)).
        sorted(comparing(Entry::getKey)).
        limit(k).
        collect(toList());
i.e. take an IntStream over the indices of data and use mapToObj to turn each index into an Entry of data[i] => i (using the AbstractMap.SimpleEntry implementation). Now sort that using Entry::getKey and limit the size of the Stream to k entries. Now simply collect the result to a List. This has the advantage of not clobbering duplicate entries in the data array.
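For reference, here is a self-contained sketch of the stream pipeline with its imports spelled out. The names TopKByStream and topK are hypothetical. Unlike the snippet in the answer, this version sorts descending, on the assumption that the largest k values are wanted; drop .reversed() to get the smallest k instead. The explicit type arguments are there only to keep type inference unambiguous.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map.Entry;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class TopKByStream {
    // Returns (value, originalIndex) entries for the k largest values,
    // largest first. Duplicate values each keep their own index.
    static List<Entry<Double, Integer>> topK(double[] data, int k) {
        return IntStream.range(0, data.length)
                // Pair every value with its index in the unsorted array.
                .<Entry<Double, Integer>>mapToObj(i -> new SimpleEntry<>(data[i], i))
                // Order by value, largest first.
                .sorted(Entry.<Double, Integer>comparingByKey().reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        double[] data = {0.3, 0.9, 0.1, 0.9, 0.5};
        for (Entry<Double, Integer> e : topK(data, 3)) {
            System.out.println(e.getValue() + " -> " + e.getKey());
        }
    }
}
```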
It is almost exactly what Tim suggests in his answer, but using an existing JDK class.
This method is also O(n lg n). The catch is that if the TreeMap approach is reused, then it's O(n lg n) to build the Map but only O(k) for each later query. If you want to use the Java 8 solution with reuse, then you can do:
List<Entry<Double, Integer>> sorted = IntStream.range(0, data.length).
        mapToObj(i -> new SimpleEntry<>(data[i], i)).
        sorted(comparing(Entry::getKey)).
        collect(toList());
i.e. don't limit the size to k elements. Now, to get the first k elements you just need to do:
List<Entry<Double, Integer>> subList = sorted.subList(0, k);
The magic of this is that it's O(1), because subList returns a view of the sorted list rather than a copy.
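A small sketch of the sort-once, query-many idea, assuming the data array does not change between queries. The class SortedOnce and its method firstK are hypothetical names. The list is sorted ascending, as in the answer, so firstK returns the k smallest values; since subList is a view, the caller should treat the result as read-only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map.Entry;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SortedOnce {
    private final List<Entry<Double, Integer>> sorted;

    // Pays the O(n lg n) sorting cost once, up front.
    SortedOnce(double[] data) {
        this.sorted = IntStream.range(0, data.length)
                .<Entry<Double, Integer>>mapToObj(i -> new SimpleEntry<>(data[i], i))
                .sorted(Entry.<Double, Integer>comparingByKey())
                .collect(Collectors.toList());
    }

    // Returns a view of the first k entries; no copying takes place.
    // Assumes 0 <= k <= data.length.
    List<Entry<Double, Integer>> firstK(int k) {
        return sorted.subList(0, k);
    }

    public static void main(String[] args) {
        SortedOnce cache = new SortedOnce(new double[]{0.3, 0.9, 0.1, 0.5});
        System.out.println(cache.firstK(2)); // prints [0.1=2, 0.3=0]
        System.out.println(cache.firstK(3)); // prints [0.1=2, 0.3=0, 0.5=3]
    }
}
```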