
Efficient way to find “matches” in a matrix in Java

I have an existing (Java) application that models an order book. As it stands, each order is visible to every other. There is now a requirement to put (what is effectively) a per-order ACL in place.

An example to illustrate: let's say I have access groups [V-Z] and orders [A-F]

      A B C D E F
   V  1 0 0 1 1 0
   W  0 1 1 0 0 1
   X  0 0 0 0 1 1
   Y  1 1 0 0 0 1
   Z  0 1 0 1 0 0

A new order comes in that specifies visibility as W & Y. What would be a fast way to return the set of values that can be seen by the incoming order?

One implementation that has been suggested is to represent each row as a BitSet and do W | Y, though I wonder what will happen to performance as the size of the matrix increases.
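For reference, the suggested BitSet approach might look roughly like the sketch below. The class name `BitSetVisibility` and the helpers `row`, `exampleRows`, `visibleTo` and `names` are made up for illustration; only `java.util.BitSet` and its `or`, `set` and `nextSetBit` methods come from the JDK. The rows are the ones from the example matrix above.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class BitSetVisibility {
    // Column order of the example matrix; bit i corresponds to ORDERS[i].
    static final String[] ORDERS = {"A", "B", "C", "D", "E", "F"};

    // Build a row from a string such as "011001" (one character per order).
    public static BitSet row(String bits) {
        BitSet set = new BitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') set.set(i);
        }
        return set;
    }

    // The five access-group rows from the example matrix.
    public static Map<String, BitSet> exampleRows() {
        Map<String, BitSet> rows = new HashMap<>();
        rows.put("V", row("100110"));
        rows.put("W", row("011001"));
        rows.put("X", row("000011"));
        rows.put("Y", row("110001"));
        rows.put("Z", row("010100"));
        return rows;
    }

    // OR the rows of the requested groups into a fresh BitSet.
    public static BitSet visibleTo(Map<String, BitSet> rows, String... groups) {
        BitSet result = new BitSet(ORDERS.length);
        for (String group : groups) {
            BitSet r = rows.get(group);
            if (r != null) result.or(r); // unknown groups contribute nothing
        }
        return result;
    }

    // Translate the set bits back into order names.
    public static String names(BitSet visible) {
        StringBuilder sb = new StringBuilder();
        for (int i = visible.nextSetBit(0); i >= 0; i = visible.nextSetBit(i + 1)) {
            sb.append(ORDERS[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(names(visibleTo(exampleRows(), "W", "Y"))); // prints ABCF
    }
}
```

BitSet stores its bits in 64-bit words, so each `or` touches roughly one word per 64 order columns regardless of how many bits are set. The cost of a query therefore grows with (number of groups requested) × (number of orders / 64), which is the part that matters as the matrix widens.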

A nice-to-have but not essential feature is to allow for a parent-child relationship on one dimension, like:

        A B C D E F
   V    1 0 0 1 1 0
   W    0 1 1 0 0 1
   X-1  0 0 0 0 1 1
   X-2  1 0 0 0 1 1
   X-3  0 1 0 0 1 1
   Y    1 1 0 0 0 1
   Z    0 1 0 1 0 0

It would be ideal if it were similarly efficient to retrieve "W | X" as "W | X-1".
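One possible reading of this requirement, sketched under the assumption that a parent's visibility is simply the union of its children's rows: keep a cached row for the parent that is updated whenever a child is registered, so "W | X" costs the same single OR as "W | X-1". The class `ParentRows` and its methods are hypothetical names, not part of the application described here.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class ParentRows {
    // Holds rows for plain groups, children, and cached parents alike.
    private final Map<String, BitSet> rows = new HashMap<>();

    // Build a row from a string such as "000011" (one character per order A..F).
    public static BitSet row(String bits) {
        BitSet set = new BitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') set.set(i);
        }
        return set;
    }

    public void put(String group, BitSet row) {
        rows.put(group, row);
    }

    // Register a child such as "X-1" and fold its bits into the cached parent row "X".
    // Assumption: parent visibility = OR of all children.
    public void putChild(String parent, String child, BitSet childRow) {
        rows.put(child, childRow);
        rows.computeIfAbsent(parent, k -> new BitSet()).or(childRow);
    }

    // Parents and children are looked up the same way, so both cost one OR each.
    public BitSet visibleTo(String... groups) {
        BitSet result = new BitSet();
        for (String group : groups) {
            BitSet r = rows.get(group);
            if (r != null) result.or(r);
        }
        return result;
    }

    // W plus the three X children from the second example matrix.
    public static ParentRows example() {
        ParentRows p = new ParentRows();
        p.put("W", row("011001"));
        p.putChild("X", "X-1", row("000011"));
        p.putChild("X", "X-2", row("100011"));
        p.putChild("X", "X-3", row("010011"));
        return p;
    }

    public static void main(String[] args) {
        ParentRows p = example();
        System.out.println(p.visibleTo("W", "X"));   // {0, 1, 2, 4, 5}, i.e. A B C E F
        System.out.println(p.visibleTo("W", "X-1")); // {1, 2, 4, 5}, i.e. B C E F
    }
}
```

The trade-off is on the write side: adding visibility to a child is cheap, but removing a bit from a child would require rebuilding the parent row from all of its children, since an OR cannot be undone.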

Any hints in the direction of an algorithm and/or appropriate data structure much appreciated.

The simple solution:

import java.util.*;

class AccessGroupName { ... }
class Order { ... }

// Maps each access group to the orders that group is allowed to see.
Map<AccessGroupName, Collection<Order>> visibility = new HashMap<AccessGroupName, Collection<Order>>();

public void addVisibility(AccessGroupName group, Order order) {
    Collection<Order> orders = visibility.get(group);
    if (orders == null) {
        orders = new ArrayList<Order>();
        visibility.put(group, orders);
    }
    // contains() is a linear scan; it keeps each order listed once per group
    if (!orders.contains(order)) orders.add(order);
}

public Set<Order> getVisibility(Collection<AccessGroupName> names) {
    Set<Order> visible = new HashSet<Order>();
    for (AccessGroupName name : names) {
        Collection<Order> orders = visibility.get(name);
        // a group with no registered orders has no entry in the map
        if (orders != null) visible.addAll(orders);
    }
    return visible;
}

HashMap lookups are expected O(1). Iterating an ArrayList is O(n). Adding n items to a HashSet is O(n), since each individual add is expected O(1). Overall, this will be O(n) where n is the total number of elements in the added lists (which might be more than the number of elements in the resulting set if there's overlap). The constant is, roughly, the time it takes to get an element from an ArrayList iterator plus the time it takes to add something to a HashSet: the former is on the order of 10 cycles, the latter nearer 100.
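The overlap remark can be checked on the question's matrix. The sketch below is a self-contained variant of the map approach that uses plain `String` values as stand-ins for group names and orders (an assumption made only to keep the example runnable); `MapVisibility`, `entriesScanned` and `example` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MapVisibility {
    // Group name -> orders visible to that group (Strings stand in for real types).
    private final Map<String, Collection<String>> visibility = new HashMap<>();

    public void addVisibility(String group, String order) {
        Collection<String> orders = visibility.computeIfAbsent(group, k -> new ArrayList<>());
        if (!orders.contains(order)) orders.add(order);
    }

    public Set<String> getVisibility(Collection<String> names) {
        Set<String> visible = new HashSet<>();
        for (String name : names) {
            Collection<String> orders = visibility.get(name);
            if (orders != null) visible.addAll(orders);
        }
        return visible;
    }

    // Total list entries walked for a query: the n in the O(n) estimate.
    public int entriesScanned(Collection<String> names) {
        int n = 0;
        for (String name : names) {
            Collection<String> orders = visibility.get(name);
            if (orders != null) n += orders.size();
        }
        return n;
    }

    // Rows W and Y of the example matrix: W sees B, C, F and Y sees A, B, F.
    public static MapVisibility example() {
        MapVisibility m = new MapVisibility();
        for (String order : Arrays.asList("B", "C", "F")) m.addVisibility("W", order);
        for (String order : Arrays.asList("A", "B", "F")) m.addVisibility("Y", order);
        return m;
    }

    public static void main(String[] args) {
        MapVisibility m = example();
        Collection<String> query = Arrays.asList("W", "Y");
        System.out.println(m.entriesScanned(query));       // 6 list entries walked
        System.out.println(m.getVisibility(query).size()); // 4 distinct orders: A, B, C, F
    }
}
```

Here n is 6 while the result holds 4 orders, because B and F appear in both lists. Only the set bits of the matrix are ever stored or visited; the zeroes cost nothing.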

Memory use, over and above the AccessGroupName and Order instances themselves, is about 14-15 words per group plus 1-2 words per order, mostly object headers.

This code doesn't do anything clever, but I think you'll be pretty hard pressed to beat O(n) with a constant of <200 cycles.

In particular, if the notional matrix is sparse (that is, if there are lots of access groups with a few orders each), this will beat the pants off a bitset approach, which will waste a hell of a lot of space on zeroes, and time on ORing zeroes together.
