
How much faster will a .contains() method be for a Hashtable&lt;ArrayList&lt;String&gt;, Boolean&gt; than an ArrayList&lt;ArrayList&lt;String&gt;&gt;?

I basically am doing the following:

  • Dumping an entire row of data from a DB table as Strings into an ArrayList&lt;ArrayList&lt;String&gt;&gt;.
  • Doing the same thing for another DB table.

  • Finding all the rows (ArrayList&lt;String&gt;) from the first DB table in the second one by iterating across it and calling a.contains(b.get(i)). If the contains is true then I do a.remove(b.get(i)).
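In code, the steps above look roughly like this (a minimal sketch; the two hard-coded lists stand in for the rows dumped from the DB tables):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListDiff {
    public static void main(String[] args) {
        // a: rows dumped from the first table, b: rows from the second
        List<List<String>> a = new ArrayList<List<String>>();
        a.add(Arrays.asList("1", "alice"));
        a.add(Arrays.asList("2", "bob"));
        List<List<String>> b = new ArrayList<List<String>>();
        b.add(Arrays.asList("2", "bob"));
        b.add(Arrays.asList("3", "carol"));

        // For each row of b, remove it from a if present.
        // contains() and remove() each scan a linearly - O(n) per call.
        for (int i = 0; i < b.size(); i++) {
            if (a.contains(b.get(i))) {
                a.remove(b.get(i));
            }
        }
        System.out.println(a); // rows of a with no match in b: [[1, alice]]
    }
}
```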

Now, how much faster would it be if I instead used a Hashtable&lt;ArrayList&lt;String&gt;, Boolean&gt; instead of the ArrayList mentioned above, using a.containsKey(i.getKey()) where i is an iterator over b, and then removing by using i.remove()? Will it be a big enough improvement to make the change worthwhile?

Also, would using a HashMap be more prudent? If so, why...

My bottom-up answer:

  • The difference between Hashtable and HashMap has been (thoroughly) discussed in Differences between HashMap and Hashtable?. Short summary: HashMap is more efficient and should be used instead of Hashtable.

  • Finding data in a hash data structure (the containsKey() and remove() operations) runs in amortized constant time - O(1). With a decent hash function the cost of a lookup is essentially independent of the number of data points in the structure: whether it holds 4 elements or 4 million, a lookup takes roughly the same X time. The data access time of hash structures therefore grows very slowly.
    Finding data in a list is of the order O(N) - that is, directly proportional to the number of elements in the list: 1 element takes Y time, 2 elements take 2Y time, 4 elements take 4Y time and so on. So the time consumption grows linearly with the size of the list.

  • So: if you have to find a large number of elements randomly in a data structure, a hash data structure is the best choice, as long as:
    - the data has a decent hashCode() implementation (the one for ArrayList is OK)
    - the data has hashCode() and equals() implementations that match each other, i.e. if a.equals(b) then a.hashCode() == b.hashCode(). This is also true for ArrayList.

  • If, on the other hand, you're working with ordered data, there are other algorithms that can reduce the search and remove time substantially. If the data in the database is indexed, it may be worthwhile to use ORDER BY when fetching the data and then use an algorithm for ordered data.
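One such algorithm for ordered data is a single merge-style pass over both sorted row sets, which finds all matches in O(n + m). A minimal sketch for sorted lists of strings (the single-string rows are a simplifying assumption; real rows would need a row comparator):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedDiff {
    // Returns the elements of a (sorted) that do not appear in b (sorted).
    // Both lists are walked once, so the total cost is O(n + m).
    static List<String> diffSorted(List<String> a, List<String> b) {
        List<String> result = new ArrayList<String>();
        int i = 0, j = 0;
        while (i < a.size()) {
            // Advance j past all b-elements smaller than a.get(i)
            while (j < b.size() && b.get(j).compareTo(a.get(i)) < 0) {
                j++;
            }
            if (j < b.size() && b.get(j).equals(a.get(i))) {
                i++; // match found: drop this row
            } else {
                result.add(a.get(i++)); // no match: keep it
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("alice", "bob", "carol");
        List<String> b = Arrays.asList("bob", "dave");
        System.out.println(diffSorted(a, b)); // [alice, carol]
    }
}
```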

To summarize: use HashMap instead of ArrayList for list a.
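The hashCode()/equals() contract mentioned in the bullet points above can be checked directly: two lists with the same elements in the same order are equal and share a hash code, regardless of the List implementation, which is exactly what HashMap relies on:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HashContract {
    public static void main(String[] args) {
        List<String> a = new ArrayList<String>(Arrays.asList("foo", "bar"));
        List<String> b = Arrays.asList("foo", "bar");

        // List.equals() compares elements in order
        System.out.println(a.equals(b));                  // true
        // List.hashCode() is defined from the element hash codes,
        // so equal lists always hash alike
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```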

I wrote a small program to benchmark the problem. Results first: the program ran on Sun JVM 1.6.0_41 on Windows 7, 32 bits, on a Core i5 2.40 GHz CPU. Printout:

For 1000 words: List: 1 ms, Map: 2 ms
For 5000 words: List: 15 ms, Map: 12 ms
For 10000 words: List: 57 ms, Map: 12 ms
For 20000 words: List: 217 ms, Map: 37 ms
For 30000 words: List: 485 ms, Map: 45 ms
For 50000 words: List: 1365 ms, Map: 61 ms

The performance characteristics reveal themselves pretty well in a simple test like this. I ran the map version with more data and got the following:

For 100000 words: List: - ms, Map: 166 ms
For 500000 words: List: - ms, Map: 1130 ms
For 1000000 words: List: - ms, Map: 3540 ms

Finally, the benchmarking code:

public void benchmarkListVersusMap() {
    for (int count : new int[]{1000, 5000, 10000, 20000, 30000, 50000}) {
        // Generate random sample data
        List<List<String>> words = generateData(count, 10, count);

        // Create ArrayList
        List<List<String>> list = new ArrayList<List<String>>();
        list.addAll(words);

        // Create HashMap
        Map<List<String>, Boolean> map = new HashMap<List<String>, Boolean>();
        for (List<String> row : words) {
            map.put(row, true);
        }

        // Measure:
        long timer = System.currentTimeMillis();
        for (List<String> row: words) {
            if (list.contains(row)) {
                list.remove(row);
            }
        }
        long listTime = System.currentTimeMillis() - timer;
        timer = System.currentTimeMillis();
        for (List<String> row : words) {
            if (map.containsKey(row)) {
                map.remove(row);
            }
        }
        long mapTime = System.currentTimeMillis() - timer;
        System.out.printf("For %s words: List: %s ms, Map: %s ms\n", count, listTime, mapTime);
    }
}

private List<List<String>> generateData(int rows, int cols, int noOfDifferentWords) {
    List<List<String>> list = new ArrayList<List<String>>(rows);
    List<String> dictionary = generateRandomWords(noOfDifferentWords);
    Random rnd = new Random();
    for (int row = 0; row < rows; row++) {
        List<String> l2 = new ArrayList<String>(cols);
        for (int col = 0; col < cols; col++) {
            l2.add(dictionary.get(rnd.nextInt(noOfDifferentWords)));
        }
        list.add(l2);
    }
    return list;
}

private static final String CHARS = "abcdefghijklmnopqrstuvwxyz0123456789";
private List<String> generateRandomWords(int count) {
    Random rnd = new Random();
    List<String> list = new ArrayList<String>(count);
    while (list.size() < count) {
        StringBuilder sb = new StringBuilder(20);
        for (int i = 0; i < 10; i++) {
            sb.append(CHARS.charAt(rnd.nextInt(CHARS.length())));
        }
        list.add(sb.toString());
    }
    return list;
}

A little excerpt from the Javadoc comment of ArrayList:

The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.

That means the get operation on your second list runs in constant time, O(1), which is fine from a performance point of view. But the contains and remove operations (on the first list) run in linear time, O(n). Calling these operations once for every element of the second list can take very long, especially if both lists are large.

Using a hashing data structure for the first one would give constant time - O(1) - for the contains and remove operations. I would suggest using a HashSet for the first "list". But that only works if no two rows are equal, since a Set discards duplicates.
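Applied to the question, the HashSet version might look like this (a sketch with stand-in data; loading the rows from the DB is assumed to happen elsewhere):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetDiff {
    public static void main(String[] args) {
        // Rows of the first table as a set - only valid if rows are distinct
        Set<List<String>> a = new HashSet<List<String>>();
        a.add(Arrays.asList("1", "alice"));
        a.add(Arrays.asList("2", "bob"));
        // Rows of the second table
        List<List<String>> b = Arrays.asList(
                Arrays.asList("2", "bob"), Arrays.asList("3", "carol"));

        // Each removal is O(1) on average, so the whole loop is O(m);
        // remove() already implies the contains() check
        for (List<String> row : b) {
            a.remove(row);
        }
        System.out.println(a); // the unmatched rows of a: [[1, alice]]
    }
}
```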

But you should always profile before trying to optimize something. First make sure you are optimizing the right place.
