Which is best (performance + memory) for checking duplicates: String or HashSet?

Question

I want a simple implementation that performs some operations once for each distinct code (aCode) in bigCodeList, which contains duplicates. Below are two approaches. Which one is more efficient, both performance-wise and in terms of memory consumption?

Approach 1:

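    String tempStr = "";

    for (String aCode : bigCodeList) {
        // indexOf returns -1 if aCode has not been recorded in tempStr yet
        if (tempStr.indexOf(aCode) == -1) {
            // deal with the aCode related work
            tempStr += aCode + "-"; // record the code, separated by a dash
        }
    }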

Approach 2:

    HashSet<String> tempHSet = new HashSet<String>();

    for (String aCode : bigCodeList) {
        // add() returns false if aCode is already in the set
        if (tempHSet.add(aCode)) {
            // deal with the aCode related work
        }
    }

Note: aCode is a three-letter code such as LON.

Answer 1

Approach 2 is by far the better one. You should not even consider approach 1.

First of all, searching in approach 1 takes linear time. That means that when tempStr becomes twice as long, searching it takes about twice as long on average (of course, if the code you are looking for is always near the start, the search stays short). Since this search runs once for every element of bigCodeList, the total cost grows with the size of bigCodeList multiplied by the length of tempStr.
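A back-of-the-envelope estimate, under the assumption that every code in bigCodeList is distinct (so every indexOf call misses and has to scan all of tempStr). After i codes have been appended, tempStr is 4i characters long (three letters plus a dash each), so processing n codes costs roughly:

```latex
\sum_{i=0}^{n-1} 4i \;=\; 4 \cdot \frac{n(n-1)}{2} \;=\; 2n(n-1) \;=\; O(n^2)
```

The += copies in approach 1 add another term of the same order, while n calls to HashSet.add take O(n) expected time in total.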

Next: you're copying the entire tempStr each time you append to it (because String objects are immutable, creating a new String from the existing one is the only way to "change" it). So appending gets slower and slower as well.
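To illustrate the copying, a minimal sketch (the class and method names are made up for this example): += on a String always produces a new object, which a reference comparison makes visible. StringBuilder, which the answer does not mention, appends into a growable buffer instead. Note that it only removes the copying; the indexOf search stays linear.

```java
// Hypothetical demo class: shows that String += creates a new object,
// while StringBuilder appends into its own buffer.
class AppendDemo {

    // Returns true if "+=" produced a different String object than before.
    static boolean concatCreatesNewObject() {
        String tempStr = "LON-";
        String before = tempStr;
        tempStr += "PAR-";          // builds a new String containing copies of both parts
        return before != tempStr;   // reference comparison on purpose
    }

    // Builds the same "CODE-CODE-" string with one StringBuilder instead of repeated +=.
    static String buildWithStringBuilder(String... codes) {
        StringBuilder sb = new StringBuilder();
        for (String code : codes) {
            sb.append(code).append('-'); // no full copy of the existing content per call
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatCreatesNewObject());             // true
        System.out.println(buildWithStringBuilder("LON", "PAR")); // LON-PAR-
    }
}
```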

Third (not a performance concern): mixing data (aCode) and metadata (the separator -) like this leads to all kinds of undesired effects. You might be sure now that aCode can never contain a dash, but what if that changes in two weeks?
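A small sketch of the kind of undesired effect meant here, assuming (hypothetically) that codes stop being fixed three-letter strings without dashes. The helper distinctViaString is hypothetical: it wraps approach 1 and returns the codes it treats as new. A shorter code that happens to be a substring of an earlier entry, or a code containing the separator, is wrongly treated as already seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of approach 1's string-based duplicate check going wrong.
class SeparatorPitfall {

    // Approach 1 from the question, returning the codes it treats as new.
    static List<String> distinctViaString(List<String> bigCodeList) {
        String tempStr = "";
        List<String> handled = new ArrayList<>();
        for (String aCode : bigCodeList) {
            if (tempStr.indexOf(aCode) == -1) {
                handled.add(aCode);       // stands in for the aCode related work
                tempStr += aCode + "-";
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        // "ON" is distinct from "LON" but matches inside "LON-", so it is skipped.
        System.out.println(distinctViaString(Arrays.asList("LON", "ON")));         // [LON]
        // "N-P" contains the separator and matches across "LON-PAR-".
        System.out.println(distinctViaString(Arrays.asList("LON", "PAR", "N-P"))); // [LON, PAR]
    }
}
```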

Fourth: HashSet is built for pretty much exactly this use case! That's what it does best: hold a set of distinct objects, check whether an object is already present, and add it if it is not.
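For comparison, a self-contained version of approach 2 as a minimal sketch. The method name firstOccurrences and the list that collects the handled codes are hypothetical stand-ins for "deal with the aCode related work".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical wrapper around approach 2 from the question.
class DistinctCodes {

    // Returns each code once, in the order it was first seen.
    static List<String> firstOccurrences(List<String> bigCodeList) {
        Set<String> seen = new HashSet<>();
        List<String> handled = new ArrayList<>();
        for (String aCode : bigCodeList) {
            // add() checks and inserts in one call; it returns false for a duplicate
            if (seen.add(aCode)) {
                handled.add(aCode); // stands in for the aCode related work
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        List<String> bigCodeList = Arrays.asList("LON", "PAR", "LON", "NYC", "PAR");
        System.out.println(firstOccurrences(bigCodeList)); // [LON, PAR, NYC]
    }
}
```

If you only need the distinct codes themselves rather than running work per code, new LinkedHashSet<>(bigCodeList) gives them directly in first-seen order (a plain HashSet makes no ordering guarantee).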

Answer 2

I think the first approach is worse: indexOf is O(n), while a HashSet look-up of a String key is O(1) on average.

Furthermore, the first approach uses string concatenation, which creates a new String object each time and adds further performance overhead.

Answer 3

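java.util.Set does not allow duplicates, but it is fairly "quiet" about rejecting them: add simply returns false instead of throwing an exception.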
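What "quiet" means in practice, as a short sketch (the helper addTwice is hypothetical): Set.add returns false for a duplicate and leaves the set unchanged, without throwing. That return value is what approach 2 tests in its if condition.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo of how a Set reacts to a duplicate add.
class QuietSet {

    // Adds the same code twice and reports what add() returned each time.
    static boolean[] addTwice(Set<String> set, String code) {
        return new boolean[] { set.add(code), set.add(code) };
    }

    public static void main(String[] args) {
        Set<String> codes = new HashSet<>();
        boolean[] results = addTwice(codes, "LON");
        // first add: true (inserted), second add: false (duplicate ignored), size stays 1
        System.out.println(results[0] + " " + results[1] + " " + codes.size()); // true false 1
    }
}
```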

Answer 4

Performance- and memory-wise, HashSet is a better choice than String for your code.

Appending values to a String variable takes time.

4 answers

Answer 1: score 7, accepted, 2013-06-06 08:35:37

Answer 2: score 1, 2013-06-06 08:36:52

Answer 3: score 0, 2013-06-06 08:34:45

Answer 4: score 0, 2013-06-06 08:47:22
