
What is the best (performance + memory) between String and HashSet for checking duplicates?

I want a simple implementation that performs some operations once for each distinct code (aCode) in bigCodeList, which contains duplicates. Below are two approaches. Which of them is more efficient in terms of performance and memory consumption?


Approach 1:

    String tempStr = "";

    for (String aCode : bigCodeList) {
        if (tempStr.indexOf(aCode) == -1) {
            // deal with the aCode-related work
            tempStr += aCode + "-";
        }
    }

Approach 2:

    HashSet<String> tempHSet = new HashSet<String>();

    for (String aCode : bigCodeList) {
        if (tempHSet.add(aCode)) {
            // deal with the aCode-related work
        }
    }

Note: aCode is a three-letter code such as LON.

Approach 2 is far better. You should not even consider approach 1.

First of all, approach 1 searches in linear time. That means that when tempStr becomes twice as long, searching it takes twice as long (on average, of course; if you always find the first element, the search stays short).

Next: you're copying the entire tempStr each time you append to it (String objects are immutable, so appending always creates a new String that holds a copy of the old contents). So the add operation takes ages as well.

Third (not a performance concern): mixing data (aCode) and metadata (the separator -) like this leads to all kinds of undesired effects. You might be sure that aCode can never contain a dash right now, but what if that changes in two weeks?
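To make the separator concern concrete, here is a minimal sketch of how substring matching can go wrong once the input stops being strictly three-letter codes. The class name, method names, and sample input are made up for illustration; the two loops mirror the approaches from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SeparatorPitfallDemo {

    // Approach 1 from the question: substring search in a dash-separated string.
    static List<String> distinctViaString(List<String> codes) {
        List<String> handled = new ArrayList<>();
        String tempStr = "";
        for (String aCode : codes) {
            if (tempStr.indexOf(aCode) == -1) {
                handled.add(aCode);
                tempStr += aCode + "-";
            }
        }
        return handled;
    }

    // Approach 2 from the question: whole-value comparison via HashSet.
    static List<String> distinctViaSet(List<String> codes) {
        List<String> handled = new ArrayList<>();
        Set<String> tempHSet = new HashSet<>();
        for (String aCode : codes) {
            if (tempHSet.add(aCode)) {
                handled.add(aCode);
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        // Hypothetical input: "ON" and "N-P" break the three-letter assumption.
        List<String> codes = Arrays.asList("LON", "PAR", "ON", "N-P");
        System.out.println(distinctViaString(codes)); // [LON, PAR]
        System.out.println(distinctViaSet(codes));    // [LON, PAR, ON, N-P]
    }
}
```

In the string version, "ON" is silently treated as a duplicate because it occurs inside "LON", and "N-P" is matched across the separator in "LON-PAR-". The set version compares whole values, so neither is dropped.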

Fourth: HashSet is built for pretty much exactly this use case! That's what it does best: hold a set of distinct objects, check whether an object is already present, and add a new one.

I think the first approach is worse: indexOf is O(n), while looking up a String key in a HashSet can be O(1).

Furthermore, the first approach uses string concatenation, which creates a new String object each time and adds further performance overhead.
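If you want to see the difference on your own machine, a rough timing harness along these lines may help. It is only a sketch: the class name, the random test data, and the input size are assumptions, and a single System.nanoTime measurement without JVM warm-up gives an impression rather than a proper benchmark.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class DedupTiming {

    // Approach 1: linear scan of the string plus a full copy on every append.
    static int countDistinctViaString(List<String> codes) {
        int distinct = 0;
        String tempStr = "";
        for (String aCode : codes) {
            if (tempStr.indexOf(aCode) == -1) {
                distinct++;
                tempStr += aCode + "-";
            }
        }
        return distinct;
    }

    // Approach 2: hash-based membership test and insert in a single call.
    static int countDistinctViaSet(List<String> codes) {
        int distinct = 0;
        Set<String> tempHSet = new HashSet<>();
        for (String aCode : codes) {
            if (tempHSet.add(aCode)) {
                distinct++;
            }
        }
        return distinct;
    }

    // Hypothetical test data: random three-letter uppercase codes.
    static List<String> randomCodes(int n, long seed) {
        Random rnd = new Random(seed);
        List<String> codes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            char[] letters = new char[3];
            for (int j = 0; j < letters.length; j++) {
                letters[j] = (char) ('A' + rnd.nextInt(26));
            }
            codes.add(new String(letters));
        }
        return codes;
    }

    public static void main(String[] args) {
        List<String> codes = randomCodes(50_000, 42L);

        long t0 = System.nanoTime();
        int viaString = countDistinctViaString(codes);
        long t1 = System.nanoTime();
        int viaSet = countDistinctViaSet(codes);
        long t2 = System.nanoTime();

        System.out.printf("String:  %d distinct, %d ms%n", viaString, (t1 - t0) / 1_000_000);
        System.out.printf("HashSet: %d distinct, %d ms%n", viaSet, (t2 - t1) / 1_000_000);
    }
}
```

Both methods should report the same number of distinct codes for strictly three-letter input; only the elapsed time should differ.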

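java.util.Set does not allow duplicates, but it is rather "quiet" about rejecting them: add simply returns false instead of throwing an exception.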
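A small sketch of that quiet behaviour, using the boolean returned by add to count how many entries were rejected. The class name, method name, and sample codes are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class QuietRejectionDemo {

    // Counts how many entries Set.add turned away as duplicates.
    static int countDuplicates(List<String> codes) {
        Set<String> seen = new HashSet<>();
        int duplicates = 0;
        for (String aCode : codes) {
            if (!seen.add(aCode)) { // false means "already present"; nothing is thrown
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("LON", "PAR", "LON", "NYC", "PAR", "LON");
        System.out.println(countDuplicates(codes)); // 3
    }
}
```

If you ignore the return value, duplicates simply disappear without any signal, so check it whenever you need to know whether an element was new.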

In terms of both performance and memory, HashSet is a better choice than String for this.

Appending values to a String variable takes time.
