What is better (performance + memory) for checking duplicates: String or HashSet?
I want a simple implementation that performs some operations once per distinct code (aCode) in bigCodeList, which contains duplicates. Below are two approaches. Which one is more efficient in terms of performance and memory consumption?
Approach 1:
String tempStr = "";
for (String aCode : bigCodeList) {
    // indexOf returns -1 when aCode does not occur anywhere in tempStr
    if (tempStr.indexOf(aCode) == -1) {
        // deal with the aCode related work
        tempStr += aCode + "-";
    }
}
Approach 2:
HashSet<String> tempHSet = new HashSet<String>();
for (String aCode : bigCodeList) {
    // add returns true only if aCode was not already in the set
    if (tempHSet.add(aCode)) {
        // deal with the aCode related work
    }
}
Note: aCode is a three-letter code such as LON.
Approach 2 is far better. You should not even consider approach 1.
First of all, approach 1 searches in linear time. That means that when tempStr becomes twice as long, the time to search it becomes twice as long (on average, of course; if you always find the first element, it stays short).
Next: you're copying the entire tempStr each time you append to it (because String objects are immutable, so appending means creating a new String that holds a copy of the old contents). So adding takes ages as well.
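To put a rough number on the copying cost, here is a minimal sketch. It assumes every `+=` builds a fresh String containing the old contents plus the new entry; the class and method names (ConcatCost, charsCopied) are made up for illustration and are not from the question.

```java
// Hypothetical helper, for illustration only.
class ConcatCost {
    // Counts the characters written when n entries of entryLength characters
    // are appended one at a time with +=, assuming each append builds a
    // new String holding the old contents plus the new entry.
    static long charsCopied(int n, int entryLength) {
        long total = 0;
        long currentLength = 0;
        for (int i = 0; i < n; i++) {
            currentLength += entryLength; // length of the newly built String
            total += currentLength;       // every character of it is written
        }
        return total;
    }

    public static void main(String[] args) {
        // Entries such as "LON-" are 4 characters long.
        System.out.println(charsCopied(1_000, 4)); // 2002000
        System.out.println(charsCopied(2_000, 4)); // 8004000
    }
}
```

Doubling the number of distinct codes roughly quadruples the characters copied, so the appending alone grows quadratically under this assumption.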
Third (not a performance concern): mixing data (aCode) and metadata (the separator -) like this leads to all kinds of undesired effects. You might be sure that aCode can never contain a dash now, but what if that changes in two weeks?
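As a small illustration of the separator problem, the sketch below uses a made-up code "N-L" that contains a dash. The class and method names (SeparatorPitfall, seenInString, seenInSet) are hypothetical, and the accumulated string is what approach 1 would hold after processing LON and LHR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, for illustration only.
class SeparatorPitfall {
    // Approach 1: a code counts as "seen" if it occurs anywhere in the string.
    static boolean seenInString(String tempStr, String aCode) {
        return tempStr.indexOf(aCode) != -1;
    }

    // Approach 2: a code counts as "seen" only if it is an element of the set.
    static boolean seenInSet(Set<String> seen, String aCode) {
        return seen.contains(aCode);
    }

    public static void main(String[] args) {
        String tempStr = "LON-LHR-"; // state after processing LON and LHR
        Set<String> seen = new HashSet<>(Arrays.asList("LON", "LHR"));

        // "N-L" was never added, but it spans the boundary between two entries.
        System.out.println(seenInString(tempStr, "N-L")); // true (false positive)
        System.out.println(seenInSet(seen, "N-L"));       // false
    }
}
```

With the string, a code that straddles two entries is wrongly treated as a duplicate and its work is skipped; the set compares whole elements, so it has no such boundary case.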
Fourth: HashSet is built for pretty much exactly this use case! That's what it does best: hold a set of distinct objects, check whether an object is already present, and add a new one.
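For completeness, a self-contained sketch of approach 2. The class and method names (DistinctCodes, firstOccurrences) and the sample codes are made up for illustration, and collecting the codes into a list is only a stand-in for "the aCode related work".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, for illustration only.
class DistinctCodes {
    // Returns each code once, in the order it first appears in bigCodeList.
    static List<String> firstOccurrences(List<String> bigCodeList) {
        Set<String> seen = new HashSet<>();
        List<String> distinct = new ArrayList<>();
        for (String aCode : bigCodeList) {
            if (seen.add(aCode)) {   // true only the first time aCode is added
                distinct.add(aCode); // stand-in for the per-code work
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<String> bigCodeList = Arrays.asList("LON", "NYC", "LON", "PAR", "NYC");
        System.out.println(firstOccurrences(bigCodeList)); // [LON, NYC, PAR]
    }
}
```

The single call to add does both the membership check and the insertion, so there is no separate contains step.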
I think the first approach is worse: the indexOf operation is O(n), while looking up a unique String key in a HashSet can be O(1).
Furthermore, the first approach uses string concatenation, which creates a new String object each time and adds further performance overhead.
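java.util.Set does not allow duplicates, but it is rather "quiet" about rejecting them: add simply returns false instead of throwing an exception.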
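A tiny sketch of that quiet rejection, assuming a plain HashSet. The class and method names (QuietSet, addTwice) are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, for illustration only.
class QuietSet {
    // Adds the same code twice and returns the two results of add.
    static boolean[] addTwice(Set<String> set, String code) {
        boolean first = set.add(code);  // code is new to the set
        boolean second = set.add(code); // code is already present
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        Set<String> codes = new HashSet<>();
        boolean[] results = addTwice(codes, "LON");
        System.out.println(results[0]);   // true
        System.out.println(results[1]);   // false
        System.out.println(codes.size()); // 1
    }
}
```

No exception is thrown for the duplicate, so the return value of add is the only signal; ignoring it means duplicates are dropped without notice.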
In terms of both performance and memory, HashSet is a better choice than String for this code.
Appending values to a String variable takes time.