How does using the == operator improve performance compared to equals?

In Effective Java by Joshua Bloch, while reading about static factory methods, I came across the following statement:

The ability of static factory methods to return the same object from repeated invocations allows classes to maintain strict control over what instances exist at any time. Classes that do this are said to be instance-controlled. There are several reasons to write instance-controlled classes. Instance control allows a class to guarantee that it is a singleton (Item 3) or noninstantiable (Item 4). Also, it allows an immutable class (Item 15) to make the guarantee that no two equal instances exist: a.equals(b) if and only if a==b. If a class makes this guarantee, then its clients can use the == operator instead of the equals(Object) method, which may result in improved performance. Enum types (Item 30) provide this guarantee.

To investigate how the == operator brings performance improvements, I looked at String.java.

I saw this snippet:

public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String) anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                            return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

What does he mean by performance improvement here? How does == bring a performance improvement?

Does he mean the following?

If every class can guarantee that a.equals(b) if and only if a == b, this implies an indirect requirement: there cannot be two objects that occupy different memory locations and yet hold the same data, which would waste memory. If they hold the same data, they are one and the same object; that is, both references point to the same memory location.

Am I right in this inference?

If I am wrong, can you guide me in understanding this?

If every class can guarantee that a.equals(b) if and only if a == b, this implies an indirect requirement: there cannot be two objects that occupy different memory locations and yet hold the same data, which would waste memory. If they hold the same data, they are one and the same object; that is, both references point to the same memory location.

Yes, that is what the author is driving at.

If you can call == (which is a single JVM opcode) instead of equals (which is a dynamically dispatched method call), it saves some overhead. This is only possible for certain classes; in particular, it cannot work for mutable classes.

It works this way for enums, for example.

And even if someone calls the equals method (which is good defensive programming practice; you don't want to get into the habit of using == for objects, IMHO), that method can be implemented as a simple == instead of having to look at potentially complex object state.

Incidentally, even for "normal" equals methods (such as String's), it is probably a good idea to check for object identity first and thereby short-cut the inspection of object state, which is what String#equals does, as you have found out.
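To make the enum case concrete, here is a minimal sketch. The Color enum and the EnumIdentityDemo class are hypothetical names invented for illustration; the only library facts relied on are that each enum constant is a single instance and that Enum.equals is final and compares by identity.

```java
// Hypothetical enum used only for this illustration.
enum Color { RED, GREEN, BLUE }

final class EnumIdentityDemo {
    // Identity comparison: a plain reference comparison, and null-safe.
    static boolean sameByIdentity(Color a, Color b) {
        return a == b;
    }

    // Method call: Enum.equals is final and itself just compares references.
    static boolean sameByEquals(Color a, Color b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // valueOf returns the existing constant rather than creating a new object.
        Color parsed = Color.valueOf("RED");
        System.out.println(sameByIdentity(parsed, Color.RED));
        System.out.println(sameByEquals(parsed, Color.RED));
        System.out.println(sameByIdentity(Color.RED, Color.GREEN));
    }
}
```

Both comparisons always agree for enums, so the choice between them is about style and null handling rather than correctness.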

What the quoted portion means is that an immutable class can choose to intern its instances. This is easy to implement via Guava's Interner, for example:

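import com.google.common.base.Objects;
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

public class MyImmutableClass {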
    private static final Interner<MyImmutableClass> INTERN_POOL = Interners.newWeakInterner();
    private final String foo;
    private final int bar;

    private MyImmutableClass(String foo, int bar) {
        this.foo = foo;
        this.bar = bar;
    }

    public static MyImmutableClass of(String foo, int bar) {
        return INTERN_POOL.intern(new MyImmutableClass(foo, bar));
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(foo, bar);
    }

    @Override
    public boolean equals(Object o) {
        if (o == this)
            return true;        // fast path for interned instances
        if (o instanceof MyImmutableClass) {
            MyImmutableClass rhs = (MyImmutableClass) o;
            return Objects.equal(foo, rhs.foo)
                    && bar == rhs.bar;
        }
        return false;
    }
}

Here, the constructor is private: all instances must be obtained through the MyImmutableClass.of() factory method, which uses the Interner to ensure that if the new instance equals() an existing instance, the existing instance is returned instead.

Interning can only be used for immutable objects, by which I mean objects whose observable state (i.e., the behaviour of all their externally accessible methods, in particular equals() and hashCode()) does not change during the objects' lifetimes. If you intern mutable objects, the behaviour will be wrong when an instance is modified.

As many other people have already stated, you should carefully choose which objects to intern, even if they're immutable. Only do it if the set of interned values is small relative to the number of duplicates you are likely to have. For example, it's generally not worth interning Integer, because there are over 4 billion possible values. But it is worth interning the most commonly used Integer values, and in fact Integer.valueOf() interns values between -128 and 127. On the other hand, enums are great to intern (and they are interned, by definition) because the set of possible values is small.
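The Integer.valueOf() behaviour mentioned above can be observed directly. This is a small sketch; IntegerCacheDemo and sameInstance are hypothetical names. Only the range -128 to 127 is guaranteed to be cached; outside that range the result of == depends on the JVM and its configuration, so no output is claimed for that case.

```java
final class IntegerCacheDemo {
    // True when two valueOf calls for the same int yield the very same object.
    static boolean sameInstance(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        // Guaranteed to be cached, so both calls return one shared instance.
        System.out.println(sameInstance(127));
        // Outside the guaranteed range: typically distinct objects, but not specified.
        System.out.println(sameInstance(128));
        // equals compares the int values, so it is reliable either way.
        System.out.println(Integer.valueOf(128).equals(Integer.valueOf(128)));
    }
}
```

This is why == on boxed Integer values is unsafe in general, even though it happens to work for small numbers.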

For most classes in general, you'd have to do heap analysis, such as by using jhat (or, to plug my own project, fasthat), to decide if there are enough duplicates to warrant interning. In other cases, just keep it simple and don't intern.

If you can guarantee that no two instances of a class exist whose semantic values are equivalent (i.e., if x and y refer to different instances [x != y], then x.equals(y) == false for all x and y), then you can compare two references' objects for equality simply by checking whether they refer to the same instance, which is what == does.

The implementation of == essentially just compares two integers (memory addresses) and would generally be faster than virtually all nontrivial implementations of .equals().

It is worth noting that this is not a jump that can be made for Strings, as you cannot guarantee that two distinct String instances are never equivalent, e.g.:

String x = new String("hello");
String y = new String("hello");

Since x != y && x.equals(y), it is not sufficient to just do x == y to check for equality.
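A runnable version of the snippet above, as a minimal sketch. StringIdentityDemo and its helper methods are hypothetical names; String.intern() is the standard method that returns the canonical pooled instance for a given character sequence.

```java
final class StringIdentityDemo {
    // Reference comparison only: true when both variables point at one object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // After interning, equal strings are represented by the same pooled instance.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String x = new String("hello"); // new String(...) always allocates a fresh object
        String y = new String("hello");
        System.out.println(sameReference(x, y));   // distinct objects
        System.out.println(x.equals(y));           // same characters
        System.out.println(sameAfterIntern(x, y)); // same pooled instance
    }
}
```

String only offers this guarantee when every value being compared has been interned, which is why it counts as partial instance control at best.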

To answer your questions...

What does he mean by performance improvement here [String]? How does it bring a performance improvement?

This is NOT an example of what Bloch is talking about. Bloch is talking about instance-controlled classes, and String is not such a class!

Am I right in this inference?

Yes, that is correct. An instance-controlled class whose instances are immutable can ensure that objects that are "the same" will always be equal according to the == operator.

Some observations, though:

  • This only applies to immutable objects, or more precisely, to objects where mutation does not affect the semantics of equality.

  • This only applies to fully instance-controlled classes.

  • Instance control can be expensive. Consider the form of (partial) instance control provided by the String class's intern method and the string pool.

    • The string pool is effectively a hash table of weak references to String objects. This occupies extra memory.

    • Each time you intern a String, it will calculate the string's hash code and probe the hash table to see if an equal string has already been interned.

    • Each time a full GC is performed, the weak references in the string pool result in extra "tracing" work for the GC, and then potentially more work if the GC decides to break references.

    You typically get similar overheads when you implement your own instance-controlled classes. When you do a cost-benefit analysis, these overheads count against the benefits of faster instance comparison.

I think it means this:

If you need to test two complex structures for equality, you generally need to do a lot of tests to make sure they are the same.

But if, because of some trick of the language, you know that two complex but equal structures can't exist simultaneously, then instead of verifying equality by comparing them bit by bit, you can just check whether they are at the same location in memory and return false if they are not.

If anyone can create objects, then you can't guarantee that two equal but distinct instances will never be created. But if you control the creation of objects and only create distinct objects, then you don't need complex equality tests.
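Controlling creation can be sketched with the standard library alone. CurrencyCode is a hypothetical class invented for illustration: its constructor is private, and the static factory hands out one canonical instance per code. Note that, as an assumption of this sketch, the map holds strong references, so instances are never garbage collected.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical instance-controlled value class: one object per distinct code.
final class CurrencyCode {
    // Canonical instances, keyed by their value. Strong references: never evicted.
    private static final Map<String, CurrencyCode> INSTANCES = new ConcurrentHashMap<>();

    private final String code;

    // Private constructor: outside code cannot create duplicates.
    private CurrencyCode(String code) {
        this.code = code;
    }

    // Returns the existing instance for this code, creating it only on first use.
    static CurrencyCode of(String code) {
        return INSTANCES.computeIfAbsent(code, CurrencyCode::new);
    }

    String code() {
        return code;
    }

    // equals and hashCode are deliberately not overridden: because no two
    // instances share a code, the inherited identity-based versions are correct.
}
```

With this in place, CurrencyCode.of("USD") == CurrencyCode.of("USD") holds, at the price of a hash lookup on every call to the factory.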

In cases where complicated values are encapsulated using references to immutable objects, there are generally three scenarios that can arise when comparing two references:

  • They are references to the same object (very fast)

  • They are references to different objects which encapsulate different values (often fast, but sometimes slow)

  • They are references to different objects which encapsulate the same value (generally always slow)

If objects will be found to be equal more often than not, there can be substantial value in minimizing the frequency of case 3. If objects will often be very nearly equal, there can also be substantial value in ensuring that the slow subcases of case 2 don't happen very often.

If one makes certain that for any given value there will never be more than one object which holds that value, code which observes that two references identify different objects may infer that they encapsulate different values, without having to actually examine the values in question. The value of doing this is often somewhat limited, however. If the objects in question are large, complicated, nested collections which will sometimes be very similar, one may have each collection compute and cache a 128-bit hash of its contents; two collections with different content are unlikely to have matching hash values, and collections with different hash values may be quickly recognized as unequal. On the other hand, having references that encapsulate the same content generally identify the same object, even if a few references to distinct but identical collections exist, can improve the performance of the otherwise-always-bad "equals" case.
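The cached-hash idea can be sketched as follows. Snapshot is a hypothetical class, and for brevity this sketch caches the ordinary 32-bit Arrays.hashCode value rather than the 128-bit hash described above, so collisions are more likely than in that design; the structure of the comparison is the same.

```java
import java.util.Arrays;

// Hypothetical immutable wrapper that caches a hash of its contents.
final class Snapshot {
    private final int[] data;
    private final int cachedHash; // computed once; a 32-bit stand-in for a wider hash

    Snapshot(int[] data) {
        this.data = data.clone(); // defensive copy keeps the object immutable
        this.cachedHash = Arrays.hashCode(this.data);
    }

    @Override
    public int hashCode() {
        return cachedHash;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) {
            return true; // case 1: same object, answered immediately
        }
        if (!(o instanceof Snapshot)) {
            return false;
        }
        Snapshot other = (Snapshot) o;
        if (cachedHash != other.cachedHash) {
            return false; // case 2: different hashes prove different content
        }
        // Hashes match: contents are probably equal, but must still be compared.
        return Arrays.equals(data, other.data);
    }
}
```

The hash check only speeds up the unequal case; two distinct objects with equal content still pay for the full comparison, which is the case interning removes.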

An approach that one could use if one didn't want to use a separate interning collection would be to have each object keep a long sequence number, such that one can always determine which of two otherwise-identical objects was created first, along with a reference to the oldest object which is known to hold the same content. To compare two references, start by identifying the oldest object known to be equivalent to each. If the oldest object known to match the first isn't the same as the oldest object known to match the second, compare the objects' contents. If they match, one will be newer than the other, and that object can regard the other as the "oldest object known to match".
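One possible reading of this scheme, as a rough sketch rather than a definitive implementation. Tracked, oldestKnownMatch, and the deepComparisons counter are hypothetical names added for illustration; the sketch is not thread-safe, and it assumes the content is an int array.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical value object that remembers the oldest object known to match it.
final class Tracked {
    private static final AtomicLong NEXT_SEQUENCE = new AtomicLong();
    static int deepComparisons = 0; // demo counter: how often contents were compared

    private final long sequence = NEXT_SEQUENCE.getAndIncrement(); // creation order
    private final int[] content;
    private Tracked oldestKnownMatch = this; // initially each object only knows itself

    Tracked(int... content) {
        this.content = content.clone();
    }

    // Follows the chain of links to the oldest known equivalent object.
    private Tracked oldest() {
        Tracked root = this;
        while (root.oldestKnownMatch != root) {
            root = root.oldestKnownMatch;
        }
        oldestKnownMatch = root; // remember the result to shorten later lookups
        return root;
    }

    boolean sameContent(Tracked other) {
        Tracked a = oldest();
        Tracked b = other.oldest();
        if (a == b) {
            return true; // already known to be equivalent: no content comparison
        }
        deepComparisons++;
        if (!Arrays.equals(a.content, b.content)) {
            return false;
        }
        // Contents match: the newer object records the older one as its match.
        if (a.sequence < b.sequence) {
            b.oldestKnownMatch = a;
        } else {
            a.oldestKnownMatch = b;
        }
        return true;
    }
}
```

After the first successful comparison of two equal objects, later comparisons between them resolve to the same oldest object and skip the content check.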
