
Multithreading: Threads manipulating different fields of the same object

Say I have a class X with two variables.

class X {
    Integer a;
    Y b;
    Integer c;
}

The class Y

class Y {
    Integer y1;
    String y2;
}

Say, we have 4 threads, T1, T2, T3 and T4.

T1 operates on a, T2 operates on b.y1 (does something like x.getB().setY1()), T3 operates on b.y2, and T4 operates on c.

I won't be reading any of the "deepest" values (a, y1, y2, c) in any thread until all of the threads have finished executing (T2 and T3 will call x.getB(), however).
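
To make the setup concrete, here is a minimal sketch of what I mean (direct field access is used for brevity; it stands in for the getter/setter calls mentioned above):

class Scenario {
    public static void main(String[] args) throws InterruptedException {
        X x = new X();
        x.b = new Y();

        Thread t1 = new Thread(() -> x.a = 1);        // T1 writes a
        Thread t2 = new Thread(() -> x.b.y1 = 2);     // T2 reads the reference b, writes y1
        Thread t3 = new Thread(() -> x.b.y2 = "hi");  // T3 reads the reference b, writes y2
        Thread t4 = new Thread(() -> x.c = 3);        // T4 writes c

        t1.start(); t2.start(); t3.start(); t4.start();
        t1.join(); t2.join(); t3.join(); t4.join();   // only after all joins are the values read

        System.out.println(x.a + " " + x.b.y1 + " " + x.b.y2 + " " + x.c);
    }
}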

Would I face any of the typical issues associated with multithreading?

My questions

  1. I think I might not face any race condition with respect to a and c, given that they are not read by threads other than "their" thread. Is this reasoning right?
  2. What about x.getB() by T2 and T3?
  3. What about caching by processors in a multi-core environment? Do they cache the entire object? Or do they cache only the field that they modify? Or do they cache the whole thing but update only the field that they changed?
  4. Do they even recognise objects and fields? Or do they just work on chunks of memory? In that case does Java tell them the memory address they would need to cache?

When the processors reconcile their cache with the main memory after the processing is done, do they only update the chunk of memory they changed, or do they overwrite the main memory with the entire block of memory that they cached?

For example, say that initially a = 1 and c = 1. Processors P1 and P4 cache these values (a=1 and c=1). T1 changes a to 2, and T4 changes c to 2.

Now, the values in P1's cache (C1) are a=2, c=1; in P4's cache (C4), a=1, c=2.

When writing back to main memory, say P1 finishes first and updates main memory; the values there are now a=2, c=1.

Now, when P4 finishes, does it update only the value of c, because it has modified only c? Or does it simply overwrite main memory with the values in its cache, making a=1, c=2?

Or do they simply cache only the values they will read or write, meaning T1 will never cache the value of c, and T4 will never cache the value of a?

Would I face any of the typical issues associated with multithreading?

You are only reading, so of course not; the question is not even relevant. The Java Memory Model describes how changes to fields are propagated to other threads, and that requires actual changes first.

do they only update the chunk of memory they changed, or do they overwrite the main memory with the entire block of memory that they cached?

Only what they changed.

What about caching by processors in a multi-core environment? Do they cache the entire object? Or do they cache only the field that they modify?

Your question makes no sense. You cannot put objects in fields or variables at all; that is not possible. The only thing you can stick in fields/variables/parameters is references to objects. In String x = "foo", you didn't put "foo" in x. "foo" lives on the heap someplace. You ensured it exists on the heap, and you assigned a reference to it to x. This reference is fairly simple, usually 64 bits, and writes to it are atomic.
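
A tiny illustration of that point (StringBuilder is used instead of String so the shared object can actually be mutated):

class RefDemo {
    public static void main(String[] args) {
        StringBuilder x = new StringBuilder("foo"); // the object lives on the heap
        StringBuilder y = x;                        // this copies the reference, not the object
        y.append("bar");
        System.out.println(x);                      // prints "foobar": x and y refer to the same object
    }
}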

The only things you can share between threads, where updates are relevant, are fields. Methods cannot be changed (you can't modify a method of an instance of something; Java is not like Python or JavaScript, where you can write someRef.flibbledyboo = thingie; with 'flibbledyboo' being something you just made up). Local variables (which include parameters) cannot possibly be shared with other threads; Java is pass-by-value in all things, so if, inside a method, you call someOtherMethod(variable);, you're passing a copy, which makes the question 'what happens if I change my variable and someOtherMethod hands it to another thread?' irrelevant.

You can seemingly share a local variable with a thread if you make a lambda or local class, but Java will refuse to compile this unless the variable is final or effectively final. If it is (effectively) final, the point is moot: it cannot be changed, therefore the question 'if one thread updates this value, when does the other thread see the update?' is irrelevant.
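
For example (a sketch; the compiler, not the JMM, enforces this):

class CaptureDemo {
    public static void main(String[] args) {
        int count = 42;
        // OK: count is never reassigned, so it is effectively final and may be captured.
        Runnable r = () -> System.out.println(count);
        new Thread(r).start();
        // count++;  // uncommenting this makes the capture above a compile error
    }
}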

Thus: it is all about fields, and fields can only contain primitive values or references. References are simple things. If you're familiar with C, they are pointers, but that's a dirty word, so Java calls them references. Potayto, potahto. Same thing.

Any field (be it primitive or a reference) can be cached by any thread, or not; dealer's choice. They can 'sync' it back to main memory any time they wish. If your code changes how it executes depending on this, you wrote a hard-to-find bug. Try not to do so :)

Do they even recognise objects and fields? Or do they just work on chunks of memory?

Again a nonsensical question, as per the previous point: it's values you are dealing with, primitives and references. That's what the JMM is about (not 'chunks of memory'). Objects cannot be in fields, only references. A reference leads to an object, but that object is just another bag o' fields. It's fields all the way down.

Imagine thread A does foo.getX().setY() and thread B does foo.getX().getY(). Assuming foo never changes, then presumably foo.getX() also never changes. This is just a reference, and '.' is Java-ese for: follow it, find the bag o' fields there, and operate on those. So both threads find the same object and the bag-o'-fields that it really is. Now thread A has modified one of the fields it found there, and B is reading one of them. That's a problem: those are fields, and threads may cache them, dealer's choice. You need to establish HB/HA relationships, or you wrote a bug here.
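
A minimal, self-contained sketch of that hazard (the class and method names here are made up for illustration):

class Inner {
    int y;                                  // plain field: no visibility guarantees by itself
    void setY(int v) { y = v; }
    int getY()       { return y; }
}

class Holder {
    private final Inner x = new Inner();    // the reference itself never changes
    Inner getX() { return x; }
}

class RaceDemo {
    public static void main(String[] args) {
        Holder foo = new Holder();

        Thread a = new Thread(() -> foo.getX().setY(42));                   // writer
        Thread b = new Thread(() -> System.out.println(foo.getX().getY())); // reader: may print 0 or 42

        a.start();
        b.start();
    }
}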

Now, when P4 finishes, does it update the value of only c, because it has modified only c? Or does it simply overwrite the main memory with the value in its cache, making a=1, c=2?

No, but this doesn't seem particularly relevant. An unrelated thread with no HB/HA (happens-before/happens-after) relationship can legally observe a=1/c=1, a=2/c=1, a=1/c=2, or a=2/c=2. However, if it has somehow observed a=2/c=1, then afterwards it will continue to observe a=2. It won't go back to 1 due to an 'overwrite the entire block' style write-back.

Or do they simply cache only the values they will read or write, meaning T1 will never cache the value of c, and T4 will never cache the value of a?

Dealer's choice. The JMM is best understood as follows:

Anytime any thread ever updates any field (and values are always primitives or references), it flips the evil coin. If the flip lands heads, it updates the value only in its local cache and does not 'distribute' it to any other code that interacts with this field, unless an HB/HA relationship has been established. On tails, it does update an arbitrary selection of other threads' caches.

Whenever a thread reads any field, it again flips the coin. On heads, it just carries on with its cache. On tails it updates from the central value.

The coin is evil: it is not a 50/50 shot. In fact, today, on your laptop, writing out this code, it lands tails every time, even though you reran the test a million times. On your CI server, same deal: tails. Then in production: tails, every time. Then next week, when that important client comes in and you're giving the demo? Lots of heads.

Thus:

  • It is hard to detect you made code whose execution depends on the coinflip.
  • Nevertheless, if you write code whose execution depends on the flip, you failed. That's a bug.
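
A classic, minimal sketch of code whose execution depends on the flip; declaring stop as volatile (or guarding it with synchronized) removes the bug:

class CoinFlipDemo {
    static boolean stop = false;                 // plain field: no visibility guarantee

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) { }                    // may spin forever on some runs / JVMs / machines
            System.out.println("observed stop");
        });
        reader.start();

        Thread.sleep(100);
        stop = true;                             // this write may or may not ever become visible to 'reader'
    }
}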

The solution is usually to forget about threads in this fashion entirely, and do your inter-thread communication either 'channeled' or up-front and afterwards.

Channelled communication

There are communication channel systems that are much better suited. For example, databases: don't update fields; send DB queries. Use transactions with isolation level SERIALIZABLE, and use retry-capable frameworks (like JDBI or jOOQ; do not roll your own, and do not use JDBC directly). You get fine-grained control over the data channel.

Other options are message buses like RabbitMQ.
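
The same 'channel' idea can be sketched in-process with java.util.concurrent.BlockingQueue (purely illustrative; it is not a substitute for a database or a message bus, but it shows the pattern of sending values through a channel instead of sharing fields):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ChannelDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            try {
                channel.put("work item");        // put() happens-before the corresponding take()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        System.out.println(channel.take());      // the consumer never touches the producer's fields
    }
}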

Up-front / afterwards

Use frameworks like fork/join and friends, or anything else that follows e.g. the map/reduce model. They set up some data structures and only then fire up your thread (or rather, they have a pool of them, and will execute your code within one thread of the pool, handing you the data structure). Your code just looks at this data structure and then returns something; it touches no other fields at all. The framework creates the data and integrates what you return. Then trust the framework; it probably doesn't have memory model bugs.
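
A minimal fork/join sketch (summing an array; the task splits the work, and the framework handles the threads and the memory model for you):

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= 1_000) {                     // small enough: just do the work
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) / 2;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                                  // hand one half to the pool
        return right.compute() + left.join();         // join() also gives the visibility guarantee
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        Arrays.fill(data, 1);
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum);                      // 1000000
    }
}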

I really want to modify shared data in a multithreaded environment.

Lordy, here be dragons.

If you must, look up 'happens-before/happens-after': for any two lines of code, if there is an HB/HA relationship (as in, per the JMM rules, one line is guaranteed to have occurred before the other), then any updates to fields caused by the earlier line are guaranteed to be visible to the later line; no evil coin flips.

A very quick overview:

  • Within one thread, any later executed line 'happens after'. This is the obvious one - Java is imperative. What you can observe from code is as if each line within one thread runs one after another.
  • synchronized: when you have two threads, and one thread hits a block of code guarded by synchronized(X) and then exits this block, and another thread later enters a block guarded by a synchronized on the same reference, then the exit point of thread A is guaranteed to 'happen before' the entry point of B: whatever A changed inside, you'll see in B, guaranteed.
  • volatile - similar rules, but volatile is tricky.
  • Thread starts: someThread.start() has an HB/HA relationship with the code in that thread.
  • Constructors and the setting of final fields are more or less guaranteed to work out (the field assignment 'happens before' the constructor returns, even if you then hand the object reference you got from that constructor to another thread without HB/HA protection and it somehow gets hold of it because the evil coin happened to land the right way).
  • The class loader system will never load the same class twice within the same class loader. This is very fast and an easy way to make safe singletons.

If some code X updates a field, and some other code Y reads that field, and X and Y have no HB/HA relationship, you're completely hosed. You wrote a bug, and it'll be quite difficult to test for it, and the test will not be reliable.
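
A minimal sketch of the synchronized rule in action (the counter is illustrative, not from the question): every access to value is guarded by the same monitor, so each unlock happens-before the next lock, and the updates are both atomic and visible.

import java.util.ArrayList;
import java.util.List;

class SyncCounter {
    private int value;                               // plain field, but always accessed under the monitor

    synchronized void increment() { value++; }       // unlock on exit...
    synchronized int get()        { return value; }  // ...happens-before the lock acquired here

    public static void main(String[] args) throws InterruptedException {
        SyncCounter counter = new SyncCounter();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) counter.increment();
            });
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) t.join();           // join() is itself an HB/HA edge
        System.out.println(counter.get());           // guaranteed to print 40000
    }
}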

Your question touches on a number of interesting topics. I will try to reformulate your questions and answer them in order.

On your first question: if different threads only modify different objects, can this pose consistency issues?

You need to make a distinction between modifying an object (or "writing") and making such changes visible to other threads. In the case you present, your various threads deal with the various objects independently of each other and never need to "read" the other objects. So yes, this is fine.

However, if a thread needs to read the value of a variable that may have been modified by another thread, you need to introduce some synchronization so that the modification to that variable happens before the first thread reads it (a synchronized block, access to a volatile variable, semaphores, etc.). I cannot recommend this article enough: Fixing the Java Memory Model.
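
For example, a minimal sketch using a volatile flag to publish a result from one thread to another (the names are made up):

class VolatileFlagDemo {
    static volatile boolean done = false;            // volatile: reads/writes establish happens-before
    static int result;                               // plain field, published via the volatile write

    public static void main(String[] args) {
        new Thread(() -> {
            result = 42;                             // happens-before the volatile write below
            done = true;                             // volatile write
        }).start();

        while (!done) { }                            // volatile read: will eventually observe true
        System.out.println(result);                  // guaranteed to print 42
    }
}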

On your second question:

Same answer as for your first question: as long as no thread modifies the member b of your X instance, there is no cause for concern; both threads T2 and T3 will obtain the same object.

On your third and fourth questions: what about cache consistency?

How the Java Virtual Machine handles memory allocation is a little obscure from a programmer's perspective. What you are concerned about is called false sharing. The Java Virtual Machine will make sure that what is stored in memory is consistent with your program; you do not need to worry about a stale cache overwriting changes made by another thread.

However, if there is enough contention on the members, you may face a performance penalty. Fortunately, you can reduce this impact by using the @Contended annotation on the members that pose problems, to indicate to the Java Virtual Machine that they should be allocated on different cache lines.
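
A sketch of what that could look like, with the caveat that @Contended is a JDK-internal annotation: on Java 8 it lives in sun.misc, on Java 9+ in jdk.internal.vm.annotation (which must be exported with --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED), and it only takes effect on application classes when the JVM is started with -XX:-RestrictContended. Treat this as illustrative, not a drop-in recipe.

// Assumes Java 9+ and the flags mentioned above.
import jdk.internal.vm.annotation.Contended;

class PaddedX {
    @Contended
    Integer a;      // padded so that writes to a do not share a cache line with c

    @Contended
    Integer c;      // likewise isolated on its own cache line

    Y b;            // left unannotated, as in the question's class X
}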
