Is my attempt to solve a concurrency issue when using Java FileWriter correct?


I have a Java program that writes a file using java.io.FileWriter. In the production environment this program is instantiated several times, and each instance writes a unique file to the same directory on an NFS share. Sometimes there are so many instances running at the same time that some of them (just a few) fail with an IOException whose message reads "Error: Permission denied".
I think the problem here is concurrency over the directory, and I want to solve it by adding retry logic. It has to write the file, no matter what.
What I have in mind is something like the following, but I don't know if it will work because I have no way to test it in my development environment:
try {
    fw.write(str);
} catch (IOException e) {
    boolean retry = true;
    while (retry) {
        TimeUnit.MILLISECONDS.sleep(2);
        fw.write(str);
        retry = false;
    }
} finally {
    fw.close();
}

I'd appreciate any help you can give me.

Sometimes there are so many instances at the same time that some of them (just a few) are failing with IOException and the message reads "Error: Permission denied".

I want to solve it by adding retry logic. It has to write the file, no matter what.

You're setting yourself up for disaster here.

Your theory (which is quite plausible) is that some mechanism in the underlying OS and/or filesystem driver lets you create new file handles but eventually starts throwing errors if too many handles are open concurrently.

That means you have a system that sometimes 'sends more than the system can handle', and your proposed solution is to keep retrying those writes every 2 milliseconds until they succeed. If the system is in a state where it is sending more than the underlying system can handle, you have a ton of those 'producers of stuff' all resending every 2 millis concurrently. Given that the 'receiving system' (the OS/filesystem driver) clearly CAN be overwhelmed, you're not helping it by having e.g. 100 threads each firing off a write request every 2 millis.

A much better approach is to adopt a queueing mechanism: instead of having 100 threads all repeatedly shoving write requests at the OS/filesystem and retrying at 2 millisecond intervals, have a queue that is drained by a single thread or by a hard-limited pool of threads.

Read up on java.util.concurrent and friends on how to create a pool that deals with jobs + a way to offer a job and wait for it to be handled.
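As a rough illustration of the queueing idea, here is a minimal sketch using a fixed-size thread pool from java.util.concurrent. The class name QueuedFileWriter, its submit method, and the maxConcurrentWrites parameter are made up for this example. It also assumes all writes pass through one JVM; a pool like this does nothing to limit separate processes that each write on their own.

```java
import java.io.FileWriter;
import java.nio.file.Path;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: funnels all file writes through a hard-limited pool.
public class QueuedFileWriter implements AutoCloseable {
    private final ExecutorService pool;

    public QueuedFileWriter(int maxConcurrentWrites) {
        // At most maxConcurrentWrites jobs run at once; the rest wait in the pool's queue.
        this.pool = Executors.newFixedThreadPool(maxConcurrentWrites);
    }

    // Offers a write job; the returned Future completes once the file has been written.
    public Future<Void> submit(Path target, String content) {
        Callable<Void> job = () -> {
            try (FileWriter fw = new FileWriter(target.toFile())) {
                fw.write(content);
            }
            return null;
        };
        return pool.submit(job);
    }

    @Override
    public void close() {
        // Stop accepting new jobs; jobs already queued still run.
        pool.shutdown();
    }
}
```

A caller would do something like writer.submit(path, str).get() to wait for its job; get() throws an ExecutionException wrapping the IOException if the write failed.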

Alternatively, if you really do want to go down this path and damn the consequences, the pasted code has four problems:

The 'retry' isn't protected

A catch block does not apply to itself. Any exceptions that occur inside it just propagate; they aren't 'caught'. Rewrite it all as a while loop instead: the catch block should not itself trigger the write, it should cause the loop to run again.

No exponential backoff

If you retry every 2 millis and it hasn't worked after a few attempts, it's a good idea to start waiting a little longer than 2 millis. You should keep waiting longer and longer, and it also helps to introduce a random factor so the retrying writers don't all wake up at the same moment. (See Exponential Backoff.)
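To make the backoff idea concrete, here is a small sketch of one common variant: the ceiling doubles with each attempt up to a cap, and the actual wait is picked at random below that ceiling. The class Backoff, the method delayMillis, and its parameters are names invented for this example, and it assumes small non-negative values for baseMillis and capMillis.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for computing randomized, exponentially growing delays.
public class Backoff {
    // Returns a random delay in [0, min(capMillis, baseMillis * 2^(attempt - 1))].
    // attempt is 1-based: attempt 1 is the first retry.
    public static long delayMillis(int attempt, long baseMillis, long capMillis) {
        // Clamp the shift so the doubling cannot overflow for small base values.
        int shift = Math.min(Math.max(attempt - 1, 0), 30);
        long ceiling = Math.min(capMillis, baseMillis << shift);
        // The random factor spreads out writers that failed at the same time.
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }
}
```

With baseMillis = 2 and capMillis = 1000, the first retry waits at most 2 ms, the second at most 4 ms, and so on until the ceiling reaches 1000 ms.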

No limit counter

Eventually after enough tries, it's better that your app just gives up and crashes (with no write having occurred at all), than that it hangs forever.

No recreation of the FileWriter

Once fw.write starts throwing exceptions it is somewhat likely it will continue to do so, forever, even if the underlying OS/fs now has the time/room to handle your write request. Better to recreate the entire thing every time.

Putting it all together

Random rnd = new Random(); // java.util.Random, randomizes the wait
int count = 0;
while (true) {
    try {
        // recreate the FileWriter on every attempt
        try (FileWriter fw = ...) {
            fw.write(str);
        }
        break; // break out of the while.
    } catch (IOException e) {
        // you may want to test if that IOException indeed
        // indicates a problem that 'retry' can fix,
        // because if it doesn't, this code runs a very very
        // long time before the 1000 hard-limit kicks in!

        // after 1000 retries, give up.
        if (count++ >= 1000) throw e;

        // the upper bound of the random wait grows with each retry
        TimeUnit.MILLISECONDS.sleep(rnd.nextInt(count * 4));
    }
}
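For anyone who wants something they can compile as-is, here is a self-contained variant of the same loop wrapped in a method. The class name RetryingWriter, the method writeWithRetry, and the maxRetries parameter are illustrative choices, not part of any library. It retries on every IOException, so the caveat about non-retryable errors still applies.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Path;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around the retry loop.
public class RetryingWriter {
    public static void writeWithRetry(Path target, String str, int maxRetries)
            throws IOException, InterruptedException {
        int count = 0;
        while (true) {
            // A fresh FileWriter per attempt; failures on close() are retried too.
            try (FileWriter fw = new FileWriter(target.toFile())) {
                fw.write(str);
                return;
            } catch (IOException e) {
                // Give up once the retry limit is reached.
                if (count++ >= maxRetries) throw e;
                // Random wait whose upper bound grows with each retry.
                TimeUnit.MILLISECONDS.sleep(
                        ThreadLocalRandom.current().nextInt(count * 4));
            }
        }
    }
}
```

Declaring InterruptedException on the method lets the caller decide what an interrupted sleep should mean, rather than swallowing it inside the loop.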
