
Processing sub-streams of a stream in Java using executors

I have a program that processes a huge stream (not in the sense of java.util.stream, but rather an InputStream) of data coming in over the network. The stream consists of objects, each carrying a sub-stream identifier. Right now all processing is done in a single thread, but it takes a lot of CPU time and each sub-stream can easily be processed independently, so I'm thinking of multi-threading it.

However, each sub-stream needs to keep a lot of bulky state, including various buffers, hash maps and such. There is no particular reason to make that state concurrent or synchronized, since sub-streams are independent of each other. Moreover, each sub-stream requires that its objects be processed in the order they arrive, which probably means there should be a single thread for each sub-stream (though one thread could process multiple sub-streams).

I'm thinking of several approaches to this, but none of them is quite elegant.

  1. Create a single ThreadPoolExecutor for all tasks. Each task will contain the next object to process and the reference to a Processor instance which keeps all the state. That would ensure the necessary happens-before relationship thus ensuring that the processing thread will see the up-to-date state for this sub-stream. This approach has no way to make sure that the next object of the same sub-stream will be processed in the same thread, as far as I can see. Moreover, it needs some guarantee that objects will be processed in the order they come in, which will require additional synchronization of the Processor objects, introducing unnecessary delays.

  2. Create multiple single-thread executors manually, plus a hash map that maps sub-stream identifiers to executors. This approach requires manual management of executors, creating them or shutting them down as sub-streams begin or end, and distributing the tasks between them accordingly.

  3. Create a custom executor that processes a special subclass of tasks, each having a sub-stream ID. This executor would use the ID as a hint to run the task on the same thread as the previous task with the same ID. However, I don't see an easy way to implement such an executor. Unfortunately, it doesn't seem possible to extend any of the existing executor classes, and implementing an executor from scratch is kind of overkill.

  4. Create a single ThreadPoolExecutor, but instead of creating a task for each incoming object, create a single long-running task for each sub-stream that blocks on a concurrent queue, waiting for the next object. Then put objects in queues according to their sub-stream IDs. This approach needs as many threads as there are sub-streams because the tasks will be blocked. The expected number of sub-streams is about 30-60, so that may be acceptable.

  5. Alternatively, proceed as in 4, but limit the number of threads, assigning multiple sub-streams to a single task. This is sort of a hybrid between 2 and 4. As far as I can see, this is the best approach of these, but it still requires some sort of manual sub-stream distribution between tasks and some way to shut the extra tasks down as sub-streams end.
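For illustration, approach 4 might look roughly like the sketch below. This is only one possible shape, not a tested design: the class name QueuePerSubstream, the END marker object, and the handler callback are all hypothetical, and a cached thread pool is assumed so that every blocked per-sub-stream task gets its own thread.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical sketch of approach 4: one queue and one long-running task per sub-stream. */
final class QueuePerSubstream {
    // Marker object (an assumption of this sketch) that tells a worker to stop.
    private static final Object END = new Object();

    // Cached pool: each blocked task occupies a thread, so the pool must be able to grow.
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<Integer, BlockingQueue<Object>> queues = new ConcurrentHashMap<>();
    private final BiConsumer<Integer, Object> handler;

    QueuePerSubstream(BiConsumer<Integer, Object> handler) {
        this.handler = handler;
    }

    /** Enqueues an item; the first item of a sub-stream also starts its worker task. */
    void offer(int id, Object item) {
        queues.computeIfAbsent(id, this::startWorker).add(item);
    }

    private BlockingQueue<Object> startWorker(int id) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        pool.execute(() -> drain(id, queue));
        return queue;
    }

    /** Runs on a single thread per sub-stream, so items are handled in arrival order. */
    private void drain(int id, BlockingQueue<Object> queue) {
        try {
            for (Object item = queue.take(); item != END; item = queue.take()) {
                handler.accept(id, item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Ends all sub-streams and waits for the workers; call only after the last offer(). */
    boolean closeAndAwait(long timeoutSeconds) {
        for (BlockingQueue<Object> queue : queues.values()) {
            queue.add(END);
        }
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because each queue is drained by exactly one task, the per-sub-stream state touched inside the handler needs no extra synchronization in this sketch.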

What would be the best way to ensure that each sub-stream is processed in its own thread without a lot of error-prone code? So that the following pseudo-code will work:

while (true) {
    Item next = stream.read();
    int id = next.getSubstreamID();
    Processor processor = getProcessor(id);
    SubstreamTask task = new SubstreamTask(processor, next, id);
    executor.submit(task); // This makes sure that the task will
                           // be executed in the same thread as the
                           // previous task with the same ID.
}

I suggest having an array of single-threaded executors. If you can devise a consistent hashing strategy for sub-streams, you can map sub-streams to individual threads, e.g.:

final ExecutorService[] es = ...

public void submit(int id, Runnable run) {
   es[(id & 0x7FFFFFFF) % es.length].submit(run);
}

The key could be a String or a long, as long as it identifies the sub-stream. If you know a particular sub-stream is very expensive, you could assign it a dedicated thread.
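A slightly fuller sketch of the same idea, for keys of any type, could look like this. The names StripedExecutor, stripeFor and shutdownAndAwait are made up for the example, and Math.floorMod is used here in place of the bit mask; both keep the index non-negative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: every task with the same key goes to the same single-threaded executor. */
final class StripedExecutor {
    private final ExecutorService[] stripes;

    StripedExecutor(int stripeCount) {
        stripes = new ExecutorService[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Index of the executor for a key; floorMod stays non-negative for negative hash codes. */
    int stripeFor(Object key) {
        return Math.floorMod(key.hashCode(), stripes.length);
    }

    /** Tasks submitted with equal keys run one after another, in submission order. */
    void submit(Object key, Runnable task) {
        stripes[stripeFor(key)].execute(task);
    }

    /** Stops accepting tasks and waits for queued ones; true if everything finished in time. */
    boolean shutdownAndAwait(long timeoutSeconds) {
        for (ExecutorService stripe : stripes) {
            stripe.shutdown();
        }
        try {
            for (ExecutorService stripe : stripes) {
                if (!stripe.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this, something like `striped.submit(item.getSubstreamID(), task)` would replace the `executor.submit(task)` call in the question's loop. The mapping is static, so two busy sub-streams can still land on the same thread.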

The solution I finally chose looks like this:

private final Executor[] streamThreads
        = new Executor[Runtime.getRuntime().availableProcessors()];
{
    for (int i = 0; i < streamThreads.length; ++i) {
        streamThreads[i] = Executors.newSingleThreadExecutor();
    }
}
private final ConcurrentHashMap<SubstreamId, Integer>
        threadById = new ConcurrentHashMap<>();

This code determines which executor to use:

    Message msg = in.readNext();
    SubstreamId msgSubstream = msg.getSubstreamId();
    int exe = threadById.computeIfAbsent(msgSubstream,
            id -> findBestExecutor());
    streamThreads[exe].execute(() -> {
        // processing goes here
    });

And the findBestExecutor() function is this:

private int findBestExecutor() {
    // Thread index -> substream count mapping:
    final int[] loads = new int[streamThreads.length];
    for (int thread : threadById.values()) {
        ++loads[thread];
    }
    // return the index of the minimum load
    return IntStream.range(0, streamThreads.length)
            .reduce((i, j) -> loads[i] <= loads[j] ? i : j)
            .orElse(0);
}

This is, of course, not very efficient, but note that this function is only called when a new sub-stream shows up (which happens several times every few hours, so it's not a big deal in my case). My real code looks a bit more complicated because I have a way to determine whether two sub-streams are likely to finish simultaneously, and if they are, I try to assign them to different threads in order to maintain even load after they finish. But since I never mentioned this detail in the question, I guess it doesn't belong in the answer either.
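For anyone who wants something self-contained to experiment with, here is a rough variant of the same idea. It is not my real code: the class name LeastLoadedDispatcher and the release() method are hypothetical, and instead of recounting the map on every new sub-stream it keeps a per-thread counter under a lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical dispatcher: pins each sub-stream to the worker serving the fewest sub-streams. */
final class LeastLoadedDispatcher<K> {
    private final ExecutorService[] workers;
    private final int[] loads;                                   // sub-streams per worker
    private final Map<K, Integer> workerById = new HashMap<>();  // guarded by "this"

    LeastLoadedDispatcher(int workerCount) {
        workers = new ExecutorService[workerCount];
        loads = new int[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Returns the worker index of a sub-stream, choosing the least loaded one on first sight. */
    synchronized int workerFor(K id) {
        Integer known = workerById.get(id);
        if (known != null) {
            return known;
        }
        int best = 0;
        for (int i = 1; i < loads.length; i++) {
            if (loads[i] < loads[best]) {
                best = i;                                        // ties go to the lowest index
            }
        }
        loads[best]++;
        workerById.put(id, best);
        return best;
    }

    void dispatch(K id, Runnable task) {
        workers[workerFor(id)].execute(task);
    }

    /** Forgets a finished sub-stream; call only after its last item has been dispatched. */
    synchronized void release(K id) {
        Integer worker = workerById.remove(id);
        if (worker != null) {
            loads[worker]--;
        }
    }

    /** Stops accepting tasks and waits for queued ones; true if everything finished in time. */
    boolean shutdownAndAwait(long timeoutSeconds) {
        for (ExecutorService worker : workers) {
            worker.shutdown();
        }
        try {
            for (ExecutorService worker : workers) {
                if (!worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Note that "load" here still only means the number of sub-streams assigned to a thread, not the actual CPU time they consume.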
