
Microsoft Orleans grain communication performance

I am working on a workflow engine using Microsoft Orleans as the base, as it offers a number of useful features such as automatically distributing the work and handling failover.

I have three types of grains:

  • Workflow - Holds information about the workflow and the order in which work blocks should be executed
  • Work Block - The parts that actually do the work
  • Execution - A single execution of the workflow

My problem is that when running a large number of concurrent executions, i.e. > 1000, the performance really suffers. I have done a bit of profiling and narrowed this down to the communication that happens between the grains. Is there any way I can improve this further?

Here is the outline of my code and how the grains interact:

The execution grain sits in a loop, getting the next work block from the workflow and then calling execute on the workblock. It is this constant calling between grains that is causing the execution time for one of my test workflows to go from 10 seconds when running a single execution to around 5 minutes when running over 1000. Can this be improved or should I re-architect the solution to remove the grain communication?

[StorageProvider(ProviderName = "WorkflowStore")]
[Reentrant]
[StatelessWorker]
public class Workflow : Grain<WorkflowState>, IWorkflow
{
    public Task<BlockRef> GetNext(Guid currentBlockId, string connectionName)
    {
         //Lookup the next work block
    }
}

[Reentrant]
[StatelessWorker]
public class WorkBlock : Grain<WorkBlockState>, IWorkBlock
{
    public Task<string> Execute(IExecution execution)
    {
         //Do some work
    }
}


[StorageProvider(ProviderName = "ExecutionStore")]
public class Execution : Grain<ExecutionState>, IExecution, IRemindable
{
    private async Task ExecuteNext(bool skipBreakpointCheck = false)
    {            
        if (State.NextBlock == null)
        {
            await FindAndSetNext(null, null);
        }

        ...

        var outputConnection = await workblock.Execute();

        if (!string.IsNullOrEmpty(outputConnection))
        {
            await FindAndSetNext(State.NextBlock.Id, outputConnection);
            ExecuteNext().Ignore();
        }
    }

    private async Task FindAndSetNext(Guid? currentId, string outputConnection)
    {
        var next = currentId.HasValue ? await _flow.GetNextBlock(currentId.Value, outputConnection) : await _flow.GetNextBlock();
        ...
    }
}

A couple of issues here:

1) It does not seem right that Workflow is a StatelessWorker and also uses a StorageProvider. StorageProvider means it has state that it cares to persist; StatelessWorker means it does not have any state. Use regular (non-StatelessWorker) grains instead.

2) Let's look at the modelling top down: Workflow is just the data about the workflow and the code to execute, and WorkBlock is one block of the multi-block workflow (one step of the multi-step workflow), correct? In that case neither of them should be a grain. They are just state. Execution is the only one that needs to be a grain. Execution receives the Workflow, the Workflow encodes in its data what the next block is, and Execution just executes the block.

3) From a scalability perspective you just want a lot of Execution grains. If a Workflow has an id, you can use one Execution grain per Workflow id. If you want to execute the same Workflow (with the same id) multiple times in parallel, it depends. If there is not too much parallelism, one Execution grain for all of them may be enough. If not, you can use a pool of X Execution grains (the id of each Execution grain will be "WorkflowId-NumberBetween0AndX").
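
The pooled id scheme from point 3 could be sketched roughly as follows. This is only an illustration: `ExecutionPool`, `KeyFor`, `slot` and `poolSize` are hypothetical names that do not appear in the question, and the sketch assumes the Execution grain is addressed by a string key.

```csharp
using System;

public static class ExecutionPool
{
    // Hypothetical helper: maps an arbitrary slot number onto one of
    // poolSize grain keys of the form "WorkflowId-N", with N in [0, poolSize).
    public static string KeyFor(Guid workflowId, int slot, int poolSize)
    {
        if (poolSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(poolSize));
        }

        // C# '%' keeps the sign of the dividend, so shift negative results up.
        int index = slot % poolSize;
        if (index < 0)
        {
            index += poolSize;
        }

        return $"{workflowId}-{index}";
    }
}
```

If IExecution were declared as a string-keyed grain interface, the resulting key could then be passed to GrainFactory.GetGrain<IExecution>(key). How the slot is chosen (round robin, random, a hash of the request) is left open here, as the answer does not specify it.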

In my opinion these functions shouldn't be standalone grains; aggregating them into one grain would eliminate the expensive inter-grain communication.
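
A minimal sketch of what that aggregation might look like, with the workflow and its blocks held as plain in-memory objects rather than grains. All type and member names here (`WorkBlockDefinition`, `WorkflowDefinition`, `ExecutionLoop`, `Connect`, `RunAsync`) are assumptions made for illustration, and the sketch leaves out state persistence, breakpoints and reminders from the original Execution grain.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical plain-data work block: not a grain, just an id plus the work to run.
public sealed class WorkBlockDefinition
{
    public WorkBlockDefinition(Guid id, Func<Task<string>> execute)
    {
        Id = id;
        Execute = execute;
    }

    public Guid Id { get; }

    // Runs the block and returns the output connection name (null or empty ends the run).
    public Func<Task<string>> Execute { get; }
}

// Hypothetical plain-data workflow: the routing table lives in memory.
public sealed class WorkflowDefinition
{
    private readonly Dictionary<(Guid, string), WorkBlockDefinition> _routes =
        new Dictionary<(Guid, string), WorkBlockDefinition>();

    public WorkflowDefinition(WorkBlockDefinition first)
    {
        First = first;
    }

    public WorkBlockDefinition First { get; }

    public void Connect(WorkBlockDefinition from, string connectionName, WorkBlockDefinition to)
    {
        _routes[(from.Id, connectionName)] = to;
    }

    // Local lookup; returns null when no block is wired to that connection.
    public WorkBlockDefinition GetNext(Guid currentBlockId, string connectionName)
    {
        return _routes.TryGetValue((currentBlockId, connectionName), out var next) ? next : null;
    }
}

public static class ExecutionLoop
{
    // The loop an Execution grain could run internally; returns the number of blocks executed.
    public static async Task<int> RunAsync(WorkflowDefinition flow)
    {
        var executed = 0;
        var block = flow.First;
        while (block != null)
        {
            var outputConnection = await block.Execute();
            executed++;
            block = string.IsNullOrEmpty(outputConnection)
                ? null
                : flow.GetNext(block.Id, outputConnection);
        }
        return executed;
    }
}
```

With this shape, finding the next block is a dictionary lookup inside the Execution grain instead of a message to another grain, so the only grain calls left are the ones that start or query an execution.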

If you rename Work Block to Activity and Execution to WorkflowInstance, your concept becomes very similar to Microsoft's Workflow Foundation. I've started a project on GitHub (Orleans.Activities) to run WF4 workflows on Orleans. It is not production ready and has had no performance tests, but it at least works. Maybe you should give it a try.

