
Dynamically handling longer running concurrent jobs in F#

I'm struggling with the right approach to handle longer running requests/jobs in F#.

Requirement:

  • A job consists of multiple steps (which need to be performed sequentially).
  • A job can take several minutes, let's say up to 10 minutes.
  • A step may involve IO operations and waiting time, e.g. waiting until files created by the step have been processed by other applications and returned.
  • It is possible that a step fails or that a state is reached where the job should end early.
  • It should be possible to process multiple jobs in parallel.
  • Jobs are started / added by user request.
  • I want to be able to track the status of the jobs (current step, result of previous steps) upon request.

Current solution:

Currently I use a FileSystemWatcher to monitor an "inbox" with job requests. A request results in a job being added to a list which is managed by an agent (MailboxProcessor). As soon as the job is added to the list, a new thread is started (let t = new Thread(…) -> t.Start()) and a reference to the thread is kept with the job parameters (and in the list). In the thread, all the steps are executed sequentially. This way I can keep track of the job status (check if the thread is still alive or not) and have the jobs be processed concurrently.

However, this seems not to allow me to get information about the steps within a job / the thread.

Desired Solution:

In addition, I want to switch from a FileSystemWatcher to a REST API based on Suave. The problem I'm facing (parallel job execution, gathering information about the steps, communicating status upon request) seems to be the same in both worlds (requests triggered by FileSystemWatcher events or by a REST API), so I'll use the REST approach to explain my desired functionality:

I want to be able to start jobs (POST) (with response: job accepted, job ID = xyz), check the status of the jobs (GET with job id, response containing the step results and the current step) and if processing is done get the result of the job (GET with job id).

At least this setup seems convenient and would fulfill the current needs.

Can anyone help me by pointing me to the right tools / approach to handle such a requirement? Am I totally off the right direction?

I hope the explanation can be understood by others than me as well.

Thanks and best regards cil

If I was tackling a set of requirements like this then the tools I'd be looking at are:

The first is a .NET Core worker service. There is a C# template for it: dotnet new worker -lang c# -o CSharpService creates a long-running C# program. The same kind of long-running service can be created in F#.

The following commands create the project:

dotnet new console -lang F# -o FsharpService
cd FsharpService
dotnet add package Microsoft.Extensions.Hosting
dotnet add package Microsoft.Extensions.Hosting.WindowsServices
dotnet add package System.Net.NameResolution

And then replace Program.fs with:

open System
open System.Threading.Tasks
open Microsoft.Extensions.DependencyInjection
open Microsoft.Extensions.Hosting
open Microsoft.Extensions.Logging

type Worker(logger : ILogger<Worker>) =
    inherit BackgroundService()
    let _logger = logger
    override bs.ExecuteAsync stoppingToken =
        let f : Async<unit> = async {
            while not stoppingToken.IsCancellationRequested do
                _logger.LogInformation("Worker running at: {time}", DateTime.Now)
                do! Async.Sleep(1000)
        }
        Async.StartAsTask f :> Task

let CreateHostBuilder argv : IHostBuilder =
    let builder = Host.CreateDefaultBuilder(argv)
    builder.UseWindowsService()
        .ConfigureServices(fun hostContext services ->
            services.AddHostedService<Worker>() |> ignore<IServiceCollection>)

[<EntryPoint>]
let main argv =
    let hostBuilder = CreateHostBuilder argv
    hostBuilder.Build().Run()
    0 // return an integer exit code

And finally, if you are on Windows, build, register and start the service:

dotnet publish -r win-x64 -c Release /p:PublishSingleFile=true /p:Trimmed=true -o "./published"
sc create FsharpService binPath= "%cd%\published\FsharpService.exe"
services.msc

There are more details on this on my blog.
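
Inside a worker like this, each job is a sequence of steps that may fail or end early. As a rough illustration of how that could be modelled with async workflows, here is a minimal sketch. The names StepOutcome, Step and runSteps are made up for this example, and Async.Sleep only stands in for real IO or for waiting on another application.

```fsharp
// Hypothetical types for illustration only.
type StepOutcome =
    | Continue of string   // step succeeded, carry on with the next one
    | Finished of string   // job should end early, without an error
    | Failed of string     // step failed, stop the job

type Step = { Name : string; Run : unit -> Async<StepOutcome> }

/// Runs the steps in order, reports every outcome, and stops at the
/// first Finished or Failed. Returns the outcomes collected so far.
let runSteps (report : string -> StepOutcome -> unit) (steps : Step list) =
    let rec loop acc remaining = async {
        match remaining with
        | [] -> return List.rev acc
        | step :: rest ->
            let! outcome = step.Run ()
            report step.Name outcome
            let acc = (step.Name, outcome) :: acc
            match outcome with
            | Continue _ -> return! loop acc rest
            | Finished _ | Failed _ -> return List.rev acc
    }
    loop [] steps

// Three placeholder steps; the second one fails.
let exportFiles () = async { return Continue "files written" }
let waitForOtherApp () = async {
    do! Async.Sleep 100    // stands in for waiting on another application
    return Failed "timeout" }
let importResults () = async { return Continue "never reached" }

let demoSteps =
    [ { Name = "export"; Run = exportFiles }
      { Name = "wait"; Run = waitForOtherApp }
      { Name = "import"; Run = importResults } ]

let results =
    runSteps (fun name outcome -> printfn "%s: %A" name outcome) demoSteps
    |> Async.RunSynchronously
```

The report callback is the hook where the current step and its result could be pushed to whatever keeps the job status, so that a status request can be answered while the job is still running.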

The other technology I'd look at is the Task Parallel Library, and in particular its Dataflow component (System.Threading.Tasks.Dataflow). This allows you to build a workflow, some or all of which can run in parallel, with the concurrency and message passing between the blocks taken care of for you. It's straightforward to call from F#, and the model where each block has an input type and (in some cases) an output type lends itself to the F# "designing with types" approach.

Here is a simple example I put together when first looking at TPL and F#. NB: I haven't had a chance to run this and confirm that it still works. You will also need to amend the #r command to work on your machine if you try to use it.

#r @"System.Threading.Tasks.Dataflow.dll"

open System
open System.IO
open System.Threading.Tasks.Dataflow

let buildPropagateLinkOption () =
    DataflowLinkOptions(PropagateCompletion = true)

let buildParallelExecutionOption noThreads =
    ExecutionDataflowBlockOptions(MaxDegreeOfParallelism = noThreads)

type TPLRequest = {
    path:string ;
    filter:string ;
}

type TPLFile = {
    fileName : string ;
}

type TPLResponse = {
    fileName : string ;
    size : int64 ;
}

let b1Impl (inReq:TPLRequest) : TPLFile seq = 
    printfn "Directory %s %A" inReq.path System.Threading.Thread.CurrentThread.ManagedThreadId
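    // TPLFile and TPLResponse both have a fileName field, so name the record type explicitly
    Directory.EnumerateFiles(inReq.path, inReq.filter) |> Seq.map(fun x -> { TPLFile.fileName = x })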

let b2Impl (inReq:TPLFile) : TPLResponse =
    let fInfo = FileInfo(inReq.fileName)
    printfn "File %s %A" inReq.fileName System.Threading.Thread.CurrentThread.ManagedThreadId
    {fileName = inReq.fileName; size = fInfo.Length }

let b3Impl (inReq:TPLResponse) =
    printfn "%s %d %A" inReq.fileName inReq.size System.Threading.Thread.CurrentThread.ManagedThreadId

let buildFlow () =
    let parallelExecutionOption = buildParallelExecutionOption 4
    let b1 = new TransformManyBlock<TPLRequest,TPLFile>((fun x -> b1Impl x),parallelExecutionOption)
    let b2 = new TransformBlock<TPLFile,TPLResponse>((fun x -> b2Impl x),parallelExecutionOption)
    let b3 = new ActionBlock<TPLResponse>((fun x ->b3Impl x),parallelExecutionOption)
    let propagateLinkOption = buildPropagateLinkOption ()
    b1.LinkTo(b2,propagateLinkOption) |> ignore<IDisposable>
    b2.LinkTo(b3,propagateLinkOption) |> ignore<IDisposable>
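    // Return the first block (to post to) and the last block (to wait on)
    b1, b3

let runFlow () =
    let head, tail = buildFlow ()
    head.Post {path="C:\\temp"; filter = "*.txt"} |> ignore<bool>
    head.Post {path="C:\\temp"; filter = "*.zip"} |> ignore<bool>
    head.Complete()
    // Completion propagates along the links, so wait for the last block to finish
    tail.Completion.Wait()
    ()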

runFlow ()
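
To connect this back to the status-tracking requirement in the question, here is a minimal sketch of an agent (MailboxProcessor) that records the current step and the step results per job, so that a status request can be answered at any time. All names here (JobStatus, JobState, TrackerMsg, tracker, startJob) are hypothetical, the steps are placeholders, and a real service would also need error handling and cleanup of finished jobs.

```fsharp
open System

type JobStatus =
    | Queued
    | Running of string        // name of the current step
    | Completed
    | JobFailed of string      // reason the job stopped

type JobState = { Status : JobStatus; StepResults : (string * string) list }

type TrackerMsg =
    | SetStatus of Guid * JobStatus
    | AddStepResult of Guid * string * string   // job id, step name, result
    | Query of Guid * AsyncReplyChannel<JobState option>

// The agent owns the job table; every read and write goes through its mailbox.
let tracker =
    MailboxProcessor<TrackerMsg>.Start(fun inbox ->
        let rec loop (jobs : Map<Guid, JobState>) = async {
            let! msg = inbox.Receive ()
            match msg with
            | SetStatus (jobId, status) ->
                let state =
                    match Map.tryFind jobId jobs with
                    | Some s -> { s with Status = status }
                    | None -> { Status = status; StepResults = [] }
                return! loop (Map.add jobId state jobs)
            | AddStepResult (jobId, step, result) ->
                let updated =
                    match Map.tryFind jobId jobs with
                    | Some s -> Map.add jobId { s with StepResults = s.StepResults @ [ (step, result) ] } jobs
                    | None -> jobs
                return! loop updated
            | Query (jobId, reply) ->
                reply.Reply (Map.tryFind jobId jobs)
                return! loop jobs
        }
        loop Map.empty)

/// Registers the job, runs its steps sequentially in the background and returns the job id.
let startJob (steps : (string * (unit -> Async<Result<string, string>>)) list) =
    let jobId = Guid.NewGuid ()
    tracker.Post (SetStatus (jobId, Queued))
    let rec run remaining = async {
        match remaining with
        | [] -> tracker.Post (SetStatus (jobId, Completed))
        | (name, step) :: rest ->
            tracker.Post (SetStatus (jobId, Running name))
            let! outcome = step ()
            match outcome with
            | Ok result ->
                tracker.Post (AddStepResult (jobId, name, result))
                return! run rest
            | Error reason ->
                tracker.Post (SetStatus (jobId, JobFailed reason))
    }
    Async.Start (run steps)
    jobId

// Placeholder steps standing in for real work.
let exportStep () : Async<Result<string, string>> = async { return Ok "files written" }
let importStep () : Async<Result<string, string>> = async {
    do! Async.Sleep 200
    return Ok "files imported" }

let jobId = startJob [ ("export", exportStep); ("import", importStep) ]

// What a status request would do: ask the agent for the job's state.
printfn "%A" (tracker.PostAndReply (fun ch -> Query (jobId, ch)))
```

In a REST setup, the POST handler would call something like startJob and return the job id, and the GET handler would send a Query for that id. Because jobs run as async workflows rather than dedicated threads, many of them can be in flight at once while the agent serialises access to the shared status table.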


 