
Cannot start job with foreach-object in parallel

I have prepared this script to try to execute the same function multiple times in parallel, with different parameters:

$myparams = "A", "B","C", "D"

$doPlan = {
    Param([string] $myparam)
        echo "print $myparam"
        # MakeARestCall is a function calling a web service
        MakeARestCall -myparam $myparam
        echo "done"
}

$myparams | Foreach-Object { 
    Start-Job -ScriptBlock $doPlan  -ArgumentList $_
}

When I run it, the output is

Id     Name            PSJobTypeName   State         HasMoreData     Location             Command                  
--     ----            -------------   -----         -----------     --------             -------                  
79     Job79           BackgroundJob   Running       True            localhost            ...                      
81     Job81           BackgroundJob   Running       True            localhost            ...                      
83     Job83           BackgroundJob   Running       True            localhost            ...                      
85     Job85           BackgroundJob   Running       True            localhost            ...

but the actual call to the block (and therefore to the web service) is not made. If I remove the ForEach-Object and replace it with a normal sequential foreach block without Start-Job, the web services are correctly invoked. This means that my issue occurs only when I try to run the block in parallel.

What am I doing wrong?

Background jobs run in independent child processes that share virtually no state with the caller; specifically:

  • They see none of the functions and aliases defined in the calling session, nor manually imported modules, nor manually loaded .NET assemblies.

  • They do not load (dot-source) your $PROFILE file(s), so they won't see any definitions from there.

  • In PowerShell versions 6.x and below (which includes Windows PowerShell), not even the current location (directory) was inherited from the caller (it defaulted to [Environment]::GetFolderPath('MyDocuments')); this was fixed in v7.0.

  • The only aspect of the calling session's state they do see are copies of the calling process's environment variables.

  • To make variable values from the caller's session available to the background job, they must be referenced via the $using: scope modifier (see about_Remote_Variables).

    • Note that with values other than strings, primitive types (such as numbers), and a handful of other well-known types, this can involve a loss of type fidelity, because the values are marshaled across process boundaries using PowerShell's XML-based serialization and deserialization; this potential loss of type fidelity also affects output from the job (see this answer for background information).
    • Using the much faster and less resource-intensive thread jobs, via Start-ThreadJob, avoids this problem (although all the other limitations apply); Start-ThreadJob comes with PowerShell [Core] 6+ and can be installed on demand in Windows PowerShell (e.g., Install-Module -Scope CurrentUser ThreadJob); see this answer for background information.
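To see these limitations in action, here is a minimal sketch that probes what a background job can and cannot see. The helper function Get-Greeting and the environment variable DEMO_VALUE are hypothetical names, defined only for this demo:

```powershell
# Hypothetical names, defined only for this demo.
function Get-Greeting { 'hello' }
$env:DEMO_VALUE = 'from-env'
$localValue = 42

$result = Start-Job {
  [pscustomobject] @{
    # Functions from the caller's session are NOT visible in the job.
    FunctionVisible = [bool] (Get-Command Get-Greeting -ErrorAction Ignore)
    # Environment variables ARE inherited by the job's child process.
    EnvValue        = $env:DEMO_VALUE
    # Caller variables must be referenced explicitly via $using:.
    UsingValue      = $using:localValue
  }
} | Receive-Job -Wait -AutoRemove

$result | Format-List FunctionVisible, EnvValue, UsingValue
```

You should see FunctionVisible as False, while EnvValue and UsingValue carry the caller's values.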

Important: Whenever you use jobs for automation, such as in a script called from the Windows Task Scheduler or in the context of CI / CD, be sure to wait for all jobs to finish before exiting the script (via Receive-Job -Wait or Wait-Job), because when a script invoked via PowerShell's CLI exits, the PowerShell process as a whole exits, which kills any incomplete jobs.
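A minimal sketch of the wait-before-exit pattern; the script block is only a placeholder for real work, not part of the original code:

```powershell
# Start several jobs; the script block is a placeholder for real work.
$jobs = 1..3 | ForEach-Object {
  Start-Job -ArgumentList $_ { param($n) Start-Sleep -Seconds 1; "job $n done" }
}

# Block until ALL jobs have finished, collect their output, and remove them.
# Without this, a script run via the CLI (e.g., from Task Scheduler) could
# exit first, killing the still-running jobs.
$output = $jobs | Receive-Job -Wait -AutoRemove
$output
```

If you prefer to wait first and receive afterwards, $jobs | Wait-Job | Receive-Job followed by $jobs | Remove-Job has the same effect. Note that output from multiple jobs isn't guaranteed to arrive in start order.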

Therefore, unless command MakeARestCall:

  • happens to be a script file (MakeARestCall.ps1) or executable (MakeARestCall.exe) located in one of the directories listed in $env:Path, or

  • happens to be a function defined in a module that is auto-loaded,

your $doPlan script block will fail when executing in the job's process, given that neither a MakeARestCall function nor an alias of that name will be defined there.
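This is also why the jobs in the question appear to do nothing: the failure is only reported in the job's own error stream, which you don't see until you receive the job's output. A minimal sketch that reproduces this, using a stand-in MakeARestCall that merely echoes its argument (an assumption; the real function calls a web service):

```powershell
# Stand-in for the real function; it exists only in the CALLING session.
function MakeARestCall { param($MyParam) "MakeARestCall: $MyParam" }

# The job's process knows nothing about MakeARestCall, so the call fails there.
# -ErrorVariable collects the errors that Receive-Job relays from the job.
$output = Start-Job { MakeARestCall -MyParam 'foo' } |
  Receive-Job -Wait -AutoRemove -ErrorVariable jobErrors -ErrorAction SilentlyContinue

"Regular output items: $(@($output).Count); errors: $($jobErrors.Count)"
$jobErrors | ForEach-Object { $_.ToString() }
```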

Your comments suggest that MakeARestCall is indeed a function, so in order to make your code work, you'll have to (re)define the function as part of the script block executed by the job ($doPlan, in your case):

The following simplified example demonstrates the technique:

# Sample function that simply echoes its argument.
function MakeARestCall { param($MyParam) "MakeARestCall: $MyParam" }

'foo', 'bar' | ForEach-Object {
  # Note: If Start-ThreadJob is available, use it instead of Start-Job,
  #       for much better performance and resource efficiency.
  Start-Job -ArgumentList $_ { 

    Param([string] $myparam)

    # Redefine the function in the job's scope: $function:MakeARestCall
    # refers to MakeARestCall's function body; $using: retrieves that body
    # from the caller's scope, and assigning it to $function:MakeARestCall
    # defines the function in the job's scope.
    $function:MakeARestCall = $using:function:MakeARestCall

    # Call the recreated MakeARestCall function with the parameter.
    MakeARestCall -MyParam $myparam
  }
} | Receive-Job -Wait -AutoRemove

The above outputs MakeARestCall: foo and MakeARestCall: bar, demonstrating that the (redefined) MakeARestCall function was successfully called in the job's process.

An alternative approach:

Make MakeARestCall a script (MakeARestCall.ps1) and call that via its full path, to be safe.

E.g., if the script is in the same folder as the calling script, invoke it as
& $using:PSScriptRoot\MakeARestCall.ps1 -MyParam $myParam
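A minimal, self-contained sketch of this approach. Because $PSScriptRoot is only populated inside a script file, the sketch instead writes a stand-in MakeARestCall.ps1 (hypothetical content that merely echoes its argument) to a temporary folder and passes its full path to the jobs via $using:. It assumes your execution policy permits running local scripts:

```powershell
# Hypothetical demo setup: write a stand-in MakeARestCall.ps1 to a temp folder.
$dir = Join-Path ([IO.Path]::GetTempPath()) 'MakeARestCallDemo'
$null = New-Item -ItemType Directory -Force -Path $dir
$scriptPath = Join-Path $dir 'MakeARestCall.ps1'
Set-Content -LiteralPath $scriptPath -Value 'param($MyParam) "MakeARestCall: $MyParam"'

$output = 'foo', 'bar' | ForEach-Object {
  Start-Job -ArgumentList $_ {
    param([string] $myparam)
    # Copy the caller's full script path into the job, then invoke the script.
    $path = $using:scriptPath
    & $path -MyParam $myparam
  }
} | Receive-Job -Wait -AutoRemove

$output
```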

Of course, if you either don't mind duplicating the function definition or only need it in the context of the background jobs, you can simply embed the function definition directly in the script block.
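For instance, embedding the definition directly could look like the following sketch (again with a stand-in MakeARestCall body that only echoes its argument):

```powershell
$output = 'foo', 'bar' | ForEach-Object {
  Start-Job -ArgumentList $_ {
    param([string] $myparam)

    # Defined inside the job's script block, so it exists in the job's process.
    # (Stand-in body; the real function would call the web service.)
    function MakeARestCall { param($MyParam) "MakeARestCall: $MyParam" }

    MakeARestCall -MyParam $myparam
  }
} | Receive-Job -Wait -AutoRemove

$output
```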


Simpler and faster PowerShell [Core] 7+ alternative, using ForEach-Object -Parallel:

The -Parallel parameter, introduced to ForEach-Object in PowerShell 7, runs the given script block in a separate runspace (thread) for each pipeline input object.

In essence, it is a simpler, pipeline-friendly way to use thread jobs (Start-ThreadJob), with the same performance and resource-usage advantages over background jobs, and with the added simplicity of directly reporting the threads' output.
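To illustrate the equivalence, the following sketch does the same trivial placeholder work once with Start-ThreadJob and once with ForEach-Object -Parallel; it assumes PowerShell 7+ (where both are available out of the box):

```powershell
# Thread jobs: explicit job objects that must be received (and removed).
$viaThreadJobs = 'foo', 'bar' | ForEach-Object {
  Start-ThreadJob -ArgumentList $_ -ScriptBlock { param($s) "processed: $s" }
} | Receive-Job -Wait -AutoRemove

# ForEach-Object -Parallel: the same work, with the threads' output
# emitted directly and no job objects to manage.
$viaParallel = 'foo', 'bar' | ForEach-Object -Parallel { "processed: $_" }

$viaThreadJobs
$viaParallel
```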

However, the lack of state sharing discussed with respect to background jobs above also applies to thread jobs (even though they run in the same process, they do so in isolated PowerShell runspaces), so here too the MakeARestCall function must be (re)defined (or embedded) inside the script block [1].

# Sample function that simply echoes its argument.
function MakeARestCall { param($MyParam) "MakeARestCall: $MyParam" }

# Get the function definition (body) *as a string*.
# This is necessary, because the ForEach-Object -Parallel explicitly
# disallows referencing *script block* values via $using:
$funcDef = $function:MakeARestCall.ToString()

'foo', 'bar' | ForEach-Object -Parallel {
  $function:MakeARestCall = $using:funcDef
  MakeARestCall -MyParam $_
}

Syntax pitfall: -Parallel is not a switch (flag-type parameter), but takes the script block to run in parallel as its argument; in other words: -Parallel must be placed directly before the script block.
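A short sketch of correct and incorrect placement, using the -ThrottleLimit parameter (which caps how many threads run at once) as the additional parameter:

```powershell
# Correct: -Parallel is immediately followed by its script block;
# another parameter, such as -ThrottleLimit, goes before (or after) that pair.
$results = 1..4 | ForEach-Object -ThrottleLimit 2 -Parallel { $_ * 10 }
$results | Sort-Object   # output order isn't guaranteed, hence the sort

# Incorrect: here -Parallel is separated from its script block, so it is left
# without an argument and the command fails with a parameter-binding error.
# 1..4 | ForEach-Object -Parallel -ThrottleLimit 2 { $_ * 10 }
```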

The above directly emits output from the parallel threads as it arrives, but note that this means the output is not guaranteed to arrive in input order; that is, a thread created later may situationally return its output before an earlier thread.

A simple example:

PS> 3, 1 | ForEach-Object -Parallel { Start-Sleep $_; "$_" }
1  # !! *Second* input's thread produced output *first*.
3

In order to show the outputs in input order (which invariably requires waiting for all threads to finish before showing output), you can add the -AsJob switch:

  • Instead of direct output, a single, lightweight (thread-based) job object of type PSTaskJob is then returned, comprising multiple child jobs, one for each parallel runspace (thread); you can manage it with the usual *-Job cmdlets and access the individual child jobs via the .ChildJobs property.

By waiting for the overall job to complete, receiving its outputs via Receive-Job then shows them in input order:

PS> 3, 1 | ForEach-Object -AsJob -Parallel { Start-Sleep $_; "$_" } |
      Receive-Job -Wait -AutoRemove
3  # OK, first input's output shown first, due to having waited.
1
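To work with the child jobs mentioned above individually, something along these lines should work (a sketch; the sleep durations are arbitrary):

```powershell
# -AsJob returns a single job object; each input object gets a child job (thread).
$job = 2, 1 | ForEach-Object -AsJob -Parallel { Start-Sleep $_; "slept $_ s" }
$job.GetType().Name

# Wait for completion, then inspect and receive each child job individually.
$null = $job | Wait-Job
$childCount = $job.ChildJobs.Count   # one child job per input object
$perChild = $job.ChildJobs | ForEach-Object { $_ | Receive-Job }
$childCount
$perChild

# Clean up the job.
$job | Remove-Job
```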

[1] Alternatively, redefine your MakeARestCall function as a filter function (Filter) that implicitly operates on pipeline input via $_, so you can use its definition as the ForEach-Object -Parallel script block as-is:

# Sample *filter* function that echoes the pipeline input it is given.
Filter MakeARestCall { "MakeARestCall: $_" }

# Pass the filter function's definition (which is a script block)
# directly to ForEach-Object -Parallel
'foo', 'bar' | ForEach-Object -Parallel $function:MakeARestCall
