How to dynamically add items to a PowerShell ArrayList and process them recursively using a runspace pool?

I have a for loop that iterates through an ArrayList and, while doing so, adds more items to the list and processes those as well (iteratively). I am trying to convert this function to run concurrently using a RunspacePool.

Here is the normal code without runspaces:

$array = [System.Collections.ArrayList]@(1, 2, 3, 4, 5)
Write-Host "Number of items in array before loop: $($array.Count)"
for ($i = 0; $i -lt $array.Count; $i++) {
    Write-Host "Counter: $i`tArray: $array"
    if ($array[$i] -in @(1, 2, 3, 4, 5)) {
        $array.Add($array[$i] + 3) | Out-Null
    }
}
Write-Host "Array: $array"
Write-Host "Number of items in array after loop: $($array.Count)"

Output:

Number of items in array before loop: 5
Counter: 0      Array: 1 2 3 4 5
Counter: 1      Array: 1 2 3 4 5 4
Counter: 2      Array: 1 2 3 4 5 4 5
Counter: 3      Array: 1 2 3 4 5 4 5 6
Counter: 4      Array: 1 2 3 4 5 4 5 6 7
Counter: 5      Array: 1 2 3 4 5 4 5 6 7 8
Counter: 6      Array: 1 2 3 4 5 4 5 6 7 8 7
Counter: 7      Array: 1 2 3 4 5 4 5 6 7 8 7 8
Counter: 8      Array: 1 2 3 4 5 4 5 6 7 8 7 8
Counter: 9      Array: 1 2 3 4 5 4 5 6 7 8 7 8
Counter: 10     Array: 1 2 3 4 5 4 5 6 7 8 7 8
Counter: 11     Array: 1 2 3 4 5 4 5 6 7 8 7 8
Array: 1 2 3 4 5 4 5 6 7 8 7 8
Number of items in array after loop: 12

Here is the runspace version that I am trying to implement:

$pool = [RunspaceFactory]::CreateRunspacePool(1, 10)
$pool.Open()
$runspaces = @()

$scriptblock = {
    Param ($i, $array)
    # Start-Sleep 1 # <------ Output varies significantly if this is enabled
    Write-Output "$i value: $array"
    if ($i -in @(1, 2, 3, 4, 5)) {
        $array.Add($i + 3) | Out-Null
    }
}

$array = [System.Collections.ArrayList]::Synchronized(([System.Collections.ArrayList]$(1, 2, 3, 4, 5)))
Write-Host "Number of items in array before loop: $($array.Count)"
for ($i = 0; $i -lt $array.Count; $i++) {
    $runspace = [PowerShell]::Create().AddScript($scriptblock).AddArgument($array[$i]).AddArgument($array)
    $runspace.RunspacePool = $pool
    $runspaces += [PSCustomObject]@{ Pipe = $runspace; Status = $runspace.BeginInvoke() }
}

while ($runspaces.Status -ne $null) {
    $completed = $runspaces | Where-Object { $_.Status.IsCompleted -eq $true }
    foreach ($runspace in $completed) {
        $runspace.Pipe.EndInvoke($runspace.Status)
        $runspace.Status = $null
    }
}
Write-Host "array: $array"
Write-Host "Number of items in array after loop: $($array.Count)"
$pool.Close()
$pool.Dispose()

Output without Start-Sleep is as expected:

Number of items in array before loop: 5
Current value: 1        Array: 1 2 3 4 5
Current value: 2        Array: 1 2 3 4 5 4
Current value: 3        Array: 1 2 3 4 5 4 5
Current value: 4        Array: 1 2 3 4 5 4 5 6
Current value: 5        Array: 1 2 3 4 5 4 5 6 7
Current value: 4        Array: 1 2 3 4 5 4 5 6 7 8
Current value: 5        Array: 1 2 3 4 5 4 5 6 7 8 7
Current value: 6        Array: 1 2 3 4 5 4 5 6 7 8 7
Current value: 7        Array: 1 2 3 4 5 4 5 6 7 8 7
Current value: 8        Array: 1 2 3 4 5 4 5 6 7 8 7
Current value: 7        Array: 1 2 3 4 5 4 5 6 7 8 7 8
Current value: 8        Array: 1 2 3 4 5 4 5 6 7 8 7 8
Array: 1 2 3 4 5 4 5 6 7 8 7 8
Number of items in array after loop: 12

Output with Start-Sleep enabled:

Number of items in array before loop: 5
Current value: 1        Array: 1 2 3 4 5
Current value: 2        Array: 1 2 3 4 5 4
Current value: 3        Array: 1 2 3 4 5 4 5
Current value: 4        Array: 1 2 3 4 5 4 5 6
Current value: 5        Array: 1 2 3 4 5 4 5 6 7
Array: 1 2 3 4 5 4 5 6 7 8
Number of items in array after loop: 10

I understand that this happens because the for loop exits before the sleep completes, so only the first 5 items are submitted to the runspace pool.

Is there a way to add more items to the ArrayList dynamically and still process them concurrently using runspaces?

The core of your "working" behaviour is that PowerShell was running your "non-sleep" scriptblocks faster than it could create them in the for loop, so the loop was seeing the new items added by previous iterations before it reached the end of the array. As a result it had to process all of the items before it exited and moved on to the while loop.

When you added a Start-Sleep it shifted the balance: it now took longer to run the scriptblocks than to create them, so the for loop reached the end of the array before the earliest iterations had added their new items.
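A deterministic, single-threaded sketch of that timing effect (not the runspace code itself): the additions are held back in a hypothetical $deferred list until the loop has finished, which mimics scriptblocks that are still sleeping while the for loop runs. The names $list, $deferred and $visited are made up for illustration.

```powershell
$list     = [System.Collections.ArrayList]@(1, 2, 3, 4, 5)
$deferred = [System.Collections.Generic.List[int]]::new()
$visited  = 0

# The loop condition re-reads $list.Count on every pass, but nothing
# has been added yet, so it stops after the 5 seed items.
for ($i = 0; $i -lt $list.Count; $i++) {
    $visited++
    $deferred.Add($list[$i] + 3)   # the "slow" work: result arrives later
}

# The late results land only after the loop has already exited.
foreach ($value in $deferred) {
    $null = $list.Add($value)
}

"visited: $visited, final count: $($list.Count)"   # visited: 5, final count: 10
```

The final count of 10 matches the "with sleep" output in the question: the 5 late items are in the list, but nothing ever visited them.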

The following script fixes this by combining your for and while loops into a single loop that repeatedly alternates between (i) creating new threads and (ii) checking whether they've finished, and only exits when all the work is done.

However, multi-threading is hard, so it's best to assume I've made mistakes somewhere and test properly before you release it to your live workflow...

$scriptblock = {
    Param ($i, $array)
    # random sleep to simulate variable-length workloads. this is
    # more likely to flush out error conditions than a fixed sleep 
    # period as threads will finish out-of-turn more often
    Start-Sleep (Get-Random -Minimum 1 -Maximum 10)
    Write-Output "$i value: $array"
    if ($i -in @(1, 2, 3, 4, 5)) {
        $array.Add($i + 3) | Out-Null
    }
}

$pool = [RunspaceFactory]::CreateRunspacePool(1, 10)
$pool.Open()

# note - your "$runspaces" variable is misleading as you're creating 
# "PowerShell" objects, and a "Runspace" is a different thing entirely,
# so I've called it $instances instead
# see https://docs.microsoft.com/en-us/dotnet/api/system.management.automation.powershell?view=powershellsdk-7.0.0
#  vs https://docs.microsoft.com/en-us/dotnet/api/system.management.automation.runspaces.runspace?view=powershellsdk-7.0.0
$instances = @()

$array = [System.Collections.ArrayList]::Synchronized(([System.Collections.ArrayList]$(1, 2, 3, 4, 5)))
Write-Host "Number of items in array before loop: $($array.Count)"

while( $true )
{

    # start PowerShell instances for any items in $array that don't already have one.
    # on the first iteration this will seed the initial instances, and in
    # subsequent iterations it will create new instances for items added to
    # $array since the last iteration.
    while( $instances.Length -lt $array.Count )
    {
        $instance = [PowerShell]::Create().AddScript($scriptblock).AddArgument($array[$instances.Length]).AddArgument($array);
        $instance.RunspacePool = $pool
        $instances += [PSCustomObject]@{ Value = $instance; Status = $instance.BeginInvoke() }
    }

    # watch out because there's a race condition here. it'll need very unlucky 
    # timing, *but* an instance might have added an item to $array just after
    # the while loop finished, but before the next line runs, so there *could* 
    # be an item in $array that hasn't had an instance created for it even
    # if all the current instances have completed

    # is there any more work to do? (try to mitigate the race condition
    # by checking again for any items in $array that don't have an instance
    # created for them)
    $active = @( $instances | Where-Object { -not $_.Status.IsCompleted } )
    if( ($active.Length -eq 0) -and ($instances.Length -eq $array.Count) )
    {
        # instances have been created for every item in $array,
        # *and* they've run to completion, so there's no more work to do
        break;
    }

    # if there are incomplete instances, wait for a short time to let them run
    # (this is to avoid a "busy wait" - https://en.wikipedia.org/wiki/Busy_waiting)
    Start-Sleep -Milliseconds 250;

}

# all the instances have completed, so end them
foreach ($instance in $instances)
{
    $instance.Value.EndInvoke($instance.Status);
}

Write-Host "array: $array"
Write-Host "Number of items in array after loop: $($array.Count)"
$pool.Close()
$pool.Dispose()

Example output:

Number of items in array before loop: 5
1 value: 1 2 3 4 5 6 5 7
2 value: 1 2 3 4 5 6
3 value: 1 2 3 4 5
4 value: 1 2 3 4 5 6 5
5 value: 1 2 3 4 5 6 5 7 4
6 value: 1 2 3 4 5 6 5 7
5 value: 1 2 3 4 5 6 5 7 4 8
7 value: 1 2 3 4 5 6 5 7
4 value: 1 2 3 4 5 6 5 7 4 8 8
8 value: 1 2 3 4 5 6 5 7 4 8 8
8 value: 1 2 3 4 5 6 5 7 4 8 8
7 value: 1 2 3 4 5 6 5 7 4 8 8 7

Note that the order of items in the array will vary depending on the length of the random sleeps in the $scriptblock.

There are probably additional improvements that could be made, but this at least seems to work...

This answer attempts to provide a better solution to the producer-consumer problem using a BlockingCollection<T>, which provides an implementation of the producer/consumer pattern.

To clarify the issue with my previous answer, as OP noted in a comment:

If the starting count of the queue (say, 2) is less than the max number of threads (say 5), then only that many (2, in this case) threads remain active no matter how many items are added to the queue later. Only the starting number of threads process the rest of the items in the queue. In my case, the starting count is usually one. Then I make an irm (alias for Invoke-RestMethod) request, and add some 10~20 items. These are processed by only the first thread. The other threads go to Completed state right at the start. Is there a solution to this?

For this example, the runspaces use the TryTake(T, TimeSpan) method overload, which blocks the thread and waits up to the specified timeout. On each loop iteration the runspaces also update a synchronized Hashtable with their TryTake(..) result.
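As a minimal single-threaded sketch of how that overload behaves (the $demo collection and the 200 ms timeout are assumptions chosen for illustration, not values from the script below):

```powershell
# Hypothetical demo collection; not part of the answer's script.
$demo = [System.Collections.Concurrent.BlockingCollection[int]]::new()
$demo.Add(42)

$item    = 0
$timeout = [timespan]::FromMilliseconds(200)

# An item is available, so TryTake returns $true without waiting.
$first      = $demo.TryTake([ref] $item, $timeout)
$firstValue = $item

# The collection is now empty: TryTake blocks for up to $timeout,
# then gives up and returns $false instead of throwing.
$second = $demo.TryTake([ref] $item, $timeout)

"first: $first ($firstValue), second: $second"   # first: True (42), second: False
$demo.Dispose()
```

That $true / $false return value is what each runspace writes into the shared Hashtable.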

The main thread uses the synchronized Hashtable to wait until all runspaces have reported a $false status; when this happens, an exit signal is sent to the threads with .CompleteAdding().
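A small sketch of why .CompleteAdding() works as an exit signal, again with a hypothetical $demo collection: IsCompleted only becomes $true once the collection is both marked complete and empty, and any later Add call fails.

```powershell
# Hypothetical demo collection; not part of the answer's script.
$demo = [System.Collections.Concurrent.BlockingCollection[int]]::new()
$demo.Add(1)
$demo.CompleteAdding()

# Marked complete, but one item is still waiting to be consumed.
$beforeDrain = $demo.IsCompleted

$item = 0
$null = $demo.TryTake([ref] $item)

# Marked complete *and* empty: a `while (-not $bc.IsCompleted)` loop ends here.
$afterDrain = $demo.IsCompleted

# Producers can no longer add; Add() throws InvalidOperationException.
$addFailed = $false
try { $demo.Add(2) } catch { $addFailed = $true }

"before: $beforeDrain, after: $afterDrain, add failed: $addFailed"
$demo.Dispose()
```

The last part is worth keeping in mind: a worker that is still running and calls Add() after the signal has been sent will get an exception.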

Even though it's not perfect, this solves the problem where some of the threads might exit the loop early, and it attempts to ensure that all threads end at the same time (when there are no more items in the collection).

The producer logic is very similar to the previous answer; however, in this case each thread waits a random amount of time between $timeout.Seconds - 5 and $timeout.Seconds + 5 on each loop iteration.
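A quick sketch of what those bounds work out to with the 5 second timeout used here. Two details are easy to miss: Random.Next(min, max) excludes the upper bound, and .Seconds is only the seconds component of the TimeSpan (unlike .TotalSeconds). The 1000-sample loop and the 90 second value are assumptions for illustration.

```powershell
$timeout = [timespan]::FromSeconds(5)
$min = $timeout.Seconds - 5    # 0
$max = $timeout.Seconds + 5    # 10

$ran     = [random]::new()
$samples = 1..1000 | ForEach-Object { $ran.Next($min, $max) }

# The upper bound is exclusive, so the delays fall in 0..9 seconds.
$lowest  = ($samples | Measure-Object -Minimum).Minimum
$highest = ($samples | Measure-Object -Maximum).Maximum
"delays range from $lowest to $highest seconds"

# For timeouts of a minute or more, .Seconds and .TotalSeconds diverge.
$long = [timespan]::FromSeconds(90)
"Seconds: $($long.Seconds), TotalSeconds: $($long.TotalSeconds)"   # Seconds: 30, TotalSeconds: 90
```

If the timeout is ever raised to a minute or more, .TotalSeconds is probably the property you want.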

The results one can expect from this demo can be found in this gist.

using namespace System.Management.Automation.Runspaces
using namespace System.Collections.Concurrent
using namespace System.Threading

try {
    $threads = 20
    $bc      = [BlockingCollection[int]]::new()
    $status  = [hashtable]::Synchronized(@{ TotalCount = 0 })

    # set a timer, all threads will wait for it before exiting
    # this timespan should be tweaked depending on the task at hand
    $timeout = [timespan]::FromSeconds(5)

    foreach($i in 1, 2, 3, 4, 5) {
        $bc.Add($i)
    }


    $scriptblock = {
        param([timespan] $timeout, [int] $threads)

        $id = [runspace]::DefaultRunspace
        $status[$id.InstanceId] = $true
        $syncRoot = $status.SyncRoot
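        $release  = {
            # PulseAll must be called while the lock is still held,
            # so signal the waiting threads first and then release it
            [Threading.Monitor]::PulseAll($syncRoot)
            [Threading.Monitor]::Exit($syncRoot)
        }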

        # will use this to simulate random delays
        $min = $timeout.Seconds - 5
        $max = $timeout.Seconds + 5

        [ref] $target = $null
        while(-not $bc.IsCompleted) {
            # NOTE from `Hashtable.Synchronized(Hashtable)` MS Docs:
            #
            #    The Synchronized method is thread safe for multiple readers and writers.
            #    Furthermore, the synchronized wrapper ensures that there is only
            #    one writer writing at a time.
            #
            #    Enumerating through a collection is intrinsically not a
            #    thread-safe procedure. Even when a collection is synchronized,
            #    other threads can still modify the collection, which causes the
            #    enumerator to throw an exception.

            # Mainly doing this (lock on the sync hash) to get the Active Count
            # Not really needed and only for demo purposes

            # if we can't lock on this object in 200ms go next iteration
            if(-not [Threading.Monitor]::TryEnter($syncRoot, 200)) {
                continue
            }

            # if there are no items in queue, send `$false` to the main thread
            if(-not ($status[$id.InstanceId] = $bc.TryTake($target, $timeout))) {
                # release the lock and signal the threads they can get a handle
                & $release
                # and go next iteration
                continue
            }

            # if there was an item in queue, get the active count
            $active = @($status.Values -eq $true).Count
            # add 1 to the total count
            $status['TotalCount'] += 1
            # and release the lock
            & $release

            Write-Host (
                ('Target Value: {0}' -f $target.Value).PadRight(20) + '|'.PadRight(5) +
                ('Items in Queue: {0}' -f $bc.Count).PadRight(20)   + '|'.PadRight(5) +
                ('Runspace Id: {0}' -f $id.Id).PadRight(20)         + '|'.PadRight(5) +
                ('Active Runspaces [{0:D2} / {1:D2}]' -f $active, $threads)
            )

            $ran = [random]::new()
            # start a simulated delay
            Start-Sleep $ran.Next($min, $max)

            # get a random number between 0 and 10
            $ran = $ran.Next(11)
            # if the number is greater than the Dequeued Item
            if ($ran -gt $target.Value) {
                # enumerate starting from `$ran - 2` up to `$ran`
                foreach($i in ($ran - 2)..$ran) {
                    # enqueue each item
                    $bc.Add($i)
                }
            }

            # Send 1 to the Success Stream, this will help us check
            # if the test succeeded later on
            1
        }
    }

    $iss    = [initialsessionstate]::CreateDefault2()
    $rspool = [runspacefactory]::CreateRunspacePool(1, $threads, $iss, $Host)
    $rspool.ApartmentState = [ApartmentState]::STA
    $rspool.ThreadOptions  = [PSThreadOptions]::UseNewThread
    $rspool.InitialSessionState.Variables.Add([SessionStateVariableEntry[]]@(
        [SessionStateVariableEntry]::new('bc', $bc, 'Producer Consumer Collection')
        [SessionStateVariableEntry]::new('status', $status, 'Monitoring hash for signaling `.CompleteAdding()`')
    ))
    $rspool.Open()

    $params = @{
        Timeout = $timeout
        Threads = $threads
    }

    $rs = for($i = 0; $i -lt $threads; $i++) {
        $ps = [powershell]::Create($iss).AddScript($scriptblock).AddParameters($params)
        $ps.RunspacePool = $rspool

        @{
            Instance    = $ps
            AsyncResult = $ps.BeginInvoke()
        }
    }

    while($status.ContainsValue($true)) {
        Start-Sleep -Milliseconds 200
    }

    # send signal to stop
    $bc.CompleteAdding()

    [int[]] $totalCount = foreach($r in $rs) {
        try {
            $r.Instance.EndInvoke($r.AsyncResult)
            $r.Instance.Dispose()
        }
        catch {
            Write-Error $_
        }
    }
    Write-Host ("`nTotal Count [ IN {0} / OUT {1} ]" -f $totalCount.Count, $status['TotalCount'])
    Write-Host ("Items in Queue: {0}" -f $bc.Count)
    Write-Host ("Test Succeeded: {0}" -f (
        [Linq.Enumerable]::Sum($totalCount) -eq $status['TotalCount'] -and
        $bc.Count -eq 0
    ))
}
finally {
    ($bc, $rspool).ForEach('Dispose')
}

Note: this answer DOES NOT provide a good solution to OP's problem. See this answer for a better take on the producer-consumer problem.


This is a different approach from mclayton's helpful answer; hopefully both answers can lead you to solve your problem. This example uses a ConcurrentQueue<T> and consists of multiple threads performing the same action.

As you can see, in this case we start only 5 threads, which will try to dequeue the items concurrently.
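One thing to keep in mind about this design, sketched below with a hypothetical single-threaded $demo queue: TryDequeue never waits. A worker whose loop condition is TryDequeue leaves the loop the first time it finds the queue empty, even if another worker would have enqueued more items a moment later.

```powershell
# Hypothetical demo queue; not part of the answer's script.
$demo = [System.Collections.Concurrent.ConcurrentQueue[int]]::new()
$demo.Enqueue(1)

$item      = 0
$processed = 0

# Same loop shape as the workers: run until TryDequeue reports "empty".
while ($demo.TryDequeue([ref] $item)) {
    $processed++
}

# The queue is empty, so TryDequeue returns $false immediately (no blocking).
$emptyResult = $demo.TryDequeue([ref] $item)

"processed: $processed, TryDequeue on empty queue: $emptyResult"   # processed: 1, TryDequeue on empty queue: False
```

With one seed item and several workers, all but one worker would see an empty queue on their first attempt and finish straight away.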

If the randomly generated number between 0 and 10 is greater than the dequeued item, it creates an array starting from the random number - 2 up to the given random number and enqueues those items. This tries to simulate, badly, what you posted in the comments: "The actual problem involves Invoke-RestMethod (irm) towards multiple endpoints, based on the results of which, I may have to query more similar endpoints".
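The rule is easier to follow in isolation. The sketch below pulls it into a hypothetical helper, Get-FollowUpItem (not a function from the answer), and passes the "random" number in as a parameter so the result is repeatable:

```powershell
# Hypothetical helper, for illustration only: returns the items that would
# be enqueued for a dequeued $Value when the random roll is $Roll.
function Get-FollowUpItem {
    param(
        [int] $Value,   # the item that was just dequeued
        [int] $Roll     # stands in for `Get-Random -Maximum 11`
    )

    if ($Roll -gt $Value) {
        # three new items: $Roll - 2, $Roll - 1, $Roll
        ($Roll - 2)..$Roll
    }
    # otherwise emit nothing: this item spawns no further work
}

$more = @(Get-FollowUpItem -Value 4 -Roll 7)
$none = @(Get-FollowUpItem -Value 4 -Roll 3)

"roll 7 -> $($more -join ', ')"        # roll 7 -> 5, 6, 7
"roll 3 -> $($none.Count) new items"   # roll 3 -> 0 new items
```

Small dequeued values are therefore likely to spawn more work, while large ones usually end their branch.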

Do note that for this example I'm using $threads = $queue.Count; however, this should not always be the case. Don't start too many threads or you might kill your session! Also be aware that you might overload your network if you query multiple endpoints at the same time. I would say, keep the number of threads always below $queue.Count.

The results you can expect from the code below will vary greatly on each run.

using namespace System.Management.Automation.Runspaces
using namespace System.Collections.Concurrent

try {
    $queue = [ConcurrentQueue[int]]::new()
    foreach($i in 1, 2, 3, 4, 5) {
        $queue.Enqueue($i)
    }
    $threads = $queue.Count

    $scriptblock = {
        [ref] $target = $null
        while($queue.TryDequeue($target)) {
            [pscustomobject]@{
                'Target Value'      = $target.Value
                'Elements in Queue' = $queue.Count
            }

            # get a random number between 0 and 10
            $ran = Get-Random -Maximum 11
            # if the number is greater than the Dequeued Item
            if ($ran -gt $target.Value) {
                # enumerate starting from `$ran - 2` up to `$ran`
                foreach($i in ($ran - 2)..$ran) {
                    # enqueue each item
                    $queue.Enqueue($i)
                }
            }
        }
    }

    $iss    = [initialsessionstate]::CreateDefault2()
    $rspool = [runspacefactory]::CreateRunspacePool(1, $threads, $iss, $Host)
    $rspool.InitialSessionState.Variables.Add([SessionStateVariableEntry]::new(
        'queue', $queue, ''
    ))
    $rspool.Open()

    $rs = for($i = 0; $i -lt $threads; $i++) {
        $ps = [powershell]::Create().AddScript($scriptblock)
        $ps.RunspacePool = $rspool

        @{
            Instance    = $ps
            AsyncResult = $ps.BeginInvoke()
        }
    }

    foreach($r in $rs) {
        try {
            $r.Instance.EndInvoke($r.AsyncResult)
            $r.Instance.Dispose()
        }
        catch {
            Write-Error $_
        }
    }
}
finally {
    $rspool.ForEach('Dispose')
}
