
How to dynamically add items to a PowerShell ArrayList and process them recursively using Runspace pool?

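I have a for loop that iterates over an ArrayList and, in the process, adds more items to the list and processes them too (iteratively). I am trying to convert this function to run concurrently using a RunspacePool.

Here is the normal code without runspaces: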

$array = [System.Collections.ArrayList]@(1, 2, 3, 4, 5)
Write-Host "Number of items in array before loop: $($array.Count)"
for ($i = 0; $i -lt $array.Count; $i++) {
    Write-Host "Counter: $i`tArray: $array"
    if ($array[$i] -in @(1, 2, 3, 4, 5)) {
        $array.Add($array[$i] + 3) | Out-Null
    }
}
Write-Host "Array: $array"
Write-Host "Number of items in array after loop: $($array.Count)"

The output is:

Number of items in array before loop: 5
Counter: 0      Array: 1 2 3 4 5
Counter: 1      Array: 1 2 3 4 5 4
Counter: 2      Array: 1 2 3 4 5 4 5
Counter: 3      Array: 1 2 3 4 5 4 5 6
Counter: 4      Array: 1 2 3 4 5 4 5 6 7
Counter: 5      Array: 1 2 3 4 5 4 5 6 7 8
Counter: 6      Array: 1 2 3 4 5 4 5 6 7 8 7
Counter: 7      Array: 1 2 3 4 5 4 5 6 7 8 7 8
Counter: 8      Array: 1 2 3 4 5 4 5 6 7 8 7 8
Counter: 9      Array: 1 2 3 4 5 4 5 6 7 8 7 8
Counter: 10     Array: 1 2 3 4 5 4 5 6 7 8 7 8
Counter: 11     Array: 1 2 3 4 5 4 5 6 7 8 7 8
Array: 1 2 3 4 5 4 5 6 7 8 7 8
Number of items in array after loop: 12

Here is the runspace version I am trying to implement:

$pool = [RunspaceFactory]::CreateRunspacePool(1, 10)
$pool.Open()
$runspaces = @()

$scriptblock = {
    Param ($i, $array)
    # Start-Sleep 1 # <------ Output varies significantly if this is enabled
    Write-Output "$i value: $array"
    if ($i -in @(1, 2, 3, 4, 5)) {
        $array.Add($i + 3) | Out-Null
    }
}

$array = [System.Collections.ArrayList]::Synchronized(([System.Collections.ArrayList]$(1, 2, 3, 4, 5)))
Write-Host "Number of items in array before loop: $($array.Count)"
for ($i = 0; $i -lt $array.Count; $i++) {
    $runspace = [PowerShell]::Create().AddScript($scriptblock).AddArgument($array[$i]).AddArgument($array)
    $runspace.RunspacePool = $pool
    $runspaces += [PSCustomObject]@{ Pipe = $runspace; Status = $runspace.BeginInvoke() }
}

while ($runspaces.Status -ne $null) {
    $completed = $runspaces | Where-Object { $_.Status.IsCompleted -eq $true }
    foreach ($runspace in $completed) {
        $runspace.Pipe.EndInvoke($runspace.Status)
        $runspace.Status = $null
    }
}
Write-Host "array: $array"
Write-Host "Number of items in array after loop: $($array.Count)"
$pool.Close()
$pool.Dispose()

Without the sleep, the output is as expected:

Number of items in array before loop: 5
Current value: 1        Array: 1 2 3 4 5
Current value: 2        Array: 1 2 3 4 5 4
Current value: 3        Array: 1 2 3 4 5 4 5
Current value: 4        Array: 1 2 3 4 5 4 5 6
Current value: 5        Array: 1 2 3 4 5 4 5 6 7
Current value: 4        Array: 1 2 3 4 5 4 5 6 7 8
Current value: 5        Array: 1 2 3 4 5 4 5 6 7 8 7
Current value: 6        Array: 1 2 3 4 5 4 5 6 7 8 7
Current value: 7        Array: 1 2 3 4 5 4 5 6 7 8 7
Current value: 8        Array: 1 2 3 4 5 4 5 6 7 8 7
Current value: 7        Array: 1 2 3 4 5 4 5 6 7 8 7 8
Current value: 8        Array: 1 2 3 4 5 4 5 6 7 8 7 8
Array: 1 2 3 4 5 4 5 6 7 8 7 8
Number of items in array after loop: 12

Output with the sleep enabled:

Number of items in array before loop: 5
Current value: 1        Array: 1 2 3 4 5
Current value: 2        Array: 1 2 3 4 5 4
Current value: 3        Array: 1 2 3 4 5 4 5
Current value: 4        Array: 1 2 3 4 5 4 5 6
Current value: 5        Array: 1 2 3 4 5 4 5 6 7
Array: 1 2 3 4 5 4 5 6 7 8
Number of items in array after loop: 10

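I understand this happens because the for loop exits before the sleep has finished, so only the first 5 items get added to the runspace pool.

Is there a way to add more items to the ArrayList dynamically and still process them concurrently using runspaces?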

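The crux of your "working" behaviour is that PowerShell runs the "non-sleep" scriptblocks faster than it can create them in the for loop, so the loop sees the new items added by previous iterations before it reaches the end of the array. As a result, it has to process all of the items before it exits and moves on to the while loop.

When you add a Start-Sleep the balance shifts: running the scriptblocks now takes longer than creating them, so the for loop reaches the end of the array before the earliest iterations have added their new items.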
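The timing dependence can be reproduced without any threads. The sketch below is a single-threaded simulation rather than the code from the question: the hypothetical Test-LoopCoverage function either applies each worker's Add immediately (fast workers) or defers every Add until the for loop has finished (slow workers).

```powershell
function Test-LoopCoverage {
    param([bool] $WorkersFinishFirst)

    $array   = [System.Collections.ArrayList]@(1, 2, 3, 4, 5)
    $pending = [System.Collections.Generic.List[int]]::new()
    $started = 0

    for ($i = 0; $i -lt $array.Count; $i++) {
        $started++                      # one "runspace" started per visited item
        if ($array[$i] -in 1..5) {
            if ($WorkersFinishFirst) {
                # fast worker: its result is visible before the next Count check
                [void] $array.Add($array[$i] + 3)
            }
            else {
                # slow worker: its result only lands after the loop has ended
                $pending.Add($array[$i] + 3)
            }
        }
    }
    foreach ($item in $pending) { [void] $array.Add($item) }

    [pscustomobject]@{ Started = $started; FinalCount = $array.Count }
}

$fast = Test-LoopCoverage -WorkersFinishFirst $true
$slow = Test-LoopCoverage -WorkersFinishFirst $false
"Fast workers: started $($fast.Started), final count $($fast.FinalCount)"
"Slow workers: started $($slow.Started), final count $($slow.FinalCount)"
```

With fast workers the loop starts 12 work items; with slow workers it starts only 5 and the array ends with 10 items, which matches the two outputs shown in the question.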

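The following script fixes this by combining your for and while loops so that it repeatedly alternates between (i) creating new threads and (ii) checking whether they have finished, and only exits once all the work is done.

However, multithreading is hard, so it's best to assume I've made a mistake somewhere and test properly before you release it into your live workflow...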

$scriptblock = {
    Param ($i, $array)
    # random sleep to simulate variable-length workloads. this is
    # more likely to flush out error conditions than a fixed sleep 
    # period as threads will finish out-of-turn more often
    Start-Sleep (Get-Random -Minimum 1 -Maximum 10)
    Write-Output "$i value: $array"
    if ($i -in @(1, 2, 3, 4, 5)) {
        $array.Add($i + 3) | Out-Null
    }
}

$pool = [RunspaceFactory]::CreateRunspacePool(1, 10)
$pool.Open()

# note - your "$runspaces" variable is misleading as you're creating 
# "PowerShell" objects, and a "Runspace" is a different thing entirely,
# so I've called it $instances instead
# see https://docs.microsoft.com/en-us/dotnet/api/system.management.automation.powershell?view=powershellsdk-7.0.0
#  vs https://docs.microsoft.com/en-us/dotnet/api/system.management.automation.runspaces.runspace?view=powershellsdk-7.0.0
$instances = @()

$array = [System.Collections.ArrayList]::Synchronized(([System.Collections.ArrayList]$(1, 2, 3, 4, 5)))
Write-Host "Number of items in array before loop: $($array.Count)"

while( $true )
{

    # start PowerShell instances for any items in $array that don't already have one.
    # on the first iteration this will seed the initial instances, and in
    # subsequent iterations it will create new instances for items added to
    # $array since the last iteration.
    while( $instances.Length -lt $array.Count )
    {
        $instance = [PowerShell]::Create().AddScript($scriptblock).AddArgument($array[$instances.Length]).AddArgument($array);
        $instance.RunspacePool = $pool
        $instances += [PSCustomObject]@{ Value = $instance; Status = $instance.BeginInvoke() }
    }

    # watch out because there's a race condition here. it'll need very unlucky 
    # timing, *but* an instance might have added an item to $array just after
    # the while loop finished, but before the next line runs, so there *could* 
    # be an item in $array that hasn't had an instance created for it even
    # if all the current instances have completed

    # is there any more work to do? (try to mitigate the race condition
    # by checking again for any items in $array that don't have an instance
    # created for them)
    $active = @( $instances | Where-Object { -not $_.Status.IsCompleted } )
    if( ($active.Length -eq 0) -and ($instances.Length -eq $array.Count) )
    {
        # instances have been created for every item in $array,
        # *and* they've run to completion, so there's no more work to do
        break;
    }

    # if there are incomplete instances, wait for a short time to let them run
    # (this is to avoid a "busy wait" - https://en.wikipedia.org/wiki/Busy_waiting)
    Start-Sleep -Milliseconds 250;

}

# all the instances have completed, so end them
foreach ($instance in $instances)
{
    $instance.Value.EndInvoke($instance.Status);
}

Write-Host "array: $array"
Write-Host "Number of items in array after loop: $($array.Count)"
$pool.Close()
$pool.Dispose()

Example output:

Number of items in array before loop: 5
1 value: 1 2 3 4 5 6 5 7
2 value: 1 2 3 4 5 6
3 value: 1 2 3 4 5
4 value: 1 2 3 4 5 6 5
5 value: 1 2 3 4 5 6 5 7 4
6 value: 1 2 3 4 5 6 5 7
5 value: 1 2 3 4 5 6 5 7 4 8
7 value: 1 2 3 4 5 6 5 7
4 value: 1 2 3 4 5 6 5 7 4 8 8
8 value: 1 2 3 4 5 6 5 7 4 8 8
8 value: 1 2 3 4 5 6 5 7 4 8 8
7 value: 1 2 3 4 5 6 5 7 4 8 8 7

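Note that the order of the items in the array will vary depending on the length of the random sleeps in $scriptblock.

There are probably further improvements that could be made, but this at least seems to work...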
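One possible refinement, sketched under assumptions rather than taken from the answer: wrap the same create-and-poll loop in a function, track the instances in a generic List instead of growing an array with +=, and dispose every PowerShell instance in a finally block. The function name Invoke-GrowingWorkList and its -Seed, -Work and -MaxThreads parameters are hypothetical.

```powershell
function Invoke-GrowingWorkList {
    param(
        [int[]] $Seed,
        [scriptblock] $Work,       # receives ($item, $list) and may call $list.Add()
        [int] $MaxThreads = 4
    )
    $list = [System.Collections.ArrayList]::Synchronized([System.Collections.ArrayList]::new())
    foreach ($s in $Seed) { [void] $list.Add($s) }

    $pool = [runspacefactory]::CreateRunspacePool(1, $MaxThreads)
    $pool.Open()
    $instances = [System.Collections.Generic.List[object]]::new()
    try {
        while ($true) {
            # start an instance for every item that does not have one yet
            while ($instances.Count -lt $list.Count) {
                $ps = [powershell]::Create().AddScript($Work.ToString()).
                    AddArgument($list[$instances.Count]).AddArgument($list)
                $ps.RunspacePool = $pool
                $instances.Add([pscustomobject]@{ Value = $ps; Status = $ps.BeginInvoke() })
            }
            # once every instance has completed nothing can add to $list any more,
            # so comparing the two counts afterwards is a safe exit test
            $active = @($instances | Where-Object { -not $_.Status.IsCompleted })
            if ($active.Count -eq 0 -and $instances.Count -eq $list.Count) { break }
            Start-Sleep -Milliseconds 50
        }
        $output = @(foreach ($instance in $instances) {
            $instance.Value.EndInvoke($instance.Status)
        })
        [pscustomobject]@{ Items = @($list); Output = $output }
    }
    finally {
        foreach ($instance in $instances) { $instance.Value.Dispose() }
        $pool.Dispose()
    }
}

# usage: same rule as the question (items 1..5 add "item + 3" to the list)
$result = Invoke-GrowingWorkList -Seed 1, 2, 3, 4, 5 -Work {
    param($i, $list)
    if ($i -in 1..5) { [void] $list.Add($i + 3) }
    $i
}
"Items processed: $($result.Items.Count)"
```

The order of the items still depends on thread timing, but the set of items does not: the seed 1..5 always grows to 12 items.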

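This answer attempts to provide a better solution to the producer-consumer problem using a BlockingCollection<T>, which provides an implementation of the producer/consumer pattern.

To clarify the issue with my previous answer, as the OP noted in a comment:

"If the starting count of the queue (say, 2) is less than the max number of threads (say 5), then only that many (2, in this case) threads remain active, no matter how many items are added to the queue later. Only the starting number of threads process the rest of the items in the queue. In my case, the starting count is usually one. Then I make an irm (alias for Invoke-RestMethod) request, and add some 10~20 items. These are processed by only the first thread. The other threads go to Completed state right at the start. Is there a solution to this?"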
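The early exit described in that comment can be shown in isolation. This is a minimal single-threaded sketch, not the OP's code: a worker whose loop condition is TryDequeue stops as soon as it finds the queue empty, even if more work arrives a moment later.

```powershell
$queue = [System.Collections.Concurrent.ConcurrentQueue[int]]::new()
$item = 0
$iterations = 0

# Same loop shape as a TryDequeue-based worker: the condition is evaluated once,
# finds the queue empty, and the body never runs.
while ($queue.TryDequeue([ref] $item)) { $iterations++ }

# Work that arrives after the loop has ended is never seen by that worker.
$queue.Enqueue(1)

"Iterations: $iterations, items left in queue: $($queue.Count)"
```

The worker performs zero iterations and the late item stays in the queue, which is why only as many threads as there are starting items stay alive.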

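For the example in this answer, the runspaces use the TryTake(T, TimeSpan) method overload, which blocks the thread and waits for up to the specified timeout. On each loop iteration, the runspaces also update a synchronized hashtable with their TryTake(..) result.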
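A minimal sketch of how TryTake(T, TimeSpan) behaves, using a short 200 ms timeout chosen only for illustration: it returns immediately when an item is available, and blocks for roughly the timeout before returning $false when the collection is empty.

```powershell
$bc = [System.Collections.Concurrent.BlockingCollection[int]]::new()
$bc.Add(42)
$timeout = [timespan]::FromMilliseconds(200)   # illustrative value
$item = 0

# an item is available: returns $true straight away and fills $item
$first = $bc.TryTake([ref] $item, $timeout)
$firstValue = $item

# the collection is now empty: blocks for about $timeout, then returns $false
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$second = $bc.TryTake([ref] $item, $timeout)
$sw.Stop()

"First take:  $first (value $firstValue)"
"Second take: $second after $($sw.ElapsedMilliseconds) ms"
$bc.Dispose()
```

This waiting period is what lets an idle runspace pick up items that another runspace adds a little later, instead of finishing at the first empty read.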

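The main thread uses the synchronized hashtable to wait until all runspaces have sent a $false status; when that happens, an exit signal is sent to the threads with .CompleteAdding().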
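The two building blocks of that signalling can be tried on their own. The sketch below is single-threaded and uses made-up keys 'a' and 'b' in place of runspace instance ids; it shows ContainsValue($true) turning false once every status is $false, and IsCompleted becoming true only when adding is closed and the collection is empty.

```powershell
# status table: one entry per worker (keys 'a' and 'b' are placeholders)
$status = [hashtable]::Synchronized(@{ TotalCount = 0; a = $true; b = $false })
$anyBusyBefore = $status.ContainsValue($true)   # $true: worker 'a' is still busy
$status['a'] = $false
$anyBusyAfter = $status.ContainsValue($true)    # $false: every worker reported $false

$bc = [System.Collections.Concurrent.BlockingCollection[int]]::new()
$bc.Add(1)
$bc.CompleteAdding()
$completedBeforeDrain = $bc.IsCompleted         # $false: closed, but one item remains

$item = 0
[void] $bc.TryTake([ref] $item)
$completedAfterDrain = $bc.IsCompleted          # $true: closed and empty

# adding after CompleteAdding() throws, surfaced here as a catchable error
$addFailed = $false
try { $bc.Add(2) } catch { $addFailed = $true }

"Busy before/after: $anyBusyBefore / $anyBusyAfter"
"IsCompleted before/after drain: $completedBeforeDrain / $completedAfterDrain"
"Add after CompleteAdding failed: $addFailed"
$bc.Dispose()
```

Because IsCompleted is the workers' loop condition, calling CompleteAdding() on an empty collection is what finally lets every worker leave its loop.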

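Even though it's not perfect, this solves the problem where some of the threads might exit the loop early, and it attempts to ensure that all threads end at the same time (when there are no more items in the collection).

The producer logic is very similar to the previous answer; however, in this case each thread waits a random amount of time between $timeout.Seconds - 5 and $timeout.Seconds + 5 on each loop iteration.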
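Two details of that delay are easy to miss, so here is a small sketch of the arithmetic, assuming the 5 second timeout used in this answer. Random.Next(min, max) excludes the upper bound, and TimeSpan.Seconds is only the seconds component of the timespan, not its total length.

```powershell
$timeout = [timespan]::FromSeconds(5)
$min = $timeout.Seconds - 5    # 0
$max = $timeout.Seconds + 5    # 10

$ran = [random]::new()
$samples = foreach ($n in 1..1000) { $ran.Next($min, $max) }
$stats = $samples | Measure-Object -Minimum -Maximum
"Delays fall between $($stats.Minimum) and $($stats.Maximum) seconds"

# .Seconds is the component (0-59); .TotalSeconds is the whole duration
$long = [timespan]::FromSeconds(90)
"90 s timeout: Seconds = $($long.Seconds), TotalSeconds = $($long.TotalSeconds)"
```

So with a 5 second timeout the simulated delay is a whole number of seconds from 0 up to 9. For timeouts of a minute or more, TotalSeconds would be the property to use if the delay is meant to track the full timeout.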

The results you can expect from this demo can be found in this gist.

using namespace System.Management.Automation.Runspaces
using namespace System.Collections.Concurrent
using namespace System.Threading

try {
    $threads = 20
    $bc      = [BlockingCollection[int]]::new()
    $status  = [hashtable]::Synchronized(@{ TotalCount = 0 })

    # set a timer, all threads will wait for it before exiting
    # this timespan should be tweaked depending on the task at hand
    $timeout = [timespan]::FromSeconds(5)

    foreach($i in 1, 2, 3, 4, 5) {
        $bc.Add($i)
    }


    $scriptblock = {
        param([timespan] $timeout, [int] $threads)

        $id = [runspace]::DefaultRunspace
        $status[$id.InstanceId] = $true
        $syncRoot = $status.SyncRoot
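        $release  = {
            # PulseAll requires the calling thread to still own the lock,
            # so pulse first and release afterwards
            [Threading.Monitor]::PulseAll($syncRoot)
            [Threading.Monitor]::Exit($syncRoot)
        }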

        # will use this to simulate random delays
        $min = $timeout.Seconds - 5
        $max = $timeout.Seconds + 5

        [ref] $target = $null
        while(-not $bc.IsCompleted) {
            # NOTE from `Hashtable.Synchronized(Hashtable)` MS Docs:
            #
            #    The Synchronized method is thread safe for multiple readers and writers.
            #    Furthermore, the synchronized wrapper ensures that there is only
            #    one writer writing at a time.
            #
            #    Enumerating through a collection is intrinsically not a
            #    thread-safe procedure. Even when a collection is synchronized,
            #    other threads can still modify the collection, which causes the
            #    enumerator to throw an exception.

            # Mainly doing this (lock on the sync hash) to get the Active Count
            # Not really needed and only for demo purposes

            # if we can't lock on this object in 200ms go next iteration
            if(-not [Threading.Monitor]::TryEnter($syncRoot, 200)) {
                continue
            }

            # if there are no items in queue, send `$false` to the main thread
            if(-not ($status[$id.InstanceId] = $bc.TryTake($target, $timeout))) {
                # release the lock and signal the threads they can get a handle
                & $release
                # and go next iteration
                continue
            }

            # if there was an item in queue, get the active count
            $active = @($status.Values -eq $true).Count
            # add 1 to the total count
            $status['TotalCount'] += 1
            # and release the lock
            & $release

            Write-Host (
                ('Target Value: {0}' -f $target.Value).PadRight(20) + '|'.PadRight(5) +
                ('Items in Queue: {0}' -f $bc.Count).PadRight(20)   + '|'.PadRight(5) +
                ('Runspace Id: {0}' -f $id.Id).PadRight(20)         + '|'.PadRight(5) +
                ('Active Runspaces [{0:D2} / {1:D2}]' -f $active, $threads)
            )

            $ran = [random]::new()
            # start a simulated delay
            Start-Sleep $ran.Next($min, $max)

            # get a random number between 0 and 10
            $ran = $ran.Next(11)
            # if the number is greater than the Dequeued Item
            if ($ran -gt $target.Value) {
                # enumerate starting from `$ran - 2` up to `$ran`
                foreach($i in ($ran - 2)..$ran) {
                    # enqueue each item
                    $bc.Add($i)
                }
            }

            # Send 1 to the Success Stream, this will help us check
            # if the test succeeded later on
            1
        }
    }

    $iss    = [initialsessionstate]::CreateDefault2()
    $rspool = [runspacefactory]::CreateRunspacePool(1, $threads, $iss, $Host)
    $rspool.ApartmentState = [ApartmentState]::STA
    $rspool.ThreadOptions  = [PSThreadOptions]::UseNewThread
    $rspool.InitialSessionState.Variables.Add([SessionStateVariableEntry[]]@(
        [SessionStateVariableEntry]::new('bc', $bc, 'Producer Consumer Collection')
        [SessionStateVariableEntry]::new('status', $status, 'Monitoring hash for signaling `.CompleteAdding()`')
    ))
    $rspool.Open()

    $params = @{
        Timeout = $timeout
        Threads = $threads
    }

    $rs = for($i = 0; $i -lt $threads; $i++) {
        $ps = [powershell]::Create($iss).AddScript($scriptblock).AddParameters($params)
        $ps.RunspacePool = $rspool

        @{
            Instance    = $ps
            AsyncResult = $ps.BeginInvoke()
        }
    }

    while($status.ContainsValue($true)) {
        Start-Sleep -Milliseconds 200
    }

    # send signal to stop
    $bc.CompleteAdding()

    [int[]] $totalCount = foreach($r in $rs) {
        try {
            $r.Instance.EndInvoke($r.AsyncResult)
            $r.Instance.Dispose()
        }
        catch {
            Write-Error $_
        }
    }
    Write-Host ("`nTotal Count [ IN {0} / OUT {1} ]" -f $totalCount.Count, $status['TotalCount'])
    Write-Host ("Items in Queue: {0}" -f $bc.Count)
    Write-Host ("Test Succeeded: {0}" -f (
        [Linq.Enumerable]::Sum($totalCount) -eq $status['TotalCount'] -and
        $bc.Count -eq 0
    ))
}
finally {
    ($bc, $rspool).ForEach('Dispose')
}

Note that this answer does not offer a good solution to the OP's problem. See this answer for a better take on the producer-consumer problem.


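This is a different approach from mclayton's helpful answer; hopefully both answers can lead you to a solution to your problem. This example uses a ConcurrentQueue<T> and consists of multiple threads performing the same action.

As you can see, in this case we start only 5 threads, which will try to dequeue items concurrently.

If a randomly generated number between 0 and 10 is greater than the dequeued item, it creates an array running from the random number - 2 up to the random number and enqueues those values (an attempt, a poor one, to simulate what you posted in the comments: "The actual problem involves Invoke-RestMethod (irm) to multiple endpoints, based on the results of which, I may have to query more similar endpoints").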
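To make the enqueue rule concrete, here is one step of it run single-threaded. The fixed value 7 is a stand-in for Get-Random -Maximum 11, chosen only so that the result is reproducible.

```powershell
$queue = [System.Collections.Concurrent.ConcurrentQueue[int]]::new()
$queue.Enqueue(3)

$item = 0
$dequeued = $queue.TryDequeue([ref] $item)   # $true, $item is now 3

$ran = 7   # stand-in for: Get-Random -Maximum 11
if ($ran -gt $item) {
    # 7 is greater than 3, so 5, 6 and 7 are enqueued as follow-up work
    foreach ($i in ($ran - 2)..$ran) {
        $queue.Enqueue($i)
    }
}

"Dequeued $item, queue now holds: $($queue.ToArray() -join ' ')"
```

Each dequeued item therefore either produces three new items or none, which is why the total amount of work differs from run to run.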

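Note that for this example I'm using $threads = $queue.Count, but this should not always be the case. Don't start too many threads or you might kill your session! Also be aware that you could overload your network if you query multiple endpoints at the same time. I would say: never use more threads than $queue.Count.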
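One way to follow that advice is to clamp the thread count between 1 and a fixed ceiling. This is only a sketch: the Get-WorkerCount helper and the ceiling of 5 are hypothetical, and the right ceiling depends on your machine and on the endpoints being queried.

```powershell
function Get-WorkerCount {
    param([int] $QueueCount, [int] $MaxThreads)
    # never more threads than queued items or than the ceiling, and never fewer than 1
    [Math]::Max(1, [Math]::Min($QueueCount, $MaxThreads))
}

$queue = [System.Collections.Concurrent.ConcurrentQueue[int]]::new()
foreach ($i in 1..20) { $queue.Enqueue($i) }

$maxThreads = 5   # hypothetical ceiling
$threads = Get-WorkerCount -QueueCount $queue.Count -MaxThreads $maxThreads
"Queue holds $($queue.Count) items, starting $threads threads"
```

The resulting $threads value can then be passed to CreateRunspacePool as the maximum and used as the bound of the loop that starts the instances.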

The results you get from the code below will vary greatly on each run.

using namespace System.Management.Automation.Runspaces
using namespace System.Collections.Concurrent

try {
    $queue = [ConcurrentQueue[int]]::new()
    foreach($i in 1, 2, 3, 4, 5) {
        $queue.Enqueue($i)
    }
    $threads = $queue.Count

    $scriptblock = {
        [ref] $target = $null
        while($queue.TryDequeue($target)) {
            [pscustomobject]@{
                'Target Value'      = $target.Value
                'Elements in Queue' = $queue.Count
            }

            # get a random number between 0 and 10
            $ran = Get-Random -Maximum 11
            # if the number is greater than the Dequeued Item
            if ($ran -gt $target.Value) {
                # enumerate starting from `$ran - 2` up to `$ran`
                foreach($i in ($ran - 2)..$ran) {
                    # enqueue each item
                    $queue.Enqueue($i)
                }
            }
        }
    }

    $iss    = [initialsessionstate]::CreateDefault2()
    $rspool = [runspacefactory]::CreateRunspacePool(1, $threads, $iss, $Host)
    $rspool.InitialSessionState.Variables.Add([SessionStateVariableEntry]::new(
        'queue', $queue, ''
    ))
    $rspool.Open()

    $rs = for($i = 0; $i -lt $threads; $i++) {
        $ps = [powershell]::Create().AddScript($scriptblock)
        $ps.RunspacePool = $rspool

        @{
            Instance    = $ps
            AsyncResult = $ps.BeginInvoke()
        }
    }

    foreach($r in $rs) {
        try {
            $r.Instance.EndInvoke($r.AsyncResult)
            $r.Instance.Dispose()
        }
        catch {
            Write-Error $_
        }
    }
}
finally {
    $rspool.ForEach('Dispose')
}
