
PowerShell: how to improve the speed of the Where-Object function

I have two CSV files. The first CSV is Card Data, which holds about 30,000 records and contains each card's name, UUID, and price (which is currently empty). The second CSV is Pricing Data, which holds around 50,000 records and contains a UUID and some pricing information for that specific UUID.

These are two separate CSV files that are generated elsewhere.

For each record in the Card Data CSV, I take the UUID and find the corresponding UUID in the Pricing Data CSV using the Where-Object function in PowerShell. This lets me find the pricing information for the respective card and run it through a pricing algorithm to generate a price for each record in the Card Data CSV.

At the moment it seems to take around 1 second per record in the Card Data CSV file, and with 30,000 records to process it would take over 8 hours to run through. Is there a better, more efficient way to perform this task?

Code:

Function Calculate-Price ([float]$A, [float]$B, [float]$C) {
    #Pricing Algorithm
    ....

    $Card.'Price' = $CCPrice
}

$PricingData = Import-Csv "$Path\Pricing.csv"
$CardData = Import-Csv "$Update\Cards.csv"

Foreach ($Card In $CardData) {
    $PricingCard = $PricingData | Where-Object { $_.UUID -eq $Card.UUID } 
    . Calculate-Price -A $PricingCard.'A-price' -B $PricingCard.'B-price'  -C $PricingCard.'C-price' 
}

$CardData | Select "Title","Price","UUID" | 
    Export-Csv -Path "$Update\CardsUpdated.csv" -NoTypeInformation

The first CSV is Card Data, which holds about 30,000 records

The second CSV is Pricing Data, which holds around 50,000 records

No wonder it's slow: you're evaluating the expression $_.UUID -eq $Card.UUID roughly 1,500,000,000 times (30,000 cards x 50,000 pricing rows, that's 1.5 BILLION). That already sounds pretty compute-heavy, and we've not even considered the overhead of the pipeline having to bind input arguments to Where-Object the same number of times.
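To make that count concrete, here is a scaled-down sketch that tallies how often the filter block actually runs. The sample data, the "card-N" UUID values, and the $Counter variable are made up for illustration and are not part of the original script.

```powershell
# Scaled-down stand-ins for the two CSV files (hypothetical sample data).
$CardData    = 1..3 | ForEach-Object { [pscustomobject]@{ UUID = "card-$_" } }
$PricingData = 1..5 | ForEach-Object { [pscustomobject]@{ UUID = "card-$_" } }

# A hashtable is a reference type, so the filter block can update it
# no matter which scope the block runs in.
$Counter = @{ Comparisons = 0 }

foreach ($Card in $CardData) {
    $null = $PricingData | Where-Object {
        $Counter.Comparisons++          # count every evaluation of the filter
        $_.UUID -eq $Card.UUID
    }
}

$Counter.Comparisons                    # 3 cards x 5 pricing rows = 15
$FullScale = [long]30000 * 50000        # the same product at full scale
$FullScale                              # 1500000000
```

The work grows with the product of the two file sizes, so doubling either file doubles the run time.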


Instead of using the array of objects returned by Import-Csv directly, use a hashtable to "index" the records in the data set you need to search, keyed by the property that you're joining on later!

$PricingData = Import-Csv "$Path\Pricing.csv"
$CardData = Import-Csv "$Update\Cards.csv"

$PricingByUUID = @{}
$PricingData |ForEach-Object {
    # Let's index the price cards using their UUID value
    $PricingByUUID[$_.UUID] = $_
}

Foreach ($Card In $CardData) {
    # No need to search through the whole set anymore
    $PricingCard = $PricingByUUID[$Card.UUID]
    . Calculate-Price -A $PricingCard.'A-price' -B $PricingCard.'B-price'  -C $PricingCard.'C-price' 
}
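One case the loop above does not cover is a card whose UUID has no row in the pricing file. In that situation the lookup yields $null, and with strict mode off the [float] parameters would most likely just receive 0. A minimal sketch of a guard follows; the sample rows, the $Unmatched variable, and the placeholder price assignment are assumptions for illustration, not part of the original script.

```powershell
# Hypothetical sample data standing in for the imported CSV rows.
$PricingData = @(
    [pscustomobject]@{ UUID = 'aaa'; 'A-price' = '1.50'; 'B-price' = '2.00'; 'C-price' = '2.50' }
)
$CardData = @(
    [pscustomobject]@{ Title = 'Card One'; UUID = 'aaa'; Price = '' }
    [pscustomobject]@{ Title = 'Card Two'; UUID = 'zzz'; Price = '' }   # no pricing row
)

# Build the index exactly as in the answer.
$PricingByUUID = @{}
foreach ($Row in $PricingData) { $PricingByUUID[$Row.UUID] = $Row }

# Collect the UUIDs that have no pricing record instead of pricing them blindly.
$Unmatched = foreach ($Card in $CardData) {
    if ($PricingByUUID.ContainsKey($Card.UUID)) {
        $PricingCard = $PricingByUUID[$Card.UUID]
        # The real script would call Calculate-Price here; this placeholder
        # copies one value so the example stays self-contained.
        $Card.Price = $PricingCard.'A-price'
    }
    else {
        $Card.UUID   # emitted to the pipeline, so it ends up in $Unmatched
    }
}

$Unmatched   # UUIDs that still need attention
```

ContainsKey is also a constant-time operation on a hashtable, so the guard does not bring back the slow scan.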

Under the hood, hashtables (and most other dictionary types in .NET) are implemented in a way that gives them extremely fast constant-time lookup/retrieval performance, which is exactly the kind of thing you want in this situation!
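If you want to check the difference on your own machine, a rough comparison could look like the sketch below. It uses synthetic data (300 rows per side, with made-up "id-N" UUIDs) and a .NET Stopwatch; the absolute timings will vary by machine and PowerShell version, so no numbers are claimed here.

```powershell
# Synthetic data (hypothetical): 300 pricing rows and 300 cards with matching UUIDs.
$PricingData = 1..300 | ForEach-Object { [pscustomobject]@{ UUID = "id-$_"; 'A-price' = $_ } }
$CardData    = 1..300 | ForEach-Object { [pscustomobject]@{ UUID = "id-$_" } }

# Approach 1: scan the whole pricing set with Where-Object for every card.
$Timer = [System.Diagnostics.Stopwatch]::StartNew()
$ScanResult = foreach ($Card in $CardData) {
    ($PricingData | Where-Object { $_.UUID -eq $Card.UUID }).'A-price'
}
$ScanMs = $Timer.ElapsedMilliseconds

# Approach 2: build the hashtable index once, then look each card up by key.
$Timer.Restart()
$PricingByUUID = @{}
foreach ($Row in $PricingData) { $PricingByUUID[$Row.UUID] = $Row }
$IndexResult = foreach ($Card in $CardData) {
    $PricingByUUID[$Card.UUID].'A-price'
}
$IndexMs = $Timer.ElapsedMilliseconds

# Both approaches should return the same values in the same order.
$Same = ($ScanResult -join ',') -eq ($IndexResult -join ',')
"Where-Object scan: $ScanMs ms; hashtable lookup: $IndexMs ms; same results: $Same"
```

One behavioural difference to keep in mind: if the pricing file contains the same UUID more than once, Where-Object returns every matching row, whereas assigning into the hashtable keeps only the last row seen for that key.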
