
PowerShell: how to improve the speed of the Where-Object function

I have two CSV files. The first CSV is Card Data, which holds about 30,000 records and contains the card's name, UUID, and price (which is currently empty). The second CSV is Pricing Data, which holds around 50,000 records and contains UUID and some pricing information for that specific UUID.

These are two separate CSV files that are generated elsewhere.

For each record in Card Data CSV I am taking the UUID and finding the corresponding UUID in the Pricing Data CSV using the Where-Object function in PowerShell. This is so I can find the pricing information for the respective card and run that through a pricing algorithm to generate a price for each record in the Card Data CSV.

At the moment it seems to take around 1 second per record in the Card Data CSV file, and with 30,000 records to process it would take over 8 hours to run through. Is there a better, more efficient way to perform this task?

Code:

Function Calculate-Price ([float]$A, [float]$B, [float]$C) {
    #Pricing Algorithm
    ....

    $Card.'Price' = $CCPrice
}

$PricingData = Import-Csv "$Path\Pricing.csv"
$CardData = Import-Csv "$Update\Cards.csv"

Foreach ($Card In $CardData) {
    $PricingCard = $PricingData | Where-Object { $_.UUID -eq $Card.UUID } 
    . Calculate-Price -A $PricingCard.'A-price' -B $PricingCard.'B-price'  -C $PricingCard.'C-price' 
}

$CardData | Select "Title","Price","UUID" | 
    Export-Csv -Path "$Update\CardsUpdated.csv" -NoTypeInformation

The first CSV is Card Data, which holds about 30,000 records

The second CSV is Pricing Data, which holds around 50,000 records

No wonder it's slow: you're evaluating the expression $_.UUID -eq $Card.UUID roughly 1,500,000,000 times (30,000 cards × 50,000 pricing records, that's 1.5 BILLION, or 1500 MILLION). That already sounds pretty compute-heavy, and we've not even considered the overhead of the pipeline having to bind input to Where-Object the same number of times.
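To make that count concrete, here is a scaled-down sketch that tallies how often the filter script block actually runs. The data is synthetic (30 cards and 50 pricing rows instead of 30,000 and 50,000), and `$counter` is a name introduced purely for illustration.

```powershell
# Synthetic, scaled-down stand-ins for the two CSVs (assumption: only UUID matters here)
$cards   = 1..30 | ForEach-Object { [pscustomobject]@{ UUID = "id$_" } }
$pricing = 1..50 | ForEach-Object { [pscustomobject]@{ UUID = "id$_" } }

# A hashtable is used as a mutable counter so the increment is visible outside the script block
$counter = @{ N = 0 }

foreach ($card in $cards) {
    # The filter runs once per pricing row, for every single card
    $null = $pricing | Where-Object { $counter.N++; $_.UUID -eq $card.UUID }
}

$counter.N   # 30 * 50 = 1500 evaluations
```

Scale both inputs up by a factor of 1,000 and the same loop performs the 1.5 billion evaluations described above.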


Instead of searching the array of objects returned by Import-Csv directly, use a hashtable to "index" the records in the data set you need to search, keyed by the property you're joining on later!

$PricingData = Import-Csv "$Path\Pricing.csv"
$CardData = Import-Csv "$Update\Cards.csv"

$PricingByUUID = @{}
$PricingData | ForEach-Object {
    # Let's index the price cards using their UUID value
    $PricingByUUID[$_.UUID] = $_
}

Foreach ($Card In $CardData) {
    # No need to search through the whole set anymore
    $PricingCard = $PricingByUUID[$Card.UUID]
    . Calculate-Price -A $PricingCard.'A-price' -B $PricingCard.'B-price'  -C $PricingCard.'C-price' 
}
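One thing to watch for: indexing a hashtable with a key it doesn't contain returns $null rather than throwing, so a card with no pricing record would silently pass empty values to the pricing function. A minimal sketch of guarding against that, using synthetic rows in place of the CSVs; `$Unmatched` is a hypothetical name, and the single assignment to Price is only a placeholder for the real pricing algorithm, which the question elides.

```powershell
# Synthetic stand-in data; the column names mirror the ones used in the question
$PricingData = @(
    [pscustomobject]@{ UUID = 'aaa'; 'A-price' = '1.50'; 'B-price' = '2.00'; 'C-price' = '2.50' }
)
$CardData = @(
    [pscustomobject]@{ Title = 'Card One'; UUID = 'aaa'; Price = '' }
    [pscustomobject]@{ Title = 'Card Two'; UUID = 'zzz'; Price = '' }   # has no pricing row
)

$PricingByUUID = @{}
foreach ($Row in $PricingData) { $PricingByUUID[$Row.UUID] = $Row }

# Collect the UUIDs that have no pricing record instead of pricing them with $null values
$Unmatched = @(foreach ($Card in $CardData) {
    if (-not $PricingByUUID.ContainsKey($Card.UUID)) {
        $Card.UUID
        continue
    }
    $PricingCard = $PricingByUUID[$Card.UUID]
    # Placeholder only: the real pricing algorithm would go here
    $Card.Price = [float]$PricingCard.'A-price'
})

$Unmatched   # UUIDs that were skipped
```

Note also that if the pricing file contains the same UUID more than once, the assignment into the hashtable keeps only the last row for that key.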

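Under the hood, hashtables (and most other dictionary types in .NET) are implemented so that looking up a key takes roughly constant time on average, no matter how many entries they hold. That is exactly what you want in this situation: one pass over the pricing data to build the index, then one cheap lookup per card, instead of a full scan of the pricing data for every card.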
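If you want to see the difference on your own machine, a rough timing sketch along these lines should do. It uses small synthetic data sets (300 rows each, an arbitrary size chosen so the slow variant still finishes quickly), and the names `$results`, `$scan` and `$indexed` are illustrative only. Actual timings will vary by machine and PowerShell version, so no figures are quoted here.

```powershell
# Synthetic data: 300 pricing rows and 300 cards sharing the same UUIDs
$n       = 300
$pricing = 1..$n | ForEach-Object { [pscustomobject]@{ UUID = "id$_"; 'A-price' = $_ } }
$cards   = 1..$n | ForEach-Object { [pscustomobject]@{ UUID = "id$_" } }

# Results are stored in a hashtable so they can be inspected after Measure-Command returns
$results = @{}

# Variant 1: scan the whole pricing array for every card
$scan = Measure-Command {
    $results.Scan = @(foreach ($card in $cards) {
        $pricing | Where-Object { $_.UUID -eq $card.UUID }
    })
}

# Variant 2: build the index once, then do one lookup per card
$indexed = Measure-Command {
    $index = @{}
    foreach ($row in $pricing) { $index[$row.UUID] = $row }
    $results.Indexed = @(foreach ($card in $cards) { $index[$card.UUID] })
}

'Where-Object scan: {0:N0} ms' -f $scan.TotalMilliseconds
'Hashtable lookup : {0:N0} ms' -f $indexed.TotalMilliseconds
```

Both variants find the same 300 pricing rows. The gap between them widens as the inputs grow, because the scan does work proportional to cards × pricing rows while the indexed version does work proportional to cards + pricing rows.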


 