Fastest way to match two large arrays of objects by key in PowerShell

I have two PowerShell arrays of objects generated via Import-Csv, and I must match them by one of their properties. It is a 1:n relationship, so currently I'm following this pattern:

foreach ($line in $array1) {
    $match=$array2 | where {$_.key -eq $line.key} # could be 1 or n results
    ...# process here the 1 to n lines
}

This is not very efficient (both tables have many columns) and takes an unacceptably long time for our needs. Is there a faster way to perform this match?

Both data sources come from CSV files, so an alternative to Import-Csv would also be welcome. Thanks.

The standard method is to index the data using a hashtable (or dictionary/map in other languages).

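function buildIndex($csv, [string]$keyName) {
    $index = @{}
    foreach ($row in $csv) {
        $key = $row.($keyName)
        $data = $index[$key]
        if ($data -is [Collections.ArrayList]) {
            # third or later row for this key: append to the existing list
            $data.add($row) >$null
        } elseif ($data) {
            # second row for this key: promote the single row to a list
            $index[$key] = [Collections.ArrayList]@($data, $row)
        } else {
            # first row for this key: store the row itself
            $index[$key] = $row
        }
    }
    $index
}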

$csv1 = Import-Csv 'r:\1.csv'
$csv2 = Import-Csv 'r:\2.csv'

# arguments are separated by spaces; a comma would pass both values as one array
$index2 = buildIndex $csv2 'key'

foreach ($row in $csv1) {
    $matchedInCsv2 = $index2[$row.key]
    foreach ($row2 in $matchedInCsv2) {
        # ........
    }
}
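As a variation on the index above, here is a minimal sketch in which every key maps to a list, even when only one row matches, so each lookup returns the same kind of object. The function name buildListIndex, the sample rows, and the column names name and value are made up for illustration, and the ::new() syntax assumes PowerShell 5 or later.

```powershell
# Hypothetical sample data standing in for the two Import-Csv results.
$csv1 = @(
    [pscustomobject]@{ key = 'a'; name = 'first' }
    [pscustomobject]@{ key = 'b'; name = 'second' }
    [pscustomobject]@{ key = 'c'; name = 'third' }   # no match in $csv2
)
$csv2 = @(
    [pscustomobject]@{ key = 'a'; value = 1 }
    [pscustomobject]@{ key = 'b'; value = 2 }
    [pscustomobject]@{ key = 'b'; value = 3 }
)

# Illustrative variant: each key always maps to a List[object].
# Assumes every row has a non-null value in the key column.
function buildListIndex($rows, [string]$keyName) {
    $index = @{}
    foreach ($row in $rows) {
        $key = $row.$keyName
        if (-not $index.ContainsKey($key)) {
            $index[$key] = [System.Collections.Generic.List[object]]::new()
        }
        $index[$key].Add($row)
    }
    $index
}

$index2 = buildListIndex $csv2 'key'

# Join: one output object per (row, matching row2) pair.
# Keys without a match yield $null, which foreach skips (PowerShell 3+).
$pairs = foreach ($row in $csv1) {
    foreach ($row2 in $index2[$row.key]) {
        [pscustomobject]@{ key = $row.key; name = $row.name; value = $row2.value }
    }
}
```

With this sample data, $pairs holds three objects: one for key a and two for key b. A hashtable created with @{} compares string keys case-insensitively, which matches the behaviour of -eq in the original where filter.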

Also, if you need speed and are iterating over a big collection, avoid the | pipeline, as it's many times slower than foreach/while/do statements. And don't use anything that takes a ScriptBlock, like where {$_.key -eq $line.key}, in your code, because execution context creation adds a ridiculously big overhead compared to the simple code inside.
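To check the difference on your own machine, a rough sketch like the following times the same filter written both ways. The collection size, the key values, and the variable names are arbitrary choices for illustration, and the actual numbers will depend on the PowerShell version and hardware.

```powershell
# Hypothetical data: 20,000 objects whose key cycles through 0..99.
$items = foreach ($i in 1..20000) { [pscustomobject]@{ key = $i % 100 } }

# Filter via the pipeline and a ScriptBlock.
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$hits1 = $items | Where-Object { $_.key -eq 42 }
$sw.Stop()
$pipelineMs = $sw.Elapsed.TotalMilliseconds

# Same filter via a foreach statement.
$sw.Restart()
$hits2 = foreach ($item in $items) { if ($item.key -eq 42) { $item } }
$sw.Stop()
$foreachMs = $sw.Elapsed.TotalMilliseconds

'{0} matches; pipeline: {1:N0} ms, foreach: {2:N0} ms' -f @($hits2).Count, $pipelineMs, $foreachMs
```

Both versions return the same rows; only the elapsed time differs. Note that this measures a single scan, whereas the loop in the question repeats such a scan once per row of the first array, which is why the cost adds up so quickly.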

