I have two PowerShell arrays of objects generated via Import-Csv, and I must match them by one of their properties. Specifically, it is a 1:n relationship, so currently I'm following this pattern:
foreach ($line in $array1) {
    $match = $array2 | where {$_.key -eq $line.key} # could be 1 or n results
    ... # process here the 1 to n lines
}
This is not very efficient (both tables have many columns) and takes an unacceptably long time for our needs. Is there a faster way to perform this match?
Both data sources come from CSV files, so an alternative to Import-Csv would also be welcome. Thanks.
The standard method is to index the data using a hashtable (or dictionary/map in other languages).
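function buildIndex($csv, [string]$keyName) {
    # maps each key value to a single row, or to an ArrayList of rows when the key repeats
    $index = @{}
    foreach ($row in $csv) {
        $key = $row.($keyName)
        $data = $index[$key]
        if ($data -is [Collections.ArrayList]) {
            # third or later row for this key: append to the existing list
            $data.add($row) >$null
        } elseif ($data) {
            # second row for this key: promote the single row to a list
            $index[$key] = [Collections.ArrayList]@($data, $row)
        } else {
            # first row for this key
            $index[$key] = $row
        }
    }
    $index
}

$csv1 = Import-Csv 'r:\1.csv'
$csv2 = Import-Csv 'r:\2.csv'
# arguments are separated by spaces; a comma would pass both values as one array
$index2 = buildIndex $csv2 'key'

foreach ($row in $csv1) {
    # a single row, an ArrayList of rows, or $null when there is no match
    $matchedInCsv2 = $index2[$row.key]
    foreach ($row2 in $matchedInCsv2) {
        # ........
    }
}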
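If you would rather have every index entry be the same type, a possible variant stores a list for every key, even when only one row matches. This is only a sketch: the function name buildListIndex, the inline sample data, and the column names name and value are made up for illustration, and [type]::new() assumes PowerShell 5 or later.

```powershell
# Hypothetical variant of buildIndex: every key maps to a list of rows
function buildListIndex($rows, [string]$keyName) {
    $index = @{}
    foreach ($row in $rows) {
        $key = $row.$keyName
        $list = $index[$key]
        if ($null -eq $list) {
            # first row for this key: create its list
            $list = [System.Collections.Generic.List[object]]::new()
            $index[$key] = $list
        }
        $list.Add($row)
    }
    $index
}

# Inline sample data standing in for the two CSV files (assumed columns)
$csv1 = @'
key,name
1,alpha
2,beta
3,gamma
'@ | ConvertFrom-Csv

$csv2 = @'
key,value
1,a
1,b
2,c
'@ | ConvertFrom-Csv

$index2 = buildListIndex $csv2 'key'

# Join: one output object per matching pair; keys without a match produce nothing
$joined = foreach ($row in $csv1) {
    foreach ($row2 in $index2[$row.key]) {
        [pscustomobject]@{ key = $row.key; name = $row.name; value = $row2.value }
    }
}
$joined
```

The lookup cost is the same as before; the only difference is that the code consuming the index never has to care whether it received a single row or a collection.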
Also, if you need speed and are iterating over a big collection, avoid | pipelining, as it is many times slower than foreach/while/do statements. And don't use anything with a ScriptBlock, like where {$_.key -eq $line.key}, in your code, because creating the execution context adds a ridiculously big overhead compared to the simple code inside.
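To see the difference on your own machine, a rough timing sketch along these lines may help. The synthetic data, the row count, and the variable names are all assumptions chosen for illustration, and the actual numbers will vary with the PowerShell version and the hardware.

```powershell
# Synthetic data: 2,000 rows with 100 distinct keys (sizes picked only for illustration)
$rows = foreach ($i in 1..2000) {
    [pscustomobject]@{ key = [string]($i % 100); value = $i }
}
$wanted = '42'

# Pipeline plus Where-Object: the script block is invoked once per row
$sw = [Diagnostics.Stopwatch]::StartNew()
$hitsPipeline = @($rows | Where-Object { $_.key -eq $wanted })
$msPipeline = $sw.Elapsed.TotalMilliseconds

# foreach statement with a plain if: no pipeline and no script block per row
$sw.Restart()
$hitsForeach = @(foreach ($r in $rows) { if ($r.key -eq $wanted) { $r } })
$msForeach = $sw.Elapsed.TotalMilliseconds

'pipeline: {0:N1} ms, foreach: {1:N1} ms' -f $msPipeline, $msForeach
```

Both approaches return the same rows, so only the elapsed times differ. Keep in mind that either one is still a full scan per lookup; for the 1:n match in the question, the hashtable index is what removes the repeated scanning.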