
Loop through huge CSV files

I have to loop through a file with ~50,000 rows each day to generate reports and import those data records into our database.

Since I have to execute some -replace statements and other processing, I currently loop through each row via foreach. This approach finishes after ~16 minutes:

$csv_file = ".\testfile.csv"
$csv_import = Import-Csv $csv_file -Delimiter ";" -Encoding "default"

function Import-CsvVersion1 {
    $results = @()

    foreach ($data in $csv_import) {
        $properties = [ordered]@{
            id          = $data."id"
            name        = $data."name"
            description = $data."description"
            netcost     = $data."netcost"
            rrp         = $data."rrp"
        }
        $results += New-Object PSObject -Property $properties
    }

    # Export $results into final csv
}
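The += in the version above is the main cost: PowerShell arrays are fixed-size, so each += allocates a new array and copies every existing element into it. A common alternative (not from the question; the function name Import-CsvWithList is hypothetical) is to collect rows in a generic List, which grows in place:

```powershell
# Sketch: same shape as Import-CsvVersion1, but collecting into a
# System.Collections.Generic.List instead of rebuilding an array
# with += on every row.
function Import-CsvWithList {
    $results = [System.Collections.Generic.List[object]]::new()

    foreach ($data in $csv_import) {
        $results.Add([PSCustomObject]@{
            id          = $data."id"
            name        = $data."name"
            description = $data."description"
            netcost     = $data."netcost"
            rrp         = $data."rrp"
        })
    }

    # Export $results into final csv
}
```

Appending to a List is amortized O(1), so this avoids the quadratic copying behaviour of += without changing the loop structure.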

I found another approach where the result of the foreach is assigned directly to the $results variable. This approach finished after ~8 minutes (so it needs only half the time):

$csv_file = ".\testfile.csv"
$csv_import = Import-Csv $csv_file -Delimiter ";" -Encoding "default"

function Import-CsvVersion2 {
    $results = foreach ($data in $csv_import) {
        [PSCustomObject]@{
            id          = $data."id"
            name        = $data."name"
            description = $data."description"
            netcost     = $data."netcost"
            rrp         = $data."rrp"
        }
    }

    # Export $results into final csv
}

I've read somewhere that a loop via ForEach-Object may be even faster, but unfortunately I don't know how to start with this.
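A ForEach-Object variant streams the rows through the pipeline instead of a language-level foreach. A minimal sketch of that shape (the actual Import-CsvVersion3 from @GuentherSchmitz is not shown in the question, so this is an assumption):

```powershell
# Sketch of a pipeline-based variant: each row from $csv_import is
# passed through ForEach-Object as $_ and projected into a new object.
function Import-CsvVersion3 {
    $results = $csv_import | ForEach-Object {
        [PSCustomObject]@{
            id          = $_."id"
            name        = $_."name"
            description = $_."description"
            netcost     = $_."netcost"
            rrp         = $_."rrp"
        }
    }

    # Export $results into final csv
}
```

Note that ForEach-Object incurs per-item pipeline overhead, so it is often slower than a plain foreach over an already-loaded array; its main advantage is streaming input without loading everything into memory first.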

Thanks to @GuentherSchmitz I was able to create a third test function. In the next test I used a CSV file with ~2,000 rows; the results were:

  • Import-CsvVersion1 -> 4 Minute(s) 24 Seconds
  • Import-CsvVersion2 -> 0 Minute(s) 18 Seconds
  • Import-CsvVersion3 -> 1 Minute(s) 20 Seconds

Thanks again for the help :-)

PS: I also got rid of a former Write-Progress call, which apparently slowed the script down by about 80%.
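Write-Progress is expensive because it redraws the progress display on every call. If progress reporting is still wanted, a common compromise is to update it only every N rows; a sketch, with an arbitrary interval of 1000:

```powershell
# Sketch: throttle Write-Progress to every 1000th row instead of
# calling it on each iteration.
$i = 0
$total = $csv_import.Count
foreach ($data in $csv_import) {
    $i++
    if ($i % 1000 -eq 0) {
        Write-Progress -Activity "Importing CSV" `
            -Status "$i of $total rows" `
            -PercentComplete (($i / $total) * 100)
    }
    # ... process $data ...
}
```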
