OledbDataReader using up all the RAM (in PowerShell)

From all my reading, the OleDb DataReader does not store records in memory, but this code is maxing out the RAM. It's meant to pull data from an Oracle DB (about 10M records) and write it to a GZIP file. I have tried everything (including commenting out the GZIP write) and it still ramps up the RAM until it falls over. Is there a way to just execute the reader without it staying in memory? What am I doing wrong?

$tableName='ACCOUNTS'
$fileNum=1
$gzFilename="c:\temp\gzip\$tableName.$fileNum.txt.gz"
$con=Open-Con ORA -tns $tns -userName $userName -fetchSize $fetchSize
$cmd = New-Object system.Data.OleDb.OleDbCommand($sql,$con);           
$cmd.CommandTimeout = '0';        
$output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
[System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
$encoding = [System.Text.Encoding]::UTF8
$reader=$cmd.ExecuteReader()
[int]$j=0
While ($reader.Read())            
{            
        $j++
        $str=$reader[0..$($reader.Fieldcount-1)] -join '|'                        
        $out=$encoding.GetBytes($("$str`n").ToString() )
        $gzipStream.Write($out,0, $out.length)
        if($j % 10000 -eq 0){write-host $j}
        if($j % 1000000 -eq 0){
            write-host 'creating new gz file'
            $gzipStream.Close();
            $gzipStream.Dispose()
            $fileNum+=1
            $gzFilename="c:\temp\gzip\$tableName.$fileNum.txt.gz"
            $output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
            [System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
            }
}

Edit: from the comments, [system.gc]::Collect() had no effect. Also, stripping it down to the simplest form and reading only a single field made no difference. This code ramps up to 16GB of memory (viewed in Task Manager) and then quits with an OOM error.

$con=Open-Con ORA -tns $tns -userName $userName -fetchSize $fetchSize
$cmd = New-Object system.Data.OleDb.OleDbCommand($sql,$con);           
$cmd.CommandTimeout = '0';        
            
$reader=$cmd.ExecuteReader()
[int]$j=0
While ($reader.Read())            
{            
    $str=$reader[0]                       
}    
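
To watch the numbers without Task Manager, a sketch that logs this process's memory counters every 100k rows (added here for illustration; it makes it easier to tell whether working set, private bytes, or virtual address space is what actually grows):

$proc = [System.Diagnostics.Process]::GetCurrentProcess()
[int]$j=0
While ($reader.Read())
{
    $j++
    $str=$reader[0]
    if ($j % 100000 -eq 0) {
        $proc.Refresh()   # re-read this process's memory counters
        write-host ("rows={0} workingSet={1:N0}MB private={2:N0}MB virtual={3:N0}MB" -f `
            $j, ($proc.WorkingSet64/1MB), ($proc.PrivateMemorySize64/1MB), ($proc.VirtualMemorySize64/1MB))
    }
}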

Possibly it's using up virtual address space rather than actual RAM. That's a common problem with the underlying .NET garbage collector used with (at least) the ADO.NET and string objects created here, especially if any of the records have fields with lots of text.
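
One thing that might be worth a try, though I can't promise it helps here: passing CommandBehavior.SequentialAccess to ExecuteReader, which asks the provider to stream large columns as they are read instead of buffering each whole row; that's relevant if big text fields are the culprit. A minimal sketch, assuming the same $cmd as in the question:

$reader = $cmd.ExecuteReader([System.Data.CommandBehavior]::SequentialAccess)
While ($reader.Read())
{
    # SequentialAccess requires reading columns in ascending order; very large
    # text columns could additionally be read in chunks with GetChars().
    for ($k=0; $k -lt $reader.FieldCount; $k++) {
        $val = $reader.GetValue($k)
    }
}
$reader.Dispose()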

Building on that, it looks like you're doing most of the correct things to avoid this issue (using a DataReader, writing directly to a stream, etc.). What you could do to improve it is write to the stream one field at a time, rather than using -join to push all the fields into one string before writing, and make sure to reuse the same $out byte buffer (though I'm not sure exactly what that last part looks like in PowerShell or with Encoding.GetBytes(); see the sketch after the code below).

This may help, but it can still create issues with how it concatenates the field delimiter and line terminator. If you find this runs longer but still eventually produces an error, you probably need to do the tedious work of issuing a separate write operation to the gzip stream for each of those values.

$tableName='ACCOUNTS'
$fileNum=1
$gzFilename="c:\temp\gzip\$tableName.$fileNum.txt.gz"
$con=Open-Con ORA -tns $tns -userName $userName -fetchSize $fetchSize
$cmd = New-Object system.Data.OleDb.OleDbCommand($sql,$con);           
$cmd.CommandTimeout = '0';        
$output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
[System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
$encoding = [System.Text.Encoding]::UTF8
$reader=$cmd.ExecuteReader()
[int]$j=0
While ($reader.Read())            
{            
        $j++
        # Build the row one field at a time instead of joining it into one string
        $fieldDelimiter = ""
        $terminator = ""
        for ($k=0; $k -lt $reader.FieldCount; $k++) {
            if ($k -eq $reader.FieldCount - 1) { $terminator = "`n" }

            $out = $encoding.GetBytes("$fieldDelimiter$($reader[$k])$terminator")
            $gzipStream.Write($out, 0, $out.Length)

            $fieldDelimiter = "|"   # every field after the first is preceded by the delimiter
        }

        if($j % 10000 -eq 0){write-host $j}
        if($j % 1000000 -eq 0){
            # Roll over to a new gzip file every 1M rows; disposing the GzipStream
            # also closes the underlying FileStream
            write-host 'creating new gz file'
            $gzipStream.Dispose()
            $fileNum+=1
            $gzFilename="c:\temp\gzip\$tableName.$fileNum.txt.gz"
            $output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
            [System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
            }
}
$gzipStream.Dispose()   # without this the final file is missing its buffered data and gzip footer
$reader.Dispose()
$con.Close()
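
And for the $out buffer reuse I mentioned above, here is what I'd guess the row loop looks like in PowerShell (the 64KB starting size and the GetMaxByteCount-based resizing are my own choices, not something from the question). The delimiter and terminator are encoded once up front, and the Encoding.GetBytes overload that fills an existing byte array is used, so steady-state rows don't allocate new buffers:

$delimBytes = $encoding.GetBytes('|')
$nlBytes = $encoding.GetBytes("`n")
$buf = New-Object byte[] 65536            # reused across fields; grown only when needed
While ($reader.Read())
{
    for ($k=0; $k -lt $reader.FieldCount; $k++) {
        if ($k -gt 0) { $gzipStream.Write($delimBytes, 0, $delimBytes.Length) }
        $s = [string]$reader[$k]          # DBNull stringifies to ""
        $max = $encoding.GetMaxByteCount($s.Length)
        if ($max -gt $buf.Length) { $buf = New-Object byte[] $max }
        $n = $encoding.GetBytes($s, 0, $s.Length, $buf, 0)   # fills $buf, returns bytes written
        $gzipStream.Write($buf, 0, $n)
    }
    $gzipStream.Write($nlBytes, 0, $nlBytes.Length)
}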
