Process a CSV file starting at a predetermined line/row using the LumenWorks parser
I am using the LumenWorks awesome CSV reader to process CSV files. Some files have over 1 million records.
What I want is to process the file in sections. E.g. I want to process the first 100,000 records, validate the data, and then send those records over an Internet connection. Once sent, I then reopen the file and continue from record 100,001, on and on until I finish processing the file. In my application I have already created the logic for keeping track of which record I am currently processing.
Does the LumenWorks parser support processing from a predetermined line in the CSV, or does it always have to start from the top? I see it has a buffer variable. Is there a way to use this buffer variable to achieve my goal?
my_csv = New CsvReader(New StreamReader(file_path), False, ","c, buffer_variable)
It seems the LumenWorks CSV Reader needs to start at the top. I needed to ignore the first n lines in a file and attempted to pass in a StreamReader that was already at the correct position/row, but I got a "Key already exists" Dictionary error when I attempted to get the FieldCount (there were no duplicates).
However, I have found some success by first reading the pre-trimmed file into a StringBuilder and then into a StringReader to allow the CSV Reader to read it. Your mileage may vary with huge files, but it does help to trim a file:
using (StreamReader sr = new StreamReader(filePath))
{
    string line = sr.ReadLine();
    StringBuilder sbCsv = new StringBuilder();
    int lineNumber = 0;
    do
    {
        lineNumber++;
        // Ignore the start rows of the CSV file until we reach the header
        if (lineNumber >= Constants.HeaderStartingRow)
        {
            // Place into StringBuilder
            sbCsv.AppendLine(line);
        }
    }
    while ((line = sr.ReadLine()) != null);

    // Use a StringReader to read the trimmed CSV file into a CSV Reader
    using (StringReader str = new StringReader(sbCsv.ToString()))
    using (CsvReader csv = new CsvReader(str, true))
    {
        int fieldCount = csv.FieldCount;
        string[] headers = csv.GetFieldHeaders();
        while (csv.ReadNextRecord())
        {
            for (int i = 0; i < fieldCount; i++)
            {
                // Do Work
            }
        }
    }
}
You might be able to adapt this solution to reading chunks of a file - e.g. as you read through the StreamReader, assign different "chunks" to a Collection of StringBuilder objects, and also prepend the header row to each chunk if you want it.
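A minimal sketch of that chunking idea, assuming the LumenWorks CsvReader constructor shown earlier; chunkSize and filePath are illustrative names, not from the original post, and holding every chunk in memory at once may not suit very large files:

using System.Collections.Generic;
using System.IO;
using System.Text;
using LumenWorks.Framework.IO.Csv;

const int chunkSize = 100000;
var chunks = new List<StringBuilder>();

using (StreamReader sr = new StreamReader(filePath))
{
    string header = sr.ReadLine();      // keep the header row for every chunk
    StringBuilder current = null;
    int rowsInChunk = chunkSize;        // forces a new chunk on the first data row
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        if (rowsInChunk == chunkSize)
        {
            current = new StringBuilder();
            current.AppendLine(header); // prepend the header to each chunk
            chunks.Add(current);
            rowsInChunk = 0;
        }
        current.AppendLine(line);
        rowsInChunk++;
    }
}

// Each chunk now parses independently, so a batch can be
// validated and sent before moving to the next one.
foreach (StringBuilder chunk in chunks)
{
    using (CsvReader csv = new CsvReader(new StringReader(chunk.ToString()), true))
    {
        while (csv.ReadNextRecord())
        {
            // validate / send this batch
        }
    }
}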
Try using CachedCsvReader instead of CsvReader, together with its methods MoveTo(long recordNumber), MoveToStart, etc.
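A sketch of that suggestion, assuming the CachedCsvReader type from the same LumenWorks.Framework.IO.Csv namespace; note that CachedCsvReader caches records in memory, so jumping ahead in a million-record file may still be expensive, and whether MoveTo is zero- or one-based should be verified against the library:

using System.IO;
using LumenWorks.Framework.IO.Csv;

using (CachedCsvReader csv = new CachedCsvReader(new StreamReader(filePath), true))
{
    // Jump to the record where the previous batch left off
    // (index base is an assumption - check the library's documentation).
    csv.MoveTo(100000);

    while (csv.ReadNextRecord())
    {
        // process records from this point on
    }

    csv.MoveToStart(); // rewind if another pass is needed
}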