Efficiently join multiple CSV files keeping the header from first file in C#
There are multiple CSV files, each of which can be hundreds of megabytes or more. They all start with the same header row and have CRLF at the end of each line. Each file may or may not have a CRLF at the end of the file.
The goal is to join them into a single file that keeps the header from the first file only.
Given the size of the files, this needs to be as fast and memory-efficient as possible.
If the headers are the same, you can just open one write stream, then go through the input files in turn, opening a read stream for each and copying its data across. The first file is copied in its entirety. Subsequent files have their first line skipped.
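A minimal sketch of that stream-copy approach, under some assumptions that are not in the question: the files use an encoding where a line feed is the single byte 0x0A (ASCII, UTF-8 or similar, not UTF-16), and the header itself contains no quoted line break. The names `CsvJoiner`, `Join` and the 1 MiB buffer size are only illustrative. It also inserts a CRLF between files when the previous file did not end with one, so rows do not run together.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvJoiner
{
    private const int BufferSize = 1 << 20;            // 1 MiB, an arbitrary choice
    private static readonly byte[] CrLf = { 0x0D, 0x0A };

    // Copies the first file whole, then every later file minus its first line.
    public static void Join(IReadOnlyList<string> inputPaths, string outputPath)
    {
        byte[] buffer = new byte[BufferSize];
        using var output = new FileStream(outputPath, FileMode.Create,
            FileAccess.Write, FileShare.None, BufferSize);

        // True while the output is empty or its last byte is a line feed.
        bool endsWithNewline = true;

        for (int i = 0; i < inputPaths.Count; i++)
        {
            using var input = new FileStream(inputPaths[i], FileMode.Open,
                FileAccess.Read, FileShare.Read, BufferSize, FileOptions.SequentialScan);

            bool skippingHeader = i > 0;                // keep the header of file 0 only
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                int start = 0;
                if (skippingHeader)
                {
                    // Assumption: the header line ends at the first 0x0A byte.
                    int lf = Array.IndexOf(buffer, (byte)'\n', 0, read);
                    if (lf < 0) continue;               // header is longer than this chunk
                    start = lf + 1;
                    skippingHeader = false;
                    if (start == read) continue;        // chunk ended exactly at the header
                }

                // Previous data had no trailing CRLF: add one before appending rows.
                if (!endsWithNewline)
                {
                    output.Write(CrLf, 0, CrLf.Length);
                }

                output.Write(buffer, start, read - start);
                endsWithNewline = buffer[read - 1] == (byte)'\n';
            }
        }
    }
}
```

Called as `CsvJoiner.Join(new[] { "a.csv", "b.csv" }, "joined.csv")`, this keeps only one buffer in memory regardless of file size and never decodes the bytes into strings. A UTF-8 byte order mark, if present, sits on the header line and is therefore dropped from the later files along with it.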
That approach would be the fastest, as long as you are 100% sure the columns align and only the first line needs skipping.
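If you are not that sure, one cheap safeguard is to compare the header lines before copying anything. The sketch below is only an illustration (the names `CsvHeaderCheck` and `AllHeadersMatch` are made up here). It reads just the first line of each file, so its cost is negligible next to the copy, and it assumes the header contains no quoted line break.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvHeaderCheck
{
    // Returns true when every file's first line equals the first file's first line.
    public static bool AllHeadersMatch(IReadOnlyList<string> inputPaths)
    {
        string expected = null;
        bool haveExpected = false;

        foreach (string path in inputPaths)
        {
            // StreamReader detects a byte order mark by default, so a BOM
            // does not end up inside the compared string.
            using var reader = new StreamReader(path);
            string header = reader.ReadLine();          // null for an empty file

            if (!haveExpected)
            {
                expected = header;
                haveExpected = true;
            }
            else if (!string.Equals(header, expected, StringComparison.Ordinal))
            {
                return false;
            }
        }
        return true;
    }
}
```

This is an exact, case-sensitive comparison. It only confirms the header text is identical, not that every data row has the same number of columns.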
By the way, this kind of thing would be quite straightforward to do on a Unix-style command line.