Combine multiple files into single file
Code:
static void MultipleFilesToSingleFile(string dirPath, string filePattern, string destFile)
{
    string[] fileAry = Directory.GetFiles(dirPath, filePattern);
    Console.WriteLine("Total File Count : " + fileAry.Length);
    using (TextWriter tw = new StreamWriter(destFile, true))
    {
        foreach (string filePath in fileAry)
        {
            using (TextReader tr = new StreamReader(filePath))
            {
                tw.WriteLine(tr.ReadToEnd());
                tr.Close();
                tr.Dispose();
            }
            Console.WriteLine("File Processed : " + filePath);
        }
        tw.Close();
        tw.Dispose();
    }
}
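I need to optimize it because it is extremely slow: it takes 3 minutes for 45 XML files with an average size of 40 to 50 MB.
Please note: 45 files of 45 MB on average is just one example. It can be n files of size m, where n is in the thousands and m averages 128 KB. In short, it can vary.
Could you provide any suggestions on optimization?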
Why not just use the Stream.CopyTo(Stream destination) method?
private static void CombineMultipleFilesIntoSingleFile(string inputDirectoryPath, string inputFileNamePattern, string outputFilePath)
{
    string[] inputFilePaths = Directory.GetFiles(inputDirectoryPath, inputFileNamePattern);
    Console.WriteLine("Number of files: {0}.", inputFilePaths.Length);
    using (var outputStream = File.Create(outputFilePath))
    {
        foreach (var inputFilePath in inputFilePaths)
        {
            using (var inputStream = File.OpenRead(inputFilePath))
            {
                // Buffer size can be passed as the second argument.
                inputStream.CopyTo(outputStream);
            }
            Console.WriteLine("The file {0} has been processed.", inputFilePath);
        }
    }
}
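Please note that the method above is overloaded.
There are two overloads: CopyTo(Stream destination) and CopyTo(Stream destination, int bufferSize).
The second overload allows the buffer size to be tuned through the bufferSize parameter.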
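To illustrate the bufferSize overload, here is a minimal sketch. The class name CopyToBufferSketch, the helper AppendFile, and the 128 KB default are assumptions made for this example; they are not part of the answer above, and the best buffer size should be measured for your own files and disks.

```csharp
using System;
using System.IO;

static class CopyToBufferSketch
{
    // Hypothetical helper: appends one file to an already open output stream.
    // The 128 KB default is an assumed starting point, not a measured optimum.
    public static void AppendFile(string inputFilePath, Stream outputStream, int bufferSize = 128 * 1024)
    {
        using (var inputStream = File.OpenRead(inputFilePath))
        {
            // Stream.CopyTo(Stream, int) copies in chunks of bufferSize bytes.
            inputStream.CopyTo(outputStream, bufferSize);
        }
    }
}
```

Inside the foreach loop of the method above, the call would look like CopyToBufferSketch.AppendFile(inputFilePath, outputStream, 256 * 1024), which makes it easy to try a few sizes and compare timings.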
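You can do several things:
In my experience, the default buffer sizes can be increased with noticeable benefit up to about 120K. I suspect that setting a large buffer on all streams will be the easiest and most noticeable performance boost: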
new System.IO.FileStream("File.txt", System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read, 150000);
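Use the Stream class rather than the StreamReader class.
[...] using statement.
One option is to use the copy command and let it do what it does well.
Something like: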
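static void MultipleFilesToSingleFile(string dirPath, string filePattern, string destFile)
{
    // /b copies in binary mode; the quotes protect paths that contain spaces.
    var cmd = new ProcessStartInfo("cmd.exe",
        String.Format("/c copy /b \"{0}\" \"{1}\"", filePattern, destFile));
    cmd.WorkingDirectory = dirPath;
    cmd.UseShellExecute = false;
    // Wait for the copy to finish before returning.
    using (var process = Process.Start(cmd))
    {
        process.WaitForExit();
    }
}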
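I would use a BlockingCollection to read, so that you can read and write concurrently.
Clearly you should write to a separate physical disk to avoid hardware contention. This code will preserve order.
Reading is going to be faster than writing, so there is no need for parallel reads.
Again, since reading is faster, limit the size of the collection so that reading does not get further ahead of writing than it needs to.
A simple task that reads the single next file in parallel while the current one is being written has a problem with differing file sizes: writing a small file is faster than reading a big one.
I use this pattern to read and parse text on T1 and then insert into SQL on T2.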
public void WriteFiles()
{
    using (BlockingCollection<string> bc = new BlockingCollection<string>(10))
    {
        // play with 10 if you have several small files then a big file
        // write can get ahead of read if not enough are queued
        TextWriter tw = new StreamWriter(@"c:\temp\alltext.text", true);
        // clearly you want to write to a different physical disk
        // ideally write to solid state even if you move the files to regular disk when done
        // Spin up a Task to populate the BlockingCollection
        using (Task t1 = Task.Factory.StartNew(() =>
        {
            string dir = @"c:\temp\";
            string fileText;
            int minSize = 100000; // play with this
            StringBuilder sb = new StringBuilder(minSize);
            string[] fileAry = Directory.GetFiles(dir, @"*.txt");
            foreach (string fi in fileAry)
            {
                Debug.WriteLine("Add " + fi);
                fileText = File.ReadAllText(fi);
                //bc.Add(fi); for testing just add filepath
                if (fileText.Length > minSize)
                {
                    if (sb.Length > 0)
                    {
                        bc.Add(sb.ToString());
                        sb.Clear();
                    }
                    bc.Add(fileText); // could be really big so don't hit sb
                }
                else
                {
                    sb.Append(fileText);
                    if (sb.Length > minSize)
                    {
                        bc.Add(sb.ToString());
                        sb.Clear();
                    }
                }
            }
            if (sb.Length > 0)
            {
                bc.Add(sb.ToString());
                sb.Clear();
            }
            bc.CompleteAdding();
        }))
        {
            // Spin up a Task to consume the BlockingCollection
            using (Task t2 = Task.Factory.StartNew(() =>
            {
                string text;
                try
                {
                    while (true)
                    {
                        text = bc.Take();
                        Debug.WriteLine("Take " + text);
                        tw.WriteLine(text);
                    }
                }
                catch (InvalidOperationException)
                {
                    // An InvalidOperationException means that Take() was called on a completed collection
                    Debug.WriteLine("That's All!");
                    tw.Close();
                    tw.Dispose();
                }
            }))
                Task.WaitAll(t1, t2);
        }
    }
}
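As a variation on the same producer/consumer idea, the consumer loop can be written with GetConsumingEnumerable, which ends on its own once CompleteAdding has been called, so no exception is needed for control flow. The sketch below is not the code from the answer above: the names PipelinedCombineSketch and Combine are made up for illustration, it moves raw bytes instead of text, it writes no newline between files, and it skips the batching of small files. It also assumes the writer does not fail; if it did, the reader could block on Add, so real code would probably want a CancellationToken.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

static class PipelinedCombineSketch
{
    // Hypothetical helper: reads inputFilePaths in order on one task and
    // writes them to outputFilePath on another. boundedCapacity is the
    // maximum number of whole files queued in memory at once.
    public static void Combine(string[] inputFilePaths, string outputFilePath, int boundedCapacity = 10)
    {
        using (var queue = new BlockingCollection<byte[]>(boundedCapacity))
        {
            Task reader = Task.Run(() =>
            {
                try
                {
                    foreach (string path in inputFilePaths)
                    {
                        // Add blocks while the queue is full, so reading cannot run far ahead of writing.
                        queue.Add(File.ReadAllBytes(path));
                    }
                }
                finally
                {
                    // Always signal completion so the writer loop can end.
                    queue.CompleteAdding();
                }
            });

            Task writer = Task.Run(() =>
            {
                using (var output = File.Create(outputFilePath))
                {
                    // Ends normally once CompleteAdding was called and the queue is empty.
                    foreach (byte[] chunk in queue.GetConsumingEnumerable())
                    {
                        output.Write(chunk, 0, chunk.Length);
                    }
                }
            });

            Task.WaitAll(reader, writer);
        }
    }
}
```

The default BlockingCollection is backed by a FIFO queue, so with one reader and one writer the files are written in the order they were read.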
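I tried the solution posted by sergey-brunov to merge files totalling 2 GB. The system used about 2 GB of RAM for this work. I made some changes to optimize it further, and it now needs 350 MB of RAM to merge 2 GB of files.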
private static void CombineMultipleFilesIntoSingleFile(string inputDirectoryPath, string inputFileNamePattern, string outputFilePath)
{
    string[] inputFilePaths = Directory.GetFiles(inputDirectoryPath, inputFileNamePattern);
    Console.WriteLine("Number of files: {0}.", inputFilePaths.Length);
    foreach (var inputFilePath in inputFilePaths)
    {
        using (var outputStream = File.AppendText(outputFilePath))
        {
            // ReadAllText loads the whole input file into memory before it is written.
            outputStream.WriteLine(File.ReadAllText(inputFilePath));
            Console.WriteLine("The file {0} has been processed.", inputFilePath);
        }
    }
}
// Binary File Copy
public static void mergeFiles(string strFileIn1, string strFileIn2, string strFileOut, out string strError)
{
    strError = String.Empty;
    try
    {
        using (FileStream streamIn1 = File.OpenRead(strFileIn1))
        using (FileStream streamIn2 = File.OpenRead(strFileIn2))
        // File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes.
        using (FileStream writeStream = File.Create(strFileOut))
        {
            // create a buffer to hold the bytes. Might be bigger.
            byte[] buffer = new byte[1024];
            int bytesRead;
            // while the read method returns bytes keep writing them to the output stream
            while ((bytesRead = streamIn1.Read(buffer, 0, buffer.Length)) > 0)
            {
                writeStream.Write(buffer, 0, bytesRead);
            }
            while ((bytesRead = streamIn2.Read(buffer, 0, buffer.Length)) > 0)
            {
                writeStream.Write(buffer, 0, bytesRead);
            }
        }
    }
    catch (Exception ex)
    {
        strError = ex.Message;
    }
}