What am I doing wrong here? My foreach loop is very slow
I am trying to concatenate the strings from two files and save the result in a third file. When the first two files contain many records (say 100,000+), my output file takes a long time to generate. What am I doing wrong here? Can someone please help?
var fileA = File.ReadAllLines("File1.txt");
var fileB = File.ReadAllLines("File2.txt");
Then I do a Cartesian product of the rows in the files, N x M, where N and M are the numbers of rows in File1 and File2. So if File1 and File2 contain 100 and 50 records respectively, the output has 100 * 50 = 5000 records.
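The code that builds `cartesian` is not shown here. A minimal sketch of one way it might look, assuming a LINQ cross join that concatenates each pair of lines (the `CartesianSketch` class and `Cross` method are hypothetical names, not taken from the post):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CartesianSketch
{
    // Hypothetical helper: pairs every line of a with every line of b and
    // concatenates them. The query is lazy, so nothing is computed until
    // the result is enumerated.
    public static IEnumerable<string> Cross(IEnumerable<string> a, IEnumerable<string> b)
    {
        return from x in a
               from y in b
               select x + y;
    }
}
```

Under that assumption, `var cartesian = CartesianSketch.Cross(fileA, fileB);` would yield N x M strings, ordered by the rows of the first file and then the rows of the second.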
FileStream fs = new FileStream("OutputFile.txt", FileMode.Create);
// First, save the standard output.
TextWriter tmp = Console.Out;
StreamWriter sw = new StreamWriter(fs);
foreach (var lst in cartesian)
{
Console.WriteLine(lst);
Console.SetOut(sw);
Console.WriteLine(lst);
Console.SetOut(tmp);
Console.WriteLine(lst);
}
sw.Close();
I don't think you're doing anything wrong. It legitimately takes a long time to do a Cartesian join of 100,000 x 100,000 records. You might improve performance a little by doing the join with nested for loops instead of LINQ, but your process is probably I/O bound.
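As a rough illustration of the nested-loop variant, here is a minimal sketch that writes each pairing straight to a writer instead of going through LINQ. The `NestedLoopJoin` class and `WriteCross` method are hypothetical names, and the sketch assumes the lines are simply concatenated, which the question does not spell out.

```csharp
using System;
using System.IO;

public static class NestedLoopJoin
{
    // Hypothetical helper: writes every pairing of a line from a with a line
    // from b to the writer, without materializing the product in memory.
    // Returns the number of lines written.
    public static long WriteCross(string[] a, string[] b, TextWriter writer)
    {
        long written = 0;
        for (int i = 0; i < a.Length; i++)
        {
            for (int j = 0; j < b.Length; j++)
            {
                // Two writes avoid allocating a concatenated string per pair.
                writer.Write(a[i]);
                writer.WriteLine(b[j]);
                written++;
            }
        }
        return written;
    }
}
```

It could be called as `NestedLoopJoin.WriteCross(fileA, fileB, sw)` with `sw` being the `StreamWriter` opened on the output file. Whether this is measurably faster than LINQ would need to be timed, since the work is likely dominated by I/O.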
Note that you don't need to use Console.SetOut; you can call WriteLine directly on sw:
foreach (var lst in cartesian)
{
Console.WriteLine(lst);
sw.WriteLine(lst);
// and if you want to do it again: Console.WriteLine(lst);
}
Console.WriteLine() is relatively heavy when it writes to stdout. See this test: first I just output 100000 lines to a text file with no other processing; in the second test I write to stdout twice and call SetOut once each iteration. This differs slightly from your code, which also writes to stdout twice but calls SetOut twice every iteration instead of once.
FileStream fs = new FileStream(@"c:\temp\OutputFile.txt", FileMode.Create);
StreamWriter sw = new StreamWriter(fs);
TextWriter tmp = Console.Out; // stdout since it hasn't been changed
Console.SetOut(sw); // point to file
var stopw = Stopwatch.StartNew();
for (int i = 0; i < 100000; i++)
{
Console.WriteLine(i); // writes to file
}
sw.Dispose();
fs.Dispose();
var toFileTotalMs = stopw.Elapsed.TotalMilliseconds;
// Reset console to write to stdout
Console.SetOut(tmp);
stopw.Restart();
for (int i = 0; i < 100000; i++)
{
Console.WriteLine(i); // writes to stdout
Console.SetOut(tmp); // point to stdout (every iteration)
Console.WriteLine(i); // writes to stdout
}
var toConsoleTotalMs = stopw.Elapsed.TotalMilliseconds;
Console.WriteLine($"toFileTotalMs={toFileTotalMs}; toConsoleTotalMs={toConsoleTotalMs};");
Console.Read(); // leaves console window open
Outputs:
toFileTotalMs=17.7198; toConsoleTotalMs=15964.9133;
So it takes about 900 times longer to do two Console.WriteLine() calls to stdout plus a SetOut call than it does to just write to the file. I also tried changing the original for loop to call SetOut every iteration in addition to writing to the file, and it went from 17.7 ms to 43.8 ms.
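Given those timings, one option is to make the console echo optional and never touch Console.SetOut. The following is only a sketch under that assumption; the `OutputSketch` class and `WriteAll` method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class OutputSketch
{
    // Hypothetical helper: writes each line to the file writer and, only if
    // an echo writer is supplied, to that writer as well.
    // Returns the number of lines processed.
    public static int WriteAll(IEnumerable<string> lines, TextWriter file, TextWriter echo = null)
    {
        int count = 0;
        foreach (var line in lines)
        {
            file.WriteLine(line);
            if (echo != null)
            {
                echo.WriteLine(line);
            }
            count++;
        }
        return count;
    }
}
```

Calling `OutputSketch.WriteAll(cartesian, sw)` would write to the file only, while passing `Console.Out` as the third argument would restore the slow console echo when it is really needed.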