What is the equivalent 'streams' code of TStringList.SaveToFile and which is better for large amounts of data?

The following console application uses TStringList.SaveToFile to write multiple lines to a text file:

program Project1;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  System.Classes;
var
  i: Integer;
  a,b,c: Single;
  myString : String;
  myStringList : TStringList;
begin
  try
    Randomize;
    myStringList := TStringList.Create;
    try
      for i := 0 to 1000000 do // 1000001 lines in total
      begin
        a := Random;
        b := Random;
        c := Random;
        // Three tab-separated values per line
        myString := FloatToStr(a) + Char(9) + FloatToStr(b) + Char(9) + FloatToStr(c);
        myStringList.Add(myString);
      end;
      myStringList.SaveToFile('Output.txt');
    finally
      myStringList.Free; // release the list even if saving raises an exception
    end;
    WriteLn('Done');
    Sleep(10000);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

It takes around 3 seconds to write a >50MB file with 1000001 lines and seems to work fine. However, many people advocate using streams for such processes. What would the stream equivalent be, and what are the advantages and disadvantages of using it compared to TStringList.SaveToFile?

It may be faster to write directly to a stream. Or it may not. I suggest you try it out and time both options. Writing to a stream looks like this:

for i := 0 to 1000000 do
begin
  a := Random;
  b := Random;
  c := Random;
  myString := FloatToStr(a) + Char(9) + FloatToStr(b) + Char(9) + 
    FloatToStr(c) + sLineBreak;
  Stream.WriteBuffer(myString[1], Length(myString)*SizeOf(myString[1]));
end;

To have any hope of this version being fast, you need to use a buffered stream. Try this one: "Buffered files (for faster disk access)".

The code above will output UTF-16 text on modern Delphi. If you want to output ANSI text, simply declare myString as AnsiString.
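Combining the two points above (buffering and ANSI output), a minimal sketch might look like the following. It assumes Delphi 10.1 Berlin or later, where System.Classes is expected to provide TBufferedFileStream. On older versions you would need a third-party buffered stream instead. The program name and the Line variable are illustrative only.

```delphi
program BufferedStreamSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

var
  i: Integer;
  Line: AnsiString;
  Stream: TStream;
begin
  Randomize;
  // Assumption: TBufferedFileStream is available in this Delphi version.
  Stream := TBufferedFileStream.Create('Output.txt', fmCreate);
  try
    for i := 0 to 1000000 do
    begin
      // Build one tab-separated line and convert it to single-byte text
      Line := AnsiString(FloatToStr(Random) + #9 + FloatToStr(Random) + #9 +
        FloatToStr(Random) + sLineBreak);
      // Length of an AnsiString is its byte count, so no SizeOf factor is needed
      Stream.WriteBuffer(Line[1], Length(Line));
    end;
  finally
    Stream.Free; // freeing the stream writes out whatever is still buffered
  end;
end.
```

Because only one line is held in memory at a time, this variant keeps the memory footprint small regardless of the file size.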

I'll let you do the timing, but my guess is that this variant performs similarly to the string list. I suspect that the time is spent calling Random and FloatToStr. I expect that the file saving with the string list is already very fast.

Putting speed to one side, there is another benefit of this approach. In the string list approach, as per the code in the question, the entire content of the text file is stored in memory. And when you save the file, another copy is made as part of the save procedure. So you will have two copies of the entire file in memory.

In contrast, when saving directly to a stream, the only memory requirement is whatever buffer your stream class uses. For a 50MB file as per the question, there's likely no real problem with either approach. For a much larger file, you will run into out-of-memory errors if you try to hold the entire file in memory.


Personally though, I'd consider making use of the TStreamWriter class. This useful class separates the concerns of writing data (text, values etc.) from the concern of pushing to a stream. Your code would become:

Writer := TStreamWriter.Create(Stream);//use whatever stream you like
try
  for i := 0 to 1000000 do
  begin
    a := Random;
    b := Random;
    c := Random;
    Writer.WriteLine(FloatToStr(a) + Char(9) + FloatToStr(b) + Char(9) +
      FloatToStr(c));
  end;
finally
  Writer.Free;
end;

TStreamWriter implements buffering with a 1KB buffer, so you can use TFileStream and expect to get reasonable performance.
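For completeness, here is a sketch of the fragment above as a whole program, using a TFileStream as the underlying stream. The program name is illustrative, and the text encoding is left at the class default. Note that when a writer is created this way it does not take ownership of the stream, so the stream is freed separately.

```delphi
program StreamWriterSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

var
  i: Integer;
  a, b, c: Single;
  Stream: TFileStream;
  Writer: TStreamWriter;
begin
  Randomize;
  Stream := TFileStream.Create('Output.txt', fmCreate);
  try
    Writer := TStreamWriter.Create(Stream);
    try
      for i := 0 to 1000000 do
      begin
        a := Random;
        b := Random;
        c := Random;
        // WriteLine appends the line break, so none is added here
        Writer.WriteLine(FloatToStr(a) + Char(9) + FloatToStr(b) + Char(9) +
          FloatToStr(c));
      end;
    finally
      Writer.Free; // flushes any buffered text to the stream
    end;
  finally
    Stream.Free; // the writer does not own the stream, so free it here
  end;
end.
```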


I would recommend that you choose the technique that leads to the most readable code. If performance becomes an issue you can optimise that later. My personal preference would be for TStreamWriter. This gives very clean and readable code, yet also excellent separation of content generation from streaming. The performance is perfectly reasonable also.

A TFileStream-based solution would look as follows, but there are some important points:

  • The TFileStream code is slower. There's no buffering in TFileStream, and writing 20 bytes at a time to a file is not efficient. The TStringList buffers everything in RAM and saves it all at once. That's optimal for speed, but it uses a lot of RAM.
  • In the TStringList-based variant, 50% of the time is spent in Random, which is actually as expected.
  • For the TFileStream solution to become more efficient, you'd need to roll a buffering scheme so that you write a reasonable amount to disk each time (example: 4 KB).

Code:

program Project9;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  SysUtils,
  Classes,
  DateUtils;
var
  i: Integer;
  a,b,c: Single;
  myString : AnsiString;
  StartTime: TDateTime;
  F: TFileStream;
begin
  try
    Randomize;
    StartTime := Now;
    F := TFileStream.Create('Output.txt', fmCreate);
    try
      for i := 0 to 1000000 do
      begin
        a := Random;
        b := Random;
        c := Random;
        // Format the three values as tab-separated text ending in CRLF
        myString := AnsiString(Format('%f'#9'%f'#9'%f'#13#10, [a, b, c]));
        F.WriteBuffer(myString[1], Length(myString));
      end;
    finally
      F.Free;
    end;
    // Report total elapsed time; Now is read once so the figure is consistent
    WriteLn('Done. ', MilliSecondsBetween(Now, StartTime), ' ms');
    ReadLn;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
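
The hand-rolled buffering scheme mentioned in the last bullet point could be sketched roughly as follows. This is only one possible approach, not the answer author's code: the names BufSize, Buf, BufUsed, FlushBuffer and WriteBuffered are all hypothetical, and the 4 KB size is simply the example figure from the bullet.

```delphi
program ManualBufferSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  Classes;

const
  BufSize = 4096; // example buffer size from the text above

var
  F: TFileStream;
  Buf: array[0..BufSize - 1] of Byte;
  BufUsed: Integer = 0;
  i: Integer;
  a, b, c: Single;

// Write the pending bytes to disk in a single call
procedure FlushBuffer;
begin
  if BufUsed > 0 then
  begin
    F.WriteBuffer(Buf, BufUsed);
    BufUsed := 0;
  end;
end;

// Append S to the buffer, flushing first if it would not fit
procedure WriteBuffered(const S: AnsiString);
begin
  if Length(S) = 0 then
    Exit;
  if Length(S) > BufSize - BufUsed then
    FlushBuffer;
  if Length(S) > BufSize then
    F.WriteBuffer(S[1], Length(S)) // too big to buffer: write it directly
  else
  begin
    Move(S[1], Buf[BufUsed], Length(S));
    Inc(BufUsed, Length(S));
  end;
end;

begin
  Randomize;
  F := TFileStream.Create('Output.txt', fmCreate);
  try
    for i := 0 to 1000000 do
    begin
      a := Random;
      b := Random;
      c := Random;
      WriteBuffered(AnsiString(Format('%f'#9'%f'#9'%f'#13#10, [a, b, c])));
    end;
    FlushBuffer; // write whatever is left after the last line
  finally
    F.Free;
  end;
end.
```

With lines of roughly 20 bytes, this reduces the number of writes to the file stream by a factor of about 200 compared with writing each line individually.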
