
Delphi: Alternative to using Reset/ReadLn for text file reading

I want to process a text file line by line. In the olden days I loaded the file into a StringList:

slFile := TStringList.Create();
slFile.LoadFromFile(filename);

for i := 0 to slFile.Count-1 do
begin
   oneLine := slFile.Strings[i];
   //process the line
end;

The problem with that is that once the file gets to be a few hundred megabytes, I have to allocate a huge chunk of memory, when really I only need enough memory to hold one line at a time. (Plus, you can't really indicate progress while the system is locked up loading the file in step 1.)

Then I tried using the native, and recommended, file I/O routines provided by Delphi:

var
   f: TextFile;
   oneLine: string;
begin
   AssignFile(f, filename);
   Reset(f);
   try
      while not Eof(f) do
      begin
         ReadLn(f, oneLine);
         //process the line
      end;
   finally
      CloseFile(f);
   end;
end;

The problem with Assign is that there is no option to read the file without locking (i.e. fmShareDenyNone). The former StringList example doesn't support no-lock either, unless you change it to LoadFromStream:

slFile := TStringList.Create;
stream := TFileStream.Create(filename, fmOpenRead or fmShareDenyNone);
try
   slFile.LoadFromStream(stream);
finally
   stream.Free;
end;

for i := 0 to slFile.Count-1 do
begin
   oneLine := slFile.Strings[i];
   //process the line
end;

So now, even though no locks are being held, I'm back to loading the entire file into memory.

Is there some alternative to Assign/ReadLn where I can read a file line by line without taking a sharing lock?

I'd rather not get directly into Win32 CreateFile/ReadFile and have to deal with allocating buffers and detecting CR, LF, and CRLF line endings.

I thought about memory-mapped files, but there's the difficulty when the entire file doesn't fit (map) into virtual memory, and having to map views (pieces) of the file one at a time. Starts to get ugly.

I just want Reset with fmShareDenyNone!

With recent Delphi versions, you can use TStreamReader. Construct it with your file stream, and then call its ReadLine method (inherited from TTextReader).
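A minimal sketch of the TStreamReader approach, assuming Delphi XE2 or later (for the System.* unit names) and a file in the system ANSI code page unless it starts with a BOM. The procedure name ProcessFile is illustrative and not part of the original answer.

```pascal
program StreamReaderDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

// Hypothetical helper: reads FileName line by line without a sharing lock.
procedure ProcessFile(const FileName: string);
var
  Stream: TFileStream;
  Reader: TStreamReader;
  Line: string;
begin
  // fmShareDenyNone lets other processes keep reading and writing the file
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    // Assumption: ANSI text unless a BOM says otherwise (DetectBOM = True)
    Reader := TStreamReader.Create(Stream, TEncoding.Default, True);
    try
      while not Reader.EndOfStream do
      begin
        Line := Reader.ReadLine;
        // process the line; Stream.Position / Stream.Size gives a rough
        // progress value, since the reader buffers ahead of the current line
      end;
    finally
      Reader.Free;
    end;
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount >= 1 then
    ProcessFile(ParamStr(1));
end.
```

Only the reader's internal buffer and the current line are held in memory, so the file size no longer matters.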

An option for all Delphi versions is to use Peter Below's StreamIO unit, which gives you AssignStream. It works just like AssignFile, but for streams instead of file names. Once you've used that function to associate a stream with a TextFile variable, you can call ReadLn and the other I/O functions on it just like any other file.

If you need support for ANSI and Unicode in older Delphis, you can use my GpTextFile or GpTextStream.

You can use this sample code:

    type
      TTextStream = class(TObject)
      private
        FHost: TStream;
        FOffset,FSize: Integer;
        FBuffer: array[0..1023] of AnsiChar; // raw bytes read from the stream
        FEOF: Boolean;
        function FillBuffer: Boolean;
      protected
        property Host: TStream read FHost;
      public
        constructor Create(AHost: TStream);
        destructor Destroy; override;
        function ReadLn: string; overload;
        function ReadLn(out Data: string): Boolean; overload;
        property EOF: Boolean read FEOF;
        property HostStream: TStream read FHost;
        property Offset: Integer read FOffset write FOffset;
      end;

    { TTextStream }

    constructor TTextStream.Create(AHost: TStream);
    begin
      FHost := AHost;
      FillBuffer;
    end;

    destructor TTextStream.Destroy;
    begin
      FHost.Free;
      inherited Destroy;
    end;

    function TTextStream.FillBuffer: Boolean;
    begin
      FOffset := 0;
      FSize := FHost.Read(FBuffer,SizeOf(FBuffer));
      Result := FSize > 0;
      FEOF := not Result; // end of file only when nothing more could be read
    end;

    function TTextStream.ReadLn(out Data: string): Boolean;
    var
      Len, Start: Integer;
      EOLChar: AnsiChar;
      Chunk: AnsiString;
    begin
      Data:='';
      Result:=False;
      repeat
        if FOffset>=FSize then
          if not FillBuffer then
            Exit; // no more data to read from stream -> exit
        Result:=True;
        Start:=FOffset;
        while (FOffset<FSize) and (not (FBuffer[FOffset] in [#13,#10])) do
          Inc(FOffset);
        Len:=FOffset-Start;
        if Len>0 then begin
          // append this piece; a line may span several buffer fills
          SetString(Chunk,PAnsiChar(@FBuffer[Start]),Len);
          Data:=Data+string(Chunk);
        end;
      until FOffset<>FSize; // EOL char found
      EOLChar:=AnsiChar(FBuffer[FOffset]);
      Inc(FOffset);
      if (FOffset=FSize) then
        if not FillBuffer then
          Exit;
      // swallow the second half of a CRLF (or LFCR) pair
      if FBuffer[FOffset] in ([#13,#10]-[EOLChar]) then begin
        Inc(FOffset);
        if (FOffset=FSize) then
          FillBuffer;
      end;
    end;

    function TTextStream.ReadLn: string;
    begin
      ReadLn(Result);
    end;

Usage:

procedure ReadFileByLine(Filename: string);
var
  sLine: string;
  tsFile: TTextStream;
begin
  tsFile := TTextStream.Create(TFileStream.Create(Filename, fmOpenRead or fmShareDenyWrite));
  try
    while tsFile.ReadLn(sLine) do
    begin
      //sLine is your line
    end;
  finally
    tsFile.Free;
  end;
end;

What I do is use a TFileStream, but I buffer the input into fairly large blocks (e.g. a few megabytes each) and read and process one block at a time. That way I don't have to load the whole file at once.

It works quite quickly that way, even for large files.

I do have a progress indicator. As I load each block, I increment it by the fraction of the file that has additionally been loaded.

Reading one line at a time, without something to do your buffering, is simply too slow for large files.
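The block-based approach described above can be sketched roughly as follows. This is not the answerer's actual code: the 4 MB block size, the names ReadInBlocks and ProcessLine, the ANSI text assumption, and the console progress output are all illustrative, and the unit names assume Delphi XE2 or later.

```pascal
program BlockReadDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

const
  BlockSize = 4 * 1024 * 1024; // hypothetical block size: 4 MB

// Hypothetical per-line handler; drops the trailing CR left by CRLF files.
procedure ProcessLine(Line: AnsiString);
begin
  if (Line <> '') and (Line[Length(Line)] = #13) then
    SetLength(Line, Length(Line) - 1);
  // process the line
end;

procedure ReadInBlocks(const FileName: string);
var
  Stream: TFileStream;
  Block, Carry: AnsiString;
  BytesRead, LineStart, I: Integer;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    SetLength(Block, BlockSize);
    Carry := '';
    repeat
      BytesRead := Stream.Read(Block[1], BlockSize);
      LineStart := 1;
      for I := 1 to BytesRead do
        if Block[I] = #10 then
        begin
          // complete line: text carried over from the previous block plus this slice
          ProcessLine(Carry + Copy(Block, LineStart, I - LineStart));
          Carry := '';
          LineStart := I + 1;
        end;
      // keep the unterminated tail for the next block
      Carry := Carry + Copy(Block, LineStart, BytesRead - LineStart + 1);
      // progress: fraction of the file consumed so far
      if Stream.Size > 0 then
        Write(#13, Stream.Position * 100 div Stream.Size, '%');
    until BytesRead = 0;
    if Carry <> '' then
      ProcessLine(Carry); // last line had no trailing LF
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount >= 1 then
    ReadInBlocks(ParamStr(1));
end.
```

Memory use is bounded by one block plus the longest line, and progress can be updated once per block rather than once per line.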

It seems the FileMode variable does not apply to text files, but my tests showed that reading the same file from several places at once is no problem. You didn't mention it in your question, but if you are not going to write to the text file while it is being read, you should be fine.

Why not simply read the lines of the file directly from the TFileStream itself, one at a time?

i.e. (in pseudocode):

  readline: 
    while NOT EOF and (readchar <> EOL) do
      appendchar to result


  while NOT EOF do
  begin
    s := readline
    process s
  end;
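One possible Delphi rendering of the pseudocode above, as a sketch only: the function name ReadLineFromStream is hypothetical, the text is assumed to be single-byte (ANSI), lines are assumed to end in LF or CRLF, and the unit names assume Delphi XE2 or later.

```pascal
program StreamLineDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

// Reads one line (without its terminator) from Stream, one byte at a time.
// Returns False once the stream is exhausted and nothing more was read.
function ReadLineFromStream(Stream: TStream; out Line: AnsiString): Boolean;
var
  Ch: AnsiChar;
begin
  Line := '';
  Result := False;
  while Stream.Read(Ch, 1) = 1 do
  begin
    Result := True;
    if Ch = #10 then
      Exit;              // LF ends the line
    if Ch <> #13 then
      Line := Line + Ch; // skip CR so CRLF files work too
  end;
end;

var
  Stream: TFileStream;
  S: AnsiString;
begin
  if ParamCount < 1 then
    Exit;
  Stream := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyNone);
  try
    while ReadLineFromStream(Stream, S) do
    begin
      // process S
    end;
  finally
    Stream.Free;
  end;
end.
```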

One problem you may find with this is that, IIRC, TFileStream is not buffered, so performance over a large file is going to be sub-optimal. However, there are a number of solutions to the problem of non-buffered streams, including this one, that you may wish to investigate if this approach solves your initial problem.

I had the same problem a few years ago, especially the problem of locking the file. What I did was use the low-level ReadFile from the Windows API. I know the question is old (2 years at the time of my answer), but perhaps my contribution could help someone in the future.

const
  BUFF_SIZE = $8000;
var
  dwread: LongWord;
  hFile: THandle;
  datafile: array [0..BUFF_SIZE-1] of AnsiChar; // raw bytes from the file
  myEOF: Boolean;
  apos: Integer;
begin
  // FILE_SHARE_READ or FILE_SHARE_WRITE: open without locking out other readers and writers
  hFile := CreateFile(PChar(filename), GENERIC_READ, FILE_SHARE_READ or FILE_SHARE_WRITE, nil, OPEN_EXISTING, FILE_ATTRIBUTE_READONLY, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    SetFilePointer(hFile, 0, nil, FILE_BEGIN);
    myEOF := False;
    ReadFile(hFile, datafile, BUFF_SIZE, dwread, nil);
    while (dwread > 0) and (not myEOF) do
    begin
      if dwread = BUFF_SIZE then
      begin
        // full buffer: rewind to the last line break so the next read starts on a line boundary
        apos := LastDelimiter(#10#13, datafile);
        if apos = BUFF_SIZE then Inc(apos);
        SetFilePointer(hFile, apos-BUFF_SIZE, nil, FILE_CURRENT);
      end
      else myEOF := True;
      ReadFile(hFile, datafile, BUFF_SIZE, dwread, nil);
    end;
  finally
    CloseHandle(hFile);
  end;
end;

For me the speed improvement appeared to be significant.
