Delphi: Alternative to using Reset/ReadLn for text file reading
I want to process a text file line by line. In the olden days I loaded the file into a StringList:
slFile := TStringList.Create();
slFile.LoadFromFile(filename);
for i := 0 to slFile.Count-1 do
begin
  oneLine := slFile.Strings[i];
  //process the line
end;
The problem with that is that once the file gets to be a few hundred megabytes, I have to allocate a huge chunk of memory, when really I only need enough memory to hold one line at a time. (Plus, you can't really indicate progress while the system is locked up loading the file in step 1.)
Then I tried using the native, and recommended, file I/O routines provided by Delphi:
var
  f: TextFile;
  oneLine: string;
begin
  AssignFile(f, filename);
  Reset(f);
  while not Eof(f) do
  begin
    ReadLn(f, oneLine);
    //process the line
  end;
  CloseFile(f);
end;
The problem with Assign is that there is no option to read the file without locking (i.e. fmShareDenyNone). The former StringList example doesn't support no-lock either, unless you change it to LoadFromStream:
slFile := TStringList.Create;
stream := TFileStream.Create(filename, fmOpenRead or fmShareDenyNone);
slFile.LoadFromStream(stream);
stream.Free;
for i := 0 to slFile.Count-1 do
begin
  oneLine := slFile.Strings[i];
  //process the line
end;
So now, even though no locks are being held, I'm back to loading the entire file into memory.
Is there some alternative to Assign/ReadLn, where I can read a file line by line without taking a sharing lock?
I'd rather not get directly into Win32 CreateFile/ReadFile, and have to deal with allocating buffers and detecting CR, LF, and CRLF.
I thought about memory-mapped files, but there's the difficulty if the entire file doesn't fit (map) into virtual memory, and having to map views (pieces) of the file one at a time. It starts to get ugly.
I just want Reset with fmShareDenyNone!
With recent Delphi versions, you can use TStreamReader. Construct it with your file stream, and then call its ReadLine method (inherited from TTextReader).
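As a minimal sketch of that approach, the program below opens the file with fmShareDenyNone and reads it through a TStreamReader. It assumes Delphi XE2 or later for the System.* unit names (older Unicode versions would use SysUtils and Classes), and it relies on the reader's default encoding; pass a TEncoding to the constructor if the file needs something else. CountLines is a hypothetical helper that stands in for the real per-line processing.

```pascal
program StreamReaderSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: counts lines as a stand-in for "process the line".
function CountLines(Stream: TStream): Integer;
var
  Reader: TStreamReader;
  Line: string;
begin
  Result := 0;
  // Created this way, the reader does not take ownership of Stream.
  Reader := TStreamReader.Create(Stream);
  try
    while not Reader.EndOfStream do
    begin
      Line := Reader.ReadLine; // only one line is held at a time
      Inc(Result);
    end;
  finally
    Reader.Free;
  end;
end;

var
  FS: TFileStream;
begin
  // fmShareDenyNone: other processes may still read and write the file.
  FS := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyNone);
  try
    Writeln(CountLines(FS), ' lines');
  finally
    FS.Free;
  end;
end.
```

Because the reader buffers its input, FS.Position runs slightly ahead of the line being processed, so a progress figure computed from FS.Position and FS.Size is only approximate.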
An option for all Delphi versions is to use Peter Below's StreamIO unit, which gives you AssignStream. It works just like AssignFile, but for streams instead of file names. Once you've used that function to associate a stream with a TextFile variable, you can call ReadLn and the other I/O functions on it just like any other file.
If you need to support ANSI and Unicode in older Delphis, you can use my GpTextFile or GpTextStream.
You can use this sample code:
type
  TTextStream = class(TObject)
  private
    FHost: TStream;
    FOffset, FSize: Integer;
    FBuffer: array[0..1023] of AnsiChar; // raw bytes; assumes single-byte (ANSI) text
    FEOF: Boolean;
    function FillBuffer: Boolean;
  protected
    property Host: TStream read FHost;
  public
    constructor Create(AHost: TStream);
    destructor Destroy; override;
    function ReadLn: string; overload;
    function ReadLn(out Data: string): Boolean; overload;
    property EOF: Boolean read FEOF;
    property HostStream: TStream read FHost;
    property Offset: Integer read FOffset write FOffset;
  end;

{ TTextStream }

constructor TTextStream.Create(AHost: TStream);
begin
  inherited Create;
  FHost := AHost; // the text stream owns AHost and frees it in Destroy
  FillBuffer;
end;

destructor TTextStream.Destroy;
begin
  FHost.Free;
  inherited Destroy;
end;

function TTextStream.FillBuffer: Boolean;
begin
  FOffset := 0;
  FSize := FHost.Read(FBuffer, SizeOf(FBuffer));
  Result := FSize > 0;
  FEOF := not Result; // end of file once a read returns no data
end;

function TTextStream.ReadLn(out Data: string): Boolean;
var
  Len, Start: Integer;
  EOLChar: AnsiChar;
  Chunk: AnsiString;
begin
  Data := '';
  Result := False;
  repeat
    if FOffset >= FSize then
      if not FillBuffer then
        Exit; // no more data to read from stream -> exit
    Result := True;
    Start := FOffset;
    while (FOffset < FSize) and (not (FBuffer[FOffset] in [#13, #10])) do
      Inc(FOffset);
    Len := FOffset - Start;
    if Len > 0 then
    begin
      // append this piece; a line may span several buffer fills
      SetString(Chunk, PAnsiChar(@FBuffer[Start]), Len);
      Data := Data + string(Chunk);
    end;
  until FOffset <> FSize; // EOL char found
  EOLChar := FBuffer[FOffset];
  Inc(FOffset);
  if FOffset = FSize then
    if not FillBuffer then
      Exit;
  // swallow the second char of a CRLF (or LFCR) pair
  if FBuffer[FOffset] in ([#13, #10] - [EOLChar]) then
  begin
    Inc(FOffset);
    if FOffset = FSize then
      FillBuffer;
  end;
end;

function TTextStream.ReadLn: string;
begin
  ReadLn(Result);
end;
Usage:
procedure ReadFileByLine(Filename: string);
var
  sLine: string;
  tsFile: TTextStream;
begin
  tsFile := TTextStream.Create(TFileStream.Create(Filename, fmOpenRead or fmShareDenyWrite));
  try
    while tsFile.ReadLn(sLine) do
    begin
      //sLine is your line
    end;
  finally
    tsFile.Free;
  end;
end;
What I do is use a TFileStream, but I buffer the input into fairly large blocks (e.g. a few megabytes each) and read and process one block at a time. That way I don't have to load the whole file at once.
It works quite quickly that way, even for large files.
I do have a progress indicator. As I load each block, I increment it by the fraction of the file that has additionally been loaded.
Reading one line at a time, without something to do your buffering, is simply too slow for large files.
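A rough sketch of that block-at-a-time idea follows; it is not the answerer's actual code. The names ProcessInBlocks, TLineProc, TProgressProc, and the 1 MB BLOCK_SIZE are all made up for illustration. It assumes single-byte text with LF or CRLF line endings (a bare CR is not treated as a line break) and Delphi XE2 or later for the unit names.

```pascal
program BlockReadSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  TLineProc = procedure(const Line: AnsiString);
  TProgressProc = procedure(Percent: Integer);

const
  BLOCK_SIZE = 1024 * 1024; // 1 MB per read; an arbitrary choice

var
  LineCount: Integer = 0;

procedure CountLine(const Line: AnsiString);
begin
  Inc(LineCount); // stand-in for real per-line processing
end;

procedure ShowProgress(Percent: Integer);
begin
  Write(#13, Percent, '%');
end;

// Reads Stream in fixed-size blocks and hands each complete line to OnLine.
procedure ProcessInBlocks(Stream: TStream; OnLine: TLineProc;
  OnProgress: TProgressProc);
var
  Block, Pending, Line: AnsiString;
  BytesRead, Start, I: Integer;
begin
  SetLength(Block, BLOCK_SIZE);
  Pending := '';
  repeat
    BytesRead := Stream.Read(Block[1], BLOCK_SIZE);
    Start := 1;
    for I := 1 to BytesRead do
      if Block[I] = #10 then
      begin
        // the line is whatever was left over plus this block up to the LF
        Line := Pending + Copy(Block, Start, I - Start);
        Pending := '';
        if (Line <> '') and (Line[Length(Line)] = #13) then
          SetLength(Line, Length(Line) - 1); // strip the CR of a CRLF
        OnLine(Line);
        Start := I + 1;
      end;
    // keep the unterminated tail for the next block
    Pending := Pending + Copy(Block, Start, BytesRead - Start + 1);
    if Assigned(OnProgress) and (Stream.Size > 0) then
      OnProgress(Integer(Stream.Position * 100 div Stream.Size));
  until BytesRead = 0;
  if Pending <> '' then
    OnLine(Pending); // last line had no trailing line break
end;

var
  FS: TFileStream;
begin
  FS := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyNone);
  try
    ProcessInBlocks(FS, CountLine, ShowProgress);
    Writeln;
    Writeln(LineCount, ' lines');
  finally
    FS.Free;
  end;
end.
```

Only one block plus the current partial line is held in memory, and progress is reported once per block, which keeps the reporting overhead negligible.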
It seems the FileMode variable does not apply to text files, but my tests showed that reading the same file from several places at once is no problem. You didn't mention it in your question, but if you are not going to write to the text file while it is being read, you should be fine.
Why not simply read the lines of the file directly from the TFileStream itself, one at a time?
I.e. (in pseudocode):
readline:
  while NOT EOF and (readchar <> EOL) do
    appendchar to result

while NOT EOF do
begin
  s := readline
  process s
end;
One problem you may find with this is that, IIRC, TFileStream is not buffered, so performance over a large file is going to be sub-optimal. However, there are a number of solutions to the problem of non-buffered streams, including this one, that you may wish to investigate if this approach solves your initial problem.
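For illustration only, the pseudocode could look roughly like this in real Delphi. ReadLineFrom is a hypothetical helper, not an RTL routine. It assumes single-byte text, treats LF as the line terminator, and simply drops every CR, so a bare CR does not end a line. It issues one Read call per byte, which is exactly the unbuffered behaviour warned about above.

```pascal
program ReadLineSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Reads one line from Stream a byte at a time.
// Returns False once nothing at all is left to read.
function ReadLineFrom(Stream: TStream; out Line: AnsiString): Boolean;
var
  Ch: AnsiChar;
begin
  Line := '';
  Result := False;
  while Stream.Read(Ch, 1) = 1 do
  begin
    Result := True;
    if Ch = #10 then
      Exit;              // LF ends the line
    if Ch <> #13 then    // drop the CR of a CRLF pair
      Line := Line + Ch;
  end;
end;

var
  FS: TFileStream;
  Line: AnsiString;
  Count: Integer;
begin
  Count := 0;
  FS := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyNone);
  try
    while ReadLineFrom(FS, Line) do
      Inc(Count); // process Line here
    Writeln(Count, ' lines');
  finally
    FS.Free;
  end;
end.
```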
I had the same problem a few years ago, especially the problem of locking the file. What I did was use the low-level ReadFile from the Windows API. I know the question was already old (2 years) when I answered, but perhaps my contribution could help someone in the future.
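const
  BUFF_SIZE = $8000;
var
  dwRead: LongWord;
  hFile: THandle;
  dataFile: array [0..BUFF_SIZE-1] of AnsiChar; // raw bytes; assumes single-byte text
  aPos: Integer;
  myEOF: Boolean;
begin
  // FILE_SHARE_READ or FILE_SHARE_WRITE: do not lock out other readers or writers
  hFile := CreateFile(PChar(filename), GENERIC_READ, FILE_SHARE_READ or FILE_SHARE_WRITE,
    nil, OPEN_EXISTING, FILE_ATTRIBUTE_READONLY, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  SetFilePointer(hFile, 0, nil, FILE_BEGIN);
  myEOF := False;
  try
    ReadFile(hFile, dataFile, BUFF_SIZE, dwRead, nil);
    while (dwRead > 0) and (not myEOF) do
    begin
      aPos := Integer(dwRead);
      if dwRead = BUFF_SIZE then
      begin
        // find the last line break so the block ends on a whole line
        while (aPos > 0) and (not (dataFile[aPos-1] in [#10, #13])) do
          Dec(aPos);
        if aPos = 0 then
          aPos := BUFF_SIZE; // no line break in this block: take all of it
        // rewind so the partial last line is read again with the next block
        SetFilePointer(hFile, aPos - BUFF_SIZE, nil, FILE_CURRENT);
      end
      else
        myEOF := True;
      // process the lines in dataFile[0..aPos-1] here
      ReadFile(hFile, dataFile, BUFF_SIZE, dwRead, nil);
    end;
  finally
    CloseHandle(hFile);
  end;
end;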
For me the speed improvement appeared to be significant.