
Is using TStringList to load a huge text file the best way in Delphi?

Original post: 2011-02-27 15:48:59 9 4 delphi/ text-files

What is the best way to load a huge text file in Delphi? Is there any component that can load a text file very quickly?

Let's say I have a text file that contains a database stored in a fixed-length format. It contains 150 fields, each at least 50 characters wide.

1. I need to load it into memory.
2. I need to parse it and probably store it in a memdataset for processing.

My questions:

1. Is it enough if I use the TStringList.LoadFromFile method?
2. Is there a better component for manipulating the text file?
3. Should I use low-level reading from the text file?

Thank you in advance.

4 answers

TStringList is never the optimal way of working with lots of text, but it's the simplest. If you've got small files on your hands, you can use TStringList without issues. Even if you have large files (not huge files), you might implement a version of your algorithm using TStringList for testing purposes, because it's simple and easy to understand.
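As a baseline for comparison, the TStringList approach can be sketched as below. This is a minimal sketch, not the asker's actual code: the file name `data.txt` is hypothetical, and it assumes one record per line with every field exactly 50 characters wide (the question only says "at least 50").

```delphi
program StringListBaseline;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  FieldWidth = 50; // assumption: every field is exactly 50 characters wide

var
  Lines: TStringList;
  I: Integer;
begin
  Lines := TStringList.Create;
  try
    // Reads and splits the whole file in memory in a single call.
    Lines.LoadFromFile('data.txt'); // hypothetical file name
    for I := 0 to Lines.Count - 1 do
      // Copy is 1-based: field N (1-based) starts at (N - 1) * FieldWidth + 1.
      // Here the first field of each record is printed.
      Writeln(TrimRight(Copy(Lines[I], 1, FieldWidth)));
  finally
    Lines.Free;
  end;
end.
```

The whole file is held in memory as one string per line, which is convenient for testing but is the cost the alternatives try to avoid.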

If your files are large, as they probably are since you call them "databases", you need to look into alternative technologies that let you read only as much as you need from the database. Look into:

TFileStream
Memory mapped files.
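To illustrate "read only as much as you need" with TFileStream, here is a minimal sketch that seeks straight to one record. The layout is an assumption, not something the question states: single-byte characters, 150 fields of exactly 50 bytes, and a CR LF after each record. The file name and the `ReadRecord` helper are hypothetical.

```delphi
program ReadOneRecord;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  FieldCount = 150;
  FieldWidth = 50;                       // assumption: exactly 50 bytes per field
  EolSize    = 2;                        // assumption: records end with CR LF
  DataSize   = FieldCount * FieldWidth;  // payload bytes per record
  RecordSize = DataSize + EolSize;       // bytes per record on disk

// Hypothetical helper: reads record number Index (0-based) without
// touching the rest of the file. Raises EReadError if Index is out of range.
function ReadRecord(Stream: TStream; Index: Int64): AnsiString;
begin
  SetLength(Result, DataSize);
  Stream.Position := Index * RecordSize;
  Stream.ReadBuffer(Result[1], DataSize);
end;

var
  FS: TFileStream;
  Rec: AnsiString;
begin
  FS := TFileStream.Create('data.txt', fmOpenRead or fmShareDenyWrite);
  try
    Writeln('Records in file: ', FS.Size div RecordSize);
    Rec := ReadRecord(FS, 0);
    // The third field starts at byte 2 * FieldWidth + 1 (Copy is 1-based).
    Writeln(TrimRight(string(Copy(Rec, 2 * FieldWidth + 1, FieldWidth))));
  finally
    FS.Free;
  end;
end.
```

Because every record has the same size, the byte offset of any record is a simple multiplication, so no scanning for line breaks is needed.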

Don't look at the old "file" based APIs still available in Delphi; they're plain old.

I'm not going to go into details on how to access text using those methods, because we've recently had two similar questions on SO:

How Can I Efficiently Read The First Few Lines of Many Files in Delphi

and

Fast Search to see if a String Exists in Large Files with Delphi

Since you're working with a fixed record length, you can build an access class based on TList, with a TWriter and TReader, that takes your records into account. You'll have none of the overhead of a TStringList (not that it's a bad thing, but if you don't need it, why have it?) and you can build your own record access into the class. Ultimately it depends on what you are trying to accomplish with the data once you have it loaded into memory. While TStringList is easy to use, it isn't as efficient as "rolling your own".
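One way such an access class might look is sketched below. It is only an illustration under assumptions: it uses a generic `TList<AnsiString>` and plain stream reads rather than TReader/TWriter, and it assumes single-byte characters, 150 fields of exactly 50 bytes, and CR LF after each record. The class name `TFixedRecordList` and the file name are hypothetical.

```delphi
program RecordListSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, Generics.Collections;

const
  FieldCount = 150;
  FieldWidth = 50;                      // assumption: exactly 50 bytes per field
  EolSize    = 2;                       // assumption: records end with CR LF
  DataSize   = FieldCount * FieldWidth;

type
  // Hypothetical access class: one raw record per list entry,
  // with field access by record and field number.
  TFixedRecordList = class
  private
    FRecords: TList<AnsiString>;
    function GetCount: Integer;
    function GetField(Rec, Field: Integer): string;
  public
    constructor Create;
    destructor Destroy; override;
    procedure LoadFromFile(const FileName: string);
    property Count: Integer read GetCount;
    // Rec and Field are both 0-based.
    property Fields[Rec, Field: Integer]: string read GetField; default;
  end;

constructor TFixedRecordList.Create;
begin
  inherited Create;
  FRecords := TList<AnsiString>.Create;
end;

destructor TFixedRecordList.Destroy;
begin
  FRecords.Free;
  inherited;
end;

function TFixedRecordList.GetCount: Integer;
begin
  Result := FRecords.Count;
end;

function TFixedRecordList.GetField(Rec, Field: Integer): string;
begin
  // Copy is 1-based, so field 0 starts at character 1.
  Result := TrimRight(string(Copy(FRecords[Rec], Field * FieldWidth + 1, FieldWidth)));
end;

procedure TFixedRecordList.LoadFromFile(const FileName: string);
var
  FS: TFileStream;
  Rec: AnsiString;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    FRecords.Clear;
    while FS.Size - FS.Position >= DataSize do
    begin
      SetLength(Rec, DataSize);            // yields a fresh, unshared buffer
      FS.ReadBuffer(Rec[1], DataSize);
      FRecords.Add(Rec);
      FS.Position := FS.Position + EolSize; // skip the line break
    end;
  finally
    FS.Free;
  end;
end;

var
  List: TFixedRecordList;
begin
  List := TFixedRecordList.Create;
  try
    List.LoadFromFile('data.txt'); // hypothetical file name
    if List.Count > 0 then
      Writeln(List[0, 2]);         // third field of the first record
  finally
    List.Free;
  end;
end.
```

Fields are sliced out only when they are requested, so loading does no per-field work.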

However, efficiency in data manipulation may not be that much of an issue, as you are using text files to hold a database. If you just need to read the file in and make decisions based on its data, the more flexible TList may be overkill.

I recommend sticking with TStringList if you find it convenient for your problem. Optimization is a separate matter that should be done later.

As for TStringList, the optimization is to declare a descendant class that overrides the TStrings.LoadFromStream method. Taking the structure of your files into account, you can make it practically as fast as possible.
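A descendant along those lines might look like the sketch below. It is one possible reading of the suggestion, not the answerer's code: the class name `TFixedStringList` is hypothetical, and the layout (single-byte characters, 150 fields of exactly 50 bytes, CR LF after each record, file smaller than 2 GB) is assumed. The sketch calls LoadFromStream directly, because in Unicode versions of Delphi LoadFromFile may route through the overload that also takes a TEncoding, which this class does not override.

```delphi
program FixedStringListSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  FieldCount = 150;
  FieldWidth = 50;                       // assumption: exactly 50 bytes per field
  EolSize    = 2;                        // assumption: records end with CR LF
  DataSize   = FieldCount * FieldWidth;
  RecordSize = DataSize + EolSize;

type
  // Hypothetical descendant: splits the stream by record size
  // instead of scanning every byte for line breaks.
  TFixedStringList = class(TStringList)
  public
    procedure LoadFromStream(Stream: TStream); override;
  end;

procedure TFixedStringList.LoadFromStream(Stream: TStream);
var
  Buffer: AnsiString;
  Offset, Total: Integer;
begin
  BeginUpdate;
  try
    Clear;
    // Assumes the remaining data fits in an Integer (under 2 GB).
    Total := Integer(Stream.Size - Stream.Position);
    SetLength(Buffer, Total);
    if Total > 0 then
      Stream.ReadBuffer(Buffer[1], Total);
    // Reserve room up front so Add does not keep growing the list.
    Capacity := Total div RecordSize + 1;
    Offset := 1;
    while Offset + DataSize - 1 <= Total do
    begin
      Add(string(Copy(Buffer, Offset, DataSize)));
      Inc(Offset, RecordSize); // jump over the line break to the next record
    end;
  finally
    EndUpdate;
  end;
end;

var
  FS: TFileStream;
  List: TFixedStringList;
begin
  List := TFixedStringList.Create;
  try
    FS := TFileStream.Create('data.txt', fmOpenRead or fmShareDenyWrite); // hypothetical file name
    try
      List.LoadFromStream(FS);
    finally
      FS.Free;
    end;
    Writeln('Records loaded: ', List.Count);
  finally
    List.Free;
  end;
end.
```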

It is not entirely clear from your question why you need to load the entire file into memory before going on to create an in-memory data set. Are you conflating the two issues? That is, because you need to create an in-memory data set, do you think you first need to load the source data entirely into memory? Or is there some initial pre-processing of the source file that is only possible with the entire file loaded in memory? This is unlikely, and even if it is the case, it isn't necessary with a navigable stream object such as a TFileStream.

But I think the answer you are looking for is right there in the question.

If you are loading this file in order to parse it and populate a further data structure (the data set) for further processing, then loading it into an existing high-level data structure first is an unnecessary and potentially costly (in terms of time) step.

Use the lowest-level means of access that provides the capabilities you need.

In this case a TFileStream will likely provide the best balance of convenience and ease of use.
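The parse-as-you-read approach can be sketched roughly as follows: each record is read from the TFileStream into a small buffer and handed on immediately, so the file is never held in memory as a whole. The layout (single-byte characters, 150 fields of exactly 50 bytes, CR LF after every record including the last) is assumed, and `StoreRecord` is a hypothetical hook standing in for whatever appends a row to the in-memory data set.

```delphi
program StreamParse;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  FieldCount = 150;
  FieldWidth = 50;                                   // assumption: exactly 50 bytes per field
  EolSize    = 2;                                    // assumption: records end with CR LF
  RecordSize = FieldCount * FieldWidth + EolSize;

type
  TRecordBuffer = array[0..RecordSize - 1] of AnsiChar;

// Hypothetical hook: a real program would append a row to its data set here.
// This version only prints the first field of the record.
procedure StoreRecord(const Rec: TRecordBuffer);
var
  Field: AnsiString;
begin
  SetString(Field, PAnsiChar(@Rec[0]), FieldWidth);
  Writeln(TrimRight(string(Field)));
end;

var
  FS: TFileStream;
  Rec: TRecordBuffer;
begin
  FS := TFileStream.Create('data.txt', fmOpenRead or fmShareDenyWrite); // hypothetical file name
  try
    // Read returns the number of bytes actually read; a short read ends the loop,
    // so a final record without a trailing line break would be skipped.
    while FS.Read(Rec, SizeOf(Rec)) = SizeOf(Rec) do
      StoreRecord(Rec);
  finally
    FS.Free;
  end;
end.
```

Memory use stays at one record regardless of file size; reading several records per call would be a natural next step if the per-call overhead turns out to matter.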