
Read word from a text file

This is the requirement I must follow:

There will be a C style or C++ style string to hold the word. An int to hold a count of each word. A struct or class to hold both of these. This struct/class will be inserted into an STL list. You will also need a C style or C++ style string to hold the line of text you read from the files. You will parse this line into words as per the word definition in the assign spec.

The first part seems alright, but in the second one, I still don't get the point of reading a line and then parsing it into words. Is it more efficient than reading a word straight from the text file?

The efficiency depends on the definition of the word (which comes from the assignment spec): if you need to go through the line more than once to determine where a word begins and ends (i.e. what belongs to a word), it is more efficient to keep the line in memory than to perform the read from disk multiple times (although the I/O cache can lessen the performance impact).
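To illustrate why an in-memory line helps, here is a minimal sketch of a parser that needs to look one character ahead. The word definition it uses (letters, plus an apostrophe only when it sits between two letters) and the name `splitWords` are assumptions made up for this example; the actual definition comes from your assignment spec.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical word definition (an assumption, not from the assignment spec):
// a word is a run of letters, and an apostrophe belongs to the word only
// when it sits between two letters (e.g. "don't").
std::vector<std::string> splitWords(const std::string& line) {
    std::vector<std::string> words;
    std::string current;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(line[i]);
        const bool letter = std::isalpha(c) != 0;
        // Look one character ahead: cheap because the whole line is in memory.
        const bool innerApostrophe =
            c == '\'' && !current.empty() && i + 1 < line.size() &&
            std::isalpha(static_cast<unsigned char>(line[i + 1])) != 0;
        if (letter || innerApostrophe) {
            current.push_back(line[i]);
        } else if (!current.empty()) {
            // Any other character ends the word in progress.
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);  // the line ended in the middle of a word
    }
    return words;
}
```

With this definition, `splitWords("I don't know, 'really'.")` yields `I`, `don't`, `know` and `really`. Indexing into `line[i + 1]` is trivial on a string, whereas doing the same directly on a stream would require peeking or putting characters back.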

Even if there is no performance gain, this being a homework assignment, I think you are asked to do this to learn 1) how to read strings (lines) from a file; 2) how to parse a string in memory. To achieve the two goals at once, you have this requirement.

Use fstream to read each line from the file, then parse it into words by splitting on spaces, looping until the end of the line.
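A minimal sketch of that approach, assuming a word is simply anything separated by whitespace (your spec may define it differently). The names `WordCount` and `countWords` are made up for this example. The function takes a `std::istream&` so it works with a `std::ifstream` as well as with a string stream.

```cpp
#include <algorithm>
#include <istream>
#include <list>
#include <sstream>
#include <string>

// Holds one word and how many times it has been seen.
struct WordCount {
    std::string word;
    int count;
};

// Reads the stream line by line, splits each line on whitespace,
// and tallies the words in an STL list.
std::list<WordCount> countWords(std::istream& in) {
    std::list<WordCount> counts;
    std::string line;
    while (std::getline(in, line)) {      // one line of text at a time
        std::istringstream words(line);   // parse the line held in memory
        std::string word;
        while (words >> word) {           // loops until the end of the line
            auto it = std::find_if(
                counts.begin(), counts.end(),
                [&word](const WordCount& wc) { return wc.word == word; });
            if (it != counts.end()) {
                ++it->count;              // word already in the list
            } else {
                counts.push_back(WordCount{word, 1});
            }
        }
    }
    return counts;
}
```

To use it on a file, open a `std::ifstream` (from `<fstream>`) on your input path and pass it to `countWords`. The linear search through the list is slow for large inputs, but it matches the STL list requirement quoted in the question.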

Depending on your use case, it can be useful to read files line by line.

Reading the whole file into memory first and parsing it afterward does not minimize memory usage. The memory required for your program to run would be at least the size of the file. If the input file is big compared to the memory available to your program, you won't be able to allocate enough memory to store the entire file (try to allocate a string of 20GB to see what happens).

On the other hand, if you read line by line, only the size of one line is needed in memory at a time: you can release memory allocated for previous lines immediately.

So parsing line by line is useful if:

  • The input files are too big to fit entirely in memory
  • The size of each line is small enough (reading line by line does not help if the file is made of one large line)
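The memory argument can be sketched as follows. This is only an illustration: `StreamStats` and `scanByLine` are hypothetical names, and words are assumed to be whitespace-separated. The single `line` buffer is reused on every iteration, so the memory needed follows the longest line rather than the size of the file.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Summary of one streaming pass over the input.
struct StreamStats {
    std::size_t wordCount = 0;
    std::size_t longestLine = 0;  // in characters, excluding the newline
};

StreamStats scanByLine(std::istream& in) {
    StreamStats stats;
    std::string line;  // reused for every line; earlier contents are overwritten
    while (std::getline(in, line)) {
        stats.longestLine = std::max(stats.longestLine, line.size());
        // The string stream copies the line, so about two lines' worth of
        // memory is live here, still independent of the file size.
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            ++stats.wordCount;
        }
    }
    return stats;
}
```

If `longestLine` comes out close to the size of the file, the input is effectively one large line and reading line by line saves nothing, which is the second condition in the list above.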
