
How to correctly read from already opened std::ifstream using a buffer

Background

I am implementing a JSON parser and offer an operator>> function to parse from an std::ifstream. To speed up reading, I copy 16 KB into a buffer and let my parser read from the buffer. A small benchmark showed that this is faster than working directly with std::ifstream::get or std::ifstream::read.

Current (buggy?) implementation

When I successfully read a JSON value, I want to "put back" all unnecessary bytes from the buffer to the stream, so that a subsequent call of operator>> on the same std::istream continues parsing right where the first call ended. I currently implement this "putting back" like this:

is.clear();
is.seekg(start_position + static_cast<std::streamoff>(processed_chars));
is.clear();

Here, is is the input file stream, start_position is the initial value of is.tellg(), and processed_chars is the number of characters read by the parser.

This works with GCC and Clang on OSX and Linux, but MSVC 2015 and MSVC 2017 fail to bring the input stream into the desired state.

My questions

  1. Apparently, I am doing something wrong here. The different compilers should not behave so differently. The clear() calls are already the result of trial and error to make the code run with GCC/Clang.

  2. What would be the correct way to (a) read from an already opened std::ifstream using a cache and (b) be able to resume parsing after the last processed character (instead of after the last cached character)?

  3. Is there a better way to quickly read from an already opened std::ifstream? As I mentioned above, removing the cache makes the parser slower.

(Apologies for the naive question and the horrible implementation! I did not find an answer that coped with an already open std::ifstream or that could "put back" already cached characters.)

If you open a file stream in text mode, this is not valid:

is.seekg(start_position + static_cast<std::streamoff>(processed_chars));

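...because according to the standard, the positions used by seekg / tellg are not directly related to the number of processed chars (this is actually OS-dependent). For example, a text-mode stream on Windows delivers each "\r\n" pair as a single '\n', so character counts and file offsets can drift apart.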
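If you control how the file is opened, one way to make the offset arithmetic meaningful is to open the stream with std::ios::binary, where the usual expectation is that one delivered character corresponds to one byte of the file. The following is only a minimal sketch under that assumption; read_prefix_and_rewind, block_size and used are hypothetical names, and "the parser consumed the first used bytes" stands in for real parsing:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read up to `block_size` bytes, pretend the parser
// consumed the first `used` of them, and reposition the stream right after
// those bytes. Assumes `is` was opened with std::ios::binary.
std::string read_prefix_and_rewind(std::ifstream& is,
                                   std::size_t block_size,
                                   std::size_t used)
{
    const std::streampos start = is.tellg();

    std::vector<char> buffer(block_size);
    is.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    const std::size_t got = static_cast<std::size_t>(is.gcount());
    assert(used <= got);  // cannot consume more than was actually read

    is.clear();  // a short read at end of file sets eofbit and failbit
    is.seekg(start + static_cast<std::streamoff>(used));
    return std::string(buffer.data(), used);
}
```

This does not help when the stream is handed to you already opened in text mode, which is the situation described in the question.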

Here are possible options for you (I cannot give more details with what you gave in your question):

  • use putback to put back the character you read but did not use;
  • use tellg to get the correct position.

Something like this maybe:

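// `is` is the std::istream
auto tg = is.tellg();          // remember where this block starts
is.read(buffer, BUFFER_SIZE);
// process...
is.clear();                    // a short read at end of file sets eofbit/failbit
is.seekg(tg);                  // valid: tg was returned by tellg()
is.ignore(processed_chars);    // skip the characters the parser consumed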
