
How to correctly read from already opened std::ifstream using a buffer

Background

I am implementing a JSON parser that offers an operator>> function to parse from a std::ifstream. To speed up reading, I copy 16 KB into a buffer and let my parser read from the buffer. A small benchmark showed that this is faster than working directly with std::ifstream::get or std::ifstream::read.

Current (buggy?) implementation

When I successfully read a JSON value, I want to "put back" all unnecessary bytes from the buffer into the stream, so that a subsequent call of operator>> with the same std::istream continues parsing right where the first call ended. I currently implement this "putting back" like this:

is.clear();
is.seekg(start_position + static_cast<std::streamoff>(processed_chars));
is.clear();

Here, is is the input file stream, start_position is the initial value of is.tellg(), and processed_chars is the number of characters read by the parser.

This works with GCC and Clang on OSX and Linux, but with MSVC 2015 and MSVC 2017 the input stream does not end up in the desired state.

My questions

  1. Apparently, I am doing something wrong here, since different compilers should not behave so differently. The clear() calls are already the result of trial and error to make the code run with GCC/Clang.

  2. What would be the correct way to (a) read from an already opened std::ifstream using a cache and (b) be able to resume parsing after the last processed character (instead of after the last cached character)?

  3. Is there a better way to quickly read from an already opened std::ifstream? As I mentioned above, removing the cache makes the parser slower.

(Apologies for the naive question and the horrible implementation! I did not find an answer on this that coped with an already open std::ifstream or that could "put back" already cached characters.)

If you open a file stream in text mode, this is not valid:

is.seekg(start_position + static_cast<std::streamoff>(processed_chars));

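...because the standard does not tie the positions used by seekg / tellg to the number of characters read from a text-mode stream; the relationship is OS-dependent. On Windows, for example, each "\r\n" in the file is delivered to the program as a single '\n', so processed_chars can be smaller than the number of bytes actually consumed. The only positions you can portably seek to in a text-mode stream are values previously returned by tellg.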
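To make the mismatch concrete, here is a small sketch that mimics the "\r\n" to '\n' translation in plain string code. The helpers translate_crlf and raw_bytes_for are hypothetical names made up for this illustration; this is not how any standard library implements text mode, only a model of its visible effect on Windows.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: mimics the "\r\n" -> "\n" translation that a
// Windows text-mode stream applies while reading.
std::string translate_crlf(const std::string& raw) {
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') {
            continue;  // drop the '\r'; the '\n' is copied on the next iteration
        }
        out.push_back(raw[i]);
    }
    return out;
}

// Hypothetical helper: number of raw file bytes that lie behind the first
// `chars` translated characters.
std::size_t raw_bytes_for(const std::string& raw, std::size_t chars) {
    std::size_t seen = 0;
    std::size_t i = 0;
    while (i < raw.size() && seen < chars) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') {
            ++i;  // the pair counts as one translated character
        }
        ++i;
        ++seen;
    }
    return i;
}
```

With the raw content "a\r\nb", the parser sees three characters ("a\nb"), but after processing two of them it has consumed three bytes of the file. Adding processed_chars to the start position would therefore land one byte too early.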

Here are some possible options (I cannot be more specific given the details in your question):

  • use putback to put back the character you read but did not use;
  • use tellg to get the correct position.

Something like this maybe:

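// is is the istream
auto tg = is.tellg();          // remember where this read started
is.read(buffer, BUFFER_SIZE);
// process...
is.clear();                    // a short read sets eofbit and failbit, and seekg does nothing while failbit is set
is.seekg(tg);                  // valid: tg was returned by tellg
is.ignore(processed_chars);    // skip only the characters the parser actually used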


 