Reading txt files and parsing them fast using C++ and Boost memory-mapped files
Important edit: The problem is not what I originally stated. After profiling manually, I found that when I replace the line "file >> x >> y >> z;" with the line "file.getline(buffer, size);", reading takes only 0.4 seconds. So the question is entirely different: how do I parse the floats from each line quickly, instead of with "file >> x >> y >> z;"?
(I don't know whether I should delete the question, because the original question is no longer relevant.)
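Because reading the raw lines with getline is already fast, one option is to convert the characters in the line buffer directly rather than going through operator>>. The following is a minimal sketch, assuming each line has already been read into a null-terminated char buffer (for example by file.getline(buffer, size)). The helper name parse_line is hypothetical, and std::strtof is swapped in for stream extraction; it is not code from the original post.

```cpp
#include <cstdlib>

// Hypothetical helper: parses three whitespace-separated floats from a
// null-terminated line buffer using std::strtof instead of operator>>.
// Returns false if any of the three numbers is missing or malformed.
bool parse_line(const char* line, float& x, float& y, float& z)
{
    char* end = nullptr;

    x = std::strtof(line, &end);   // strtof skips leading whitespace itself
    if (end == line) return false; // no conversion was performed

    const char* p = end;
    y = std::strtof(p, &end);
    if (end == p) return false;

    p = end;
    z = std::strtof(p, &end);
    return end != p;
}
```

In the reading loop this could be used as: while (file.getline(buffer, size)) { if (parse_line(buffer, x, y, z)) { /* use x, y, z */ } }. Note that std::strtof follows the current C locale's decimal separator, which is "." in the default "C" locale.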
=== OLD === After extensive research on the internet and Stack Overflow, I understood that the best way to read large files in C++ is to use memory-mapped files.
I have a 15 MB .txt file with three space-separated floats on each line.
I had this code:
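#include <fstream>
std::ifstream file(path);
float x, y, z;
// Test the extraction itself rather than eof(): the loop stops as soon as a read fails
while (file >> x >> y >> z) {
    // process x, y, z
}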
It reads this file in 9.5 seconds.
To read the file faster, I came up with the following code with the help of Stack Overflow users ("Stream types in C++, how to read from IstringStream?"). If I understand it correctly, it uses memory-mapped files and should read the file faster:
#include <iostream>
#include <boost/iostreams/stream.hpp>
#include <boost/iostreams/device/mapped_file.hpp>
namespace io = boost::iostreams;
int main()
{
    io::stream<io::mapped_file_source> str("test.txt");
    // you can read from str like from any stream, str >> x >> y >> z
    for (float x, y, z; str >> x >> y >> z; )
        std::cout << "Reading from file: " << x << " " << y << " " << z << '\n';
}
Unfortunately the speed remains the same: still 9.5 seconds.
Any suggestions? Thanks.
Streams are slow. Partly this is because the constraints that apply to them are onerous (for example, each formatted extraction constructs a sentry object and converts the number through the stream's locale facets), and partly because implementations tend to be poorly optimized.
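To illustrate keeping streams out of the parsing step, here is a minimal sketch, assuming the whole file fits comfortably in memory: the file is pulled into a std::string with a single bulk copy of its buffer, and the numbers are then converted with std::strtof by walking a pointer over the text. The names read_whole_file and parse_all_floats are hypothetical, and strtof is swapped in for operator>>; this is not taken from the answer.

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copies the entire file into a string in one go.
// Returns an empty string if the file cannot be opened.
std::string read_whole_file(const char* path)
{
    std::ifstream file(path, std::ios::binary);
    std::ostringstream ss;
    ss << file.rdbuf();
    return ss.str();
}

// Hypothetical helper: walks a null-terminated buffer and collects every
// float in order (x0, y0, z0, x1, ...), stopping at the end of the buffer
// or at the first token strtof cannot convert.
std::vector<float> parse_all_floats(const char* text)
{
    std::vector<float> values;
    const char* p = text;
    for (;;) {
        char* end = nullptr;
        float v = std::strtof(p, &end); // skips spaces and newlines first
        if (end == p) break;            // nothing converted: done
        values.push_back(v);
        p = end;
    }
    return values;
}
```

Usage might be: std::string text = read_whole_file("test.txt"); std::vector<float> v = parse_all_floats(text.c_str());. A std::string is used because c_str() guarantees a terminating null, which strtof relies on to stop at the end of the data.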
Try using Boost.Spirit parsers. While the syntax takes some getting used to and compilation can sometimes be very slow, Spirit's runtime performance is very high.