
C++ Iterate over logfile content by timestamp

I have a log file with log lines. Each line is composed of a timestamp and a message.

timestamp1 blablabla  
timestamp2 foo  
timestamp3 bar  
etc...

My class LogFile has a map member that associates each timestamp with an ifstream position.
Now I would like to create a custom iterator over these timestamps.

Example:

LogFile myFile("file.log");  
for (LogFile::iterator it = myFile.begin(); it != myFile.end(); it++)  
    std::cout << it->message << std::endl;

Output:

blablabla  
foo  
bar 

I also want to be able to decrement the iterator.

However, I don't really know how to implement this efficiently.

The simplest approach would be to open, seek, read, and close the file on each iterator increment. But is that efficient? I have read that open/seek/close is quite expensive.

Maybe a better solution is to have a LogFile::open() method that opens the file and keeps it open for as many increments as needed, and then a LogFile::close() method that closes it at the end.

Do you have any tips on this? I'm sure this is not the first time someone has had to deal with this kind of problem.

EDIT: A few more details:

My class LogFile has a member of type std::map<Time, std::streampos> that stores the link between each timestamp and its stream position.
I need to increment and decrement the iterator. I think a map is more appropriate, since I will use std::map::find(Time) a lot (O(log n) complexity, versus O(n) for a linear std::find over a std::vector).

The log file is very big (~20 MB) and my application has to run on an embedded system with limited resources. So I can't store the whole file in RAM; I have to buffer it and load only the part I need at a given time.

So yes, I think I'm going to go with the "open() and close() once" approach.

When incrementing, I don't have to call std::ifstream::seekg() much. But when decrementing, I don't see any way other than calling seekg for each timestamp. Is that really the best way?

I would definitely avoid opening and closing the file unless necessary. Open the file in the LogFile constructor, close it in the destructor.
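
To make that concrete, here is a minimal sketch of a LogFile that opens the stream once in the constructor and hands out a bidirectional iterator. The names LogEntry, read() and offsets_ are made up for illustration, and it assumes the timestamp is the first space-delimited token of each line; adapt the parsing to your real format.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical entry type: assumes "timestamp message..." on each line.
struct LogEntry {
    std::string timestamp;
    std::string message;
};

class LogFile {
public:
    // Opens the file once; the std::ifstream member closes it on destruction.
    explicit LogFile(const std::string& path) : in_(path, std::ios::binary) {
        std::string line;
        // Record the start offset of every non-empty line, not its content.
        for (std::streampos pos = in_.tellg(); std::getline(in_, line); pos = in_.tellg())
            if (!line.empty()) offsets_.push_back(pos);
        in_.clear();  // reset eof/fail bits so later seekg calls work
    }

    class iterator {
    public:
        iterator(LogFile* file, std::size_t index) : file_(file), index_(index) {}
        iterator& operator++() { ++index_; loaded_ = false; return *this; }
        iterator& operator--() { --index_; loaded_ = false; return *this; }
        iterator operator++(int) { iterator old = *this; ++*this; return old; }
        iterator operator--(int) { iterator old = *this; --*this; return old; }
        bool operator==(const iterator& o) const { return index_ == o.index_; }
        bool operator!=(const iterator& o) const { return index_ != o.index_; }
        const LogEntry& operator*() { load(); return entry_; }
        const LogEntry* operator->() { load(); return &entry_; }
    private:
        // Reads the line lazily, only when the iterator is dereferenced.
        void load() {
            if (loaded_) return;
            entry_ = file_->read(index_);
            loaded_ = true;
        }
        LogFile* file_;
        std::size_t index_;
        bool loaded_ = false;
        LogEntry entry_;
    };

    iterator begin() { return iterator(this, 0); }
    iterator end() { return iterator(this, offsets_.size()); }

private:
    // One seek plus one getline on the already open stream.
    LogEntry read(std::size_t index) {
        assert(index < offsets_.size());
        in_.clear();
        in_.seekg(offsets_[index]);
        std::string line;
        std::getline(in_, line);
        const std::size_t space = line.find(' ');
        if (space == std::string::npos) return {line, ""};
        return {line.substr(0, space), line.substr(space + 1)};
    }

    std::ifstream in_;
    std::vector<std::streampos> offsets_;
};
```

Incrementing and decrementing only change an index; the seek happens when the entry is actually dereferenced, so stepping backwards costs one seekg and one getline per entry you look at, with no open/close in between.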

I don't know your exact use case, but using a map to store this data seems redundant to me. I don't think you will query this map with an exact timestamp, will you? And I presume the entries are already sorted by time in the source log file, so when they are read into a vector they will be sorted too. My suggestion is to use a vector of pairs of timestamps and ifstream positions.
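
As a rough sketch of that idea: if the vector is sorted by timestamp, std::lower_bound gives you an O(log n) lookup, and moving forward or backward is just an index change. Time is assumed here to be a plain integer and first_at_or_after is a made-up helper name; substitute your own types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ios>
#include <utility>
#include <vector>

using Time = long;  // assumption: any ordered timestamp type would do
using IndexEntry = std::pair<Time, std::streampos>;

// Returns the index of the first entry whose timestamp is >= t,
// or index.size() if every entry is older than t.
std::size_t first_at_or_after(const std::vector<IndexEntry>& index, Time t) {
    // Binary search is only valid if the log was written in time order.
    assert(std::is_sorted(index.begin(), index.end(),
                          [](const IndexEntry& a, const IndexEntry& b) {
                              return a.first < b.first;
                          }));
    auto it = std::lower_bound(index.begin(), index.end(), t,
                               [](const IndexEntry& e, Time value) {
                                   return e.first < value;
                               });
    return static_cast<std::size_t>(it - index.begin());
}
```

Compared with std::map, a vector stores the entries contiguously with no per-node overhead, which may matter on a memory-constrained target.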

I would definitely recommend keeping the file open for as long as the LogFile instance that uses it lives. That way, when someone uses iterators to access the entries sequentially, you save significant time because less seeking is needed.

I would also recommend reading all entries from the file into memory if the log file is not too big, which saves the time otherwise spent on I/O operations.

It might be worth trying to map the file with mmap, and then using madvise to manually keep in memory the pages that are likely to be accessed soon.
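
A minimal sketch of what that could look like, assuming a POSIX target where mmap and madvise (with MADV_WILLNEED) are available; that may not be the case on every embedded platform. MappedLog, prefetch and line_at are illustrative names only.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

class MappedLog {
public:
    explicit MappedLog(const char* path) {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size),
                             PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = static_cast<std::size_t>(st.st_size);
            }
        }
        ::close(fd);  // the mapping remains valid after the descriptor is closed
    }
    ~MappedLog() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }
    MappedLog(const MappedLog&) = delete;
    MappedLog& operator=(const MappedLog&) = delete;

    bool ok() const { return data_ != nullptr; }

    // Hints to the kernel that [offset, offset + length) will be read soon.
    bool prefetch(std::size_t offset, std::size_t length) {
        if (!ok() || offset >= size_) return false;
        // madvise needs a page-aligned start address, so round down.
        const std::size_t page = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
        const std::size_t start = offset - offset % page;
        const std::size_t len = std::min(length + (offset - start), size_ - start);
        return ::madvise(const_cast<char*>(data_) + start, len, MADV_WILLNEED) == 0;
    }

    // Returns the line starting at offset, without the trailing newline.
    std::string line_at(std::size_t offset) const {
        assert(ok());
        if (offset >= size_) return std::string();
        const char* begin = data_ + offset;
        const void* nl = std::memchr(begin, '\n', size_ - offset);
        const char* end = nl ? static_cast<const char*>(nl) : data_ + size_;
        return std::string(begin, end);
    }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

With a mapping, the stored stream positions become plain byte offsets and stepping backwards is just pointer arithmetic. The pages are loaded on demand, so the whole file does not have to be resident in RAM, although it does consume address space.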
