
High performance reading - linux/pthreads

Original post: 2011-12-07 12:28:01 | Tags: c, linux

I have a moderately large binary file consisting of independent blocks, laid out like this:

header1
data1
header2
data2
header3
data3
...

The number of blocks, the size of each block and the total size of the file vary quite a lot, but typical numbers are ~1000 blocks with an average block size of 100 KB. The files are generated by an external application that I have no control over, but I want to read them as fast as possible. In many cases I am only interested in a fraction (e.g. 10%) of the blocks, and this is the case I will optimize for.

My current implementation is like this:

Open the file and read all the headers, using the information in each header to fseek() to the next header location; keep the FILE * pointer open.
When data is requested, use fseek() to locate the data block, read all of it and return it.
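A minimal sketch of the two steps above. The real header format was not shown, so this assumes a hypothetical header consisting of a single native-endian uint64_t that holds the size of the data block following it; block_entry, build_index and read_block are illustrative names, not from the question.

```c
#define _GNU_SOURCE            /* fseeko/ftello on glibc */
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t even on 32-bit builds */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* HYPOTHETICAL format: each header is one native-endian uint64_t giving
   the size in bytes of the data block that follows it. */
typedef struct {
    off_t    data_offset;  /* where the data block starts in the file */
    uint64_t data_size;    /* how many bytes the data block holds */
} block_entry;

/* Step 1: walk the headers once, seeking over every data block.
   Returns the number of blocks indexed (at most max_blocks), or -1 on error. */
static long build_index(FILE *f, block_entry *index, long max_blocks)
{
    long n = 0;
    if (fseeko(f, 0, SEEK_SET) != 0)
        return -1;
    while (n < max_blocks) {
        uint64_t size;
        if (fread(&size, sizeof size, 1, f) != 1)
            break;                                  /* EOF: no more headers */
        off_t pos = ftello(f);
        if (pos < 0)
            return -1;
        index[n].data_offset = pos;
        index[n].data_size = size;
        n++;
        if (fseeko(f, (off_t)size, SEEK_CUR) != 0)  /* skip the data itself */
            return -1;
    }
    return ferror(f) ? -1 : n;
}

/* Step 2: seek to one indexed block and read it into a malloc'd buffer.
   The caller frees the result; returns NULL on error. */
static unsigned char *read_block(FILE *f, const block_entry *e)
{
    assert(f != NULL && e != NULL);
    size_t len = (size_t)e->data_size;
    unsigned char *buf = malloc(len ? len : 1);
    if (buf == NULL)
        return NULL;
    if (fseeko(f, e->data_offset, SEEK_SET) != 0 || fread(buf, 1, len, f) != len) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The index costs one small read per block, so with ~1000 blocks it is cheap compared with reading the data you actually need.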

This works fine, but I was wondering whether it might be possible to speed things up using e.g. aio, mmap or other techniques I have only heard of.

Any thoughts?

Joakim

2 answers

The speed difference between mmap and read is not that big (both need to read the data from disk); the biggest advantage of mmap is that it avoids double buffering, since the data is not copied from the kernel's page cache into a separate user-space buffer.

If you are only interested in 10% of the contents, your biggest saving will come from not reading the other 90%. This can be done by reading only the headers and seeking either to the next header or to the wanted data block. But it all depends on the file format, which the OP did not show in detail.
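A sketch of "read only the headers, then seek to the wanted block" using plain file descriptors instead of stdio, under the same hypothetical header format (a native-endian uint64_t size in front of each data block). It reads 8 bytes per skipped block and then exactly the wanted bytes. pread also leaves the shared file offset untouched, so several pthreads could call this on one descriptor. pread_full, find_block and fetch_block are illustrative names.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* pread() until len bytes have arrived; 0 on success, -1 on error or early EOF. */
static int pread_full(int fd, void *buf, size_t len, off_t off)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t r = pread(fd, p, len, off);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            return -1;
        p += r;
        len -= (size_t)r;
        off += r;
    }
    return 0;
}

/* Hop from header to header (HYPOTHETICAL 8-byte size headers) without
   touching any data bytes; report where block `wanted` starts and its size. */
static int find_block(int fd, long wanted, off_t *data_off, uint64_t *data_size)
{
    assert(data_off != NULL && data_size != NULL);
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    off_t pos = 0;
    for (long i = 0; pos < st.st_size; i++) {
        uint64_t size;
        if (pread_full(fd, &size, sizeof size, pos) != 0)
            return -1;
        if (i == wanted) {
            *data_off = pos + (off_t)sizeof size;
            *data_size = size;
            return 0;
        }
        pos += (off_t)sizeof size + (off_t)size;  /* jump to the next header */
    }
    return -1;                                    /* fewer blocks than requested */
}

/* Read only the wanted block into a malloc'd buffer (caller frees); NULL on error. */
static unsigned char *fetch_block(int fd, long wanted, uint64_t *size_out)
{
    off_t off;
    uint64_t size;
    if (find_block(fd, wanted, &off, &size) != 0)
        return NULL;
    unsigned char *buf = malloc(size ? (size_t)size : 1);
    if (buf == NULL)
        return NULL;
    if (pread_full(fd, buf, (size_t)size, off) != 0) {
        free(buf);
        return NULL;
    }
    *size_out = size;
    return buf;
}
```

find_block rescans the headers on every call to keep the sketch short; in practice you would cache the offsets once, as the question already does.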

Most of the time is probably spent accessing the disk, so buying an SSD might be sensible. (Whatever you do, your application is I/O bound.)

Apparently your file is only about 100 MB (~1000 blocks of ~100 KB). You could get it into the kernel's file (page) cache just by reading it, e.g. with cat yourfile > /dev/null before running your program. For such a small file (on a reasonable machine it fits in RAM), I wouldn't worry that much.
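For reference, a minimal in-program equivalent of cat yourfile > /dev/null: read the whole file in chunks and discard the bytes. warm_page_cache is a hypothetical helper name and the 1 MiB chunk size is an arbitrary choice. Reading only makes it likely that the data stays cached; the kernel may evict it again under memory pressure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Read the whole file and throw the bytes away so the kernel keeps them in
   its page cache. Returns the number of bytes read, or -1 on error. */
static long long warm_page_cache(const char *path)
{
    assert(path != NULL);
    enum { CHUNK = 1 << 20 };              /* arbitrary 1 MiB read size */
    char *buf = malloc(CHUNK);
    if (buf == NULL)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    long long total = 0;
    for (;;) {
        ssize_t r = read(fd, buf, CHUNK);
        if (r < 0 && errno == EINTR)
            continue;                      /* interrupted: just retry */
        if (r <= 0) {
            if (r < 0)
                total = -1;                /* real read error */
            break;                         /* r == 0 means EOF */
        }
        total += r;
    }
    close(fd);
    free(buf);
    return total;
}
```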

You could pre-process the file, e.g. into a database (SQLite, or a real RDBMS such as PostgreSQL) or simply into a gdbm indexed file.
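As a dependency-free stand-in for the SQLite/gdbm suggestion (a much simpler technique than a real database), here is a sketch that saves the block index once to a hypothetical sidecar file of fixed-size (offset, size) records, so later runs can load it with a single read instead of walking the headers. The record layout, index_record, save_index and load_index are all assumptions; such a file is only portable between machines with the same byte order and struct layout.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* HYPOTHETICAL sidecar record: one per block, in file order. */
typedef struct {
    uint64_t data_offset;
    uint64_t data_size;
} index_record;

/* Write the index once, e.g. right after the external application produces the data file. */
static int save_index(const char *idx_path, const index_record *recs, size_t n)
{
    FILE *f = fopen(idx_path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(recs, sizeof *recs, n, f);
    int close_rc = fclose(f);
    return (written == n && close_rc == 0) ? 0 : -1;
}

/* Load the whole index in one read; caller frees *out. Returns the record count or -1. */
static long load_index(const char *idx_path, index_record **out)
{
    assert(out != NULL);
    FILE *f = fopen(idx_path, "rb");
    if (f == NULL)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    long bytes = ftell(f);
    if (bytes < 0 || bytes % (long)sizeof(index_record) != 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    size_t n = (size_t)bytes / sizeof(index_record);
    index_record *recs = malloc(n ? n * sizeof *recs : 1);
    if (recs == NULL || fread(recs, sizeof *recs, n, f) != n) {
        free(recs);
        fclose(f);
        return -1;
    }
    fclose(f);
    *out = recs;
    return (long)n;
}
```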

If you use <stdio.h>, you could give the stream a bigger buffer with setbuffer (or the standard setvbuf), or call fopen with an "rm" mode (the m is a GNU glibc extension that asks stdio to mmap the file).
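A sketch of both stdio options, kept as separate functions because they are alternatives. The 1 MiB size is an arbitrary assumption, and open_big_buffer and open_mmap_stream are illustrative names. setvbuf is the standard call; setbuffer(f, buf, size) is the BSD-style equivalent that glibc also provides.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Option 1: an ordinary stream with a 1 MiB buffer (glibc otherwise sizes
   the buffer from the file's st_blksize). setvbuf() must be called before
   any other operation on the stream, and the buffer must outlive the FILE. */
static FILE *open_big_buffer(const char *path, char **buf_out)
{
    assert(buf_out != NULL);
    enum { BUF_SIZE = 1 << 20 };
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    char *buf = malloc(BUF_SIZE);
    if (buf == NULL || setvbuf(f, buf, _IOFBF, BUF_SIZE) != 0) {
        free(buf);
        fclose(f);
        return NULL;
    }
    *buf_out = buf;        /* free this only after fclose(f) */
    return f;
}

/* Option 2: the glibc-specific 'm' flag asks stdio to try mmap instead of
   read() calls. It is a GNU extension; portable code should not rely on it. */
static FILE *open_mmap_stream(const char *path)
{
    return fopen(path, "rm");
}
```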

You could mmap the file and use madvise to tell the kernel how you are going to access it.
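A minimal sketch of mmap plus madvise, assuming you already know each block's offset and length (for example from the header index). It uses MADV_RANDOM for the whole mapping, since only scattered blocks are read, and MADV_WILLNEED just before a block is used; both are hints the kernel may ignore. mapped_file, map_file, prefetch_block and unmap_file are illustrative names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    unsigned char *base;    /* start of the read-only mapping */
    size_t         length;  /* size of the whole file */
} mapped_file;

/* Map the whole file read-only; returns 0 on success. */
static int map_file(const char *path, mapped_file *m)
{
    assert(m != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                            /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    madvise(p, (size_t)st.st_size, MADV_RANDOM);  /* hint: scattered access */
    m->base = p;
    m->length = (size_t)st.st_size;
    return 0;
}

/* Ask the kernel to start reading [off, off+len) and return a pointer straight
   into the mapping (no copy). madvise() needs a page-aligned start address,
   so the offset is rounded down to a page boundary first. */
static const unsigned char *prefetch_block(const mapped_file *m, size_t off, size_t len)
{
    if (off > m->length || len > m->length - off)
        return NULL;                      /* range not inside the file */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = off - off % page;
    madvise(m->base + start, (off - start) + len, MADV_WILLNEED);  /* only a hint */
    return m->base + off;
}

static void unmap_file(mapped_file *m)
{
    munmap(m->base, m->length);
}
```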

You could also (perhaps in a separate thread) use the readahead syscall to prefetch the blocks you are about to need.
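A sketch of the readahead suggestion, assuming you know the offset and size of the next block you will need. readahead is Linux-specific (it needs _GNU_SOURCE) and can block, e.g. while the file system looks up block locations, which is why running it in its own pthread can help. prefetch_job, prefetch_worker and start_prefetch are hypothetical names. Because readahead is only a hint, a failure is recorded rather than treated as an error.

```c
#define _GNU_SOURCE            /* readahead() is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Which byte range the worker should pull into the page cache.
   The job must stay alive until the worker thread has been joined. */
typedef struct {
    int    fd;
    off_t  offset;
    size_t count;
    int    result;   /* 0 if readahead() succeeded, -1 otherwise (not fatal) */
} prefetch_job;

static void *prefetch_worker(void *arg)
{
    prefetch_job *job = arg;
    job->result = (readahead(job->fd, job->offset, job->count) == 0) ? 0 : -1;
    return NULL;
}

/* Start prefetching in the background. Reading the block before the worker
   finishes is still correct, just possibly slower. */
static int start_prefetch(pthread_t *tid, prefetch_job *job)
{
    assert(tid != NULL && job != NULL);
    return pthread_create(tid, NULL, prefetch_worker, job);
}
```

posix_fadvise with POSIX_FADV_WILLNEED is a more portable way to give the kernel the same kind of hint.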

But your file seems small enough that you should not bother that much. Are you sure it is really a performance issue? Do you read that file many thousands of times per day, or do you have many hundreds of such files?


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.


Answers
solution1: score 2, 2011-12-07 12:50:17
solution2: score 1, accepted, 2011-12-07 12:37:32
