Why is constructing std::string from mmaped file so slow?

I'm trying to process large files. Right now I have the file loaded into memory and parse it with the function below.

The first part of the function constructs strings from parts of the file (reading the CSV headers):

void csv_parse_items_file(const char* file, size_t fsize,
        //void(*deal)(const string&, const size_t&, const int&),
        size_t arrstart_counter = 0) {
    size_t idx = 0;
    int line = 0;
    size_t last_idx = 0;
    int counter = 0;

    cout<<"items_header before loop, thread_id="+std::to_string(thread_index())<<endl;
    map<string, int> headers;
    {
        int counter = 0;
        while (file[idx] && file[idx] != '\n') {
            if (file[idx] == '\t' || file[idx] == '\n') {
                string key(file, last_idx, idx - last_idx);
                headers[key] = counter++;
                last_idx = idx + 1;
            }

            ++idx;
        }
    }
    cout<<"items_header after loop, thread_id="+std::to_string(thread_index())<<endl;
... then the processing continues in a loop

The header is less than 1000 characters, which is tiny compared with the size of the files (86431022 and 237179072), but executing the line string key(file, last_idx, idx - last_idx); still takes a very long time.

    $g++ -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

compiled with g++ -pthread -c -g -std=c++11

files mmaped with mmap(NULL, size_, PROT_READ, MAP_PRIVATE, fd_, 0);

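string key(file, last_idx, idx - last_idx); is equivalent to string key(std::string(file), last_idx, idx - last_idx);. With -std=c++11 there is no std::string constructor taking (const char*, size_t, size_t), so file is implicitly converted to a temporary std::string and the (const std::string&, pos, len) constructor is chosen. Building that temporary means scanning for the terminating '\0' and copying everything before it, so you are copying the whole file every time through the loop, only to then extract a small piece of it.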

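Make it string key(file + last_idx, idx - last_idx);. That selects the (const char*, size_t) constructor, which copies only idx - last_idx characters starting at file + last_idx.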
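As a minimal sketch of that fix in context, the header loop could look something like the following. The name parse_header and its (data, size) signature are made up for illustration and are not from the question. The sketch also bounds the scan by the buffer size rather than relying on a terminating '\0', and it stores the last column, which ends at the newline rather than at a tab.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not from the original post): parses the tab-separated
// header line at the start of a buffer and maps each column name to its index.
std::map<std::string, int> parse_header(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::map<std::string, int> headers;
    std::size_t field_start = 0;
    std::size_t idx = 0;
    int column = 0;
    // Bound the scan by size: a mapped file is not guaranteed to end with '\0'.
    for (; idx < size && data[idx] != '\n'; ++idx) {
        if (data[idx] == '\t') {
            // (pointer, length) constructor: copies only this one field.
            headers[std::string(data + field_start, idx - field_start)] = column++;
            field_start = idx + 1;
        }
    }
    // The last field ends at the newline (or end of buffer), not at a tab.
    if (idx > field_start) {
        headers[std::string(data + field_start, idx - field_start)] = column++;
    }
    return headers;
}
```

Each key is built from a pointer into the mapped region plus a length, so the cost per field depends on the field length, not on the size of the file.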
