
Read a particular word from a file in C++ quickly

My manager has asked me to write a test that checks whether a particular word exists in a file. The problem is that the file may be very big, and if the test runs for a long time it will fail during regression testing. Is there a convenient API in standard C++ that would quickly tell me whether the word exists or not? I don't need to know the location of the word. The word is somewhere near the beginning of the file, but its exact location is not known. Any help would be appreciated. Thank you.

If the file has no particular structure other than containing words (in any order), the only solution is a linear search, which in the worst case means reading the entire file. If you know that the word can only appear near the beginning, you only have to search up to the furthest point at which it can occur.
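A bounded linear search can be sketched as follows, assuming the caller knows a maximum offset `max_bytes` (a hypothetical limit) beyond which the word cannot start. The file is read in chunks and the search stops as soon as the word is found; the last `word.size() - 1` bytes of each chunk are carried over so a match that straddles two chunks is not missed. Like the mmap answer below, it matches substrings, not whole words.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: true if `word` occurs within the first `max_bytes`
// bytes of the file at `path`. Stops reading as soon as a match is found.
bool word_in_prefix(const std::string& path, const std::string& word,
                    std::size_t max_bytes, std::size_t chunk_size = 64 * 1024) {
    if (word.empty()) return true;
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;

    std::string window;                   // carried-over tail + current chunk
    std::string chunk(chunk_size, '\0');  // read buffer
    std::size_t total = 0;                // bytes read so far
    while (total < max_bytes) {
        std::size_t want = std::min(chunk_size, max_bytes - total);
        in.read(&chunk[0], static_cast<std::streamsize>(want));
        std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;              // end of file (or nothing requested)
        total += got;
        window.append(chunk, 0, got);
        if (window.find(word) != std::string::npos) return true;
        // Keep only the bytes that could begin a match crossing into the next chunk
        if (window.size() >= word.size())
            window.erase(0, window.size() - (word.size() - 1));
    }
    return false;
}
```

For example, `word_in_prefix("big.log", "magic", 1 << 20)` reads at most 1 MiB no matter how large `big.log` is.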

If that's not fast enough, you either have to give the file some structure (sorted, etc.), or you have to speed up the reading itself (e.g. by using mmap).
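As one example of structuring the file: if it holds one word per line, sorted in byte order (for instance produced with `LC_ALL=C sort`), membership can be decided by binary search over byte offsets, which needs only about log2(file size) short reads instead of a full scan. This is a minimal sketch under that assumption (Unix `\n` line endings, no trailing `\r`); `sorted_file_contains` and `first_line_from` are hypothetical names.

```cpp
#include <fstream>
#include <string>

// Reads the first line that starts at byte offset >= pos into `line`.
// Returns false if no line starts at or after pos.
static bool first_line_from(std::ifstream& in, std::streamoff pos, std::string& line) {
    in.clear();  // reset eof/fail bits left over from earlier reads
    if (pos == 0) {
        in.seekg(0);
    } else {
        // Discard up to and including the first '\n' at or after pos-1;
        // the next read then starts at the first line boundary >= pos.
        in.seekg(pos - 1);
        std::string skipped;
        if (!std::getline(in, skipped)) return false;
    }
    return static_cast<bool>(std::getline(in, line));
}

// Binary search for `word` in a file of byte-order-sorted lines.
bool sorted_file_contains(const std::string& path, const std::string& word) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();

    // Predicate P(p): "no line starts at or after p, or that line is >= word".
    // Because the lines are sorted, P is monotonic in p, so we can search for
    // the smallest p where it holds; the line found there is the first line >= word.
    std::string line;
    std::streamoff lo = 0, hi = size;
    while (lo < hi) {
        std::streamoff mid = lo + (hi - lo) / 2;
        if (!first_line_from(in, mid, line) || line >= word)
            hi = mid;
        else
            lo = mid + 1;
    }
    return first_line_from(in, lo, line) && line == word;
}
```

The trade-off is that the file has to be sorted once up front, which only pays off if it is searched more often than it is rewritten.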

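Memory-mapping the file with mmap and then searching it with strnstr would probably be best, unless you know something about the structure of the file that restricts the area you have to search. Note that strnstr is a BSD extension (available on macOS and FreeBSD but not in glibc) and stops at the first NUL byte; on glibc, memmem (a GNU extension) or C++17's std::string_view::find does the same job.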

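#include <sys/mman.h>   // mmap, munmap
#include <fcntl.h>      // open, O_RDONLY
#include <unistd.h>     // lseek, close

#include <cstddef>
#include <cstring>
#include <cerrno>
#include <iostream>
#include <string_view>

int main(int argc, char* argv[]) {

    // Expect exactly two arguments: the file and the string to search for
    if (argc != 3) {
        std::cerr << "Usage: " << argv[0] << " <file> <string>" << std::endl;
        return 2;
    }

    // String to search for
    const char* search_string = argv[2];

    // Open the file so we can map it
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        std::cerr << "Open failed: " << std::strerror(errno) << std::endl;
        return 1;
    }

    // Find the length of the file so we know how much to map
    off_t len = lseek(fd, 0, SEEK_END);
    if (len == -1) {
        std::cerr << "Seek failed: " << std::strerror(errno) << std::endl;
        close(fd);
        return 1;
    }
    if (len == 0) {
        // mmap() rejects a zero length, and an empty file cannot contain the string
        std::cout << "String not found" << std::endl;
        close(fd);
        return 0;
    }
    const std::size_t size = static_cast<std::size_t>(len);

    // Map the file into memory, read-only
    char* file_contents = static_cast<char*>(
        mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0));
    if (file_contents == MAP_FAILED) {
        std::cerr << "mmap failed: " << std::strerror(errno) << std::endl;
        close(fd);
        return 1;
    }

    // The mapping stays valid after the descriptor is closed;
    // we still have to munmap() it later
    close(fd);

    // Search the whole mapping. strnstr() is BSD-only and stops at the first
    // NUL byte, so use std::string_view::find (C++17), which is portable and
    // treats the mapped data as raw bytes
    std::string_view haystack(file_contents, size);
    std::size_t pos = haystack.find(search_string);
    if (pos == std::string_view::npos)
        std::cout << "String not found" << std::endl;
    else
        std::cout << "String found @ " << pos << std::endl;

    munmap(file_contents, size);
}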
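If the search string is long, the search itself can also be made faster than a naive byte-by-byte scan: C++17's `std::boyer_moore_horspool_searcher` can skip ahead by up to the length of the pattern on a mismatch. A minimal sketch, assuming a C++17 compiler; `bmh_find` is a hypothetical helper that could be pointed at the mapped bytes, e.g. `bmh_find(std::string_view(file_contents, len), search_string)`.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string_view>

// Returns the offset of the first occurrence of `needle` in `haystack`,
// or std::string_view::npos if there is none.
std::size_t bmh_find(std::string_view haystack, std::string_view needle) {
    if (needle.empty()) return 0;
    // The searcher preprocesses the pattern once, then std::search applies it
    auto it = std::search(haystack.begin(), haystack.end(),
                          std::boyer_moore_horspool_searcher(needle.begin(), needle.end()));
    return it == haystack.end() ? std::string_view::npos
                                : static_cast<std::size_t>(it - haystack.begin());
}
```

For short words the gain over `std::string_view::find` is usually small; it matters more as the pattern gets longer.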

Memory-mapped file access lets you address the file's contents directly, as if they were already in memory, without reading the whole file first: the OS loads pages on demand as you touch them.
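Because pages are loaded on demand, mapping and searching only the region where the word can appear keeps the amount of data read small. A minimal POSIX sketch, assuming a Unix-like system and a C++17 compiler; `mapped_prefix_contains` and its `max_bytes` limit are hypothetical, and it matches substrings rather than whole words.

```cpp
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if `word` occurs in the first `max_bytes` bytes of `path`.
// Only that prefix is mapped, so pages beyond it are never touched by the search
// (the kernel may still read ahead a little on its own).
bool mapped_prefix_contains(const char* path, std::string_view word, std::size_t max_bytes) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    off_t end = lseek(fd, 0, SEEK_END);
    if (end <= 0) {                 // error, or empty file
        close(fd);
        return false;
    }
    // mmap() needs a non-zero length, but it does not have to be page-aligned
    std::size_t len = std::min(static_cast<std::size_t>(end), max_bytes);
    if (len == 0) {
        close(fd);
        return false;
    }
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      // the mapping remains valid after close()
    if (p == MAP_FAILED) return false;

    std::string_view data(static_cast<const char*>(p), len);
    bool found = data.find(word) != std::string_view::npos;
    munmap(p, len);
    return found;
}
```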

As far as I know, Qt provides memory mapping (QFile::map), and so does Boost (e.g. Boost.Iostreams' mapped_file_source); the C++ standard library doesn't.

You could also use the OS's native API: mmap on Unix, CreateFileMapping (together with MapViewOfFile) on Windows.
