
Read a particular word from a file in C++ quickly

My manager has asked me to create a test that checks whether a particular word exists in a file. The problem is that the file may be very big, and if the test runs for a long time it will fail during regression testing. So I want to know whether there is a convenience API in standard C++ that would quickly tell me whether the word exists. I don't need to know the location of the word. The word is somewhere near the beginning of the file, but its exact location is not known. Any help in this regard? Thank you.

If the file has no particular structure, other than to contain words (in any order), the only solution is a linear search, which in the worst case means reading the entire file. If you know that the word can only be near the beginning, then you only have to search up to the furthest point at which the word can occur.
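A minimal sketch of such a bounded linear search using only the standard library. The helper name `prefix_contains`, the `max_bytes` limit, and the 64 KiB chunk size are assumptions made for illustration; the search is a plain substring match, not a whole-word match.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: returns true if `word` occurs within the first
// `max_bytes` bytes of the file at `path`. Reads in chunks and stops as
// soon as a match is seen, so a hit near the start costs very little I/O.
bool prefix_contains(const std::string& path, const std::string& word,
                     std::size_t max_bytes) {
    assert(!word.empty() && "search word must not be empty");

    std::ifstream in(path, std::ios::binary);
    if (!in) return false;  // an unreadable file is treated as "not found"

    const std::size_t chunk_size = 64 * 1024;  // arbitrary choice
    std::string chunk(chunk_size, '\0');
    std::string window;  // tail of the previous chunk + current chunk
    std::size_t consumed = 0;

    while (consumed < max_bytes && in) {
        const std::size_t want = std::min(chunk_size, max_bytes - consumed);
        in.read(&chunk[0], static_cast<std::streamsize>(want));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;
        consumed += got;

        window.append(chunk, 0, got);
        if (window.find(word) != std::string::npos) return true;

        // Keep the last word.size() - 1 bytes so that a match straddling
        // two chunks is still found on the next iteration.
        const std::size_t keep = word.size() - 1;
        if (window.size() > keep) window.erase(0, window.size() - keep);
    }
    return false;
}
```

In a test this could be called as `prefix_contains("data.txt", "needle", 1 << 20)`, where the 1 MiB limit stands in for whatever "near the beginning" means for your file.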

If that's not fast enough, you either have to structure the file somehow (sorted, etc.), or you have to speed up the reading procedure itself (e.g. use mmap).

mmap the file and then strnstr it; that would probably be the best, unless you know something clever about the structure of the file that would restrict the area you have to search in.

#include <sys/mman.h>   // mmap, munmap
#include <sys/types.h>  // off_t
#include <fcntl.h>      // open
#include <unistd.h>     // lseek, close

#include <cstring>
#include <cerrno>
#include <iostream>

int main(int argc, char* argv[]) {

    // Expect exactly two arguments: the file to search and the word
    if (argc != 3) {
        std::cerr << "Usage: " << argv[0] << " <file> <word>" << std::endl;
        return 1;
    }

    // String to search for
    char* search_string = argv[2];

    // Open the file so we can map it
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        std::cout << "Open failed: " << strerror(errno) << std::endl;
        return 1;
    }

    // Find the length of the file so we know how much to map
    off_t len = lseek(fd, 0, SEEK_END);
    if (len == -1) {
        std::cout << "Seek failed: " << strerror(errno) << std::endl;
        return 1;
    }

    // map the file into memory
    char* file_contents = (char*)mmap(
        NULL, len, PROT_READ, MAP_FILE | MAP_PRIVATE, fd, 0);
    if (file_contents == MAP_FAILED) {
        std::cout << "map failed: " << strerror(errno) << std::endl;
        return 1;
    }

    // We don't need the file open any more, we do need to unmap it later though
    close(fd);

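    // Search for the string in the file here.
    // Note: strnstr is a BSD extension (macOS, FreeBSD); glibc does not
    // provide it, so on Linux use memmem or std::search instead.
    char* found = strnstr(file_contents, search_string, len);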
    if (found == NULL)
        std::cout << "String not found" << std::endl;
    else
        std::cout << "String found @ " << found - file_contents << std::endl;

    munmap(file_contents, len);
}
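Since strnstr is not available on every platform, the search step can also be written with `std::search` from the standard library. This is a sketch only; the name `find_in_buffer` is made up for illustration, and unlike strnstr it does not stop at embedded '\0' bytes in the mapped data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for strnstr: looks for the NUL-terminated `needle`
// in the first `len` bytes of `haystack`. Returns a pointer to the first
// match, or nullptr if there is none.
const char* find_in_buffer(const char* haystack, std::size_t len,
                           const char* needle) {
    const std::size_t needle_len = std::strlen(needle);
    const char* end = haystack + len;
    const char* hit = std::search(haystack, end, needle, needle + needle_len);
    return hit == end ? nullptr : hit;
}
```

With the listing above, the call would look like `find_in_buffer(file_contents, len, search_string)`.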

Memory-mapped file access lets you access parts of the file directly without first reading the whole file into memory; the OS pages data in on demand as you touch it.

As far as I know, Qt provides memory mapping (QFile::map), and so does Boost (boost::iostreams::mapped_file); the C++ standard library doesn't.

You could also use the native API of the OS: mmap on UNIX, CreateFileMapping on Windows.


 