Searching a large C++ string from a file mapping quickly and efficiently

Question

A quick preface: I'm very comfortable in .NET but have limited experience in C++, so I'm not sure if I'm doing this well. I'm using a file mapping object to retrieve what could be a relatively long string, made up of up to several thousand file names. This function can be called from an ImageOverlayHandler attached to Windows Explorer, so both speed and memory consumption are a concern. This code could be called by hundreds of overlay requests at once (but only in edge cases).

Is the code below an efficient way to do this? If I've understood my code correctly, this approach does not make a local copy of the mapped file, and the boost::contains call should be pretty quick. Any thoughts on how I can improve it, or how I should do it differently? In a previous iteration I was using vectors, etc., but that seemed like it would use a lot more memory.

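HRESULT GetFolders()
{
    HANDLE hMapFile;
    LPCWSTR pBuf;
    hMapFile = OpenFileMapping(
               FILE_MAP_ALL_ACCESS,   // read/write access
               FALSE,                 // child processes do not inherit the handle
               szFolderName);         // name of mapping object

    if (hMapFile == NULL)
    {
       // NULL would read as S_OK here, so report the Win32 error instead.
       return HRESULT_FROM_WIN32(GetLastError());
    }
    pBuf = (LPCWSTR) MapViewOfFile(hMapFile, // handle to map object
           FILE_MAP_ALL_ACCESS,  // read/write permission
           0,                    // file offset, high DWORD
           0,                    // file offset, low DWORD
           BUF_SIZE);            // number of bytes to map

    if (pBuf == NULL)
    {
       HRESULT hr = HRESULT_FROM_WIN32(GetLastError());
       CloseHandle(hMapFile);

       return hr;
    }

    // Note: constructing a wstring copies the mapped text into local memory.
    wstring resOut = pBuf;

    bool val = boost::contains(resOut, L"C:\\FOLDER1");
    UnmapViewOfFile(pBuf);
    CloseHandle(hMapFile);

    // The original dropped the result; returning it this way is one option.
    return val ? S_OK : S_FALSE;
}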
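
The wstring construction above copies the whole mapped string on every call. A substring search can instead run directly over the mapped characters. The following is a minimal sketch, not a drop-in replacement: ContainsInMappedText and its parameters are hypothetical names, and it assumes the view holds one null-terminated wide string.

```cpp
#include <algorithm>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: returns true if `needle` occurs in the text at `text`.
// Reads at most `maxChars` characters and stops at the first L'\0'.
// The search runs in place; no copy of the mapped text is made.
bool ContainsInMappedText(const wchar_t* text, std::size_t maxChars,
                          const wchar_t* needle)
{
    if (text == nullptr || needle == nullptr) return false;

    const wchar_t* needleEnd = needle + std::wcslen(needle);
    if (needle == needleEnd) return true;  // empty needle matches trivially

    // Find the terminator without reading past the mapped view.
    const wchar_t* textEnd = std::find(text, text + maxChars, L'\0');

    return std::search(text, textEnd, needle, needleEnd) != textEnd;
}
```

With the question's code this would be called with pBuf and BUF_SIZE / sizeof(wchar_t) as the character limit. Like boost::contains, this is a plain substring test, so searching for C:\FOLDER1 would also match C:\FOLDER10.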

Answer 1

Since you describe this data as a list of filenames, consider putting each filename in a std::set, std::unordered_set (C++11), or boost::unordered_set (C++03 and above).

Your approach is O(n): the substring search scans the whole string.

A std::set lookup is O(log n).

An unordered_set lookup is O(1) on average.

Note: any of these containers should be created once, in advance, not on every call of GetFolders().
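
As a rough sketch of the unordered_set suggestion: the set is built once from the mapped text and then queried on each call. The question does not say how the filenames are separated, so the separator parameter here is an assumption, and BuildFolderSet and IsTrackedFolder are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper: splits `text` on `sep` (an assumed separator) and
// stores each non-empty piece. Intended to run once, not per lookup.
std::unordered_set<std::wstring> BuildFolderSet(const std::wstring& text,
                                                wchar_t sep)
{
    std::unordered_set<std::wstring> folders;
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find(sep, start);
        if (end == std::wstring::npos) end = text.size();
        if (end > start) folders.insert(text.substr(start, end - start));
        start = end + 1;
    }
    return folders;
}

// Exact-match lookup, O(1) on average.
bool IsTrackedFolder(const std::unordered_set<std::wstring>& folders,
                     const std::wstring& path)
{
    return folders.count(path) != 0;
}
```

Unlike a substring search, this matches whole entries only. The trade-off is that the set holds its own copy of every name, which matters if memory consumption is a concern.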

Answer 2

If you're creating the list of filenames once or only occasionally and testing it repeatedly, you could benefit from a binary search. A binary search requires two things: the input list must be sorted, and you must be able to index into any element of the list efficiently.

You can fulfill the first requirement by sorting the list in C# before you write it to the file. You can fulfill the second requirement by creating a list of integers that represent the offset into the string for the start of each filename. Since each integer is the same size, the list can be indexed, and it takes one simple indirection to get to the actual filename.
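
To make the layout concrete, here is one possible arrangement, sketched in C++ for illustration only (the answer has the writer in C#). The FolderIndex struct, the 32-bit offsets, and the null terminator after each name are all assumptions, not something the answer specifies.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout: all names back to back, each followed by L'\0',
// plus a table holding the starting offset (in characters) of each name.
struct FolderIndex {
    std::wstring chars;                  // "name0\0name1\0..."
    std::vector<std::uint32_t> offsets;  // offsets[i] = start of name i
};

FolderIndex BuildFolderIndex(std::vector<std::wstring> names)
{
    // Sort by raw character value so the reader can use the same order.
    std::sort(names.begin(), names.end());

    FolderIndex index;
    for (const std::wstring& name : names) {
        index.offsets.push_back(
            static_cast<std::uint32_t>(index.chars.size()));
        index.chars += name;
        index.chars.push_back(L'\0');
    }
    return index;
}
```

Whatever writes the data has to sort with the same ordering the reader compares with. On the C# side that probably means an ordinal comparison rather than the default culture-sensitive one.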

The std::equal_range algorithm will do a binary search. If the returned iterators are equal, the item wasn't found; otherwise the first iterator points to it.

You'll need a custom comparator function to pass to equal_range to do the indirection on the string.
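
A minimal sketch of such a comparator follows. It assumes the mapped data consists of a sorted block of null-terminated names plus a table of 32-bit character offsets; that layout, and the names OffsetLess and ContainsFolder, are hypothetical. Because the search key is a string while the elements are offsets, the comparator needs an overload for each argument order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cwchar>
#include <utility>

// Compares an offset-table entry with a search key by following the
// offset into the character block (the "one simple indirection").
struct OffsetLess {
    const wchar_t* chars;  // start of the block of null-terminated names

    bool operator()(std::uint32_t offset, const wchar_t* key) const {
        return std::wcscmp(chars + offset, key) < 0;
    }
    bool operator()(const wchar_t* key, std::uint32_t offset) const {
        return std::wcscmp(key, chars + offset) < 0;
    }
};

// Binary search over the offset table: O(log n) string comparisons,
// with no copy of the names and no container to build per call.
bool ContainsFolder(const wchar_t* chars, const std::uint32_t* offsets,
                    std::size_t count, const wchar_t* key)
{
    std::pair<const std::uint32_t*, const std::uint32_t*> range =
        std::equal_range(offsets, offsets + count, key, OffsetLess{chars});
    return range.first != range.second;  // an empty range means not found
}
```

This only works if the names were sorted with an ordering that agrees with wcscmp. Both pointers can point straight into the mapped view, so nothing is copied per lookup.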

Answer 1: score 1, posted 2012-01-24 15:56:20

Answer 2: score 1, accepted, posted 2012-01-24 17:04:28
