Don't understand how to do a search query with or, and, not

I am reposting this question because I have revised it to make it easier to understand exactly what I need to do.

I have declared the following function:

set<string> findQueryMatches(map<string, set<string>>& index, string sentence);

The map is already filled with keys and values, while the string is a query sentence that looks something like "fish +red". The keys and values for the map come from a file that I read in earlier functions, for example:

www.shoppinglist.com
EGGS! milk, fish,      @  bread cheese
www.rainbow.org
red ~green~ orange yellow blue indigo violet
www.dr.seuss.net
One Fish Two Fish Red fish Blue fish !!!
www.bigbadwolf.com
I'm not trying to eat you

The website names are the values and the individual words are the keys of the map (I also have a token-cleaning function that strips punctuation, so "egg." becomes "egg" and all the stray symbols are deleted). So if you search for "fish", you get the set of websites (the values) for that keyword.
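The index described above is an inverted index: each cleaned word maps to the set of URLs whose text contains it. A minimal sketch of building it, assuming the file alternates a URL line with a line of page text. `cleanToken` and `buildIndex` are hypothetical stand-ins for the asker's own helpers, and the lowercasing step is an assumption (the question only says punctuation is removed).

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical cleaner: trims punctuation from both ends, lowercases
// (assumption), and returns "" if no letter remains, so "@" and "!!!" are dropped.
std::string cleanToken(const std::string& token) {
    std::size_t begin = 0, end = token.size();
    while (begin < end && std::ispunct(static_cast<unsigned char>(token[begin]))) ++begin;
    while (end > begin && std::ispunct(static_cast<unsigned char>(token[end - 1]))) --end;
    std::string word = token.substr(begin, end - begin);
    bool hasLetter = false;
    for (char& c : word) {
        if (std::isalpha(static_cast<unsigned char>(c))) hasLetter = true;
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return hasLetter ? word : "";
}

// Reads alternating lines (URL, then page text) and maps each cleaned
// word to the set of URLs whose text contains it.
std::map<std::string, std::set<std::string>> buildIndex(std::istream& in) {
    std::map<std::string, std::set<std::string>> index;
    std::string url, text;
    while (std::getline(in, url) && std::getline(in, text)) {
        std::istringstream words(text);
        std::string token;
        while (words >> token) {
            std::string word = cleanToken(token);
            if (!word.empty()) index[word].insert(url);
        }
    }
    return index;
}
```

With the sample file and this cleaner, "fish" and "Fish" both end up under the key "fish", which maps to www.dr.seuss.net and www.shoppinglist.com.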

In the function findQueryMatches above, the input sentence has to be handled as a compound query, where the results for the individual terms are combined into one result.

The entered string will contain whitespace, +, and -. A minus means a result must match one term without matching the other, a plus means the results must match both terms, and a term preceded only by whitespace (no prefix) means the results are unioned, so they match one or the other.
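Each operator corresponds to one standard set operation: no prefix is a union, + is an intersection, and - is a difference (results matching the earlier terms but not this one). A minimal sketch, assuming each term is folded into a running result; `combine` and `UrlSet` are illustrative names, not part of the asker's code.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

using UrlSet = std::set<std::string>;

// Hypothetical helper: folds one term's matches into the running result.
// op is '+' (AND), '-' (AND NOT), or anything else (OR).
UrlSet combine(const UrlSet& result, char op, const UrlSet& termMatches) {
    UrlSet out;
    auto into = std::inserter(out, out.begin());
    if (op == '+') {         // in both sets
        std::set_intersection(result.begin(), result.end(),
                              termMatches.begin(), termMatches.end(), into);
    } else if (op == '-') {  // in result but not in termMatches
        std::set_difference(result.begin(), result.end(),
                            termMatches.begin(), termMatches.end(), into);
    } else {                 // in either set
        std::set_union(result.begin(), result.end(),
                       termMatches.begin(), termMatches.end(), into);
    }
    return out;
}
```

If the running result starts out empty, a query whose first term carries + or - always yields an empty set; starting from the first term's matches avoids that edge case.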

For example, "tasty -mushrooms simple +cheap" translates into "tasty WITHOUT mushrooms OR simple AND cheap".

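Writing T, M, S and C for the sets of URLs that contain tasty, mushrooms, simple and cheap, the example can be read in two ways. If the terms are applied strictly left to right with no precedence (an assumption, and also what the loop in the answer below does), the result is the first line; if + is meant to bind more tightly, as the phrase "simple AND cheap" might suggest, it is the second. The two differ for any tasty page without mushrooms that is not cheap, so check which reading your assignment expects.

```latex
R_{\text{left-to-right}} = \bigl((T \setminus M) \cup S\bigr) \cap C
R_{\text{precedence}} = (T \setminus M) \cup (S \cap C)
```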
I started by using a stringstream to split the sentence into words, and then wrote if statements like:

if (word[0] == '+').....

After I separate the words and know what to do with each one, I also have to call my clean helper function again to strip the + and - before I begin the search.
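One way to keep the prefix handling apart from the search is to turn the sentence into a list of (operator, word) pairs first. A minimal sketch; `QueryTerm` and `parseQuery` are hypothetical names, and it assumes the prefix is always the first character of a space-separated word, as in "fish +red".

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one query term: the operator prefix
// ('+', '-', or ' ' for none) and the word with the prefix removed.
struct QueryTerm {
    char op;
    std::string word;
};

// Splits the sentence on whitespace and peels off a leading + or -.
// A bare "+" or "-" is kept as an ordinary word. The word still has to
// go through the clean function before it is looked up in the index.
std::vector<QueryTerm> parseQuery(const std::string& sentence) {
    std::vector<QueryTerm> terms;
    std::istringstream words(sentence);
    std::string token;
    while (words >> token) {
        if ((token[0] == '+' || token[0] == '-') && token.size() > 1) {
            terms.push_back({token[0], token.substr(1)});
        } else {
            terms.push_back({' ', token});
        }
    }
    return terms;
}
```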

But now I am struggling with what to do next. I have heard of the set_intersection function in the C++ standard library, but I have never used it and honestly have no idea how to use it.
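std::set_intersection is declared in `<algorithm>` (not `<set>`). It reads two sorted ranges, which iterating a std::set always gives you, and writes the elements they share to an output iterator. Because a std::set cannot be written through its own iterators, std::inserter is used to insert into the result set. A minimal sketch; `intersect` is a hypothetical wrapper name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Returns the elements present in both a and b.
std::set<std::string> intersect(const std::set<std::string>& a,
                                const std::set<std::string>& b) {
    std::set<std::string> result;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(result, result.begin()));
    return result;
}
```

For instance, if "fish" maps to {www.dr.seuss.net, www.shoppinglist.com} and "red" maps to {www.dr.seuss.net, www.rainbow.org} (what a lowercasing clean function would give for the sample file), `intersect` returns {www.dr.seuss.net}, the answer to "fish +red".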

The function should return the set of websites that satisfy the search query.

What would be a good way to write the body of each if statement, that is, what to do when a term has a +, a -, or no prefix? I am completely lost on this.

You can certainly solve the problem using set_intersection, set_difference and set_union. Here is an example of how to use those functions for your problem; it processes the terms from left to right and combines each one into a running result set:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>

std::set<std::string> findQueryMatches(std::map<std::string,
    std::set<std::string>>& index, std::string sentence) {
    static const std::set<std::string> no_urls{};  // matches for an unknown word
    std::set<std::string> url_set;                 // running result, built left to right
    std::stringstream ss(sentence);
    std::string str;
    while (ss >> str) {
        // Strip the operator prefix (if any) and look the word up with find(),
        // so that unknown words do not insert empty entries into the index.
        bool has_prefix = (str[0] == '+' || str[0] == '-');
        auto found = index.find(has_prefix ? str.substr(1) : str);
        const std::set<std::string>& term_urls =
            found != index.end() ? found->second : no_urls;

        std::set<std::string> combined;
        auto out = std::inserter(combined, combined.begin());
        if (str[0] == '-') {         // difference: in url_set but not in term_urls
            std::set_difference(url_set.begin(), url_set.end(),
                                term_urls.begin(), term_urls.end(), out);
        } else if (str[0] == '+') {  // intersection: in both sets
            std::set_intersection(term_urls.begin(), term_urls.end(),
                                  url_set.begin(), url_set.end(), out);
        } else {                     // union: in either set
            std::set_union(term_urls.begin(), term_urls.end(),
                           url_set.begin(), url_set.end(), out);
        }
        url_set = combined;

        // Debug output: the running result after each term.
        std::cout << str << ": ";
        for (auto const& x : url_set) {
            std::cout << x << ' ';
        }
        std::cout << '\n';
    }
    return url_set;
}

Keep in mind that set_intersection, set_difference and set_union do not return a new set: they write their results through an output iterator that you supply as the last argument (see https://en.cppreference.com/w/cpp/algorithm/set_difference or the question "How to find the intersection of two std::set in C++?"). Because you cannot assign through a std::set's iterators, the code above passes std::inserter(result, result.begin()), which inserts each element into result. The two input ranges must also be sorted, which iterating a std::set guarantees. The standard library declares the three algorithms like this (simplified):

// Declared in <algorithm> (namespace std); shown for reference only.
template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator set_union(InputIterator1 first1, InputIterator1 last1,
                         InputIterator2 first2, InputIterator2 last2,
                         OutputIterator result);
template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator set_intersection(InputIterator1 first1, InputIterator1 last1,
                                InputIterator2 first2, InputIterator2 last2,
                                OutputIterator result);
template <class InputIterator1, class InputIterator2, class OutputIterator>
OutputIterator set_difference(InputIterator1 first1, InputIterator1 last1,
                              InputIterator2 first2, InputIterator2 last2,
                              OutputIterator result);
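The OutputIterator parameter accepts any output iterator, not only std::inserter. As a small illustration (not taken from the answer), std::ostream_iterator from `<iterator>` writes each element of a union straight to a stream; `unionAsText` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <sstream>
#include <string>

// Formats the union of two sets as space-separated text. The output iterator
// given to std::set_union is a std::ostream_iterator, which writes each
// element (followed by a space) to the stream instead of into a container.
std::string unionAsText(const std::set<std::string>& a,
                        const std::set<std::string>& b) {
    std::ostringstream out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::ostream_iterator<std::string>(out, " "));
    return out.str();
}
```

In the same way, std::back_inserter(vec) can be passed when the result should go into a std::vector.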

For example, given this data:

www.shoppinglist.com
EGGS! milk, fish      @  bread cheese
www.rainbow.org
red ~green~ orange yellow blue indigo violet
www.dr.seuss.net
One Fish Two Fish Red fish Blue fish !!!
www.bigbadwolf.com
I'm not trying to eat you milk,

And this sentence:

milk, +milk, Blue +fish -Fish

The output is:

milk,: www.bigbadwolf.com www.shoppinglist.com
+milk,: www.bigbadwolf.com www.shoppinglist.com
Blue: www.bigbadwolf.com www.dr.seuss.net www.shoppinglist.com
+fish: www.dr.seuss.net www.shoppinglist.com
-Fish: www.shoppinglist.com

Cheers!
