
Using a custom comparator with std::set

I'm trying to create a list of words read from a file, arranged by their length. For that, I'm trying to use std::set with a custom comparator.

class Longer {
 public:
  bool operator() (const string& a, const string& b)
    { return a.size() > b.size();}
};

set<string, Longer> make_dictionary (const string& ifile){
  // produces a set of the words in 'ifile', ordered by their length

  ifstream ifs {ifile};
  if (!ifs) throw runtime_error ("couldn't open file for reading");

  string word;
  set<string, Longer> words;

   while (ifs >> word){
     strip(word);
     tolower(word);
     words.insert(word);
 }

 remove_plurals(words);

 if (ifs.eof()){       
   return words;
  }
  else
    throw runtime_error ("input failed");
}

From this, I expect a list of all words in the file, arranged by their length. Instead, I get a very short list, with exactly one word for each length occurring in the input:

polynomially-decidable
complexity-theoretic
linearly-decidable
lexicographically
alternating-time
finite-variable
newenvironment
documentclass
binoppenalty
investigate
usepackage
corollary
latexsym
article
remark
logic
12pt
box
on
a

Any idea of what's going on here?

With your comparator, equal-length words are equivalent, and you can't have duplicate equivalent entries in a set.

To keep multiple words of the same length, you should modify your comparator so that it also performs, say, a lexicographic comparison if the lengths are the same.
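As a rough sketch of that suggestion, the two comparators can be put side by side. The names `LongerOnly`, `LongerThenAlpha` and `count_kept` are made up for this illustration and do not appear in the question; the alphabetical tie-breaker is just one possible choice.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Length-only comparator, as in the question: equal-length words are equivalent.
struct LongerOnly {
    bool operator()(const std::string& a, const std::string& b) const {
        return a.size() > b.size();
    }
};

// Hypothetical fix: longer words first, alphabetical order among equal lengths.
struct LongerThenAlpha {
    bool operator()(const std::string& a, const std::string& b) const {
        if (a.size() != b.size()) return a.size() > b.size();
        return a < b;  // tie-breaker: distinct words are no longer equivalent
    }
};

// Illustration helper: how many of the given words survive insertion into the set.
template <class Compare>
std::size_t count_kept(const std::vector<std::string>& input) {
    std::set<std::string, Compare> s(input.begin(), input.end());
    return s.size();
}
```

With the input `{"logic", "latex", "remark", "box", "box"}`, the length-only version keeps one word per length, while the version with the tie-breaker keeps every distinct word and still drops the repeated "box".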

Your comparator only compares by length, which means that equally sized but different strings are treated as equivalent by std::set. (std::set treats two elements as equivalent if neither a < b nor b < a is true, with < being your custom comparator function.)
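The equivalence rule in the parenthesis can be written out directly. This is only a sketch: `is_equivalent` and `ByLengthDesc` are names invented here, and the helper mirrors the rule as stated above rather than anything std::set exposes.

```cpp
#include <functional>
#include <string>

// Same ordering as the comparator in the question, under a hypothetical name.
struct ByLengthDesc {
    bool operator()(const std::string& a, const std::string& b) const {
        return a.size() > b.size();
    }
};

// Two keys are equivalent under comp when neither one orders before the other.
template <class Compare>
bool is_equivalent(const std::string& a, const std::string& b,
                   Compare comp = Compare{}) {
    return !comp(a, b) && !comp(b, a);
}
```

Here `is_equivalent<ByLengthDesc>("logic", "latex")` is true, so the set keeps only whichever of the two words was inserted first, whereas `is_equivalent<std::less<std::string>>("logic", "latex")` is false.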

That means your comparator should also consider the string contents to avoid this situation. The keyword here is lexicographic comparison, meaning you take multiple comparison criteria into account. The first criterion would be your string length, and the second would be the string itself. An easy way to write a lexicographic comparison is to use std::tuple, whose operator< compares the components lexicographically.

To make your "reverse" ordering by length, which you wrote with operator>, compatible with the usual operator<, swap the operands: a.size() > b.size() is the same as b.size() < a.size(). (Negating the sizes, as in -a.size() < -b.size(), looks tempting, but size() returns an unsigned type, so the negation wraps around and the empty string, whose negated size stays 0, would sort before every other word.) Then compose each size with the string itself into a tuple, and finally compare the tuples with <:

#include <tuple>  // std::make_tuple

class Longer {
public:
    bool operator() (const string& a, const string& b) const
    {
        // The sizes are swapped on purpose so that longer strings come first;
        // equal lengths fall back to the ordinary string comparison a < b.
        return std::make_tuple(b.size(),  a )
             < std::make_tuple(a.size(),  b );
        //                     ^^^^^^^^  ^^^
        //                       first    second
        //                   criterion    criterion
    }
};
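For a quick check of the resulting order, here is a minimal self-contained sketch. `LongerFirst` and `sorted_words` are hypothetical names; it uses std::tie on local copies of the sizes instead of std::make_tuple, which is an assumption on my part to avoid copying the strings on every comparison, not something the answer requires.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <tuple>
#include <vector>

struct LongerFirst {
    bool operator()(const std::string& a, const std::string& b) const {
        const std::size_t la = a.size();
        const std::size_t lb = b.size();
        // Sizes are swapped so longer strings come first; ties compare a < b.
        return std::tie(lb, a) < std::tie(la, b);
    }
};

// Illustration helper: returns the distinct words in the set's iteration order.
std::vector<std::string> sorted_words(const std::vector<std::string>& input) {
    std::set<std::string, LongerFirst> words(input.begin(), input.end());
    return std::vector<std::string>(words.begin(), words.end());
}
```

Given `{"logic", "box", "latex", "a", "remark", "box", ""}`, this yields remark, latex, logic, box, a and finally the empty string: longest first, alphabetical within a length, and the duplicate "box" stored once.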
