Case insensitive std::set of strings

How do you perform a case-insensitive insertion or search of a string in std::set?

For example:

std::set<std::string> s;
s.insert("Hello");
s.insert("HELLO"); // should be rejected: "Hello" already exists

You need to define a custom comparator:

#include <set>
#include <string>
#include <strings.h>  // strcasecmp (POSIX)

struct InsensitiveCompare {
    bool operator() (const std::string& a, const std::string& b) const {
        return strcasecmp(a.c_str(), b.c_str()) < 0;
    }
};

std::set<std::string, InsensitiveCompare> s;

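If strcasecmp is not available, you may try stricmp (spelled _stricmp in Microsoft's C runtime). Note that strcoll compares according to the current locale's collation order, which is not necessarily case-insensitive.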
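
Because the function name differs between platforms, the choice can be made at compile time. The following is a minimal sketch, assuming a POSIX system declares strcasecmp in <strings.h> and a Windows toolchain declares _stricmp in <string.h>; the helper name compare_no_case and the demo function are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

#if defined(_WIN32)
#  include <string.h>   // _stricmp (Microsoft C runtime)
#else
#  include <strings.h>  // strcasecmp (POSIX)
#endif

// Hypothetical helper: returns <0, 0 or >0 like strcmp, but ignores case.
inline int compare_no_case(const char* a, const char* b) {
#if defined(_WIN32)
    return _stricmp(a, b);
#else
    return strcasecmp(a, b);
#endif
}

struct InsensitiveCompare {
    bool operator()(const std::string& a, const std::string& b) const {
        return compare_no_case(a.c_str(), b.c_str()) < 0;
    }
};

// Illustrates insertion and lookup with the comparator above.
inline void insensitive_set_demo() {
    std::set<std::string, InsensitiveCompare> s;
    const bool first  = s.insert("Hello").second;  // new key
    const bool second = s.insert("HELLO").second;  // equivalent key exists
    assert(first);
    assert(!second);
    assert(s.find("hello") != s.end());            // lookup ignores case too
    assert(s.size() == 1);
}
```

Both insertion and lookup go through the same comparator, so find and count ignore case as well.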

std::set offers the possibility of providing your own comparer (as do most std containers). You can then perform any type of comparison you like. Full example is available here.
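
As one illustration of such a comparer, the sketch below uses only the standard library: std::lexicographical_compare with a predicate that lowercases both characters. The names CaseInsensitiveLess and case_insensitive_less_demo are assumptions made for this example. std::tolower only handles single-byte characters in the current C locale, so this is not a Unicode-aware comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical comparer name; orders strings while ignoring case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                // unsigned char avoids undefined behaviour for negative chars
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Usage sketch: the second spelling is treated as a duplicate.
inline void case_insensitive_less_demo() {
    std::set<std::string, CaseInsensitiveLess> s{"Hello", "HELLO", "world"};
    assert(s.size() == 2);
    assert(s.count("WORLD") == 1);
    assert(s.count("word") == 0);
}
```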

This is a generic solution that also works with string types other than std::string (tested with std::wstring, std::string_view, char const*). Basically anything that defines a range of characters should work.

The key point here is to use boost::as_literal, which allows us to treat null-terminated character arrays, character pointers and ranges uniformly in the comparator.

Generic code ("iset.h"):

#pragma once
#include <set>
#include <algorithm>
#include <boost/algorithm/string.hpp>
#include <boost/range/as_literal.hpp>

// Case-insensitive generic string comparator.
struct range_iless
{
    template< typename InputRange1, typename InputRange2 >
    bool operator()( InputRange1 const& r1, InputRange2 const& r2 ) const 
    {
        // include the standard begin() and end() as well as any custom overloads for ADL
        using std::begin; using std::end;  

        // Treat null-terminated character arrays, character pointers and ranges uniformly.
        // This just creates cheap iterator ranges (it doesn't copy container arguments)!
        auto ir1 = boost::as_literal( r1 );
        auto ir2 = boost::as_literal( r2 );

        // Compare case-insensitively.
        return std::lexicographical_compare( 
            begin( ir1 ), end( ir1 ), 
            begin( ir2 ), end( ir2 ), 
            boost::is_iless{} );
    }
};

// Case-insensitive set for any Key that consists of a range of characters.
template< class Key, class Allocator = std::allocator<Key> >
using iset = std::set< Key, range_iless, Allocator >;

Usage example ("main.cpp"):

#include "iset.h"  // above header file
#include <iostream>
#include <string>
#include <string_view>

// Output range to stream.
template< typename InputRange, typename Stream, typename CharT >
void write_to( Stream& s, InputRange const& r, CharT const* sep )
{
    for( auto const& elem : r )
        s << elem << sep;
    s << std::endl;
}

int main()
{
    iset< std::string  >     s1{  "Hello",  "HELLO",  "world" };
    iset< std::wstring >     s2{ L"Hello", L"HELLO", L"world" };
    iset< char const*  >     s3{  "Hello",  "HELLO",  "world" };
    iset< std::string_view > s4{  "Hello",  "HELLO",  "world" };

    write_to( std::cout,  s1,  " " );    
    write_to( std::wcout, s2, L" " );    
    write_to( std::cout,  s3,  " " );    
    write_to( std::cout,  s4,  " " );    
}

Live Demo at Coliru
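
Where Boost is not available, a similar effect for narrow strings can be approximated with std::string_view (C++17), since std::string and char const* both convert to it. The sketch below is not the code from this answer: sv_iless and sv_iless_demo are hypothetical names, and the is_transparent member is an addition so that find and count accept a string_view or a literal without building a temporary std::string.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <string_view>

// Hypothetical Boost-free comparator for anything convertible to string_view.
struct sv_iless {
    using is_transparent = void;  // enables heterogeneous lookup

    bool operator()(std::string_view a, std::string_view b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

inline void sv_iless_demo() {
    std::set<std::string, sv_iless> s{"Hello", "HELLO", "world"};
    assert(s.size() == 2);

    // Heterogeneous lookup: no std::string is built from these arguments.
    assert(s.find(std::string_view{"hELLO"}) != s.end());
    assert(s.count("WORLD") == 1);
}
```

Unlike the Boost version, this sketch does not cover std::wstring; a wide-character variant would need std::wstring_view and std::towlower.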

From what I have read, this is more portable than stricmp(), because stricmp() is not actually part of the standard library; it is only implemented by most compiler vendors. So below is my solution: just roll your own.

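#include <algorithm>
#include <cctype>
#include <iostream>
#include <set>
#include <string>

struct caseInsensitiveLess
{
  // const-qualified so std::set can call it through a const comparator.
  bool operator()(const std::string& x, const std::string& y) const
  {
    const std::size_t bound = std::min(x.size(), y.size());

    for (std::size_t i = 0; i < bound; ++i)
    {
      // Cast to unsigned char: passing a negative char to tolower is undefined.
      const int cx = std::tolower(static_cast<unsigned char>(x[i]));
      const int cy = std::tolower(static_cast<unsigned char>(y[i]));

      if (cx < cy)
        return true;

      if (cy < cx)
        return false;
    }

    // The common prefix is equal, so the shorter string orders first.
    // Returning false here would make "abc" and "abcd" equivalent keys.
    return x.size() < y.size();
  }
};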

int main()
{
  std::set<std::string, caseInsensitiveLess> ss1;
  std::set<std::string> ss2;

  ss1.insert("This is the first string");
  ss1.insert("THIS IS THE FIRST STRING");
  ss1.insert("THIS IS THE SECOND STRING");
  ss1.insert("This IS THE SECOND STRING");
  ss1.insert("This IS THE Third");

  ss2.insert("this is the first string");
  ss2.insert("this is the first string");
  ss2.insert("this is the second string");
  ss2.insert("this is the second string");
  ss2.insert("this is the third");

  for ( auto& i: ss1 )
   std::cout << i << std::endl;

  std::cout << std::endl;

  for ( auto& i: ss2 )
   std::cout << i << std::endl;

}

Output with the case-insensitive set and the regular set, showing the same ordering:

This is the first string
THIS IS THE SECOND STRING
This IS THE Third

this is the first string
this is the second string
this is the third
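
The sample strings above all differ before either one ends, so they do not exercise the case where one key is a prefix of another. In a std::set, two keys are equivalent when neither orders before the other, so a comparator has to place the shorter string first if "abc" and "ABCD" are to remain distinct elements. The sketch below shows a length-aware tie-break and a small check that can be run against any comparator; prefixAwareLess, keepsPrefixesDistinct and prefixAwareLessDemo are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <set>
#include <string>

// Hypothetical comparator: case-insensitive, shorter string first on a tie.
struct prefixAwareLess
{
  bool operator()(const std::string& x, const std::string& y) const
  {
    const std::size_t bound = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < bound; ++i)
    {
      const int cx = std::tolower(static_cast<unsigned char>(x[i]));
      const int cy = std::tolower(static_cast<unsigned char>(y[i]));
      if (cx != cy)
        return cx < cy;
    }
    return x.size() < y.size();  // equal so far: a proper prefix sorts first
  }
};

// Returns true if Compare keeps "abc" and "ABCD" as two separate keys.
template <class Compare>
bool keepsPrefixesDistinct()
{
  std::set<std::string, Compare> s;
  s.insert("abc");
  s.insert("ABCD");
  return s.size() == 2;
}

inline void prefixAwareLessDemo()
{
  assert(keepsPrefixesDistinct<prefixAwareLess>());
  assert(keepsPrefixesDistinct<std::less<std::string>>());
}
```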
