Case-insensitive std::set of strings

Question

How can I make insertion into, or lookup in, a std::set of strings case-insensitive?

For example:

std::set<std::string> s;
s.insert("Hello");
s.insert("HELLO"); //not allowed, string already exists.

Answer 1

You need to define a custom comparator:

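#include <set>
#include <string>
#include <strings.h>  // strcasecmp (POSIX, not standard C++)

struct InsensitiveCompare {
    // Orders a before b when a compares less than b, ignoring case.
    bool operator() (const std::string& a, const std::string& b) const {
        return strcasecmp(a.c_str(), b.c_str()) < 0;
    }
};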

std::set<std::string, InsensitiveCompare> s;

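If strcasecmp is not available, you can try stricmp (_stricmp on MSVC) or strcoll. Note that strcoll compares according to the current locale's collation rules, so it is not guaranteed to ignore case.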
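To illustrate what the comparator changes for insertion and lookup, here is a minimal sketch. It assumes a portable stand-in for strcasecmp built on std::tolower; the names PortableInsensitiveCompare, InsensitiveSet and try_insert are made up for this example and are not part of the answer above.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical portable stand-in for the strcasecmp-based comparator.
struct PortableInsensitiveCompare {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            // Taking unsigned char keeps the argument to tolower non-negative.
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

using InsensitiveSet = std::set<std::string, PortableInsensitiveCompare>;

// Returns true only if the value was actually added to the set.
inline bool try_insert(InsensitiveSet& s, const std::string& value) {
    return s.insert(value).second;
}
```

With this comparator, inserting "Hello" and then "HELLO" leaves a single element, and s.find("hello") returns an iterator to the stored "Hello". The set keeps whichever spelling was inserted first.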

Answer 2

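std::set lets you supply your own comparator (as do the other ordered associative containers in std). You can then perform any kind of comparison you like. A complete example is available here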
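The comparator does not have to be a struct. As a rough sketch of another way to supply one, the following passes a plain function through a function-pointer comparator type. The names iless, FnSet and make_iset are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper: case-insensitive "less than" as a free function.
inline bool iless(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
            return std::tolower(x) < std::tolower(y);
        });
}

// The comparator type is a function pointer, so the function itself
// has to be handed to the set's constructor.
using FnSet = std::set<std::string, bool (*)(const std::string&, const std::string&)>;

inline FnSet make_iset() {
    return FnSet(&iless);
}
```

A function object type is usually the more convenient choice, because the set can default-construct it. With a function pointer the comparator must be passed explicitly every time a set is created.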

Answer 3

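This is a generic solution that also works with string types other than std::string (tested with std::wstring, std::string_view, char const*). Basically anything that defines a range of characters should work.

The key point here is the use of boost::as_literal, which lets us treat null-terminated character arrays, character pointers and ranges uniformly in the comparator.

Generic code ("iset.h"):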

#pragma once
#include <set>
#include <algorithm>
#include <iterator>   // std::begin, std::end
#include <memory>     // std::allocator
#include <boost/algorithm/string.hpp>
#include <boost/range/as_literal.hpp>

// Case-insensitive generic string comparator.
struct range_iless
{
    template< typename InputRange1, typename InputRange2 >
    bool operator()( InputRange1 const& r1, InputRange2 const& r2 ) const 
    {
        // include the standard begin() and end() as well as any custom overloads for ADL
        using std::begin; using std::end;  

        // Treat null-terminated character arrays, character pointers and ranges uniformly.
        // This just creates cheap iterator ranges (it doesn't copy container arguments)!
        auto ir1 = boost::as_literal( r1 );
        auto ir2 = boost::as_literal( r2 );

        // Compare case-insensitively.
        return std::lexicographical_compare( 
            begin( ir1 ), end( ir1 ), 
            begin( ir2 ), end( ir2 ), 
            boost::is_iless{} );
    }
};

// Case-insensitive set for any Key that consists of a range of characters.
template< class Key, class Allocator = std::allocator<Key> >
using iset = std::set< Key, range_iless, Allocator >;

Usage example ("main.cpp"):

#include "iset.h"  // above header file
#include <iostream>
#include <string>
#include <string_view>

// Output range to stream.
template< typename InputRange, typename Stream, typename CharT >
void write_to( Stream& s, InputRange const& r, CharT const* sep )
{
    for( auto const& elem : r )
        s << elem << sep;
    s << std::endl;
}

int main()
{
    iset< std::string  >     s1{  "Hello",  "HELLO",  "world" };
    iset< std::wstring >     s2{ L"Hello", L"HELLO", L"world" };
    iset< char const*  >     s3{  "Hello",  "HELLO",  "world" };
    iset< std::string_view > s4{  "Hello",  "HELLO",  "world" };

    write_to( std::cout,  s1,  " " );    
    write_to( std::wcout, s2, L" " );    
    write_to( std::cout,  s3,  " " );    
    write_to( std::cout,  s4,  " " );    
}

Live demo on Coliru
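If Boost is not an option, a narrower variant can be sketched with std::string_view alone. This is a different technique from the boost::as_literal approach above: it assumes C++17, handles only narrow char strings (not std::wstring), and the names sv_iless and sv_iset are invented for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <string_view>

// Hypothetical Boost-free comparator for narrow character strings.
struct sv_iless {
    // Marks the comparator as transparent, so find() and count() can take
    // a string_view or a string literal without building a std::string.
    using is_transparent = void;

    bool operator()(std::string_view a, std::string_view b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Case-insensitive set that owns its keys as std::string.
using sv_iset = std::set<std::string, sv_iless>;
```

Because std::string, std::string_view and char const* all convert to std::string_view, lookups such as s.find("WORLD") or s.find(std::string_view("hello")) go through the same comparison.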

Answer 4

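From what I have read, this is more portable than stricmp(), because stricmp() is not actually part of the std library and is only implemented by most compiler vendors. So below is my own hand-rolled solution.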

#include <string>
#include <cctype>
#include <iostream>
#include <set>

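struct caseInsensitiveLess
{
  // const, so the set can call the comparator through a const object.
  bool operator()(const std::string& x, const std::string& y) const
  {
    const std::string::size_type bound = x.size() < y.size() ? x.size() : y.size();

    for (std::string::size_type i = 0; i < bound; ++i)
    {
      // Cast to unsigned char first: passing a negative char to tolower is undefined.
      const int cx = std::tolower(static_cast<unsigned char>(x[i]));
      const int cy = std::tolower(static_cast<unsigned char>(y[i]));

      if (cx < cy)
        return true;

      if (cy < cx)
        return false;
    }

    // The common prefix matches: the shorter string orders first, so that
    // "abc" and "abcd" are not treated as duplicates.
    return x.size() < y.size();
  }
};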

int main()
{
  std::set<std::string, caseInsensitiveLess> ss1;
  std::set<std::string> ss2;

  ss1.insert("This is the first string");
  ss1.insert("THIS IS THE FIRST STRING");
  ss1.insert("THIS IS THE SECOND STRING");
  ss1.insert("This IS THE SECOND STRING");
  ss1.insert("This IS THE Third");

  ss2.insert("this is the first string");
  ss2.insert("this is the first string");
  ss2.insert("this is the second string");
  ss2.insert("this is the second string");
  ss2.insert("this is the third");

  for ( auto& i: ss1 )
   std::cout << i << std::endl;

  std::cout << std::endl;

  for ( auto& i: ss2 )
   std::cout << i << std::endl;

}

The output for the case-insensitive set and the regular set shows the same ordering:

This is the first string
THIS IS THE SECOND STRING
This IS THE Third

this is the first string
this is the second string
this is the third
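The set treats two keys as duplicates when neither one compares less than the other. One case worth checking in any hand-rolled comparator is a string that is a case-insensitive prefix of another, such as "abc" and "abcd": these should stay distinct. The sketch below shows one way to check this; iless_sketch and equivalent are hypothetical names, and the comparator is a stand-in built on std::lexicographical_compare rather than the code from this answer.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in comparator; lexicographical_compare already
// orders a proper prefix before the longer string.
struct iless_sketch {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// True when the comparator considers a and b the same key,
// i.e. neither orders before the other.
template <class Compare>
bool equivalent(const Compare& cmp, const std::string& a, const std::string& b) {
    return !cmp(a, b) && !cmp(b, a);
}
```

Here equivalent(cmp, "Hello", "HELLO") holds, while equivalent(cmp, "abc", "abcd") does not, so a set using this comparator would keep both "abc" and "abcd".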

4 answers

Answer 1: 39, accepted, 2010-11-27 11:50:44
Answer 2: 2, 2010-11-27 11:50:04
Answer 3: 1, 2018-09-14 13:44:55
Answer 4: 0, 2013-09-11 18:43:27
