簡體   English   中英

不區分大小寫的 std::set 字符串

[英]Case insensitive std::set of strings

如何在 std::set 中進行不區分大小寫的插入或字符串搜索?

例如-

std::set<std::string> s;
s.insert("Hello");
s.insert("HELLO"); //not allowed, string already exists.

您需要定義一個自定義比較器:

struct InsensitiveCompare { 
    bool operator() (const std::string& a, const std::string& b) const {
        return strcasecmp(a.c_str(), b.c_str()) < 0;
    }
};

std::set<std::string, InsensitiveCompare> s;

您可以嘗試stricmpstrcoll如果strcasecmp不可用。

std::set 提供了提供您自己的比較器的可能性(大多數 std 容器也是如此)。 然后,您可以執行您喜歡的任何類型的比較。 完整示例可在此處獲得

這是一個通用解決方案,也適用於std::string以外的其他字符串類型(使用std::wstringstd::string_viewchar const* )。 基本上任何定義一系列字符的東西都應該工作。

這里的關鍵是使用boost::as_literal ,它允許我們在比較器中統一處理以空字符結尾的字符數組、字符指針和范圍。

通用代碼(“iset.h”):

#pragma once
#include <set>
#include <algorithm>
#include <boost/algorithm/string.hpp>
#include <boost/range/as_literal.hpp>

// Case-insensitive generic string comparator.
struct range_iless
{
    template< typename InputRange1, typename InputRange2 >
    bool operator()( InputRange1 const& r1, InputRange2 const& r2 ) const 
    {
        // include the standard begin() and end() aswell as any custom overloads for ADL
        using std::begin; using std::end;  

        // Treat null-terminated character arrays, character pointers and ranges uniformly.
        // This just creates cheap iterator ranges (it doesn't copy container arguments)!
        auto ir1 = boost::as_literal( r1 );
        auto ir2 = boost::as_literal( r2 );

        // Compare case-insensitively.
        return std::lexicographical_compare( 
            begin( ir1 ), end( ir1 ), 
            begin( ir2 ), end( ir2 ), 
            boost::is_iless{} );
    }
};

// Case-insensitive set for any Key that consists of a range of characters.
template< class Key, class Allocator = std::allocator<Key> >
using iset = std::set< Key, range_iless, Allocator >;

用法示例(“main.cpp”):

#include "iset.h"  // above header file
#include <iostream>
#include <string>
#include <string_view>

// Output range to stream.
template< typename InputRange, typename Stream, typename CharT >
void write_to( Stream& s, InputRange const& r, CharT const* sep )
{
    for( auto const& elem : r )
        s << elem << sep;
    s << std::endl;
}

int main()
{
    iset< std::string  >     s1{  "Hello",  "HELLO",  "world" };
    iset< std::wstring >     s2{ L"Hello", L"HELLO", L"world" };
    iset< char const*  >     s3{  "Hello",  "HELLO",  "world" };
    iset< std::string_view > s4{  "Hello",  "HELLO",  "world" };

    write_to( std::cout,  s1,  " " );    
    write_to( std::wcout, s2, L" " );    
    write_to( std::cout,  s3,  " " );    
    write_to( std::cout,  s4,  " " );    
}

Coliru 現場演示

從我讀到的內容來看,這比 stricmp() 更可移植,因為 stricmp() 實際上不是 std 庫的一部分,而僅由大多數編譯器供應商實現。 因此,下面是我自己推出的解決方案。

#include <string>
#include <cctype>
#include <iostream>
#include <set>

struct caseInsensitiveLess
{
  bool operator()(const std::string& x, const std::string& y)
  {
    unsigned int xs ( x.size() );
    unsigned int ys ( y.size() );
    unsigned int bound ( 0 );

    if ( xs < ys ) 
      bound = xs; 
    else 
      bound = ys;

    {
      unsigned int i = 0;
      for (auto it1 = x.begin(), it2 = y.begin(); i < bound; ++i, ++it1, ++it2)
      {
        if (tolower(*it1) < tolower(*it2))
          return true;

        if (tolower(*it2) < tolower(*it1))
          return false;
      }
    }
    return false; 
  }
};

int main()
{
  std::set<std::string, caseInsensitiveLess> ss1;
  std::set<std::string> ss2;

  ss1.insert("This is the first string");
  ss1.insert("THIS IS THE FIRST STRING");
  ss1.insert("THIS IS THE SECOND STRING");
  ss1.insert("This IS THE SECOND STRING");
  ss1.insert("This IS THE Third");

  ss2.insert("this is the first string");
  ss2.insert("this is the first string");
  ss2.insert("this is the second string");
  ss2.insert("this is the second string");
  ss2.insert("this is the third");

  for ( auto& i: ss1 )
   std::cout << i << std::endl;

  std::cout << std::endl;

  for ( auto& i: ss2 )
   std::cout << i << std::endl;

}

不區分大小寫的集和常規集的輸出顯示相同的順序:

This is the first string
THIS IS THE SECOND STRING
This IS THE Third

this is the first string
this is the second string
this is the third

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM