Case-insensitive std::set of strings
How do you do a case-insensitive insertion or search of a string in a std::set?
For example:
std::set<std::string> s;
s.insert("Hello");
s.insert("HELLO"); // should not be allowed: "Hello" already exists (ignoring case)
You need to define a custom comparator and pass it as the set's second template argument:
#include <set>
#include <string>
#include <strings.h> // strcasecmp (POSIX, not standard C++)

struct InsensitiveCompare {
    bool operator() (const std::string& a, const std::string& b) const {
        return strcasecmp(a.c_str(), b.c_str()) < 0;
    }
};

std::set<std::string, InsensitiveCompare> s;
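A minimal sketch of how this comparator covers both parts of the question, insertion and search. It assumes a POSIX system (strcasecmp comes from <strings.h>), repeats the comparator so the snippet compiles on its own, and uses the hypothetical helper names ci_set, try_insert and lookup for illustration. Because std::set uses the comparator for every operation, insert() reports a case-only duplicate as already present, and find() matches regardless of case.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <strings.h>  // strcasecmp (POSIX, not standard C++)

// Same comparator as above: orders strings ignoring ASCII case.
struct InsensitiveCompare {
    bool operator()(const std::string& a, const std::string& b) const {
        return strcasecmp(a.c_str(), b.c_str()) < 0;
    }
};

// Hypothetical alias for illustration.
using ci_set = std::set<std::string, InsensitiveCompare>;

// Returns true if `word` was added, false if an equivalent key
// (same letters, any case) was already present.
bool try_insert(ci_set& s, const std::string& word) {
    return s.insert(word).second;
}

// Returns the stored spelling of `word` (as it was first inserted),
// or an empty string if no equivalent key exists.
std::string lookup(const ci_set& s, const std::string& word) {
    auto it = s.find(word);  // find() uses the same comparator
    return it != s.end() ? *it : std::string{};
}
```

For example, after `assert(try_insert(s, "Hello"));`, the call `try_insert(s, "HELLO")` returns false, and `lookup(s, "hello")` returns "Hello", the spelling that was stored first.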
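If strcasecmp is not available (it is a POSIX function, not part of standard C or C++), you may try stricmp (spelled _stricmp in current MSVC). strcoll is sometimes suggested as well, but it performs locale-specific collation and does not necessarily ignore case.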
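One way to follow that advice without editing the comparator per platform is a small compile-time switch. This is a sketch under assumptions: portable_strcasecmp is a hypothetical name, _stricmp is the Microsoft CRT function declared in <string.h>, and strcasecmp is the POSIX function from <strings.h>. Both fold ASCII letters; neither is standard C++, and their treatment of non-ASCII bytes may depend on the current C locale.

```cpp
#include <cassert>
#include <set>
#include <string>
#if defined(_WIN32)
#include <string.h>   // _stricmp (Microsoft CRT)
#else
#include <strings.h>  // strcasecmp (POSIX)
#endif

// Hypothetical helper: dispatches to whichever case-insensitive
// C string comparison the platform provides.
inline int portable_strcasecmp(const char* a, const char* b) {
#if defined(_WIN32)
    return _stricmp(a, b);
#else
    return strcasecmp(a, b);
#endif
}

// The comparator from the answer, now routed through the wrapper.
struct InsensitiveCompare {
    bool operator()(const std::string& a, const std::string& b) const {
        return portable_strcasecmp(a.c_str(), b.c_str()) < 0;
    }
};
```

Usage is unchanged: `std::set<std::string, InsensitiveCompare> s{"Hello", "HELLO", "world"};` ends up holding two elements.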
This is a generic solution that also works with string types other than std::string (tested with std::wstring, std::string_view and char const*). Basically anything that defines a range of characters should work.
The key point here is to use boost::as_literal, which allows us to treat null-terminated character arrays, character pointers and ranges uniformly in the comparator.
Generic code ("iset.h"):
#pragma once
#include <set>
#include <algorithm>
#include <boost/algorithm/string.hpp>
#include <boost/range/as_literal.hpp>

// Case-insensitive generic string comparator.
struct range_iless
{
    template< typename InputRange1, typename InputRange2 >
    bool operator()( InputRange1 const& r1, InputRange2 const& r2 ) const
    {
        // Bring the standard begin() and end() into scope, as well as any custom overloads found via ADL.
        using std::begin; using std::end;

        // Treat null-terminated character arrays, character pointers and ranges uniformly.
        // This just creates cheap iterator ranges (it doesn't copy container arguments)!
        auto ir1 = boost::as_literal( r1 );
        auto ir2 = boost::as_literal( r2 );

        // Compare case-insensitively.
        return std::lexicographical_compare(
            begin( ir1 ), end( ir1 ),
            begin( ir2 ), end( ir2 ),
            boost::is_iless{} );
    }
};

// Case-insensitive set for any Key that consists of a range of characters.
template< class Key, class Allocator = std::allocator<Key> >
using iset = std::set< Key, range_iless, Allocator >;
Usage example ("main.cpp"):
#include "iset.h" // above header file
#include <iostream>
#include <string>
#include <string_view>

// Output range to stream.
template< typename InputRange, typename Stream, typename CharT >
void write_to( Stream& s, InputRange const& r, CharT const* sep )
{
    for( auto const& elem : r )
        s << elem << sep;
    s << std::endl;
}

int main()
{
    iset< std::string > s1{ "Hello", "HELLO", "world" };
    iset< std::wstring > s2{ L"Hello", L"HELLO", L"world" };
    iset< char const* > s3{ "Hello", "HELLO", "world" };
    iset< std::string_view > s4{ "Hello", "HELLO", "world" };

    write_to( std::cout, s1, " " );
    write_to( std::wcout, s2, L" " );
    write_to( std::cout, s3, " " );
    write_to( std::cout, s4, " " );
}
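If Boost is not available, the same idea can be approximated in C++17 by converting every argument to a std::basic_string_view, which plays the role boost::as_literal plays above. This is a sketch, not the answer's code: view_iless, to_view and view_iset are hypothetical names, and case folding uses std::tolower with a default-constructed (global) std::locale, which is my own choice and may not order every character exactly as boost::is_iless does. It also declares is_transparent, which the original comparator does not, so that find() and count() can take a string literal or string_view without first building a key object (heterogeneous lookup, available since C++14).

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <memory>
#include <set>
#include <string>
#include <string_view>

struct view_iless
{
    // Enables heterogeneous lookup: find()/count() accept any type
    // this comparator can handle, not just the set's key type.
    using is_transparent = void;

    template <typename A, typename B>
    bool operator()(const A& a, const B& b) const
    {
        const auto va = to_view(a);
        const auto vb = to_view(b);
        const std::locale loc;  // copy of the current global locale
        return std::lexicographical_compare(
            va.begin(), va.end(), vb.begin(), vb.end(),
            [&loc](auto x, auto y) {
                // std::tolower(charT, const locale&) from <locale>
                return std::tolower(x, loc) < std::tolower(y, loc);
            });
    }

private:
    // Null-terminated pointers (string literals decay to these).
    template <typename CharT>
    static std::basic_string_view<CharT> to_view(const CharT* p) { return p; }

    // Owning strings: a non-copying view of their contents.
    template <typename CharT, typename Traits, typename Alloc>
    static std::basic_string_view<CharT, Traits>
    to_view(const std::basic_string<CharT, Traits, Alloc>& s) { return s; }

    // Views pass through unchanged.
    template <typename CharT, typename Traits>
    static std::basic_string_view<CharT, Traits>
    to_view(std::basic_string_view<CharT, Traits> s) { return s; }
};

// Case-insensitive set for any character-range Key, mirroring iset.
template <class Key, class Allocator = std::allocator<Key>>
using view_iset = std::set<Key, view_iless, Allocator>;
```

With this, `view_iset<std::string> s{"Hello", "HELLO", "world"};` keeps two elements, and `assert(s.count("hello") == 1);` looks the key up without constructing a temporary std::string.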
From what I have read, this is more portable than stricmp(), because stricmp() is not actually part of the standard library; it is only provided by most compiler vendors. So below is my solution, which simply rolls its own.
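#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iostream>
#include <set>
#include <string>

struct caseInsensitiveLess
{
    // Marked const so std::set can call it from its const member functions.
    bool operator()(const std::string& x, const std::string& y) const
    {
        const std::size_t bound = std::min(x.size(), y.size());
        for (std::size_t i = 0; i < bound; ++i)
        {
            // Cast to unsigned char first: std::tolower has undefined
            // behavior for negative char values.
            const int cx = std::tolower(static_cast<unsigned char>(x[i]));
            const int cy = std::tolower(static_cast<unsigned char>(y[i]));
            if (cx < cy)
                return true;
            if (cy < cx)
                return false;
        }
        // Equal up to the shorter length: the shorter string sorts first.
        // Returning false here would make "abc" and "abcd" compare equivalent.
        return x.size() < y.size();
    }
};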
int main()
{
    std::set<std::string, caseInsensitiveLess> ss1;
    std::set<std::string> ss2;

    ss1.insert("This is the first string");
    ss1.insert("THIS IS THE FIRST STRING");
    ss1.insert("THIS IS THE SECOND STRING");
    ss1.insert("This IS THE SECOND STRING");
    ss1.insert("This IS THE Third");

    ss2.insert("this is the first string");
    ss2.insert("this is the first string");
    ss2.insert("this is the second string");
    ss2.insert("this is the second string");
    ss2.insert("this is the third");

    for ( auto& i: ss1 )
        std::cout << i << std::endl;
    std::cout << std::endl;
    for ( auto& i: ss2 )
        std::cout << i << std::endl;
}
Output of the case-insensitive set and the regular set, showing the same ordering:
This is the first string
THIS IS THE SECOND STRING
This IS THE Third
this is the first string
this is the second string
this is the third
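The explicit loop can also be replaced by std::lexicographical_compare, which already treats a proper prefix as less than the longer string, so "abc" sorts before "abcd" instead of comparing equivalent. A sketch using only the standard library; ci_less and ci_string_set are hypothetical names, and folding uses std::tolower from <cctype>, so it follows the current C locale (the "C" locale unless the program changes it).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

struct ci_less
{
    // Compare two characters after lowercasing. The unsigned char cast
    // avoids undefined behavior for negative char values.
    static bool char_less(char a, char b)
    {
        return std::tolower(static_cast<unsigned char>(a)) <
               std::tolower(static_cast<unsigned char>(b));
    }

    bool operator()(const std::string& x, const std::string& y) const
    {
        // Stops at the first differing character; if one string is a
        // prefix of the other, the shorter one is "less".
        return std::lexicographical_compare(x.begin(), x.end(),
                                            y.begin(), y.end(), char_less);
    }
};

using ci_string_set = std::set<std::string, ci_less>;
```

For example, `ci_string_set s{"This is the third", "THIS IS THE THIRD", "This is the third string"};` keeps two elements: the second string is a case-only duplicate of the first, while the third is longer and therefore distinct.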