Case insensitive std::string.find()
I am using std::string's find() method to test if a string is a substring of another. Now I need a case-insensitive version of the same thing. For string comparison I can always turn to stricmp(), but there doesn't seem to be a stristr().
I have found various answers and most suggest using Boost, which is not an option in my case. Additionally, I need to support std::wstring / wchar_t. Any ideas?
You could use std::search with a custom predicate.
#include <algorithm>
#include <iostream>
#include <locale>
#include <string>
using namespace std;
// templated version of my_equal so it could work with both char and wchar_t
template<typename charT>
struct my_equal {
    my_equal( const std::locale& loc ) : loc_(loc) {}
    bool operator()(charT ch1, charT ch2) const {
        return std::toupper(ch1, loc_) == std::toupper(ch2, loc_);
    }
private:
    const std::locale& loc_;
};
// find substring (case insensitive)
template<typename T>
int ci_find_substr( const T& str1, const T& str2, const std::locale& loc = std::locale() )
{
    typename T::const_iterator it = std::search( str1.begin(), str1.end(),
        str2.begin(), str2.end(), my_equal<typename T::value_type>(loc) );
    if ( it != str1.end() ) return static_cast<int>( it - str1.begin() );
    else return -1; // not found
}
int main(int argc, char *argv[])
{
    // string test
    std::string str1 = "FIRST HELLO";
    std::string str2 = "hello";
    int f1 = ci_find_substr( str1, str2 );
    // wstring test
    std::wstring wstr1 = L"ОПЯТЬ ПРИВЕТ";
    std::wstring wstr2 = L"привет";
    int f2 = ci_find_substr( wstr1, wstr2 );
    return 0;
}
The new C++11 style:
#include <algorithm>
#include <string>
#include <cctype>
/// Try to find in the Haystack the Needle - ignore case
bool findStringIC(const std::string & strHaystack, const std::string & strNeedle)
{
    auto it = std::search(
        strHaystack.begin(), strHaystack.end(),
        strNeedle.begin(), strNeedle.end(),
        // unsigned char parameters: std::toupper is undefined for negative char values
        [](unsigned char ch1, unsigned char ch2) { return std::toupper(ch1) == std::toupper(ch2); }
    );
    return (it != strHaystack.end() );
}
An explanation of std::search can be found on cplusplus.com.
Why not use Boost.StringAlgo:
#include <boost/algorithm/string/find.hpp>
#include <string>
bool Foo()
{
    // case insensitive find
    std::string str("Hello");
    boost::iterator_range<std::string::const_iterator> rng;
    rng = boost::ifind_first(str, std::string("EL"));
    // an empty range means no match was found
    return !rng.empty();
}
Why not just convert both strings to lowercase before you call find()?
Notice:
Since you're doing substring searches (std::string) and not element (character) searches, there's unfortunately no existing solution I'm aware of that's immediately accessible in the standard library to do this.
Nevertheless, it's easy enough to do: simply convert both strings to upper case (or both to lower case; I chose upper in this example).
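#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>
std::string upper_string(const std::string& str)
{
    std::string upper;
    // unsigned char avoids undefined behaviour for negative char values
    std::transform(str.begin(), str.end(), std::back_inserter(upper),
        [](unsigned char ch) { return static_cast<char>(std::toupper(ch)); });
    return upper;
}
std::string::size_type find_str_ci(const std::string& str, const std::string& substr)
{
    return upper_string(str).find(upper_string(substr));
}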
This is not a fast solution (bordering into pessimization territory) but it's the only one I know of off-hand. It's also not that hard to implement your own case-insensitive substring finder if you are worried about efficiency.
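As a rough illustration of such a hand-written finder, here is a minimal sketch that compares characters in place instead of building upper-cased copies. The name find_ci_no_copy is hypothetical, and the sketch assumes single-byte characters and the current C locale.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the answer above): returns the index of the
// first case-insensitive occurrence of needle in haystack, or npos.
std::string::size_type find_ci_no_copy(const std::string& haystack,
                                       const std::string& needle)
{
    if (needle.size() > haystack.size())
        return std::string::npos;

    // Cast through unsigned char: std::toupper is undefined for negative char values.
    auto up = [](char c) {
        return std::toupper(static_cast<unsigned char>(c));
    };

    const std::size_t last_start = haystack.size() - needle.size();
    for (std::size_t i = 0; i <= last_start; ++i) {
        std::size_t j = 0;
        while (j < needle.size() && up(haystack[i + j]) == up(needle[j]))
            ++j;
        if (j == needle.size())
            return i;   // every needle character matched at offset i
    }
    return std::string::npos;
}
```

Like std::string::find, this returns 0 for an empty needle. It is still a naive O(N * M) scan in the worst case; it only saves the two temporary strings.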
Additionally, I need to support std::wstring/wchar_t. Any ideas?
tolower/toupper in <locale> will work on wide strings as well, so the solution above should be just as applicable (simply change std::string to std::wstring).
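One way this might look is sketched below: the string type becomes a template parameter and the two-argument std::toupper from <locale> does the conversion. The names upper_string_t and find_str_ci_t are made up for this illustration, and how well non-ASCII characters are handled depends on the locale passed in.

```cpp
#include <algorithm>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical generalisation of the upper-casing helper:
// works for std::string and std::wstring alike.
template <typename StringT>
StringT upper_string_t(const StringT& str, const std::locale& loc = std::locale())
{
    typedef typename StringT::value_type char_type;
    StringT upper;
    upper.reserve(str.size());
    // std::toupper(ch, loc) dispatches to the std::ctype<char_type> facet of loc.
    std::transform(str.begin(), str.end(), std::back_inserter(upper),
                   [&loc](char_type ch) { return std::toupper(ch, loc); });
    return upper;
}

// Case-insensitive find built on the helper above; returns npos if not found.
template <typename StringT>
typename StringT::size_type find_str_ci_t(const StringT& str, const StringT& substr,
                                          const std::locale& loc = std::locale())
{
    return upper_string_t(str, loc).find(upper_string_t(substr, loc));
}
```

Both arguments must have the same string type, so string literals need to be wrapped, e.g. find_str_ci_t(std::wstring(L"First Hello"), std::wstring(L"HELLO")).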
[Edit] An alternative, as pointed out, is to adapt your own case-insensitive string type from basic_string by specifying your own character traits. This works if you can accept all string searches, comparisons, etc. to be case-insensitive for a given string type.
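A minimal sketch of the character-traits approach follows, for char only. The names ci_char_traits and ci_string are illustrative, and the sketch assumes the standard library routes find() and comparisons through the traits class, which is what the traits parameter is for.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sketch of case-insensitive traits for char; everything not overridden
// (copy, length, assign, ...) is inherited from std::char_traits<char>.
struct ci_char_traits : public std::char_traits<char> {
    // Upper-case via unsigned char to stay within std::toupper's defined domain.
    static int up(char c) {
        return std::toupper(static_cast<unsigned char>(c));
    }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
};

// Every search and comparison on this type ignores case.
typedef std::basic_string<char, ci_char_traits> ci_string;
```

With this, ci_string("FIRST HELLO").find("hello") locates the match directly. The trade-off is that ci_string is a distinct type from std::string, so converting between the two needs an explicit copy such as ci_string(s.c_str()).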
It also makes sense to provide a Boost version. Note that this modifies the original strings.
#include <boost/algorithm/string.hpp>
#include <string>
std::string str1 = "hello world!!!";
std::string str2 = "HELLO";
// to_lower converts in place
boost::algorithm::to_lower(str1);
boost::algorithm::to_lower(str2);
if (str1.find(str2) != std::string::npos)
{
    // str1 contains str2
}
Or use the excellent Boost.Xpressive library:
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
....
std::string long_string( "very LonG string" );
std::string word("long");
smatch what;
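sregex re = sregex::compile(word, regex_constants::icase);
// regex_search finds the word anywhere in the string;
// regex_match would require the whole string to match
if( regex_search( long_string, what, re ) )
{
std::cout << word << " found!" << std::endl;
}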
Note that in this example the search word must not contain any regex special characters.
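If the search word may contain such characters, one option is to escape them before compiling the pattern. The sketch below uses the standard C++11 <regex> library rather than Boost.Xpressive, since that is the API I can vouch for here; escape_regex and contains_icase are hypothetical helper names.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escape ECMAScript metacharacters so the
// word is matched literally.
std::string escape_regex(const std::string& text)
{
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char ch : text) {
        if (special.find(ch) != std::string::npos)
            out += '\\';
        out += ch;
    }
    return out;
}

// True if word occurs anywhere in haystack, ignoring case.
bool contains_icase(const std::string& haystack, const std::string& word)
{
    const std::regex re(escape_regex(word),
                        std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(haystack, re);
}
```

For example, contains_icase("price is $5.00 (USD)", "$5.00 (usd)") treats the dollar sign, dot and parentheses as plain characters. Compiling a regex for every call is relatively expensive, so this is more about convenience than speed.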
If you want a "real" comparison according to Unicode and locale rules, use ICU's Collator class.
#include <iostream>
using namespace std;
template <typename charT>
struct ichar {
operator charT() const { return toupper(x); }
charT x;
};
template <typename charT>
static basic_string<ichar<charT> > *istring(basic_string<charT> &s) { return (basic_string<ichar<charT> > *)&s; }
template <typename charT>
static ichar<charT> *istring(const charT *s) { return (ichar<charT> *)s; }
int main()
{
string s = "The STRING";
wstring ws = L"The WSTRING";
cout << istring(s)->find(istring("str")) << " " << istring(ws)->find(istring(L"wstr")) << endl;
}
A little bit dirty, but short & fast.
I love the answers from Kiril V. Lyadvinsky and CC, but my problem was a little more specific than just case-insensitivity. I needed a lazy Unicode-supported command-line argument parser that could eliminate false positives/negatives when dealing with alphanumeric string searches, where the base string could contain special characters used to format the alphanumeric keywords I was searching against. For example, Wolfjäger shouldn't match jäger, but <jäger> should.
It's basically just Kiril/CC's answer with extra handling for alphanumeric exact-length matches.
/* Undefined behavior when a non-alpha-num substring parameter is used. */
bool find_alphanum_string_CI(const std::wstring& baseString, const std::wstring& subString)
{
/* Fail fast if the base string was smaller than what we're looking for */
if (subString.length() > baseString.length())
return false;
auto it = std::search(
baseString.begin(), baseString.end(), subString.begin(), subString.end(),
/* wchar_t, not char: the strings are std::wstring (std::towupper needs <cwctype>) */
[](wchar_t ch1, wchar_t ch2)
{
return std::towupper(ch1) == std::towupper(ch2);
}
);
if(it == baseString.end())
return false;
size_t match_start_offset = it - baseString.begin();
std::wstring match_start = baseString.substr(match_start_offset, std::wstring::npos);
/* Typical special characters and whitespace to split the substring up. */
size_t match_end_pos = match_start.find_first_of(L" ,<.>;:/?\'\"[{]}=+-_)(*&^%$#@!~`");
/* Pass fast if the remainder of the base string where
the match started is the same length as the substring. */
if (match_end_pos == std::wstring::npos && match_start.length() == subString.length())
return true;
std::wstring extracted_match = match_start.substr(0, match_end_pos);
return (extracted_match.length() == subString.length());
}
The Most Efficient Way
Simple and Fast.
Performance is guaranteed to be linear, with an initialization cost of 2 * NEEDLE_LEN comparisons. (glibc)
#include <cstring>
#include <string>
#include <iostream>
int main(void) {
std::string s1{"abc de fGH"};
std::string s2{"DE"};
auto pos = strcasestr(s1.c_str(), s2.c_str());
if(pos != nullptr)
std::cout << pos - s1.c_str() << std::endl;
return 0;
}
wxWidgets has a very rich string API in wxString.
It can be done like this (using the case conversion approach):
int Contains(const wxString& SpecProgramName, const wxString& str)
{
wxString SpecProgramName_ = SpecProgramName.Upper();
wxString str_ = str.Upper();
int found = SpecProgramName_.Find(str_); // search the upper-cased copy, not the original
if (wxNOT_FOUND == found)
{
return 0;
}
return 1;
}