After learning the basic C++ rules, I focused on std::regex and created two console apps: 1. renrem and 2. bfind.
I also decided to write some convenience functions that make regex in C++ as easy as possible, using only std; I named the collection RFC (regex function collection).
There are several strange things that always surprise me, but this one ruined all my attempts and those two console apps.
One of the important functions is count_match, which counts the number of matches inside a string. Here is the full code:
unsigned int count_match( const std::string& user_string, const std::string& user_pattern, const std::string& flags = "o" ){
    const bool flags_has_i = flags.find( "i" ) < flags.size();
    const bool flags_has_g = flags.find( "g" ) < flags.size();
    std::regex::flag_type regex_flag = flags_has_i ? std::regex_constants::icase : std::regex_constants::ECMAScript;
    // std::regex_constants::match_flag_type search_flag = flags_has_g ? std::regex_constants::match_default : std::regex_constants::format_first_only;
    std::regex rx( user_pattern, regex_flag );
    std::match_results< std::string::const_iterator > mr;
    unsigned int counter = 0;
    std::string temp = user_string;
    while( std::regex_search( temp, mr, rx ) ){
        temp = mr.suffix().str();
        ++counter;
    }
    if( flags_has_g ){
        return counter;
    } else {
        if( counter >= 1 ) return 1;
        else return 0;
    }
}
First of all, as you can see, the line for search_flag is commented out because std::regex_search ignores it, and I do not know why, since the exact same flag is accepted by std::regex_replace. So std::regex_search ignores format_first_only but std::regex_replace accepts it. Let that go.
The main problem is that the icase flag is also ignored when the pattern is a character class ([]), in fact when the class contains only capital letters or only small letters: [A-Z] or [a-z].
Suppose this string: s = "ONE TWO THREE four five six seven".
The output with the C++ std library is:
std::cout << count_match( s, "[A-Z]+" ) << '\n'; // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n'; // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n'; // 3 => Global match plus insensitive
whereas for the same pattern in Perl, the D language, and C++ with Boost the output is:
std::cout << count_match( s, "[A-Z]+" ) << '\n'; // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n'; // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n'; // 7 => Global match plus insensitive
I know about regex flavors such as PCRE, and about ECMAScript 262, which C++ uses, but I have no idea why a simple flag is ignored by the only search function that C++ has, since std::regex_iterator and std::regex_token_iterator also use this function internally.
In short, I cannot use those two apps or RFC with the std library because of this!
So if someone knows which rule this follows (maybe it is a valid rule in ECMAScript 262), or if I am wrong anywhere, please tell me. Thanks.
Tested with:
gcc version 6.3.0 20170519 (Ubuntu/Linaro 6.3.0-18ubuntu2~16.04)
clang version 3.8.0-2ubuntu4
Perl code:
perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/g; print $c ;' "ONE TWO THREE four five six seven" # 3
perl -le '++$c while $ARGV[0] =~ m/[A-Z]+/gi; print $c ;' "ONE TWO THREE four five six seven" # 7
D code:
uint count_match( ref const (char[]) user_string, const (char[]) user_pattern, const (char[]) flags ){
    const bool flag_has_g = flags.indexOf( "g" ) != -1;
    Regex!( char ) rx = regex( user_pattern, flags );
    uint counter = 0;
    foreach( mr; matchAll( user_string, rx ) ){
        ++counter;
    }
    if( flag_has_g ){
        return counter;
    } else {
        if( counter >= 1 ) return 1;
        else return 0;
    }
}
The output:
writeln( count_match( s, "[A-Z]+", "g" ) ); // 3
writeln( count_match( s, "[A-Z]+", "gi" ) ); // 7
JS code:
var s = "ONE TWO THREE four five six seven";
var rx1 = new RegExp( "[A-Z]+", "g" );
var rx2 = new RegExp( "[A-Z]+", "gi" );
var counter = 0;
while( rx1.exec( s ) ){ ++counter; }
document.write( counter + "<br>" ); // 3
counter = 0;
while( rx2.exec( s ) ){ ++counter; }
document.write( counter ); // 7
Okay. After testing with gcc 7.1.0, it turned out that with gcc 6.3.0 and below the output is 1 3 3, but with 7.1.0 the output is 1 3 7. Here is the link.
Also, with this version of clang the output is correct. Here is the link. Thanks to user igor-tandetnik.
At first I thought this might be a rule of ECMAScript, but after testing the JS code and seeing Igor Tandetnik's comment, I tested the code with gcc 7.1.0 and it outputs the correct result.
To test the regex library, I use:
std::cout << ( ( rx.flags() & std::regex_constants::icase ) == std::regex_constants::icase ? "yes" : "no" ) << '\n';
So when icase is set the check is true (it prints yes), otherwise it is false (it prints no). So I think there is no library fault. Here is the test with gcc 7.1.0.
Therefore all versions below gcc 7.1.0 produce incorrect output.
For clang I have no idea, since I have clang 3.8.0 and it produces incorrect output, but the online version, even 3.7.1, produces correct output.
screenshot with clang 3.8.0
for this code:
std::cout << count_match( s, "[A-Z]+" ) << '\n'; // 1 => First match
std::cout << count_match( s, "[A-Z]+", "g" ) << '\n'; // 3 => Global match
std::cout << count_match( s, "[A-Z]+", "gi" ) << '\n'; // 7 => Global match plus insensitive
So with the online compiler the output is incorrect for clang 3.2 and below, but higher versions output the correct result.
Please correct me if I am wrong.
First of all, as you can see, the line for search_flag was commented because it is ignored by std::regex_search and I do not know why? since -- the exact flag is accepted for std::regex_replace.
The flag in question is format_first_only. This flag makes sense only for a "replace" operation. In regex_replace, the default is "replace all" but if you pass this flag it becomes "replace first only."
In regex_match and regex_search, there is no replacement going on at all; both of those functions just find the first match (and in the case of regex_match, that match must consume the entire string). Since the flag is meaningless in that case, I would expect the implementation to ignore it; but I wouldn't fault the implementation for throwing an exception, either, if it chose to be noisy about it.
The main problem is here that the icase flag is also ignored when the pattern is character class -> []. In fact when the pattern is only capital letter or small letter: [A-Z] or [a-z]
icase working wrong for character classes is definitely a bug in your vendor's library.