Explain this UTF-8 detection regex

Question

This question asked how to detect UTF-8 strings: "How to detect if have to apply utf8 decode or encode on a string?"

The solution was this:

if (preg_match('!!u', $string))
{
   // this is utf-8
}
else 
{
   // definitely not utf-8
}

I haven't been able to figure out how to break down the "!!u" expression. I clicked through all of PHP's PCRE documentation and may have missed the description of the "!" marks and the "u". I tried running it through Perl's YAPE::Regex::Explain (as seen in "Please explain this Perl regular expression") and couldn't get anything that made sense (I'm no Perl expert, so I don't know if I fed it the right expression/string).

So... how exactly does preg_match('!!u', $string) work?

Answer 1

It's just an empty regular expression: ! is the delimiter and u is the modifier.

As for why it works, from PHP Manual's description of the u modifier (emphasis mine):

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
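The quoted passage only talks about the pattern, but in current PHP versions the u modifier also makes PCRE validate the subject string as UTF-8. An empty pattern matches any valid subject, so the call returns 1. For an invalid subject, preg_match() returns false and preg_last_error() reports PREG_BAD_UTF8_ERROR. A minimal sketch of that behaviour follows; the helper name looks_like_utf8 is hypothetical and not part of the original answers:

```php
<?php
// Hypothetical helper for illustration: wraps the empty pattern with the u modifier.
function looks_like_utf8(string $string): bool
{
    // The empty pattern matches any subject that passes PCRE's UTF-8 check (1).
    // An invalid subject makes preg_match() return false instead.
    return preg_match('//u', $string) === 1;
}

var_dump(looks_like_utf8("caf\xC3\xA9")); // valid UTF-8 bytes for "café": bool(true)
var_dump(looks_like_utf8("caf\xE9"));     // lone Latin-1 byte: bool(false)

// After a failed call, preg_last_error() gives the reason.
preg_match('//u', "caf\xE9");
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

Comparing against 1 with === keeps "no match" (0) and "error" (false) from being confused, although an empty pattern never returns 0.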

Answer 2

The ! is being used as the delimiter instead of /. Rewritten with the usual delimiter, //u is the same thing. The u is a modifier that treats the pattern as UTF-8.
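To illustrate that the delimiter is only punctuation around the (empty) pattern, here is a small sketch that runs the same check with three different delimiters. The sample string is an assumption chosen for the example:

```php
<?php
$subject = "na\xC3\xAFve"; // valid UTF-8 bytes for "naïve"

// Same empty pattern and same u modifier; only the delimiter differs.
$results = [
    preg_match('!!u', $subject),
    preg_match('//u', $subject),
    preg_match('##u', $subject),
];

var_dump($results === [1, 1, 1]); // bool(true)
```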

2 answers

solution1
6 ACCEPTED 2012-06-01 18:46:38

solution2
5 2012-06-01 18:46:57