
Why doesn't this regular expression work in PHP?

I need to match "abcd" (case-insensitively), followed by an optional trademark symbol.

Regex: /abcd(™)?/gi

See example:

preg_match("/abcd(™)?/gi","AbCd™  U9+",$matches);
print_r($matches);

When I run this, $matches isn't populated with anything... Not even created as an empty array. Any ideas?

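How is your file encoded? PHP strings are plain byte sequences with no built-in notion of Unicode, so a literal ™ in the pattern only matches data stored in the same encoding as the source file. In your case, try using the escape sequence \x99 instead of embedding the ™ symbol directly.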

Note: I'm not a PHP guru. However, this looks like a character-encoding issue. For example, your PHP file could be encoded as Windows-1252 (where ™ is the single byte \x99) while the data you are trying to match is encoded as UTF-8 (where ™ is the three bytes \xe2\x84\xa2), or vice versa (i.e. your file is UTF-8 and your data is Windows-1252). Try looking in this direction, and give us more information about what you are doing.
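To make the mismatch concrete, here is a minimal sketch. It assumes the subject string is UTF-8 and writes the ™ bytes as escapes so the result does not depend on how this file is saved; the variable names are only illustrative.

```php
<?php
// UTF-8 subject: "AbCd" followed by the three bytes that encode U+2122 (™).
$subjectUtf8 = "AbCd\xE2\x84\xA2";

// Pattern using the Windows-1252 byte for ™. The optional group cannot match
// UTF-8 data, so only "AbCd" is captured.
preg_match('/abcd(\x99)?/i', $subjectUtf8, $withCp1252Byte);

// Pattern using the UTF-8 byte sequence for ™. The group now matches.
preg_match('/abcd(\xe2\x84\xa2)?/i', $subjectUtf8, $withUtf8Bytes);

echo count($withCp1252Byte), "\n"; // 1 element: the whole match only
echo count($withUtf8Bytes), "\n";  // 2 elements: whole match plus the ™ group
echo bin2hex($withUtf8Bytes[1]), "\n"; // e284a2
```

Because the group is optional, the wrong encoding does not make the match fail outright; it silently drops the trademark symbol from the result, which makes this kind of bug easy to miss.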

I suspect it has something to do with the literal trademark symbol.

You'll probably want to check out how to use Unicode with your regular expressions, and then embed the escape sequence for the trademark symbol.
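One way to follow that suggestion, sketched under the assumption that the subject is valid UTF-8, is PCRE's u modifier together with the code point escape \x{2122}:

```php
<?php
// Subject from the question, with ™ written as its UTF-8 bytes.
$subject = "AbCd\xE2\x84\xA2  U9+";

// The u modifier makes PCRE treat pattern and subject as UTF-8,
// so \x{2122} refers to the code point U+2122 rather than to raw bytes.
$found = preg_match('/abcd(\x{2122})?/iu', $subject, $matches);

var_dump($found);          // int(1)
echo bin2hex($matches[1]); // e284a2
```

With the u modifier, preg_match() returns false if the subject is not valid UTF-8, so this variant only fits when the data's encoding is known.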

It was a combination of things... this was the regex that finally worked:

/abcd(\xe2\x84\xa2)?/i

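I had to remove the g modifier (PHP's preg functions do not support it, so preg_match() fails with an "Unknown modifier" warning; preg_match_all() is the way to find every match) and change the ™ symbol to its UTF-8 byte sequence, \xe2\x84\xa2.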
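A short sketch of both fixes together, assuming UTF-8 input. The sample subject is made up for illustration, and the @ operator is used here only to keep the warning out of the output.

```php
<?php
$subject = "AbCd\xE2\x84\xA2 and abcd";

// The g modifier is not recognised, so the pattern fails to compile
// and preg_match() returns false instead of 0 or 1.
$broken = @preg_match('/abcd/gi', $subject, $ignored);
var_dump($broken); // bool(false)

// The working pattern from above; preg_match_all() replaces the g modifier.
$count = preg_match_all('/abcd(\xe2\x84\xa2)?/i', $subject, $all);

echo $count, "\n";     // 2
echo $all[0][1], "\n"; // abcd
```

Checking the return value with === false is a cheap way to catch pattern errors like this one, since a failed compile otherwise looks like "no match".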
