\w doesn't match enough, what should I use instead?

(in PHP) I have the following string:

$string = '<!--:fr--><p>Mamá lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ut est et tortor sagittis auctor id ut urna. Etiam quañ justo, pharetra sed bibendum at, vulputate et augue.</p> <p>Curabitur cursus mi vel quam placerat malesuada. Fusce euismod mollis tincidunt. Sed cursus, sem et porta dictum, elit purus facilisis massa, eget consectetur nisi libero eget leo. Vivamus vitae mattis nulla. varius fermentum.</p><!--:-->'

And I want to remove <!--:fr--> and <!--:--> using

preg_replace('/<!--:[a-z]{2}-->(\w+)<!--:-->/', '${1}', $string)

But it returns the same $string. What is the problem?

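Your string contains characters that fall outside of [a-zA-Z0-9_] (which is what \w matches), such as spaces, punctuation, the < and > of the tags, and accented letters. You can match with [\s\S] instead, which means any whitespace or non-whitespace character (i.e. everything).

You could also use . with the s flag, which makes . match newlines as well.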

Try this...

preg_replace('/<!--:[a-z]{2}-->([\s\S]+?)<!--:-->/', '${1}', $string);

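To make the difference concrete, here is a minimal sketch comparing the original pattern with the two alternatives above. The input is a shortened stand-in for the string in the question (an assumption for brevity), and the variable names are only illustrative.

```php
<?php
// Shortened stand-in for the question's string (assumption, not the full input).
$string = "<!--:fr--><p>Mamá lorem ipsum.</p>\n<p>Etiam quañ justo.</p><!--:-->";

// \w+ cannot cross "<", spaces, or punctuation, so the pattern never matches
// and preg_replace() returns the input unchanged.
$unchanged = preg_replace('/<!--:[a-z]{2}-->(\w+)<!--:-->/', '${1}', $string);

// [\s\S]+? lazily matches any character, including newlines.
$viaClass = preg_replace('/<!--:[a-z]{2}-->([\s\S]+?)<!--:-->/', '${1}', $string);

// Alternative: . combined with the s modifier also matches newlines.
$viaDotAll = preg_replace('/<!--:[a-z]{2}-->(.+?)<!--:-->/s', '${1}', $string);

var_dump($unchanged === $string);   // original pattern changed nothing
var_dump($viaClass === $viaDotAll); // both alternatives agree
echo $viaClass, "\n";
```

The lazy quantifier +? matters if the string contains more than one marked block: it stops at the first closing <!--:--> instead of the last one.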

The other possibility is to just remove the parts you don't want.

preg_replace('/<!--:(?:[a-z]{2})?-->/', '', $string);

This matches only the unwanted parts: in <!--:(?:[a-z]{2})?-->, the group (?:[a-z]{2})? is two optional lowercase letters, which means the pattern matches both the opening and the closing marker.
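As a rough illustration of this approach, the sketch below strips the markers from a string with two language blocks. The second block (en) is a hypothetical addition that is not in the question; it is there only to show that any two-letter code is handled.

```php
<?php
// Hypothetical input with two language blocks (the "en" block is an assumption).
$string = '<!--:fr--><p>Mamá lorem ipsum.</p><!--:--> <!--:en--><p>Hello.</p><!--:-->';

// Remove every marker: "<!--:" + optional two lowercase letters + "-->".
$clean = preg_replace('/<!--:(?:[a-z]{2})?-->/', '', $string);

echo $clean, "\n";
```

Because nothing between the markers has to be matched, it does not matter which characters the content contains.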

To solve your problem, you only need a simple regex like <!--:(fr)?--> and PHP code like:

$string = preg_replace('/<!--:(fr)?-->/', '', $string);

To answer the question: \w is a very limited shortcut, and I would not recommend it here. By default it will, e.g., not match the ñ from your input, nor will it match a comma or a period. PHP has good support for Unicode. The shortcut \p{L} matches any letter from any language (add the u modifier so that the pattern and the subject are treated as UTF-8). There are also shortcuts for punctuation, etc. These can be combined in a character class. E.g. if you want to match at least one letter (including French and Spanish letters), dot, or comma in any sequence, you can write:

[\p{L}.,]+
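A small sketch of that character class in use, assuming the input is UTF-8 encoded; the sample text and variable names are made up for illustration.

```php
<?php
// Sample UTF-8 text (an assumption for illustration).
$text = 'Mamá, quañ. 123';

// The u modifier makes PCRE treat the pattern and the subject as UTF-8,
// so \p{L} sees "á" and "ñ" as single letters rather than raw bytes.
preg_match_all('/[\p{L}.,]+/u', $text, $matches);

print_r($matches[0]);
```

Here the runs "Mamá," and "quañ." are matched, while the spaces and the digits are not part of the class and so split or end the matches.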

