
Remove special characters from a url, but not other language characters

I am working on a web application where people post articles (like a forum) in English and other languages. To create pretty permalinks from the post title, I use code like this:

$ln = preg_replace("/[^A-Za-z0-9[:space:]]/", "", $name);
$ln = strtolower($ln);
$ln = str_replace(' ', '-', $ln);

This strips all characters except ASCII letters and digits. But I also want to keep words in other languages like Chinese or Hindi, so that " स्टैक ओवरफ्लो " is not stripped down to " ". I have not been able to find a regex solution yet.

[^\p{L} 0-9]

\p{L} matches any kind of letter from any language. You can try this; it will preserve words from other languages and remove special symbols. See demo.

https://regex101.com/r/qH1uG3/8

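// The u modifier makes PCRE treat the pattern and subject as UTF-8;
// without it, multibyte characters are matched byte by byte.
$re = '/[^\p{L} 0-9]/u';
$str = "@#\$#\$sadsadस्टैक ओवरफ्लो";
$subst = "";

$result = preg_replace($re, $subst, $str);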

or

[^\p{L}\p{Z}\p{N}\p{M}]
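  • \p{L} matches any kind of letter from any language
  • \p{Z} matches any kind of whitespace or invisible separator
  • \p{N} matches any kind of numeric character in any script
  • \p{M} matches a character intended to be combined with another character (e.g. accents, umlauts, vowel signs)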
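
The difference \p{M} makes is easiest to see on the Hindi sample from the question, where the vowel signs and the virama are combining marks rather than letters. The following is a rough sketch, not a definitive test; the variable names are illustrative, and the u modifier is assumed so that PCRE reads the strings as UTF-8.

```php
<?php
// Sample string from the answer above; variable names are illustrative.
$title = '@#$#$sadsadस्टैक ओवरफ्लो';

// First pattern: keeps letters, spaces, and ASCII digits only.
$lettersOnly = preg_replace('/[^\p{L} 0-9]/u', '', $title);

// Second pattern: also keeps combining marks, separators, and digits of any script.
$withMarks = preg_replace('/[^\p{L}\p{Z}\p{N}\p{M}]/u', '', $title);

// Count the combining marks that survive each pattern.
$marksAfterFirst  = preg_match_all('/\p{M}/u', $lettersOnly);
$marksAfterSecond = preg_match_all('/\p{M}/u', $withMarks);

echo "marks kept by first pattern:  $marksAfterFirst\n";
echo "marks kept by second pattern: $marksAfterSecond\n";
```

The first pattern leaves no combining marks at all, so the Hindi words lose their vowel signs, while the second pattern removes only the leading symbols.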

Use this second pattern to be more precise. See demo.

https://regex101.com/r/qH1uG3/11
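
Putting this together with the lowercase and hyphen steps from the question, a permalink helper could look roughly like the sketch below. The function name make_slug is hypothetical and not from the original post. It assumes UTF-8 input and the mbstring extension, since strtolower does not lowercase non-ASCII letters.

```php
<?php
// Hypothetical helper; the name make_slug is an assumption for illustration.
function make_slug(string $title): string
{
    // Drop everything except letters, combining marks, numbers, and separators.
    // The u modifier makes PCRE treat the pattern and subject as UTF-8.
    // preg_replace returns null on invalid UTF-8, so fall back to an empty string.
    $slug = preg_replace('/[^\p{L}\p{M}\p{N}\p{Z}]/u', '', $title) ?? '';

    // Collapse each run of separators into a single hyphen.
    $slug = preg_replace('/\p{Z}+/u', '-', trim($slug)) ?? '';

    // Remove any hyphens left at either end.
    $slug = trim($slug, '-');

    // Lowercase letters in every script that has case (needs mbstring).
    return mb_strtolower($slug, 'UTF-8');
}

echo make_slug('Hello, World! 2024'), "\n";      // hello-world-2024
echo make_slug('@#$#$स्टैक ओवरफ्लो'), "\n";
```

Note that tabs and newlines are control characters rather than \p{Z} separators, so this sketch deletes them instead of turning them into hyphens.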


 