
How to generate a URL slug from Chinese characters?

Normally I generate URL slugs with the https://github.com/jprichardson/string.js library, specifically its slugify method. However, it removes all Chinese characters. As a workaround I use the following function:

var slugify = function(str){
   str = str.replace(/\s+/g,'-') // replace spaces with dashes
   str = encodeURIComponent(str) // encode (it encodes chinese characters)
   return str
}

So for the input 中文 标题 I get %E4%B8%AD%E6%96%87-%E6%A0%87%E9%A2%98, which looks like this in the browser's address bar (and it works):

http://example.com/中文-标题

However, I also want to remove special characters such as !@#$%^&*). The problem is that the string.js library uses the following piece of code internally:

.replace(/[^\w\s-]/g

It removes special characters, but it also removes Chinese characters, because they are not matched by \w.

So my question is: how can I modify the regexp above so that it keeps Chinese characters?


I tried

replace(/[^a-zA-Z0-9_\s-\u3400-\u9FBF]/g,'')

But it still removes Chinese characters...

If you want to match (or exclude) a literal dash (-) inside a character set (square brackets), put it at the end of the set (or at the start, or escape it). Anywhere else it is read as a range operator.

Your regexp matches characters that are not

  • in the range a-z
  • in the range A-Z
  • in the range 0-9
  • _
  • in the range \s-\u3400 (that's your problem)
  • -
  • \u9FBF

You want to do:

replace(/[^a-zA-Z0-9_\u3400-\u9FBF\s-]/g,'')
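As a rough sketch of how the corrected character class could be combined with the whitespace-to-dash step from the question: the function name slugifyKeepChinese is hypothetical (it is not part of string.js), and the Unicode range is simply the one used above, so it may not cover every CJK character you care about.

```javascript
// Sketch only: slugifyKeepChinese is a hypothetical helper, not a string.js API.
function slugifyKeepChinese(str) {
  return str
    .trim()
    // Keep ASCII word characters, the CJK range from the answer, whitespace and dashes.
    // The dash sits at the end of the set, so it is a literal dash, not a range operator.
    .replace(/[^a-zA-Z0-9_\u3400-\u9FBF\s-]/g, '')
    // Collapse each run of whitespace into a single dash.
    .replace(/\s+/g, '-');
}

console.log(slugifyKeepChinese('中文 标题!@#$%^&*)')); // 中文-标题
```

The result can still be passed through encodeURIComponent, as in the question, if a percent-encoded form is needed.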

Alternatively, use a positive match list that names exactly the characters to remove:

  replace(/[\!@#\$%^&\*\)]/g,'')

That said, I would consider leaving the URL metacharacters (#, %, &) out of that list:

   replace(/[\!@\$\^\*\)]/g,'')
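A minimal sketch of this positive-list approach, assuming the same six characters as the pattern above. The helper name stripListed is made up for illustration; the backslashes are dropped because these characters need no escaping inside a character set when ^ is not the first character.

```javascript
// Sketch only: stripListed is a hypothetical helper name.
// Removes just the listed characters; everything else, including Chinese
// characters and the URL metacharacters #, % and &, is left untouched.
function stripListed(str) {
  return str.replace(/[!@$^*)]/g, '');
}

console.log(stripListed('中文!标题@')); // 中文标题
console.log(stripListed('a#b%c&d'));   // a#b%c&d
```

The trade-off is that any special character missing from the list passes through unchanged, so the list has to be maintained by hand.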

You can try uslug, which slugifies 汉语/漢語 to 汉语漢語.

If you want to transform Chinese characters to Pinyin, try the transliteration package.
