
Matching sets consisting of letters plus non-letter characters

I want to match sets of characters that combine letters with non-letter characters. Many of them are a single letter or two letters.

 const match = 'tɕ\'i mɑ mɑ ku ʂ ɪɛ'.match(/\b(p|p\'|m|f|t|t\'|n|l|k|k\'|h|tɕ|tɕ\'|ɕ|tʂ|tʂ\'|ʂ|ʐ|ts|ts\'|s)\b/g);
 console.log(match);

I thought I could use \b, but that doesn't work because the sets contain non-word characters.

This is the current output:

[
  "t",
  "m",
  "m"
]

But I want this to be the output:

[
  "tɕ'",
  "m",
  "m",
  "k",
  "ʂ"
]

Note that some sets end with a non-word character, like tɕ'.

(In phonetic terms, the consonants.)

As stated in the comments above, \b doesn't work with Unicode characters in JS: it only treats ASCII letters, digits and the underscore as word characters. Moreover, from your expected output it appears that you don't need word boundaries at all.
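To make the \b behaviour concrete, here is a small sketch (the helper name isWordChar and the sample strings are only illustrative, not part of the original question) showing which characters JS counts as word characters and how that produces the unwanted "t", "m", "m" result:

```javascript
// \w (and therefore \b) only knows ASCII letters, digits and underscore.
const isWordChar = (ch) => /\w/.test(ch);
const asciiChars = ['t', 'k', 'u'].map(isWordChar); // word characters
const ipaChars = ['ɕ', 'ʂ', 'ɑ'].map(isWordChar);   // not word characters

// "t" is followed by the non-word "ɕ", so a boundary is found right after "t".
const tAlone = /\bt\b/.test("tɕ'i");
// "ʂ" sits between two spaces: non-word next to non-word, so no boundary exists.
const retroflex = /\bʂ\b/.test(' ʂ ');
// "k" is followed by the word character "u", so \b after "k" fails.
const kBeforeU = /\bk\b/.test('ku');

console.log(asciiChars, ipaChars, tAlone, retroflex, kBeforeU);
```

This is why the original pattern stops at "t" instead of "tɕ'" and never reports "k" or "ʂ".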

You can use this shortened and refactored regex:

t[ɕʂs]'?|[tkp]'?|[tmfnlhshɕʐʂ]

Code:

 const s = 'tɕ\'i mɑ mɑ ku ʂ ɪɛ';
 const re = /t[ɕʂs]'?|[tkp]'?|[tmfnlhshɕʐʂ]/g;
 console.log(s.match(re));
 //=> ["tɕ'", "m", "m", "k", "ʂ"]
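As a sanity check, the following sketch tests whether the shortened regex accepts every set from the question's original alternation and nothing longer. The anchored variant and the names original, anchored, allCovered and rejectsOthers are assumptions introduced here for illustration:

```javascript
// The 21 sets listed in the question's original pattern.
const original = ['p', "p'", 'm', 'f', 't', "t'", 'n', 'l', 'k', "k'", 'h',
  'tɕ', "tɕ'", 'ɕ', 'tʂ', "tʂ'", 'ʂ', 'ʐ', 'ts', "ts'", 's'];

// Same alternatives as the answer, wrapped in ^(?:...)$ so the whole string must match.
const anchored = /^(?:t[ɕʂs]'?|[tkp]'?|[tmfnlhshɕʐʂ])$/;

// Every original set should be accepted as a complete match.
const allCovered = original.every((c) => anchored.test(c));
// A few strings that are not in the list should be rejected.
const rejectsOthers = ["m'", 'i', 'ɑ', 'tt'].every((c) => !anchored.test(c));

console.log(allCovered, rejectsOthers);
```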


RegEx Details:

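- t[ɕʂs]'? : Match t followed by any letter inside [...] and then an optional '
- | : OR
- [tkp]'? : Match letters t or k or p and then an optional '
- | : OR
- [tmfnlhshɕʐʂ] : Match any letter inside [...] (h is listed twice and t is already covered by the previous alternative; both are redundant but harmless)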
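If the list of sets is likely to change, an alternative to hand-refactoring is to build the pattern from the list itself. This is only a sketch of that idea, not the approach used in the answer above; consonants, escapeRegExp, pattern and result are names made up for the example. Sorting longest-first matters because alternation takes the first alternative that matches, so "tɕ'" has to be tried before "tɕ" and "t":

```javascript
// The sets from the question, kept as plain data.
const consonants = ['p', "p'", 'm', 'f', 't', "t'", 'n', 'l', 'k', "k'", 'h',
  'tɕ', "tɕ'", 'ɕ', 'tʂ', "tʂ'", 'ʂ', 'ʐ', 'ts', "ts'", 's'];

// Escape regex metacharacters in case a future entry contains one.
const escapeRegExp = (str) => str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Longest entries first, so the longest possible set wins at each position.
const pattern = [...consonants]
  .sort((a, b) => b.length - a.length)
  .map(escapeRegExp)
  .join('|');

const re = new RegExp(pattern, 'g');
const result = 'tɕ\'i mɑ mɑ ku ʂ ɪɛ'.match(re);
console.log(result);
//=> ["tɕ'", "m", "m", "k", "ʂ"]
```

The generated pattern is longer than the hand-written one, but it stays in sync with the list automatically.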


 