RegEx to test if a string contains more than X Unicode words

I have seen many solutions that match words made of Latin characters, like this one: /^\W*(\w+\b\W*){80,}$/ I'm looking for an equivalent expression that supports any language, i.e. words made of arbitrary Unicode characters.

The regex needs to be JavaScript-compatible.

EDIT: JavaScript sadly doesn't seem to support this solution, at least in engines without ES2018 Unicode property escapes (\p{...} with the u flag). For those, you might want to look into XRegExp.

I'll leave this here in case it's of use to anyone working in another, more Perl-compatible language, but it doesn't answer your question, sorry.


For Unicode support you can use the \p{...} pattern.

Your pattern would become

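/^\P{L}*(\p{L}+(?:\P{L}+|$)){80,}$/

Each run of letters has to be followed by at least one non-letter or by the end of the string. This plays the role of \b in your original pattern: without it, the engine could split one long word into several one-letter "words" and the count would be wrong.
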
Here \P{L} stands for anything but a letter and \p{L} for any letter (but not a digit or a _, so it's a little bit different from \w).
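
For JavaScript engines that do implement ES2018 Unicode property escapes, the same idea can be tried natively with the u flag. The following is only a sketch under that assumption: hasAtLeastWords and minWords are hypothetical names, not part of any library, and a "word" here means a run of \p{L} letters separated by non-letters, so digits, underscores and combining marks all count as separators.

```javascript
// Hypothetical helper: returns true when `text` contains at least `minWords`
// runs of Unicode letters, each separated by one or more non-letters.
// Assumes an engine with ES2018 Unicode property escapes (the "u" flag).
function hasAtLeastWords(text, minWords) {
  if (!Number.isInteger(minWords) || minWords < 0) {
    throw new RangeError("minWords must be a non-negative integer");
  }
  // (?:\P{L}+|$) forces a separator (or end of input) after every letter run,
  // so a single long word cannot be counted as several words.
  const pattern = new RegExp(
    `^\\P{L}*(?:\\p{L}+(?:\\P{L}+|$)){${minWords},}$`,
    "u"
  );
  return pattern.test(text);
}

console.log(hasAtLeastWords("привет мир, 你好 world", 3)); // true
console.log(hasAtLeastWords("abcde", 2));                  // false
```

Building the regex with new RegExp keeps the threshold configurable; for a fixed limit such as 80, a regex literal with the same body and the u flag should behave the same way.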
