I need a regular expression that matches UTF-8 letters and digits and the dash sign (-), but not underscores (_). I tried these silly attempts without success:
([\w-^_])+
([\w^_]-?)+
(\w[^_]-?)+
The \w shorthand is equivalent to [A-Za-z0-9_], but it also matches UTF-8 characters if I have the u modifier set.
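To see why \w alone is not enough, here is a small sketch in Java. This is an assumption on my part: java.util.regex stands in for PHP's PCRE, with the Pattern.UNICODE_CHARACTER_CLASS flag playing roughly the role of the u modifier, and the class and method names are made up for illustration. It shows \w matching a non-ASCII letter only in Unicode mode, while accepting the underscore in either mode.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex is used as a stand-in for PHP's PCRE.
public class WordClassDemo {
    // Default mode: \w is ASCII-only, i.e. [a-zA-Z_0-9]
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // Unicode mode (assumed analogue of PHP's u modifier): \w also covers
    // non-ASCII letters and digits, but it still includes the underscore
    static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean asciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    static boolean unicodeWord(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String uber = "\u00fcber"; // u-umlaut followed by "ber"
        System.out.println(asciiWord(uber));    // false
        System.out.println(unicodeWord(uber));  // true
        System.out.println(unicodeWord("a_b")); // true: the underscore is a word character
    }
}
```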
Can anyone help me out with this one?
Try this:
(?:[\w\-](?<!_))+
It matches anything that \w matches (or a dash), then uses a zero-width lookbehind to ensure that the character just matched is not an underscore.
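A minimal sketch of the lookbehind approach, again assuming Java's java.util.regex with Pattern.UNICODE_CHARACTER_CLASS as a stand-in for PHP's /u mode (the class name is hypothetical). Matcher.matches() requires the whole input to match, which roughly corresponds to anchoring the pattern with ^ and $ in PHP.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the lookbehind variant.
public class LookbehindDemo {
    // [\w\-] consumes one word character or dash; the zero-width lookbehind
    // (?<!_) then fails if the character just consumed was an underscore.
    static final Pattern ALLOWED = Pattern.compile(
            "(?:[\\w\\-](?<!_))+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches(); // whole-string match
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("caf\u00e9-42")); // true
        System.out.println(isAllowed("foo_bar"));      // false
    }
}
```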
Alternatively, you could use this one:
(?:[^_\W]|-)+
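which is a more set-based approach (note the uppercase W): [^_\W] matches any character that is neither an underscore nor a non-word character, i.e. any word character except the underscore, and the dash is allowed as a separate alternative.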
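The set-based variant can be sketched the same way, under the same assumptions (Java's regex engine as a stand-in for PCRE, hypothetical class name). The dash has to be added back as its own alternative because \W includes it.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the set-based variant.
public class SetBasedDemo {
    // [^_\W] = complement of (underscore OR non-word character)
    //        = word characters other than the underscore.
    // The dash is a non-word character, so it is allowed via "|-".
    static final Pattern ALLOWED = Pattern.compile(
            "(?:[^_\\W]|-)+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("na\u00efve-2")); // true
        System.out.println(isAllowed("a_b"));          // false
        System.out.println(isAllowed("a b"));          // false
    }
}
```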
OK, I had a lot of fun with Unicode in PHP's flavor of PCRE :D Peekaboo says there is a simple solution available:
[\p{L}\p{N}\-]+
\p{L} matches any Unicode character that qualifies as a letter (note: a letter, not a word character, so no underscores), while \p{N} matches any Unicode number character (including Roman numerals and more exotic things).
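Java's regex engine also understands the \p{L} and \p{N} general categories, so the pattern can be tried there without any extra flag (assumption: Java stands in for PCRE; names are hypothetical). The sample input uses U+216B ROMAN NUMERAL TWELVE, which is in category Nl and is therefore matched by \p{N}.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the \p{L}\p{N} variant.
public class CategoryDemo {
    // \p{L}: any Unicode letter, \p{N}: any Unicode number (Nd, Nl, No),
    // \-: a literal hyphen-minus. The underscore (category Pc) is in none of these.
    static final Pattern ALLOWED = Pattern.compile("[\\p{L}\\p{N}\\-]+");

    static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // ROMAN NUMERAL TWELVE, a hyphen, then "ete" spelled with two e-acutes
        System.out.println(isAllowed("\u216b-\u00e9t\u00e9")); // true
        System.out.println(isAllowed("x_y"));                  // false
    }
}
```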
\- is just an escaped dash. Although not strictly necessary here, I make a point of escaping dashes in character classes. Note that Unicode has many different dash characters, which gives rise to the following version:
[\p{L}\p{N}\p{Pd}]+
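Here Pd is the Unicode "Punctuation, Dash" category, which includes, but is not limited to, our ASCII hyphen-minus. (Note, again, that there is no underscore here: it belongs to the "Punctuation, Connector" category, Pc.)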
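To see the difference \p{Pd} makes, this sketch compares both patterns on a year range written with an EN DASH (U+2013), which is in category Pd but is not the ASCII hyphen-minus. As before, Java's engine is assumed as a stand-in for PCRE and the names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the ASCII-dash and \p{Pd} variants.
public class DashDemo {
    static final Pattern ASCII_DASH = Pattern.compile("[\\p{L}\\p{N}\\-]+");
    // \p{Pd} covers every "Punctuation, Dash" character, including U+002D.
    static final Pattern ANY_DASH = Pattern.compile("[\\p{L}\\p{N}\\p{Pd}]+");

    static boolean asciiDash(String s) { return ASCII_DASH.matcher(s).matches(); }
    static boolean anyDash(String s)   { return ANY_DASH.matcher(s).matches(); }

    public static void main(String[] args) {
        String range = "2000\u20132010"; // EN DASH between the years
        System.out.println(asciiDash(range));      // false
        System.out.println(anyDash(range));        // true
        System.out.println(anyDash("well-known")); // true: U+002D is also Pd
        System.out.println(anyDash("snake_case")); // false: the underscore is Pc
    }
}
```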
I'm not sure which language you are using, but in Perl you can simply write [[:alnum:]-]+ when the correct locale is set.