Regex doesn't match all foreign characters
Here's my regex:
^([\\p{L}-|a-zA-Z0-9-_]+)$
It is supposed to allow all foreign letters as well as ASCII letters and digits. But for some reason, Hindi characters do not match.
I wrote an xUnit test to demonstrate this.
[Fact]
public void test()
{
    var hindiChar = "इम्तहान";
    var input = "12345ABCDPrüfungテスト中文테스트إسرائيل" + hindiChar;
    var regex = "^([\\p{L}-|a-zA-Z0-9-_]+)$";
    Assert.True(new Regex(regex).IsMatch(input));
}
If you remove hindiChar from the input, the test passes; if you add it back, the test fails.
I thought the \\p{L} part of the regex covers all foreign characters, so I'm not sure why it doesn't match Hindi characters.
It is not enough to use \p{L} to match words; you also need to match diacritics. The Hindi word contains combining marks (for example, the virama U+094D and the vowel sign U+093E), which belong to the Unicode category M (Mark), not L (Letter). You can allow them by adding \p{M} to your regex. Note that even the \w shorthand "word" character class in .NET regex by default also matches a set of diacritics, \p{Mn} (Mark, Nonspacing Unicode category); see the .NET regex reference. However, \p{Mn} does not cover spacing marks such as U+093E, so here you need \p{M} to allow any diacritics.
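To see why \p{L} alone falls short, here is a minimal sketch that prints the Unicode category of each char in the Hindi word from the question. The class and method names (MarkCategoryDemo, Categories) are made up for illustration; CharUnicodeInfo.GetUnicodeCategory and Regex.IsMatch are standard .NET calls.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class MarkCategoryDemo
{
    // Hypothetical helper: the Unicode general category of every UTF-16 char in s.
    public static UnicodeCategory[] Categories(string s)
    {
        var result = new UnicodeCategory[s.Length];
        for (int i = 0; i < s.Length; i++)
            result[i] = CharUnicodeInfo.GetUnicodeCategory(s[i]);
        return result;
    }

    public static void Main()
    {
        var hindi = "इम्तहान";

        // Print each code point with its category; marks show up as
        // NonSpacingMark (Mn) or SpacingCombiningMark (Mc), not as letters.
        foreach (char c in hindi)
            Console.WriteLine($"U+{(int)c:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}");

        // Letters only: fails because of the combining marks.
        Console.WriteLine(Regex.IsMatch(hindi, @"^\p{L}+$"));
        // Letters plus any mark: succeeds.
        Console.WriteLine(Regex.IsMatch(hindi, @"^[\p{L}\p{M}]+$"));
    }
}
```

The output should show at least one NonSpacingMark and one SpacingCombiningMark among the letters, which is why \p{Mn} alone would not be enough for this word.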
Note that | inside a character class matches a literal | char, so you need to remove the | from your pattern.
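A quick way to confirm the point about | is to feed a string containing a pipe to the original pattern. This is only an illustrative sketch; the class name PipeInClassDemo and its constants are assumptions, not part of the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeInClassDemo
{
    // The pattern from the question: the | sits inside the brackets,
    // so it is a literal character, not alternation.
    public const string WithPipe = "^([\\p{L}-|a-zA-Z0-9-_]+)$";

    // Same idea with the | removed from the class.
    public const string WithoutPipe = @"^[\p{L}0-9_-]+$";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("a|b", WithPipe));    // True: | is accepted as a literal
        Console.WriteLine(Regex.IsMatch("a|b", WithoutPipe)); // False
    }
}
```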
It seems to me you can use
@"^[\p{L}\p{M}0-9_-]+$"
It will match any string of one or more letters, diacritics, ASCII digits, _ or - chars.
See the regex demo.
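As a rough check outside of xUnit, the console sketch below runs both the original and the corrected pattern against the input from the question. The class name FixedPatternDemo and its members are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedPatternDemo
{
    public const string Original = "^([\\p{L}-|a-zA-Z0-9-_]+)$";
    public const string Fixed = @"^[\p{L}\p{M}0-9_-]+$";

    // Same input as the test in the question, including the Hindi word.
    public const string Input = "12345ABCDPrüfungテスト中文테스트إسرائيل" + "इम्तहान";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch(Input, Original)); // False: marks are not in \p{L}
        Console.WriteLine(Regex.IsMatch(Input, Fixed));    // True
    }
}
```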
Note that in case you want to allow any Unicode digit chars, you may even use
@"^[\w\p{M}-]+$"
See another demo.
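The difference between the two suggested patterns shows up with non-ASCII digits. The sketch below uses Devanagari digits as an example input; the class name UnicodeDigitDemo and the choice of sample string are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeDigitDemo
{
    public const string AsciiDigits = @"^[\p{L}\p{M}0-9_-]+$";
    public const string AnyDigits = @"^[\w\p{M}-]+$";

    // Devanagari digits one, two, three (U+0967..U+0969), category Nd.
    public const string Sample = "\u0967\u0968\u0969";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch(Sample, AsciiDigits)); // False: only 0-9 are allowed
        Console.WriteLine(Regex.IsMatch(Sample, AnyDigits));   // True: \w includes \p{Nd}
    }
}
```

Keep in mind that, by default, \w in .NET also admits connector punctuation (\p{Pc}) other than _, so the second pattern is slightly more permissive than the first in that respect too.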