Regex doesn't match all foreign characters

Question

Here is my regex, ^([\p{L}-|a-zA-Z0-9-_]+)$. It is supposed to allow letters from any language as well as ASCII letters and digits. But for some reason, Hindi characters do not match.

I wrote an xUnit test to demonstrate this.

[Fact]
public void test()
{
    var hindiChar = "इम्तहान";
    var input = "12345ABCDPrüfungテスト中文테스트إسرائيل" + hindiChar;
    var regex = "^([\\p{L}-|a-zA-Z0-9-_]+)$";
    Assert.True(new Regex(regex).IsMatch(input));
}

If you remove hindiChar from the input, IsMatch returns true and the test passes; if you include hindiChar, IsMatch returns false and the test fails.

I thought that part of the regex covered all foreign characters, so I am not sure why it doesn't match Hindi characters.

Answer 1

Using \p{L} alone is not enough to match words; you also need to match diacritics (combining marks). You can do that by adding \p{M} to your regex. Note that even the \w shorthand "word" character class in .NET regex matches, by default, a set of diacritics, \p{Mn} (the Mark, Nonspacing Unicode category); see the .NET regex reference. However, here you need \p{M} to allow any diacritics.

It seems to me you can use
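To see why \p{L} alone falls short, it can help to print the Unicode general category of each character in the Hindi sample. The following is a minimal sketch; the class and method names (CategoryDump, Categories) are made up for illustration. On the string from the question it should report two combining marks among the letters, a NonSpacingMark (Mn) and a SpacingCombiningMark (Mc), which is also why \p{Mn} by itself would not be sufficient here.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class CategoryDump
{
    // Hypothetical helper: returns the general category of each UTF-16 code unit.
    public static UnicodeCategory[] Categories(string s) =>
        s.Select(c => CharUnicodeInfo.GetUnicodeCategory(c)).ToArray();

    static void Main()
    {
        // The Hindi sample from the question.
        string hindi = "इम्तहान";
        foreach (char c in hindi)
        {
            // Prints the code point in hex followed by its category name.
            Console.WriteLine($"U+{(int)c:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}");
        }
    }
}
```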

@"^[\p{L}\p{M}0-9_-]+$"

It will match any string of one or more letters, diacritics, ASCII digits, _ or - chars.

See the regex demo.

Note that if you want to allow any Unicode digit chars, you may even use
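As a quick check, the pattern above can be run against the input from the question. This is only a sketch; the wrapper class ForeignNameCheck and its IsAllowed method are hypothetical names, not part of the original test.

```csharp
using System;
using System.Text.RegularExpressions;

static class ForeignNameCheck
{
    // Pattern from the answer: letters, combining marks, ASCII digits, '_' or '-'.
    static readonly Regex Allowed = new Regex(@"^[\p{L}\p{M}0-9_-]+$");

    // Hypothetical helper wrapping the match.
    public static bool IsAllowed(string s) => Allowed.IsMatch(s);

    static void Main()
    {
        string input = "12345ABCDPrüfungテスト中文테스트إسرائيل" + "इम्तहान";
        Console.WriteLine(IsAllowed(input));          // True: the Hindi marks are covered by \p{M}
        Console.WriteLine(IsAllowed("hello world"));  // False: a space is not in the class
    }
}
```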

@"^[\w\p{M}-]+$"

See another demo.
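The difference between the two patterns shows up with non-ASCII digits. The sketch below uses Devanagari digits as an illustrative input of my own choosing; the class name DigitScope and its methods are hypothetical. It assumes default regex options, since RegexOptions.ECMAScript would restrict \w to ASCII.

```csharp
using System;
using System.Text.RegularExpressions;

static class DigitScope
{
    // First pattern from the answer: ASCII digits only.
    static readonly Regex AsciiDigits = new Regex(@"^[\p{L}\p{M}0-9_-]+$");
    // Second pattern: \w also covers Unicode decimal digits (\p{Nd}).
    static readonly Regex AnyDigits = new Regex(@"^[\w\p{M}-]+$");

    public static bool MatchesAscii(string s) => AsciiDigits.IsMatch(s);
    public static bool MatchesAny(string s) => AnyDigits.IsMatch(s);

    static void Main()
    {
        // Devanagari digits one, two, three (U+0967..U+0969, category Nd).
        string digits = "\u0967\u0968\u0969";
        Console.WriteLine(MatchesAscii(digits));         // False
        Console.WriteLine(MatchesAny(digits));           // True
        Console.WriteLine(MatchesAny("इम्तहान" + digits)); // True
    }
}
```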

Answer 1: 3 (accepted), 2018-11-15 18:54:26
