Matching vocabulary words and phrases

Question

I am writing an application that takes a vocabulary word / phrase as an input parameter. I am having trouble writing validation logic for this parameter's value.

These are the rules I've come up with:

can be up to 4 words (with hyphens or not)
one apostrophe is allowed
only regular letters are allowed (no special characters such as ;@#$%^&*()={}[]""?|/>/? ¶ ©, etc.)
numbers are disallowed
case insensitive
multiple languages must be supported (English, Russian, Norwegian, etc.), so Unicode letters, including Cyrillic, must be accepted
either the whole string matches or nothing does

A few examples (in 3 languages):

// match:
one two three four
one-two-three-four
one-two-three four
vær så snill
тест регекс
re-read
under the hood
ONe
rabbit's lair

// not-match:
one two three four five
one two three four@
one-two-three-four five
rabbit"s lair
one' two's
one1
1900

Given the expected results above, could someone point me in the right direction on how to create a validation rule like that? If it matters, I will be writing the validation logic in C#, so I have more tools than just Regex at my disposal.

In case it helps, I have been testing several solutions, such as ^[\p{Ll}\p{Lt}]+$ and (?=\S*['-])([a-zA-Z'-]+)$ . The first regex seems to do a good job of allowing just the letters I need (English, Norwegian and Russian), whereas the second makes good use of the lookahead concept.

\p{Ll} or \p{Lowercase_Letter} : a lowercase letter that has an uppercase variant.
\p{Lu} or \p{Uppercase_Letter} : an uppercase letter that has a lowercase variant.
\p{Lt} or \p{Titlecase_Letter} : a letter that appears at the start of a word when only the first letter of the word is capitalized.
\p{L&} or \p{Letter&} : a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).
\p{Lm} or \p{Modifier_Letter} : a special character that is used like a letter.
\p{Lo} or \p{Other_Letter} : a letter or ideograph that does not have lowercase and uppercase variants.

Needless to say, neither of the solutions I have been testing takes into account all the rules I defined above.

Answer 1

You can use

\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z

See the regex demo. Details:

\A - start of string
(?!(?:[^']*'){2}) - the string cannot contain two apostrophes
\p{L}+ - one or more Unicode letters
(?:[\s'-]\p{L}+){0,3} - zero to three occurrences of
- [\s'-] - a whitespace, ' or - char
- \p{L}+ - one or more Unicode letters
\z - the very end of string.
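The breakdown above can also be kept next to the pattern itself. The following is only a sketch: it writes the same pattern with RegexOptions.IgnorePatternWhitespace, so whitespace in the pattern is ignored and # starts a comment. The class name AnnotatedPattern and the field name Phrase are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnnotatedPattern
{
    // Same pattern as in the answer, spread over several lines.
    // IgnorePatternWhitespace makes the engine skip unescaped whitespace
    // and treat '#' as the start of a comment that runs to end of line.
    public static readonly Regex Phrase = new Regex(@"
        \A                      # start of string
        (?! (?: [^']* ' ){2} )  # reject strings that contain two apostrophes
        \p{L}+                  # first run of Unicode letters
        (?:
            [\s'-]              # one separator: whitespace, apostrophe or hyphen
            \p{L}+              # another run of Unicode letters
        ){0,3}                  # at most three separator-plus-letters groups
        \z                      # very end of string
        ", RegexOptions.IgnorePatternWhitespace);
}

public static class AnnotatedPatternDemo
{
    public static void Main()
    {
        // Two inputs taken from the question's example lists.
        Console.WriteLine(AnnotatedPattern.Phrase.IsMatch("rabbit's lair"));
        Console.WriteLine(AnnotatedPattern.Phrase.IsMatch("one' two's"));
    }
}
```

Since \p{L} covers both uppercase and lowercase letters, no RegexOptions.IgnoreCase is needed for the case-insensitivity rule.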

In C#, you can use it as

var isValid = Regex.IsMatch(text, @"\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z");
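To check the pattern against the expected results listed in the question, a small console harness along these lines could be used. It is a sketch, not part of the original answer: the PhraseValidator class and its IsValid method are hypothetical names, and treating null as invalid is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhraseValidator
{
    // Pattern taken verbatim from the answer.
    private static readonly Regex Pattern = new Regex(
        @"\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z");

    // Assumption: a null input is treated as invalid rather than throwing.
    public static bool IsValid(string text) => text != null && Pattern.IsMatch(text);
}

public static class PhraseValidatorDemo
{
    public static void Main()
    {
        // "match" examples from the question.
        string[] shouldMatch =
        {
            "one two three four", "one-two-three-four", "one-two-three four",
            "vær så snill", "тест регекс", "re-read", "under the hood",
            "ONe", "rabbit's lair"
        };

        // "not-match" examples from the question.
        string[] shouldNotMatch =
        {
            "one two three four five", "one two three four@",
            "one-two-three-four five", "rabbit\"s lair", "one' two's",
            "one1", "1900"
        };

        foreach (var s in shouldMatch)
            Console.WriteLine($"{s} -> {PhraseValidator.IsValid(s)} (expected True)");

        foreach (var s in shouldNotMatch)
            Console.WriteLine($"{s} -> {PhraseValidator.IsValid(s)} (expected False)");
    }
}
```

Keeping the Regex in a static readonly field avoids re-parsing the pattern on every call, which may matter if the validation runs often.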

Answer 1: accepted, 2021-02-07 15:08:17