Match vocabulary words and phrases

I am writing an application/logic that takes a vocabulary word or phrase as an input parameter, and I am having trouble writing the validation logic for this parameter's value.

These are the rules I've come up with:

  • can be up to 4 words (with hyphens or not)
  • one apostrophe is allowed
  • only regular letters are allowed (no special characters like ;@#$%^&*()={}[]""?|/>/? ¶ © etc.)
  • numbers are disallowed
  • case insensitive
  • multiple languages must be supported (English, Russian, Norwegian, etc.), so Unicode letters, including Cyrillic, must be accepted
  • either the whole string matches or nothing does

A few examples (in 3 languages):

// match:
one two three four
one-two-three-four
one-two-three four
vær så snill
тест регекс
re-read
under the hood
ONe
rabbit's lair

// not-match:
one two three four five
one two three four@
one-two-three-four five
rabbit"s lair
one' two's
one1
1900

Given the expected results above, could someone point me in the right direction on how to create a validation rule like that? If it matters, I will be writing the validation logic in C#, so I have more tools than just Regex at my disposal.

In case it helps, I have been testing several solutions, such as ^[\p{Ll}\p{Lt}]+$ and (?=\S*['-])([a-zA-Z'-]+)$. The first regex seems to do a great job of allowing just the letters I need (English, Norwegian and Russian), whereas the second makes good use of the lookahead concept.

  • \p{Ll} or \p{Lowercase_Letter}: a lowercase letter that has an uppercase variant.
  • \p{Lu} or \p{Uppercase_Letter}: an uppercase letter that has a lowercase variant.
  • \p{Lt} or \p{Titlecase_Letter}: a letter that appears at the start of a word when only the first letter of the word is capitalized.
  • \p{L&} or \p{Letter&}: a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).
  • \p{Lm} or \p{Modifier_Letter}: a special character that is used like a letter.
  • \p{Lo} or \p{Other_Letter}: a letter or ideograph that does not have lowercase and uppercase variants.
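
To make these categories concrete, here is a minimal C# sketch. The class and method names (LetterCategoryDemo, CategoryOf, IsLowerOrTitleOnly, IsLettersOnly) are made up for illustration. It shows why a class limited to \p{Ll} and \p{Lt} rejects ONe (O and N are \p{Lu}), while \p{L} accepts letters of every category. As far as I know, .NET only understands the short category names such as \p{L} or \p{Lu}, not \p{L&} or long names like \p{Lowercase_Letter}, so treat the long forms in the list above as belonging to other regex flavors.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Illustrative helper, not part of any library.
public static class LetterCategoryDemo
{
    // Returns the Unicode general category of a single UTF-16 char.
    public static UnicodeCategory CategoryOf(char c) => char.GetUnicodeCategory(c);

    // The first pattern from the question: lowercase and titlecase letters only.
    public static bool IsLowerOrTitleOnly(string s) =>
        Regex.IsMatch(s, @"^[\p{Ll}\p{Lt}]+$");

    // \p{L} covers every letter category (Lu, Ll, Lt, Lm, Lo).
    public static bool IsLettersOnly(string s) =>
        Regex.IsMatch(s, @"^\p{L}+$");
}
```

For example, LetterCategoryDemo.IsLowerOrTitleOnly("ONe") is false, while LetterCategoryDemo.IsLettersOnly("ONe") is true.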

Needless to say, neither of the solutions I have been testing takes into account all the rules I defined above.

You can use

\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z

See the regex demo. Details:

  • \A - start of string
  • (?!(?:[^']*'){2}) - a negative lookahead: the string cannot contain two apostrophes
  • \p{L}+ - one or more Unicode letters
  • (?:[\s'-]\p{L}+){0,3} - zero to three occurrences of
    • [\s'-] - a whitespace, ' or - char
    • \p{L}+ - one or more Unicode letters
  • \z - the very end of string.
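
To see what the individual parts contribute, the two building blocks can be tried in isolation. This is only a sketch: the class and method names (PatternParts, HasAtMostOneApostrophe, CountLetterRuns) are hypothetical and exist purely for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative helper, not part of any library.
public static class PatternParts
{
    // The negative lookahead on its own, anchored at the start of the string:
    // it fails as soon as a second apostrophe can be reached.
    public static bool HasAtMostOneApostrophe(string s) =>
        Regex.IsMatch(s, @"\A(?!(?:[^']*'){2})");

    // Counts runs of Unicode letters, i.e. the chunks each \p{L}+ consumes.
    public static int CountLetterRuns(string s) =>
        Regex.Matches(s, @"\p{L}+").Count;
}
```

The full pattern allows one letter run plus up to three more, so four runs in total. Since the apostrophe is one of the separators, rabbit's lair counts as three runs (rabbit, s, lair). By the same logic a phrase such as rabbit's big dark lair has five runs and would be rejected, even though it has only four space-separated words.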

In C#, you can use it as

var isValid = Regex.IsMatch(text, @"\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z");
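
A slightly fuller sketch that wraps the pattern in a reusable helper and carries the samples from the question along for checking. The names VocabularyValidator, IsValid, ShouldMatch and ShouldNotMatch are assumptions made for this example; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative wrapper, not part of any library.
public static class VocabularyValidator
{
    // The pattern from the answer, built once and reused.
    private static readonly Regex Pattern = new Regex(
        @"\A(?!(?:[^']*'){2})\p{L}+(?:[\s'-]\p{L}+){0,3}\z");

    // True when the whole input is a valid vocabulary word or phrase.
    public static bool IsValid(string text) =>
        text != null && Pattern.IsMatch(text);

    // Samples from the question that are expected to match.
    // \u00E6 and \u00E5 are the precomposed Norwegian letters ae and a-ring.
    public static readonly string[] ShouldMatch =
    {
        "one two three four", "one-two-three-four", "one-two-three four",
        "v\u00E6r s\u00E5 snill", "тест регекс", "re-read",
        "under the hood", "ONe", "rabbit's lair"
    };

    // Samples from the question that are expected to be rejected.
    public static readonly string[] ShouldNotMatch =
    {
        "one two three four five", "one two three four@",
        "one-two-three-four five", "rabbit\"s lair",
        "one' two's", "one1", "1900"
    };
}
```

Looping over both arrays and printing VocabularyValidator.IsValid(s) for each entry is a quick way to check the rules. Note that \s accepts any whitespace, so a tab or a line break between words also passes; if only a plain space should be allowed, swapping \s for a literal space in the character class is one option.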
