How can I get a regex to check that a string only contains alpha characters [a-z] or [A-Z]?

I'm trying to create a regex to verify that a given string only has alpha characters a-z or A-Z. The string can be up to 25 letters long. (I'm not sure if regex can check length of strings.)

Examples:
1. "abcdef" = true;
2. "a2bdef" = false;
3. "333" = false;
4. "j" = true;
5. "aaaaaaaaaaaaaaaaaaaaaaaaaa" = false; // 26 letters

Here is what I have so far... I can't figure out what's wrong with it, though:

Regex alphaPattern = new Regex("[^a-z]|[^A-Z]");

I would think that would mean that the string could contain only upper or lower case letters from a-z, but when I match it to a string with all letters it returns false...

Also, any suggestions regarding efficiency of using regex vs. other verifying methods would be greatly appreciated.

Regex lettersOnly = new Regex("^[a-zA-Z]{1,25}$");
  • ^ means "begin matching at start of string"
  • [a-zA-Z] means "match lower case and upper case letters a-z and A-Z"
  • {1,25} means "match the previous item (the character class, see above) 1 to 25 times"
  • $ means "only match if cursor is at end of string"
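A minimal sketch of that pattern in use, written as a C# top-level program (C# 9 or later assumed) and run against the five examples from the question. The helper name `IsLettersOnly` is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// One to 25 ASCII letters, anchored at both ends so nothing else may appear.
var lettersOnly = new Regex("^[a-zA-Z]{1,25}$");

// Hypothetical helper name, used only in this sketch.
bool IsLettersOnly(string s) => lettersOnly.IsMatch(s);

// The five examples from the question; the last one is 26 letters long.
string[] samples = { "abcdef", "a2bdef", "333", "j", new string('a', 26) };
foreach (var sample in samples)
{
    Console.WriteLine($"\"{sample}\" = {IsLettersOnly(sample)}");
}
```

Run as-is, it should print True, False, False, True, False, matching the expected results listed in the question.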

I'm trying to create a regex to verify that a given string only has alpha characters a-z or A-Z.

Easily done, as many of the others have indicated, using what are known as "character classes". Essentially, these allow us to specify a range of values to use for matching. (NOTE: for simplification, I am assuming implicit ^ and $ anchors, which are explained later in this post.)

[a-z] Match any single lower-case letter.
ex: a matches, 8 doesn't match

[A-Z] Match any single upper-case letter.
ex: A matches, a doesn't match

[0-9] Match any single digit zero to nine.
ex: 8 matches, a doesn't match

[aeiou] Match only on a or e or i or o or u.
ex: o matches, z doesn't match

[a-zA-Z] Match any single lower-case OR upper-case letter.
ex: A matches, a matches, 3 doesn't match

These can, naturally, be negated as well:

[^a-z] Match anything that is NOT a lower-case letter.
ex: 5 matches, A matches, a doesn't match

[^A-Z] Match anything that is NOT an upper-case letter.
ex: 5 matches, A doesn't match, a matches

[^0-9] Match anything that is NOT a number.
ex: 5 doesn't match, A matches, a matches

[^Aa69] Match anything as long as it is not A or a or 6 or 9.
ex: 5 matches, A doesn't match, a doesn't match, 3 matches

To see some common character classes, go to: http://www.regular-expressions.info/reference.html
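To check a few of the classes listed above in .NET, here is a small C# top-level sketch (C# 9 or later assumed). The helper `Whole` is hypothetical; it wraps each class in ^...$ so the single-character input has to match completely, mirroring the implicit-anchor assumption made at the start of this answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps a character class in ^...$ so it must match the whole
// input, mirroring the "implicit anchors" assumption made in the answer above.
bool Whole(string cls, string input) => Regex.IsMatch(input, "^" + cls + "$");

Console.WriteLine(Whole("[a-z]", "a"));    // True
Console.WriteLine(Whole("[a-z]", "8"));    // False: 8 is not a lower-case letter
Console.WriteLine(Whole("[A-Z]", "a"));    // False: matching is case-sensitive by default
Console.WriteLine(Whole("[a-zA-Z]", "A")); // True
Console.WriteLine(Whole("[^a-z]", "5"));   // True: negated class, anything except a-z
Console.WriteLine(Whole("[^Aa69]", "a"));  // False
```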

The string can be up to 25 letters long. (I'm not sure if regex can check length of strings.)

You can absolutely check "length", but not in the way you might imagine. Strictly speaking, we measure repetition, NOT length, using {}:

a{2} Match two a's together.
ex: a doesn't match, aa matches, aca doesn't match

4{3} Match three 4's together.
ex: 4 doesn't match, 44 doesn't match, 444 matches, 4434 doesn't match

Repetition has values we can set to have lower and upper limits:

a{2,} Match on two or more a's together.
ex: a doesn't match, aa matches, aaa matches, aba doesn't match, aaaaaaaaa matches

a{2,5} Match on two to five a's together.
ex: a doesn't match, aa matches, aaa matches, aba doesn't match, aaaaaaaaa doesn't match
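The repetition examples above can be replayed in C# with a short top-level program (C# 9 or later assumed). The helper `Whole` is hypothetical; it anchors the pattern with ^...$ so the quantifier has to account for the entire input, again following the implicit-anchor assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: anchors the pattern so the quantifier must account for the
// entire input. (?:...) groups without capturing, in case the pattern contains |.
bool Whole(string pattern, string input) =>
    Regex.IsMatch(input, "^(?:" + pattern + ")$");

Console.WriteLine(Whole("a{2}", "aa"));          // True: exactly two a's
Console.WriteLine(Whole("a{2}", "aca"));         // False
Console.WriteLine(Whole("4{3}", "4434"));        // False
Console.WriteLine(Whole("a{2,}", "aaaaaaaaa"));  // True: two or more
Console.WriteLine(Whole("a{2,5}", "aaaaaaaaa")); // False: nine is more than five
```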

Repetition extends to character classes, so:

[a-z]{5} Match any five lower-case characters together.
ex: bubba matches, Bubba doesn't match, BUBBA doesn't match, asdjo matches

[A-Z]{2,5} Match two to five upper-case characters together.
ex: bubba doesn't match, Bubba doesn't match, BUBBA matches, BUBBETTE doesn't match

[0-9]{4,8} Match four to eight numbers together.
ex: bubba doesn't match, 15835 matches, 44 doesn't match, 3456876353456 doesn't match

[a3g]{2} Match any two characters in a row, each of which is an a OR a 3 OR a g.
ex: aa matches, ba doesn't match, 33 matches, 38 doesn't match, a3 also matches (each position is checked on its own, so the two characters don't have to be the same)
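Here is the same idea for character classes combined with repetition, as a C# top-level sketch (C# 9 or later assumed). `Whole` is again a hypothetical helper that adds ^...$ around the pattern; note that `[a3g]{2}` does accept "a3".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: anchors the pattern so it must cover the whole input.
bool Whole(string pattern, string input) => Regex.IsMatch(input, "^" + pattern + "$");

Console.WriteLine(Whole("[a-z]{5}", "asdjo"));      // True
Console.WriteLine(Whole("[a-z]{5}", "Bubba"));      // False: B is upper-case
Console.WriteLine(Whole("[A-Z]{2,5}", "BUBBETTE")); // False: eight letters
Console.WriteLine(Whole("[0-9]{4,8}", "15835"));    // True
Console.WriteLine(Whole("[a3g]{2}", "a3"));         // True: each position is checked on its own
Console.WriteLine(Whole("[a3g]{2}", "38"));         // False: 8 is not in the class
```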

Now let's look at your regex: [^a-z]|[^A-Z]
Translation: Match any single character that is NOT a lower-case letter, OR any single character that is NOT an upper-case letter. Since no character is both a lower-case and an upper-case letter, every character satisfies one side or the other.
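One way to see this is to run the original pattern through `Regex.IsMatch` in a C# top-level program (C# 9 or later assumed). The question doesn't show how the pattern was actually applied, so using `IsMatch` here is an assumption; with it, every non-empty input matches.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question, unchanged.
var original = new Regex("[^a-z]|[^A-Z]");

// A lower-case letter is "not upper-case", an upper-case letter is "not lower-case",
// and a digit is neither, so every character satisfies at least one side of the |.
foreach (var input in new[] { "a", "Z", "5", "abcdef" })
{
    Console.WriteLine($"{input}: {original.IsMatch(input)}"); // True for each input
}
```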

To fix it so it meets your needs, we would rewrite it like this:

Step 1: Remove the negation.
[a-z]|[A-Z]
Translation: Find any lowercase letter OR uppercase letter.

Step 2: While not strictly needed, let's clean up the OR logic a bit.
[a-zA-Z]
Translation: Find any lowercase letter OR uppercase letter. Same as above, but now using only a single set of [].

Step 3: Now let's indicate "length".
[a-zA-Z]{1,25}
Translation: Find any lowercase letter OR uppercase letter, repeated one to twenty-five times.

This is where things get funky. You might think you were done here, and depending on the technology you are using, you may well be.

Strictly speaking, the regex [a-zA-Z]{1,25} will match one to twenty-five upper or lower-case letters ANYWHERE on a line:

[a-zA-Z]{1,25}
ex: a matches, aZgD matches, BUBBA matches, 243242hello242552 MATCHES
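In .NET this is easy to observe: `Regex.IsMatch` searches the whole input for a match anywhere. A C# top-level sketch (C# 9 or later assumed) comparing the unanchored and anchored forms on the last example:

```csharp
using System;
using System.Text.RegularExpressions;

var unanchored = new Regex("[a-zA-Z]{1,25}");
var anchored = new Regex("^[a-zA-Z]{1,25}$");
const string text = "243242hello242552";

Console.WriteLine(unanchored.IsMatch(text));     // True: a run of letters exists somewhere
Console.WriteLine(unanchored.Match(text).Value); // hello
Console.WriteLine(anchored.IsMatch(text));       // False: digits surround the letters
```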

In fact, every example I have given so far will do the same. If that is what you want, then you are in good shape, but based on your question, I'm guessing you ONLY want one to twenty-five upper or lower-case letters on the entire line. For that we turn to anchors. Anchors allow us to specify those pesky details:

^ beginning of a line
(I know, we just used this for negation earlier, don't get me started. As the first character inside [ ] it negates the class; outside [ ] it is an anchor.)

$ end of a line

We can use them like this:

^a{3} From the beginning of the line, match three a's in a row.
ex: aaa matches, 123aaa doesn't match, aaa123 matches

a{3}$ Match three a's in a row at the end of the line.
ex: aaa matches, 123aaa matches, aaa123 doesn't match

^a{3}$ Match three a's in a row for the ENTIRE line.
ex: aaa matches, 123aaa doesn't match, aaa123 doesn't match

Notice that aaa matches in all cases because, technically speaking, it has three a's at both the beginning and the end of the line.
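The three anchor examples can be checked together with a small C# top-level program (C# 9 or later assumed) that prints, for each pattern, which of the three inputs it matches:

```csharp
using System;
using System.Text.RegularExpressions;

string[] patterns = { "^a{3}", "a{3}$", "^a{3}$" };
string[] inputs = { "aaa", "123aaa", "aaa123" };

// Print one row per pattern showing which inputs it matches.
foreach (var pattern in patterns)
{
    foreach (var input in inputs)
    {
        Console.Write($"{pattern} vs {input}: {Regex.IsMatch(input, pattern)}   ");
    }
    Console.WriteLine();
}
```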

So the final, technically correct solution for finding a "word" that is "up to twenty-five characters long" on a line would be:

^[a-zA-Z]{1,25}$

The funky part is that some technologies implicitly put anchors in the regex for you and some don't. You just have to test your regex or read the docs to see if you have implicit anchors.
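For .NET specifically, a quick C# top-level check (C# 9 or later assumed) shows two things: `Regex.IsMatch` adds no anchors of its own, and without `RegexOptions.Multiline` the `$` anchor also accepts a single trailing newline, while `\z` does not. (Other APIs differ; Java's `String.matches`, for example, requires the entire string to match.)

```csharp
using System;
using System.Text.RegularExpressions;

// Regex.IsMatch looks for a match anywhere in the input; it adds no anchors itself.
Console.WriteLine(Regex.IsMatch("123abc", "[a-zA-Z]{1,25}"));    // True
Console.WriteLine(Regex.IsMatch("123abc", "^[a-zA-Z]{1,25}$"));  // False

// Without RegexOptions.Multiline, $ also matches just before a final "\n" ...
Console.WriteLine(Regex.IsMatch("abc\n", "^[a-zA-Z]{1,25}$"));   // True
// ... whereas \z matches only at the very end of the input.
Console.WriteLine(Regex.IsMatch("abc\n", @"^[a-zA-Z]{1,25}\z")); // False
```

If a trailing newline must be rejected, `\z` is the safer end anchor in .NET.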

using System.Text.RegularExpressions;

/// <summary>
/// Checks that the string contains only the letters a-z and A-Z and is 1 to 25 characters long
/// </summary>
/// <param name="value">String to be matched</param>
/// <returns>True if matches, false otherwise</returns>
public static bool IsValidString(string value)
{
    // Regex.IsMatch throws ArgumentNullException for null input, so reject null explicitly.
    if (value == null)
        return false;

    string pattern = @"^[a-zA-Z]{1,25}$";
    return Regex.IsMatch(value, pattern);
}
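On the question's efficiency point: for a rule this simple, a plain loop is also an option. Below is a sketch of a regex-free equivalent as a C# top-level program (C# 9 or later assumed); the name `IsValidStringNoRegex` is hypothetical, and no benchmark was run, so measure on your own inputs if the difference matters.

```csharp
using System;

// Same rule as IsValidString above (1 to 25 characters, each in a-z or A-Z),
// written as a plain loop. The name IsValidStringNoRegex is hypothetical.
static bool IsValidStringNoRegex(string value)
{
    if (value == null || value.Length < 1 || value.Length > 25)
        return false;

    foreach (char c in value)
    {
        // Explicit ranges on purpose: char.IsLetter would also accept letters such as 'é'.
        bool isAsciiLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        if (!isAsciiLetter)
            return false;
    }
    return true;
}

Console.WriteLine(IsValidStringNoRegex("abcdef")); // True
Console.WriteLine(IsValidStringNoRegex("a2bdef")); // False
```

On .NET 7 and later, `char.IsAsciiLetter(c)` expresses the same range check.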

The string can be up to 25 letters long. (I'm not sure if regex can check length of strings.)

Regexes certainly can check the length of a string, as can be seen from the answers posted by others.

However, when you are validating user input (say, a username), I would advise doing that check separately.

The problem is that a regex can only tell you whether a string matched it or not. It won't tell you why it didn't match: was the text too long, or did it contain disallowed characters? You can't tell. It's far from friendly when a program says: "The supplied username contained invalid characters or was too long". Instead, you should provide separate error messages for different situations.
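A minimal sketch of that approach as a C# top-level program (C# 9 or later assumed): each rule is checked on its own so the caller can report which one failed. `TryValidateUsername` and its messages are hypothetical, the limits come from the question, and the input is assumed to be non-null.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical validator (name and messages are made up for this sketch): each rule
// is checked separately so the caller can report exactly which one failed.
// Assumes input is not null.
static bool TryValidateUsername(string input, out string error)
{
    if (input.Length == 0)
    {
        error = "The username must not be empty.";
        return false;
    }
    if (input.Length > 25)
    {
        error = "The username must be at most 25 characters long.";
        return false;
    }
    // \z rather than $ so that a trailing newline is rejected as well.
    if (!Regex.IsMatch(input, @"^[a-zA-Z]+\z"))
    {
        error = "The username may contain only the letters a-z and A-Z.";
        return false;
    }
    error = "";
    return true;
}

if (!TryValidateUsername("a2bdef", out var message))
{
    Console.WriteLine(message); // the "only the letters a-z and A-Z" message
}
```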

The regular expression you are using is an alternation of [^a-z] and [^A-Z]. And the expressions [^…] mean to match any character other than those described in the character set.

So overall your expression means to match either any single character other than a-z, or any single character other than A-Z.

What you need instead is a character class that matches only a-z and A-Z:

[a-zA-Z]

And to specify the length, anchor the expression with the start (^) and end ($) of the string and describe the length with the {n,m} quantifier, meaning at least n but not more than m repetitions:

^[a-zA-Z]{0,25}$
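Note that a lower bound of 0 means the empty string is accepted; whether that is wanted depends on the requirements (the question only states an upper limit). A minimal C# top-level check of the difference (C# 9 or later assumed):

```csharp
using System;
using System.Text.RegularExpressions;

// {0,25} allows zero repetitions, so the empty string passes;
// {1,25} requires at least one letter.
Console.WriteLine(Regex.IsMatch("", "^[a-zA-Z]{0,25}$")); // True
Console.WriteLine(Regex.IsMatch("", "^[a-zA-Z]{1,25}$")); // False
```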

Do I understand correctly that it can contain either only uppercase or only lowercase letters?

new Regex("^([a-z]{1,25}|[A-Z]{1,25})$")
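A short C# top-level sketch (C# 9 or later assumed) showing what that pattern accepts. The parentheses matter: without them, ^ would bind only to the first alternative and $ only to the second.

```csharp
using System;
using System.Text.RegularExpressions;

// Either 1-25 lower-case letters or 1-25 upper-case letters, but not a mix.
var singleCase = new Regex("^([a-z]{1,25}|[A-Z]{1,25})$");

foreach (var word in new[] { "abcdef", "ABCDEF", "aBcDeF" })
{
    Console.WriteLine($"{word}: {singleCase.IsMatch(word)}");
}
```

This should print True, True, False: mixed-case input is rejected.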

A regular expression seems to be the right thing to use for this case.

By the way, the caret ("^") in the first position inside a character class means "not", so your "[^a-z]|[^A-Z]" would mean "not any lowercase letter, or not any uppercase letter" (disregarding that a-z are not all letters).

There are excellent interactive tools for developing and testing regular expressions:

They're a great help because they tell you right away whether your expression works as expected, and they even allow you to step through and debug it.

Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.