
Limit the number of words with a regular expression

The regular expression you gave, ^(?:\b\w+\b[\s\r\n]*){1,250}$, limits the input to 250 words over multiple lines, but it only works if the text contains no special characters.

What should I do if I need to count words in text that also contains special characters? For example:

--> Hi! i need help with regular expression, please help me. <--

The simplest approach is to group the word characters and limit the number of those groups to a specific range (1-250):

^\W*(\w+(\W+|$)){1,250}$
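
As a rough illustration, the pattern above could be wrapped in a small Java helper. The class name WordLimitCheck, the method withinLimit, and the maxWords parameter are made up for this sketch; only the regex itself comes from the answer.

```java
import java.util.regex.Pattern;

class WordLimitCheck {
    // Builds the pattern ^\W*(\w+(\W+|$)){1,maxWords}$ and tests the whole input.
    // maxWords is a hypothetical parameter standing in for the fixed 250.
    static boolean withinLimit(String text, int maxWords) {
        Pattern pattern = Pattern.compile("^\\W*(\\w+(\\W+|$)){1," + maxWords + "}$");
        // matches() succeeds only if the entire input fits the pattern.
        return pattern.matcher(text).matches();
    }
}
```

With the sample sentence from the question, this sketch counts ten words, so a limit of 10 or more accepts it and a limit of 9 rejects it. The leading --> and trailing <-- are absorbed by \W* and \W+.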

I am not familiar with C#, so I will describe the regex.

Method 1:

You are basically looking for this:

(\b[^\s]+\b){1,250}

In Java (where backslashes are doubled inside string literals):

\\s is any whitespace character.

[^\\s]+ is a sequence of non-whitespace characters.

\\b is a word boundary.

You can translate the regex to C#.

Method 2:

Tokenize the input text into whitespace-delimited words. In Java, this is done by:

String[] tokens = inputString.split("\\s+");

where the regex is \\s+.

Now you can check the length of the array and implement your logic to reject input that goes beyond 250 words.
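
A minimal sketch of Method 2 might look like the following. The class and method names are hypothetical, and the trim() call is an addition of this sketch: without it, split would return an empty first token for input that starts with whitespace.

```java
class SplitWordCounter {
    // Counts whitespace-delimited tokens; punctuation stays attached to its token.
    static int countWords(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return 0; // split on an empty string would still return one element
        }
        return trimmed.split("\\s+").length;
    }

    // maxWords is a hypothetical parameter standing in for the fixed 250.
    static boolean withinLimit(String input, int maxWords) {
        int count = countWords(input);
        return count >= 1 && count <= maxWords;
    }
}
```

Note that this counts any run of non-whitespace characters as a word, so a standalone token such as --> is counted too.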

Method 3:

Define a pattern to capture whitespace as a 'capturing group'.

(\s+)

Now you can count the number of matches by looping over the pattern matcher with a while loop. This is essentially the same as Method 2, but without creating the array of tokens.
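
Method 3 could be sketched roughly as below. The names are hypothetical, and the step from "number of whitespace runs" to "number of words" (runs plus one, after trimming) is an assumption of this sketch rather than something the answer spells out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceRunCounter {
    private static final Pattern WHITESPACE = Pattern.compile("(\\s+)");

    // Counts words as (number of whitespace runs + 1) in the trimmed input.
    static int countWords(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        Matcher matcher = WHITESPACE.matcher(trimmed);
        int runs = 0;
        while (matcher.find()) {
            runs++; // each match is one separator between two words
        }
        return runs + 1;
    }
}
```

The result can then be compared against the limit of 250 in the same way as in Method 2.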

A bit late to answer, but none of the solutions here worked:

^([a-zA-Z0-9]+[^a-zA-Z0-9]*){1,8}$

where {1,8} defines how many words you want.
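
For illustration only, here is how that pattern might be used from Java with the upper bound passed in. AlnumWordLimit, withinLimit, and maxWords are hypothetical names introduced for this sketch.

```java
import java.util.regex.Pattern;

class AlnumWordLimit {
    // Builds ^([a-zA-Z0-9]+[^a-zA-Z0-9]*){1,maxWords}$ and tests the whole input.
    static boolean withinLimit(String text, int maxWords) {
        Pattern pattern = Pattern.compile(
                "^([a-zA-Z0-9]+[^a-zA-Z0-9]*){1," + maxWords + "}$");
        return pattern.matcher(text).matches();
    }
}
```

Because the pattern begins with [a-zA-Z0-9]+, input that starts with punctuation (such as the --> in the question's sample) does not match at all.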

You can use the {a,b} quantifiers on any expression, like so:

.{1,256}
[\d\w_?]{1,567}
(0x)?[0-9A-F]{1,}

So, in your case, you could use:

^(?:\b\w+\b[_!?\s\r\n]*){1,250}$

where _!? can be replaced by whatever special characters you need to allow.
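
As a hedged example, the separator class could be extended with a comma and a period so that the sentence from the question passes. The class name, method name, maxWords parameter, and the choice of extra characters are all assumptions made for this sketch.

```java
import java.util.regex.Pattern;

class SeparatorWordLimit {
    // Same shape as the pattern above, with ',' and '.' added to the
    // separator class as an example of "any special characters".
    static boolean withinLimit(String text, int maxWords) {
        Pattern pattern = Pattern.compile(
                "^(?:\\b\\w+\\b[_!?,.\\s\\r\\n]*){1," + maxWords + "}$");
        return pattern.matcher(text).matches();
    }
}
```

Any character that is neither a word character nor listed in the separator class makes the match fail, so the class has to cover every symbol the input may contain. Since the pattern starts with \b\w+, the text must also begin with a word character.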
