How to count the words and numbers in a textarea using a regular expression?

I have a problem with a regular expression: I am trying to count the number of words in a textarea, but I am not getting the desired output. The main problem is that it does not count numbers. For example, "Hello world 123" counts as only 2, and "123" is not counted at all. My regular expression is response.trim().replace(/\b[\s,-:;'"_]*\b/gi, ' ').split(' ');

You should use /\b|\d+/gi to search for word boundaries or numbers, then count the number of elements in the array. Note that \b matches the empty string at every word boundary, so the array also contains empty matches and its length is not the word count.

var array = response.trim().match(/\b|\d+/gi);
var count = array.length;

As you've tagged this with php, I assume a PHP answer is acceptable, in which case you don't need a regular expression. You can just use str_word_count:

echo str_word_count("Hello world 123!", 0, '0..9'); // 3

Notice the third parameter, which lets you specify additional characters that count as part of a word. By default, digits are not included, hence the addition here.

Alternatively, you can use preg_match_all:

$count = preg_match_all('/\b[a-z\d]+\b/i', $string); // returns the number of matches

This will only count letters and digits as word characters.

Your solution is almost perfect, but there are two problems:

  1. Replace "at least one" occurrence of the word separator characters ( + ) instead of any number of them ( * ).
  2. You have a ,-: character range in your character class ( [...] ), which unfortunately includes all digits. When you want to match a literal - (dash), always put it at the beginning of the character class!

So the corrected regular expression is: /\b[-\s,:;'"_]+\b/gi

Edit: If you need to match every non-alphanumeric character, then use [\W_]
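
The two fixes in this answer can be checked with a short JavaScript sketch. The function name countWordsFixed and the sample strings are assumptions made for illustration; the pattern is the corrected one given above, and the empty-input guard is an addition.

```javascript
// Inside a character class, ",-:" is a range from "," (U+002C) to ":" (U+003A).
// That range covers the digits 0-9, so the original class treated digits as separators.
const digitIsSeparator = /[,-:]/.test("5");

// Hypothetical helper using the corrected pattern: "-" placed first, "+" instead of "*".
function countWordsFixed(response) {
  const text = response.trim();
  if (text === "") return 0; // "".split(" ") would yield [""], so guard the empty case
  return text.replace(/\b[-\s,:;'"_]+\b/g, " ").split(" ").length;
}

console.log(digitIsSeparator);                   // true
console.log(countWordsFixed("Hello world 123")); // 3
console.log(countWordsFixed("123"));             // 1
```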

You can use:

var array = response.trim().match(/\w+/g) || []; // match() returns null when nothing matches
var count = array.length;

Only the words (alphanumeric strings) will be stored in your array.

For the record, \w is short for [a-zA-Z0-9_], which means it won't correctly catch words with special characters, like journée, but it will return 6 for I'd like 1 cup...plz!
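
As a sketch of the \w+ approach, the following wraps the match in a function and adds a Unicode-aware variant. Both function names are assumptions, and the \p{L} and \p{N} property escapes need the u flag (ES2018 or later); the variant is one possible way to handle words like journée, not part of the original answer.

```javascript
// Hypothetical helper: counts runs of [A-Za-z0-9_].
function countAsciiWords(response) {
  const matches = response.trim().match(/\w+/g);
  return matches ? matches.length : 0; // match() returns null when nothing matches
}

// Assumed variant: any Unicode letter or number (plus "_") counts as a word character.
function countUnicodeWords(response) {
  const matches = response.trim().match(/[\p{L}\p{N}_]+/gu);
  return matches ? matches.length : 0;
}

console.log(countAsciiWords("I'd like 1 cup...plz!")); // 6
console.log(countAsciiWords("journ\u00e9e"));          // 2 ("journ" and "e")
console.log(countUnicodeWords("journ\u00e9e"));        // 1
```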

You can use: response.replace(/['?:_!'"@#$&%\^*()\\\/.-]/g,"").split(/[ \n\r]/);
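
A minimal sketch of this strip-and-split idea, with two adjustments that are assumptions rather than part of the answer: it splits on any run of whitespace and drops empty strings, so repeated spaces or an empty input do not inflate the count. The function name countBySplitting is hypothetical.

```javascript
// Hypothetical helper: delete the listed punctuation, then split on whitespace.
function countBySplitting(response) {
  const stripped = response.replace(/['?:_!"@#$&%^*()\\\/.-]/g, "");
  // filter(Boolean) discards the empty strings left by leading or trailing whitespace.
  return stripped.split(/\s+/).filter(Boolean).length;
}

console.log(countBySplitting("Hello world 123!")); // 3
console.log(countBySplitting("a  b\nc"));          // 3
console.log(countBySplitting(""));                 // 0
```

Because the punctuation is deleted rather than replaced by a space, I'd counts as one word and cup...plz also becomes a single word, which differs from the \w+ approach.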
