
Regex to count the total number of words in a string

The total number of words in this string is 11. But my code returns 13.

var txt = "Helllo, my -! This is a great day to say helllo.\n\n\tHelllo! 2 3 4 23";
txt = txt.replace(/[0-9]/g, '');
var words_count = txt.match(/\S+/g).length;

\S+ matches any run of non-whitespace characters, which includes substrings like -! . Instead, you can match runs of non-whitespace characters that contain at least one alphabetical character, using \S*[a-z]\S* with the case-insensitive flag:

 var txt = "Helllo, my -! This is a great day to say helllo.\n\n\tHelllo! 2 3 4 23";
 console.log(txt.match(/\S*[a-z]\S*/gi).length); // 11

If everything you want to count as a "word" is guaranteed to start with an alphabetical character, you can drop the leading \S* .
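As a rough sketch of that variant: the helper name countWordsStartingWithLetter below is made up for illustration and is not part of the original answer. It also falls back to 0 when nothing matches, since match() returns null in that case and reading .length on it would throw.

```javascript
// Hypothetical helper (name is an assumption, not from the original answer).
// Counts tokens that begin at a letter and run up to the next whitespace.
function countWordsStartingWithLetter(text) {
  const matches = text.match(/[a-z]\S*/gi);
  // match() returns null when there is no match at all, so guard before reading .length.
  return matches === null ? 0 : matches.length;
}

const sampleText = "Helllo, my -! This is a great day to say helllo.\n\n\tHelllo! 2 3 4 23";
console.log(countWordsStartingWithLetter(sampleText)); // 11
console.log(countWordsStartingWithLetter("-! 2 3"));   // 0
```

Note that without the leading \S* a token such as 2nd is still counted, because the match simply starts at the first letter (nd).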

If you want to make the trailing \S* more restrictive, you can whitelist the characters permitted inside "words", such as ' if you want:

 var txt = "Helllo, my -! This is a great day to say helllo.\n\n\tHelllo! 2 3 4 23";
 console.log(txt.match(/[a-z][a-z']*/gi).length); // 11

(to add more characters to the whitelist, just expand the [a-z'] character set to whatever you need)
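For instance, here is a small sketch of what expanding the whitelist changes. The sample phrase and variable names are made up for illustration. Adding a hyphen to the character set makes hyphenated compounds count as one word instead of two (the hyphen goes last in the set so it is read literally rather than as a range):

```javascript
// Illustrative phrase (an assumption, not from the original question).
const phrase = "A well-known rock'n'roll song, don't you think?";

// Whitelist with only the apostrophe: "well-known" splits into "well" and "known".
const apostropheOnly = phrase.match(/[a-z][a-z']*/gi) || [];

// Whitelist with apostrophe and hyphen: "well-known" stays a single word.
const withHyphen = phrase.match(/[a-z][a-z'-]*/gi) || [];

console.log(apostropheOnly.length); // 8
console.log(withHyphen.length);     // 7
```

Which set is right depends on what you want to treat as a single word in your own data.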


 