Regex to match apostrophes inside a character set, but not around words

Question

I'm counting how many times different words appear in a text using regular expressions in JavaScript. My problem is with quoted words: 'word' should be counted simply as word (without the quotes, otherwise the two forms are counted as different words), while it's should be counted as a whole word.

(?<=\w)(')(?=\w)

This regex matches apostrophes inside words, but not around them. The problem is that I can't use it inside a character set such as [\w]+.

(?<=\w)(')(?=\w)|[\w]+

This pattern counts it's a 'miracle' of nature as 7 words instead of 5 (it, ' and s become 3 separate words). Also, the third word should be matched simply as miracle, and not as 'miracle'.
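The count described above can be reproduced with a short script. This is only a sketch: the variable names are illustrative, and it assumes an engine that supports lookbehind assertions.

```javascript
// Sketch: reproduce the 7-token result described in the question.
const text = "it's a 'miracle' of nature";
const pattern = /(?<=\w)(')(?=\w)|[\w]+/g;

// match() with the g flag returns every match as a flat array (or null).
const tokens = text.match(pattern) || [];

console.log(tokens.length); // 7
console.log(tokens);        // it, ', s, a, miracle, of, nature
```

The inner apostrophe of it's is matched as a token of its own, which splits the word in three, while the quotes around miracle are skipped because they have no word character on both sides.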

To make things even more complicated, I need to capture diacritics too, so I'm using [A-Za-zÀ-ÖØ-öø-ÿ] instead of \w.

How can I accomplish that?

Answer 1

1) You can simply use the regex /[^\s]+/g

 const str = `it's a 'miracle' of nature`;
 const result = str.match(/[^\s]+/g);
 console.log(result.length);
 console.log(result);

2) If you are calculating the total number of words in a string, you can also use split:

 const str = `it's a 'miracle' of nature`;
 const result = str.split(/\s+/);
 console.log(result.length);
 console.log(result);

3) If you want each word without a quote at the start and at the end, you can do:

 const str = `it's a 'miracle' of nature`;
 const result = str.match(/[^\s]+/g).map((s) => {
   s = s[0] === "'" ? s.slice(1) : s;
   s = s[s.length - 1] === "'" ? s.slice(0, -1) : s;
   return s;
 });
 console.log(result.length);
 console.log(result);
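Since the question is about counting how often each word appears, the quote-stripping approach in 3) could be extended into a frequency count. The following is a minimal sketch, not part of the original answer; the helper names stripOuterQuotes and countWords are hypothetical, and the comparison is case-sensitive.

```javascript
// Hypothetical helper: remove one leading and one trailing apostrophe.
function stripOuterQuotes(token) {
  return token.replace(/^'/, "").replace(/'$/, "");
}

// Hypothetical helper: count occurrences of each whitespace-separated word.
function countWords(text) {
  const counts = new Map();
  const tokens = text.match(/[^\s]+/g) || []; // match() returns null on no match
  for (const token of tokens) {
    const word = stripOuterQuotes(token);
    if (word === "") continue; // skip tokens that were only apostrophes
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const counts = countWords("it's a 'miracle' a miracle");
console.log(counts.get("miracle")); // 2
console.log(counts.get("it's"));    // 1
```

Because tokens are split on whitespace only, punctuation such as commas stays attached to a word; stripping it would need an extra step.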

Answer 2

You might use an alternation with 2 capture groups, and then check the values of those groups.

(?<!\S)'(\S+)'(?!\S)|(\S+)

(?<!\S)' Negative lookbehind: assert a whitespace boundary to the left, then match '
(\S+) Capture group 1: match 1+ non-whitespace chars
'(?!\S) Match ' and assert a whitespace boundary to the right
| Or
(\S+) Capture group 2: match 1+ non-whitespace chars

See a regex demo.

 const regex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;
 const s = "it's a 'miracle' of nature";
 Array.from(s.matchAll(regex), m => {
   // Group 1 holds a word that was wrapped in quotes, group 2 any other token.
   if (m[1]) console.log(m[1]);
   if (m[2]) console.log(m[2]);
 });
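Neither answer uses the letter class with diacritics mentioned in the question. One possible way to combine it with inner apostrophes is sketched below. This is an assumption about what the asker wants, not a method taken from either answer, and the names LETTER, WORD and extractWords are illustrative.

```javascript
// The letter class from the question (ASCII letters plus Latin-1 letters).
const LETTER = "[A-Za-zÀ-ÖØ-öø-ÿ]";

// One or more letters, optionally followed by groups of "apostrophe + letters".
// An apostrophe is therefore only consumed when letters sit on both sides of it.
const WORD = new RegExp(`${LETTER}+(?:'${LETTER}+)*`, "g");

// Hypothetical helper: return all words found in the text.
function extractWords(text) {
  return text.match(WORD) || [];
}

console.log(extractWords("it's a 'miracle' of nature"));
// [ "it's", 'a', 'miracle', 'of', 'nature' ]
```

This needs no lookbehind, since surrounding quotes are simply never part of a match. The trade-off is that digits and underscores are no longer treated as word characters, and a trailing possessive apostrophe (as in dogs') is dropped.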

Answer 1
1 2021-12-01 04:04:07

Answer 2
0 2021-12-01 09:06:51
