How to build a regex to match a pattern while excluding certain known words that would match the pattern

Question

How do I build a regex that matches a pattern while excluding certain known words that would also match it? For example, I have this string:

I like to dream at going to do hikin g.

and I have the following regex: \b(.{1,2}(\s|.|-|_)){2,}

This matches:

to dream at

to do hikin g.

What I want is to change this regex so that it matches:

dream

hikin g.

If I change it to this: \b([^(to)]{1,2}(\s|.|-|_)){2,}

it partially works, but it excludes the individual letters 't' and 'o' rather than the entire word 'to'.

How can I solve this?

Answer 1

You may use

/\b(?!(?:I|at|[td]o)\b)\w{1,2}(?:[\W_](?!(?:I|at|[td]o)\b)\w{1,2})*\b/

See this Rubular demo.
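To try the pattern outside Rubular, here is a minimal Python sketch. It assumes Python's `re` module handles this pattern the same way for ASCII text; the names `EXCLUDED`, `TOKEN`, `PATTERN` and `find_short_runs`, and the sample sentence with spaced-out letters, are illustrative and not part of the original answer.

```python
import re

# Words that fit the 1-2 character shape but must not be matched.
EXCLUDED = ("I", "at", "to", "do")

# Negative lookahead: fail if an excluded word, ending at a word boundary, starts here.
NOT_EXCLUDED = r"(?!(?:%s)\b)" % "|".join(map(re.escape, EXCLUDED))

# One token: 1 or 2 word characters that do not form an excluded word.
TOKEN = NOT_EXCLUDED + r"\w{1,2}"

# A run: one token, then any number of (separator + token), bounded by word boundaries.
PATTERN = re.compile(r"\b" + TOKEN + r"(?:[\W_]" + TOKEN + r")*\b")


def find_short_runs(text):
    """Return every run of short tokens that contains no excluded word."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    # Hypothetical input, not the sentence from the question.
    print(find_short_runs("I like to d r e a m at going to do h i k i n g."))
    # ['d r e a m', 'h i k i n g']
```

Building the alternation from a tuple keeps the exclusion list in one place; it is equivalent to writing `(?:I|at|[td]o)` by hand.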

Pattern details:

\b - a word boundary
(?!(?:I|at|[td]o)\b)\w{1,2} - 1 or 2 word characters that do not form the word I, at, to or do
(?:[\W_](?!(?:I|at|[td]o)\b)\w{1,2})* - zero or more repetitions of:
- [\W_] - a non-word character or _
- (?!(?:I|at|[td]o)\b)\w{1,2} - see above
\b - a word boundary.
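The difference between a negated character class and a negative lookahead can be seen on single words. The sketch below is an illustration only, assuming Python's `re`; the helper `accepted` and the word list are hypothetical. `[^(to)]` rejects any token containing `(`, `t`, `o` or `)`, while `(?!to\b)` rejects only the whole word `to`.

```python
import re

# A negated character class works per character: it rejects '(', 't', 'o' and ')'.
char_class = re.compile(r"\b[^(to)]{1,2}\b")

# A negative lookahead works per position: it rejects only the whole word "to".
lookahead = re.compile(r"\b(?!to\b)\w{1,2}\b")


def accepted(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if pattern.fullmatch(w)]


if __name__ == "__main__":
    words = ["to", "ot", "so", "it", "an"]
    print(accepted(char_class, words))  # ['an']
    print(accepted(lookahead, words))   # ['ot', 'so', 'it', 'an']
```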

Answer 1: score 2, accepted, 2018-02-27 23:44:24
