How do I build a regex to match a pattern while excluding certain known words that would match the pattern?
I want to match a pattern while excluding certain known words that would otherwise match it. For example, I have this string:
I like to dream at going to do hikin g.
and I have the following regex: \b(.{1,2}(\s|.|-|_)){2,}
This matches:
to dream at
to do hikin g.
What I want is to change this regex so that it matches:
dream
hikin g.
If I change it to \b([^(to)]{1,2}(\s|.|-|_)){2,}
it partially works, but it excludes the individual letters 't' and 'o' instead of the entire word 'to'.
How can I solve this?
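For context on why that attempt misbehaves: [^(to)] is a negated character class, so it rejects any single character that is "(", "t", "o" or ")", not the word "to". The short Ruby sketch below contrasts it with a negative lookahead, which rejects only the whole word. The sample words and constant names are chosen here for illustration and are not part of the question.

```ruby
# [^(to)] is a negated character class: it rejects the single characters
# "(", "t", "o" and ")", not the word "to".
CHAR_CLASS = /\A[^(to)]{1,2}\z/

# A negative lookahead rejects only the exact word "to".
LOOKAHEAD = /\A(?!to\z)\w{1,2}\z/

# Illustrative words (an assumption, not taken from the question).
WORDS = %w[to at ox an].freeze

WORDS.each do |word|
  puts "#{word}: class=#{CHAR_CLASS.match?(word)} lookahead=#{LOOKAHEAD.match?(word)}"
end
```

Of these four words, only "an" passes the character class, because the others contain a "t" or an "o"; the lookahead version rejects only "to".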
You may use
/\b(?!(?:I|at|[td]o)\b)\w{1,2}(?:[\W_](?!(?:I|at|[td]o)\b)\w{1,2})*\b/
See this Rubular demo.
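As a quick local check alongside the demo, the pattern can be run with String#scan. The sample string below is an assumption: it spells the target words with single spaces between the letters, which is the kind of input this pattern targets. The constant names are illustrative.

```ruby
# The pattern from the answer, unchanged.
PATTERN = /\b(?!(?:I|at|[td]o)\b)\w{1,2}(?:[\W_](?!(?:I|at|[td]o)\b)\w{1,2})*\b/

# Assumed sample input: letters of the target words separated by spaces.
SAMPLE = "I like to d r e a m at going to do h i k i n g."

# The pattern has no capture groups, so scan returns the full matches.
MATCHES = SAMPLE.scan(PATTERN)
p MATCHES  # => ["d r e a m", "h i k i n g"]
```

The trailing period is not part of the second match, because the pattern must end at a word boundary right after a word character.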
The pattern matches:
- \b - a word boundary
- (?!(?:I|at|[td]o)\b)\w{1,2} - a word of 1 or 2 word characters that is not equal to I, at, to or do
- (?:[\W_](?!(?:I|at|[td]o)\b)\w{1,2})* - zero or more repetitions of:
  - [\W_] - a non-word character or _
  - (?!(?:I|at|[td]o)\b)\w{1,2} - see above
- \b - a word boundary.
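If the list of excluded words changes often, the same structure can be assembled from an array rather than hard-coded. The following is only a sketch under that assumption: short_token_pattern is a hypothetical helper name, and Regexp.union is used here in place of the hand-written alternation I|at|[td]o.

```ruby
# Hypothetical helper: builds the same pattern for any list of excluded words.
def short_token_pattern(excluded_words)
  # Regexp.union escapes each word and joins the words with "|".
  skip = /(?!#{Regexp.union(excluded_words)}\b)/
  # Same shape as the answer's pattern: 1-2 char tokens, single separators.
  /\b#{skip}\w{1,2}(?:[\W_]#{skip}\w{1,2})*\b/
end

pattern = short_token_pattern(%w[I at to do])
p "to d r e a m at".scan(pattern)  # => ["d r e a m"]
```

Adding a word to the array excludes it as a whole token without touching the rest of the expression.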