
How do I build a regex to match a pattern while excluding certain known words that would match the pattern

How do I build a regex that matches a pattern while excluding certain known words that would also match the pattern? For example, I have this string:

I like to dream at going to do hikin g.

and I have the following regex: \b(.{1,2}(\s|.|-|_)){2,}

This matches:

to dream at

to do hikin g.

What I want is to change this regex so that it matches:

dream

hikin g.

If I change it to this: \b([^(to)]{1,2}(\s|.|-|_)){2,}

it partially works, but it excludes the individual letters 't' and 'o' instead of the entire word 'to'.
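The behavior described here comes from how character classes work: inside [...] the parentheses are literal, so [^(to)] means "any single character except (, t, o or )". A minimal sketch in Python's re module (the sample strings are made up for illustration and are not from the question) contrasts that with a negative lookahead, which can reject a whole word:

```python
import re

# Hypothetical sample text, chosen only to show the difference.
text = "to top (x)"

# A negated character class rejects single characters, not words:
# every '(', 't', 'o' and ')' is skipped, wherever it occurs.
char_class_hits = re.findall(r"[^(to)]", text)
print(char_class_hits)  # [' ', 'p', ' ', 'x']

# A negative lookahead anchored with \b rejects only the whole word "to".
lookahead_hits = re.findall(r"\b(?!to\b)\w+", "to top toy")
print(lookahead_hits)  # ['top', 'toy']
```

The character class drops the t and o in "top" as well, while the lookahead keeps "top" and "toy" and skips only the standalone "to".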

How can I solve this?

You may use

/\b(?!(?:I|at|[td]o)\b)\w{1,2}(?:[\W_](?!(?:I|at|[td]o)\b)\w{1,2})*\b/

See this Rubular demo.

It matches:

  • \b - a word boundary
  • (?!(?:I|at|[td]o)\b)\w{1,2} - 1 or 2 word chars, provided the word starting here is not I, at, to or do
  • (?:[\W_](?!(?:I|at|[td]o)\b)\w{1,2})* - 0 or more repetitions of:
    • [\W_] - a non-word char or _
    • (?!(?:I|at|[td]o)\b)\w{1,2} - see above
  • \b - a word boundary.
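
The breakdown above can be tried outside Rubular as well. The following is a minimal sketch that assumes Python's re module, where these constructs (\b, \w, \W and negative lookahead) use the same syntax; the helper name short_word_runs and the sample sentence are hypothetical and not part of the original answer.

```python
import re

# Same pattern as in the answer, written as a Python raw string.
PATTERN = re.compile(
    r"\b(?!(?:I|at|[td]o)\b)\w{1,2}"            # first short word, not I/at/to/do
    r"(?:[\W_](?!(?:I|at|[td]o)\b)\w{1,2})*"    # separator + further short words
    r"\b"
)

def short_word_runs(text):
    """Return runs of 1-2 character words, skipping the excluded words."""
    return [m.group(0) for m in PATTERN.finditer(text)]

# Hypothetical sample sentence, chosen to show a run being cut at "to".
print(short_word_runs("we go up to an ox"))  # ['we go up', 'an ox']
```

The run "we go up" stops before "to" because the lookahead inside the repeated group refuses to start a new short word at an excluded word, and matching resumes at "an".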
