
Regex to match sentences that contain either "this" or "that" at least twice

I want to create a regex in PHP that finds the sentences in a text that contain "this" or "that" at least twice (that is, "this" at least twice or "that" at least twice).

We got stuck at:

([^.?!]*(\bthis|that\b){2,}[^.?!]*[.|!|?]+)

Use this pattern (Demo): (\b(?:this|that)\b).*?\1

(               # Capturing Group (1)
  \b            # <word boundary>
  (?:           # Non Capturing Group
    this        # "this"
    |           # OR
    that        # "that"
  )             # End of Non Capturing Group
  \b            # <word boundary>
)               # End of Capturing Group (1)
.               # Any character except line break
*?              # (zero or more)(lazy)
\1              # Back reference to group (1)
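As a rough illustration of how this pattern behaves in PHP, the sketch below runs it with preg_match on two made-up sentences. The variable names and input strings are assumptions for the example, not part of the answer. Note that the bare \1 backreference carries no word boundary of its own, so it can also land inside a longer word.

```php
<?php
// Pattern from the answer above, written as a PHP regex literal.
$pattern = '/(\b(?:this|that)\b).*?\1/';

// Hypothetical inputs chosen for illustration.
$repeated = 'I said that this is what that means.';
$partial  = 'He hit the thief with this lathis.';

// The whole word "that" occurs twice, so the match spans both occurrences.
preg_match($pattern, $repeated, $m1);
echo $m1[0], PHP_EOL; // that this is what that

// The backreference has no \b around it, so it also matches the "this" inside "lathis".
preg_match($pattern, $partial, $m2);
echo $m2[0], PHP_EOL; // this lathis
```

If whole-word matching of the second occurrence matters, wrapping the backreference as \b\1\b is one way to tighten it.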

This is mostly Wiktor's pattern, modified to isolate the sentences and to omit leading white-space characters from the full-string matches.

Pattern: /\b[^.?!]*\b(th(?:is|at))\b[^.?!]*(\b\1\b)[^.?!]*\b[.!?]/i

Here is a sample text that demonstrates how the other answers fail to disqualify unwanted matches for "word boundary" or "case-insensitive" reasons. (Demo: a capture group is applied to \b\1\b in the demo to show which substrings qualify the sentences for matching.)

This is nothing.
That is what that will be.
The Indian policeman hit the thief with his lathis before pushing him into the thistles.
This Indian policeman hit the thief with this lathis before pushing him into the thistles.  This is that and that.
The Indian policeman hit the thief with this lathis before pushing him into the thistles.

To see the official breakdown of the pattern, refer to the demo link.

In plain terms:

/                  #start of pattern
\b                 #match start of a sentence on a "word character"
[^.?!]*            #match zero or more characters not a dot, question mark, or exclamation
\b(th(?:is|at))\b  #match whole word "this" or "that"  (not thistle)
[^.?!]*            #match zero or more characters not a dot, question mark, or exclamation
\b\1\b             #match the earlier captured whole word "this" or "that"
[^.?!]*            #match zero or more characters not a dot, question mark, or exclamation
\b                 #match second last character of sentence as "word character"
[.!?]              #match the end of a sentence: dot, question mark, exclamation
/                  #end of pattern
i                  #make pattern case-insensitive

The pattern will match three of the six sentences in the above sample text (the fourth line holds two sentences):

That is what that will be.
This Indian policeman hit the thief with this lathis before pushing him into the thistles.
This is that and that.
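A minimal PHP sketch of running the pattern over the sample text with preg_match_all might look like the following. The variable names are assumptions for illustration, and the pattern is written without the extra demo-only capture group around \b\1\b.

```php
<?php
// Sentence-isolating pattern from this answer (case-insensitive).
$pattern = '/\b[^.?!]*\b(th(?:is|at))\b[^.?!]*\b\1\b[^.?!]*\b[.!?]/i';

// Sample text from above, kept verbatim in a nowdoc.
$text = <<<'TXT'
This is nothing.
That is what that will be.
The Indian policeman hit the thief with his lathis before pushing him into the thistles.
This Indian policeman hit the thief with this lathis before pushing him into the thistles.  This is that and that.
The Indian policeman hit the thief with this lathis before pushing him into the thistles.
TXT;

// $matches[0] holds the full-string matches, one per qualifying sentence.
preg_match_all($pattern, $text, $matches);

foreach ($matches[0] as $sentence) {
    echo $sentence, PHP_EOL;
}
```

Because each match has to start at a word boundary, the two spaces before "This is that and that." are not part of the match.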

*Note: previously I was using \s*\K at the start of my pattern to omit the white-space characters. I've elected to alter my pattern to use additional word boundary meta-characters for improved efficiency. If this doesn't work with your project text, it may be better to revert to my original pattern.

Use this:

.*(this|that).*(this|that).*

http://regexr.com/3ggq5

UPDATE:

This is another way, based on your regex:

.*(this\s?|that\s?){2,}.*[\.\n]*

http://regexr.com/3ggq8
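To get a feel for what these two patterns accept, here is a small PHP sketch that tries them on two sentences from the sample text elsewhere in this thread. The variable names are made up for the example, and the patterns are used exactly as posted, without the i flag or word boundaries.

```php
<?php
// The two patterns from this answer, as PHP regex literals.
$loose       = '/.*(this|that).*(this|that).*/';
$consecutive = '/.*(this\s?|that\s?){2,}.*[\.\n]*/';

// No whole word "this" or "that" here, only "lathis" and "thistles".
$noWholeWords = 'The Indian policeman hit the thief with his lathis before pushing him into the thistles.';
// Two whole-word "that", separated by "and".
$separated = 'This is that and that.';

var_dump(preg_match($loose, $noWholeWords));    // int(1): substrings inside longer words count
var_dump(preg_match($loose, $separated));       // int(1)
var_dump(preg_match($consecutive, $separated)); // int(0): the {2,} group needs adjacent occurrences
```

The first pattern is permissive about substrings, while the second only matches when the words directly follow each other, so neither checks for whole words that are merely repeated somewhere in the sentence.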
