Regex to match sentences in which either "this" or "that" occurs at least twice

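I want to create a regular expression in PHP that searches a text for sentences containing "this" or "that" at least twice (so at least two occurrences of "this", or at least two occurrences of "that").

We are stuck at: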

([^.?!]*(\bthis|that\b){2,}[^.?!]*[.|!|?]+)
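
One likely reason this attempt falls short: {2,} repeats the group back to back, so the two words would have to be directly adjacent, and the \b anchors bind to only one side of each alternative. A minimal sketch of that behaviour, written with Python's re module purely for illustration (the question targets PHP, and the variable name is made up here):

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"([^.?!]*(\bthis|that\b){2,}[^.?!]*[.|!|?]+)")

# Two "that"s separated by other words: the quantified group needs its
# alternatives to repeat with nothing in between, so nothing is found.
separated = attempt.search("I think that is what that will be.")
print(separated)

# Only a glued-together "thisthat" satisfies {2,} here.
glued = attempt.search("say thisthat now.")
print(glued.group(0) if glued else None)
```

This suggests the repetition has to be expressed as "word, anything, same word again" rather than with a quantifier on the group.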

Use this pattern: (\b(?:this|that)\b).*?\1 (demo)

(               # Capturing Group (1)
  \b            # <word boundary>
  (?:           # Non Capturing Group
    this        # "this"
    |           # OR
    that        # "that"
  )             # End of Non Capturing Group
  \b            # <word boundary>
)               # End of Capturing Group (1)
.               # Any character except line break
*?              # (zero or more)(lazy)
\1              # Back reference to group (1)
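
To see the backreference at work, here is a small sketch using Python's re module (chosen only for illustration; the sample strings and variable names are assumptions, not part of the original answer):

```python
import re

# Capture a whole word "this" or "that", then lazily skip ahead
# to a repeat of the same captured text.
repeat = re.compile(r"(\b(?:this|that)\b).*?\1")

twice = repeat.search("I know that this is what that will be.")
mixed = repeat.search("I said this and then that.")

print(twice.group(0))  # spans from the first "that" to the second "that"
print(mixed)           # one "this" and one "that" do not qualify

# The trailing \1 is not wrapped in word boundaries, so a longer word
# that merely starts with the captured text also counts as the repeat.
print(repeat.search("this thistle") is not None)
```

The last line hints at why a stricter variant would wrap the backreference in \b as well.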

This is mostly Wiktor's pattern, with a tweak to isolate sentences and to omit leading whitespace characters from the full-string match.

Pattern: /\b[^.?!]*\b(th(?:is|at))\b[^.?!]*(\b\1\b)[^.?!]*\b[.!?]/i

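Here is a sample text that demonstrates how the other answers fail to disqualify unwanted matches, because they lack word boundaries or case-insensitivity. (Demo: in the demo, a capturing group is applied to \b\1\b to show the substring that qualifies each sentence.)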

This is nothing.
That is what that will be.
The Indian policeman hit the thief with his lathis before pushing him into the thistles.
This Indian policeman hit the thief with this lathis before pushing him into the thistles.  This is that and that.
The Indian policeman hit the thief with this lathis before pushing him into the thistles.

For a formal breakdown of the pattern, see the demo link.

In short:

/                  #start of pattern
\b                 #match start of a sentence on a "word character"
[^.?!]*            #match zero or more characters not a dot, question mark, or exclamation
\b(th(?:is|at))\b  #match whole word "this" or "that"  (not thistle)
[^.?!]*            #match zero or more characters not a dot, question mark, or exclamation
\b\1\b             #match the earlier captured whole word "this" or "that"
[^.?!]*            #match zero or more characters not a dot, question mark, or exclamation
\b                 #assert a word boundary, so the character before the closing punctuation is a "word character"
[.!?]              #match the end of a sentence: dot, question mark, exclamation
/                  #end of pattern
i                  #make pattern case-insensitive

The pattern will match three of the six sentences in the sample text above:

That is what that will be.
This Indian policeman hit the thief with this lathis before pushing him into the thistles.
This is that and that.
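
As a rough check of the result above, the pattern can be run over the sample text. This sketch uses Python's re module for illustration only (the answer itself is aimed at PHP), and the variable names are assumptions:

```python
import re

text = (
    "This is nothing.\n"
    "That is what that will be.\n"
    "The Indian policeman hit the thief with his lathis before pushing him into the thistles.\n"
    "This Indian policeman hit the thief with this lathis before pushing him into the thistles.  This is that and that.\n"
    "The Indian policeman hit the thief with this lathis before pushing him into the thistles.\n"
)

# Same pattern as in the breakdown; the /i flag becomes re.IGNORECASE.
sentence = re.compile(
    r"\b[^.?!]*\b(th(?:is|at))\b[^.?!]*\b\1\b[^.?!]*\b[.!?]",
    re.IGNORECASE,
)

matches = [m.group(0) for m in sentence.finditer(text)]
for s in matches:
    print(s)
```

Because the match must begin at a word boundary, the newline and the double space between sentences are left out of each full match.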

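*Note: I previously used \s*\K at the start of the pattern to omit the whitespace characters. I chose to change the pattern to use an additional word-boundary metacharacter to improve efficiency. If this does not suit your project's text, it would be best to revert to my original pattern.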

Use this:

.*(this|that).*(this|that).*

http://regexr.com/3ggq5
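
A quick sketch of how this looser pattern behaves, using Python's re module for illustration (the sample strings and names are assumptions):

```python
import re

loose = re.compile(r".*(this|that).*(this|that).*")

# No standalone "this"/"that" at all, yet "lathis" and "thistles" both contain "this".
substrings = "The Indian policeman hit the thief with his lathis before pushing him into the thistles."
# One "this" plus one "that": the two groups are independent, so the mix is accepted.
mixed = "I said this and then that."
# A capitalised "This" is not seen, because the pattern is case-sensitive.
capital = "This and this."

for s in (substrings, mixed, capital):
    print(bool(loose.search(s)), "-", s)
```

So without word boundaries, a backreference, and case-insensitivity, the pattern both over-matches and under-matches relative to the stated requirement.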

Update

Here is another regex-based approach:

.*(this\s?|that\s?){2,}.*[\.\n]*

http://regexr.com/3ggq8
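
A short sketch suggests this variant only fires when the words sit next to each other, separated by at most one whitespace character. Python's re module is used for illustration, and the sample strings are assumptions:

```python
import re

adjacent = re.compile(r".*(this\s?|that\s?){2,}.*[\.\n]*")

# Two "that"s with other words in between: the {2,} repetition cannot bridge the gap.
apart = adjacent.search("I think that is what that will be.")
# Back-to-back words do match, even though they are one "this" and one "that".
together = adjacent.search("Pick this that one.")

print(apart)
print(together is not None)
```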
