Python regex to exclude text containing a word

Question

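I'm trying to filter text with a regular expression in Python. The goal is to check that a word W in the text is neither preceded by X nor followed by Y. So, let's say:

W = "day", X = "awful", Y = "light"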

"what a beautiful day it is" => should pass
"nice day"          => should pass    
"awful day"         => should fail
"such an awful day" => should fail
"the day light"     => should fail
"awful day light"   => should fail
"day light"         => should fail

I have tried several approaches, such as:

r".*\b(?!awful\b)day\b.*"
r"\W*\b(?!awful\b)day\b.*"  => to be able to include \n \r since '.' doesn't

r".*\b(day)\b(?!light\b).*"
r"\W*\b(day)\b(?!light\b)\W*"  => to be able to include \n \r since '.' doesn't

So a complete example would be (this one should fail):

if re.search(r".*\b(?!awful\b)day\b.*", "such an awful day", re.UNICODE):
    print("Found awful day! no good!")

I'm still wondering how to do this. Any ideas?

Answer 1

Something like this?

 # ^(?s)((?!X).)*W((?!Y).)*$

 ^ 
 (?s)
 (
      (?! X )
      . 
 )*
 W 
 (
      (?! Y )
      . 
 )*
 $
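
A minimal Python sketch of the pattern above, assuming W = day, X = awful and Y = light from the question. The helper name `passes` is made up for illustration. The dot-all option is passed as `re.DOTALL` instead of an inline `(?s)` after `^`, because recent Python versions reject an inline global flag that is not at the very start of the pattern.

```python
import re

W, X, Y = "day", "awful", "light"

# ((?!X).)* consumes one character at a time, but only while X does not
# start at the current position, so X cannot begin anywhere before W.
# The second group does the same for Y after W.
pattern = re.compile(
    r"^((?!{x}).)*{w}((?!{y}).)*$".format(
        x=re.escape(X), w=re.escape(W), y=re.escape(Y)
    ),
    re.DOTALL,
)


def passes(text):
    """Return True when some W has no X before it and no Y after it."""
    return pattern.search(text) is not None


for sample in ["nice day", "such an awful day", "the day light"]:
    print(sample, "->", passes(sample))
```

Note that this form requires W to be present, and it treats X anywhere before W (not only directly before it) as a failure.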

Or, with word boundaries:

 # ^(?s)((?!\bX\b).)*\bW\b((?!\bY\b).)*$

 ^ 
 (?s)
 (
      (?! \b X \b )
      . 
 )*
 \b W \b 
 (
      (?! \b Y \b )
      . 
 )*
 $
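
To see what the word boundaries change, here is a small Python comparison of the two forms, again assuming the words day, awful and light. The sample strings are invented for illustration, and `re.DOTALL` stands in for the inline `(?s)`.

```python
import re

plain = re.compile(r"^((?!awful).)*day((?!light).)*$", re.DOTALL)
bounded = re.compile(
    r"^((?!\bawful\b).)*\bday\b((?!\blight\b).)*$", re.DOTALL
)

# "awfully" contains "awful", which blocks the plain form only.
sample = "an awfully nice day"
print(bool(plain.search(sample)))    # False
print(bool(bounded.search(sample)))  # True

# "birthday" contains "day", which satisfies the plain form only.
sample = "happy birthday"
print(bool(plain.search(sample)))    # True
print(bool(bounded.search(sample)))  # False
```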

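Edit - It's not clear whether you want only whitespace separating X <-> W <-> Y
or any number of characters. This expanded, commented example shows both ways.
Good luck!
Note - the (?add-remove) construct is a modifier group. It is typically a way
to embed options such as s (dot-all), i (ignore case), etc. in the regex.
Here (?s) adds the dot-all modifier, and (?si) does the same but also ignores case.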
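
In Python's `re` module the same options can be given either inline or as a flag argument. A short sketch with a made-up two-line string; the inline flags are placed at the very start of the pattern, which is where current Python versions require global flags to be.

```python
import re

text = "first line\nsecond line"

# Without dot-all, "." stops at the newline, so there is no match.
print(re.search(r"first.*second", text))                   # None

# Inline modifier group at the start of the pattern.
print(bool(re.search(r"(?s)first.*second", text)))         # True

# The equivalent flag argument.
print(bool(re.search(r"first.*second", text, re.DOTALL)))  # True

# (?si) adds ignore-case on top of dot-all.
print(bool(re.search(r"(?si)FIRST.*SECOND", text)))        # True
```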

 #  ^(?s)(?!.*(?:\bX\b\s+\bW\b|\bW\b\s+\bY\b))(?:.*\b(W)\b.*|.*)$

 # This regex validates W is not preceded by X
 # nor followed by Y.
 # It also optionally finds W.
 # Only fails if it's invalid.
 # If passed, can check if W present by
 # examining capture group 1.

 ^                         # Beginning of string
 (?s)                      # Modifier group, with s = DOT_ALL
 (?!                       # Negative lookahead assertion
      .*                        # 0 or more any character (dot-all is set, so we match newlines too)
      (?:
           \b X \b \s+ \b W \b       # Trying to match X, 1 or more whitespaces, then W
        |  \b W \b \s+ \b Y \b       # Or, Trying to match W, 1 or more whitespaces, then Y

           # Substitute this to find any interval between X<->W<->Y
           #    \b X \b .* \b W \b       <- Trying to match X, 0 or more any char, then W
           # |  \b W \b .* \b Y \b       <- Or, Trying to match W, 0 or more any char, then Y
      )
 )

 # Still at start of line. 
 # If here, we didn't find any X<->W, nor W<->Y.
 # Optionally finds W in group 1.
 (?:
      .* \b 
      ( W )                     # (1), W
      \b .* 
   |  
      .* 
 )
 $                         # End of string
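
The expanded pattern can be written almost as-is in Python with `re.VERBOSE`. This is a sketch under a few assumptions: the words are day, awful and light, the whitespace-only variant is used, the flags are passed as arguments instead of an inline `(?s)`, and the function name `check` is hypothetical.

```python
import re

W, X, Y = "day", "awful", "light"

pattern = re.compile(
    r"""
    ^
    (?!                              # fail if either adjacent pair occurs
        .*
        (?:
            \b{x}\b \s+ \b{w}\b      # X, whitespace, then W
          | \b{w}\b \s+ \b{y}\b      # or W, whitespace, then Y
        )
    )
    (?: .* \b({w})\b .* | .* )       # group 1 is set only when W is present
    $
    """.format(x=re.escape(X), w=re.escape(W), y=re.escape(Y)),
    re.VERBOSE | re.DOTALL,
)


def check(text):
    """Return (is_valid, w_found) for the given text."""
    m = pattern.match(text)
    if m is None:
        return (False, False)
    return (True, m.group(1) is not None)


print(check("nice day"))         # (True, True)
print(check("awful day"))        # (False, False)
print(check("hello world"))      # (True, False)
print(check("awful sunny day"))  # (True, True)
```

The last call passes because this variant only rejects X and W when nothing but whitespace separates them.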

Answer 2

You're almost there. Try:

(?<!\bawful\b )\bday\b(?!\s+\blight\b)

Demo:

st='''\
"what a beautiful day it is" => should pass
"nice day"          => should pass    
"awful day"         => should fail
"such an awful day" => should fail
"the day light"     => should fail
"awful day light"   => should fail
"day light"         => should fail'''

W, X, Y = 'day', 'awful', 'light'
pat=r'(?<!\b{}\b )\b{}\b(?!\s+\b{}\b)'.format(X, W, Y)

import re

# Print only the lines where W is found without X before it or Y after it
for line in st.splitlines():
    m = re.search(pat, line)
    if m:
        print(line)
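
The attempts in the question used `(?!awful\b)day`, a lookahead that inspects the text starting where "day" begins, so it can never see the word before it. A lookbehind is needed for that. Below is a small reusable sketch of the same pattern; the function names `build_filter` and `passes` are made up for illustration, and `re.escape` is added on the assumption that the words should be treated literally.

```python
import re


def build_filter(w, x, y):
    """Compile a pattern that finds w not preceded by x or followed by y."""
    return re.compile(
        r"(?<!\b{x}\b )\b{w}\b(?!\s+\b{y}\b)".format(
            x=re.escape(x), w=re.escape(w), y=re.escape(y)
        )
    )


day_filter = build_filter("day", "awful", "light")


def passes(text):
    """Return True when the text contains an acceptable occurrence of W."""
    return day_filter.search(text) is not None


print(passes("nice day"))           # True
print(passes("such an awful day"))  # False
print(passes("the day light"))      # False
```

Python's `re` only accepts fixed-width lookbehinds, which is why the pattern uses a single literal space after X instead of `\s+`. As a consequence, "awful  day" with two spaces is not rejected.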

2 solutions

Solution 1
2 Accepted

Solution 2
2 2014-01-27 22:47:04