How to use a regular expression in Python to extract sentences between hyphens or asterisks from a paragraph

Question

import re
line="Hello world -- sam -- , How are you? what are *you* doing?"
pattern=r"(?<=\-|\*)(.*?)(?=\-\*)"
print(re.findall(pattern,line))

The output I get is "none". Please explain which pattern I should use so that I get this output:

sam
you

Answer 1

Are you looking for this?

 /[-]{2}\s*(.*?)[-]{2}\s*|[\*]{1}\s*(.*?)[\*]{1}\s*/gm

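The text you want is in the capturing groups: group 1 for the -- form, group 2 for the * form.

Here is a preview: https://regex101.com/r/ms1dxy/5

Details: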

1st Alternative [-]{2}\s*(.*?)[-]{2}\s*

[-]{2} match character - exactly 2 times.

\s* matches any whitespace character (equal to [\r\n\t\f\v ]) between zero and unlimited times

1st Capturing Group (.*?)

.*? matches any character (except for line terminators) between zero and unlimited times

[-]{2} match character - exactly 2 times.

\s* matches any whitespace character (equal to [\r\n\t\f\v ]) between zero and unlimited times


2nd Alternative [\*]{1}\s*(.*?)[\*]{1}\s*

[\*]{1} match character * exactly 1 time.

\s* matches any whitespace character (equal to [\r\n\t\f\v ]) between zero and unlimited times

2nd Capturing Group (.*?)

.*? matches any character (except for line terminators) between zero and unlimited times

[\*]{1} match character * exactly 1 time.

\s* matches any whitespace character (equal to [\r\n\t\f\v ]) between zero and unlimited times
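The pattern above is written in the /.../gm notation used by regex101. A minimal sketch of how it might be used from Python follows, assuming the same sample line as the question. The variable names raw and tokens are illustrative only. Note that the lazy (.*?) in the first alternative also picks up the space before the closing --, so the sketch strips whitespace afterwards.

```python
import re

line = "Hello world -- sam -- , How are you? what are *you* doing?"

# Answer 1's pattern without the /.../gm wrapper; findall already scans the whole string.
pattern = re.compile(r"[-]{2}\s*(.*?)[-]{2}\s*|[\*]{1}\s*(.*?)[\*]{1}\s*")

# With two capturing groups, findall returns one (group1, group2) tuple per match.
# Only the group belonging to the alternative that matched is non-empty.
raw = pattern.findall(line)

# Keep the non-empty group and strip the trailing space captured before the closing "--".
tokens = [(g1 or g2).strip() for g1, g2 in raw]
print(tokens)  # ['sam', 'you']
```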

Answer 2

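Your question does not state enough constraints for the regular expression to have a single correct answer. If regular expressions are new to you, though, that is understandable. What I actually want to say is:

This would work: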

((?:--[\w\s]+--)|(?:\*[\w\s]+\*))

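In this one, any unspecified amount of whitespace is allowed between the token and the delimiters.

...but this RegEx would also work. It matches a different subset of strings (including the one you provided in the question):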

((?:-- \w+ --)|(?:\*\w+\*))

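This RegEx matches exactly the number of spaces you provided in your example, but it rejects other matches you may have had in mind. This is the ambiguous part of the example in the question. The tokens below will not match the expression above (none of them match):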

 "How are you * doing * today?" "Do you think --Regular Expressions-- are useful to programmers?" "This particular -- #token3 -- has a non-word symbol in it"
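As a rough check of the claim above, the following sketch runs both expressions against the three example strings. The names loose, strict, and examples are illustrative and not part of the original answer.

```python
import re

# The first expression: whitespace allowed anywhere between delimiter and token.
loose = re.compile(r"((?:--[\w\s]+--)|(?:\*[\w\s]+\*))")
# The second expression: exactly the spacing used in the question's sample.
strict = re.compile(r"((?:-- \w+ --)|(?:\*\w+\*))")

examples = [
    "How are you * doing * today?",
    "Do you think --Regular Expressions-- are useful to programmers?",
    "This particular -- #token3 -- has a non-word symbol in it",
]

# The strict expression rejects all three strings.
strict_results = [strict.findall(text) for text in examples]
# The loose expression accepts the first two but still rejects the '#' token.
loose_results = [loose.findall(text) for text in examples]

print(strict_results)  # [[], [], []]
print(loose_results)   # [['* doing *'], ['--Regular Expressions--'], []]
```

Both expressions return the delimiters as part of the match, so some post-processing is still needed to get the bare token.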

This regular expression is probably the most "all-encompassing" solution, though perhaps you do not need to match tokens that contain non-word characters:

((?:--[^-\n]+--)|(?:\*[^\*\n]+\*))

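This regular expression accepts any text as a token, except text that contains a newline \n or the token's own delimiter character (- inside a --...-- token, * inside a *...* token). For example, consider the following: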

 "This example -- token has spaces and the $ symbol -- This does match," "This one *here-has-a-few-dashes*. which suits this regex just fine." "This example --misses-completely-- because the token contains the delimiter!"

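In short, as far as Python regular expressions go, dozens of variations could be posted, all of which would solve the single example given in this question. Some post-match processing may also be needed. For example, you might need a String trim() function (str.strip() in Python) or a String replace ... I personally cannot tell. Keep at it.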
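A minimal sketch of the third expression combined with the kind of post-match processing mentioned above, assuming that stripping the characters "-", "*" and space from both ends of each match is acceptable. The helper name extract_tokens is hypothetical.

```python
import re

# The "all-encompassing" expression from this answer.
PATTERN = re.compile(r"((?:--[^-\n]+--)|(?:\*[^\*\n]+\*))")

def extract_tokens(text):
    """Return matched tokens with surrounding delimiters and spaces removed.

    Assumption: a token never needs to keep a leading or trailing '-', '*'
    or space, since str.strip removes those characters from both ends.
    """
    return [match.strip("-* ") for match in PATTERN.findall(text)]

sample = "Hello world -- sam -- , How are you? what are *you* doing?"
print(extract_tokens(sample))  # ['sam', 'you']

# Dashes inside a *...* token survive, because strip only touches the ends.
print(extract_tokens("This one *here-has-a-few-dashes*. which suits this regex just fine."))
# ['here-has-a-few-dashes']

# A --...-- token that contains '-' is not matched at all.
print(extract_tokens("This example --misses-completely-- because the token contains the delimiter!"))
# []
```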

Answer 3

You are not consuming all of the consecutive left-hand and right-hand context. This is an incorrect use of lookarounds.

Use

[-*]+\s*([^\s*-].*?)\s*[-*]+

See proof.

Explanation

--------------------------------------------------------------------------------
  [-*]+                    any character of: '-', '*' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^\s*-]                  any character except: whitespace (\n,
                             \r, \t, \f, and " "), '*', '-'
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  [-*]+                    any character of: '-', '*' (1 or more
                           times (matching the most amount possible))

Python code:

import re
line="Hello world -- sam -- , How are you? what are *you* doing?"
pattern=r"[-*]+\s*([^\s*-].*?)\s*[-*]+"
print(re.findall(pattern,line))

Results:

['sam', 'you']
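To make the contrast with the question's attempt concrete, here is a small sketch that runs the original lookaround pattern next to the consuming pattern from this answer. The helper name extract_delimited and the constant DELIMITED are hypothetical, introduced only for illustration.

```python
import re

line = "Hello world -- sam -- , How are you? what are *you* doing?"

# The question's pattern: the lookahead (?=\-\*) demands the literal
# two-character sequence "-*", which never occurs in the sample line.
original = r"(?<=\-|\*)(.*?)(?=\-\*)"
original_result = re.findall(original, line)
print(original_result)  # []

# This answer's pattern consumes the delimiter runs on both sides
# and captures only the text between them.
DELIMITED = re.compile(r"[-*]+\s*([^\s*-].*?)\s*[-*]+")

def extract_delimited(text):
    """Return the text found between runs of '-' or '*'."""
    return DELIMITED.findall(text)

print(extract_delimited(line))  # ['sam', 'you']
```

Because the delimiters are consumed as part of each match, the scan resumes after the closing delimiter instead of treating it as the opening of the next token.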

Answer 1
0 2020-09-20 13:33:19

Answer 2
0 2020-09-20 14:09:32

Answer 3
0 2020-09-20 20:50:30
