How to extract a sentence between hyphens or asterisks from a paragraph using regex in Python
import re
line="Hello world -- sam -- , How are you? what are *you* doing?"
pattern=r"(?<=\-|\*)(.*?)(?=\-\*)"
print(re.findall(pattern,line))
The output I get is "None". Please explain which pattern I should use so that I get this output:
sam
you
Are you looking for this?
/[-]{2}\s*(.*?)[-]{2}\s*|[\*]{1}\s*(.*?)[\*]{1}\s*/gm
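The text you want is in capture group 1 (capture group 2 for the asterisk alternative).
Here is a preview: https://regex101.com/r/ms1dxy/5
Details: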
1st Alternative [-]{2}\s*(.*?)[-]{2}\s*
[-]{2} matches the character - exactly 2 times.
\s* matches any whitespace character (equal to [\r\n\t\f\v ]) between zero and unlimited times
1st Capturing Group (.*?)
.*? matches any character (except for line terminators) between zero and unlimited times, as few times as possible (lazy)
[-]{2} matches the character - exactly 2 times.
\s* matches any whitespace character (equal to [\r\n\t\f\v ]) between zero and unlimited times
2nd Alternative [\*]{1}\s*(.*?)[\*]{1}\s*
[\*]{1} matches the character * exactly 1 time.
\s* matches any whitespace character (equal to [\r\n\t\f\v ]) between zero and unlimited times
2nd Capturing Group (.*?)
.*? matches any character (except for line terminators) between zero and unlimited times, as few times as possible (lazy)
[\*]{1} matches the character * exactly 1 time.
\s* matches any whitespace character (equal to [\r\n\t\f\v ]) between zero and unlimited times
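The pattern above is written in regex101's /.../gm notation. A minimal sketch of how it might be ported to Python's re module follows; the variable names are illustrative, and the trailing-space cleanup is an assumption about what the asker wants, since the lazy group captures "sam " including the space before the closing dashes.

```python
import re

line = "Hello world -- sam -- , How are you? what are *you* doing?"

# Same alternation as above; the redundant character classes and {1}
# are dropped. The /g flag corresponds to finditer/findall, and /m has
# no effect here because the pattern contains no ^ or $ anchors.
pattern = re.compile(r"-{2}\s*(.*?)-{2}\s*|\*\s*(.*?)\*\s*")

# Each match fills either group 1 (dashes) or group 2 (asterisks).
# Take whichever one is set and strip the padding the lazy group keeps.
tokens = [(m.group(1) or m.group(2) or "").strip() for m in pattern.finditer(line)]
print(tokens)  # ['sam', 'you']
```

Using finditer rather than findall avoids the list of two-element tuples that findall returns when a pattern has more than one group.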
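Your question does not state the constraints on the regular expression precisely enough to arrive at a single correct answer. However, if this (RegEx) is new to you, that is perfectly fine. What I (really) want to say is:
This would work: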
((?:--[\w\s]+--)|(?:\*[\w\s]+\*))
In this one, any (unspecified) amount of whitespace is allowed between the token and the "delimiters".
...but this RegEx would also work. It matches a different subset of Strings (including the one you provided in your question):
((?:-- \w+ --)|(?:\*\w+\*))
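This RegEx matches exactly the amount of whitespace you provided in your example, but rejects other matches you might have in mind. This is the ambiguous part of the example in the question. The tokens below do not match the expression above (none of them does):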
"How are you * doing * today?" "Do you think --Regular Expressions-- are useful to programmers?" "This particular -- #token3 -- has a non-word symbol in it"
This regular expression is probably the most "catch-all" solution, though perhaps you do not need to match tokens that contain non-word characters:
((?:--[^-\n]+--)|(?:\*[^\*\n]+\*))
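This regular expression matches any text as a token, except text that contains a newline \n or one of the specified delimiters * or -. For example, consider the following examples: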
"This example -- token has spaces and the $ symbol -- This does match," "This one *here-has-a-few-dashes*. which suits this regex just fine." "This example --misses-completely-- because the token contains the delimiter!"
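One way to compare the three patterns from this answer is to run them side by side on the sample sentences. The sketch below does that; the dictionary labels and the results variable are made up for illustration. Note that in all three patterns group 1 wraps the whole alternation, so the returned matches still include the delimiters.

```python
import re

# The three patterns from this answer; the labels are illustrative only.
patterns = {
    "flexible whitespace": r"((?:--[\w\s]+--)|(?:\*[\w\s]+\*))",
    "exact whitespace": r"((?:-- \w+ --)|(?:\*\w+\*))",
    "catch-all": r"((?:--[^-\n]+--)|(?:\*[^\*\n]+\*))",
}

samples = [
    "Hello world -- sam -- , How are you? what are *you* doing?",
    "How are you * doing * today?",
    "Do you think --Regular Expressions-- are useful to programmers?",
    "This particular -- #token3 -- has a non-word symbol in it",
    "This example --misses-completely-- because the token contains the delimiter!",
]

# findall returns group 1 for each match, delimiters included.
results = {name: [re.findall(p, s) for s in samples] for name, p in patterns.items()}

for name, rows in results.items():
    print(name)
    for sample, found in zip(samples, rows):
        print("   ", found, "<-", sample)
```

On these inputs only the question's own sentence matches the exact-whitespace pattern, while the catch-all pattern matches everything except the sentence whose token contains a hyphen.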
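In short, dozens of Python regex variants could be posted, all of which would solve the single example given in this question. Some additional post-regex processing may also be needed. For example, you may need a String trim() function (str.strip() in Python) or a String replace... I personally cannot tell. Keep at it.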
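As a rough illustration of that kind of post-processing, the sketch below assumes the goal is the bare token with normalized spacing. The input sentence is made up, and str.strip plus re.sub stand in for the trim() and replace steps mentioned above.

```python
import re

# Hypothetical input with uneven spacing around and inside the token.
text = "Do you think --  Regular   Expressions -- are useful?"

# Catch-all pattern from this answer; each match still has its delimiters.
raw = re.findall(r"((?:--[^-\n]+--)|(?:\*[^\*\n]+\*))", text)

# strip("-*") removes the delimiters, strip() removes the outer padding,
# and re.sub collapses runs of inner whitespace into a single space.
cleaned = [re.sub(r"\s+", " ", m.strip("-*").strip()) for m in raw]

print(raw)      # ['--  Regular   Expressions --']
print(cleaned)  # ['Regular Expressions']
```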
You are not consuming all of the consecutive left-hand and right-hand context. This is an incorrect use of lookarounds.
Use
[-*]+\s*([^\s*-].*?)\s*[-*]+
See proof.
Explanation
--------------------------------------------------------------------------------
  [-*]+                    any character of: '-', '*' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^\s*-]                  any character except: whitespace (\n,
                             \r, \t, \f, and " "), '*', '-'
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  [-*]+                    any character of: '-', '*' (1 or more
                           times (matching the most amount possible))
import re
line="Hello world -- sam -- , How are you? what are *you* doing?"
pattern=r"[-*]+\s*([^\s*-].*?)\s*[-*]+"
print(re.findall(pattern,line))
Result:
['sam', 'you']
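To see why the lookaround version from the question finds nothing, and where the pattern from this answer has limits, a small check like the following may help. The second input string is a made-up example, not taken from the question.

```python
import re

line = "Hello world -- sam -- , How are you? what are *you* doing?"

# The question's lookahead (?=\-\*) needs the literal two-character
# sequence "-*" right after the match; the sample text never contains it.
question_pattern = r"(?<=\-|\*)(.*?)(?=\-\*)"
has_sequence = "-*" in line
question_result = re.findall(question_pattern, line)
print(has_sequence, question_result)   # False []

# The pattern from this answer consumes the delimiters and the padding.
answer_pattern = r"[-*]+\s*([^\s*-].*?)\s*[-*]+"
answer_result = re.findall(answer_pattern, line)
print(answer_result)                   # ['sam', 'you']

# Caveat (hypothetical input): [-*]+ treats '-' and '*' as interchangeable,
# so a stray hyphen can pair up with a later asterisk.
stray = "a well-known *word* here"
stray_result = re.findall(answer_pattern, stray)
print(stray_result)                    # ['known']
```

If ordinary hyphenated words can appear in the text, a pattern that pairs each delimiter with its own kind (-- with --, * with *) is likely the safer choice.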