
Filter lines having text portions embedded either between - or * using regular expression

I have to filter the lines that have text portions embedded between - or * using a regular expression.

    zenPython = '''
    The Zen of Python, by Tim Peters
    
    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!

    '''
    portions=[]
    fp = io.StringIO(zenPython)
    
    zenlines = fp.readlines()
    
    zenlines = [ line.strip() for line in zenlines ]
    
    patterns = r"[-*] ?([^-*].*?) ?[-*]"
    texts = zenlines
    for line in lines:
      for text in texts:
        if re.search(patterns, text):
            portion = re.findall(patterns,text)
            portions.append(str(portion).replace('[\'','').replace('\']',''))
            return portions

Expected output:

['and preferably only one', 'right']

But I am getting ['and preferably only one']. May I know why I am not getting 'right'?

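As @coelhudo said in their answer, you are not getting the expected result because the main function returns as soon as it finds a match.

Simply moving the return statement to the root level of the function fixes the problem (or at least what we can guess is the problem).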


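That said, there are still problems in your code:

  1. The lines variable is never set, so for line in lines: makes the function crash with a NameError.
  2. "[-*] ?([^-*].*?) ?[-*]" can match unwanted patterns. For example, the string *This is not a test- matches the regular expression. Use ([-*]) ?([^-*].*?) ?\1 instead: reusing the value matched by the first capture group ensures that the opening "emphasis" character is the same as the closing one.
  3. You can access the string value of the match directly instead of converting the whole list of matches to a string and replacing the unwanted characters:
# portions.append(str(portion).replace('[\'','').replace('\']',''))  # hard to understand
portions.append(portion[0])  # much better
  4. Your code assumes there is only one match per line, which is true for the Zen of Python but may be wrong for any other text. So you should make the code handle that case.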
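
The difference between the two patterns in point 2 can be checked with a short sketch. The names LOOSE_RE and STRICT_RE are hypothetical and exist only for this illustration; the patterns are the ones from the question and from the suggested fix.

```python
import re

# Hypothetical names, used only for this illustration.
# Question's pattern: the opening and closing characters may differ.
LOOSE_RE = re.compile(r"[-*] ?([^-*].*?) ?[-*]")
# Suggested pattern: \1 forces the closing character to equal the opening one.
STRICT_RE = re.compile(r"([-*]) ?([^-*].*?) ?\1")

sample = "*This is not a test-"

loose_hits = LOOSE_RE.findall(sample)
strict_hits = STRICT_RE.findall(sample)

print(loose_hits)   # ['This is not a test']
print(strict_hits)  # []
```

The string opens with * and closes with -, so only the pattern without the backreference reports a match.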

Here is a rewritten version of your function that addresses the problems above:

import re


# (2) matches only emphasis that starts and ends with the same character
EMPHASIS_RE = re.compile(r"([-*]) ?([^-*].*?) ?\1")
ZEN = '''
    The Zen of Python, by Tim Peters

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!
'''


def main():
    portions = []
    for line in map(str.strip, ZEN.split('\n')):
        emphasis = EMPHASIS_RE.findall(line)  # (4) find all the matches in the line
        if emphasis:
            # (3) gets directly the wanted part of the matches
            # and (4) add all matches in the line to the portions list
            portions.extend((match[1] for match in emphasis))
    return portions


print(main())  # ['and preferably only one', 'right']
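
The Zen of Python never has two emphasized portions on the same line, so the multiple-match case is not exercised by the output above. A small sketch with a made-up sample line shows it; the helper name emphasized_parts and the sample text are assumptions for illustration, and finditer is used here as an alternative to findall.

```python
import re

EMPHASIS_RE = re.compile(r"([-*]) ?([^-*].*?) ?\1")


def emphasized_parts(line):
    """Return every emphasized portion found in a single line (hypothetical helper)."""
    # group(1) is the marker character, group(2) is the text between the markers
    return [match.group(2) for match in EMPHASIS_RE.finditer(line)]


# Made-up line containing one *...* and one -...- portion
sample = "Mixing *one* marker and - another one - in a line."
print(emphasized_parts(sample))  # ['one', 'another one']
```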

It is missing a result because the function main returns before finishing the loop.

Change:

for line in lines:
    for text in texts:
        if re.search(patterns, text):
            portion = re.findall(patterns,text)
            portions.append(str(portion).replace('[\'','').replace('\']',''))
            return portions

To this:

for line in lines:
    for text in texts:
        if re.search(patterns, text):
            portion = re.findall(patterns,text)
            portions.append(str(portion).replace('[\'','').replace('\']',''))
return portions
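
The snippet above is not runnable on its own, because lines is never defined and the enclosing function is not shown. A minimal runnable sketch of the same change follows. It assumes the loop lives in a function main that receives the lines as a parameter, and it drops the outer for line in lines: loop; both are assumptions, not part of the original code.

```python
import re

PATTERN = r"[-*] ?([^-*].*?) ?[-*]"  # pattern from the question, unchanged


def main(texts):
    # Assumption: texts is passed in instead of being read from a global
    portions = []
    for text in texts:
        found = re.findall(PATTERN, text)
        if found:
            # Same string clean-up as in the question
            portions.append(str(found).replace("['", "").replace("']", ""))
    # The return sits after the loop, so every line is examined
    return portions


sample_lines = [
    "There should be one-- and preferably only one --obvious way to do it.",
    "Although never is often better than *right* now.",
]
print(main(sample_lines))  # ['and preferably only one', 'right']
```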


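Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.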

 