Remove specific sentences from string
I have a string in the following format (the sentences containing 3 or more spaces, and the sentences between them, are part of tabular data):
Some Sentence
Some sentence
Balance at January 1, $421 $51
Additions based on tax positions related to the
current year 4 34 9
Additions based on acquisitions - - 2
Additions based on tax positions related to prior
years 21 13 374
Reductions for tax positions of prior years (54) (43) -
Some paragraph
Some paragraph
Balance at January 1, $421 $51
Additions based on tax positions related to the
current year 4 34 9
Additions based on acquisitions - - 2
Additions based on tax positions related to prior
years 21 13 374
Reductions for tax positions of prior years (54) (43) -
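I need to remove all sentences containing 3 or more spaces from the string, keeping in mind that the actual paragraph content should be preserved.
Below is my approach; it does not give me accurate results, and I also don't like using range(5):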
for i in range(5):
    result = re.sub('[\\n-].* {3,}.*\\n', '', result)
print(result)
Output of my logic:
Some Sentence
Some sentence
Additions based on tax positions related to the
Additions based on tax positions related to prior
Some paragraph
Some paragraph
Additions based on tax positions related to the
Additions based on tax positions related to prior
Expected output:
Some Sentence
Some sentence
Some paragraph
Some paragraph
What else can be done to remove the sentences that sit between the sentences with 3 or more spaces?
sentences = """
Some Sentence
Some sentence
Additions based on tax positions related to the
Additions based on tax positions related to prior
Some paragraph
Some paragraph
Additions based on tax positions related to the
Additions based on tax positions related to prior
"""
splitted_sentences = sentences.split('\n')
only_short_sentences = [line for line in splitted_sentences if len(line.split()) <3]
short_sentences_str = '\n'.join(only_short_sentences)
print(short_sentences_str)
Output:
Some Sentence
Some sentence
Some paragraph
Some paragraph
If you want to discard empty lines, switch to this version of the list comprehension:
only_short_sentences = [line for line in splitted_sentences if len(line.split()) <3 and line]
Is this the expected result?
Input:
sentences = """
Some Sentence
Some sentence
Balance at January 1, $421 $51
Additions based on tax positions related to the
current year 4 34 9
Additions based on acquisitions - - 2
Additions based on tax positions related to prior
years 21 13 374
Reductions for tax positions of prior years (54) (43) -
Some paragraph
Some paragraph
Balance at January 1, $421 $51
Additions based on tax positions related to the
current year 4 34 9
Additions based on acquisitions - - 2
Additions based on tax positions related to prior
years 21 13 374
Reductions for tax positions of prior years (54) (43) -
"""
Output:
Some Sentence
Some sentence
Some paragraph
Some paragraph
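The word-count filter above can be wrapped in a small helper. This is only a sketch: the function name keep_short_lines and the max_words parameter are made up for illustration, and the threshold of two words is an assumption taken from the sample data, where every real line happens to be two words long.

```python
def keep_short_lines(text, max_words=2):
    """Keep non-blank lines with at most max_words whitespace-separated words.

    Hypothetical helper: the name and the default threshold are assumptions
    based on the sample input, not part of the original answer.
    """
    kept = [
        line
        for line in text.split('\n')
        # line.strip() drops empty and whitespace-only lines
        if line.strip() and len(line.split()) <= max_words
    ]
    return '\n'.join(kept)


sample = (
    "Some Sentence\n"
    "Balance at January 1,   $421   $51\n"
    "current year   4   34   9\n"
    "Some paragraph"
)
print(keep_short_lines(sample))
```

Note that this heuristic counts words, not spaces, so a genuine paragraph line with three or more words would be dropped as well. It fits the sample, but probably not longer prose.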
There is a simple regular expression for this (I have put your input into a file "test.txt"):
grep -v " .* .* " test.txt
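As you can see, it is just three spaces with ".*" in between, which stands for "any possible character, repeated an unknown number of times (possibly zero)".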
Oh, I almost forgot: the "-v" stands for "things you don't want to see in the result".
Apparently you know the re Python library, so you probably know how to embed this regular expression in your Python source code.
Good luck
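For reference, one way the grep command might be carried over to Python is sketched below. The helper name drop_lines_with_three_spaces is hypothetical. The pattern is copied from the grep call, so it matches any line that contains at least three spaces anywhere, not three consecutive spaces.

```python
import re

# Same pattern as in the grep command: three spaces with ".*" between them.
THREE_SPACES = re.compile(r" .* .* ")


def drop_lines_with_three_spaces(text):
    """Mimic `grep -v`: keep only the lines the pattern does NOT match.

    Hypothetical helper name, introduced here for illustration only.
    """
    return "\n".join(
        line for line in text.splitlines() if not THREE_SPACES.search(line)
    )


sample = (
    "Some Sentence\n"
    "Balance at January 1, $421 $51\n"
    "Additions based on tax positions related to the\n"
    "years 21 13 374\n"
    "Some paragraph"
)
print(drop_lines_with_three_spaces(sample))
```

Because the line is filtered as a whole, no loop like range(5) is needed. On the other hand, any ordinary sentence of four or more words also contains three spaces and would be removed. If the tabular lines are separated by runs of three or more consecutive spaces, a pattern such as " {3,}" (as in the question) would be the stricter test.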