Remove specific sentences from a string

Question

I have a string in the following format (the sentences with 3 or more spaces, and the sentences between them, are part of tabular data):

Some Sentence
Some sentence


Balance at January 1,                                $421            $51
Additions based on tax positions related to the

current year                                                    4        34         9

Additions based on acquisitions                           -       -       2
Additions based on tax positions related to prior

years                                                    21       13     374
Reductions for tax positions of prior years                (54)     (43)      -

Some paragraph
Some paragraph

Balance at January 1,                                $421            $51
Additions based on tax positions related to the

current year                                                    4        34         9

Additions based on acquisitions                           -       -       2
Additions based on tax positions related to prior

years                                                    21       13     374
Reductions for tax positions of prior years                (54)     (43)      -

I need to remove every sentence that contains 3 or more consecutive spaces from the string, while keeping the actual paragraph content.

Below is my approach. It doesn't give me accurate results, and I also don't like having to use range(5):

import re

# `result` holds the input string shown above
for i in range(5):
    result = re.sub('[\\n-].* {3,}.*\\n', '', result)
print(result)
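The range(5) loop is needed because the pattern consumes the newline in front of a row ([\n-]) and the newline after it (.*\n). The next table row has then lost its leading newline and cannot match until a later pass, since re.sub never reuses characters it already consumed. A minimal single-pass sketch, assuming a "table row" is exactly a line containing 3 or more consecutive spaces (the names TABLE_ROW_LINE and remove_table_rows are hypothetical):

```python
import re

# Assumption: a "table row" is any line containing 3 or more consecutive spaces.
# With re.MULTILINE, ^ anchors at the start of every line, and "." never
# crosses a newline, so each match covers exactly one line plus its trailing
# newline (if any). Matches never need to overlap, so one pass is enough.
TABLE_ROW_LINE = re.compile(r"^.* {3,}.*\n?", re.MULTILINE)


def remove_table_rows(text: str) -> str:
    """Delete every line that contains 3+ consecutive spaces, in one pass."""
    return TABLE_ROW_LINE.sub("", text)


sample = "keep me\nrow one    1\nrow two    2\nalso keep\n"
print(remove_table_rows(sample))
```

This removes the same lines as the loop without repeated passes, but like the original attempt it still keeps wrapped labels such as "Additions based on tax positions related to the", which contain only single spaces.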

Output of my approach:

Some Sentence
Some sentence


Additions based on tax positions related to the
Additions based on tax positions related to prior



Some paragraph
Some paragraph

Additions based on tax positions related to the
Additions based on tax positions related to prior

Expected output:

Some Sentence
Some sentence


Some paragraph
Some paragraph

What else can I do to also remove the sentences that sit between the sentences with 3 or more spaces?
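One possible heuristic, not taken from the answers below: in the sample, every wrapped label ("Additions based on tax positions related to the", "... related to prior") sits directly above or below a numeric row with no blank line in between. Splitting the text into blank-line-separated blocks and dropping any block that contains a line with 3+ consecutive spaces therefore removes the labels together with their rows. A minimal sketch under that assumption (drop_table_blocks, TABLE_ROW and BLANK_LINES are hypothetical names); a real paragraph glued to a table row would also be dropped, and a label separated from its table by a blank line would survive:

```python
import re

# Assumption: a table row is any line with 3+ consecutive spaces.
TABLE_ROW = re.compile(r" {3,}")
# One or more blank (or whitespace-only) lines separate blocks.
BLANK_LINES = re.compile(r"\n\s*\n")


def drop_table_blocks(text: str) -> str:
    """Drop every blank-line-separated block that contains a table row.

    Wrapped labels with no blank line between them and a numeric row end up
    in the same block as that row, so they are dropped together with it.
    """
    blocks = BLANK_LINES.split(text.strip())
    kept = [block for block in blocks if not TABLE_ROW.search(block)]
    return "\n\n".join(kept)


sample = "\n".join([
    "Some Sentence",
    "Some sentence",
    "",
    "",
    "Balance at January 1,                                $421            $51",
    "Additions based on tax positions related to the",
    "",
    "current year                                                    4        34         9",
    "",
    "Some paragraph",
    "Some paragraph",
])
print(drop_table_blocks(sample))
```

Because blocks are re-joined with a single blank line, runs of empty lines are collapsed in the result.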

Answer 1

sentences = """
Some Sentence
Some sentence


Additions based on tax positions related to the
Additions based on tax positions related to prior



Some paragraph
Some paragraph

Additions based on tax positions related to the
Additions based on tax positions related to prior
"""

splitted_sentences = sentences.split('\n')

only_short_sentences = [line for line in splitted_sentences if len(line.split()) <3]
short_sentences_str = '\n'.join(only_short_sentences)
print(short_sentences_str)

Output:

Some Sentence
Some sentence





Some paragraph
Some paragraph

If you also want to discard the empty lines, change the list comprehension to this version:

only_short_sentences = [line for line in splitted_sentences if len(line.split()) <3 and line]

Is this the expected result?

Edit:

Input:

sentences = """
Some Sentence
Some sentence


Balance at January 1,                                $421            $51
Additions based on tax positions related to the

current year                                                    4        34         9

Additions based on acquisitions                           -       -       2
Additions based on tax positions related to prior

years                                                    21       13     374
Reductions for tax positions of prior years                (54)     (43)      -

Some paragraph
Some paragraph

Balance at January 1,                                $421            $51
Additions based on tax positions related to the

current year                                                    4        34         9

Additions based on acquisitions                           -       -       2
Additions based on tax positions related to prior

years                                                    21       13     374
Reductions for tax positions of prior years                (54)     (43)      -
"""

Output:

Some Sentence
Some sentence






Some paragraph
Some paragraph

Answer 2

There is a simple regular expression for this (I put your input into a file "test.txt"):

grep -v " .* .* " test.txt

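As you can see, it is just three spaces with ".*" between them, where ".*" stands for "any character, repeated an unknown number of times (possibly zero)". So the pattern matches any line that contains at least three spaces, not necessarily consecutive.
Oh, almost forgot: "-v" means "show only the lines that do NOT match" (it inverts the match).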

Since you already use Python's re library, you probably know how to embed this regular expression in your Python source code.
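For reference, a minimal sketch of the same filter with Python's re module (grep_v and PATTERN are hypothetical names). re.search is used because grep matches anywhere in a line, and keeping only non-matching lines mirrors -v:

```python
import re

# The answer's pattern: a space, anything, a space, anything, a space.
# It matches any line containing at least three spaces anywhere
# (not necessarily consecutive).
PATTERN = re.compile(r" .* .* ")


def grep_v(text: str) -> str:
    """Keep only the lines that do NOT match PATTERN, like `grep -v`."""
    return "\n".join(line for line in text.splitlines() if not PATTERN.search(line))


print(grep_v("Some Sentence\nAdditions based on tax positions\nSome paragraph"))
```

Note that this also drops any ordinary sentence of four or more words, so it only fits input where the real paragraphs are short, as in the sample.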

Good luck!

Answer 1: score 2, 2020-09-18 12:14:03 (edited)

Answer 2: score 0, 2020-09-18 12:22:02

從字符串中刪除特定句子

問題描述

2 個解決方案

解決方案1 2 2020-09-18 12:14:03

已編輯

解決方案2 0 2020-09-18 12:22:02

解決方案1
2 2020-09-18 12:14:03

解決方案2
0 2020-09-18 12:22:02