
Remove specific sentences from string

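I have a string in the following format (the lines that contain 3 or more consecutive spaces, and the lines between those lines, are part of tabular data):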

Some Sentence
Some sentence


Balance at January 1,                                $421            $51
Additions based on tax positions related to the

current year                                                    4        34         9

Additions based on acquisitions                           -       -       2
Additions based on tax positions related to prior

years                                                    21       13     374
Reductions for tax positions of prior years                (54)     (43)      -

Some paragraph
Some paragraph

Balance at January 1,                                $421            $51
Additions based on tax positions related to the

current year                                                    4        34         9

Additions based on acquisitions                           -       -       2
Additions based on tax positions related to prior

years                                                    21       13     374
Reductions for tax positions of prior years                (54)     (43)      -

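I need to remove every line that contains 3 or more consecutive spaces from the string, keeping in mind that the actual paragraph content should be retained.

Below is my approach. It does not give me the exact result, and I also don't like having to use range(5):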

import re

# Repeat the substitution several times, because one pass misses some lines
for i in range(5):
    result = re.sub('[\\n-].* {3,}.*\\n', '', result)
print(result)

Output of my logic:

Some Sentence
Some sentence


Additions based on tax positions related to the
Additions based on tax positions related to prior



Some paragraph
Some paragraph

Additions based on tax positions related to the
Additions based on tax positions related to prior


Expected output:

Some Sentence
Some sentence


Some paragraph
Some paragraph



What else can be done to remove the lines that sit between the lines having 3 or more spaces?

sentences = """
Some Sentence
Some sentence


Additions based on tax positions related to the
Additions based on tax positions related to prior



Some paragraph
Some paragraph

Additions based on tax positions related to the
Additions based on tax positions related to prior
"""

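# Split the text into individual lines
splitted_sentences = sentences.split('\n')

# Keep only the lines with fewer than 3 whitespace-separated words
only_short_sentences = [line for line in splitted_sentences if len(line.split()) < 3]
short_sentences_str = '\n'.join(only_short_sentences)
print(short_sentences_str)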

Output:

Some Sentence
Some sentence





Some paragraph
Some paragraph

If you also want to discard the empty lines, change the list comprehension to this version:

only_short_sentences = [line for line in splitted_sentences if len(line.split()) < 3 and line]

Is this the expected result?

Edited

Input:

sentences = """
Some Sentence
Some sentence


Balance at January 1,                                $421            $51
Additions based on tax positions related to the

current year                                                    4        34         9

Additions based on acquisitions                           -       -       2
Additions based on tax positions related to prior

years                                                    21       13     374
Reductions for tax positions of prior years                (54)     (43)      -

Some paragraph
Some paragraph

Balance at January 1,                                $421            $51
Additions based on tax positions related to the

current year                                                    4        34         9

Additions based on acquisitions                           -       -       2
Additions based on tax positions related to prior

years                                                    21       13     374
Reductions for tax positions of prior years                (54)     (43)      -
"""

Output:

Some Sentence
Some sentence






Some paragraph
Some paragraph
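
The word-count filter above works on this sample because every paragraph line happens to have fewer than three words. A minimal sketch that follows the question's own rule instead (a line is tabular if it contains 3 or more consecutive spaces, or if it sits between two such lines once blank lines are ignored) might look like the following. The function name `strip_table_lines` and the "sandwiched line" rule are assumptions made for illustration, not part of the original answer, and a single paragraph line placed directly between two tables would also be dropped by this rule.

```python
import re

# Three or more consecutive spaces mark a table row (the question's criterion).
WIDE_GAP = re.compile(r" {3,}")


def strip_table_lines(text: str) -> str:
    """Drop table rows and the wrapped label lines sandwiched between them.

    Hypothetical helper: blank lines are kept as they are.
    """
    lines = text.split("\n")
    # Positions of the non-blank lines, and whether each looks like a table row.
    idx = [i for i, line in enumerate(lines) if line.strip()]
    is_row = [bool(WIDE_GAP.search(lines[i].strip())) for i in idx]

    drop = set()
    for k, i in enumerate(idx):
        # A non-row line counts as tabular when both non-blank neighbours are rows.
        between_rows = 0 < k < len(idx) - 1 and is_row[k - 1] and is_row[k + 1]
        if is_row[k] or between_rows:
            drop.add(i)
    return "\n".join(line for i, line in enumerate(lines) if i not in drop)
```

Calling `print(strip_table_lines(sentences))` on the input above should leave only the "Some Sentence" / "Some sentence" and "Some paragraph" lines, plus the blank lines, which can be filtered afterwards if they are not wanted.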

There is a simple regular expression for this (I have put your input in a file "test.txt"):

grep -v " .* .* " test.txt

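As you can see, it is just three spaces with ".*" in between, which stands for "any character, repeated an unknown number of times (possibly zero)".
Oh, I almost forgot: "-v" means "do not show the matching lines in the result".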

You evidently know the Python re library, so you probably know how to embed this regular expression in your Python source code.
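
For reference, one way the grep command might be carried over to Python is sketched below. The function name `drop_lines_with_three_spaces` is made up for illustration; the pattern is the one from the grep command. Note that it matches any line containing at least three spaces anywhere, not necessarily consecutive ones, so an ordinary sentence of four or more words would be dropped as well.

```python
import re

# Same pattern as the grep command: three spaces with anything in between.
THREE_SPACES = re.compile(r" .* .* ")


def drop_lines_with_three_spaces(text: str) -> str:
    """Keep only the lines that do NOT match, mirroring `grep -v`."""
    return "\n".join(
        line for line in text.split("\n") if not THREE_SPACES.search(line)
    )


if __name__ == "__main__":
    demo = "Some Sentence\nyears      21       13     374\nSome paragraph"
    print(drop_lines_with_three_spaces(demo))
```

If only runs of consecutive spaces should count, the pattern could be swapped for `r" {3,}"`, which is the criterion stated in the question.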

Good luck
