Remove specific sentences from string
I have a string in the following format (sentences containing 3 or more spaces, and the sentences between them, are part of tabular data):
Some Sentence
Some sentence
Balance at January 1, $421 $51
Additions based on tax positions related to the
current year 4 34 9
Additions based on acquisitions - - 2
Additions based on tax positions related to prior
years 21 13 374
Reductions for tax positions of prior years (54) (43) -
Some paragraph
Some paragraph
Balance at January 1, $421 $51
Additions based on tax positions related to the
current year 4 34 9
Additions based on acquisitions - - 2
Additions based on tax positions related to prior
years 21 13 374
Reductions for tax positions of prior years (54) (43) -
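I need to remove every sentence that contains 3 or more spaces from the string, keeping in mind that the actual paragraph content must be preserved.
Below is my approach. It does not give me accurate results, and I also don't like using range(5):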
for i in range(5):
    result = re.sub('[\\n-].* {3,}.*\\n', '', result)
print(result)
Output of my logic:
Some Sentence
Some sentence
Additions based on tax positions related to the
Additions based on tax positions related to prior
Some paragraph
Some paragraph
Additions based on tax positions related to the
Additions based on tax positions related to prior
Expected output:
Some Sentence
Some sentence
Some paragraph
Some paragraph
What else can be done to remove the sentences that sit between the sentences with 3 or more spaces?
sentences = """
Some Sentence
Some sentence
Additions based on tax positions related to the
Additions based on tax positions related to prior
Some paragraph
Some paragraph
Additions based on tax positions related to the
Additions based on tax positions related to prior
"""
splitted_sentences = sentences.split('\n')
only_short_sentences = [line for line in splitted_sentences if len(line.split()) <3]
short_sentences_str = '\n'.join(only_short_sentences)
print(short_sentences_str)
Output:
Some Sentence
Some sentence
Some paragraph
Some paragraph
If you also want to discard empty lines, change the list comprehension to this version:
only_short_sentences = [line for line in splitted_sentences if len(line.split()) <3 and line]
Is this the expected result?
Input:
sentences = """
Some Sentence
Some sentence
Balance at January 1, $421 $51
Additions based on tax positions related to the
current year 4 34 9
Additions based on acquisitions - - 2
Additions based on tax positions related to prior
years 21 13 374
Reductions for tax positions of prior years (54) (43) -
Some paragraph
Some paragraph
Balance at January 1, $421 $51
Additions based on tax positions related to the
current year 4 34 9
Additions based on acquisitions - - 2
Additions based on tax positions related to prior
years 21 13 374
Reductions for tax positions of prior years (54) (43) -
"""
Output:
Some Sentence
Some sentence
Some paragraph
Some paragraph
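The word-count filter above depends on the paragraph lines being short. A minimal sketch that follows the question's own criterion more closely is given here, under two assumptions: a table row is any line with a gap of 3 or more consecutive spaces, and a wrapped label line belongs to the table only when the lines directly before and after it are both table rows. The function name strip_table_lines and the three-space gaps in the sample are illustrative, since the column spacing is not preserved in the listings on this page.

```python
import re

# A gap of three or more consecutive spaces between two non-space characters
# (assumed here to be what marks a table row).
THREE_SPACE_GAP = re.compile(r"\S {3,}\S")

def strip_table_lines(text):
    """Drop table rows and the wrapped label lines sandwiched between them."""
    lines = text.splitlines()
    is_row = [bool(THREE_SPACE_GAP.search(line)) for line in lines]
    kept = []
    for i, line in enumerate(lines):
        if is_row[i]:
            continue  # the line itself is a table row
        prev_is_row = i > 0 and is_row[i - 1]
        next_is_row = i + 1 < len(lines) and is_row[i + 1]
        if prev_is_row and next_is_row:
            continue  # wrapped label between two table rows
        kept.append(line)
    return "\n".join(kept)

# Sample with the column gaps written as three spaces (an assumption).
sample = (
    "Some Sentence\n"
    "Some sentence\n"
    "Balance at January 1,   $421   $51\n"
    "Additions based on tax positions related to the\n"
    "current year   4   34   9\n"
    "Additions based on acquisitions   -   -   2\n"
    "Additions based on tax positions related to prior\n"
    "years   21   13   374\n"
    "Reductions for tax positions of prior years   (54)   (43)   -\n"
    "Some paragraph\n"
    "Some paragraph\n"
)

print(strip_table_lines(sample))
```

This runs in a single pass, so no range(5) loop is needed. One known limitation of the heuristic: a one-line paragraph placed directly between two tables would also be dropped.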
There is a simple regular expression for this (I have put your input into the file "test.txt"):
grep -v " .* .* " test.txt
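As you can see, it is just three spaces with ".*" between them, where ".*" stands for "any character, repeated an unknown number of times (possibly zero)".
Oh, I almost forgot: "-v" stands for "the things I don't want to see in the result".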
You apparently know the re Python library, so you probably know how to embed this regular expression in your Python source code.
Good luck
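For completeness, one possible way to embed that expression in Python is sketched below. It mimics grep -v by keeping only the lines the pattern does not match. The helper name drop_matching_lines and the sample text are illustrative and not part of the answer above.

```python
import re

# Same pattern as the grep command: three spaces anywhere in the line,
# with any characters (possibly none) between them.
THREE_SPACES = re.compile(r" .* .* ")

def drop_matching_lines(text):
    """Keep only the lines that do NOT match, like `grep -v`."""
    return "\n".join(
        line for line in text.splitlines() if not THREE_SPACES.search(line)
    )

text = (
    "Some Sentence\n"
    "Some sentence\n"
    "Balance at January 1, $421 $51\n"
    "Additions based on tax positions related to the\n"
    "current year 4 34 9\n"
    "Some paragraph\n"
    "Some paragraph\n"
)

print(drop_matching_lines(text))
```

Note that this pattern does not require the spaces to be consecutive, so any line with three or more spaces in total (four or more words) is removed. That happens to work for the sample, but it would also remove longer paragraph lines.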