How to modify a text file using Python

I have the following text file:

  1. It's hard to explain puns to kleptomaniacs because they always take things literally.

  2. I used to think the brain was the most important organ. Then I thought, look what's telling me that.

I use the following script to get rid of the numbering and the newlines:

import re
with open('jokes.txt', 'r+') as original_file:
    modfile = original_file.read()
    modfile = re.sub(r"\d+\. ", "", modfile)
    modfile = re.sub(r"\n", "", modfile)
    original_file.seek(0)
    original_file.truncate()
    original_file.write(modfile)

After running the script, this is how my text file looks:

It's hard to explain puns to kleptomaniacs because they always take things literally. I used to think the brain was the most important organ. Then I thought, look what's telling me that.

I'd like the file to be:

It's hard to explain puns to kleptomaniacs because they always take things literally.
I used to think the brain was the most important organ. Then I thought, look what's telling me that.

How do I remove the blank lines without merging all the lines into one?

You can do this with a single substitution, using the following regex:

re.sub(r"\d+\. |(?<!^)\n", "", modfile, flags=re.MULTILINE)

(?<!^)\n matches a newline unless it sits at the start of a line, that is, unless it is the newline of an empty line. The re.MULTILINE flag makes ^ match at the beginning of every line, not only at the beginning of the string.
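To check the pattern without touching a file, here is a minimal sketch that applies it to an in-memory string. The `sample` text and the `cleaned` name are assumptions made for illustration; `sample` stands in for the contents of jokes.txt, with one blank line between the jokes.

```python
import re

# Hypothetical stand-in for the contents of jokes.txt (one blank line between jokes).
sample = (
    "1. It's hard to explain puns to kleptomaniacs because they always take things literally.\n"
    "\n"
    "2. I used to think the brain was the most important organ. "
    "Then I thought, look what's telling me that.\n"
)

# Remove the "N. " prefixes and every newline that is not the first character of a line.
# The newline of the empty line sits at a line start, so it is the one that survives.
cleaned = re.sub(r"\d+\. |(?<!^)\n", "", sample, flags=re.MULTILINE)

print(cleaned)
```

The newline that ends each joke is removed, while the newline forming the empty line is kept, so each joke ends up on its own line.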

In code:

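import re

with open('jokes.txt', 'r+') as original_file:
    modfile = original_file.read()
    # Strip the "N. " prefixes and every newline that does not start a line
    modfile = re.sub(r"\d+\. |(?<!^)\n", "", modfile, flags=re.MULTILINE)
    # Rewind and empty the file before writing the cleaned text back
    original_file.seek(0)
    original_file.truncate()
    original_file.write(modfile)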

You can also use a negative lookahead instead of a lookbehind if you want:

r"\d+\. |\n(?!\n)"
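As a quick comparison, the sketch below runs both patterns on the same input. The `sample` string and the variable names are assumptions for illustration only, not part of the original file.

```python
import re

# Hypothetical sample standing in for jokes.txt.
sample = "1. First joke.\n\n2. Second joke, part one. Part two.\n"

# Lookbehind variant: needs re.MULTILINE so that ^ also matches after each newline.
with_lookbehind = re.sub(r"\d+\. |(?<!^)\n", "", sample, flags=re.MULTILINE)

# Lookahead variant: removes a newline unless another newline follows it; no flag needed.
with_lookahead = re.sub(r"\d+\. |\n(?!\n)", "", sample)

print(with_lookbehind == with_lookahead)
print(repr(with_lookahead))
```

Both variants assume the jokes are separated by a blank line. If the jokes were separated by a single newline, either pattern would join them into one line.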
