Removing page numbers from a .txt file in Python

I am trying to load the .txt file of an ebook and remove the lines that contain page numbers. The book looks like this:

2
Words
More words.

More words.

3
More words.

Here is what I have so far:

x = 1

with open("first.txt","r") as input:
    with open("last.txt","wb") as output: 
        for line in input:
            if line != str(x) + "\n":
                output.write(line + "\n")
                x + x + 1

My output file comes out with all of the white space (newlines) removed, which I don't want, and it does not even remove the numbers. Does anyone have any ideas? Thanks!

1) You don't have to open your output file in binary mode: open("last.txt","wb") -> open("last.txt","w") (in Python 3, writing a str to a file opened with "wb" raises a TypeError)
2) x + x + 1 -> x += 1 (x + x + 1 only computes a value and discards it, so x never changes)
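For comparison, here is a sketch of the question's counter-based idea with both fixes applied and two further changes: the counter only advances when a page number is actually found, and each line is kept as-is because it already ends with a newline (the question's `line + "\n"` adds a second one). It assumes page numbers appear in strict sequence, each on its own line; the helper name `remove_sequential_page_numbers` and the `first_page` parameter are hypothetical. Note that the sample text starts at page 2, while the question starts `x` at 1.

```python
def remove_sequential_page_numbers(lines, first_page=1):
    """Yield every line except those equal to the next expected page number.

    Hypothetical helper. Assumes page numbers appear in order
    (first_page, first_page + 1, ...), each on a line of its own.
    """
    expected = first_page
    for line in lines:
        if line.strip() == str(expected):
            expected += 1  # found this page's number; now look for the next one
            continue       # drop the page-number line
        yield line         # keep the line unchanged, including its newline


sample = ["2\n", "Words\n", "More words.\n", "\n", "More words.\n", "\n", "3\n", "More words.\n"]
for kept in remove_sequential_page_numbers(sample, first_page=2):
    print(kept, end="")
```

To process the files from the question, open them as in the answers and call `output_file.writelines(remove_sequential_page_numbers(input_file, first_page=2))`. Unlike the answers, this version keeps a stray number that is out of sequence, which may or may not be what you want.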

But you could do it far more simply:

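with open("first.txt", "r") as input_file:  # avoid shadowing the built-in input()
    with open("last.txt", "w") as output:
        for line in input_file:
            line = line.strip()  # remove surrounding whitespace, including the newline
            try:
                int(line)  # is this line a number? if so, skip it
            except ValueError:
                output.write(line + "\n")  # not a number: keep the line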

Check whether the line can be converted to an integer, and skip the line if the conversion succeeds. It is not the quickest solution, but it should work.

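with open("first.txt") as input_file, open("last.txt", "w") as output_file:
    for line in input_file:
        try:
            int(line)
            # the line is a number: skip storing it
            continue
        except ValueError:
            # not a number: save the line to output
            output_file.write(line)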

Use a regular expression to ignore lines that contain just a number:

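import re
import sys

# A line consisting only of digits. re.match anchors at the start, and $ also
# matches just before the trailing newline, so "2\n" matches.
pattern = re.compile(r"^\d+$")

# Filter stdin to stdout, e.g. python script.py < first.txt > last.txt
for line in sys.stdin:
    if not pattern.match(line):
        sys.stdout.write(line)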
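The same filter can be wrapped in a function so it works on any iterable of lines (a file object or a list) and can be tested without redirecting input. A minimal sketch, with hypothetical names `PAGE_NUMBER` and `drop_page_number_lines`, assuming a page-number line holds only ASCII digits, optionally surrounded by whitespace. It uses `re.fullmatch` instead of `^...$`, and `[0-9]` instead of `\d`, because in Python 3 `\d` also matches non-ASCII decimal digits.

```python
import re

# A page-number line: optional whitespace, one or more ASCII digits, optional whitespace.
PAGE_NUMBER = re.compile(r"\s*[0-9]+\s*")


def drop_page_number_lines(lines):
    """Return the lines that are not page numbers; blank lines are kept."""
    return [line for line in lines if not PAGE_NUMBER.fullmatch(line)]


sample = ["2\n", "Words\n", "More words.\n", "\n", "More words.\n", "\n", "3\n", "More words.\n"]
for line in drop_page_number_lines(sample):
    print(line, end="")
```

Because the pattern requires at least one digit, empty lines never match and the paragraph breaks survive.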

Improved solution: one less indentation level, no unnecessary strip() or string concatenation, and only the specific ValueError exception is caught.

with open("first.txt","r") as input_file, open("last.txt","w") as output_file:
    for line in input_file:
        try: 
            int(line)
        except ValueError:
            output_file.write(line)
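The improved version can skip `strip()` because `int()` itself ignores leading and trailing whitespace, including the newline, while a blank line still raises `ValueError` and is therefore written out. The quick check below (using a hypothetical helper `is_int_line`) shows that behaviour, plus one caveat: a signed number such as `-3` also converts, so such a line would be dropped too.

```python
def is_int_line(line):
    """Return True if int() accepts the line, i.e. the loop above would skip it."""
    try:
        int(line)
    except ValueError:
        return False
    return True


print(is_int_line("3\n"))            # True: surrounding whitespace is ignored
print(is_int_line("\n"))             # False: blank lines are kept
print(is_int_line("More words.\n"))  # False: ordinary text is kept
print(is_int_line("-3\n"))           # True: signed numbers are skipped as well
print(is_int_line("3.5\n"))          # False: int() rejects decimals
```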

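Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.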
