Removing page numbers from a .txt file in Python

Question

I am trying to load a .txt file of an ebook and remove the lines that contain page numbers. The book looks like:

2
Words
More words.

More words.

3
More words.

Here is what I have so far:

x = 1

with open("first.txt","r") as input:
    with open("last.txt","wb") as output: 
        for line in input:
            if line != str(x) + "\n":
                output.write(line + "\n")
                x + x + 1

My output file comes out with all of the white space (new lines) removed, which I don't want, and it does not even remove the numbers. Does anyone have any ideas? Thanks!

Answer 1

1) You don't have to open the output file in binary mode: open("last.txt","wb") -> open("last.txt","w")
2) x + x + 1 computes a value and discards it; to increment x, write x += 1
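To make point 2 concrete, here is a minimal sketch. The names unchanged and incremented are only illustrative and do not appear in the question.

```python
x = 1

# A bare expression: Python evaluates 1 + 1 + 1 and throws the result away.
x + x + 1
unchanged = x  # x is still 1

# Augmented assignment stores the new value back into x.
x += 1
incremented = x  # x is now 2
```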

But you could do it far more simply:

with open("first.txt","r") as input:
    with open("last.txt","w") as output:
        for line in input:
            line = line.strip()  # clear surrounding white space
            try:
                int(line)  # is this a number?
            except ValueError:
                output.write(line + "\n")
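The int() check above can be tried in isolation. The sketch below wraps it in a hypothetical helper, is_page_number, which is not part of the answer. It also shows that int() already ignores surrounding whitespace, so the strip() call is not needed for the test itself. Note that strip() additionally removes any leading indentation from the lines that are kept.

```python
def is_page_number(line):
    """Return True if the line holds nothing but an integer (hypothetical helper)."""
    try:
        int(line)  # int() ignores leading/trailing whitespace, including "\n"
    except ValueError:
        return False
    return True

# Page-number lines, a text line, a blank line, and a line mixing text and digits.
checks = [is_page_number(s) for s in ("2\n", "  13  \n", "Words\n", "\n", "Page 3\n")]
```

Blank lines raise ValueError as well, so they count as text and are preserved in the output.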

Answer 2

Check if you can convert the line to an integer and skip the line if that succeeds. Not the quickest solution, but it should work.

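with open("first.txt", "r") as infile, open("last.txt", "w") as outfile:
    for line in infile:
        try:
            int(line)
            # the line is a number: skip storing it
            continue
        except ValueError:
            # not a number: save the line to output
            outfile.write(line)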
Answer 3

Use regular expressions to ignore lines that contain just a number.

import sys
import re

# Raw string so that "\d" reaches the regex engine unchanged
pattern = re.compile(r"^\d+$")

for line in sys.stdin:
    if not pattern.match(line):
        sys.stdout.write(line)
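Because the script above reads from standard input, it has to be run with shell redirection. To try the pattern without that, the sketch below applies it to a list of lines. The helper drop_page_numbers and the sample data are illustrative assumptions, not part of the answer.

```python
import re

# Same pattern as in the answer; "$" also matches just before a trailing newline.
PAGE_NUMBER = re.compile(r"^\d+$")

def drop_page_numbers(lines):
    """Yield only the lines that are not a bare run of digits (hypothetical helper)."""
    for line in lines:
        if not PAGE_NUMBER.match(line):
            yield line

sample = ["2\n", "Words\n", "More words.\n", "\n", "3\n", "Chapter 3\n", " 4 \n"]
kept = list(drop_page_numbers(sample))
```

Unlike the int() approach, this pattern keeps a number that is padded with spaces, such as " 4 \n". If such lines should be dropped too, one option is a pattern like r"^\s*\d+\s*$".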

Answer 4

Improved solution: one less indentation level, no unnecessary strip or string concatenation, and an explicit exception caught.

with open("first.txt","r") as input_file, open("last.txt","w") as output_file:
    for line in input_file:
        try: 
            int(line)
        except ValueError:
            output_file.write(line)


Answer 1: 3, accepted, 2015-04-08 08:10:05

Answer 2: 0, 2015-04-08 08:11:42

Answer 3: 0, 2015-04-08 08:15:05

Answer 4: 0, 2015-04-08 08:20:49
