How to write only certain lines to a file in Python?

Question

I have a file that looks like this (I had to put it in a code box so it resembles the file):

text
(starts with parentheses)
         tabbed info
text
(starts with parentheses)
         tabbed info

...repeat

I want to grab only the "text" lines from the file (or every fourth line) and copy them to another file. This is the code I have, but it copies everything to the new file:

import sys

def process_file(filename):

    output_file = open("data.txt", 'w')

    input_file = open(filename, "r")
    for line in input_file:
        line = line.strip()
        if not line.startswith("(") or line.startswith(""):
            output_file.write(line)
    output_file.close()

if __name__ == "__main__":
    process_file(sys.argv[1])

Answer 1

The reason your script copies every line is that line.startswith("") is True, no matter what line is.
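A quick interpreter check illustrates this: every string, including the empty string, starts with the empty prefix, so `A or line.startswith("")` is True whatever `A` is. The sample strings below are made up to mirror the question's file.

```python
# Every string starts with the empty prefix "", so this test never filters anything.
samples = ["text\n", "(starts with parentheses)\n", "\ttabbed info\n", ""]
results = [s.startswith("") for s in samples]
print(results)  # [True, True, True, True]

# Even a "(" line passes the question's condition, because the right side is always True.
line = "(starts with parentheses)"
kept = not line.startswith("(") or line.startswith("")
print(kept)  # True
```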

You might try using isspace to test whether line begins with whitespace (a space or a tab):

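def process_file(filename):
    with open("data.txt", 'w') as output_file:
        with open(filename, "r") as input_file:
            for line in input_file:
                line = line.rstrip()  # strip only trailing whitespace; leading tabs survive
                # skip blank lines, "(" lines, and lines that begin with whitespace
                if line and not (line.startswith("(") or line[:1].isspace()):
                    output_file.write(line + "\n")  # rstrip() removed the newline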
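To check the filtering condition without touching the disk, here is a minimal sketch that runs it over an in-memory copy of the sample file from the question. The helper name `keep_text_lines` and the use of `io.StringIO` in place of real files are illustrative assumptions, not part of the answer.

```python
import io

def keep_text_lines(input_file, output_file):
    """Hypothetical helper: copy lines that start with neither "(" nor whitespace."""
    for line in input_file:
        line = line.rstrip()  # drop trailing newline/whitespace; a leading tab is kept
        # Skip blank lines, "(" lines, and indented (tabbed) lines.
        if line and not (line.startswith("(") or line[:1].isspace()):
            output_file.write(line + "\n")

sample = (
    "text\n"
    "(starts with parentheses)\n"
    "\ttabbed info\n"
    "text\n"
    "(starts with parentheses)\n"
    "\ttabbed info\n"
)
src = io.StringIO(sample)
dst = io.StringIO()
keep_text_lines(src, dst)
print(dst.getvalue(), end="")
# text
# text
```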

Answer 2

with open(filename) as infile, open('data.txt', 'w') as of:
    of.write(''.join(textline
                     for textline in infile
                     if textline[0] not in ' \t('))

To write every fourth line, slice the list of lines with [::4]:

with open(filename) as infile, open('data.txt', 'w') as of:
    # Slice the raw lines (indices 0, 4, 8, ...), not the already-filtered text lines.
    of.write(''.join(infile.readlines()[::4]))

I don't need to rstrip the newlines, because I keep them for write.
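Where the [::4] slice is applied matters. This sketch uses a made-up file of three four-line records (text, "(" line, tabbed line, blank line); unlike the answer's filter, it also excludes "\n" so blank lines are dropped. Slicing the filtered list keeps every fourth *text* line, while slicing the raw lines keeps every fourth line of the file:

```python
# Hypothetical file contents: three 4-line records of text, "(" line, tabbed line, blank.
lines = []
for n in range(1, 4):
    lines += [f"text{n}\n", "(paren)\n", "\ttabbed\n", "\n"]

# Filter out "(" lines, indented lines, and blank lines ("\n" added for this sample).
text_only = [line for line in lines if line[0] not in " \t(\n"]

print(text_only)       # ['text1\n', 'text2\n', 'text3\n']
print(text_only[::4])  # ['text1\n']  -> every 4th *text* line
print(lines[::4])      # ['text1\n', 'text2\n', 'text3\n']  -> every 4th line of the file
```

With this record layout, filtering alone and slicing the raw lines alone give the same result; combining both thins the output further.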

Answer 3

In addition to line.startswith("") always being True, line.strip() removes the leading tab, so the tabbed lines get written as well. Change it to line.rstrip() and test for a tab with '\t'. That part of your code should look like this:

line = line.rstrip()
if not line.startswith(('(', '\t')):
    output_file.write(line + '\n')  # rstrip() removed the newline, so add it back
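The difference between strip() and rstrip() is easy to see on a single tabbed line (the sample string is an assumption matching the question's format). It also shows that startswith accepts a tuple of prefixes:

```python
line = "\ttabbed info\n"

stripped = line.strip()    # removes the leading tab AND the trailing newline
rstripped = line.rstrip()  # removes only the trailing newline

print(repr(stripped))   # 'tabbed info'
print(repr(rstripped))  # '\ttabbed info'

# After strip() the tab is gone, so the tab test can no longer see it.
print(stripped.startswith(("(", "\t")))   # False -> line would be written
print(rstripped.startswith(("(", "\t")))  # True  -> line is skipped
```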

In response to your question in the comments:

# edited in response to comments on the post
for i, line in enumerate(input_file):  # enumerate() supplies the line index i
    if i % 4 == 0:  # keep lines 0, 4, 8, ...
        output_file.write(line)
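The same every-fourth-line selection (indices 0, 4, 8, ...) can also be written with itertools.islice, which reads the input lazily and needs no counter. This is an alternative to the i % 4 loop, not part of the original answer; the helper name `every_fourth_line` and the in-memory sample are assumptions for illustration.

```python
import io
from itertools import islice

def every_fourth_line(input_file, output_file):
    """Hypothetical helper: copy lines 0, 4, 8, ... unchanged."""
    # islice(iterable, start, stop, step): start at line 0, no stop, step 4
    output_file.writelines(islice(input_file, 0, None, 4))

src = io.StringIO("".join(f"line {i}\n" for i in range(10)))
dst = io.StringIO()
every_fourth_line(src, dst)
print(dst.getvalue(), end="")
# line 0
# line 4
# line 8

# Cross-check against the enumerate/modulo approach.
src.seek(0)
via_enumerate = [line for i, line in enumerate(src) if i % 4 == 0]
assert "".join(via_enumerate) == dst.getvalue()
```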

Answer 4

Try:

if not line.startswith("(") and not line.startswith("\t"):

without calling line.strip() (it strips the leading tabs).
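By De Morgan's law, this condition is the same as "not (starts with '(' or starts with a tab)", so it selects exactly the lines that a single startswith call with a tuple of prefixes would. A quick check over sample lines (the sample strings are assumptions):

```python
samples = ["text\n", "(starts with parentheses)\n", "\ttabbed info\n"]

kept = []
for line in samples:
    two_checks = not line.startswith("(") and not line.startswith("\t")
    tuple_check = not line.startswith(("(", "\t"))  # startswith accepts a tuple of prefixes
    assert two_checks == tuple_check
    kept.append(two_checks)
    print(repr(line), two_checks)
# 'text\n' True
# '(starts with parentheses)\n' False
# '\ttabbed info\n' False
```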

Answer 5

So the issues are that (1) you are misusing boolean logic, and (2) every possible line starts with "".

First, the boolean logic:

The or operator returns True if either of its operands is True. Here the operands are not line.startswith('(') and line.startswith(''). Note that the not applies to only one of the operands. If you want it to apply to the result of the whole or expression, you have to put that expression in parentheses.
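Because `not` binds more tightly than `or`, the question's condition is parsed as `(not A) or B`. A small truth-table sketch makes the difference visible; `a` and `b` stand in for the two startswith calls:

```python
from itertools import product

rows = []
for a, b in product([False, True], repeat=2):
    as_written = not a or b  # parsed as (not a) or b
    grouped = not (a or b)   # negates the whole or-expression
    rows.append((a, b, as_written, grouped))
    print(a, b, as_written, grouped)
# False False True True
# False True True False
# True False False False
# True True True False
```

Whenever `b` is True (as `line.startswith("")` always is), the as-written condition is True, which is why every line was copied.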

The second issue is that you call the startswith() method with a zero-length string as its argument. This essentially says "match any string whose first zero characters are nothing", and that matches any string you could give it.

See the other answers for what you should be doing here.


Question

5 answers

Answer 1
1 2010-08-07 19:19:29

Answer 2
1 2010-08-07 19:39:29

Answer 3
0 Accepted 2010-08-07 19:23:23

Answer 4
0 2010-08-07 19:24:05

Answer 5
0 2010-08-07 19:27:52