
How do I write only certain lines to a file in Python?

I have a file that looks like this (I put it in a code box so it resembles the file):

text
(starts with parentheses)
         tabbed info
text
(starts with parentheses)
         tabbed info

...repeat

I want to grab only the "text" lines from the file (or every fourth line) and copy them to another file. This is the code I have, but it copies everything to the new file:

import sys

def process_file(filename):

    output_file = open("data.txt", 'w')

    input_file = open(filename, "r")
    for line in input_file:
        line = line.strip()
        if not line.startswith("(") or line.startswith(""):
            output_file.write(line)
    output_file.close()
if __name__ == "__main__":
    process_file(sys.argv[1])

Your script copies every line because line.startswith("") is True no matter what the line contains.
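A quick check in the interpreter confirms this. Every string, including the empty string, starts with the empty prefix, so the second operand of the or is always True and the whole condition is always True. The sample strings below are just lines shaped like the question's file:

```python
# Every string starts with the empty string, so this test can never be False.
samples = ["text", "(starts with parentheses)", "\ttabbed info", ""]
results = [s.startswith("") for s in samples]
print(results)  # [True, True, True, True]
```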

You might try using isspace() to test whether the line begins with whitespace:

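def process_file(filename):
    with open("data.txt", 'w') as output_file:
        with open(filename, "r") as input_file:
            for line in input_file:
                line = line.rstrip()
                # skip lines that start with "(" or with whitespace (the tabbed info)
                if not line.startswith("(") and not line[:1].isspace():
                    output_file.write(line + "\n")  # rstrip() removed the newline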
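The same filter can be pulled out into a small generator so it can be checked without touching the file system. This is a sketch: keep_text_lines is a hypothetical helper name, and io.StringIO stands in for the real input file. Because it does not rstrip the lines, the trailing newlines are kept, and a blank line is skipped too since "\n".isspace() is True.

```python
import io

def keep_text_lines(lines):
    """Yield only the lines that start with neither "(" nor whitespace.

    Hypothetical helper: `lines` can be any iterable of strings,
    such as an open file object.
    """
    for line in lines:
        if not line.startswith("(") and not line[:1].isspace():
            yield line  # the trailing newline is kept, so write() needs no "\n"

# io.StringIO stands in for the real input file in this sketch.
sample = io.StringIO("text\n(starts with parentheses)\n\ttabbed info\n"
                     "text\n(starts with parentheses)\n\ttabbed info\n")
kept = list(keep_text_lines(sample))
print(kept)  # ['text\n', 'text\n']
```

With real files, pass the open input file to keep_text_lines and hand the result to output_file.writelines(...).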

with open(filename) as input_file, open('data.txt', 'w') as of:
    # keep lines whose first character is not a space, a tab, or "("
    of.write(''.join(textline
                     for textline in input_file
                     if textline[0] not in ' \t('))

To write every fourth line, slice the list of lines with [::4]:

with open(filename) as input_file, open('data.txt', 'w') as of:
    # [::4] keeps lines 0, 4, 8, ... of the whole file
    of.write(''.join(input_file.readlines()[::4]))

There is no need to rstrip the lines here, because the trailing newlines are exactly what write() should output.
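For a large file, building the whole list just to slice it can be avoided. As a sketch, itertools.islice (a different technique from the list slice) walks the file lazily and keeps lines 0, 4, 8, ...; io.StringIO again stands in for the real input file, and the "line n" contents are made up for illustration:

```python
import io
from itertools import islice

# io.StringIO stands in for the input file; any iterable of lines works.
sample = io.StringIO("".join(f"line {n}\n" for n in range(10)))

# islice(iterable, start, stop, step): stop=None means "to the end".
every_fourth = list(islice(sample, 0, None, 4))
print(every_fourth)  # ['line 0\n', 'line 4\n', 'line 8\n']
```

In the script, of.writelines(islice(input_file, 0, None, 4)) would write the same lines without holding the whole file in memory.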

In addition to line.startswith("") always being True, line.strip() removes the leading tab, so the tabbed data gets written as well. Change it to line.rstrip() and use "\t" to test for a tab. That part of your code should look like:

line = line.rstrip()
if not line.startswith(('(', '\t')):
    #....
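str.startswith also accepts a tuple of prefixes and returns True if the string starts with any of them, which is what the snippet above relies on. A small check against lines shaped like the question's file after rstrip() (the sample lines are assumptions):

```python
prefixes = ("(", "\t")
lines = ["text", "(starts with parentheses)", "\ttabbed info"]
# A line is kept only when it starts with none of the prefixes.
kept = [line for line in lines if not line.startswith(prefixes)]
print(kept)  # ['text']
```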

In response to your question in the comments:

# enumerate() numbers the lines from 0, so this keeps lines 0, 4, 8, ...
for i, line in enumerate(input_file):
    if i % 4 == 0:
        output_file.write(line)

Try:

if not line.startswith("(") and not line.startswith("\t"):

without calling line.strip() first (strip() removes the leading tabs).
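The difference matters because strip() removes whitespace from both ends, so the leading tab is already gone when the test runs, while rstrip() trims only the right end (including the newline). A minimal check on one tabbed line:

```python
raw = "\ttabbed info\n"

stripped = raw.strip()    # both ends: leading tab and trailing newline removed
rstripped = raw.rstrip()  # right end only: the leading tab survives

print(stripped.startswith("\t"))   # False, so the tabbed line would be written
print(rstripped.startswith("\t"))  # True, so the tabbed line is filtered out
```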

So the issue is that (1) you are misusing boolean logic, and (2) every possible line starts with "".

First, the boolean logic:

The or operator returns True if either of its operands is True. Here the operands are "not line.startswith('(')" and "line.startswith('')": the not applies only to the first operand. If you want to negate the result of the whole or expression, you have to wrap it in parentheses.
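Because not binds more tightly than or, the original condition is parsed as (not a) or b rather than not (a or b). A small sketch with a and b standing for the two startswith() calls on a line from the question's file:

```python
line = "(starts with parentheses)"
a = line.startswith("(")  # True
b = line.startswith("")   # True for every string

as_written = not a or b        # parsed as (not a) or b
fully_negated = not (a or b)   # negates the whole or expression

print(as_written, fully_negated)  # True False
```

Neither form gives the desired filter, because b is always True: the first is always True and the second is always False.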

The second issue is your use of startswith() with a zero-length string as its argument. This essentially says "match any string whose first zero characters are the empty string", which matches any string you could give it.

See other answers for what you should be doing here.
