
Python remove all lines starting with pattern

So I have 7000+ txt files that look something like this:

1 0.51 0.73 0.81

0 0.24 0.31 0.18

2 0.71 0.47 0.96

1 0.15 0.25 0.48

And as output I want:

0 0.24 0.31 0.18

2 0.71 0.47 0.96

I put the following code together from several sources:

    #!/usr/bin/env python3
    import glob
    import os
    import pathlib
    import re
    path = './*.txt'

    for filename in glob.glob(path):
        with open(filename, 'r') as f:
            for line in f.readlines():
                if not (line.startswith('1')):
                    print(line)
                    out = open(filename, 'w')
                    out.write(line)
            f.close()

But for the example above, the output is:

2 0.71 0.47 0.96

How can I fix the code to give me the correct output?

This happens because you reopen the output file in write mode inside the for loop, which overwrites it on every iteration. You can either save to a different file:

import glob

path = 'test.txt'
outfile = 'out.txt'
for filename in glob.glob(path):
    # Open the output once, before the loop, so earlier lines are not overwritten.
    with open(filename, 'r') as f, open(outfile, 'w') as out:
        for line in f:
            if not line.startswith('1'):
                print(line, end='')
                out.write(line)
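With thousands of input files, a single `out.txt` would itself be overwritten by each file in turn. The sketch below writes one filtered copy per input file into a separate directory instead. The function name `filter_to_directory`, its parameters, and the `filtered` directory in the usage line are hypothetical names chosen for illustration, not part of the original answer.

```python
import glob
import os


def filter_to_directory(pattern, out_dir, prefix='1'):
    """Copy each file matching pattern into out_dir, dropping lines that start with prefix."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for filename in glob.glob(pattern):
        target = os.path.join(out_dir, os.path.basename(filename))
        # Each output file is opened exactly once, before any of its lines are written.
        with open(filename, 'r') as src, open(target, 'w') as dst:
            for line in src:
                if not line.startswith(prefix):
                    dst.write(line)
        written.append(target)
    return written
```

A call such as `filter_to_directory('./*.txt', 'filtered')` would leave the originals untouched, which makes it easy to check the result before replacing anything.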

Or you can collect the lines to keep in a list and then write that list back to the same file:

import glob

path = 'test.txt'
for filename in glob.glob(path):
    # Build a fresh list for each file so lines do not leak between files.
    with open(filename, 'r') as f:
        output = [line for line in f if not line.startswith('1')]
    # Reopen the same file for writing only after reading has finished.
    with open(filename, 'w') as w:
        w.writelines(output)
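As an alternative way to write back to the same file, the standard library's `fileinput` module can do the in-place rewrite itself. This is only a sketch: the function name `drop_lines_in_place` and its parameters are made up for illustration, and it assumes the files are plain text in the default encoding.

```python
import fileinput
import glob


def drop_lines_in_place(pattern, prefix='1'):
    """Rewrite every file matching pattern, dropping lines that start with prefix."""
    files = glob.glob(pattern)
    if not files:
        # An empty file list would make fileinput read standard input, so return early.
        return
    with fileinput.input(files=files, inplace=True) as stream:
        for line in stream:
            if not line.startswith(prefix):
                # With inplace=True, standard output is redirected into the current file.
                print(line, end='')
```

Calling `drop_lines_in_place('./*.txt')` would process all matching files in one pass. Because `print` is redirected while the loop runs, debugging output should go to `sys.stderr` instead.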

The problem is that you reopen the output file, and so truncate it, for every line. This can be fixed by opening the output file once, before the loop, and using it for every line.

#!/usr/bin/env python3
from glob import glob

for filename in glob('./*.txt'):
    # Read everything first, so the file can then be safely reopened for writing.
    with open(filename, 'r') as original_file:
        original_lines = original_file.readlines()
    # Open the output once per file and write only the lines to keep.
    with open(filename, 'w') as updated_file:
        updated_file.writelines(
            line
            for line in original_lines
            if not line.startswith('1')
        )
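One caveat about the `startswith('1')` test, assuming the first column is an integer label: it also drops lines whose label merely begins with `1`, such as `10` or `12`. A possible sketch that compares the whole first field instead is shown here. The helper name `keep_line` is hypothetical, and the line with label `10` is an invented sample that does not come from the question.

```python
def keep_line(line, label='1'):
    """Return True unless the first whitespace-separated field equals label."""
    fields = line.split()
    # Blank lines have no fields and are kept unchanged.
    return not fields or fields[0] != label


lines = [
    '1 0.51 0.73 0.81\n',
    '0 0.24 0.31 0.18\n',
    '10 0.71 0.47 0.96\n',  # invented sample: label 10 should be kept
    '1 0.15 0.25 0.48\n',
]
kept = [line for line in lines if keep_line(line)]
print(kept)
```

If the labels in the real files are always single digits, the simpler `startswith` check behaves the same.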

The error is here:

open(filename, 'w')

This overwrites the file on every iteration of the loop, so only the last entry survives.

open(filename, 'a')

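This appends the content instead of overwriting it (when the output is the same file you are reading, the original lines stay in place too). But it is better to open the output file only once, outside of the loop.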
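The difference between the two modes can be seen in a small experiment. This is a minimal sketch that uses a temporary directory and the two lines the question expects to keep. The file names `w_mode.txt` and `a_mode.txt` are arbitrary.

```python
import os
import tempfile

lines = ['0 0.24 0.31 0.18\n', '2 0.71 0.47 0.96\n']

with tempfile.TemporaryDirectory() as tmp:
    w_path = os.path.join(tmp, 'w_mode.txt')
    a_path = os.path.join(tmp, 'a_mode.txt')

    for line in lines:
        # 'w' truncates the file on every open, so only the last line survives.
        with open(w_path, 'w') as out:
            out.write(line)
        # 'a' keeps what is already there and adds to the end.
        with open(a_path, 'a') as out:
            out.write(line)

    with open(w_path) as f:
        w_result = f.read()
    with open(a_path) as f:
        a_result = f.read()

print(repr(w_result))  # only the second line
print(repr(a_result))  # both lines
```

The `'w'` file ends up with only the last line, which matches the symptom described in the question.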


 