
Removing specific blank lines from txt/srt files in a directory and its sub-directories with Python

I have a lot of subtitle files in the format below.

1

00:00:01,000 --> 00:00:02,008
some dummy text

2

00:00:02,008 --> 00:00:05,006
some dummy text
some dummy text

3

00:00:05,006 --> 00:00:08,008
some dummy text
some dummy text

I would like to convert them into the format below by removing the blank line between each timestamp and the number that precedes it.

1
00:00:01,000 --> 00:00:02,008
some dummy text

2
00:00:02,008 --> 00:00:05,006
some dummy text
some dummy text

3
00:00:05,006 --> 00:00:08,008
some dummy text
some dummy text

Since there are many files, I need a piece of code that can be applied to all files inside a directory and its sub-directories. Is it possible to overwrite the existing files?

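Here is how you can use os.walk() and re.sub(). os.walk() visits the folder and every sub-directory, and the pattern matches the newlines that sit between a digit and a timestamp, replacing them with a single newline. Each file is then reopened in write mode, so the original is overwritten: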

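import os
import re

# Newlines between a cue number and its timestamp (HH:MM:SS,mmm).
pattern = re.compile(r'(?<=\d)\n+(?=\d\d:\d\d:\d\d,\d\d\d)')

# Walk the folder and all of its sub-directories.
for root, dirs, files in os.walk('C:\\Users\\User\\Desktop\\Folder\\'):
    for file in files:
        # The question covers both .txt and .srt files.
        if file.endswith(('.txt', '.srt')):
            fpath = os.path.join(root, file)
            with open(fpath, 'r') as f:
                t = pattern.sub('\n', f.read())
            # Overwrite the original file with the cleaned text.
            with open(fpath, 'w') as f:
                f.write(t)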
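
Before running this over a whole folder, it may be worth checking the substitution on a small string first. The sketch below is only an illustration: the helper name remove_blank_after_index and the sample string are made up for the example, and the pattern is the same idea as above, written as a raw string with \n+ so that it only fires when at least one newline is present.

```python
import re

# Hypothetical helper for trying the substitution on a string in memory.
# Matches one or more newlines that follow a digit and precede HH:MM:SS,mmm.
INDEX_GAP = re.compile(r'(?<=\d)\n+(?=\d\d:\d\d:\d\d,\d\d\d)')


def remove_blank_after_index(text):
    """Collapse the newlines between a cue number and its timestamp into one."""
    return INDEX_GAP.sub('\n', text)


# Made-up sample in the same shape as the subtitles in the question.
sample = (
    "1\n\n00:00:01,000 --> 00:00:02,008\nsome dummy text\n\n"
    "2\n\n00:00:02,008 --> 00:00:05,006\nsome dummy text\nsome dummy text\n"
)

cleaned = remove_blank_after_index(sample)
print(cleaned)
```

The blank line that separates one subtitle block from the next is left alone, because it follows text rather than a digit immediately before a timestamp. Running the helper on already cleaned text should leave it unchanged, so applying the script to the same folder twice should be harmless.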
