
PowerShell multiline regex for subtitles: how?

This website, http://softwarerecipes.blogspot.com/2011/01/powershell-script-for-fixing-subtitles.html, describes a script for fixing the offset of subtitle files. It is used like this:

SrtFixer.ps1 -Path 'Movie.srt' -Offset '-00:00:26.700'

but the script itself isn't available.

So I found a script in C# (here: http://www.codeproject.com/Articles/32834/Subtitle-Synchronization-with-C).

My first step is to write the right regex for the following structure:

2
00:00:01,775 --> 00:00:04,528
- I want you to admit what you are.
- You still refuse to see

I tried this:

$file = "D:\subtitles\Hannibal - 02x10 - eng.srt"

$regex ='(?m)[0-9]{1,5}\n([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3}) --> ([0-9]{2}:[0-9]  {2}:[0-9]{2}),([0-9]{3})\n((.+\n)*)'

$text = Get-Content $file -Raw

$text | Select-String $regex -AllMatches |
    Foreach {$_.Matches} | Foreach {$_.Value}

The regex seems correct (checked with Regex Hero), but it returns no matches.

Can you help me?

EDIT:

My mistake. The corrected regex is:

$regex ='(?m)[0-9]{1,5}\n([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3})\s-->\s([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3})\n((.+\n)*)'

but it still doesn't work.

The regex isn't correct, at least when checked with The Regex Coach and your sample text. That's because ([0-9]{2}:[0-9] {2}:[0-9]{2}) requires some odd spacing within the timestamps. Removing those spaces provides a match for the sample input.
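If the regex still finds nothing once the spacing is fixed, one possible cause is Windows (CRLF) line endings in the .srt file. This is an assumption, since the file itself isn't shown. With CRLF, a bare \n right after the digits cannot match because a \r sits in between. In .NET, "." also matches \r, so (.+\n)* would run across the blank separator lines into the following cues. Below is a minimal sketch under that assumption. The second cue in the sample is made up for illustration, and [regex]::Matches is used in place of Select-String only to keep the example short.

```powershell
# Sample input joined with CRLF line endings (assumed; the real file may differ).
$lines = @(
    '2',
    '00:00:01,775 --> 00:00:04,528',
    '- I want you to admit what you are.',
    '- You still refuse to see',
    '',
    '3',
    '00:00:04,600 --> 00:00:06,000',
    '- (made-up second cue for illustration)',
    ''
)
$text = $lines -join "`r`n"

# \r?\n accepts both LF and CRLF line breaks.
# [^\r\n]+ keeps the text group from swallowing the blank line between cues.
$regex = '(?m)^([0-9]{1,5})\r?\n' +
         '([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3})\s-->\s' +
         '([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3})\r?\n' +
         '((?:[^\r\n]+\r?\n)*)'

$found = [regex]::Matches($text, $regex)

# Print the cue number and both timestamps for every match.
foreach ($m in $found) {
    '{0}: {1},{2} --> {3},{4}' -f $m.Groups[1].Value, $m.Groups[2].Value,
        $m.Groups[3].Value, $m.Groups[4].Value, $m.Groups[5].Value
}
```

To try it on a real file, replace the sample with $text = Get-Content $file -Raw. Group 6 holds the subtitle text of each cue, including its line breaks.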

That being said, how about getting a paid version of the content? Those usually have subtitles in good order.

