convert C# code to powershell: scriptblock-delegate

Question

This is the C# code I want to convert to the "PowerShell way":

private static Regex unit = new Regex(
        @"(?<sequence>\d+)\r\n(?<start>\d{2}\:\d{2}\:\d{2},\d{3}) --\> (?<end>\d{2}\:\d{2}\:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)", 
        RegexOptions.Compiled | RegexOptions.ECMAScript);


    output.Write(
        unit.Replace(input.ReadToEnd(), delegate(Match m)
        {
            return m.Value.Replace(
                String.Format("{0}\r\n{1} --> {2}\r\n",
                    m.Groups["sequence"].Value,
                    m.Groups["start"   ].Value,
                    m.Groups["end"     ].Value),
                String.Format(
                    "{0}\r\n{1:HH\\:mm\\:ss\\,fff} --> " + 
                    "{2:HH\\:mm\\:ss\\,fff}\r\n",
                    sequence++,
                    DateTime.Parse(m.Groups["start"].Value.Replace(",","."))
                            .AddSeconds(offset),
                    DateTime.Parse(m.Groups["end"  ].Value.Replace(",","."))
                            .AddSeconds(offset)));
        }));

And my attempt:

$text=@'
2
00:00:03,601 --> 00:00:06,603
<i>Vous devrez trouver quelqu'un
qui pense différemment pour l'attraper.</i>
'@

$regex ='(?m)(?<sequence>\d+)\r\n(?<start>\d{2}\:\d{2}\:\d{2},\d{3}) --\> (?<end>\d{2}\:\d{2}\:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)'

$r = New-Object System.Text.RegularExpressions.Regex $regex

$MatchEvaluator = 
{  
    param($m) 

    $m.value.replace([string]::Format("{0}\r\n{1} --> {2}\r\n",
        $m.Groups["sequence"].Value,
        $m.Groups["start"   ].Value,
        $m.Groups["end"     ].Value),
    [string]::Format("{0}\r\n{1:HH\\:mm\\:ss\\,fff} --> {2:HH\\:mm\\:ss\\,fff}\r\n",
        [datetime]::Parse($m.Groups["start"].Value.Replace(",",".")).AddSeconds(1),
        [datetime]::Parse($m.Groups["end"  ].Value.Replace(",",".")).AddSeconds(1)))
}
$result = $r.Replace($text, $MatchEvaluator)

But it doesn't work. Thank you for your help.

I know I have to use a scriptblock delegate to achieve this.

Answer 1

You've got several problems going on here. Here is a version that works:

$text=@'
2
00:00:03,601 --> 00:00:06,603
<i>Vous devrez trouver quelqu'un
qui pense différemment pour l'attraper.</i>
'@

$regex = [regex]'(?m)(?<sequence>\d+)\s*$\s*(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\s*$\s*(?<text>.*$\s*.*$)'

$MatchEvaluator = {  
    param($m) 

    $oldValue = "{0}`r`n{1} --> {2}`r`n" -f $m.Groups["sequence"].Value,
                    $m.Groups["start"].Value, $m.Groups["end"].Value
    $seq   = 5 + $m.Groups["sequence"].Value
    $start = ([DateTime]$m.Groups["start"].Value.Replace(",",".")).AddSeconds(1)
    $end   = ([DateTime]$m.Groups["end"].Value.Replace(",",".")).AddSeconds(1)
    $newValue = "{0}`r`n{1:HH:mm:ss,fff} --> {2:HH:mm:ss,fff}`r`n" -f $seq,$start,$end
    $m.value.replace($oldValue, $newValue)
}

$result = $regex.Replace($text, $MatchEvaluator)
$result

First up, in PowerShell double-quoted strings you use `r`n for CRLF; backslash sequences such as \r\n are not interpreted. Second, your replacement format string has three placeholders ({0}, {1}, {2}) but you only passed two arguments; $seq above supplies the missing one. Third, you don't need to escape the : in the regexes. Fourth, the -f operator is a wrapper on top of [String]::Format() and is more convenient to use.
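The first, second and fourth points above can be checked in isolation. This is only a small sketch; the variable names and sample timestamps are made up for illustration and are not part of the original script.

```powershell
# Backslash sequences are plain text in PowerShell strings; the backtick is the escape character.
$literal = "a\r\nb"     # 6 characters: a \ r \ n b
$crlf    = "a`r`nb"     # 4 characters: a, CR, LF, b

# -f and [string]::Format() build the same string.
$viaOperator = "{0} --> {1}" -f '00:00:03,601', '00:00:06,603'
$viaMethod   = [string]::Format("{0} --> {1}", '00:00:03,601', '00:00:06,603')

# A format string with more placeholders than arguments raises an error.
$threw = $false
try {
    $null = "{0} {1} {2}" -f 'a', 'b'
} catch {
    $threw = $true
}

"$($literal.Length) $($crlf.Length) $($viaOperator -eq $viaMethod) $threw"   # 6 4 True True
```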

This outputs:

7
00:00:04,601 --> 00:00:07,603
<i>Vous devrez trouver quelqu'un
qui pense différemment pour l'attraper.</i>

I didn't know how you wanted to modify the sequence number so I just added 5 to it.
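As a side note, here is a minimal sketch of the two mechanisms the script relies on: a script block used as a MatchEvaluator delegate, and the way 5 + "2" yields 7. The doubling evaluator and the sample input are hypothetical, chosen only to keep the example short.

```powershell
# A script block can be cast to a MatchEvaluator; the match arrives as the first parameter.
$double = [System.Text.RegularExpressions.MatchEvaluator]{
    param($m)
    [string]([int]$m.Value * 2)
}
$doubled = [regex]::Replace('a1 b22', '\d+', $double)   # a2 b44

# The left operand decides the conversion: number + string adds, string + number concatenates.
$asNumber = 5 + '2'     # 7
$asString = '2' + 5     # 25 (a string)
```

So if the sequence value were on the left-hand side of the +, the result would be a concatenated string rather than a sum.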

Answer 2

Thank you very much, Keith Hill. Here is my working code:

$file = "D:\subtitles\Hannibal - 02x10 - eng.srt"
$text = Get-Content $file -Raw     # -Raw requires PowerShell V3 or later

Write-Output "offset, in seconds (+1,1, -2,75):"

# Keep prompting until the input parses as a number (current culture rules).
[Double]$offset = 0
while (![Double]::TryParse((Read-Host), [ref]$offset)) {
    Write-Output "Not a number. Try again."
}

$regex = [regex]'(?m)(?<sequence>\d+)\s*$\s*(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\s*$\s*(?<text>.*$\s*.*$)'

$MatchEvaluator = {
    param($m)

    # Rebuild the header of the matched entry, then swap in the shifted times.
    $oldValue = "{0}`r`n{1} --> {2}`r`n" -f $m.Groups["sequence"].Value,
                    $m.Groups["start"].Value, $m.Groups["end"].Value
    $seq   = [int]$m.Groups["sequence"].Value
    $start = ([DateTime]$m.Groups["start"].Value.Replace(",",".")).AddSeconds($offset)
    $end   = ([DateTime]$m.Groups["end"].Value.Replace(",",".")).AddSeconds($offset)
    $newValue = "{0}`r`n{1:HH:mm:ss,fff} --> {2:HH:mm:ss,fff}`r`n" -f $seq, $start, $end
    $m.Value.Replace($oldValue, $newValue)
}

# Out-File emits nothing, so there is no result to assign to a variable.
$regex.Replace($text, $MatchEvaluator) |
    Out-File -Encoding utf8 "D:\subtitles\Hannibal - 02x10 - eng_offset_$offset.srt"

The next step for me is to merge English and French subtitles (for example, 70% in French and 30% in English). Any advice is welcome.

Answer 3

Some details for instructional purposes:

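The comment by mjolinor is correct: the regex fails because PowerShell represents the end of a line in a here-string as a single \n. (That holds when the here-string is typed at the console; in a script file a here-string keeps the file's own line endings.)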

Also, as given above, there are no \n's at the end of the string, since the end of a here-string is marked by \n'@ (i.e. '@ at the beginning of a line), so the last \n is part of the end marker, not of the string.

Unfortunately, just removing the extraneous \r and \n character escapes won't work. Without a concrete match to define where <text> finishes, [\s\S]*? will match the empty string (the smallest match that lets the whole pattern succeed). To capture all of <text>, either use the greedy [\s\S]* or force the match to run to the end of the string with [\s\S]*?$.
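A small sketch of the three variants, assuming a single entry with \n line endings and no trailing newline (the sample text and variable names are made up):

```powershell
$entry = "2`n00:00:03,601 --> 00:00:06,603`nline one`nline two"
$head  = '(?<sequence>\d+)\n(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\n'

$lazy     = [regex]::Match($entry, $head + '(?<text>[\s\S]*?)')    # nothing forces the lazy group to grow
$greedy   = [regex]::Match($entry, $head + '(?<text>[\s\S]*)')     # takes everything that is left
$anchored = [regex]::Match($entry, $head + '(?<text>[\s\S]*?$)')   # lazy, but must reach the end of the string

$lazy.Groups['text'].Value.Length    # 0
$greedy.Groups['text'].Value         # both text lines
$anchored.Groups['text'].Value       # both text lines
```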

Further, : and > are not metacharacters (unlike . or *) and so do not need escaping, though escaping them does no harm. Keith Hill pointed this out for : and also dropped the escape on > without mentioning it. Also, the MultiLine option flag (?m) serves no purpose here: the original pattern contains none of the anchors it affects (^ and $), and the C# version did not set it anyway. Even if the <text> capturing group uses [\s\S]*?$, without (?m) this $ matches the end of the string, not an intermediate \n (though it would leave a terminating \n unmatched if one existed). Thus the regex, repaired as opposed to replaced, should have been:

 (?<sequence>\d+)\n(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\n(?<text>[\s\S]*)

Note: this explanation is derived from the original question. The subsequently posted "working code" shows that the $text value is read (raw) from a file and so probably does contain \r\n as the end-of-line marker.

Keith Hill's use of \s*$\s* to match the end of a line is arguably more robust, since it matches both \n and \r\n (and any following, or as yet unmatched preceding, white space). However, if the structure of the file is known and fixed, using unbounded quantifiers to match fixed parts can lead to subtle errors. In this case, using \s*$\s* to match the \r\n between the <end> and <text> capturing groups causes any white space at the beginning of <text> to be discarded. If the end-of-line marker can only be \n or \r\n, then \r?\n is safer.
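The discarded white space can be seen with a hypothetical entry whose text line starts with two spaces. To keep the sketch short, the <text> group here captures only the first text line ([^\r\n]*), which is a simplification and not the pattern from either answer.

```powershell
$entry = "1`r`n00:00:01,000 --> 00:00:02,000`r`n  - indented dialogue`r`n`r`n"
$times = '(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})'

# Loose separator: the trailing \s* also swallows the two leading spaces of the text.
$loose  = [regex]::Match($entry, '(?m)(?<sequence>\d+)\s*$\s*' + $times + '\s*$\s*(?<text>[^\r\n]*)')

# Strict separator: only the line break itself is consumed.
$strict = [regex]::Match($entry, '(?<sequence>\d+)\r?\n' + $times + '\r?\n(?<text>[^\r\n]*)')

"[{0}]" -f $loose.Groups['text'].Value     # [- indented dialogue]
"[{0}]" -f $strict.Groups['text'].Value    # [  - indented dialogue]
```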

Also, the use of .*$\s*.*$ means that <text> matches a (possibly empty) line, followed by any number of lines (including none) containing only white space, followed by another (possibly empty) line. This works in the original question, where the regular expression parses a single example entry with two <text> lines, but the real file most likely contains many entries. The original (and presumably working) C# version ends with \r\n\r\n, which suggests that <text> can have any number of lines and that entries are delimited by a blank line. This would also explain the use of the "lazy" pattern [\s\S]*?\r\n\r\n, which captures <text> up to (and including) the next blank line, instead of the greedy [\s\S]*\r\n\r\n, which would capture everything up to the LAST blank line.
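The lazy/greedy difference shows up as soon as there is more than one entry. A sketch, assuming \r\n line endings and a made-up two-entry sample:

```powershell
$srt = "1`r`n00:00:01,000 --> 00:00:02,000`r`nHello`r`n`r`n" +
       "2`r`n00:00:03,601 --> 00:00:06,603`r`n<i>Line one`r`nline two</i>`r`n`r`n"

$head   = '(?<sequence>\d+)\r\n(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\r\n'
$lazy   = [regex]::Matches($srt, $head + '(?<text>[\s\S]*?\r\n\r\n)')
$greedy = [regex]::Matches($srt, $head + '(?<text>[\s\S]*\r\n\r\n)')

$lazy.Count      # 2: each entry stops at its own blank line
$greedy.Count    # 1: the first entry's text runs on to the last blank line
```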

Thus, the "working code" pattern should probably have been:

  (?<sequence>\d+)\r\n(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)

I.e., just the C# version without the escapes on : and >. So the basic error made by cool25 was to store the test string in a PowerShell here-string, thereby changing it so that it no longer represented the actual data to be parsed. The lesson here (apart from the actual programming) is that when creating test data for a routine, you should make sure the source of the test data is as similar as possible to the source of the actual data. In this case, since the routine was intended to process a file of many entries, the best test data would have been a file of one entry.

convert C# code to powershell: scriptblock-delegate

Question

3 answers

solution1
1 2014-05-14 22:13:37

solution2
0 2014-05-15 12:19:37

solution3
0 2020-06-24 18:30:42
