Convert C# code to PowerShell: scriptblock delegate

This is the code I want to convert to the "PowerShell way":

// 'sequence', 'offset', 'input' and 'output' are defined outside this snippet
private static Regex unit = new Regex(
    @"(?<sequence>\d+)\r\n(?<start>\d{2}\:\d{2}\:\d{2},\d{3}) --\> (?<end>\d{2}\:\d{2}\:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)",
    RegexOptions.Compiled | RegexOptions.ECMAScript);


output.Write(
    unit.Replace(input.ReadToEnd(), delegate(Match m)
    {
        return m.Value.Replace(
            String.Format("{0}\r\n{1} --> {2}\r\n",
                m.Groups["sequence"].Value,
                m.Groups["start"   ].Value,
                m.Groups["end"     ].Value),
            String.Format(
                "{0}\r\n{1:HH\\:mm\\:ss\\,fff} --> " +
                "{2:HH\\:mm\\:ss\\,fff}\r\n",
                sequence++,
                DateTime.Parse(m.Groups["start"].Value.Replace(",","."))
                        .AddSeconds(offset),
                DateTime.Parse(m.Groups["end"  ].Value.Replace(",","."))
                        .AddSeconds(offset)));
    }));

And my attempt:

$text=@'
2
00:00:03,601 --> 00:00:06,603
<i>Vous devrez trouver quelqu'un
qui pense différemment pour l'attraper.</i>
'@

$regex ='(?m)(?<sequence>\d+)\r\n(?<start>\d{2}\:\d{2}\:\d{2},\d{3}) --\> (?<end>\d{2}\:\d{2}\:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)'

$r = New-Object System.Text.RegularExpressions.Regex $regex

$MatchEvaluator = 
{  
    param($m) 

    $m.value.replace([string]::Format("{0}\r\n{1} --> {2}\r\n",
        $m.Groups["sequence"].Value,
        $m.Groups["start"   ].Value,
        $m.Groups["end"     ].Value),
    [string]::Format("{0}\r\n{1:HH\\:mm\\:ss\\,fff} --> {2:HH\\:mm\\:ss\\,fff}\r\n",
        [datetime]::Parse($m.Groups["start"].Value.Replace(",",".")).AddSeconds(1),
        [datetime]::Parse($m.Groups["end"  ].Value.Replace(",",".")).AddSeconds(1)))
}
$result = $r.Replace($text, $MatchEvaluator)

But it doesn't work. Thank you for your help.

I know I have to use a scriptblock delegate to achieve this.

You've got several problems going on here. Here is a version that works:

$text=@'
2
00:00:03,601 --> 00:00:06,603
<i>Vous devrez trouver quelqu'un
qui pense différemment pour l'attraper.</i>
'@

$regex = [regex]'(?m)(?<sequence>\d+)\s*$\s*(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\s*$\s*(?<text>.*$\s*.*$)'

$MatchEvaluator = {  
    param($m) 

    $oldValue = "{0}`r`n{1} --> {2}`r`n" -f $m.Groups["sequence"].Value,
                    $m.Groups["start"].Value, $m.Groups["end"].Value
    $seq   = 5 + $m.Groups["sequence"].Value
    $start = ([DateTime]$m.Groups["start"].Value.Replace(",",".")).AddSeconds(1)
    $end   = ([DateTime]$m.Groups["end"].Value.Replace(",",".")).AddSeconds(1)
    $newValue = "{0}`r`n{1:HH:mm:ss,fff} --> {2:HH:mm:ss,fff}`r`n" -f $seq,$start,$end
    $m.value.replace($oldValue, $newValue)
}

$result = $regex.Replace($text, $MatchEvaluator)
$result

First up, in PowerShell double-quoted strings you use `r`n for CRLF. Second, you were missing an argument for the replacement string ($seq above). Third, you don't need to escape the : in the regexes. Fourth, the -f operator is a wrapper on top of [String]::Format() and is more convenient to use.
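A small sketch of points one, three and four, relying only on built-in PowerShell and .NET behavior (the variable names are just for illustration):

```powershell
# Backtick escapes are expanded only in double-quoted strings.
$crlf    = "`r`n"   # two characters: carriage return + line feed
$dq      = "\r\n"   # four literal characters: PowerShell does not treat \r or \n as escapes
$literal = '\r\n'   # also four literal characters (only meaningful as a regex escape)

"CRLF length: {0}; double-quoted \r\n length: {1}; single-quoted length: {2}" -f $crlf.Length, $dq.Length, $literal.Length

# -f wraps [string]::Format(); both build the same string.
$viaOperator = "{0}-{1}" -f 'a', 42
$viaMethod   = [string]::Format("{0}-{1}", 'a', 42)
$viaOperator -eq $viaMethod

# ':' is not a regex metacharacter, so it matches literally without escaping.
'00:00:03,601' -match '^\d{2}:\d{2}:\d{2},\d{3}$'
```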

This outputs:

7
00:00:04,601 --> 00:00:07,603
<i>Vous devrez trouver quelqu'un
qui pense différemment pour l'attraper.</i>

I didn't know how you wanted to modify the sequence number, so I just added 5 to it.

Thank you very much, Keith Hill. Here is my working code:

$file = "D:\subtitles\Hannibal - 02x10 - eng.srt"
$text = Get-Content -Path $file -Raw     # -Raw requires PowerShell v3+

Write-Output "offset, in seconds (+1,1, -2,75):"

[Double]$offset = 0

# TryParse uses the current culture, so "1,1" is read as 1.1 only where ',' is the decimal separator (e.g. French)
while (![Double]::TryParse((Read-Host), [ref] $offset))
{
    Write-Output "Not a Number. Do again"
}

$regex = [regex]'(?m)(?<sequence>\d+)\s*$\s*(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\s*$\s*(?<text>.*$\s*.*$)'

$MatchEvaluator = {
    param($m)

    $oldValue = "{0}`r`n{1} --> {2}`r`n" -f $m.Groups["sequence"].Value,
                    $m.Groups["start"].Value, $m.Groups["end"].Value
    $seq   = +$m.Groups["sequence"].Value   # unary + converts the string to a number; the sequence is unchanged
    $start = ([DateTime]$m.Groups["start"].Value.Replace(",",".")).AddSeconds($offset)
    $end   = ([DateTime]$m.Groups["end"].Value.Replace(",",".")).AddSeconds($offset)
    $newValue = "{0}`r`n{1:HH:mm:ss,fff} --> {2:HH:mm:ss,fff}`r`n" -f $seq, $start, $end
    $m.value.replace($oldValue, $newValue)
}

$regex.Replace($text, $MatchEvaluator) | Out-File -Encoding utf8 "D:\subtitles\Hannibal - 02x10 - eng_offset_$offset.srt"

The next step for me is to merge the English and French subtitles (for example: 70% in French and 30% in English). Any advice is welcome.

Some details for instructional purposes:

The comment by mjolinor is correct in that the regex is indeed incorrect, BECAUSE PowerShell represents the end of line in a here-string by a single \n.

Also, as given above, there are no \n's at the end of the string, since the end of a here-string is MARKED by \n'@ (i.e. '@ at the beginning of a line), so the last \n is part of the end marker, not the string.
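A minimal check of the here-string boundary described above (the sample text is illustrative). The split uses `r?`n so it works whichever line ending the here-string ends up containing:

```powershell
# The line break before the closing '@ belongs to the terminator, not to the string.
$s = @'
2
00:00:03,601 --> 00:00:06,603
'@

$s.EndsWith("`n")              # False: the string has no trailing line break
($s -split "`r?`n").Count      # 2 lines
```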

Unfortunately, just removing the extraneous \r and \n character escapes won't work. Without a concrete match to define where <text> finishes, [\s\S]*? will match as empty (the smallest match that lets the whole pattern succeed). To match all of <text>, either use [\s\S]* (the greedy version) or force the match to go to the end of the string by using [\s\S]*?$.
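A minimal demonstration of the lazy/greedy behavior described above, using an illustrative two-line string in place of a real <text> block:

```powershell
# Stand-in for the <text> part of an entry, with no trailing line break.
$t = "line one`nline two"

# Lazy with nothing after it: the empty match already satisfies the pattern.
$lazy      = [regex]::Match($t, '(?<text>[\s\S]*?)').Groups['text'].Value
# Greedy: takes everything.
$greedy    = [regex]::Match($t, '(?<text>[\s\S]*)').Groups['text'].Value
# Lazy but forced to reach $ (no (?m), so $ is the end of the string here).
$lazyToEnd = [regex]::Match($t, '(?<text>[\s\S]*?$)').Groups['text'].Value

"lazy: '{0}' | greedy: '{1}' | lazy to end: '{2}'" -f $lazy, $greedy, $lazyToEnd
```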

Further, : and > are not metacharacters (like . or *) and so do not need escaping (though escaping them doesn't hurt). Keith Hill fixed both, although he only mentioned the :. Also, specifying the MultiLine option flag (?m) serves no purpose, since the original pattern has no affected anchors (^ and $) and the C# version didn't set it anyway. Even if the <text> capturing group uses [\s\S]*?$, this $ matches the end of the string, not an intermediate \n (though it would leave a terminating \n unmatched if one existed). Thus the (repaired, as opposed to replaced) regex should have been:

 (?<sequence>\d+)\n(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\n(?<text>[\s\S]*)
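A quick check of the repaired pattern against an LF-only entry. The sample is built with explicit `n escapes (and placeholder subtitle lines) so the result does not depend on how the script file itself stores line endings:

```powershell
# Entry built with explicit LF line breaks; the two text lines are placeholders.
$text = "2`n00:00:03,601 --> 00:00:06,603`nfirst line`nsecond line"

$repaired = [regex]'(?<sequence>\d+)\n(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\n(?<text>[\s\S]*)'
$m = $repaired.Match($text)

$m.Success
$m.Groups['start'].Value   # 00:00:03,601
$m.Groups['end'].Value     # 00:00:06,603
$m.Groups['text'].Value    # both text lines
```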

Note: this explanation is derived from the original question. The subsequently posted "working code" shows that the $text value is read (raw) from a file, and so it probably does contain \r\n as the end-of-line marker.
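A sketch showing why the file-based version likely sees \r\n: Get-Content -Raw returns the file as one string with its line endings intact, while plain Get-Content splits it into lines. The temporary file and its contents are hypothetical:

```powershell
# Hypothetical temp file written with explicit CRLF line endings, like a typical .srt on Windows.
$path = Join-Path ([IO.Path]::GetTempPath()) 'sample_crlf.srt'
[IO.File]::WriteAllText($path, "2`r`n00:00:03,601 --> 00:00:06,603`r`nfirst line`r`n`r`n")

$raw   = Get-Content -Path $path -Raw   # one string, CRLF preserved (PowerShell v3+)
$lines = Get-Content -Path $path        # array of lines, line endings stripped

$raw.Contains("`r`n")                   # True
$lines[0]                               # 2

Remove-Item -Path $path
```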

While I would say that Keith Hill's answer, using \s*$\s* to match the end of line, is more robust since it matches both \n and \r\n (and any following, or as yet unmatched preceding, white space), if the structure of the file is known and fixed, then using unbounded quantifiers to match fixed parts can lead to subtle errors. In this case, using \s*$\s* to match the \r\n between the <end> and <text> capturing groups will cause any white space at the beginning of <text> to be discarded. If the end-of-line marker can only be \n or \r\n, then \r?\n is safer.
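An illustration of that trade-off, using a hypothetical entry whose text starts with two spaces: the \r?\n version keeps them for both LF and CRLF input, while Keith Hill's \s*$\s* version drops them:

```powershell
# Hypothetical entry whose only text line starts with two spaces.
$lf   = "1`n00:00:01,000 --> 00:00:02,000`n  indented text"
$crlf = $lf -replace "`n", "`r`n"

# Fixed line breaks, tolerant of LF or CRLF.
$tolerant = [regex]'(?<sequence>\d+)\r?\n(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\r?\n(?<text>[\s\S]*)'
# Keith Hill's pattern with unbounded \s*$\s* around the line breaks.
$loose    = [regex]'(?m)(?<sequence>\d+)\s*$\s*(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\s*$\s*(?<text>.*$\s*.*$)'

$tolerant.Match($lf).Groups['text'].Value     # '  indented text'
$tolerant.Match($crlf).Groups['text'].Value   # '  indented text'
$loose.Match($lf).Groups['text'].Value        # 'indented text' (leading spaces eaten by \s*)
```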

Also, the use of .*$\s*.*$ means that <text> matches a (possibly empty) line, followed by any number of lines (including 0) containing only zero or more white-space characters, followed by a (possibly empty) line. While this works in the original question, where the regular expression parses a single example entry with 2 <text> lines, the file is likely to contain many entries. Referring to the original (and presumably working) C# version, which ends with \r\n\r\n, it appears that <text> can have any number of lines and that entries are delimited by a blank line. This would also explain the use of the "lazy" pattern [\s\S]*?\r\n\r\n to capture <text> up to (and including) the next blank line, instead of capturing everything up to the LAST blank line (greedy [\s\S]*\r\n\r\n).
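A sketch of the lazy versus greedy difference on a hypothetical two-entry file (CRLF line endings, blank line between entries, the first entry with three text lines):

```powershell
# Hypothetical two-entry SRT content.
$srt = "1`r`n00:00:01,000 --> 00:00:02,000`r`nline A1`r`nline A2`r`nline A3`r`n`r`n" +
       "2`r`n00:00:03,000 --> 00:00:04,000`r`nline B1`r`n`r`n"

$lazy   = [regex]'(?<sequence>\d+)\r\n(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)'
$greedy = [regex]'(?<sequence>\d+)\r\n(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\r\n(?<text>[\s\S]*\r\n\r\n)'

$lazy.Matches($srt).Count     # 2: one match per entry
$greedy.Matches($srt).Count   # 1: the first match runs on to the last blank line
```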

Thus, the "working code" pattern should probably have been:

  (?<sequence>\d+)\r\n(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)

I.e. just the C# version with no escapes for : or >. So the basic error made by cool25 was storing the test string in a PowerShell here-string, which changed it so that it no longer represented the actual data to be parsed. The lesson here (apart from the actual programming) is that when creating test data for a routine, you should make the source of the test data as similar as possible to the source of the actual data. In this case, since the routine was intended to process a file of many entries, the best test data would have been a file containing one entry.
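Putting it together, a possible sketch of the offset routine built on the C#-derived pattern. The function name Add-SrtOffset is hypothetical, and it rebuilds each entry with ParseExact/ToString instead of the Replace(",",".") and m.Value.Replace approach used in the posts above. It assumes CRLF line endings and entries terminated by a blank line:

```powershell
# Hypothetical helper: shifts every timestamp in SRT text by $Seconds.
function Add-SrtOffset {
    param(
        [Parameter(Mandatory)][string]$Text,
        [Parameter(Mandatory)][double]$Seconds
    )

    # The C# pattern without the unnecessary escapes for ':' and '>'.
    $unit = [regex]'(?<sequence>\d+)\r\n(?<start>\d{2}:\d{2}:\d{2},\d{3}) --> (?<end>\d{2}:\d{2}:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)'

    $unit.Replace($Text, [Text.RegularExpressions.MatchEvaluator]{
        param($m)
        $culture = [Globalization.CultureInfo]::InvariantCulture
        $format  = 'HH:mm:ss,fff'
        # Parse with an exact format instead of swapping ',' for '.'.
        $start = [datetime]::ParseExact($m.Groups['start'].Value, $format, $culture).AddSeconds($Seconds)
        $end   = [datetime]::ParseExact($m.Groups['end'].Value, $format, $culture).AddSeconds($Seconds)
        # Rebuild the entry: only the timing line changes; the sequence number and text are kept.
        "{0}`r`n{1} --> {2}`r`n{3}" -f $m.Groups['sequence'].Value,
            $start.ToString($format, $culture),
            $end.ToString($format, $culture),
            $m.Groups['text'].Value
    })
}

$srt = "1`r`n00:00:01,000 --> 00:00:02,500`r`nHello`r`n`r`n"
Add-SrtOffset -Text $srt -Seconds 1.5
```

With a real file this would be called as something like Add-SrtOffset -Text (Get-Content -Path $file -Raw) -Seconds $offset. Like the C# original, an entry is only matched when a blank line follows it, so a final entry without one is left untouched, and a negative offset that pushes a timestamp below 00:00:00,000 wraps around to 23:59:59 instead of clamping.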
