
How do you write a regular expression in C# to skip a character, replace one character with another, and add a new character at a specific position?

I have a C# program that takes as input a subtitle text file with contents like this:

1
00: 00: 07.966 -> 00: 00: 11.166
How's the sea?
- This is great. 

2
00: 00: 12.967 -> 00: 00: 15.766
It's really pretty.

What I want to do is correct it so that it removes any spaces, replaces the "." character with the "," character, and adds another hyphen to the "->" string so that it becomes "-->". For the previous example, the correct output would be:

1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great. 

2
00:00:12,967 --> 00:00:15,766
It's really pretty.

So far, I've thought about iterating through each line and checking if it starts and ends with a digit, like so:

// Note: StartsWith/EndsWith compare literal text, so "[0-9]" is not treated as a pattern here
if (line.StartsWith("[0-9]") && line.EndsWith("[0-9]")) {
}
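StartsWith and EndsWith compare plain strings, so the check above looks for the literal text "[0-9]". A minimal sketch of the intended test (the line starts and ends with a digit) using Regex.IsMatch follows; LooksLikeTimingLine is a hypothetical helper name, not part of the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when the whole line starts and ends with a digit.
// ^ and $ anchor the pattern to the line, \d is a digit, .* is anything in between.
static bool LooksLikeTimingLine(string line) => Regex.IsMatch(line, @"^\d.*\d$");

string timing = "00: 00: 07.966 -> 00: 00: 11.166";
Console.WriteLine(timing.StartsWith("[0-9]"));   // literal comparison: False
Console.WriteLine(LooksLikeTimingLine(timing));  // True
```

A multi-digit cue number such as "12" also passes this test, so on its own it is only a rough filter for timing lines.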

I don't know how to write the regular expression to do this, though. Note that my input can have spaces anywhere in the subtitle timing line, not just after the ":" character, so the string can end up as bad as this:

"^ 0 0 : 0 0 : 0 7 . 9 6 6 -> 0 0 : 0 0 : 1 1 . 1 6 6 $"

Answer:

This may not be a single regex that does everything, but I think that is actually an advantage: the logic is easy to follow and modify.

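using System.IO;
using System.Text.RegularExpressions;

// input and output file paths are taken from the command line
var inputPath = args[0];
var outputPath = args[1];

using var input = new StreamReader(inputPath);
using var output = new StreamWriter(outputPath);

// matches a whole timestamp line: a "->" (possibly with spaces inside it) and no alpha characters
var timestampRegex = new Regex(@"^[^A-Za-z]*-\s*>[^A-Za-z]*$");

string? line;
while ((line = input.ReadLine()) != null)
{
    // if a timestamp line is found then it is modified
    if (timestampRegex.IsMatch(line))
    {
        line = Regex.Replace(line, @"\s", ""); // remove all whitespace
        line = line.Replace('.', ',');         // use a comma before the milliseconds
        line = line.Replace("->", " --> ");    // update arrow style
    }

    output.WriteLine(line);
}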
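To try this line-by-line logic without files, here is a sketch that runs the same steps over an in-memory string with StringReader and StringWriter. It anchors the timestamp pattern with ^ and $ and includes the "." to "," replacement the question asks for; FixSubtitleText is a hypothetical helper name:

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// A whole line with a "->" (spaces allowed inside it) and no letters.
var timestampRegex = new Regex(@"^[^A-Za-z]*-\s*>[^A-Za-z]*$");

// Hypothetical helper: normalizes every timing line in a block of subtitle text.
string FixSubtitleText(string text)
{
    using var input = new StringReader(text);
    using var output = new StringWriter();
    string? line;
    while ((line = input.ReadLine()) != null)
    {
        if (timestampRegex.IsMatch(line))
        {
            line = Regex.Replace(line, @"\s", ""); // remove all whitespace
            line = line.Replace('.', ',');         // comma before the milliseconds
            line = line.Replace("->", " --> ");    // longer arrow
        }
        output.WriteLine(line);
    }
    return output.ToString();
}

// The worst-case spacing from the question.
Console.Write(FixSubtitleText("1\n 0 0 : 0 0 : 0 7 . 9 6 6 -> 0 0 : 0 0 : 1 1 . 1 6 6 \nHow's the sea?\n"));
```

Lines containing letters, such as dialogue starting with "- ", never match the anchored pattern and are written back unchanged.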

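Another answer:

You can solve it with the following regular expression. It expects a comma (not a dot) before the milliseconds and exactly one space or tab after each "dd:" and on each side of "->", as in the sample input used in the C# code below, so on its own it does not convert "." to ",":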

(?m)(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?

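The replacement is $1$2$3$3$4. Group 3 (the hyphen) is inserted twice, which turns "->" into "-->", while the [ \t] characters matched after each "dd:" lie outside every group and are therefore dropped.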

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (?m)                     set flags for this block (with ^ and $
                           matching start and end of line) (case-
                           sensitive) (with . not matching \n)
                           (matching whitespace and # normally)
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    \G                       where the previous match ended
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \A                       the beginning of the string
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    ^                        the beginning of a "line"
--------------------------------------------------------------------------------
    (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
      \d                       digits (0-9)
--------------------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
      \d                       digits (0-9)
--------------------------------------------------------------------------------
      \r?                      '\r' (carriage return) (optional
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      $                        before an optional \n, and the end of
                               a "line"
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
    :                        ':'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  [ \t]                    any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    (                        group and capture to \2:
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      ,                        ','
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      [ \t]                    any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
    )                        end of \2
--------------------------------------------------------------------------------
    (                        group and capture to \3:
--------------------------------------------------------------------------------
      -                        '-'
--------------------------------------------------------------------------------
    )                        end of \3
--------------------------------------------------------------------------------
    (                        group and capture to \4:
--------------------------------------------------------------------------------
      >                        '>'
--------------------------------------------------------------------------------
      [ \t]                    any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
    )                        end of \4
--------------------------------------------------------------------------------
  )?                       end of grouping
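To see how \G(?!\A) chains the matches, the following sketch lists each match the pattern finds on one timing line together with its individual replacement (via Match.Result). The first match is anchored by ^ and each later one must start exactly where the previous one ended:

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern; RegexOptions.Multiline has the same effect as the (?m) flag.
var regex = new Regex(@"(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?",
                      RegexOptions.Multiline);

string line = "00: 00: 07,966 -> 00: 00: 11,166";

// Print every match, where it starts, and what $1$2$3$3$4 expands to for that match.
foreach (Match m in regex.Matches(line))
{
    string replaced = m.Result("$1$2$3$3$4");
    Console.WriteLine($"[{m.Value}] at {m.Index} => [{replaced}]");
}
```

On this line the pattern should find four matches: "00: ", "00: 07,966 -> " (the only one where the optional arrow group participates), "00: " and "00: ". The trailing "11,166" is never matched, so it is left as-is.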

C# code:

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // \r? before $ lets the lookahead also accept lines ending in "\r\n" (Windows line endings)
        string pattern = @"(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?";
        string substitution = @"$1$2$3$3$4";
        string input = @"1
00: 00: 07,966 -> 00: 00: 11,166
How's the sea?
- This is great. 

2
00: 00: 12,967 -> 00: 00: 15,766
It's really pretty.";
        RegexOptions options = RegexOptions.Multiline;
        
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.Write(result);
    }
}

Results:

1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great. 

2
00:00:12,967 --> 00:00:15,766
It's really pretty.
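Both answers rewrite anything that looks roughly like a timing line. As a hedged alternative that is not taken from either answer, one can remove all whitespace first and then require the exact hh:mm:ss,mmm shape (accepting "." or ",") before rebuilding the line, leaving every other line untouched. The helper name FixLine and the fixed field widths (two-digit hours, three-digit milliseconds) are assumptions of this sketch:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed strict shape of a timing line once all whitespace is removed:
// hh:mm:ss, "." or ",", three-digit milliseconds, an arrow of one or more hyphens, then the end time.
var timing = new Regex(@"^(\d{2}:\d{2}:\d{2})[.,](\d{3})-+>(\d{2}:\d{2}:\d{2})[.,](\d{3})$");

// Hypothetical helper: returns the normalized timing line, or the input unchanged if it is not one.
string FixLine(string line)
{
    string compact = Regex.Replace(line, @"\s+", "");
    Match m = timing.Match(compact);
    if (!m.Success)
        return line;
    return $"{m.Groups[1].Value},{m.Groups[2].Value} --> {m.Groups[3].Value},{m.Groups[4].Value}";
}

Console.WriteLine(FixLine(" 0 0 : 0 0 : 0 7 . 9 6 6 -> 0 0 : 0 0 : 1 1 . 1 6 6 ")); // 00:00:07,966 --> 00:00:11,166
Console.WriteLine(FixLine("How's the sea?"));                                        // How's the sea? (unchanged)
```

Because -+> accepts one or more hyphens and [.,] accepts either separator, running an already-corrected line through FixLine returns it unchanged.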
