
How do you state a regular expression in C# to skip a character, replace one with another and add a new character at a specific position?

I have a C# program that takes a subtitle text file as input, with contents like this:

1
00: 00: 07.966 -> 00: 00: 11.166
How's the sea?
- This is great. 

2
00: 00: 12.967 -> 00: 00: 15.766
It's really pretty.

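What I want to do is basically correct it so that it skips any whitespace, replaces the . character with the , character, and adds another hyphen to the -> string so that it becomes -->. For the previous example, the correct output would be: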

1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great. 

2
00:00:12,967 --> 00:00:15,766
It's really pretty.

So far, I have thought about iterating over each line and checking whether it starts and ends with a digit, like this:

if (line.StartsWith("[0-9]") && line.EndsWith("[0-9]")) {
}

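However, I don't know how to state a regular expression that does this. Note that my input can have whitespace anywhere in the subtitle timeline, not just after the : character, so the string could end up as bad as: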

"^ 0 0 : 0 0 : 0 7 . 9 6 6 -> 0 0 : 0 0 : 1 1 . 1 6 6 $"

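This may not be a single regular expression that does everything, but I think that is actually an advantage: the logic is easy to follow and modify.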

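using System.IO;
using System.Text.RegularExpressions;

// inputPath and outputPath are assumed to be defined elsewhere
using var input = new StreamReader(inputPath);
using var output = new StreamWriter(outputPath);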

// matches a whole timestamp line: it contains "->" (optionally with whitespace
// between '-' and '>') and no alphabetic characters
var timestampRegex = new Regex(@"^[^A-Za-z]*-\s*>[^A-Za-z]*$");

string line;
while((line = input.ReadLine()) != null)
{
    // if a timestamp line is found then it is modified
    if (timestampRegex.IsMatch(line))
    {
        line = Regex.Replace(line, @"\s", ""); // remove all whitespace
        line = line.Replace('.', ',');         // replace the decimal point with a comma
        line = line.Replace("->", " --> ");    // update arrow style
    }

    output.WriteLine(line);
}
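
To try the same steps without touching the file system, the per-line logic can be pulled into a small helper. This is only a sketch: the names `SubtitleFixer` and `FixTimestampLine` are made up for illustration, the pattern is anchored with `^` and `$` so that only lines free of letters count as timestamp lines, and the `.` to `,` replacement is included because the question asks for it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubtitleFixer
{
    // Assumption: a timestamp line contains an arrow ("->", possibly with
    // whitespace between '-' and '>') and no letters anywhere on the line.
    private static readonly Regex TimestampLine =
        new Regex(@"^[^A-Za-z]*-\s*>[^A-Za-z]*$");

    // Hypothetical helper: returns the corrected line, or the line unchanged
    // when it does not look like a timestamp line.
    public static string FixTimestampLine(string line)
    {
        if (!TimestampLine.IsMatch(line))
        {
            return line;
        }

        line = Regex.Replace(line, @"\s", "");  // remove all whitespace
        line = line.Replace('.', ',');          // decimal point becomes a comma
        return line.Replace("->", " --> ");     // update arrow style
    }
}
```

Passing the worst-case line from the question (without the surrounding quotes and the `^`/`$` markers) to `SubtitleFixer.FixTimestampLine` should give `00:00:07,966 --> 00:00:11,166`, while a dialogue line such as `- This is great. ` comes back unchanged because it contains letters.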

You can solve it with a regular expression:

(?m)(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?

The replacement will be $1$2$3$3$4

See the regex proof.

Explanation

--------------------------------------------------------------------------------
  (?m)                     set flags for this block (with ^ and $
                           matching start and end of line) (case-
                           sensitive) (with . not matching \n)
                           (matching whitespace and # normally)
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    \G                       where the last m//g left off
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \A                       the beginning of the string
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    ^                        the beginning of a "line"
--------------------------------------------------------------------------------
    (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
      \d                       digits (0-9)
--------------------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
      \d                       digits (0-9)
--------------------------------------------------------------------------------
      \r?                      '\r' (carriage return) (optional
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      $                        before an optional \n, and the end of
                               a "line"
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
    :                        ':'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  [ \t]                    any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    (                        group and capture to \2:
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      ,                        ','
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      [ \t]                    any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
    )                        end of \2
--------------------------------------------------------------------------------
    (                        group and capture to \3:
--------------------------------------------------------------------------------
      -                        '-'
--------------------------------------------------------------------------------
    )                        end of \3
--------------------------------------------------------------------------------
    (                        group and capture to \4:
--------------------------------------------------------------------------------
      >                        '>'
--------------------------------------------------------------------------------
      [ \t]                    any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
    )                        end of \4
--------------------------------------------------------------------------------
  )?                       end of grouping

C# code

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?";
        string substitution = @"$1$2$3$3$4";
        string input = @"1
00: 00: 07,966 -> 00: 00: 11,166
How's the sea?
- This is great. 

2
00: 00: 12,967 -> 00: 00: 15,766
It's really pretty.";
        RegexOptions options = RegexOptions.Multiline;
        
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.Write(result);
    }
}

Result

1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great. 

2
00:00:12,967 --> 00:00:15,766
It's really pretty.
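
Note that the sample input in this answer already uses commas and has exactly one blank after each colon. For input like the question's, with dots and blanks in arbitrary places, one possible variation is to match the whole timeline and rebuild it in a `MatchEvaluator`. The following is a sketch under stated assumptions, not part of the answer above; `TimelineNormalizer` and `Normalize` are hypothetical names, and the rule for what counts as a timeline is an assumption spelled out in the comment.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimelineNormalizer
{
    // Assumption: a timeline is a line that starts and ends with a digit
    // (ignoring surrounding blanks), contains "->" and otherwise consists
    // only of digits, blanks, ':', '.' and ','.
    private static readonly Regex Timeline = new Regex(
        @"^[ \t]*\d[\d \t:.,]*-[ \t]*>[\d \t:.,]*\d[ \t]*(?=\r?$)",
        RegexOptions.Multiline);

    public static string Normalize(string subtitles)
    {
        return Timeline.Replace(subtitles, match =>
        {
            string compact = Regex.Replace(match.Value, @"[ \t]", ""); // drop blanks
            compact = compact.Replace('.', ',');                        // '.' becomes ','
            return compact.Replace("->", " --> ");                      // widen the arrow
        });
    }
}
```

Because every character class in the pattern excludes line breaks, a match cannot run past the end of its line, so index lines such as `1` and dialogue lines are left alone. The trailing `(?=\r?$)` keeps an optional carriage return outside the match, so Windows line endings are preserved.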

