I have a C# program that takes as input a subtitle text file with contents like this:
1
00: 00: 07.966 -> 00: 00: 11.166
How's the sea?
- This is great.
2
00: 00: 12.967 -> 00: 00: 15.766
It's really pretty.
What I want to do is basically correct it: remove any spaces, replace the . character with the , character, and add another hyphen to the -> string so that it becomes -->. For the previous example, the correct output would be:
1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great.
2
00:00:12,967 --> 00:00:15,766
It's really pretty.
So far, I've thought about iterating through each line and checking if it starts and ends with a digit, like so:
if (line.StartsWith("[0-9]") && line.EndsWith("[0-9]")) {
}
I don't know how to write the regular expression to do this, though. Please note that my input can have spaces anywhere in the subtitle timing line, not just after the : character, so the string can end up as bad as this:
"^ 0 0 : 0 0 : 0 7 . 9 6 6 -> 0 0 : 0 0 : 1 1 . 1 6 6 $"
This is not a single regex that does everything, but I think that is actually an advantage: the logic is easy to follow and modify.
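using System.IO;
using System.Text.RegularExpressions;

using var input = new StreamReader(inputPath);
using var output = new StreamWriter(outputPath);
// Matches a whole timestamp line: it contains "->" (possibly written "- >") and no letters.
// The ^ and $ anchors keep a dialogue line that merely contains "->" from matching.
var timestampRegex = new Regex(@"^[^A-Za-z]*-\s*>[^A-Za-z]*$");
string line;
while ((line = input.ReadLine()) != null)
{
    // Only timestamp lines are modified; every other line is copied unchanged
    if (timestampRegex.IsMatch(line))
    {
        line = Regex.Replace(line, @"\s", ""); // remove all whitespace
        line = line.Replace('.', ',');         // decimal point becomes a comma
        line = line.Replace("->", " --> ");    // update arrow style
    }
    output.WriteLine(line);
}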
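To try the per-line logic without touching any files, it can be packaged as a small function and fed the worst-case line from the question. This is only a sketch: the names TimingLineFixer and Fix are made up for illustration, the pattern is anchored with ^ and $ so that dialogue containing -> is left alone, and the . to , replacement is included because the question asks for it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimingLineFixer
{
    // The whole line must be free of letters and contain "->" (or "- >").
    private static readonly Regex TimestampLine =
        new Regex(@"^[^A-Za-z]*-\s*>[^A-Za-z]*$");

    // Hypothetical helper: returns the corrected timing line, or the
    // input unchanged when it does not look like a timing line.
    public static string Fix(string line)
    {
        if (!TimestampLine.IsMatch(line))
            return line;

        line = Regex.Replace(line, @"\s", ""); // drop every whitespace character
        line = line.Replace('.', ',');         // comma before the milliseconds
        return line.Replace("->", " --> ");    // normalize the arrow
    }

    public static void Main()
    {
        Console.WriteLine(Fix(" 0 0 : 0 0 : 0 7 . 9 6 6 -> 0 0 : 0 0 : 1 1 . 1 6 6 "));
        Console.WriteLine(Fix("- This is great."));
    }
}
```

The first call should print the cleaned timing line and the second should print the dialogue line untouched, since it contains letters.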
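If every colon is followed by exactly one space or tab and the fractional part already uses a comma, you can solve it with the regular expression: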
(?m)(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?
The replacement will be $1$2$3$3$4.
EXPLANATION
--------------------------------------------------------------------------------
  (?m)                     set flags for this block (with ^ and $
                           matching start and end of line)
                           (case-sensitive) (with . not matching \n)
                           (matching whitespace and # normally)
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    \G                       where the last m//g left off
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \A                       the beginning of the string
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    ^                        the beginning of a "line"
--------------------------------------------------------------------------------
    (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
      \d                       digits (0-9)
--------------------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
      \d                       digits (0-9)
--------------------------------------------------------------------------------
      \r?                      '\r' (carriage return) (optional
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      $                        before an optional \n, and the end of
                               a "line"
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d{2}                    digits (0-9) (2 times)
--------------------------------------------------------------------------------
    :                        ':'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  [ \t]                    any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    (                        group and capture to \2:
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      ,                        ','
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      [ \t]                    any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
    )                        end of \2
--------------------------------------------------------------------------------
    (                        group and capture to \3:
--------------------------------------------------------------------------------
      -                        '-'
--------------------------------------------------------------------------------
    )                        end of \3
--------------------------------------------------------------------------------
    (                        group and capture to \4:
--------------------------------------------------------------------------------
      >                        '>'
--------------------------------------------------------------------------------
      [ \t]                    any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
    )                        end of \4
--------------------------------------------------------------------------------
  )?                       end of grouping
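The \G(?!\A) alternative is the part that most often confuses people, so here is a reduced sketch that isolates it. It keeps only the "two digits, colon, one space or tab" piece of the pattern and simplifies the lookahead to (?=\d); the class and method names are invented for the demo. The first match on a line has to start at ^ with a digit, and each later match has to start exactly where the previous one ended, which is what stops the pattern from firing in the middle of a dialogue line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContinuationAnchorDemo
{
    // Reduced pattern: either continue right after the previous match (\G,
    // but not at the very start of the string), or start at the beginning
    // of a line whose first character is a digit.
    private static readonly Regex ColonSpace = new Regex(
        @"(?:\G(?!\A)|^(?=\d))(\d{2}:)[ \t]",
        RegexOptions.Multiline);

    // Hypothetical helper: removes the single space or tab after each
    // "dd:" in a chain that begins at the start of a line.
    public static string Collapse(string text) => ColonSpace.Replace(text, "$1");

    public static void Main()
    {
        Console.WriteLine(Collapse("00: 00: 07")); // chain starts at the line start
        Console.WriteLine(Collapse("- 12: 30"));   // line starts with '-', so no chain begins
    }
}
```

In this sketch the first line should come out as 00:00:07, while the second is left as it is because no match can begin at its first character and \G never applies mid-line without a preceding match.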
C# code:
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // \r? follows the final \d so that lines ending in \r\n still satisfy the lookahead
        string pattern = @"(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?";
        string substitution = @"$1$2$3$3$4";
        string input = @"1
00: 00: 07,966 -> 00: 00: 11,166
How's the sea?
- This is great.
2
00: 00: 12,967 -> 00: 00: 15,766
It's really pretty.";
        RegexOptions options = RegexOptions.Multiline; // ^ and $ match at every line
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.Write(result);
    }
}
Results:
1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great.
2
00:00:12,967 --> 00:00:15,766
It's really pretty.
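Note that the sample input above already uses commas and has exactly one space after each colon. For the messier input described in the question (dots instead of commas, spaces anywhere on the timing line), one possible variation is to match the whole timing line and clean each side in a MatchEvaluator. This is a sketch under my own assumptions, not part of the pattern above: the class name, the group names start, end and cr, and the rule that a timing line consists only of digits, colons, dots, commas, spaces and tabs around an arrow are all choices made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubtitleTimingNormalizer
{
    // A timing line: timestamp characters, an arrow ("->" or "- >"),
    // more timestamp characters, then an optional \r before the line end.
    private static readonly Regex TimingLine = new Regex(
        @"^(?<start>[\d:., \t]+)-[ \t]*>(?<end>[\d:., \t]+?)(?<cr>\r?)$",
        RegexOptions.Multiline);

    // Strips spaces and tabs and switches the decimal point to a comma.
    private static string Clean(string timestamp) =>
        timestamp.Replace(" ", "").Replace("\t", "").Replace('.', ',');

    // Rewrites every timing line in the text; other lines are not matched
    // and therefore stay exactly as they were.
    public static string Normalize(string text) =>
        TimingLine.Replace(text, m =>
            Clean(m.Groups["start"].Value) + " --> " +
            Clean(m.Groups["end"].Value) + m.Groups["cr"].Value);

    public static void Main()
    {
        string input =
            "1\n00: 00: 07.966 -> 00: 00: 11.166\nHow's the sea?\n- This is great.\n" +
            "2\n 0 0 : 0 0 : 1 2 . 9 6 7 - > 0 0 : 0 0 : 1 5 . 7 6 6 \nIt's really pretty.";
        Console.WriteLine(Normalize(input));
    }
}
```

The trade-off is that a dialogue line made up solely of digits, punctuation and an arrow (for example "5 -> 3") would also be rewritten, so tighten the character class if your subtitles can contain such lines.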