
Regex replace with bracket variable in C#

I am sure that has been asked before, but I cannot find the appropriate question(s).

Being new to C#'s Regex, I want to mimic what is possible with, e.g., sed and awk, where I would write s/_(20[0-9]{2})[.0-9]{1}/\1/g to obtain a 4-digit year number from 2000 onwards that has an underscore as prefix and a digit or a dot afterwards. The \1 refers to the value captured by the parentheses.

Example: Both file names fx_201902.csv and fx_2019.csv should give me back myYear=2019. I was not successful with:

string myYear = Regex.Replace(Path.GetFileName(x), @"_20([0-9]{2})[.0-9]{1}", "\1")

How do I have to escape? Or is this kind of replacement not possible? If so, how would I do that?

Edit: My issue is how to do the \1 in C#, in other words how to extract a regex variable. Please forgive the typos in the original post - I am trying the new SO app and I submitted earlier than intended.

You might use a capturing group for the first 4 digits and match what is before and after the 4 digits.

.*_(20[0-9]{2})[0-9]*\.\w+$

Explanation

  • .*_ Match any characters up to and including the last underscore
  • (20[0-9]{2}) Capture 20 followed by 2 digits in group 1
  • [0-9]*\. Match 0 or more digits, then a literal dot
  • \w+$ Match 1 or more word characters up to the end of the string


In the replacement use:

$1

For example

string[] strings = {"fx_2019.csv", "fx_201902.csv"};
foreach (string s in strings)
{
    string myYear = Regex.Replace(s, @".*_(20[0-9]{2})[0-9]*\.\w+$", "$1");
    Console.WriteLine(myYear);
}

Output

2019
2019
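As a side note on why the attempt in the question does not behave like sed: in .NET replacement strings only $ is special, so a backslash followed by a digit is inserted literally, and Regex.Replace only swaps out the part of the input that the pattern matched. A minimal sketch of both effects follows; the class and method names are made up for illustration, and the pattern is the one from the question with the group widened to the full year.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplacementSyntaxDemo
{
    // Pattern from the question, with the capture group widened to the whole year.
    private const string Pattern = @"_(20[0-9]{2})[.0-9]";

    // sed-style backreference in the replacement: .NET treats it as literal text.
    public static string WithBackslash(string input) =>
        Regex.Replace(input, Pattern, @"\1");

    // .NET substitution syntax: $1 inserts the text captured by group 1.
    public static string WithDollar(string input) =>
        Regex.Replace(input, Pattern, "$1");

    public static void Main()
    {
        Console.WriteLine(WithBackslash("fx_2019.csv")); // fx\1csv
        Console.WriteLine(WithDollar("fx_2019.csv"));    // fx2019csv
        Console.WriteLine(WithDollar("fx_201902.csv"));  // fx20192.csv
    }
}
```

The last two lines show that $1 works, but "fx" and the rest of the extension survive because they were never part of the match. That is why the pattern above also consumes everything before and after the year.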

I'd suggest a more robust regex: _(20(?:0[1-9]|[1-9][0-9]))[\d.]

Explanation:

_ - match _ literally

(...) - first capturing group

20 - match 20 literally

(?:...) - non-capturing group

0[1-9]|[1-9][0-9] - alternation: match 0 followed by a non-zero digit OR a non-zero digit followed by any digit - this matches the years 2001 through 2099 and excludes 2000

[\d.] - match dot or digit

And below is how you use capturing groups:

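var regex = new Regex(@"_(20(?:0[1-9]|[1-9][0-9]))[\d.]");
// Groups[1] holds the text captured by the first pair of parentheses.
Console.WriteLine(regex.Match("fx_201902.csv").Groups[1].Value);
// 2019
Console.WriteLine(regex.Match("fx_20190.csv").Groups[1].Value);
// 2019
Console.WriteLine(regex.Match("fx_2019.csv").Groups[1].Value);
// 2019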
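When a file name does not match at all, Match still returns an object, and Groups[1].Value is simply an empty string, so it may be worth checking Success first. Here is a small sketch of that check; TryExtractYear and the class name are hypothetical helpers, not part of the .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class YearMatchDemo
{
    private static readonly Regex YearPattern =
        new Regex(@"_(20(?:0[1-9]|[1-9][0-9]))[\d.]");

    // Hypothetical helper: returns true and sets year only when the pattern matches.
    public static bool TryExtractYear(string fileName, out string year)
    {
        Match m = YearPattern.Match(fileName);
        year = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        foreach (string name in new[] { "fx_201902.csv", "fx_2000.csv", "notes.txt" })
        {
            Console.WriteLine(TryExtractYear(name, out string year)
                ? name + " -> " + year
                : name + " -> no year found");
        }
    }
}
```

With the three sample names, only fx_201902.csv yields a year; fx_2000.csv is rejected because the pattern excludes 2000.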

To extract the year using Regex.Replace, you need to capture only the year part of the string into a group and replace the entire string with just the capture group. That means you also need to match the characters before and after the year, for example with:

^.*_(20[0-9]{2})[.0-9].*$

That can then be replaced with $1, e.g.:

Regex r = new Regex(@"^.*_(20[0-9]{2})[.0-9].*$");
string filename = "fx_201902.csv";
string myYear = r.Replace(filename, "$1");
Console.WriteLine(myYear);
filename = "fx_2019.csv";
myYear = r.Replace(filename, "$1");
Console.WriteLine(myYear);

Output:

2019
2019

If you want to exclude the year 2000 from your match, change the regex to

^.*_(20(?:0[1-9]|[1-9][0-9]))[.0-9].*$
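One caveat with the Replace approach: if the pattern does not match, Regex.Replace returns the input unchanged, so an excluded file name comes back whole instead of as a year. The sketch below guards against that with IsMatch; YearOrEmpty and the class name are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NoMatchDemo
{
    private static readonly Regex WholeName =
        new Regex(@"^.*_(20(?:0[1-9]|[1-9][0-9]))[.0-9].*$");

    // Hypothetical helper: the year, or an empty string when the name does not match.
    public static string YearOrEmpty(string fileName) =>
        WholeName.IsMatch(fileName) ? WholeName.Replace(fileName, "$1") : string.Empty;

    public static void Main()
    {
        // No match: Replace hands back the original string.
        Console.WriteLine(WholeName.Replace("fx_2000.csv", "$1")); // fx_2000.csv
        // Guarded version distinguishes "no year" from a real result.
        Console.WriteLine(YearOrEmpty("fx_2000.csv") == "" ? "(no year)" : "unexpected");
        Console.WriteLine(YearOrEmpty("fx_201902.csv")); // 2019
    }
}
```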

Your second example does not contain the month digits. If you still want to match it, make that part optional:

Regex.Replace(Path.GetFileName(x), @"_20([1-9]{2})([.0-9]{2})?", "\1")

Note that the essential additions to your pattern are 3 characters: ( , ) and ?

If you want the returned value to be as expected, change the replacement from \1 to $1 as documented, put the parentheses around the whole year, and capture 2020, 2030, etc. (still excluding 2000) by using the alternation operator | to combine [1-9]{1}[0-9]{1} and [0-9]{1}[1-9]{1}:

Regex.Replace(Path.GetFileName(x), @"_(20(([1-9]{1})([0-9]{1})|([0-9]{1})([1-9]{1})))([.0-9]{2})?", "$1")

It is worth mentioning that $2 holds the last 2 digits of the year (whichever alternative matched), $3 and $4 hold the second-to-last and last digit when the first alternative matches, and $5 and $6 hold them when the second alternative matches.
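One thing to check with a pattern of this shape: it matches only the underscore, the year and the optional two characters, so Replace leaves the rest of the file name in place. The sketch below compares it with an assumed variant that wraps the same pattern in ^.* and .*$ so the whole name is consumed. The inner groups are written as non-capturing here for brevity, and the class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PartialMatchDemo
{
    // Simplified form of the optional-suffix pattern (single | for alternation).
    private const string Partial = @"_(20(?:[1-9][0-9]|[0-9][1-9]))([.0-9]{2})?";

    // Assumed variant: the same pattern extended to consume the whole file name.
    private const string Whole = "^.*" + Partial + ".*$";

    public static string ReplacePartial(string s) => Regex.Replace(s, Partial, "$1");

    public static string ReplaceWhole(string s) => Regex.Replace(s, Whole, "$1");

    public static void Main()
    {
        Console.WriteLine(ReplacePartial("fx_201902.csv")); // fx2019.csv
        Console.WriteLine(ReplacePartial("fx_2019.csv"));   // fx2019.csv
        Console.WriteLine(ReplaceWhole("fx_201902.csv"));   // 2019
        Console.WriteLine(ReplaceWhole("fx_2019.csv"));     // 2019
    }
}
```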
