Extract a specific part of a file name using Regex in C# .NET

How do I keep only Math1 from a file name like HS18_Math1.pdf? Sometimes it can be Math1.pdf.

Here are some examples of file names:

HS18_Dbs1.pdf //keep Dbs1.pdf

FS19_Dbs2.pdf //keep Dbs2.pdf

FS19_Math2.pdf //keep Math2.pdf

FS19_OO2.pdf //keep OO2.pdf

FS19_An1I.pdf //keep An1I.pdf

I do not have any prior experience with regex.

Thanks in advance to anyone who can help.

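This is fairly simple for regex to handle. The pattern below removes a leading run of letters and digits followed by an underscore (for example HS18_), and a name without that prefix, such as Math1.pdf, is returned unchanged. The underscore needs no escaping: in a verbatim @"..." string, \\_ would make the regex look for a literal backslash before the underscore, so it would never match these names.

// requires: using System.Text.RegularExpressions;
string newStr = Regex.Replace(yourInputStr, @"^[a-zA-Z0-9]+_", String.Empty);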
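A minimal sketch that runs the anchored pattern ^[a-zA-Z0-9]+_ over the sample names from the question, plus one name without a prefix. It assumes C# 9 or later (top-level statements). The helper name StripPrefix is hypothetical, and the ^ anchor is an assumption added so that only a leading prefix is removed.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample names from the question, plus one that has no prefix.
string[] names = { "HS18_Dbs1.pdf", "FS19_Dbs2.pdf", "FS19_Math2.pdf", "FS19_OO2.pdf", "FS19_An1I.pdf", "Math1.pdf" };

// Hypothetical helper: remove a leading run of letters/digits followed by "_", e.g. "FS19_".
// Names without such a prefix come back unchanged.
string StripPrefix(string fileName) =>
    Regex.Replace(fileName, @"^[a-zA-Z0-9]+_", string.Empty);

foreach (string name in names)
{
    // Prints e.g. "HS18_Dbs1.pdf -> Dbs1.pdf".
    Console.WriteLine($"{name} -> {StripPrefix(name)}");
}
```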

Here are some helpful resources: https://regexone.com/ https://www.dotnetperls.com/regex https://regex101.com/

If your data always has the same format as your examples, you can use Substring instead and take everything from the index of the underscore plus 1.

Here is an example:

string originalName = "FS19_Dbs2.pdf";
string newName = originalName.Substring(originalName.IndexOf("_") + 1);

After this runs, newName contains "Dbs2.pdf", the file name you asked for.
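The same Substring call also copes with names that have no prefix. IndexOf returns -1 when the underscore is missing, so the start index becomes 0 and the whole name is kept. A small sketch, assuming C# 9 or later top-level statements. AfterFirstUnderscore is a hypothetical helper name.

```csharp
using System;

// Hypothetical helper: everything after the first underscore, or the whole name if there is none.
string AfterFirstUnderscore(string fileName)
{
    // IndexOf returns -1 when '_' is absent, so the start index becomes 0.
    int start = fileName.IndexOf('_') + 1;
    return fileName.Substring(start);
}

Console.WriteLine(AfterFirstUnderscore("FS19_Dbs2.pdf")); // Dbs2.pdf
Console.WriteLine(AfterFirstUnderscore("Math1.pdf"));     // Math1.pdf
```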

Edit: For a regex solution that gives the same result as the Substring example for your file names, you can use the pattern below. It matches everything after the last underscore. (The Substring version cuts at the first underscore, so the two differ only when a name contains more than one.)

Regex pattern:

[^_]*$

Example:

Regex regexTest = new Regex(@"[^_]*$");
string originalName = "FS19_Dbs2.pdf";
var match = regexTest.Match(originalName);
string newName = match.Value;
// newName contains "Dbs2.pdf".
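The first-versus-last underscore difference only shows up when a name has more than one underscore. None of the example names does, so the name HS18_Extra_Math1.pdf below is a made-up input used purely to illustrate it. A sketch assuming C# 9 or later top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

string name = "HS18_Extra_Math1.pdf"; // hypothetical name with two underscores

// Substring + IndexOf: cut after the FIRST underscore.
string viaSubstring = name.Substring(name.IndexOf('_') + 1);

// [^_]*$: the run of non-underscore characters at the END of the string,
// i.e. everything after the LAST underscore.
string viaRegex = Regex.Match(name, @"[^_]*$").Value;

Console.WriteLine(viaSubstring); // Extra_Math1.pdf
Console.WriteLine(viaRegex);     // Math1.pdf
```

Pick whichever behavior matches how your prefixes are built. For the names in the question, both print the same result.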


You can use Substring to get the result you want. You do not need regex for this scenario. The code below returns the name with or without the extension, depending on what you need.

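using System;

public class Program
{
    public static void Main(string[] args)
    {
        string s = "HS18_Dbs1.pdf";
        string result;
        bool keepExtension = true;

        // Start right after the first underscore (index 0 if there is none).
        int start = s.IndexOf('_') + 1;
        if (keepExtension)
            result = s.Substring(start);
        else
            // Stop just before the first dot, which starts the extension.
            result = s.Substring(start, s.IndexOf('.') - start);

        Console.WriteLine(result);
    }
}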
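If you only need to drop the extension, System.IO.Path.GetFileNameWithoutExtension can replace the IndexOf('.') arithmetic. It removes only the last extension and does not throw when the name has no dot, whereas Substring with a negative length would. This swaps in a different technique from the code above. It is a sketch assuming C# 9 or later top-level statements, and CourseName is a hypothetical helper name.

```csharp
using System;
using System.IO;

// Hypothetical helper: drop the "HS18_"-style prefix, and optionally the extension too.
string CourseName(string fileName, bool keepExtension)
{
    // Everything after the first underscore (the whole name if there is none).
    string rest = fileName.Substring(fileName.IndexOf('_') + 1);
    // GetFileNameWithoutExtension strips the final extension such as ".pdf", if present.
    return keepExtension ? rest : Path.GetFileNameWithoutExtension(rest);
}

Console.WriteLine(CourseName("HS18_Dbs1.pdf", keepExtension: true));  // Dbs1.pdf
Console.WriteLine(CourseName("HS18_Dbs1.pdf", keepExtension: false)); // Dbs1
Console.WriteLine(CourseName("Math1", keepExtension: false));         // Math1
```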

If you really want to solve it with regex only (again, I do not see a need for it in this scenario and would not recommend it):

    // Requires: using System.Text.RegularExpressions;
    // s is the file name string from the example above.

    // This is the equivalent regex if you want to print the name with the extension.
    var r = new Regex(@"(?<=_).*");

    Console.WriteLine(r.Match(s).Value);

    // This is the equivalent regex if you want to print the name without the extension.
    var r1 = new Regex(@"(?<=_).*(?=\.)");

    Console.WriteLine(r1.Match(s).Value);
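One caveat with the lookbehind pattern is that it only matches when an underscore is present. For a name such as Math1.pdf the match fails and Value is an empty string. The sketch below checks Match.Success and falls back to the original name. That fallback is an assumption about what you want, not part of the answer. It assumes C# 9 or later top-level statements, and Strip is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

var afterUnderscore = new Regex(@"(?<=_).*");

// Hypothetical helper: use the lookbehind match if there is one, otherwise keep the name.
string Strip(string fileName)
{
    Match m = afterUnderscore.Match(fileName);
    return m.Success ? m.Value : fileName;
}

Console.WriteLine(afterUnderscore.Match("Math1.pdf").Success); // False
Console.WriteLine(Strip("HS18_Dbs1.pdf"));                      // Dbs1.pdf
Console.WriteLine(Strip("Math1.pdf"));                          // Math1.pdf
```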

(?<=_) is a positive lookbehind. It requires an underscore before the match but leaves the _ out of the result.
.* is the run of characters after the underscore.
(?=\.) is a positive lookahead. It makes the match stop just before a . without including it. Because .* is greedy, that is the last dot in the name.
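Instead of lookarounds, named capture groups let one pattern return both the bare name and the extension, and also tolerate a missing prefix. This is a different technique from the lookbehind/lookahead patterns in the answer. The pattern and the group names "name" and "ext" are illustrative assumptions. A sketch assuming C# 9 or later top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Optional "prefix_" (up to the first underscore), then the name up to the first dot,
// then an optional extension starting at that dot.
var pattern = new Regex(@"^(?:[^_]*_)?(?<name>[^.]*)(?<ext>\..*)?$");

foreach (string file in new[] { "HS18_Dbs1.pdf", "FS19_An1I.pdf", "Math1.pdf" })
{
    Match m = pattern.Match(file);
    // A group that did not take part in the match has an empty Value.
    string bare = m.Groups["name"].Value;
    string ext = m.Groups["ext"].Value;
    Console.WriteLine($"{bare} | {bare}{ext}"); // e.g. "Dbs1 | Dbs1.pdf"
}
```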

I would recommend reading the regex documentation before you start experimenting with it. Even before that, check whether your scenario can be solved without regex. Among other benefits, this makes your code easier for others to understand.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.