Capturing specific alternatives of an OR group in a C# regex

I'm trying to match a file name like xxxxSystemCheckedOut.png, where xxxx can be any prefix to the file name and System and CheckedOut are keywords to identify.

EDIT: I wasn't clear about all the possible file names and their results. File names can be:

  • xxxxSystem.png produces (group 1: xxxx, group 2: System)
  • xxxxSystemCheckedOut.png produces (group 1: xxxx, group 2: System, group 3: CheckedOut)
  • xxxxCheckedOut.png produces (group 1: xxxx, group 2: CheckedOut)

This is my current regex. It matches the file names I want, but I can't get it to group the right way. Using the previous example, I'd like the groups to be like this:

  1. xxxx
  2. System
  3. CheckedOut
  4. .png

(?:([\\w]*)(CheckedOut|System)+(\\.[a-z]*)\\Z)

[EDIT] Give this a try.

Pattern: (.*?)(?:(System)|(CheckedOut)|(Cached))+(\.png)\Z

String: xxxxTESTSystemCached.png

Groups (group 3, CheckedOut, is empty for this input):

  1. xxxxTEST
  2. System
  4. Cached
  5. .png

https://regex101.com/r/jE5eA4/1

UPDATE - Based on comments to other answers: This should work for all combinations of System/CheckedOut/Cached:

(\w+?)(System)?(CheckedOut)?(Cached)?(\.png)

https://regex101.com/r/qT2sX9/1

Note that the groups for missing keywords will still exist, so for example:

"abcdSystemCached.png" gives:

Match 1 : "abcd"
Match 2 : "System"
Match 3 :
Match 4 : "Cached"
Match 5 : ".png"

And "1234CheckedOutCached.png" gives:

Match 1 : "1234"
Match 2 :
Match 3 : "CheckedOut"
Match 4 : "Cached"
Match 5 : ".png"

This is kind of nice, as you know a particular keyword will always be at a certain position, so it works like a flag.
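
The flag idea can be sketched in C# as follows. This is a minimal sketch, assuming the pattern above with the dot escaped and `^`/`$` anchors added (both are assumptions, not part of the answer); `Describe` is a hypothetical helper name. `Group.Success` reports whether an optional group took part in the match:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: same pattern as above, with the dot escaped and anchors added.
var rx = new Regex(@"^(\w+?)(System)?(CheckedOut)?(Cached)?(\.png)$");

// Hypothetical helper: reports which keyword "flags" are set for a file name.
string Describe(string fileName)
{
    Match m = rx.Match(fileName);
    if (!m.Success) return "no match";

    // Groups 2-4 always exist; Success is false when the keyword is absent.
    return string.Format("prefix={0} System={1} CheckedOut={2} Cached={3}",
        m.Groups[1].Value, m.Groups[2].Success,
        m.Groups[3].Success, m.Groups[4].Success);
}

Console.WriteLine(Describe("abcdSystemCached.png"));
Console.WriteLine(Describe("1234CheckedOutCached.png"));
```

This prints `prefix=abcd System=True CheckedOut=False Cached=True` followed by `prefix=1234 System=False CheckedOut=True Cached=True`. Note that the keywords only match in this fixed order (System, CheckedOut, Cached).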

From the comments: I actually need the groups separately so I know how to operate on the image; each keyword results in a different operation on the image.

You really don't need separate capture groups for the keywords.
If you need the order of the matched keywords relative to one another,
you can use the code below. It works just as well even if you don't
need the order.

 ( .*? )                       # (1)
 ( System | CheckedOut )+      # (2)
 \.png $

C#:

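using System;
using System.Text.RegularExpressions;

string fname = "xxxxSystemCheckedOutSystemSystemCheckedOutCheckedOut.png";
// Group 2 is repeated, so each keyword it matches is kept in Groups[2].Captures.
Regex RxFname = new Regex( @"(.*?)(System|CheckedOut)+\.png$" );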

Match fnameMatch = RxFname.Match( fname );
if ( fnameMatch.Success )
{
    Console.WriteLine("Group 0 = {0}", fnameMatch.Groups[0].Value);
    Console.WriteLine("Group 1 = {0}", fnameMatch.Groups[1].Value);
    Console.WriteLine("Last Group 2 = {0}\n", fnameMatch.Groups[2].Value);

    CaptureCollection cc = fnameMatch.Groups[2].Captures;

    Console.WriteLine("Array and order of group 2 matches (collection):\n");
    for (int i = 0; i < cc.Count; i++)
    {
        Console.WriteLine("[{0}] = '{1}'", i, cc[i].Value);
    }
}

Output:

Group 0 = xxxxSystemCheckedOutSystemSystemCheckedOutCheckedOut.png
Group 1 = xxxx
Last Group 2 = CheckedOut

Array and order of group 2 matches (collection):

[0] = 'System'
[1] = 'CheckedOut'
[2] = 'System'
[3] = 'System'
[4] = 'CheckedOut'
[5] = 'CheckedOut'
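
Since each keyword is supposed to trigger a different image operation, the capture collection can drive a simple dispatch. Here is a rough sketch, assuming made-up operation names: `ApplySystemOverlay` and `ApplyCheckedOutOverlay` are placeholders (plain strings here, not real APIs), `PlanOperations` is a hypothetical helper, and the leading `^` anchor is an addition:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same idea as the pattern above; the leading ^ anchor is an assumption.
var rx = new Regex(@"^(.*?)(System|CheckedOut)+\.png$");

// Hypothetical helper: lists the operations to run, in keyword order.
List<string> PlanOperations(string fileName)
{
    var plan = new List<string>();
    Match m = rx.Match(fileName);
    if (!m.Success) return plan;

    // Captures holds every value the repeated group matched, left to right.
    foreach (Capture c in m.Groups[2].Captures)
    {
        switch (c.Value)
        {
            case "System": plan.Add("ApplySystemOverlay"); break;         // placeholder name
            case "CheckedOut": plan.Add("ApplyCheckedOutOverlay"); break; // placeholder name
        }
    }
    return plan;
}

Console.WriteLine(string.Join(", ", PlanOperations("xxxxSystemCheckedOut.png")));
```

For `xxxxSystemCheckedOut.png` this prints `ApplySystemOverlay, ApplyCheckedOutOverlay`. A file name with no keyword yields an empty list.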

I'm no Regex wizard, so if this can be shortened/tidied I'd love to know, but this groups like you want based on the keywords you gave:

Edited based on OP's clarification of the file structure

/(\w+?)(system)?(checkedout)?(cached)?(\.png)/ig

Regex101 Demo

Edit: beercohol and jon have me beat ;-)

I read somewhere (can't remember where) that the more precise your pattern is, the better performance you'll get from it.

So try this pattern:

"(\\w+?)(?:(System)|(CheckedOut))+(\\.png)"

Code Sample:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

List<string> fileNames = new List<string>
{
    "xxxxSystemCheckedOut.png",         // Good
    "SystemCheckedOut.png",             // Good
    "1afweiljSystemCheckedOutdgf.png",  // Bad - Garbage characters before .png
    "asdf.png",                         // Bad - No System or CheckedOut
    "xxxxxxxSystemCheckedOut.bmp",      // Bad - Wrong file extension
    "xxSystem.png",                     // Good
    "xCheckedOut.png"                   // Good
};

// The dot before "png" is escaped so it only matches a literal '.'.
foreach (Match match in fileNames.Select(fileName => Regex.Match(fileName, "(\\w+?)(?:(System)|(CheckedOut))+(\\.png)")))
{
    List<Group> matchedGroups = match.Groups.Cast<Group>().Where(group => !String.IsNullOrEmpty(group.Value)).ToList();
    if (matchedGroups.Count > 0)
    {
        matchedGroups.ForEach(Console.WriteLine);
        Console.WriteLine();
    }
}

Results:

xxxxSystemCheckedOut.png
xxxx
System
CheckedOut
.png

SystemCheckedOut.png
System
CheckedOut
.png

xxSystem.png
xx
System
.png

xCheckedOut.png
x
CheckedOut
.png
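
One thing the filtered output hides: for SystemCheckedOut.png, `\w+?` must consume at least one character, so the prefix group itself captures "System" and the `(System)` group stays empty. Below is a sketch of one way around that, assuming an empty prefix should be allowed. The group names `prefix`, `sys`, `co` and `ext`, the `^`/`$` anchors, and the `Describe` helper are illustrative choices, not part of the answer:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: \w*? (zero or more) lets the prefix be empty; group names are illustrative.
var rx = new Regex(@"^(?<prefix>\w*?)(?:(?<sys>System)|(?<co>CheckedOut))+(?<ext>\.png)$");

// Hypothetical helper: shows the prefix and which keywords were seen.
string Describe(string fileName)
{
    Match m = rx.Match(fileName);
    if (!m.Success) return "no match";
    return string.Format("prefix='{0}' System={1} CheckedOut={2} ext={3}",
        m.Groups["prefix"].Value, m.Groups["sys"].Success,
        m.Groups["co"].Success, m.Groups["ext"].Value);
}

Console.WriteLine(Describe("SystemCheckedOut.png"));
Console.WriteLine(Describe("xxSystem.png"));
Console.WriteLine(Describe("1afweiljSystemCheckedOutdgf.png"));
```

With named groups, the keyword flags no longer depend on which groups happen to be non-empty: the first name yields an empty prefix with both keywords set, the second yields prefix 'xx' with only System set, and the third does not match.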
