
C# sort and put back Regex.matches

Is there a way to use Regex.Matches to find matched values and write them back in a different (alphabetical) order?

For now I have something like:

var pattern = @"(KEY `[\w]+?` \(`.*`*\))";
var keys = Regex.Matches(line, pattern);

Console.WriteLine("\n\n");
foreach (Match match in keys)
{
    Console.WriteLine(match.Index + " = " + match.Value.Replace("\n", "").Trim());
}

But what I really need is to take a table.sql dump and sort the existing index definitions alphabetically. Example input:

line = "...PRIMARY KEY (`communication_auto`),\n  KEY `idx_current` (`current`),\n  KEY `idx_communication` (`communication_id`,`current`),\n  KEY `idx_volunteer` (`volunteer_id`,`current`),\n  KEY `idx_template` (`template_id`,`current`)\n);";

Thanks :)


Update: Thanks, m.buettner's solution gave me the basics I needed to move on. Sadly I'm not very good at regex, but I ended up with the following code, which I believe can still be improved:

...
//sort INDEXES definitions alphabetically
if (line.Contains("  KEY `")) line = Regex.Replace(
    line,
    @"[ ]+(KEY `[\w]+` \([\w`,]+\),?\s*)+",
    ReplaceCallbackLinq
);

static string ReplaceCallbackLinq(Match match) 
{
    var result = String.Join(",\n  ",
        from Capture item in match.Groups[1].Captures
        orderby item.Value.Trim()
        select item.Value.Trim().Replace("),", ")")
    );
    return "  " + result + "\n";
}


Update: There is also the case where an indexed field is longer than 255 characters. MySQL then trims the index to 255 and writes it like this:

KEY `idx3` (`app_property_definition_id`,`value`(255),`audit_current`),

To match this case as well, I had to change some code. In ReplaceCallbackLinq:

select item.Value.Trim().Replace("`),", "`)")

and regex definition to:

@"[ ]+(KEY `[\w]+` \([\w`(\(255\)),]+\),?\s*)+",
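As a quick check, the two updated snippets can be assembled into a small program. This is only a sketch: it uses C# top-level statements (an assumption about the compiler version), and the two-index input fragment and the variable names `line` and `sorted` are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical two-index fragment; idx3 uses the (255) prefix form shown above.
string line =
    "  KEY `idx3` (`app_property_definition_id`,`value`(255),`audit_current`),\n" +
    "  KEY `idx1` (`audit_current`)\n" +
    ");";

string sorted = Regex.Replace(
    line,
    @"[ ]+(KEY `[\w]+` \([\w`(\(255\)),]+\),?\s*)+",
    ReplaceCallbackLinq);

Console.WriteLine(sorted);

static string ReplaceCallbackLinq(Match match)
{
    // Sort the trimmed KEY blocks and drop the comma that followed each one.
    var result = string.Join(",\n  ",
        from Capture item in match.Groups[1].Captures
        orderby item.Value.Trim()
        select item.Value.Trim().Replace("`),", "`)"));
    // Restore the indentation and the newline consumed by the match.
    return "  " + result + "\n";
}
```

Note that the `Replace("`),", "`)")` step only strips the comma when the last column of a block ends in a backtick, so a block whose final column carries a `(255)` prefix would keep its trailing comma.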

This cannot be done with regex alone. But you could use a callback function and make use of .NET's unique capability of capturing multiple things with the same capturing group. This way you avoid using Matches and writing everything back yourself. Instead you can use the built-in Replace function. My example below simply sorts the KEY phrases and puts them back where they were (so it does nothing but sort the phrases within the SQL statement). If you want a different output, you can easily achieve that by capturing different parts of the pattern and adjusting the Join operation at the very end.

First we need a match evaluator to pass the callback:

MatchEvaluator evaluator = new MatchEvaluator(ReplaceCallback);

Then we write a regex that matches the whole set of indices at once, capturing the index-names in a capturing group. We put this in the overload of Replace that takes an evaluator:

output = Regex.Replace(
    input,
    @"(KEY `([\w]+)` \(`[^`]*`(?:,`[^`]*`)*\),?\s*)+",
    evaluator
);

Now in most languages this would not be useful, because due to the repetition capturing group 1 would always contain only the first or last thing that was captured (same as capturing group 2). But luckily, you are using C#, and .NET's regex engine is just one powerful beast. So let's have a look at the callback function and how to use the multiple captures:
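To make the difference concrete, here is a minimal sketch of how a repeated group exposes its captures in .NET. The input string and variable names are invented for illustration, and the snippet assumes a compiler that supports C# top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: three words separated by spaces.
string input = "beta alpha gamma";

// Group 1 sits inside a repetition, so it captures once per word.
Match match = Regex.Match(input, @"(?:(\w+)\s*)+");

// Group.Value only exposes the last capture of the group...
string lastOnly = match.Groups[1].Value;

// ...while Group.Captures keeps every capture, in match order.
string[] all = match.Groups[1].Captures
    .Cast<Capture>()
    .Select(c => c.Value)
    .ToArray();

Console.WriteLine(lastOnly);               // gamma
Console.WriteLine(string.Join(", ", all)); // beta, alpha, gamma
```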

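static string ReplaceCallback(Match match)
{
    int captureCount = match.Groups[1].Captures.Count;
    string[] indexNameArray = new string[captureCount];
    string[] keyBlockArray = new string[captureCount];
    for (int i = 0; i < captureCount; i++)
    {
        // Strip the trailing whitespace and comma so any block can end up last.
        keyBlockArray[i] = match.Groups[1].Captures[i].Value.TrimEnd().TrimEnd(',');
        indexNameArray[i] = match.Groups[2].Captures[i].Value;
    }
    // Sort the blocks (items) by their index names (keys).
    Array.Sort(indexNameArray, keyBlockArray);
    // Re-add the separators; the match also consumed the whitespace after the last block.
    return String.Join(",\n  ", keyBlockArray) + "\n";
}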

match.Groups[i].Captures lets us access the multiple captures of a single group. Since these are Capture objects, which are not very convenient to sort directly, we build two string arrays from their values. Then we use Array.Sort, which sorts two arrays based on the values of one (which is considered the key). As the "key" we use the captured index name. As the "value" we use the full capture of one complete KEY ... block. This sorts the full blocks by their names. Finally we join the blocks with the separator that was used before and return the result.
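The steps above can be tried end to end on the sample input from the question. The following is a sketch rather than a definitive implementation: it uses C# top-level statements and LINQ instead of the two arrays, the callback name `SortKeyBlocks` is hypothetical, and it assumes the KEY blocks are separated by a comma, a newline, and two spaces, as in the question's dump.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string input =
    "PRIMARY KEY (`communication_auto`),\n" +
    "  KEY `idx_current` (`current`),\n" +
    "  KEY `idx_communication` (`communication_id`,`current`),\n" +
    "  KEY `idx_volunteer` (`volunteer_id`,`current`),\n" +
    "  KEY `idx_template` (`template_id`,`current`)\n" +
    ");";

string output = Regex.Replace(
    input,
    @"(KEY `([\w]+)` \(`[^`]*`(?:,`[^`]*`)*\),?\s*)+",
    SortKeyBlocks);

Console.WriteLine(output);

// Hypothetical callback name; sorts the KEY blocks by index name.
static string SortKeyBlocks(Match match)
{
    var blocks = match.Groups[1].Captures.Cast<Capture>()
        // Drop trailing whitespace and the comma so any block can end up last.
        .Select(c => c.Value.TrimEnd().TrimEnd(','));
    var names = match.Groups[2].Captures.Cast<Capture>().Select(c => c.Value);

    // Pair each block with its index name and order the pairs by name.
    var sorted = names.Zip(blocks, (name, block) => new { name, block })
        .OrderBy(p => p.name, StringComparer.Ordinal)
        .Select(p => p.block);

    // Re-add the separators; the match consumed the newline after the last block.
    return string.Join(",\n  ", sorted) + "\n";
}
```

The PRIMARY KEY line is left alone because the pattern requires a backtick directly after `KEY `, and the comma is re-inserted between blocks so the last definition never ends with one.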

Not sure if I fully understand the question, but does changing the foreach to:

foreach (Match match in keys.Cast<Match>().OrderBy(m => m.Value))

do what you want?


 