
How to capitalize the first letter (ignoring non a-z characters) with regex in C#?

There are tons of posts on how to capitalize the first letter in C#, but I am specifically struggling with how to do this while ignoring leading non-letter characters and any tags, including their contents. E.g.,

<style=blah>capitalize the word, 'capitalize'</style>

How can I ignore any leading <> tags and their contents (or other leading non-letter characters, such as an asterisk *), and THEN capitalize the first letter of "capitalize"?

I tried:

public static string CapitalizeFirstCharToUpperRegex(string str)
{
    // Check for empty string.  
    if (string.IsNullOrEmpty(str))
        return string.Empty;

    // Return char and concat substring. 
    // Start @ first char, no matter what (avoid <tags>, etc)
    string pattern = @"(^.*?)([a-z])(.+)";

    // Extract middle, then upper 1st char
    string middleUpperFirst = Regex.Replace(str, pattern, "$2");
    middleUpperFirst = CapitalizeFirstCharToUpper(str); // Works

    // Inject the middle back in
    string final = $"$1{middleUpperFirst}$3";
    return Regex.Replace(str, pattern, final);
}

EDIT:

Input: <style=foo>first non-tagged word 1st char upper</style>

Expected output: <style=foo>First non-tagged word 1st char upper</style>

Using a lookbehind, you can match the first word that follows the closing > of a tag (here, the first 'capitalize') without matching the tag itself, and then capitalize the match.
The regex is the following:

(?<=<.*>)\w+

It matches a word only when it immediately follows a closing > angle bracket.
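
A minimal sketch of how that pattern might be applied, assuming the input starts with a tag. The class and method names are hypothetical and not part of the original answer. It relies on the instance overload Regex.Replace(input, evaluator, count), with count set to 1 so that only the first match is changed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindCapitalizer
{
    // Matches a run of word characters immediately preceded by the '>' that closes a tag.
    private static readonly Regex WordAfterTag = new Regex(@"(?<=<.*>)\w+");

    // Hypothetical helper name, used for illustration only.
    public static string CapitalizeFirstWordAfterTag(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        // count = 1: only the first match is passed to the evaluator.
        return WordAfterTag.Replace(
            input,
            m => char.ToUpper(m.Value[0]) + m.Value.Substring(1),
            1);
    }
}
```

Note that this sketch leaves the string unchanged when no word directly follows a >, so it does not handle untagged input or a leading asterisk.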

You may use

<[^<>]*>|(?<!\p{L})(\p{L})(\p{L}*)

The regex does the following:

  • <[^<>]*> - matches <, then any 0+ chars other than < and >, and then >
  • | - or
  • (?<!\p{L}) - finds a position not immediately preceded by a letter
  • (\p{L}) - captures any letter into Group 1
  • (\p{L}*) - captures any 0+ letters into Group 2 (this is necessary if you want to lowercase the rest of the word).

Then, check if Group 1 matched, and if yes, capitalize the first group value and lowercase the second one; otherwise, return the whole match unchanged:

var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})(\p{L}*)", m =>
                m.Groups[1].Success ? 
                  m.Groups[1].Value.ToUpper() + m.Groups[2].Value.ToLower() :
                  m.Value);

If you do not need to lowercase the rest of the word, remove the second group and the code related to it:

var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})", m =>
                m.Groups[1].Success ? 
                  m.Groups[1].Value.ToUpper() : m.Value);

To replace only the first occurrence using this approach, you need a flag that is set once the first letter has been capitalized (a replacement count limit would not help here, because the skipped tags are matches too):

var s = "<style=foo>first non-tagged word 1st char upper</style>";
var found = false;
var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})", m => {
            if (m.Groups[1].Success && !found) { 
                found = !found;
                return m.Groups[1].Value.ToUpper();
            } else {
                return m.Value;
            }
        });
Console.WriteLine(result); // => <style=foo>First non-tagged word 1st char upper</style>

See the C# demo.
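
To cover the other case from the question, a leading non-letter such as *, the same pattern and flag can be wrapped in a reusable helper. This is only a sketch: the class and method names are hypothetical, and the empty-string handling simply mirrors the question's own code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstLetterCapitalizer
{
    // A tag is consumed whole (Group 1 does not participate);
    // otherwise Group 1 captures a letter that starts a word.
    private static readonly Regex TagOrWordStart =
        new Regex(@"<[^<>]*>|(?<!\p{L})(\p{L})");

    // Hypothetical helper name, used for illustration only.
    public static string CapitalizeFirstLetter(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        var found = false; // becomes true after the first letter is capitalized
        return TagOrWordStart.Replace(input, m =>
        {
            if (m.Groups[1].Success && !found)
            {
                found = true;
                return m.Groups[1].Value.ToUpper();
            }
            return m.Value; // tags and later word starts pass through unchanged
        });
    }
}
```

Called with "*capitalize the word", this should give "*Capitalize the word", since the asterisk is never matched and the first matched letter is c.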
