
Regex to keep the last 4 characters of a string of unknown length using C#

I need to use a regular expression to keep the last 4 characters of a string. I don't know the length of the string, so I need to start at the end and count backwards. The program is written in C#.

Below are two example strings:

  • 840057
  • 1002945

I need the result to be (last 4 characters):

  • 0057
  • 2945

My original line of code used Regex.Replace, but I could not find a regex that worked (see the commented-out line in the code further down).

replacementVal = Regex.Replace(replacementVal, wildcard.Regex, wildcard.RegexReplaceBy);

I switched my code to use Regex.Match and then the regex (?s)[0-9]{4}$ worked perfectly (see below):

replacementVal = Regex.Match(replacementVal, wildcard.Regex).Value;

However, using Regex.Match breaks other regular expressions that I use. For example, I use ^(.).* to retrieve the first letter of a name. This works when using Regex.Replace but fails when using Regex.Match.

My code is below, note the original line containing Regex.Replace is commented out.

Why does Regex.Match work with one expression and Regex.Replace work with another?

        /// <summary>
        /// Replaces a wildcard in a string
        /// </summary>
        /// <param name="str">The string for which to replace the wildcards</param>
        /// <param name="row">The DataRow in which the string exists</param>
        /// <param name="wildcard">The wildcard to replace</param>
        /// <returns>The string with the wildcard replaced</returns>
        private static string ReplaceWildcardInString(string str, DataRow row, Wildcard wildcard)
        {
            // If the string is null or empty, return it as is
            if (string.IsNullOrEmpty(str))
                return str;

            // This will hold the replacement value
            var replacementVal = string.Empty;

            // If the replacement column value is not empty
            if (!row.IsDBNullOrNull(wildcard.ReplaceByColumnName))
            {
                // Convert its value to string
                replacementVal = row[wildcard.ReplaceByColumnName].ToString();

                // Apply wildcard regex if given
                if (!string.IsNullOrEmpty(wildcard.Regex) && wildcard.RegexReplaceBy != null)
                    //replacementVal = Regex.Replace(replacementVal, wildcard.Regex, wildcard.RegexReplaceBy);
                    replacementVal = Regex.Match(replacementVal, wildcard.Regex).Value;
            }

            // Replace all wildcards with the replacement value (case insensitive)
            var wildcardPattern = Regex.Escape(string.Format("%{0}%", wildcard.Name));
            str = Regex.Replace(str, wildcardPattern, replacementVal, RegexOptions.Singleline | RegexOptions.IgnoreCase);

            // Return the new string
            return str;
        }

Many thanks, I appreciate the help.

The Regex.Replace method replaces all non-overlapping substrings that match a regular expression pattern with a specified replacement.

The Regex.Match method searches the specified input string for the first occurrence of the regular expression.

So, when you have a string like 1002945 and you want to get exactly 4 digits from the end, you may use

var result = Regex.Replace("1002945", @".*([0-9]{4})$", "$1", RegexOptions.Singleline);

or

var matchResult = Regex.Match("1002945", @"[0-9]{4}$");
if (matchResult.Success) 
{
    Console.WriteLine(matchResult.Value);
}

When you replace, you must match the whole string, capture only the last four characters (which must be digits), and assert that the regex index is at the end of the string ($). Note that the RegexOptions.Singleline option allows . to match a newline character, which it does not match by default. The replacement string should be $1, the replacement backreference to the first capturing group, which captures the digits.
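To make the difference concrete, here is a minimal sketch that runs the patterns from the question through both methods. The class and method names (ReplaceVersusMatch, ViaReplace, ViaMatch) are made up for illustration and are not part of the original code.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper names, for illustration only.
public static class ReplaceVersusMatch
{
    // Regex.Replace leaves every character the pattern does NOT match in place,
    // so a "keep only this part" pattern has to consume the whole input.
    public static string ViaReplace(string input, string pattern, string replacement)
    {
        return Regex.Replace(input, pattern, replacement, RegexOptions.Singleline);
    }

    // Regex.Match returns only the matched text; Value is an empty string
    // when the pattern does not match at all.
    public static string ViaMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }
}
```

With these helpers, ViaReplace("1002945", @".*([0-9]{4})$", "$1") yields "2945", whereas ViaReplace("1002945", @"[0-9]{4}$", "") yields "100" because only the matched digits are removed. In the other direction, ViaReplace("John", @"^(.).*", "$1") yields "J", but ViaMatch("John", @"^(.).*") yields "John", since Match returns the whole match rather than group 1.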

When you use Regex.Match("1002945", @"[0-9]{4}$").Value, you match the 4 digits that are followed by either the end of the string or a newline and the end of the string (that is how $ matches; if you do not want to allow a match before a final newline, use the \z anchor). Once the match is obtained, you can check whether it succeeded using matchResult.Success and, if it did, read matchResult.Value. You no longer need RegexOptions.Singleline since there is no . in the regex.
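The difference between $ and \z only shows up when the input ends with a newline. A small sketch, assuming hypothetical helper names (EndAnchors, LastFourDollar, LastFourStrict):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper names, for illustration only.
public static class EndAnchors
{
    // $ (without RegexOptions.Multiline) matches at the very end of the string
    // or just before a final newline.
    public static string LastFourDollar(string input)
    {
        return Regex.Match(input, @"[0-9]{4}$").Value;
    }

    // \z matches only at the very end of the string.
    public static string LastFourStrict(string input)
    {
        return Regex.Match(input, @"[0-9]{4}\z").Value;
    }
}
```

For "1002945" both methods return "2945". For "1002945\n", LastFourDollar still returns "2945", while LastFourStrict returns an empty string because the digits are no longer at the very end.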

.*(?=.{4}$)

will match everything up to the last four characters of the string. If you replace that match with String.Empty, only those four characters remain.

If the string contains fewer than four characters, they will remain in the string because the regex won't match at all so there is nothing to replace.
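A minimal sketch of this approach, with the lookahead written as (?=.{4}$) so that the end-of-string assertion sits inside it. The class and method names (LookaheadTrim, KeepLastFour) are assumptions for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper names, for illustration only.
public static class LookaheadTrim
{
    // Removes everything that is followed by exactly four characters and the
    // end of the string. Inputs shorter than four characters never match,
    // so they are returned unchanged.
    public static string KeepLastFour(string input)
    {
        return Regex.Replace(input, @".*(?=.{4}$)", string.Empty);
    }
}
```

KeepLastFour("840057") gives "0057", KeepLastFour("1002945") gives "2945", and KeepLastFour("abc") gives "abc". Without RegexOptions.Singleline the . does not match newlines, so this sketch assumes single-line input.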

You don't need to use regex for that purpose.

string MyLast4Characters = MyString.Substring(((MyString.Length >= 4) ? (MyString.Length - 4) : (0)));

The part ((MyString.Length >= 4) ? (MyString.Length - 4) : (0)) checks whether the original string is at least 4 characters long. If it is, Substring starts 4 characters before the end and returns the last 4 characters; otherwise it starts at 0 and returns the whole string.
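The same idea can be wrapped in a small helper. This is a sketch; the names StringTail and LastFour are hypothetical, and returning null or empty input unchanged is an assumption rather than something the answer specifies.

```csharp
// Hypothetical helper names, for illustration only.
public static class StringTail
{
    // Returns the last four characters, or the whole string when it is shorter.
    // Assumption: null or empty input is returned unchanged.
    public static string LastFour(string value)
    {
        if (string.IsNullOrEmpty(value))
            return value;

        return value.Length >= 4 ? value.Substring(value.Length - 4) : value;
    }
}
```

StringTail.LastFour("840057") gives "0057", and StringTail.LastFour("abc") gives "abc".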

If this has to be regex, I think you want: .{4}(?=\s|$)

But I agree that regex probably is not the best solution here.

A breakdown:

. : any character
{4} : exactly four times
(?= : followed by
\s : white space
| : or
$ : the end of the string (a line ending in multiline mode)
) : end of the "followed by" section
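A quick sketch of how this pattern behaves with Regex.Match, written here with a single backslash in a verbatim string. The names WordTail and FirstFourBeforeBoundary are hypothetical. Note that Match returns the first place where four characters are followed by whitespace or the end of the string, which is not necessarily the end of the whole input.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper names, for illustration only.
public static class WordTail
{
    // Returns the first run of four characters that is followed by
    // whitespace or by the end of the string (empty string if none).
    public static string FirstFourBeforeBoundary(string input)
    {
        return Regex.Match(input, @".{4}(?=\s|$)").Value;
    }
}
```

For "1002945" this returns "2945". For "840057 1002945" it returns "0057", because the four characters before the space are found first.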

I guess this has something to do with your RegexOptions. In my example I use Singleline mode ((?s)) and a multi-line string:

static void RegexTest()
{
    string str = "i am long string\r\nwith the number 1002945";
    string pattern = @"(?s)[0-9]{4}$"; // or @"(?s).{4}$"
    string num = Regex.Match(str, pattern).Value;
}

I would use the Regex.Match method.
It matches only what you need.

You can use it one of two ways.

string str = "asdf 12345";
if (str.Length > 4)
{
    // Abbreviated ..
    Console.WriteLine( "{0}", Regex.Match(str, @"(?s).{5}$").Value );

    // Verbose ...
    Regex rx = new Regex(@"(?s).{5}$");
    str = rx.Match(str).Value;
    Console.WriteLine( "{0}", str );
}
else {} // Do something else

Output

12345
12345

You can try using Reverse() for this purpose.

E.g.:

string input = "1002945";
string rev = new string(input.Reverse().ToArray());
string res = null;

Match match = Regex.Match(rev, @"\d{4}");
if (match.Success) // Regex.Match never returns null; Success reports whether a match was found
{
   res = new string(match.Value.Reverse().ToArray());
}

Output:

2945


I would use Regex.Match as much as possible with the matching groups:

string str = "Hello :) 1002945";
string pattern = @"(.).*(\d{4})$";
Match match = Regex.Match(str, pattern);
if (match.Success)
{
    string firstChar = match.Groups[1].Value;
    string lastNumber = match.Groups[2].Value;
    Console.WriteLine("First character : " + firstChar);
    Console.WriteLine("Last number : " + lastNumber);
}

Output:

First character : H
Last number : 2945
