Extract specific number from string with fixed pattern in C#

This might sound like a very basic question, but it's one that has given me quite a lot of trouble in C#.

Assume I have, for example, the following strings, which are my chosenTarget.title values:

2008/SD128934 - Wordz aaaaand more words (1233-26-21)
20998/AD1234 - Wordz and less words (1263-21-21)
208/ASD12345 - Wordz and more words (1833-21-21)

As you can see, the three strings differ in the length of each part.

I need to extract one very specific part of these strings, but I keep getting the details wrong, and I was wondering if some of you know a better way.

What I do know is that the strings will always follow this pattern:

yearNumber + "/" + aFewLetters + theDesiredNumber + " - " + descriptiveText + " (" + someDate + ")"

For the examples above, the values I want back are:

128934
1234
12345

I need to extract theDesiredNumber.

Now, I'm not (that) lazy so I have made a few attempts myself:

var a = chosenTarget.title.Substring(chosenTarget.title.IndexOf("/") + 1); // everything after the first "/"

This slices off yearNumber and the /, leaving me with aFewLetters in front of theDesiredNumber.

I have a hard time properly removing the rest, however. Could any of you help me with this?

It sounds as if you only need the number that follows the first / and ends at " - ". You could use a combination of string methods and LINQ:

int startIndex = str.IndexOf("/");
string number = null;
if (startIndex >= 0 )
{
    int endIndex = str.IndexOf(" - ", startIndex);
    if (endIndex >= 0)
    {
        startIndex++;
        string token = str.Substring(startIndex, endIndex - startIndex); // SD128934
        number = String.Concat(token.Where(char.IsDigit)); // 128934
    }
}

Another, mainly LINQ-based approach using String.Split:

number = String.Concat(
            str.Split(new[] { " - " }, StringSplitOptions.None)[0]
              .Split('/')
              .Last()
              .Where(char.IsDigit));

Try this:

 int indexSlash = chosenTarget.title.IndexOf("/");
 int indexDash = chosenTarget.title.IndexOf("-");
 string result = new string(chosenTarget.title.Substring(indexSlash, indexDash - indexSlash).Where(c => Char.IsDigit(c)).ToArray());

You can use a regex:

var pattern = @"[0-9]+/[A-Za-z]+([0-9]+)"; // verbatim string; group 1 captures the desired number
var matcher = new Regex(pattern);
var result = matcher.Matches(yourEntireSetOfLinesInAString); // read each match's Groups[1].Value

Alternatively, you can loop over the lines and use Match instead of Matches. In that case, build the Regex once outside the loop rather than on every iteration.
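The per-line variant could be sketched roughly as follows. The class and method names (TitleNumbers, ExtractAll) are made up for illustration, and the pattern is an assumption based on the title format described in the question, not necessarily what the answerer had in mind.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TitleNumbers
{
    // Built once and reused for every line, not rebuilt per iteration.
    // Assumed format: digits, "/", letters, then the number to capture.
    private static readonly Regex Matcher = new Regex(@"^[0-9]+/[A-Za-z]+([0-9]+)");

    // Returns the captured number for each line that matches;
    // lines that do not fit the format are skipped.
    public static List<string> ExtractAll(IEnumerable<string> lines)
    {
        var numbers = new List<string>();
        foreach (var line in lines)
        {
            Match match = Matcher.Match(line);
            if (match.Success)
            {
                numbers.Add(match.Groups[1].Value);
            }
        }
        return numbers;
    }
}
```

Calling TitleNumbers.ExtractAll with the three sample titles from the question should give back the three numbers as strings, in the same order.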

Regex is your friend:

(new [] {"2008/SD128934 - Wordz aaaaand more words (1233-26-21)",
"20998/AD1234 - Wordz and less words (1263-21-21)",
"208/ASD12345 - Wordz and more words (1833-21-21)"})
.Select(x => new Regex(@"\d+/[A-Z]+(\d+)").Match(x).Groups[1].Value)

The pattern you recognized is the key. Here is a solution built on it:

const string pattern = @"\d+\/[a-zA-Z]+(\d+).*$";
string s1 = @"2008/SD128934 - Wordz aaaaand more words(1233-26-21)";
string s2 = @"20998/AD1234 - Wordz and less words(1263-21-21)";
string s3 = @"208/ASD12345 - Wordz and more words(1833-21-21)";
var strings = new List<string> { s1, s2, s3 };
var desiredNumber = string.Empty;

foreach (var s in strings)
{
    var match = Regex.Match(s, pattern);
    if (match.Success)
    {
        desiredNumber = match.Groups[1].Value;
    }
}

I would use a regex for this; the string you're looking for is in Match.Groups[1]:

        string composite = "2008/SD128934 - Wordz aaaaand more words (1233-26-21)";
        Match m = Regex.Match(composite, @"^\d+\/[a-zA-Z]+(\d+)");
        if (m.Success) Console.WriteLine(m.Groups[1].Value);

The breakdown of the regex is as follows:

"^\d+\/[a-zA-Z]+(\d+)"

^           - Anchors the match at the beginning of the string
\d+         - One or more digits (the year; the sample titles have three to five digits, so a fixed \d{4} would miss some of them)
\/          - A literal / (the backslash is optional in .NET)
[a-zA-Z]+   - One or more letters
(\d+)       - One or more digits (the parentheses capture this part as a group, in this case group 1)
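If the number is needed as an integer and not as text, the same idea can be wrapped in a small helper. This is only a sketch: DesiredNumber and TryExtract are hypothetical names, and the pattern uses \d+ for the year part because the sample titles in the question have years of three, four, and five digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DesiredNumber
{
    // Assumed pattern: year digits, "/", letters, then the captured number.
    private static readonly Regex TitlePattern = new Regex(@"^\d+/[a-zA-Z]+(\d+)");

    // Returns true and sets 'number' when the title fits the format and the
    // captured digits fit in an int; otherwise returns false with number = 0.
    public static bool TryExtract(string title, out int number)
    {
        number = 0;
        Match m = TitlePattern.Match(title ?? string.Empty);
        return m.Success && int.TryParse(m.Groups[1].Value, out number);
    }
}
```

The Try pattern avoids exceptions for titles that do not fit the format, which may matter if chosenTarget.title can come from user-entered data.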

