Extract specific number from string with fixed pattern in C#
This might sound like a very basic question, but it's one that has given me quite a lot of trouble in C#.
Assume I have, for example, the following strings, each known as my chosenTarget.title:
2008/SD128934 - Wordz aaaaand more words (1233-26-21)
20998/AD1234 - Wordz and less words (1263-21-21)
208/ASD12345 - Wordz and more words (1833-21-21)
Now, as you can see, all three strings differ in some ways.
What I need is to extract a very specific part of these strings, but getting the subtleties right is what confuses me, and I was wondering if some of you knew better than I.
What I know is that the strings will always come in the following pattern:
yearNumber + "/" + aFewLetters + theDesiredNumber + " - " + descriptiveText + " (" + someDate + ")"
In the above example, what I would want returned is:
128934
1234
12345
I need to extract theDesiredNumber.
Now, I'm not (that) lazy, so I have made a few attempts myself:
var a = chosenTarget.title.Substring(chosenTarget.title.IndexOf("/") + 1); // everything after the first '/'
What this has done is slice out yearNumber and the /, leaving me with aFewLetters before theDesiredNumber.
I have a hard time properly removing the rest, however, and I was wondering if any of you could aid me in the matter?
It sounds as if you only need to extract the number that follows the first / and ends at the " - " separator. You could use a combination of string methods and LINQ:
int startIndex = str.IndexOf("/");
string number = null;
if (startIndex >= 0)
{
    int endIndex = str.IndexOf(" - ", startIndex);
    if (endIndex >= 0)
    {
        startIndex++;
        string token = str.Substring(startIndex, endIndex - startIndex); // SD128934
        number = String.Concat(token.Where(char.IsDigit)); // 128934
    }
}
Another, mainly LINQ, approach using String.Split:
number = String.Concat(
str.Split(new[] { " - " }, StringSplitOptions.None)[0]
.Split('/')
.Last()
.Where(char.IsDigit));
Try this:
int indexSlash = chosenTarget.title.IndexOf("/");
int indexDash = chosenTarget.title.IndexOf("-");
// "out" is a reserved keyword in C#, so the variable is named result instead
string result = new string(chosenTarget.title.Substring(indexSlash, indexDash - indexSlash).Where(c => Char.IsDigit(c)).ToArray());
You can use a regex:
var pattern = @"[0-9]+/[A-Za-z]+([0-9]+)"; // verbatim string; group 1 captures the number
var matcher = new Regex(pattern);
var result = matcher.Matches(yourEntireSetOfLinesInAString); // read each match's Groups[1].Value
Or you can loop over every line and use Match instead of Matches. In this case you don't need to build a "matcher" in every iteration; build it once outside the loop.
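As a rough sketch of that loop variant: the Regex is constructed once and Match is called per line. The pattern and the names LoopExample, NumberPattern and ExtractNumbers are illustrative assumptions, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LoopExample
{
    // Built once and reused for every line.
    // Assumed pattern: digits, '/', letters, then the captured digits.
    private static readonly Regex NumberPattern =
        new Regex(@"^[0-9]+/[A-Za-z]+([0-9]+)");

    // Hypothetical helper: collects the captured number of every matching line.
    public static List<string> ExtractNumbers(IEnumerable<string> lines)
    {
        var numbers = new List<string>();
        foreach (var line in lines)
        {
            Match match = NumberPattern.Match(line);
            if (match.Success)
            {
                numbers.Add(match.Groups[1].Value);
            }
        }
        return numbers;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "2008/SD128934 - Wordz aaaaand more words (1233-26-21)",
            "20998/AD1234 - Wordz and less words (1263-21-21)",
            "208/ASD12345 - Wordz and more words (1833-21-21)"
        };
        foreach (var number in ExtractNumbers(lines))
        {
            Console.WriteLine(number); // 128934, then 1234, then 12345
        }
    }
}
```

Lines that do not fit the pattern are simply skipped here; whether that or an exception is the right behaviour depends on how trustworthy the input is.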
Regex is your friend:
(new [] {"2008/SD128934 - Wordz aaaaand more words (1233-26-21)",
"20998/AD1234 - Wordz and less words (1263-21-21)",
"208/ASD12345 - Wordz and more words (1833-21-21)"})
.Select(x => new Regex(@"\d+/[A-Z]+(\d+)").Match(x).Groups[1].Value)
The pattern you recognized is the important part; here is a solution:
const string pattern = @"\d+\/[a-zA-Z]+(\d+).*$";
string s1 = @"2008/SD128934 - Wordz aaaaand more words(1233-26-21)";
string s2 = @"20998/AD1234 - Wordz and less words(1263-21-21)";
string s3 = @"208/ASD12345 - Wordz and more words(1833-21-21)";
var strings = new List<string> { s1, s2, s3 };
var desiredNumber = string.Empty;
foreach (var s in strings)
{
    var match = Regex.Match(s, pattern);
    if (match.Success)
    {
        desiredNumber = match.Groups[1].Value;
    }
}
I would use a RegEx for this; the string you're looking for is in Match.Groups[1]:
string composite = "2008/SD128934 - Wordz aaaaand more words (1233-26-21)";
Match m = Regex.Match(composite, @"^\d+\/[a-zA-Z]+(\d+)");
if (m.Success) Console.WriteLine(m.Groups[1].Value);
The breakdown of the RegEx is as follows:
"^\d+\/[a-zA-Z]+(\d+)"
^ - Indicates that it's the beginning of the string
\d+ - One or more digits (the sample years have three to five digits, so a fixed \d{4} would miss some of them)
\/ - /
[a-zA-Z]+ - One or more letters
(\d+) - One or more digits (the parentheses indicate that this part is captured as a group, in this case group 1)
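If the number is needed as an integer rather than as text, a pattern of this shape could be wrapped in a small helper. The following is only a sketch: TitleParser, TitlePattern and TryGetDesiredNumber are hypothetical names, the year is matched with \d+ because the sample years vary in length, and a named group is used in place of group 1.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleParser
{
    // Assumed pattern: year digits, '/', letters, then the number in a named group.
    private static readonly Regex TitlePattern =
        new Regex(@"^\d+/[a-zA-Z]+(?<number>\d+)");

    // Hypothetical helper: returns true and the parsed value when the title fits.
    public static bool TryGetDesiredNumber(string title, out int number)
    {
        number = 0;
        if (title == null)
        {
            return false;
        }
        Match m = TitlePattern.Match(title);
        return m.Success && int.TryParse(m.Groups["number"].Value, out number);
    }

    public static void Main()
    {
        string title = "208/ASD12345 - Wordz and more words (1833-21-21)";
        if (TryGetDesiredNumber(title, out int n))
        {
            Console.WriteLine(n); // 12345
        }
    }
}
```

Parsing to int drops leading zeros and fails on values that overflow an int, so keeping the captured string may be the safer choice if the number is really an identifier.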