
How can I split part of a string that is inconsistent?

I have the following string:

01-21-27-0000-00-048, which is easy to split apart because each section is separated by a - . Sometimes, however, this string is represented as 01-21-27-0000-00048 , so splitting it is not as easy because the last two parts are combined. How can I handle this? Also, what about the case where it might be something like 01-21-27-0000-00.048 ?

In case anyone is curious, this is a parcel number. The format varies from county to county, and a single county can have one format or a hundred.

This is a very good case for using regular expressions. Your string matches the following regexp:

(\d{2})-(\d{2})-(\d{2})-(\d{4})-(\d{2})[.-]?(\d{3})

Match the input against this expression, and harvest the six groups of digits from the match:

using System;
using System.Text.RegularExpressions;

var str = new[] {
    "01-21-27-0000-00048", "01-21-27-0000-00.048", "01-21-27-0000-00-048"
};
foreach (var s in str) {
    var m = Regex.Match(s, @"(\d{2})-(\d{2})-(\d{2})-(\d{4})-(\d{2})[.-]?(\d{3})");
    // Groups[0] is the whole match, so start at one, not zero.
    for (var i = 1; i < m.Groups.Count; i++) {
        Console.Write("{0} ", m.Groups[i]);
    }
    Console.WriteLine();
}
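
The pattern above is unanchored, so it would also match inside a longer string. If you would rather reject anything that is not exactly a parcel number, one option is to wrap the match in a small helper and add ^ and $ anchors. The following is only a sketch; SplitParcel is a hypothetical name, and returning an empty array on failure is my own choice:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the six segments, or an empty array when
// the input does not fit the pattern. ^ and $ reject extra characters.
static string[] SplitParcel(string input)
{
    var m = Regex.Match(input, @"^(\d{2})-(\d{2})-(\d{2})-(\d{4})-(\d{2})[.-]?(\d{3})$");
    if (!m.Success) return Array.Empty<string>();
    // Groups[0] is the whole match, so skip it and keep groups 1..6.
    return m.Groups.Cast<Group>().Skip(1).Select(g => g.Value).ToArray();
}

Console.WriteLine(string.Join(" ", SplitParcel("01-21-27-0000-00048"))); // 01 21 27 0000 00 048
Console.WriteLine(SplitParcel("01-21-27-0000-0048").Length);             // 0 (last part too short)
```

Checking Success before reading the groups matters because a failed match still exposes the groups, just with empty values.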

If you would like to allow other characters, say, letters, in the segments that are separated by dashes, you can use \w instead of \d ; \w matches a letter, a digit, or an underscore. If you would like to allow a variable number of such characters within a known range, say, two to four, you can use {2,4} in the regexp instead of the more specific {2} , which means "exactly two". For example,

(\w{2,3})-(\w{2})-(\w{2})-(\d{4})-(\d{2})[.-]?(\d{3})

lets the first segment contain two or three digits or letters, and also allows letters in segments two and three.
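
As a quick check of the relaxed pattern, here is a minimal sketch. The inputs are made up for illustration and are not real parcel numbers:

```csharp
using System;
using System.Text.RegularExpressions;

// Relaxed pattern: \w admits letters, digits, and the underscore.
var relaxed = new Regex(@"(\w{2,3})-(\w{2})-(\w{2})-(\d{4})-(\d{2})[.-]?(\d{3})");

// Made-up sample inputs, for illustration only.
foreach (var s in new[] { "A01-2B-27-0000-00048", "01-21-27-0000-00.048" })
{
    var m = relaxed.Match(s);
    Console.WriteLine(m.Success
        ? $"{m.Groups[1].Value} | {m.Groups[2].Value} | {m.Groups[6].Value}"
        : "no match");
}
// Prints:
// A01 | 2B | 048
// 01 | 21 | 048
```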

Normalize the string first.

That is, if you know that the last part is always three characters, insert a - before the last three characters when one is not already there, then split the resulting string. Along the same lines, convert any dot '.' to a dash '-' before splitting.
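
A rough sketch of that normalization, assuming the last segment is always three characters and that '.' is the only alternative separator (NormalizeAndSplit is a hypothetical name):

```csharp
using System;

// Hypothetical helper. Assumes a three-character last segment and that
// '.' is the only separator other than '-'.
static string[] NormalizeAndSplit(string input)
{
    var s = input.Replace('.', '-');
    // If the character before the last three is not a dash, the last two
    // segments are fused, so insert the missing dash.
    if (s.Length > 3 && s[s.Length - 4] != '-')
        s = s.Insert(s.Length - 3, "-");
    return s.Split('-');
}

Console.WriteLine(string.Join(" ", NormalizeAndSplit("01-21-27-0000-00048")));  // 01 21 27 0000 00 048
Console.WriteLine(string.Join(" ", NormalizeAndSplit("01-21-27-0000-00.048"))); // 01 21 27 0000 00 048
```

Unlike the regular expression, this does not validate the segments, so it is only appropriate when the input is already known to be a parcel number in one of these forms.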

Replace every character that is not a digit with an empty string ('').

Then any of your strings ends up in a format like

012127000000048

Now you can divide it into parts of (2, 2, 2, 4, 2, 3) characters.
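
One way this could look in C#, as a sketch only. SplitByWidths is a hypothetical name, and the length check is an assumption I added so that malformed input fails loudly instead of being sliced incorrectly:

```csharp
using System;
using System.Linq;

// Hypothetical helper: keep only the digits, then cut fixed-width parts.
static string[] SplitByWidths(string input, int[] widths)
{
    var digits = new string(input.Where(c => char.IsDigit(c)).ToArray());
    // Added assumption: the digit count must equal the sum of the widths.
    if (digits.Length != widths.Sum())
        throw new FormatException("Unexpected number of digits: " + digits.Length);

    var parts = new string[widths.Length];
    var pos = 0;
    for (var i = 0; i < widths.Length; i++)
    {
        parts[i] = digits.Substring(pos, widths[i]);
        pos += widths[i];
    }
    return parts;
}

var widths = new[] { 2, 2, 2, 4, 2, 3 };
Console.WriteLine(string.Join(" ", SplitByWidths("01-21-27-0000-00.048", widths))); // 01 21 27 0000 00 048
```

Because the widths are passed in, a county with a different layout only needs a different array.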

