
Java string.split vs. C# Regex.split - limit to certain number of fields

I am a Java developer, but am working on a C# project. What I need to do is split a String by a delimiter, but limit it to a certain number of fields. In Java, I can do this:

String message = "xx/xx - xxxxxxxxxxxxxxxxxxx - xxxxxxx";
String[] splitMessage = message.split("\\s-", 3);

In this case, it splits on the - , but I also want it to account for the space before the dash and to limit the result to 3 fields. The incoming String has the form ___ - ____________ - _________ : the first blank is a date (like 12/31 ), the second is a message, and the third is a location tied to the message. I limit it to 3 fields so that the array only has 3 elements, because the message itself can sometimes contain dashes, like this: 12/31 - Test message - test - Test City, 11111 . So my Java code above would split it into this:

0: 12/31
1: Test message - test
2: Test City, 11111

I am trying to achieve something similar in C#, but am not sure how to limit it to a certain number of fields. This is my C# code:

var splitMessage = Regex.Split(Message, " -");

The problem is that without a limit, it splits into 4 or 5 fields instead of just 3. For example, if this were the message: 12/31 - My test - don't use - just a test - Test City, 11111 , it would return a string[] with 5 elements:

0: 12/31
1: My test
2: don't use
3: just a test
4: Test City, 11111

When I want it to return this:

0: 12/31
1: My test - don't use - just a test
2: Test City, 11111

Before you ask, I can't change the incoming String. I have to parse it the same way I did in Java. So is there an equivalent way to limit it to 3 fields? Is there a better way to do it than using Regex.Split() ?

If you want to split based on the first and last instance of - , such that you get exactly three fields (so long as there are at least two dashes in the string), C# does actually have a neat trick for this. C# Regex allows for non-fixed-width lookbehinds. So the following regex:

(?<=^[^-]*)-|-(?=[^-]*$)

(?<=     //start lookbehind
   ^     //look for start of string
   [^-]* //followed by any amount of non-dash characters
)        //end lookbehind
-        //match the dash
|        //OR
-        //match a dash
(?=      //lookahead for
   [^-]* //any amount of non-dash characters
   $     //then the end of the string
)        //end lookahead

Will match the first and last dash, and allow you to split the string the way you want to.

var splitMessage = Regex.Split(Message, "(?<=^[^-]*)-|-(?=[^-]*$)");

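Note that this also has no problem splitting into fewer than three groups if there are fewer dashes, but it will never split into more than three. Because the pattern matches only the dash itself, the spaces around it stay in the resulting fields, so you may want to call Trim() on each element.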
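For readers coming from Java, as the asker is, the same "cut at the first and last delimiter" idea can also be written without a regex. The following is only a sketch under assumptions: the helper name splitFirstLast is hypothetical, and it takes the full " - " delimiter (spaces included) instead of a bare dash.

```java
public class FirstLastSplit {
    // Hypothetical helper: cut the message at the first and last occurrence
    // of the delimiter, so at most three fields come back.
    static String[] splitFirstLast(String message, String delimiter) {
        int first = message.indexOf(delimiter);
        if (first < 0) {
            return new String[] { message };              // no delimiter: one field
        }
        int afterFirst = first + delimiter.length();
        int last = message.lastIndexOf(delimiter);
        if (last < afterFirst) {
            // only one (non-overlapping) delimiter: two fields
            return new String[] { message.substring(0, first), message.substring(afterFirst) };
        }
        return new String[] {
            message.substring(0, first),                  // before the first delimiter
            message.substring(afterFirst, last),          // everything in between
            message.substring(last + delimiter.length())  // after the last delimiter
        };
    }

    public static void main(String[] args) {
        String message = "12/31 - My test - don't use - just a test - Test City, 11111";
        for (String field : splitFirstLast(message, " - ")) {
            System.out.println(field);
        }
    }
}
```

Because the delimiter includes the surrounding spaces, the fields come back already trimmed.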

A plain split with a limit can't leave the delimiter inside one of the desired groups, unless that group is the last one.

You can, however, use a regex whose second group is greedy and consumes as much as possible to parse this input:

// needs: using System.Linq; using System.Text.RegularExpressions;
var splitMessage = Regex.Match("12/31 - Test message - test - Test City, 11111", "^(.+?) - (.+) - (.+)$")
    .Groups
    .Cast<Group>()
    // skip first group which is the entire match
    .Skip(1)
    .Select(x => x.Value)
    .ToArray();

Given that the first group is "xx/xx", you can also opt to use this regex instead:

"^(../..) - (.+) - (.+)$"
// or, assuming it is a date (verbatim string, since "\d" is not a valid escape in a regular C# string)
@"^(\d{2}/\d{2}) - (.+) - (.+)$"
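As a cross-check for the Java side of the question, the lazy-first, greedy-middle pattern should behave the same way under java.util.regex. This is a minimal sketch, not the asker's actual code; the class name GreedyMiddleGroup and the helper parse are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyMiddleGroup {
    // Group 1 is lazy (stops at the first " - "), group 2 is greedy
    // (runs up to the last " - "), group 3 takes the rest.
    private static final Pattern FIELDS = Pattern.compile("^(.+?) - (.+) - (.+)$");

    // Hypothetical helper: returns the three fields, or null when the
    // message does not contain at least two " - " delimiters.
    static String[] parse(String message) {
        Matcher m = FIELDS.matcher(message);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] fields = parse("12/31 - Test message - test - Test City, 11111");
        for (String field : fields) {
            System.out.println(field);
        }
    }
}
```

Unlike a split, this approach rejects input with fewer than two delimiters instead of returning a shorter array, so the caller has to handle the null case.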

EDIT: Or, you can just split by " - " and then join everything in the middle back together when there are more than 3 parts:

var groups = "12/31 - Test message - test - Test City, 11111".Split(new[] { " - " }, StringSplitOptions.None);
if (groups.Length > 3)
{
    groups = new[]
    {
        groups[0],
        string.Join(" - ", groups.Skip(1).Take(groups.Length - 2)),
        groups[groups.Length - 1]
    };
}

When I have to split a string at certain delimiters including optional spaces, I usually do it this way:

String message = "xx/xx - xxxxxxxxxxxxxxxxxxx - xxxxxxx";
String[] splitMessage = message.split(" *- *", 3);    
System.out.println(Arrays.asList(splitMessage));

Outputs: [xx/xx, xxxxxxxxxxxxxxxxxxx, xxxxxxx]

String message = "12/31 - My test - don't use - just a test - Test City; 11111";
String[] splitMessage = message.split(" *- *", 3);    
System.out.println(Arrays.asList(splitMessage));

Outputs: [12/31, My test, don't use - just a test - Test City; 11111]

But you seem to want something different:

splitMessage[0] shall contain the first part
splitMessage[1] shall contain the second and third part
splitMessage[2] shall contain the rest

How do you want to tell your computer that the second output element shall contain two parts? I think this is impossible except by splitting the string into all 5 parts and then re-concatenating the parts together as you want.
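The split-everything-then-rejoin approach described here could look roughly like this in Java. It is a sketch under assumptions: the delimiter is taken to be exactly " - ", and the helper name splitIntoThree is hypothetical.

```java
import java.util.Arrays;

public class SplitAndRejoin {
    // Hypothetical helper: split on every " - ", then glue all middle
    // parts back together so that at most three fields remain.
    static String[] splitIntoThree(String message) {
        // limit -1 keeps trailing empty strings instead of dropping them
        String[] parts = message.split(" - ", -1);
        if (parts.length <= 3) {
            return parts;
        }
        String middle = String.join(" - ", Arrays.copyOfRange(parts, 1, parts.length - 1));
        return new String[] { parts[0], middle, parts[parts.length - 1] };
    }

    public static void main(String[] args) {
        String message = "12/31 - My test - don't use - just a test - Test City, 11111";
        for (String field : splitIntoThree(message)) {
            System.out.println(field);
        }
    }
}
```

With the five-part example message, the middle field comes back as "My test - don't use - just a test", with the date and the location in the first and last positions.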

Maybe it's not clear what result you want. Can you specify the requirement more clearly: What shall happen if the input string contains more than 3 elements?


 