
Format String : Parsing

I have a parsing question. I have a paragraph that contains instances of ":  word  ". So basically each instance is a colon, two spaces, a word (could be anything), then two more spaces.

When I have those instances, I want to convert the string so that I have:

  1. A new line character after : and the word.
  2. The double space after the word removed.
  3. All double spaces replaced with new line characters.

I don't know exactly how to go about this. I'm using C#. Bullet point 2 above is the part I'm having a hard time with.

Thanks

Assuming your original string is exactly in the form you described, this will do:

var newString = myString.Trim().Replace("  ", "\n");

The Trim() removes leading and trailing whitespace, taking care of the spaces at the end of the string.

Then, the Replace replaces each remaining "  " (two space characters) with a "\n" new line character.

The result is assigned to the newString variable. This is needed because myString itself will not change: strings in .NET are immutable.
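As a minimal sketch of the Trim/Replace approach above, assuming an input that contains double spaces only where a line break is wanted (the sample text is made up for illustration):

```csharp
using System;

// Hypothetical sample input in the form described in the question.
string myString = "Label:  word  rest of the text  ";

// Trim() drops the trailing double space; Replace() then turns every
// remaining double space into a newline. Both calls return new strings.
string newString = myString.Trim().Replace("  ", "\n");

Console.WriteLine(newString);
// Label:
// word
// rest of the text
```

Note that myString still holds the original text afterwards; only newString contains the converted version.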

I suggest you read up on the String class and all its methods and properties.

Using regular expressions will let you match exactly the pattern you are looking for.

The regex match for a colon, two spaces, a word, then two more spaces is:

Dim reg As New Regex(":  [a-zA-Z]*  ")

[a-zA-Z] will look for any character within the alphabetical range. You can append 0-9 as well if you accept digits within the word. The * afterwards indicates that there can be 0 or more instances of the preceding value.

[a-zA-Z]* will attempt to do a full match of any set of contiguous alpha characters.

Upon further reading, you may use \w in place of [a-zA-Z0-9] if that's what you are looking for. This will match any 'word' character, which also includes the underscore.

source: http://msdn.microsoft.com/en-us/library/ms972966.aspx

You can retrieve all the matches using reg.Matches(inputString).
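Since the question is about C#, here is a rough C# sketch of the same idea. The sample input is hypothetical, and the pattern uses + instead of * so that an empty "word" is not accepted; that is my assumption, not part of the original pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input with two ":  word  " instances.
string inputString = "Name:  Alice  Role:  Admin  ";

// Colon, two literal spaces, one or more letters, two literal spaces.
var reg = new Regex(":  [a-zA-Z]+  ");

// Matches() returns every non-overlapping match in the input.
foreach (Match m in reg.Matches(inputString))
{
    Console.WriteLine($"'{m.Value}' at index {m.Index}");
}
```

Each Match exposes the matched text through Value and its position through Index, which is enough to decide how to rewrite the string afterwards.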

Review http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace.aspx for more information on regular expression replacements and your options from there on.

edit: Before, I was using \s to search for spaces. That matches any whitespace character, including tabs, new lines and others. That is not what we want, so I reverted it to search for exact space characters.

You can try

var str = ":  first  :  second  ";
var result = Regex.Replace(str, ":\\s{2}(?<word>[a-zA-Z0-9]+)\\s{2}", ":\n${word}\n");

You can use string.TrimEnd - http://msdn.microsoft.com/en-us/library/system.string.trimend.aspx - to trim spaces at the end of the string.

The following is an example using Regular Expressions. See also this question for more info.

Basically the pattern string tells the regex to look for a colon followed by two spaces. Then whatever word sits between the two pairs of spaces is saved in a capture group named "word". Finally two more spaces are specified to finish the pattern.

The replace uses a lambda which says for every match, replace it with a colon, a new line, the "lone" word, and another newline.

string Paragraph = "Jackdaws love my big sphinx of quartz:  fizz  The quick onyx goblin jumps over the lazy dwarf. Where:  buzz  The crazy dogs.";
// Colon, two spaces, a run of non-whitespace captured as "word", two spaces.
string Pattern = @":  (?<word>\S*)  ";
string Result = Regex.Replace(Paragraph, Pattern, m =>
    {
        // Read the capture by name rather than by position.
        var LoneWord = m.Groups["word"].Value;
        return ":" + Environment.NewLine + LoneWord + Environment.NewLine;
    });

Input

Jackdaws love my big sphinx of quartz:  fizz  The quick onyx goblin jumps over the lazy dwarf. Where:  buzz  The crazy dogs.

Output

Jackdaws love my big sphinx of quartz:
fizz
The quick onyx goblin jumps over the lazy dwarf. Where:
buzz
The crazy dogs.

Note, for item 3 on your list, if you also want to replace individual occurrences of two spaces with newlines, you could do this:

Result = Result.Replace("  ", Environment.NewLine);
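Putting the pieces from this thread together, the whole conversion might look roughly like the sketch below. The sample paragraph is made up, \S+ is used so the word cannot be empty, and "\n" is used instead of Environment.NewLine to keep the result the same on every platform; all three are my assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: one ":  word  " instance plus other double spaces.
string paragraph = "Intro text:  fizz  More text.  Another sentence.  ";

// Items 1 and 2: ":  word  " becomes ":" + newline + word + newline.
string result = Regex.Replace(paragraph, @":  (?<word>\S+)  ", ":\n${word}\n");

// Item 3: every remaining double space becomes a newline, then the
// newline produced by the trailing double space is trimmed off.
result = result.Replace("  ", "\n").TrimEnd();

Console.WriteLine(result);
// Intro text:
// fizz
// More text.
// Another sentence.
```

The regex step has to run first; if the plain Replace ran first, the double spaces around the word would already be gone and the pattern would no longer match.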

