I am trying to read a text file whose fields are delimited by spaces, where some fields are also wrapped in double quotes (and may contain spaces themselves), and there is no easy way to identify this case. I just wanted to check whether this can be achieved with a predefined regular expression; otherwise I need to start working on a custom split.
Here is the string:
"myfile-one two" "1" 3 1453454.00 -134557.63 585.0 24444.8 -999 "NULL" "" 45.60 "" 67°32'5.23455"N 54°56'65.3454"W "NULL" 6.00
The output should be:
myfile-one two
1
3
1453454.00
-134557.63
585.0
24444.8
-999
NULL
45.60
67°32'5.23455"N
54°56'65.3454"W
NULL
6.00
The code below splits on the space delimiter, but it also splits inside the double quotes, so a quoted field containing a space ends up as separate entries:
// comp holds one line read from the file
char[] space = new char[] { ' ' };
string[] data = comp.Split(space, StringSplitOptions.RemoveEmptyEntries);
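To make the problem concrete, here is a small sketch that runs the same split on the first three fields of the sample line. The literal assigned to comp is an assumption made for illustration; in the question, comp comes from the file.

```csharp
using System;

// Hypothetical sample: the first three fields of the line from the question.
string comp = "\"myfile-one two\" \"1\" 3";

char[] space = new char[] { ' ' };
string[] data = comp.Split(space, StringSplitOptions.RemoveEmptyEntries);

// The quoted field is broken at its inner space,
// and the quotes stay attached to the pieces.
foreach (string token in data)
{
    Console.WriteLine(token);
}
// Prints:
// "myfile-one
// two"
// "1"
// 3
```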
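You may match any substring between double quotes whose opening quote is preceded by whitespace (or the start of the string) and whose closing quote is followed by whitespace (or the end of the string), capturing what is inside the quotes into a named group; otherwise, match any run of one or more non-whitespace characters and capture it into an identically named group. Then use: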
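using System.Linq;
using System.Text.RegularExpressions;

var results = Regex.Matches(str, @"(?<!\S)""(?<o>.*?)""(?!\S)|(?<o>\S+)")
    .Cast<Match>()                      // MatchCollection -> IEnumerable<Match>
    .Select(m => m.Groups["o"].Value)   // text captured by whichever alternative matched
    .ToList();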
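As a minimal, self-contained sketch (assuming .NET 5 or later for top-level statements; the sample line is the one from the question, written as a verbatim string), the pattern can be run end to end like this:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The line from the question; in a verbatim string, "" stands for one ".
string str = @"""myfile-one two"" ""1"" 3 1453454.00 -134557.63 585.0 24444.8 -999 ""NULL"" """" 45.60 """" 67°32'5.23455""N 54°56'65.3454""W ""NULL"" 6.00";

// Alternative 1: a whitespace-delimited "..." field, contents captured in group o.
// Alternative 2: any run of non-whitespace characters, also captured in group o.
var results = Regex.Matches(str, @"(?<!\S)""(?<o>.*?)""(?!\S)|(?<o>\S+)")
    .Cast<Match>()
    .Select(m => m.Groups["o"].Value)
    .ToList();

// Bracket each value so that empty fields are visible in the output.
foreach (string value in results)
{
    Console.WriteLine($"[{value}]");
}
```

For the sample line this yields 16 values. The two "" fields come back as empty strings, which could be dropped with .Where(s => s.Length > 0) if they are not needed.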
Pattern details:
- (?<!\S) - a whitespace or the start of the string is required immediately to the left of the current location
- " - a double quotation mark
- (?<o>.*?) - Group "o": any 0+ characters other than a newline, as few as possible
- " - a double quotation mark
- (?!\S) - a whitespace or the end of the string is required immediately to the right of the current location
- | - or
- (?<o>\S+) - Group "o": any 1+ non-whitespace characters

.NET allows identically named groups within one regex pattern and stores their captures in a single group, so whichever alternative matched, its text can be collected via .Select(m => m.Groups["o"].Value).
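The lookarounds and the shared group name can be checked in isolation. This is a small sketch; the helper Field and the three sample inputs are hypothetical, chosen only to exercise each alternative:

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"(?<!\S)""(?<o>.*?)""(?!\S)|(?<o>\S+)");

// Both alternatives use the name "o", so .NET exposes a single group for them.
string[] names = regex.GetGroupNames(); // "0" (whole match) and "o"

// Hypothetical helper: the "o" value of the first match in the input.
string Field(string input) => regex.Match(input).Groups["o"].Value;

string quoted = Field("\"NULL\" 6.00");      // alternative 1: surrounding quotes stripped
string embedded = Field("5.23455\"N rest");  // alternative 2: a quote inside a token is kept
string spaced = Field("\"a b\" c");          // a quoted field keeps its inner space

Console.WriteLine(string.Join(", ", names));
Console.WriteLine($"[{quoted}] [{embedded}] [{spaced}]");
```

The second case shows why the lookarounds matter: the quote in 5.23455"N is not preceded by whitespace, so it is never treated as the start of a quoted field.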
Since a regex can add noticeable overhead and the described scenario is quite simple, I would like to offer a short, regex-free solution that uses only string members. In addition, the regex-free approach is arguably easier to read.
using System;
using System.Collections.Generic;
using System.Linq;

// The escaped input string (in a verbatim string, "" stands for one ")
var input = @"""myfile-one two"" ""1"" 3 1453454.00 -134557.63 585.0 24444.8 -999 ""NULL"" """" 45.60 """" 67°32'5.23455""N 54°56'65.3454""W ""NULL"" 6.00 ";
List<string> cleanedInputTokens = input
    .Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries)  // split on spaces
    .Select(token => token.Trim('"'))                           // strip leading/trailing quotes
    .ToList();
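The algorithm first splits the input into tokens and then trims the specified leading and trailing characters (here, the double quote) from each token. Because Split(Char[], StringSplitOptions) and Trim(Char[]) both accept an array of characters, this pattern is also extensible and flexible. Note, however, that the split happens before the quotes are trimmed, so a quoted field that contains a space, such as "myfile-one two", ends up as two tokens (myfile-one and two).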
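When quoted fields may contain spaces, a hand-written, quote-aware splitter (the kind of custom split the question mentions) avoids regex without breaking those fields apart. The sketch below is not part of this answer: SplitFields is a hypothetical helper that mirrors the rules of the regex pattern, assuming single-line input and no escaped quotes inside quoted fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper. A field starting with " runs to the next " that is followed
// by whitespace or the end of the line; any other field runs to the next whitespace,
// so quotes inside a token (as in 67°32'5.23455"N) are kept.
List<string> SplitFields(string line)
{
    var fields = new List<string>();
    int i = 0;
    while (i < line.Length)
    {
        if (char.IsWhiteSpace(line[i])) { i++; continue; }  // skip separators

        int close = -1;
        if (line[i] == '"')
        {
            // Find a closing quote followed by whitespace or the end of the line.
            for (int j = i + 1; j < line.Length; j++)
            {
                if (line[j] == '"' && (j + 1 == line.Length || char.IsWhiteSpace(line[j + 1])))
                {
                    close = j;
                    break;
                }
            }
        }

        if (close >= 0)
        {
            fields.Add(line.Substring(i + 1, close - i - 1));  // contents without the quotes
            i = close + 1;
        }
        else
        {
            int end = i;
            while (end < line.Length && !char.IsWhiteSpace(line[end])) end++;
            fields.Add(line.Substring(i, end - i));  // unquoted field, taken as-is
            i = end;
        }
    }
    return fields;
}

var input = @"""myfile-one two"" ""1"" 3 1453454.00 -134557.63 585.0 24444.8 -999 ""NULL"" """" 45.60 """" 67°32'5.23455""N 54°56'65.3454""W ""NULL"" 6.00 ";
List<string> tokens = SplitFields(input);
foreach (string t in tokens) Console.WriteLine($"[{t}]");
```

This keeps "myfile-one two" as a single field and, like the regex, returns the two "" fields as empty strings.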