
Stripping a text line with a regular expression in C#

In the text shown below, I need to extract the text between the double quotes (the input is a text file):

Tag = "571EC002A-TD"

Tag = "571GI001-RUN"

Tag = "571GI001-TD"

The output should be:

571EC002A-TD

571GI001-RUN

571GI001-TD

How should I write my regex in C# to match this and save the result to a text file?

I managed to read all the lines into my code, but the regex gives me some undesirable values.

Thanks in advance.

A simple regex could be:

Regex tagRegex = new Regex(@"Tag\s?=\s?""(.+?)""");

Example with your input
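As a rough sketch of how that pattern could be applied to the sample lines from the question (the variable names `lines` and `extracted` are illustrative, not from the original post):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample lines copied from the question.
string[] lines =
{
    "Tag = \"571EC002A-TD\"",
    "Tag = \"571GI001-RUN\"",
    "Tag = \"571GI001-TD\""
};

// Same pattern as above; group 1 captures the text between the quotes.
Regex tagRegex = new Regex(@"Tag\s?=\s?""(.+?)""");

var extracted = new List<string>();
foreach (string line in lines)
{
    Match match = tagRegex.Match(line);
    if (match.Success)
    {
        extracted.Add(match.Groups[1].Value);
    }
}

// Print one extracted value per line.
foreach (string value in extracted)
{
    Console.WriteLine(value);
}
```

Lines that do not match are simply skipped, which may help with the "undesirable values" mentioned in the question.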

UPDATE

For those who ask why not use String.Substring: the great advantage of regular expressions over string operations is that they don't generate temporary strings until you actually ask for a matched value. Matches and groups contain only indexes into the source string. This can be a huge advantage when processing log files.
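A minimal sketch of that point, assuming the first sample line from the question: a Group exposes Index and Length, so the position of a value can be inspected without reading Value. The pattern here is a simplified one chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Tag = \"571EC002A-TD\"";

// Group 1 is the text between the first pair of double quotes.
Group group = Regex.Match(input, "\"(.+?)\"").Groups[1];

// Position information is available without touching Value.
Console.WriteLine($"{group.Index}, {group.Length}"); // 7, 12

// The substring is only requested here.
string value = group.Value;
Console.WriteLine(value); // 571EC002A-TD
```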


You can match the content of a tag using a regex like

Tag\s*=\s*"(?<tagValue>.*?)"

The ? in .*? makes the quantifier non-greedy, i.e. only the text up to the first closing double quote is captured. Otherwise the pattern would match everything up to the last double quote.
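To make the difference visible, here is a small sketch using a hypothetical line that contains two quoted values (the input string is an assumption for illustration, not from the question):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical line with two quoted values on it.
string input = "Tag = \"A\" Other = \"B\"";

// Non-greedy: stops at the first closing quote.
string lazy = Regex.Match(input, "Tag\\s*=\\s*\"(.*?)\"").Groups[1].Value;

// Greedy: runs on to the last quote on the line.
string greedy = Regex.Match(input, "Tag\\s*=\\s*\"(.*)\"").Groups[1].Value;

Console.WriteLine(lazy);   // A
Console.WriteLine(greedy); // A" Other = "B
```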

(?<tagValue>.*?) defines a named group. This way you can refer to the captured value by name and even use LINQ to process the values.

The resulting C# code may look like this after escaping:

// Requires: using System.Linq; using System.Text.RegularExpressions;
var myRegex=new Regex("Tag\\s*=\\s*\"(?<tagValue>.*?)\"");
...
var tags=myRegex.Matches(someText)
                .OfType<Match>()
                .Select(match=>match.Groups["tagValue"].Value);

The result is an IEnumerable<string> with all tag values. You can convert it to an array or a List using ToArray() or ToList(), just like any other IEnumerable<T>.

The equivalent code using a loop would be

var myRegex=new Regex("Tag\\s*=\\s*\"(?<tagValue>.*?)\"");
...
List<string> tagValues=new List<string>();
foreach(Match m in myRegex.Matches(someText))
{
    // Collect the named group's value for every match.
    tagValues.Add(m.Groups["tagValue"].Value);
}

The LINQ version, though, is easy to extend. For example, File.ReadLines returns an IEnumerable<string> and yields lines as they are read instead of loading the whole file into memory first. You could write something like:

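// OfType<Match>() goes inside the lambda: on .NET Framework, MatchCollection
// is not generic, so SelectMany cannot infer the element type without it.
var tags=File.ReadLines(myBigLog)
             .SelectMany(line=>myRegex.Matches(line).OfType<Match>())
             .Select(match=>match.Groups["tagValue"].Value);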
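Since the question also asks about saving the result to a text file, the pipeline could be completed roughly as follows. This is only a sketch: the file names and temp-folder paths are hypothetical, and the input file is created here just so the example is self-contained.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical paths; a temp folder keeps the sketch self-contained.
string inputPath = Path.Combine(Path.GetTempPath(), "tags-input.txt");
string outputPath = Path.Combine(Path.GetTempPath(), "tags-output.txt");

// Stand-in for the real input file, using the lines from the question.
File.WriteAllLines(inputPath, new[]
{
    "Tag = \"571EC002A-TD\"",
    "Tag = \"571GI001-RUN\"",
    "Tag = \"571GI001-TD\""
});

var tagRegex = new Regex("Tag\\s*=\\s*\"(?<tagValue>.*?)\"");

// ReadLines streams the input; Cast<Match>() gives SelectMany a typed sequence.
var tags = File.ReadLines(inputPath)
               .SelectMany(line => tagRegex.Matches(line).Cast<Match>())
               .Select(match => match.Groups["tagValue"].Value);

// WriteAllLines enumerates the query, so the file is read and written in one pass.
File.WriteAllLines(outputPath, tags);

Console.WriteLine(File.ReadAllText(outputPath));
```

The input and output paths must differ, because the input file is still open while the output is being written.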

If the tag names vary, you could also capture the tag name. For example, if all tag names start with the prefix tag, you could use the pattern:

(?<tagName>tag\w+)\s*=\s*"(?<tagValue>.*?)"

And extract both the tag name and the value in the Select function, e.g.:

.Select(match=>new {
             TagName=match.Groups["tagName"].Value,
             Value=match.Groups["tagValue"].Value
});
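Put together, that might look like the sketch below. The input text and the names tagPump and tagValve are made up for illustration; the pattern is case-sensitive, so it expects a lowercase tag prefix followed by at least one word character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: tag names that share the lowercase prefix "tag".
string text = "tagPump = \"571EC002A-TD\"\ntagValve = \"571GI001-RUN\"";

var regex = new Regex("(?<tagName>tag\\w+)\\s*=\\s*\"(?<tagValue>.*?)\"");

// Project every match into an anonymous object with both captured groups.
var pairs = regex.Matches(text)
                 .Cast<Match>()
                 .Select(m => new
                 {
                     TagName = m.Groups["tagName"].Value,
                     Value = m.Groups["tagValue"].Value
                 })
                 .ToList();

foreach (var pair in pairs)
{
    Console.WriteLine($"{pair.TagName} -> {pair.Value}");
}
```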

Regex.Matches is thread-safe, which means you can create one static Regex object and use it repeatedly, or even use PLINQ to match multiple lines in parallel simply by adding AsParallel() before the call to SelectMany.
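A sketch of the PLINQ variant, assuming the lines are already in memory. AsOrdered() and RegexOptions.Compiled are additions for this example rather than part of the answer above: the first keeps results in input order, the second is an optional optimization for a regex that is reused many times.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// One shared instance, reused by all worker threads.
var tagRegex = new Regex("Tag\\s*=\\s*\"(?<tagValue>.*?)\"", RegexOptions.Compiled);

// Sample lines from the question; a real log would be much larger.
string[] lines =
{
    "Tag = \"571EC002A-TD\"",
    "Tag = \"571GI001-RUN\"",
    "Tag = \"571GI001-TD\""
};

var tags = lines.AsParallel()
                .AsOrdered() // keep results in input order
                .SelectMany(line => tagRegex.Matches(line).Cast<Match>())
                .Select(m => m.Groups["tagValue"].Value)
                .ToList();

foreach (string tag in tags)
{
    Console.WriteLine(tag);
}
```

For a handful of lines the parallel overhead outweighs any gain; this only pays off on large inputs.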

If those strings will always look like that, you can take a simpler approach and just use Substring:

line.Substring(7, line.Length - 8)

That will give you your desired output.
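A short sketch of where those offsets come from, and of what happens when the format shifts. The second input, without spaces around the equals sign, is a hypothetical variant and not from the question.

```csharp
using System;

string line = "Tag = \"571EC002A-TD\"";

// The prefix Tag = " is 7 characters and the closing quote is 1 more,
// hence start index 7 and length Length - 8.
string value = line.Substring(7, line.Length - 8);
Console.WriteLine(value); // 571EC002A-TD

// Hypothetical variant without spaces: the fixed offsets no longer line up.
string compact = "Tag=\"571EC002A-TD\"";
string shifted = compact.Substring(7, compact.Length - 8);
Console.WriteLine(shifted); // 1EC002A-TD
```

The fixed offsets are fine for strictly uniform input, but any variation in spacing silently produces wrong values, which is where the regex approach is more forgiving.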
