
Stripping text lines with a regular expression in C#

In the text shown below, I need to extract the information between the double quotes (the input is a text file).

Tag = "571EC002A-TD"

Tag = "571GI001-RUN"

Tag = "571GI001-TD"

The output should be:

571EC002A-TD

571GI001-RUN

571GI001-TD

How should I frame my regex in C# to match this and save the result to a text file?

I got as far as reading all the lines into my code, but the regex gives me some undesirable values.

Thanks in advance.

A simple regex could be:

Regex tagRegex = new Regex(@"Tag\s?=\s?""(.+?)""");

Example with your input
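As a minimal sketch of how that pattern behaves on the sample lines from the question, the following applies it line by line and collects the first capture group. The variable names (`lines`, `results`) are illustrative and not part of the original answer.

```csharp
// C# 9+ top-level program.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample lines copied from the question.
string[] lines =
{
    "Tag = \"571EC002A-TD\"",
    "Tag = \"571GI001-RUN\"",
    "Tag = \"571GI001-TD\"",
};

// Same pattern as above; group 1 holds the text between the quotes.
var tagRegex = new Regex(@"Tag\s?=\s?""(.+?)""");

var results = new List<string>();
foreach (string line in lines)
{
    Match m = tagRegex.Match(line);
    if (m.Success)
    {
        results.Add(m.Groups[1].Value);
    }
}

foreach (string value in results)
{
    Console.WriteLine(value);
}
```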

UPDATE

For those who ask why not use String.Substring: the great advantage of regular expressions over string operations is that they don't generate temporary strings until you actually ask for a matched value. Matches and groups contain only indexes into the source string. This can be a huge advantage when processing log files.
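To illustrate the point about indexes, here is a small sketch that inspects a capture through its Index and Length properties before reading Value. The sample string is taken from the question; the variable names are illustrative only.

```csharp
// C# 9+ top-level program.
using System;
using System.Text.RegularExpressions;

string text = "Tag = \"571EC002A-TD\"";
Match match = Regex.Match(text, @"Tag\s?=\s?""(.+?)""");
Group group = match.Groups[1];

// Index and Length describe where the capture sits inside `text`.
int index = group.Index;
int length = group.Length;
Console.WriteLine($"index={index}, length={length}");

// Reading Value is the step that hands back the captured text as a string.
string value = group.Value;
Console.WriteLine(value);
```

If only positions are needed, for example to count or locate tags in a large log, the Value property never has to be read.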


You can match the content of a tag using a regex like

Tag\s*=\s*"(?<tagValue>.*?)"

The ? in .*? results in a non-greedy search, i.e. only text up to the first double quote is extracted. Otherwise the pattern would match everything up to the last double quote.
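The difference between the greedy and non-greedy forms is easiest to see on a line that contains more than one quoted value. The sketch below assumes a hypothetical input line with a second `Unit = "A1"` field, which does not appear in the question.

```csharp
// C# 9+ top-level program.
using System;
using System.Text.RegularExpressions;

// Hypothetical line with two quoted values.
string line = "Tag = \"571EC002A-TD\" Unit = \"A1\"";

// Greedy: .* runs to the last double quote on the line.
string greedy = Regex.Match(line, @"Tag\s*=\s*""(.*)""").Groups[1].Value;

// Non-greedy: .*? stops at the first double quote after the value starts.
string lazy = Regex.Match(line, @"Tag\s*=\s*""(.*?)""").Groups[1].Value;

Console.WriteLine(greedy); // 571EC002A-TD" Unit = "A1
Console.WriteLine(lazy);   // 571EC002A-TD
```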

(?<tagValue>.*?) defines a named group. This way you can refer to the captured value by name and even use LINQ to process the values.

The resulting C# code may look like this after escaping:

var myRegex=new Regex("Tag\\s*=\\s*\"(?<tagValue>.*?)\"");
...
var tags=myRegex.Matches(someText)
                .OfType<Match>()
                .Select(match=>match.Groups["tagValue"].Value);

The result is an IEnumerable<string> with all tag values. You can convert it to an array or List using ToArray() or ToList(), just like any other IEnumerable.

The equivalent code using a loop would be

var myRegex=new Regex("Tag\\s*=\\s*\"(?<tagValue>.*?)\"");
...
List<string> tagValues=new List<string>();
foreach(Match m in myRegex.Matches(someText))
{
    tagValues.Add(m.Groups["tagValue"].Value);
}

The LINQ version, though, can be extended very easily. For example, File.ReadLines returns an IEnumerable<string> and doesn't wait to load everything into memory before returning. You could write something like:

var tags=File.ReadLines(myBigLog)
             .SelectMany(line=>myRegex.Matches(line).OfType<Match>())
             .Select(match=>match.Groups["tagValue"].Value);
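Since the question also asks about saving the result to a text file, here is one possible end-to-end sketch. The input and output paths are hypothetical temp files created only to keep the example self-contained; in practice they would be the real log and output paths.

```csharp
// C# 9+ top-level program.
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

var myRegex = new Regex(@"Tag\s*=\s*""(?<tagValue>.*?)""");

// Hypothetical paths; temp files stand in for the real input and output.
string inputPath = Path.GetTempFileName();
string outputPath = Path.GetTempFileName();
File.WriteAllLines(inputPath, new[]
{
    "Tag = \"571EC002A-TD\"",
    "Tag = \"571GI001-RUN\"",
    "Tag = \"571GI001-TD\"",
});

// Lazy query: lines are read and matched only as the result is enumerated.
var tags = File.ReadLines(inputPath)
               .SelectMany(line => myRegex.Matches(line).OfType<Match>())
               .Select(match => match.Groups["tagValue"].Value);

// WriteAllLines enumerates the query and writes one value per line.
File.WriteAllLines(outputPath, tags);

string[] written = File.ReadAllLines(outputPath);
Console.WriteLine(string.Join(Environment.NewLine, written));

// Remove the temp files created for this sketch.
File.Delete(inputPath);
File.Delete(outputPath);
```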

If the tag names vary, you could also capture the tag name. If, for example, all tags share a tag prefix, you could use the pattern:

(?<tagName>tag\w+)\s*=\s*"(?<tagValue>.*?)"

And extract both the tag name and the value in the Select function, e.g.:

.Select(match=>new {
             TagName=match.Groups["tagName"].Value,
             Value=match.Groups["tagValue"].Value
});
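Put together, a runnable sketch of the name-and-value extraction might look like this. The tag names `tagPump` and `tagValve` are invented for illustration; the question's input only uses `Tag`.

```csharp
// C# 9+ top-level program.
using System;
using System.Linq;
using System.Text.RegularExpressions;

var regex = new Regex(@"(?<tagName>tag\w+)\s*=\s*""(?<tagValue>.*?)""");

// Hypothetical input with two differently named tags.
string text = "tagPump = \"571EC002A-TD\"\ntagValve = \"571GI001-RUN\"";

var pairs = regex.Matches(text)
                 .OfType<Match>()
                 .Select(match => new
                 {
                     TagName = match.Groups["tagName"].Value,
                     Value = match.Groups["tagValue"].Value
                 })
                 .ToList();

foreach (var pair in pairs)
{
    Console.WriteLine($"{pair.TagName}: {pair.Value}");
}
```

Note that the pattern is case-sensitive, so it would not match the capitalised `Tag` from the question unless RegexOptions.IgnoreCase is passed or the prefix is changed.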

Regex.Matches is thread-safe, which means you can create one static Regex object and use it repeatedly, or even use PLINQ to match multiple lines in parallel simply by adding AsParallel() before the call to SelectMany.
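A sketch of the PLINQ variant follows, using an in-memory array in place of a log file. The call to AsOrdered() is an addition not mentioned above; it is included on the assumption that the output should keep the input order, and can be dropped when order does not matter.

```csharp
// C# 9+ top-level program.
using System;
using System.Linq;
using System.Text.RegularExpressions;

// One shared Regex instance is used by all worker threads.
var myRegex = new Regex(@"Tag\s*=\s*""(?<tagValue>.*?)""");

string[] lines =
{
    "Tag = \"571EC002A-TD\"",
    "Tag = \"571GI001-RUN\"",
    "Tag = \"571GI001-TD\"",
};

var tags = lines.AsParallel()
                .AsOrdered() // assumption: keep results in input order
                .SelectMany(line => myRegex.Matches(line).OfType<Match>())
                .Select(match => match.Groups["tagValue"].Value)
                .ToList();

tags.ForEach(Console.WriteLine);
```

For three short lines the parallel overhead will likely outweigh any gain; the approach is aimed at large inputs.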

If those strings will always look like that, you can take a simpler approach and just use Substring:

line.Substring(7, line.Length - 8)

That will give you the desired output.
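Applied to the three sample lines, the Substring call can be sketched as follows. It assumes every line starts with exactly `Tag = "` (7 characters) and ends with the closing quote; the variable names are illustrative.

```csharp
// C# 9+ top-level program.
using System;
using System.Collections.Generic;

string[] lines =
{
    "Tag = \"571EC002A-TD\"",
    "Tag = \"571GI001-RUN\"",
    "Tag = \"571GI001-TD\"",
};

var values = new List<string>();
foreach (string line in lines)
{
    // Skip the 7-character prefix `Tag = "` and drop the trailing quote:
    // 7 leading characters + 1 trailing character = 8 removed in total.
    values.Add(line.Substring(7, line.Length - 8));
}

values.ForEach(Console.WriteLine);
```

This relies on the fixed layout: a line shorter than 8 characters makes Substring throw an ArgumentOutOfRangeException, and extra spacing or trailing text would shift the result.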
