
Regex to clean data from text between delimiters

I have some data that I want to process. It looks something like this:

[data]3456[/data]df[data]3424[/data]33[data]4324[/data]2214[data]3421[/data].. goes on

Anything between a [/data] tag and the next [data] tag is just filler that I need to remove before the data can be used further, so in the example above I am trying to remove df, 33 and 2214. I would like to use a regex, but I don't have much experience with them. The data is in a .txt file and is read line by line. Any help will be appreciated!

while((line = reader.ReadLine()) !=null)
{
writer.WriteLine(Regex.Replace(line, ?? ,));
}

Small edit to the question: This scenario is also possible:

[data]3456[/data]456
435[data]4532[/data]

What to do in such a case?

Approach 1

We just collect all [data]...[/data] blocks:

// Requires: using System.Linq; using System.Text.RegularExpressions;
// Declare the regex as a private static readonly field so it is built only once
private static readonly Regex rx = new Regex(@"\[data\].*?\[/data\]", RegexOptions.Compiled);
// ...and then, in the caller, join every match and drop everything else
writer.WriteLine(string.Join(string.Empty, rx.Matches(line).Cast<Match>().Select(p => p.Value).ToArray()));
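For reference, the collect-and-join idea can be dropped into the read/write loop from the question roughly as follows. This is only a sketch: the class name DataBlockCollector, the method names, and the inputPath/outputPath parameters are placeholders that do not appear in the original post.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class DataBlockCollector
{
    // Matches one [data]...[/data] block; the lazy .*? stops at the first closing tag.
    private static readonly Regex Rx =
        new Regex(@"\[data\].*?\[/data\]", RegexOptions.Compiled);

    // Concatenates every block found in the line; anything outside a block is dropped.
    public static string KeepBlocks(string line) =>
        string.Concat(Rx.Matches(line).Cast<Match>().Select(m => m.Value));

    // Hypothetical driver: inputPath and outputPath are placeholders for the real files.
    public static void CleanFile(string inputPath, string outputPath)
    {
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(KeepBlocks(line));
            }
        }
    }
}
```

Because only the matches are kept, filler at the start or end of a line (as in the edited scenario) disappears along with filler between blocks.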

Approach 2

You can use the following regex for a search & replace operation:

[^[\]]*(\[data\][^[]*\[/data\])[^[\]]*

With $1 as replacement.

In the demo, the result is [data]3456[/data][data]3424[/data][data]4324[/data][data]3421[/data] for Input 1 and [data]3456[/data][data]4532[/data] for Input 2.

In C#:

writer.WriteLine(Regex.Replace(line, @"[^[\]]*(\[data\][^[]*\[/data\])[^[\]]*", "$1"));
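To make the pattern easier to read, here is the same expression spread over several lines with RegexOptions.IgnorePatternWhitespace, so that each part can carry a comment. The class name FillerStripper and the method name Strip are illustrative only and are not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class FillerStripper
{
    // Same pattern as the one-liner, written in verbose mode.
    private static readonly Regex Rx = new Regex(@"
        [^[\]]*        # filler before the block: a run of non-bracket characters
        (              # group 1: the block to keep
          \[data\]     #   literal opening tag
          [^[]*        #   payload: everything up to the next opening bracket
          \[/data\]    #   literal closing tag
        )
        [^[\]]*        # filler after the block
        ", RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled);

    // Replaces each match (filler + block + filler) with the block alone.
    public static string Strip(string line) => Rx.Replace(line, "$1");
}
```

Since the filler parts are matched on both sides of the captured block, a call such as FillerStripper.Strip("435[data]4532[/data]") should leave only [data]4532[/data].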

Approach 3

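Alternatively, you can use Regex.Split and then string.Join() the pieces. Note that the lookbehind requires a complete [data]...[/data] block to the left of the filler, so filler at the very start of a line is left in place: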

var splts = Regex.Split(line, @"(?<=\[data\].*?\[/data\]).*?(?=\[data\]|$)");
writer.WriteLine(string.Join("", splts));


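Another option is to replace whatever sits between [/data] and [data] with an empty string, using a lookbehind and a lookahead so that the tags themselves are not consumed:

Console.WriteLine(Regex.Replace("[data]3456[/data]df[data]3424[/data]33[data]4324[/data]2214[data]3421[/data]",
    @"(?<=\[/data\]).*?(?=\[data\])", string.Empty));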
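The lookaround replacement only touches text that has [/data] on its left and [data] on its right, so in the edited scenario the trailing 456 and the leading 435 would survive. One possible way to cover those edges, offered here as an assumption rather than as part of the original answer, is to also accept the start of the line on the left and the end of the line on the right. The class and method names below are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class BetweenTagsCleaner
{
    // Pattern from the answer: filler strictly between a closing and the next opening tag.
    private static readonly Regex Between =
        new Regex(@"(?<=\[/data\]).*?(?=\[data\])");

    // Assumed variant (not from the answer): the filler may also start at the
    // beginning of the line or run up to the end of the line.
    private static readonly Regex WithEdges =
        new Regex(@"(?:^|(?<=\[/data\])).*?(?=\[data\]|$)");

    public static string RemoveBetween(string line) =>
        Between.Replace(line, string.Empty);

    public static string RemoveWithEdges(string line) =>
        WithEdges.Replace(line, string.Empty);
}
```

With this variant, a line that contains no [data] block at all is reduced to an empty string, which may or may not be what you want for your file.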
