How can I use lookbehind in a C# Regex in order to remove line breaks?

I have a text file with a repetitive structure: a header followed by detail records, such as

StopService::
697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::

I want to remove the line break between the header and the detail record so that I can process them as a single record. Because the detail record can itself contain line breaks, I need to remove only the line breaks that directly follow the :: sign.

I'm not a pro with regular expressions, so I searched and tried this approach, but it doesn't work:

 string text = File.ReadAllText(path);
 Regex.Replace(text, @"(?<=(:))(?!\1):\n", String.Empty);
 File.WriteAllText(path, text);

I also tried this:

Regex.Replace(text, @"(?<=::)\n", String.Empty);

Any idea how I can use a regex look-behind in this case? My output should look like this:

StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
    A@gmail.com::0::::

Non-regex Way

Read the file line by line. Check the first line, and if it is equal to StopService::, do not add a newline (Environment.NewLine) after it.
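
A minimal sketch of that idea, under the assumption that every header line is exactly StopService:: (it checks each line rather than only the first, so repeated records are joined too). The class name and the sample input are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

class JoinHeaderLines
{
    // Copies the input line by line, skipping the newline after each header line.
    static string Join(TextReader reader)
    {
        var sb = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            sb.Append(line);
            // Assumption: a header line is exactly "StopService::".
            if (line != "StopService::")
                sb.Append(Environment.NewLine);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Illustrative input only; a real program would pass a StreamReader for the file.
        var input = "StopService::\r\n697::12::test\r\nA@gmail.com::0::::\r\n";
        Console.Write(Join(new StringReader(input)));
    }
}
```

Line breaks inside the detail record are kept, because only lines equal to the header suppress the newline.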


Regex way

You can match the line break after the first :: using a (?<=^[^:]*::) look-behind:

var str = "StopService::\r\n697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to\r\nA@gmail.com::0::::";
var rgx = new Regex(@"(?<=^[^:]*::)[\r\n]+");
Console.WriteLine(rgx.Replace(str, string.Empty));

Output:

StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::

The look-behind ( (?<=...) ) matches:

  • ^ - Start of string
  • [^:]* - 0 or more characters other than :
  • :: - 2 colons

The [\r\n]+ pattern makes sure we match all newline symbols, even if there is more than one.
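
Because ^ anchors at the start of the string, the pattern above only joins the first header. If the file holds several records, one possible variant (a sketch, not part of the original answer) adds RegexOptions.Multiline so that ^ matches at the start of every line, and excludes \r and \n from the character class so the look-behind stays on a single line. The sample text below is invented.

```csharp
using System;
using System.Text.RegularExpressions;

class MultiRecordDemo
{
    static void Main()
    {
        var text =
            "StopService::\r\n697::12::Please send me Information to\r\nA@gmail.com::0::::\r\n" +
            "StopService::\r\n698::13::second record::0::::\r\n";

        // Multiline: ^ matches at the start of each line, not only at the start of the string.
        // [^:\r\n]* keeps the look-behind from reaching back across a line break.
        var rgx = new Regex(@"(?<=^[^:\r\n]*::)\r?\n", RegexOptions.Multiline);

        // Only the break after each "StopService::" header is removed.
        Console.Write(rgx.Replace(text, string.Empty));
    }
}
```

One caveat: any line that consists of colon-free text followed by a single trailing :: is treated as a header, so a detail line of that shape would be joined as well.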

Try this:

Regex.Replace(yourtext, @"(?<=::)(?:\r\n|\n|\r)", string.Empty);

You were on the right track with the lookbehind idea, but you need to match a carriage return, a newline, or both together...
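
A related point about the code in the question: Regex.Replace returns a new string and does not modify its argument, so the result has to be assigned before it is written back. A small sketch of the full round trip follows; the temp-file path and sample contents are hypothetical.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

class FileRoundTrip
{
    static void Main()
    {
        // Hypothetical file, created here only so the example is self-contained.
        string path = Path.Combine(Path.GetTempPath(), "records.txt");
        File.WriteAllText(path, "StopService::\r\n697::12::test\r\nA@gmail.com::0::::");

        string text = File.ReadAllText(path);
        // Strings are immutable: keep the return value, or the replacement is lost.
        text = Regex.Replace(text, @"(?<=::)(?:\r\n|\n|\r)", string.Empty);
        File.WriteAllText(path, text);

        Console.WriteLine(File.ReadAllText(path));
    }
}
```

Note that this pattern removes a line break after every ::, including one that follows the closing :::: of a record, so it suits input where that is acceptable.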

Here's my quick attempt. It may need some tweaks, as I just dummied up two records for input.

The approach is to define a Regex that identifies the header, line break, and detail (which may include line breaks). Then, just run a replace that puts the header back together with the detail, throwing out the header/detail line break.

The RegexOptions.IgnorePatternWhitespace option is used to allow whitespace in the expression for better readability.

var text = "StopService::" + Environment.NewLine;
text += "697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to" + Environment.NewLine;
text += "A@gmail.com::0::::"  + Environment.NewLine;
text += "StopService::" + Environment.NewLine;
text += "697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to" + Environment.NewLine;
text += "A@gmail.com::0::::"  + Environment.NewLine;

var options = RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace;
// \r?\n accepts both Windows (\r\n) and Unix (\n) line breaks, whatever Environment.NewLine is.
var matchRegex = new Regex(@"(?<header>\w+?::) \r?\n (?<detail>.+?::::)", options);
var replacement = "${header}${detail}";

var newText = matchRegex.Replace(text,replacement);

Produces:

StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::
StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::

JavaScript:

yourtext.replace(/(\r\n|\n|\r)/gm," ");

I haven't tested the C# version, but it should look something like this. Note that .NET patterns take no /.../gm delimiters or flags.

C#:

Regex.Replace(yourtext, @"\r\n|\n|\r", " ");
