
How can I use lookbehind in a C# Regex in order to remove line breaks?

I have a text file with a repetitive structure of a header followed by a detail record, such as:

StopService::
697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::

I want to remove the line break between the header and the detail record so that I can process them as a single record. Because the detail record can contain line breaks as well, I need to remove only the line breaks that directly follow the :: sign.

I'm not a pro with regular expressions, so I searched and tried this approach, but it doesn't work:

 string text = File.ReadAllText(path);
 Regex.Replace(text, @"(?<=(:))(?!\1):\n", String.Empty);
 File.WriteAllText(path, text);

I also tried this:

Regex.Replace(text, @"(?<=::)\n", String.Empty);

Any idea how I can use a regex look-behind in this case? My output should look like this:

StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
    A@gmail.com::0::::

Non-regex way

Read the file line by line. Check the first line and, if it is equal to StopService::, do not add a newline ( Environment.NewLine ) after it.
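The line-by-line idea can be sketched as follows. This is only an illustration: it assumes that a header is any line that ends with :: and contains no other colon (as StopService:: does), and it applies the check to every line rather than just the first. The names HeaderJoiner, Join, and IsHeader are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HeaderJoiner
{
    // Hypothetical helper: appends every line to the output, but writes a
    // line break only after lines that are NOT headers, so each header is
    // glued to the detail line that follows it.
    public static string Join(IEnumerable<string> lines)
    {
        var sb = new StringBuilder();
        foreach (var line in lines)
        {
            sb.Append(line);
            if (!IsHeader(line))
            {
                sb.Append(Environment.NewLine);
            }
        }
        return sb.ToString();
    }

    // Assumption: a header ends with "::" and that is its only pair of colons.
    private static bool IsHeader(string line)
    {
        return line.EndsWith("::", StringComparison.Ordinal)
            && line.IndexOf(':') == line.Length - 2;
    }
}
```

It could be used as File.WriteAllText(path, HeaderJoiner.Join(File.ReadAllLines(path))). Note that this sketch writes a line break after the last line as well.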


Regex way

You can match the line break after the first :: using a (?<=^[^:]*::) look-behind:

var str = "StopService::\r\n697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to\r\nA@gmail.com::0::::";
var rgx = new Regex(@"(?<=^[^:]*::)[\r\n]+");
Console.WriteLine(rgx.Replace(str, string.Empty));

Output:

StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::

See IDEONE demo

The look-behind ( (?<=...) ) matches:

  • ^ - Start of string
  • [^:]* - 0 or more characters other than :
  • :: - 2 colons

The [\r\n]+ pattern makes sure we match all newline symbols, even if there is more than one.
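Two details are easy to miss when applying this to a whole file. Regex.Replace returns a new string rather than modifying its argument, so the result has to be assigned before it is written back. Also, without RegexOptions.Multiline the ^ anchor matches only at the start of the string, so the pattern above joins only the first header. The following is a sketch of one possible variant for a file with many records; it assumes that each header sits alone on its line and contains no colon other than the trailing ::. The names RecordJoiner, JoinHeaders, and RewriteFile are hypothetical.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class RecordJoiner
{
    // Multiline makes ^ match at the start of every line (after each \n).
    // [^:\r\n]* keeps the look-behind from reaching across a line break.
    // (?:\r?\n)+ removes one or more Windows or Unix line breaks.
    private static readonly Regex HeaderBreak = new Regex(
        @"(?<=^[^:\r\n]*::)(?:\r?\n)+",
        RegexOptions.Multiline);

    // Returns a new string; the input string is left unchanged.
    public static string JoinHeaders(string text)
    {
        return HeaderBreak.Replace(text, string.Empty);
    }

    // Reads the file, joins the headers, and writes the result back.
    public static void RewriteFile(string path)
    {
        string text = File.ReadAllText(path);
        text = JoinHeaders(text); // assign the result; Replace does not edit in place
        File.WriteAllText(path, text);
    }
}
```

A detail line such as A@gmail.com::0:::: is left alone because the text between the start of its line and its final :: contains other colons, so the look-behind cannot match there.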

Try this:

Regex.Replace(yourtext, @"(?<=::)(?:\r\n|\n|\r)", string.Empty);

You were on the right track with the lookbehind idea. But you need to match a newline, a carriage return, or both.

Here's my quick attempt. It may need some tweaks, as I just dummied up two records for input.

The approach is to define a Regex that identifies the header, line break, and detail (which may include line breaks). Then, just run a replace that puts the header back together with the detail, throwing out the header/detail line break.

The RegexOptions.IgnorePatternWhitespace option is used to allow whitespace in the expression for better readability.

var text = "StopService::" + Environment.NewLine;
text += "697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to" + Environment.NewLine;
text += "A@gmail.com::0::::"  + Environment.NewLine;
text += "StopService::" + Environment.NewLine;
text += "697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to" + Environment.NewLine;
text += "A@gmail.com::0::::"  + Environment.NewLine;

var options = RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace;
var matchRegex = new Regex("(?<header>\\w+?::) \\r?\\n (?<detail>.+?::::)", options ); // \r?\n also matches when Environment.NewLine is "\n"
var replacement = "${header}${detail}";

var newText = matchRegex.Replace(text,replacement);

Produces:

StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::
StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::

JavaScript:

yourtext.replace(/(\r\n|\n|\r)/gm," ");

I haven't tested the C# version. It should work something like the code below.

C#:

Regex.Replace(yourtext, @"\r\n|\n|\r", " ");
