How can I use lookbehind in a C# Regex in order to remove line breaks?
I have a text file with a repetitive structure: a header followed by a detail record, such as:
StopService::
697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::
I want to remove the line break between the header and the detail record so that I can process them as a single record. Because the detail record can itself contain line breaks, I need to remove only the line breaks that directly follow the :: sign.
I'm not a pro with regular expressions, so I searched and tried this approach, but it doesn't work:
string text = File.ReadAllText(path);
Regex.Replace(text, @"(?<=(:))(?!\1):\n", String.Empty);
File.WriteAllText(path, text);
I also tried this:
Regex.Replace(text, @"(?<=::)\n", String.Empty);
Any idea how I can use a regex look-behind in this case? My output should look like this:
StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::
Read the file line by line. Check the first line and, if it is equal to StopService::, do not add a newline (Environment.NewLine) after it.
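The line-by-line idea could be sketched roughly as follows. This is only an illustration, not the answerer's code: the HeaderJoiner class and JoinHeaders method are hypothetical names, and the sketch assumes that every line equal to StopService:: (not just the first one) is a header.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HeaderJoiner
{
    // Hypothetical helper: joins each header line to the line that follows it.
    // Assumption: a header is any line that equals "StopService::" exactly.
    public static string JoinHeaders(IEnumerable<string> lines, string header = "StopService::")
    {
        var sb = new StringBuilder();
        foreach (var line in lines)
        {
            sb.Append(line);
            // Emit a line break after every line except a header line.
            if (line != header)
            {
                sb.Append(Environment.NewLine);
            }
        }
        return sb.ToString();
    }
}
```

It could then be called as File.WriteAllText(path, HeaderJoiner.JoinHeaders(File.ReadAllLines(path))). Note that this sketch always ends the output with a line break, even if the input file did not.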
You can match the line break after the first :: using a (?<=^[^:]*::) look-behind:
var str = "StopService::\r\n697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to\r\nA@gmail.com::0::::";
var rgx = new Regex(@"(?<=^[^:]*::)[\r\n]+");
Console.WriteLine(rgx.Replace(str, string.Empty));
Output:
StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::
See IDEONE demo
The look-behind ((?<=...)) matches:
^ - Start of string
[^:]* - 0 or more characters other than :
:: - 2 colons
The [\r\n]+ pattern makes sure we match all newline symbols, even if there is more than one.
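Because ^ matches only at the start of the string by default, the pattern above joins only the first header. If the file holds several records, one possible variation (an assumption on my part, not something stated in the answer) is to add RegexOptions.Multiline so that ^ matches at the start of every line, and to exclude line breaks from the character class. MultiRecordJoiner is a hypothetical name used for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultiRecordJoiner
{
    // With RegexOptions.Multiline, ^ matches at the start of every line,
    // so the look-behind succeeds only when the whole line is "<no colons>::".
    private static readonly Regex HeaderBreak =
        new Regex(@"(?<=^[^:\r\n]*::)\r?\n", RegexOptions.Multiline);

    // Removes the line break that directly follows each header line.
    public static string Join(string text) => HeaderBreak.Replace(text, string.Empty);
}
```

A limitation of this sketch: a detail continuation line that consists solely of colon-free text followed by :: would also be treated as a header.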
Try this:
Regex.Replace(yourtext, @"(?<=::)(?:\r\n|\n|\r)", string.Empty);
You were on the right track with the lookbehind idea, but you need to look for a line feed, a carriage return, or both.
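Putting this together with the read and write steps from the question, a minimal sketch might look like the following. LineBreakFixer and its methods are hypothetical names. The sketch also assigns the return value of Regex.Replace, since .NET strings are immutable and the method returns the modified text rather than changing its argument.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class LineBreakFixer
{
    // Removes a CRLF, LF, or CR that directly follows "::".
    public static string RemoveBreaksAfterDoubleColon(string text)
    {
        return Regex.Replace(text, @"(?<=::)(?:\r\n|\n|\r)", string.Empty);
    }

    // Hypothetical wrapper mirroring the read/replace/write steps from the question.
    public static void FixFile(string path)
    {
        string text = File.ReadAllText(path);
        // Regex.Replace returns the new text, so the result must be assigned.
        text = RemoveBreaksAfterDoubleColon(text);
        File.WriteAllText(path, text);
    }
}
```

Unlike the anchored look-behind, this removes the break after every ::, including the one after the trailing :::: of a record, which may or may not be what you want when the file contains several records.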
Here's my quick attempt. It may need some tweaks, as I just dummied up two records for input.
The approach is to define a Regex that identifies the header, the line break, and the detail (which may include line breaks). Then, just run a replace that puts the header back together with the detail, throwing out the header/detail line break.
The RegexOptions.IgnorePatternWhitespace option is used to allow whitespace in the expression for better readability.
var text = "StopService::" + Environment.NewLine;
text += "697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to" + Environment.NewLine;
text += "A@gmail.com::0::::" + Environment.NewLine;
text += "StopService::" + Environment.NewLine;
text += "697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to" + Environment.NewLine;
text += "A@gmail.com::0::::" + Environment.NewLine;
var options = RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace;
var matchRegex = new Regex("(?<header>\\w+?::) \\r\\n (?<detail>.+?::::)", options );
var replacement = "${header}${detail}";
var newText = matchRegex.Replace(text,replacement);
Produces:
StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::
StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::
JavaScript:
yourtext.replace(/(\r\n|\n|\r)/gm," ");
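I haven't tested the C# version, but it should look something like the code below. Note that .NET patterns do not use the /.../gm delimiters and flags from JavaScript; Regex.Replace already replaces every match.
C#:
Regex.Replace(yourtext, @"\r\n|\n|\r", " ");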