C# Regex Expression Issue
I am trying to parse the following line:
"\#" TEST #comment hello world
In my input, the #comment always comes at the end of the line. There may or may not be a comment, but if there is, it's always at the end of the line.
I used the following Regex to parse it:
(\#.+)?
I have RegexOptions.RightToLeft turned on. I expected it to pull
#comment hello world
But instead it is pulling
#" TEST #comment hello world
Why is my Regex expression not pulling the right thing, and what is the valid Regex expression I need to make it pull correctly?
The important question is: how do you tell the difference between the # near the start of the line (inside the quotes) and the # that starts the comment? Let's assume for simplicity that the last # starts a comment.
In that case, what you want to match is a #, followed by any number of characters that are not #, up to the end of the line.
So let's put that into a regex:
#[^#]*$
You don't need RightToLeft for it. As far as I know, you also don't need to escape # in C# regular expressions; it only has a special meaning (the start of an end-of-line comment) when RegexOptions.IgnorePatternWhitespace is set.
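A minimal C# sketch of this approach, assuming (as above) that the last # on the line always starts the comment. The class name LastHashComment and the convention of returning null when there is no comment are illustrative choices, not part of the answer:

```csharp
using System.Text.RegularExpressions;

public static class LastHashComment
{
    // A '#', then any run of non-'#' characters, anchored at the end of the line.
    // Assumption: the last '#' on the line is the one that starts the comment.
    private static readonly Regex CommentPattern = new Regex("#[^#]*$");

    // Returns the comment including its leading '#', or null if the line has no '#'.
    public static string? Extract(string line)
    {
        Match m = CommentPattern.Match(line);
        return m.Success ? m.Value : null;
    }
}
```

Because [^#]* cannot cross another #, the match can only start at the last # on the line, so no RightToLeft option is needed. The trade-off is that a comment containing # is cut at its last #.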
Of course, if you provide information on how to tell a "valid" # apart from a "comment-starting" #, a more elegant solution could be found that allows for # within comments.
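The + operator tries to match as many times as it can. To match as few times as possible, use its lazy equivalent, +?:
(#.+?)
This relies on RegexOptions.RightToLeft being on: scanning from the end of the line, the lazy .+? stops at the nearest # to its left. Without that option, #.+? would match only #" (the first # plus one character).
Of course, this would give trouble with comments that contain #, where only #world would be pulled:
"\#" TEST #comment #hello #world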
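To see the greedy and lazy quantifiers side by side under RightToLeft, here is a small sketch that runs the question's pattern next to the lazy one. The class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // The question's pattern: with RightToLeft the greedy .+ grows leftward
    // as far as it can, so the match starts at the FIRST '#' on the line.
    public static string Greedy(string line) =>
        Regex.Match(line, @"(\#.+)?", RegexOptions.RightToLeft).Value;

    // The lazy pattern: .+? grows leftward only until a '#' is found,
    // so the match starts at the LAST '#' on the line.
    public static string Lazy(string line) =>
        Regex.Match(line, "#.+?", RegexOptions.RightToLeft).Value;
}
```

On the question's line, Greedy returns #" TEST #comment hello world while Lazy returns #comment hello world; on the #hello #world line, Lazy returns only #world.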
Use " #.+" (note the leading space). I left the \ out of my test string because \# is not a recognized escape sequence in a C# string literal. I left out the (, ) and ? because they weren't needed.
using System;
using System.Text.RegularExpressions;
Regex regex = new Regex(" #.+");
Console.WriteLine(regex.Match("#\" TEST #comment hello world")); // prints " #comment hello world" (leading space included)
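A sketch of a small variation, assuming the comment is always preceded by a space: putting the comment in a capture group keeps that space out of the result. SpaceHashComment is a hypothetical name:

```csharp
using System.Text.RegularExpressions;

public static class SpaceHashComment
{
    // Same idea as " #.+", but the '#' and the rest of the line go into group 1,
    // so the space that precedes the comment is not returned.
    private static readonly Regex Pattern = new Regex(" (#.+)");

    // Returns the comment, or null when no " #" occurs on the line.
    public static string? Extract(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```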
For the test string you've given, this regex pulls the comment correctly (with the RightToLeft option):
/((?: #).+)$/
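In C# the /.../ delimiters are not written; the pattern is passed as a plain string. A minimal sketch, with a hypothetical class name, showing that group 1 still starts with the space in front of the #:

```csharp
using System.Text.RegularExpressions;

public static class AnchoredSpaceHash
{
    // The pattern without the /.../ delimiters, matched right to left.
    // Group 1 includes the space that precedes the '#'.
    public static string Extract(string line) =>
        Regex.Match(line, "((?: #).+)$", RegexOptions.RightToLeft).Groups[1].Value;
}
```

If the space is unwanted, calling TrimStart() on the result removes it.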
Disclaimer:
This will match the first "#" and everything after it, which is the expected behavior :)
var reg = new Regex("#(.)*");
Hope this helps
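A quick check of what new Regex("#(.)*") returns, assuming the input is the question's line (HashToEnd is a hypothetical name). Without RightToLeft the match starts at the first #, the one inside the quotes. Because (.) is a repeated group, Groups[1].Value keeps only the last character captured, so #.* would give the same overall match:

```csharp
using System.Text.RegularExpressions;

public static class HashToEnd
{
    // Left-to-right: starts at the first '#' and runs to the end of the line.
    // Group 1 is repeated, so its Value holds only the final captured character.
    public static Match Run(string line) => new Regex("#(.)*").Match(line);
}
```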
Right, I've tested this one and it seems to do the necessary:
\#.+(\#.+)$
Specifically, the greedy \#.+ runs past the first #, and the group then captures everything from the last # to the end of the line (for your input, the second #), returning the following. Note that it needs at least two # characters on the line to match at all.
#comment hello world
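A sketch that reads the comment from group 1 and returns null when the pattern does not match, for example on a line with only one #. The class name TwoHashes is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class TwoHashes
{
    private static readonly Regex Pattern = new Regex(@"\#.+(\#.+)$");

    // Group 1 runs from the last '#' that still has at least one character
    // after it to the end of the line. Returns null when there is no match.
    public static string? Extract(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

With a third # on the line, as in "\#" TEST #comment #hello #world, the group captures only #world.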
I think you'll find too many edge cases when trying to pull this off with regular expressions. Dealing with the quotes is what really complicates things, not to mention escape characters.
A procedural solution is not complicated, and will be faster and easier to modify as needs dictate. Note that I don't know what the escape characters should be in your example, but you could certainly add that to the algorithm...
using System;
using System.Text;

// Resource1.CodeSnippet is a string resource in the answerer's project holding the sample lines shown below.
string codeSnippet = Resource1.CodeSnippet;
var cleanCodeSnippet = new StringBuilder();
bool insideQuotes = false;
bool insideComment = false;

Console.WriteLine("BEFORE");
Console.WriteLine(codeSnippet);
Console.WriteLine();

foreach (char c in codeSnippet)
{
    switch (c)
    {
        case '"':
            // A quote toggles the string state, unless it is part of a comment.
            if (!insideComment) insideQuotes = !insideQuotes;
            break;
        case '#':
            // A '#' outside a string starts a comment that runs to the end of the line.
            if (!insideQuotes) insideComment = true;
            break;
        case '\n':
            // A newline ends any comment; the newline itself is kept.
            insideComment = false;
            break;
    }
    if (!insideComment)
    {
        cleanCodeSnippet.Append(c);
    }
}

Console.WriteLine("AFTER");
Console.WriteLine(cleanCodeSnippet.ToString());
Console.WriteLine();
This example strips the comments from CodeSnippet. I assumed that's what you were after.
Here's the output:
BEFORE
"\#" TEST #comment hello world
"ab" TEST #comment hello world
"ab" TEST #comment "hello world
"ab" + "ca" + TEST #comment
"\#" TEST
"ab" TEST
AFTER
"\#" TEST
"ab" TEST
"ab" TEST
"ab" + "ca" + TEST
"\#" TEST
"ab" TEST
As I said, you'll probably need to add handling for escape characters to the algorithm. But this is a good starting point.
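The answer leaves escape handling open. A minimal sketch of one way to add it, assuming (this is not stated in the question) that a backslash inside double quotes escapes the next character, so \" does not close the string. Unlike the answer's code, it returns the comment of a single line instead of stripping comments; CommentScanner is a hypothetical name:

```csharp
public static class CommentScanner
{
    // Returns the index of the '#' that starts the comment, or -1 if there is none.
    // Assumption: inside double quotes, a backslash escapes the following character.
    public static int FindCommentStart(string line)
    {
        bool insideQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (insideQuotes && c == '\\')
            {
                i++; // skip the escaped character, e.g. the '"' in \"
                continue;
            }
            if (c == '"')
            {
                insideQuotes = !insideQuotes;
            }
            else if (c == '#' && !insideQuotes)
            {
                return i; // first '#' outside a string starts the comment
            }
        }
        return -1;
    }

    // Returns the comment including its '#', or null when the line has none.
    public static string? ExtractComment(string line)
    {
        int start = FindCommentStart(line);
        return start < 0 ? null : line.Substring(start);
    }
}
```

Since the scan stops at the first unquoted #, a comment may itself contain # or quote characters without affecting the result.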