C# Regex question

Question

I am trying to parse the following line:

"\#" TEST #comment hello world

In my input, the #comment always comes at the end of the line. There may or may not be a comment, but if there is one, it is always at the end of the line.

I used the following Regex to parse it:

(\#.+)?

I have RegexOptions.RightToLeft on. I expected it to pull #comment hello world. But instead it is pulling #" TEST #comment hello world

Why is my Regex not pulling the right thing, and what Regex do I need to make it pull correctly?

Answer 1

The important question is: how do you tell the difference between the # inside the quoted string at the start of the line and the # that starts the comment? Let's assume for simplicity that the last # starts a comment.

In that case, what you want to match is

one #
an arbitrary sequence of text not containing #
until the end of the line

So let's put that into a regex: #[^#]*$ . You don't need RightToLeft for it. As far as I know, you also don't need to escape # in C# regular expressions (unless RegexOptions.IgnorePatternWhitespace is set, in which case an unescaped # starts a comment inside the pattern).

Of course, if you provide information on how to tell a "valid" # from a "comment-starting" #, a more elegant solution could be found that allows for # within comments.
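
As a rough illustration of the "last # starts the comment" assumption, here is a minimal sketch. The class and method names (LastHashComment, Extract) are made up for this example, and returning an empty string when there is no comment is my own choice, not something the question asks for.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastHashComment
{
    // '#', then any run of characters that are not '#', up to the end of the input.
    private static readonly Regex Pattern = new Regex("#[^#]*$");

    // Returns the trailing comment including its '#',
    // or an empty string when the line contains no '#'.
    public static string Extract(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? m.Value : string.Empty;
    }
}
```

Calling LastHashComment.Extract on the line from the question should give #comment hello world. A comment that itself contains a # would be cut at its last #, which is the limitation described above.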

Answer 2

The + operator tries to match as many times as it can. To match as few times as possible, use its lazy equivalent, +?:

(#.+?)

Of course, this would give trouble with comments that contain #:

"\#" TEST #comment #hello #world

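The lazy variant can be tried out with a small sketch like the one below. It assumes that RegexOptions.RightToLeft stays enabled as in the question (the answer does not say so explicitly); the class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyRightToLeft
{
    // Matching starts at the end of the line and moves left. The lazy .+?
    // grows one character at a time and stops at the first '#' it reaches,
    // which is the last '#' on the line.
    public static string Extract(string line)
    {
        Match m = Regex.Match(line, "(#.+?)", RegexOptions.RightToLeft);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Under that assumption, the line from the question should yield #comment hello world, while the problem line above should yield only #world, because the match stops at the last #.
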
Answer 3

Use " #.+". I left the \ out of my test because \# is not a recognized escape sequence. I left out the (, ) and ? because they were not needed.

Regex regex = new Regex(" #.+");
Console.WriteLine(regex.Match("#\" TEST #comment hello world"));

Answer 4

For the test string you've given, this regex pulls the comment correctly (with the right-to-left option): /((?: #).+)$/

Disclaimer:

It also pulls the whitespace just before the '#', so you may need to do a trim.
Comments cannot contain the sequence ' #'.
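
A minimal sketch of this pattern together with the trim mentioned in the disclaimer might look as follows. The names are hypothetical, and the enclosing slashes of the /.../ notation are dropped because .NET patterns are plain strings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceHashComment
{
    // Captures " #" followed by the rest of the line, then removes the
    // leading space that the capture includes.
    public static string Extract(string line)
    {
        Match m = Regex.Match(line, @"((?: #).+)$", RegexOptions.RightToLeft);
        return m.Success ? m.Groups[1].Value.TrimStart() : string.Empty;
    }
}
```

For the test string from the question this should return #comment hello world; the # inside the quotes is skipped only because no space precedes it.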

Answer 5

This will match "#" and everything after it, which is the expected behavior :)

var reg = new Regex("#(.)*");

Hope this helps.

Answer 6

Right, I've tested this one and it seems to do what is needed.

\#.+(\#.+)$

Specifically, it skips past the first #, then captures everything from the second # to the end of the line, returning

#comment hello world
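
One thing worth checking with this pattern is what happens when the line has no # before the comment. The sketch below (hypothetical names, no RightToLeft option) wraps the pattern so that both cases can be tried.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SecondHashComment
{
    // Needs at least two '#' characters on the line: one to skip past and
    // one to start the captured comment.
    public static string Extract(string line)
    {
        Match m = Regex.Match(line, @"\#.+(\#.+)$");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

The line from the question should give #comment hello world, but a line such as "ab" TEST #comment hello world contains only one # and should not match at all, so this approach seems to fit only inputs that always have a # before the comment.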

Answer 7

I think you'll find too many edge cases when trying to pull this off with regular expressions. Dealing with the quotes is what really complicates things, not to mention escape characters.

A procedural solution is not complicated, and will be faster and easier to modify as needs dictate. Note that I don't know what the escape characters should be in your example, but you could certainly add that to the algorithm...

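// Requires: using System; using System.Text;
string CodeSnippet = Resource1.CodeSnippet;
StringBuilder CleanCodeSnippet = new StringBuilder();
bool InsideQuotes = false;   // true while between a pair of double quotes
bool InsideComment = false;  // true from an unquoted '#' to the end of the line

Console.WriteLine("BEFORE");
Console.WriteLine(CodeSnippet);
Console.WriteLine("");

for (int i = 0; i < CodeSnippet.Length; i++)
{
    // Update the state first, then decide whether to keep the character.
    switch(CodeSnippet[i])
    {
        case '"' : 
            // A quote inside a comment does not open a string.
            if (!InsideComment) InsideQuotes = !InsideQuotes;
            break;
        case '#' :
            // A '#' outside quotes starts a comment.
            if (!InsideQuotes) InsideComment = true;
            break;
        case '\n' :
            // A comment ends with the line; the newline itself is kept.
            InsideComment = false;
            break;                       
    }

    if (!InsideComment)
    {
        CleanCodeSnippet.Append(CodeSnippet[i]);
    }
}

Console.WriteLine("AFTER");
Console.WriteLine(CleanCodeSnippet.ToString());
Console.WriteLine("");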

This example strips the comments away from the CodeSnippet. I assumed that's what you were after.

Here's the output:

BEFORE
"\#" TEST #comment hello world
"ab" TEST #comment hello world
"ab" TEST #comment "hello world
"ab" + "ca" + TEST #comment
"\#" TEST
"ab" TEST

AFTER
"\#" TEST
"ab" TEST
"ab" TEST
"ab" + "ca" + TEST
"\#" TEST
"ab" TEST

As I said, you'll probably need to add escape characters to the algorithm. But this is a good starting point.
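
As a sketch of how escape characters could be added, the variant below treats a backslash inside quotes as escaping the next character, so an escaped quote does not close the string. That escape rule is purely an assumption, since the question never defines one, and the CommentStripper class and Strip method are names invented for this example.

```csharp
using System;
using System.Text;

public static class CommentStripper
{
    // Removes '#' comments that start outside double quotes.
    // Assumed rule: inside quotes, a backslash escapes the next character.
    public static string Strip(string text)
    {
        var result = new StringBuilder(text.Length);
        bool insideQuotes = false;
        bool insideComment = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            if (insideComment)
            {
                if (c != '\n') continue;   // drop the comment text
                insideComment = false;     // the newline ends the comment and is kept
            }
            else if (insideQuotes && c == '\\' && i + 1 < text.Length)
            {
                // Copy the backslash and the escaped character unchanged.
                result.Append(c).Append(text[++i]);
                continue;
            }
            else if (c == '"')
            {
                insideQuotes = !insideQuotes;
            }
            else if (c == '#' && !insideQuotes)
            {
                insideComment = true;      // the '#' itself is dropped too
                continue;
            }

            result.Append(c);
        }

        return result.ToString();
    }
}
```

With this rule, a line such as "a\"#b" X #c keeps the quoted "a\"#b" intact and loses only the trailing #c. Like the loop above, it leaves any whitespace before the comment in place, so a trim per line may still be needed.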

7 answers

Answer 1: score 1, 2011-07-09 17:24:03
Answer 2: score 0, 2011-07-09 17:22:19
Answer 3: score 0, 2011-07-09 17:31:35
Answer 4: score 0, 2011-07-09 17:34:02
Answer 5: score 0, 2011-07-09 17:35:02
Answer 6: score 0, 2011-07-09 17:41:13
Answer 7: score 0, accepted, 2011-07-09 18:56:20
