C# Regex Optimization

I have a C# application in which I use a Regex to run an expect against a Unix response. I currently have this:

//will pick up :
//  What is your name?:
//  [root@localhost ~]#
//  [root@localhost ~]$
//  Do you want to continue [y/N]
//  Do you want to continue [Y/n]
const string Command_Prompt_Only = @"[$#]|\[.*@(.*?)\][$%#]";
const string Command_Question_Only = @".*\?:|.*\[y/N\]/g";
const string Command_Prompt_Question = Command_Question_Only + "|" + Command_Prompt_Only;

This works when I test it at www.regexpal.com, but I think it needs some optimization: at times it seems to slow way down when I use Command_Prompt_Question.

var promptRegex = new Regex(Command_Prompt_Question);
var output = _shellStream.Expect(promptRegex, timeOut);

I should mention that I'm using SSH.NET to talk to these Linux servers, but I don't think it's an SSH.NET issue, because it's fast when I use Command_Prompt_Only.

Does anyone see any issues with the const string I'm using? Is there a better way to do it?

My project is open source if you feel like playing with it:
https://github.com/gavin1970/Linux-Commander

Code in question: https://github.com/gavin1970/Linux-Commander/blob/master/Linux-Commander/common/Ssh.cs

It's called Linux Commander, and I'm attempting to build a virtual Linux console with Ansible support.

Does anyone see any issues with the const string I'm using?

Yes, there is too much backtracking in those patterns.

If you know there is at least one item, specifying * (zero or more) lets the regex engine try many zero-length matches. It is better to prefer the + (one or more) quantifier, which can save a lot of time otherwise spent exploring dead ends while backtracking.


This is interesting: \[.*@(.*?)\]. Why not use a negated set ( [^ ] ) instead, as in this change:

\[[^@]+@[^\]]+\]

This says: anchor on a literal "[", then find one or more characters that are not a literal "@" ( [^@]+ ), then the literal "@", then one or more characters that are not a literal "]" ( [^\]]+ ), and finally the closing literal "]".
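
To make the negated-set idea concrete, here is a minimal sketch. It assumes the prompt terminator [$%#] from the question's Command_Prompt_Only is appended, and it writes the set for "]" as [^\]]+ so that the class closes properly. The names PromptPatternSketch and IsPrompt are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PromptPatternSketch
{
    // "[", one or more non-"@" characters, "@", one or more non-"]" characters, "]",
    // then one of the prompt terminators used in the question ($, % or #).
    public const string PromptPattern = @"\[[^@]+@[^\]]+\][$%#]";

    // Built once and reused for every call.
    private static readonly Regex PromptRegex = new Regex(PromptPattern);

    // Returns true when the text contains something shaped like "[user@host dir]#".
    public static bool IsPrompt(string text) => PromptRegex.IsMatch(text);
}
```

With this sketch, IsPrompt("[root@localhost ~]# ") and IsPrompt("[root@localhost ~]$ ") both return true, while IsPrompt("What is your name?:") returns false. Because each negated set stops at its own delimiter, the engine has far fewer alternatives to retry than with .* when a line does not end in a prompt.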

Try this:

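using System.Text.RegularExpressions;

class Foo
{
    const string Command_Prompt_Only     = @"[$#]|\[.*@(.*?)\][$%#]";
    const string Command_Question_Only   = @".*\?:|.*\[y/N\]";

    // Wrap each half in a non-capturing group before joining them with "|".
    const string Command_Prompt_Question = "(?:" + Command_Question_Only + ")|(?:" + Command_Prompt_Only + ")";

    // Build the Regex once and reuse it; RegexOptions.Compiled costs more up front but matches faster.
    private static readonly Regex _promptRegex = new Regex( Command_Prompt_Question, RegexOptions.Compiled );

    // A method cannot share its class's name in C#, so this one is called WaitForPrompt.
    public void WaitForPrompt()
    {
        // ...

        // _shellStream and timeOut are the ones from the question's code.
        var output = _shellStream.Expect( _promptRegex, timeOut );
    }
}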
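
One difference between these constants and the ones in the question is easy to miss: the trailing /g is gone from Command_Question_Only. In .NET there is no /g flag inside a pattern string, so those two characters are matched literally. The sketch below compares the two spellings against the sample lines from the question. The class QuestionPatternSketch and its Matches helper are hypothetical names used only for this check, and passing RegexOptions.IgnoreCase to cover [Y/n] is a suggestion of this sketch, not something either answer proposes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuestionPatternSketch
{
    // As written in the question: "/g" is treated as two literal characters.
    public const string WithSlashG = @".*\?:|.*\[y/N\]/g";

    // As written in the answer above: no trailing "/g".
    public const string WithoutSlashG = @".*\?:|.*\[y/N\]";

    // Thin wrapper around Regex.IsMatch so the comparisons read naturally.
    public static bool Matches(string pattern, string text, RegexOptions options = RegexOptions.None)
        => Regex.IsMatch(text, pattern, options);
}
```

Under these assumptions, Matches(WithSlashG, "Do you want to continue [y/N]") is false, because the text would have to end in a literal "/g", while the same call with WithoutSlashG is true. The line "Do you want to continue [Y/n]" matches WithoutSlashG only when RegexOptions.IgnoreCase is passed, so an Expect call waiting on that prompt with the case-sensitive pattern would likely run until its timeout, which may account for some of the perceived slowness.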
