
C# Regex Optimization

I have a C# application in which I'm using Regex to run an expect against a Unix response. I currently have this:

//will pick up :
//  What is your name?:
//  [root@localhost ~]#
//  [root@localhost ~]$
//  Do you want to continue [y/N]
//  Do you want to continue [Y/n]
const string Command_Prompt_Only = @"[$#]|\[.*@(.*?)\][$%#]";
const string Command_Question_Only = @".*\?:|.*\[y/N\]/g";
const string Command_Prompt_Question = Command_Question_Only + "|" + Command_Prompt_Only;

This works when I test it with www.regexpal.com, but I think it needs some optimization: at times it seems to slow way down when I use Command_Prompt_Question.

var promptRegex = new Regex(Command_Prompt_Question);
var output = _shellStream.Expect(promptRegex, timeOut);

I should mention that I'm using SSH.NET to talk to these Linux servers, but I don't think it's an SSH.NET issue, because it's fast when I use Command_Prompt_Only.

Does anyone see any issues with the const string I'm using? Is there a better way to do it?

My project is open source if you feel like you want to go play with it.
https://github.com/gavin1970/Linux-Commander

Code in question: https://github.com/gavin1970/Linux-Commander/blob/master/Linux-Commander/common/Ssh.cs

It's called Linux Commander, and I'm attempting to build a virtual Linux console with Ansible support.

Does anyone see any issues with the const string I'm using?

Yes, there is too much backtracking in those patterns.

If you know there is at least one character to match, specifying * (zero or more) makes the engine consider many zero-length possibilities. It's better to prefer the + (one or more) quantifier, which can shave a lot of time off exploring dead ends while backtracking.


This part is interesting: \[.*@(.*?)\]. Why not use a negated character class ( [^ ] ) instead, as in this change:

\[[^@]+@[^\]]+\]

This says: anchor on a literal "[", then find one or more characters that are not a literal "@" ( [^@]+ ), then the "@" itself, then one or more characters that are not a literal "]" ( [^\]]+ ), followed by the closing "]".
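As a rough sketch of how the negated-class version behaves, the snippet below checks it against the prompts listed in the question. The class name PromptPatternDemo and the helper IsPrompt are made up for illustration, and the trailing [$%#] is carried over from the question's pattern on the assumption that the prompt character should still be required.

```csharp
using System;
using System.Text.RegularExpressions;

static class PromptPatternDemo
{
    // Each negated class stops at its own delimiter ("@" or "]"), so the
    // engine has much less room to backtrack than with ".*".
    // The trailing [$%#] is an assumption taken from the question's pattern.
    const string PromptPattern = @"\[[^@]+@[^\]]+\][$%#]";

    static readonly Regex Prompt = new Regex(PromptPattern, RegexOptions.Compiled);

    // Hypothetical helper: true when the text contains a shell prompt.
    public static bool IsPrompt(string text) => Prompt.IsMatch(text);

    static void Main()
    {
        Console.WriteLine(IsPrompt("[root@localhost ~]#"));           // True
        Console.WriteLine(IsPrompt("[root@localhost ~]$"));           // True
        Console.WriteLine(IsPrompt("Do you want to continue [y/N]")); // False
    }
}
```

The last line is False because "[y/N]" contains no "@", so the bracketed prompt alternative is not confused by yes/no questions.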

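Try this. It drops the trailing /g from the question pattern, wraps each alternative in a non-capturing group, and builds the Regex once as a compiled static field: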

class Foo
{
    const string Command_Prompt_Only     = @"[$#]|\[.*@(.*?)\][$%#]";
    const string Command_Question_Only   = @".*\?:|.*\[y/N\]";

    const string Command_Prompt_Question = "(?:" + Command_Question_Only + ")|(?:" + Command_Prompt_Only + ")";

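    // Built once and reused; RegexOptions.Compiled costs more at startup but matches faster.
    private static readonly Regex _promptRegex = new Regex( Command_Prompt_Question, RegexOptions.Compiled );

    // A method cannot share its enclosing class's name, so this is not called Foo().
    public void WaitForPrompt()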
    {
        // ...

        var output = _shellStream.Expect( _promptRegex, timeOut );
    }
}
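The code above no longer has the /g suffix that the question's Command_Question_Only ends with. That may matter more than it looks: .NET has no /g flag syntax inside a pattern, so those two characters are matched literally. A minimal check, assuming the two pattern strings exactly as posted (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

static class SlashGDemo
{
    // The question's version: "/g" is ordinary pattern text in .NET.
    const string WithSlashG = @".*\?:|.*\[y/N\]/g";

    // The version from the answer above, without the suffix.
    const string WithoutSlashG = @".*\?:|.*\[y/N\]";

    // Hypothetical helper wrapping the static Regex.IsMatch(input, pattern).
    public static bool Matches(string pattern, string text) => Regex.IsMatch(text, pattern);

    static void Main()
    {
        const string question = "Do you want to continue [y/N]";
        Console.WriteLine(Matches(WithSlashG, question));    // False
        Console.WriteLine(Matches(WithoutSlashG, question)); // True
    }
}
```

If the pattern never matches the yes/no question, an expect call would presumably wait for the full timeout, which could look like a slowdown. Note also that neither version matches "[Y/n]", which the question's comment lists; RegexOptions.IgnoreCase would be one way to cover it.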

