Convert a regular expression pattern from JavaScript to PCRE (Perl)

This is my JavaScript regex pattern:

    var url = "http://www.amazon.com/gp";
    var hostname = /^((\w+):\/\/\/?)?((\w+):?(\w+)?@)?([^\/\?:]+):?(\d+)?(\/?[^\?#;\|]+)?([;\|])?([^\?#]+)?\??([^#]+)?#?(\w*)/.exec(url) || [];
    // hostname[6] would be "www.amazon.com"
The regex above extracts the hostname from a given URL. I need this line to work with PCRE (C++). As you can see, I already added another '\' to each '\', but it still doesn't work.

What additional changes do I need to make for it to work with PCRE instead of JavaScript? Or is that not possible, so that I need to build an entirely new pattern for PCRE?

This is a simplified version of my code:

    #include <iostream>
    #include <string>
    #include <pcrecpp.h>

    int main()
    {
        std::string text = "http://www.amazon.com";
        std::string hostname;
        pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)");
        if (re.PartialMatch(text, &hostname)) {
            std::cout << "match: " << hostname << "\n";
        } else {
            std::cout << "no match.\n";
        }
        return 0;
    }

Thanks.

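There's no need to convert it; the only things you have to take care of are the escaping and the / delimiter. pcrecpp::RE takes the pattern as an ordinary string, so drop the surrounding /.../ of the JavaScript literal and double every backslash inside the C++ string literal.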

Do note that a regular expression might not be what you want to use here, or at least not used directly like this. There are many URL-parsing libraries that are much better suited to this task, HTParse for example.

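Your C++ code should work, but your regex has a lot of optional groups, so it's hard to be sure which group the hostname will end up in. PartialMatch fills its output arguments from the capture groups in order, so with a single argument you get group 1 (http://), while the hostname is in group 6.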

As hacky as it may be, my edit works for this input:

    #include <iostream>
    #include <string>
    #include <pcrecpp.h>

    int main()
    {
        std::string text = "http://www.amazon.com";
        std::string tmp;       // receives (and discards) capture groups 1 to 5
        std::string hostname;  // receives capture group 6
        pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)");
        if (re.PartialMatch(text, &tmp, &tmp, &tmp, &tmp, &tmp, &hostname)) {
            std::cout << "match: " << hostname << "\n";
        } else {
            std::cout << "no match.\n";
        }
        return 0;
    }