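Boost::Regex throwing an error when a long expression doesn't match
I have two regular expressions. One matches Python-style comments, the other matches file paths.
When I check whether a comment matches the file path expression, an exception is thrown if the comment string is longer than about 15 characters. Otherwise it behaves as expected.
How can I modify my regular expression so that it does not have this problem?
Sample code: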
#include <string>
#include "boost/regex.hpp"
using namespace std;
using namespace boost;
int main(int argc, char** argv)
{
    boost::regex re_comment("\\s*#[^\\r\\n]*");
    boost::regex re_path("\"?([A-Za-z]:)?[\\\\/]?(([^(\\\\/:*?\"<>|\\r\\n)]+[\\\\/]?)+)?\\.[\\w]+\"?");
    string shortComment = " #comment ";
    string longComment = "#123456789012345678901234567890";
    string myPath = "C:/this/is.a/path.doc";
    regex_match(shortComment, re_comment); //evaluates to true
    regex_match(longComment, re_comment); //evaluates to true
    regex_match(myPath, re_path); //evaluates to true
    regex_match(shortComment, re_path); //evaluates to false
    regex_match(longComment, re_path); //throws error
}
This is the error that is thrown:
terminate called after throwing an instance of
'boost::exception_detail::clone_impl<boost::exception_detail
::error_info_injector<std::runtime_error> >'
what(): The complexity of matching the regular expression exceeded predefined
bounds. Try refactoring the regular expression to make each choice made by the
state machine unambiguous. This exception is thrown to prevent "eternal" matches
that take an indefinite period time to locate.
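I know it is always tempting to create one giant regular expression to solve all the world's problems, and there may well be performance reasons for doing so, but when building such a monstrosity you also have to consider the maintenance nightmare you are creating. That said, I suggest breaking the problem into manageable pieces.
Basically: handle the quotes, split the string on the directory separators, then apply a regular expression to each part of the path.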
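As a minimal illustration of the first two steps only (quote handling and splitting), here is a sketch that uses just the standard library. The helper names strip_quotes and split_path are hypothetical and are not part of the Boost-based listing; like boost::split with its default settings, the splitter keeps empty pieces, so a leading separator yields an empty first element.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: remove one pair of surrounding double quotes, if present.
// The size check keeps a lone '"' (or an empty string) untouched.
std::string strip_quotes(std::string s)
{
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"')
    {
        s.erase(s.size() - 1); // drop the closing quote
        s.erase(0, 1);         // drop the opening quote
    }
    return s;
}

// Hypothetical helper: split on '/' or '\\', keeping empty pieces.
std::vector<std::string> split_path(const std::string& s)
{
    std::vector<std::string> parts;
    std::string current;
    for (char c : s)
    {
        if (c == '/' || c == '\\')
        {
            parts.push_back(current); // close the piece before the separator
            current.clear();
        }
        else
        {
            current += c;
        }
    }
    parts.push_back(current); // last piece (may be empty)
    return parts;
}
```

Calling split_path(strip_quotes(line)) gives the list of path elements, each of which can then be checked with a small, unambiguous regular expression.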
#include <algorithm> // std::replace
#include <iostream>  // std::cout
#include <string>
#include <vector>
#include "boost/regex.hpp"
#include "boost/algorithm/string.hpp"
using namespace std;
using namespace boost;
bool my_path_match(std::string line)
{
    bool ret = true;
    string drive = "([a-zA-Z]\\:)?";
    string pathElem = "(\\w|\\.|\\s)+";
    boost::regex re_pathElem(pathElem);
    boost::regex re_drive("(" + drive + "|" + pathElem + ")");
    vector<string> split_line;
    vector<string>::iterator it;
    // Strip surrounding quotes; the size check avoids touching an empty or one-character string
    if ((line.size() >= 2) && (line.front() == '"') && (line.back() == '"'))
    {
        line.erase(0, 1); // erase the first character
        line.erase(line.size() - 1); // erase the last character
    }
    // Split on either kind of directory separator
    split(split_line, line, is_any_of("/\\"));
    // The first element may be a drive letter or an ordinary path element
    if (regex_match(split_line[0], re_drive) == false)
    {
        ret = false;
    }
    else
    {
        // Every remaining element must be a valid path element
        for (it = (split_line.begin() + 1); it != split_line.end(); it++)
        {
            if (regex_match(*it, re_pathElem) == false)
            {
                ret = false;
                break;
            }
        }
    }
    return ret;
}
int main(int argc, char** argv)
{
    boost::regex re_comment("^.*#.*$");
    string shortComment = " #comment ";
    string longComment = "#123456789012345678901234567890";
    vector<string> testpaths;
    vector<string> paths;
    vector<string>::iterator it;
    testpaths.push_back("C:/this/is.a/path.doc");
    testpaths.push_back("C:/this/is also .a/path.doc");
    testpaths.push_back("/this/is also .a/path.doc");
    testpaths.push_back("./this/is also .a/path.doc");
    testpaths.push_back("this/is also .a/path.doc");
    testpaths.push_back("this/is 1 /path.doc");
    bool ret;
    ret = regex_match(shortComment, re_comment); //evaluates to true
    cout<<"should be true = "<<ret<<endl;
    ret = regex_match(longComment, re_comment); //evaluates to true
    cout<<"should be true = "<<ret<<endl;
    string quotes;
    for (it = testpaths.begin(); it != testpaths.end(); it++)
    {
        paths.push_back(*it);
        quotes = "\"" + *it + "\""; // test quoted paths
        paths.push_back(quotes);
        std::replace(it->begin(), it->end(), '/', '\\'); // test backslash paths
        std::replace(quotes.begin(), quotes.end(), '/', '\\'); // test backslash quoted paths
        paths.push_back(*it);
        paths.push_back(quotes);
    }
    for (it = paths.begin(); it != paths.end(); it++)
    {
        ret = my_path_match(*it); //evaluates to true
        cout<<"should be true = "<<ret<<"\t"<<*it<<endl;
    }
    ret = my_path_match(shortComment); //evaluates to false
    cout<<"should be false = "<<ret<<endl;
    ret = my_path_match(longComment); //evaluates to false
    cout<<"should be false = "<<ret<<endl;
}
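Yes, it will (probably) be slower than using a single regular expression, but it works, it does not throw on Python comment lines, and if you find a path or comment that fails, you should be able to work out what went wrong and fix it (that is, it is maintainable).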
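If a single expression is still preferred, the error message itself hints at another route: make each choice unambiguous. The nested quantifier with an optional separator, `([^...]+[\\/]?)+`, lets the same run of characters be divided between iterations in many different ways, which is likely what exhausts the matcher on a long non-matching string. The sketch below makes the separator mandatory inside the repeated group and matches the final component separately. It is an assumption-laden illustration: it uses std::regex (ECMAScript syntax) rather than Boost so that it is self-contained, the function name looks_like_path is hypothetical, the character class is copied unchanged from the question, and the pattern has not been verified against boost::regex or checked for exact equivalence with the original expression.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: single-regex path check with no nested ambiguity.
// Each directory component must end in a separator, so there is only one
// way to divide the input among iterations of the repeated group.
bool looks_like_path(const std::string& s)
{
    static const std::regex re_path(
        R"re("?([A-Za-z]:)?[\\/]?([^(\\/:*?"<>|\r\n)]+[\\/])*[^(\\/:*?"<>|\r\n)]*\.\w+"?)re");
    return std::regex_match(s, re_path);
}
```

With this pattern, both " #comment " and the long comment string from the question are rejected after a linear amount of backtracking, while "C:/this/is.a/path.doc" is accepted.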