How to combine boost::spirit::lex & boost::spirit::qi?
I have a lexer, and based on that lexer I now want to create a grammar that uses the tokens it generates. I tried adapting some examples that I found, and now I have something that compiles and works at least a little bit, but one of my tests that should fail does not. Now I want to know why, and I also want to understand what I'm actually doing there (I just copied some code from examples, which doesn't really improve my understanding).
Lexer:
#include <boost/spirit/include/lex_lexertl.hpp>

namespace lex = boost::spirit::lex;

enum LexerIDs { ID_IDENTIFIER, ID_WHITESPACE, ID_INTEGER, ID_FLOAT, ID_PUNCTUATOR };

template <typename Lexer>
struct custom_lexer : lex::lexer<Lexer>
{
    custom_lexer()
        : identifier("[a-zA-Z_][a-zA-Z0-9_]*")
        , white_space("[ \\t\\n]+")
        , integer_value("[1-9][0-9]*")
        , hex_value("0[xX][0-9a-fA-F]+")
        , float_value("[0-9]*\\.[0-9]+([eE][+-]?[0-9]+)?")
        , float_value2("[0-9]+\\.([eE][+-]?[0-9]+)?")
        , punctuator("\\[|\\]|\\(|\\)|\\.|&>|\\*\\*|\\*|\\+|-|~|!|\\/|%|<<|>>|<|>|<=|>=|==|!=|\\^|&|\\||\\^\\^|&&|\\|\\||\\?|:|,")// [ ] ( ) . &> ** * + - ~ ! / % << >> < > <= >= == != ^ & | ^^ && || ? : ,
    {
        using boost::spirit::lex::_start;
        using boost::spirit::lex::_end;

        this->self.add
            (identifier, ID_IDENTIFIER)
            /*(white_space, ID_WHITESPACE)*/
            (integer_value, ID_INTEGER)
            (hex_value, ID_INTEGER)
            (float_value, ID_FLOAT)
            (float_value2, ID_FLOAT)
            (punctuator, ID_PUNCTUATOR);

        this->self("WS") = white_space;
    }

    lex::token_def<std::string> identifier;
    lex::token_def<lex::omit> white_space;
    lex::token_def<int> integer_value;
    lex::token_def<int> hex_value;
    lex::token_def<double> float_value;
    lex::token_def<double> float_value2;
    lex::token_def<> punctuator;
};
Grammar:
namespace qi = boost::spirit::qi;
namespace lex = boost::spirit::lex;

template< typename Iterator, typename Lexer>
struct custom_grammar : qi::grammar<Iterator, qi::in_state_skipper<Lexer>>
{
    template< typename TokenDef >
    custom_grammar(const TokenDef& tok) : custom_grammar::base_type(ges)
    {
        ges = qi::token(ID_INTEGER) | qi::token(ID_FLOAT);
        BOOST_SPIRIT_DEBUG_NODE(ges);
        debug(ges);
    }

    qi::rule<Iterator, qi::in_state_skipper<Lexer>> ges;
};
And an example:
BOOST_AUTO_TEST_CASE(BasicGrammar)
{
    namespace lex = boost::spirit::lex;
    namespace qi = boost::spirit::qi;

    std::string test("1234 56");

    typedef lex::lexertl::token<char const*, lex::omit, boost::mpl::true_> token_type;
    typedef lex::lexertl::lexer<token_type> lexer_type;
    typedef custom_lexer<lexer_type>::iterator_type iterator_type;

    custom_lexer<lexer_type> my_lexer;
    custom_grammar<iterator_type, custom_lexer<lexer_type>::lexer_def> my_grammar(my_lexer);

    char const* first = test.c_str();
    char const* last = &first[test.size()];

    lexer_type::iterator_type iter = my_lexer.begin(first, last);
    lexer_type::iterator_type end = my_lexer.end();

    bool r = qi::phrase_parse(iter,end,my_grammar, qi::in_state( "WS" )[ my_lexer.self ]);
    BOOST_CHECK(r);
}
My assumption is that this returns true because the whitespace is skipped, because of qi::in_state("WS"). Is that true? Additionally, I know how I can output additional tokens for whitespace, but then I don't know what to put at the location where the qi::in_state is now; without it, it isn't working.
Any ideas what I can improve regarding the structure? Why is the debug output so funny?
<ges>
<try>[]</try>
<success></success>
<attributes>[]</attributes>
</ges>
Thank you for your help.
Regards
Tobias
Your parser isn't failing, but it isn't 'silently' skipping the whitespace either (it parses only one non-whitespace token, anyway).
In fact, a property of the *phrase_parse family of Spirit APIs is that it may not match the full input. This is why it takes the first iterator by reference: after parsing, the iterator indicates where parsing stopped.
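The by-reference iterator convention can be illustrated without Spirit at all. The following is a minimal sketch in plain standard C++; `parse_one_integer` and `remaining_after_parse` are hypothetical names invented for this illustration and are not part of Boost. It only mimics the behaviour described above: a successful match does not imply that the whole input was consumed.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for a phrase parser: matches one run of digits and
// skips blanks before and after it. `first` is taken by reference and is
// advanced past whatever was consumed, mirroring the Spirit convention.
bool parse_one_integer(const char*& first, const char* last)
{
    const char* it = first;
    while (it != last && std::isspace(static_cast<unsigned char>(*it))) ++it; // pre-skip
    const char* digits_begin = it;
    while (it != last && std::isdigit(static_cast<unsigned char>(*it))) ++it; // the match
    if (it == digits_begin) return false; // no match: leave `first` untouched
    while (it != last && std::isspace(static_cast<unsigned char>(*it))) ++it; // post-skip
    first = it;
    return true;
}

// Helper for inspection: runs the parser and returns the unconsumed input.
std::string remaining_after_parse(const std::string& input, bool& matched)
{
    const char* first = input.data();
    const char* last = first + input.size();
    matched = parse_one_integer(first, last);
    return std::string(first, last);
}
```

Calling `remaining_after_parse("1234 56", matched)` sets `matched` to true and still leaves "56" behind, which is the same situation as in the test from the question.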
I have changed a few bits around so you can easily access the source iterator, by using lex::tokenize_and_phrase_parse instead of qi::phrase_parse on lexer tokens:
Iterator first = test.c_str();
Iterator last = &first[test.size()];
bool r = lex::tokenize_and_phrase_parse(first,last,my_lexer,my_grammar,qi::in_state( "WS" )[ my_lexer.self ]);
std::cout << std::boolalpha << r << "\n";
std::cout << "Remaining unparsed: '" << std::string(first,last) << "'\n";
The output is:
Remaining unparsed: '56'
Here is a full working example (note I also changed the second parameter of the grammar class to be the Skipper directly, which is more typical for Spirit grammars):
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;
namespace lex = boost::spirit::lex;

enum LexerIDs { ID_IDENTIFIER, ID_WHITESPACE, ID_INTEGER, ID_FLOAT, ID_PUNCTUATOR };

template <typename Lexer>
struct custom_lexer : lex::lexer<Lexer>
{
    custom_lexer()
        : identifier    ("[a-zA-Z_][a-zA-Z0-9_]*")
        , white_space   ("[ \\t\\n]+")
        , integer_value ("[1-9][0-9]*")
        , hex_value     ("0[xX][0-9a-fA-F]+")
        , float_value   ("[0-9]*\\.[0-9]+([eE][+-]?[0-9]+)?")
        , float_value2  ("[0-9]+\\.([eE][+-]?[0-9]+)?")
        , punctuator    ("\\[|\\]|\\(|\\)|\\.|&>|\\*\\*|\\*|\\+|-|~|!|\\/|%|<<|>>|<|>|<=|>=|==|!=|\\^|&|\\||\\^\\^|&&|\\|\\||\\?|:|,")// [ ] ( ) . &> ** * + - ~ ! / % << >> < > <= >= == != ^ & | ^^ && || ? : ,
    {
        using boost::spirit::lex::_start;
        using boost::spirit::lex::_end;

        // Tokens of the default lexer state
        this->self.add
            (identifier   , ID_IDENTIFIER)
            /*(white_space  , ID_WHITESPACE)*/
            (integer_value, ID_INTEGER)
            (hex_value    , ID_INTEGER)
            (float_value  , ID_FLOAT)
            (float_value2 , ID_FLOAT)
            (punctuator   , ID_PUNCTUATOR);

        // Whitespace lives in its own lexer state "WS", used by the skipper
        this->self("WS") = white_space;
    }

    lex::token_def<std::string> identifier;
    lex::token_def<lex::omit>   white_space;
    lex::token_def<int>         integer_value;
    lex::token_def<int>         hex_value;
    lex::token_def<double>      float_value;
    lex::token_def<double>      float_value2;
    lex::token_def<>            punctuator;
};

template< typename Iterator, typename Skipper>
struct custom_grammar : qi::grammar<Iterator, Skipper>
{
    template< typename TokenDef >
    custom_grammar(const TokenDef& tok) : custom_grammar::base_type(ges)
    {
        // Matches exactly one integer or float token
        ges = qi::token(ID_INTEGER) | qi::token(ID_FLOAT);
        BOOST_SPIRIT_DEBUG_NODE(ges);
    }

    qi::rule<Iterator, Skipper > ges;
};

int main()
{
    std::string test("1234 56");

    typedef char const* Iterator;
    typedef lex::lexertl::token<Iterator, lex::omit, boost::mpl::true_> token_type;
    typedef lex::lexertl::lexer<token_type> lexer_type;
    typedef qi::in_state_skipper<custom_lexer<lexer_type>::lexer_def> skipper_type;
    typedef custom_lexer<lexer_type>::iterator_type iterator_type;

    custom_lexer<lexer_type> my_lexer;
    custom_grammar<iterator_type, skipper_type> my_grammar(my_lexer);

    Iterator first = test.c_str();
    Iterator last = &first[test.size()];

    // `first` is passed by reference and ends up where parsing stopped
    bool r = lex::tokenize_and_phrase_parse(first,last,my_lexer,my_grammar,qi::in_state( "WS" )[ my_lexer.self ]);

    std::cout << std::boolalpha << r << "\n";
    std::cout << "Remaining unparsed: '" << std::string(first,last) << "'\n";
}