
Misunderstanding repeat directive - it should fail, but doesn't

I would like to write a (highly simplified) grammar with:

grr := integer [ . integer ]

with

integer ::= digit { [ underline ] digit }

Since the parsed literals are needed again later (the real grammar is more complex, and not everything can be converted to a number immediately), each literal must be stored in full in the AST as a string (more precisely, as an iterator_range), underlines included.

The problem now is that the literal expressions can be longer than they should be (with regard to the later implementation, computation, etc.). The obvious solution is the repeat directive (documented in detail for Qi repeat, and very briefly for X3).

This is where my problems start (coliru):

    for(std::string_view const s : {
        // ok
        "0", "10", "1_0", "012345", 
        // too long
        "0123456",
        "1_2_3_4_5_6_7_8_9_0", 
        // absolutely invalid
        "1_2_3_4_5_6_", "_0123_456", ""
    }) {
        auto const cs = x3::char_("0-9");
        std::string attr;
        bool const ok = x3::parse(std::begin(s), std::end(s), 
            x3::raw[ cs >> x3::repeat(0, 5)[ ('_' >> cs) | cs] ],
            attr);
        std::cout << s << " -> " << attr
             << " (" << std::boolalpha << ok << ")"
             << "\n";   
    }

gives

0 -> 0 (true)
10 -> 10 (true)
1_0 -> 1_0 (true)
012345 -> 012345 (true)
0123456 -> 012345 (true)
1_2_3_4_5_6_7_8_9_0 -> 1_2_3_4_5_6 (true)
1_2_3_4_5_6_ -> 1_2_3_4_5_6 (true)
_0123_456 ->  (false)
 ->  (false)

If the literal is too long, the parser should fail, which it does not. If it ends with an underline, it should fail too, but it doesn't. A leading underline and an empty literal are correctly rejected (the parse returns false).
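The expected behaviour described above can be pinned down independently of Spirit with a small hand-written check. This is only a sketch to serve as a reference point: the function name is_valid_integer_literal is hypothetical, and the limit of 6 digits is an assumption read off the test cases ("012345" is fine, "0123456" is too long).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical reference check (not part of the original post) for
//   integer ::= digit { [ underline ] digit }
// with at most max_digits digits. The default of 6 is an assumption
// derived from the test inputs in the question.
inline bool is_valid_integer_literal(std::string_view s,
                                     std::size_t max_digits = 6)
{
    auto is_digit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };

    // The literal must start with a digit (rejects "" and "_0123_456").
    if (s.empty() || !is_digit(s.front()))
        return false;

    std::size_t digits = 1;
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (s[i] == '_') {
            // An underline must be followed by a digit
            // (rejects a trailing '_' and a doubled "__").
            if (++i == s.size() || !is_digit(s[i]))
                return false;
        } else if (!is_digit(s[i])) {
            return false;
        }
        // Each loop iteration that gets here consumed exactly one digit.
        if (++digits > max_digits)
            return false;
    }
    return true;
}
```

A check like this can act as an oracle when comparing against the results of the Spirit parser for the same inputs.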

Meanwhile, I am trying to move the more complex parsers into separate parser classes, but there I am missing, for example, the rule that recognizes a literal ending with an underline.

Furthermore, BOOST_SPIRIT_X3_DEBUG seems to be broken all of a sudden: there is no output.

What is the solution to my problem? I'm out of ideas, apart from an absolutely low-level and complicated approach using iterators, counters, etc.

This problem also affects other rules that still have to be implemented.

"If the literal is too long, the parser should fail"

Where does it say that? It looks like the code does exactly what you ask: it parses at most 6 digits with the requisite underscores. The output even confirms that it does exactly that.

You can of course make it much more apparent by showing what was not parsed:

Live On Coliru

auto f = begin(s), l = end(s);
bool const ok = x3::parse(
    f, l, x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]], attr);

fmt::print(
    "{:21} -> {:5} {:13} remaining '{}'\n",
    fmt::format("'{}'", s),
    ok,
    fmt::format("'{}'", attr),
    std::string(f, l));

Prints

'0'                   -> true  '0'           remaining ''
'10'                  -> true  '10'          remaining ''
'1_0'                 -> true  '1_0'         remaining ''
'012345'              -> true  '012345'      remaining ''
'0123456'             -> true  '012345'      remaining '6'
'1_2_3_4_5_6_7_8_9_0' -> true  '1_2_3_4_5_6' remaining '_7_8_9_0'
'1_2_3_4_5_6_'        -> true  '1_2_3_4_5_6' remaining '_'
'_0123_456'           -> false ''            remaining '_0123_456'
''                    -> false ''            remaining ''

Fixes

To assert that the complete input is parsed, either use x3::eoi or check the iterators:

Live On Coliru

bool const ok = x3::parse(
    f,
    l,
    x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]] >> x3::eoi,
    attr);

Prints

'0'                   -> true  '0'           remaining ''
'10'                  -> true  '10'          remaining ''
'1_0'                 -> true  '1_0'         remaining ''
'012345'              -> true  '012345'      remaining ''
'0123456'             -> false '012345'      remaining '0123456'
'1_2_3_4_5_6_7_8_9_0' -> false '1_2_3_4_5_6' remaining '1_2_3_4_5_6_7_8_9_0'
'1_2_3_4_5_6_'        -> false '1_2_3_4_5_6' remaining '1_2_3_4_5_6_'
'_0123_456'           -> false ''            remaining '_0123_456'
''                    -> false ''            remaining ''

Distinct Lexemes

If instead you want to allow the input to continue, just not with certain characters, e.g. when parsing many such "numbers":

auto const number = x3::lexeme[ //
    x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
    // within the lexeme, assert that no digit or _ follows
    >> ! (cs | '_') //
];

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
using namespace std::string_view_literals;

namespace Parser {
    namespace x3 = boost::spirit::x3;
    auto const cs = x3::digit;
    auto const number = x3::lexeme[ //
        x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
        // within the lexeme, assert that no digit or _ follows
        >> ! (cs | '_') //
    ];
    auto const ws_or_comment = x3::space | "//" >> *~x3::char_("\r\n");
    auto const numbers = x3::skip(ws_or_comment)[number % ','];
} // namespace Parser

int main()
{
    std::vector<std::string> attr;
    std::string_view const s =
        R"(0,
           10,
           1_0,
           012345,
           // too long
           0123456,
           1_2_3_4_5_6_7_8_9_0,
           // absolutely invalid
           1_2_3_4_5_6_,
           _0123_456)"sv;

    auto f = begin(s), l = end(s);
    bool const ok = parse(f, l, Parser::numbers, attr);

    fmt::print("{}: {}\nremaining '{}'\n", ok, attr, std::string(f, l));
}

Prints

true: ["0", "10", "1_0", "012345"]
remaining ',
           // too long
           0123456,
           1_2_3_4_5_6_7_8_9_0,
           // absolutely invalid
           1_2_3_4_5_6_,
           _0123_456'

Proving It

To drive home the point of checking inside the lexeme in the presence of otherwise insignificant whitespace:

auto const numbers = x3::skip(ws_or_comment)[*number];

With a slightly adjusted test input (removing the commas):

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
using namespace std::string_view_literals;

namespace Parser {
    namespace x3 = boost::spirit::x3;
    auto const cs = x3::digit;
    auto const number = x3::lexeme[ //
        x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
        // within the lexeme, assert that no digit or _ follows
        >> ! (cs | '_') //
    ];
    auto const ws_or_comment = x3::space | "//" >> *~x3::char_("\r\n");
    auto const numbers = x3::skip(ws_or_comment)[*number];
} // namespace Parser

int main()
{
    std::vector<std::string> attr;
    std::string_view const s =
        R"(0
           10
           1_0
           012345
           // too long
           0123456
           1_2_3_4_5_6_7_8_9_0
           // absolutely invalid
           1_2_3_4_5_6_
           _0123_456)"sv;

    auto f = begin(s), l = end(s);
    bool const ok = parse(f, l, Parser::numbers, attr);

    fmt::print("{}: {}\nremaining '{}'\n", ok, attr, std::string(f, l));
}

Prints

true: ["0", "10", "1_0", "012345"]
remaining '0123456
           1_2_3_4_5_6_7_8_9_0
           // absolutely invalid
           1_2_3_4_5_6_
           _0123_456'
