Misunderstanding the repeat directive - it should fail, but doesn't

Question

I would like to write a grammar (highly simplified) with:

grr := integer [ . integer ]

with

integer ::= digit { [ underline ] digit }

Since the parsed literals are needed again later (the real grammar is more complex, and not everything can be converted to a number immediately), each literal must be stored completely as a string (more precisely, as an iterator_range) in the AST for later use, including the underlines.

The problem now is that the literal expressions can be longer than they should be (with regard to the later implementation, computation, etc.). The obvious solution is the repeat directive (documented in detail for Qi repeat, and only very briefly for X3).

This is where my problems start (coliru):

    for(std::string_view const s : {
        // ok
        "0", "10", "1_0", "012345", 
        // too long
        "0123456",
        "1_2_3_4_5_6_7_8_9_0", 
        // absolutely invalid
        "1_2_3_4_5_6_", "_0123_456", ""
    }) {
        auto const cs = x3::char_("0-9");
        std::string attr;
        bool const ok = x3::parse(std::begin(s), std::end(s), 
            x3::raw[ cs >> x3::repeat(0, 5)[ ('_' >> cs) | cs] ],
            attr);
        std::cout << s << " -> " << attr
             << " (" << std::boolalpha << ok << ")"
             << "\n";   
    }

gives

0 -> 0 (true)
10 -> 10 (true)
1_0 -> 1_0 (true)
012345 -> 012345 (true)
0123456 -> 012345 (true)
1_2_3_4_5_6_7_8_9_0 -> 1_2_3_4_5_6 (true)
1_2_3_4_5_6_ -> 1_2_3_4_5_6 (true)
_0123_456 ->  (false)
 ->  (false)

If the literal is too long, the parser should fail, which it does not. If it ends with an underline, it should fail too, but it doesn't. A leading underline and empty literals are correctly rejected (the parse returns false).

Meanwhile, I am trying to move the more complex parsers into separate parser classes, but there I am missing, e.g., the rule to recognize a literal ending with an underline.

Furthermore, BOOST_SPIRIT_X3_DEBUG seems to be broken all of a sudden: there is no output.

What is the solution to my problem? I'm out of ideas, except for a very low-level and complicated approach via iterators, counters, etc.

This problem also affects other rules that are yet to be implemented.

Answer 1

"If the literal is too long, the parser should fail"

Where does it say that? The code does exactly what you ask: it parses at most 6 digits, with the requisite underscores. x3::parse reports success as soon as the parser matches a prefix of the input; it does not require that all input be consumed. The output even confirms that it does exactly that.

You can of course make it much more apparent by showing what was not parsed:

Live On Coliru

auto f = begin(s), l = end(s);
bool const ok = x3::parse(
    f, l, x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]], attr);

fmt::print(
    "{:21} -> {:5} {:13} remaining '{}'\n",
    fmt::format("'{}'", s),
    ok,
    fmt::format("'{}'", attr),
    std::string(f, l));

Prints

'0'                   -> true  '0'           remaining ''
'10'                  -> true  '10'          remaining ''
'1_0'                 -> true  '1_0'         remaining ''
'012345'              -> true  '012345'      remaining ''
'0123456'             -> true  '012345'      remaining '6'
'1_2_3_4_5_6_7_8_9_0' -> true  '1_2_3_4_5_6' remaining '_7_8_9_0'
'1_2_3_4_5_6_'        -> true  '1_2_3_4_5_6' remaining '_'
'_0123_456'           -> false ''            remaining '_0123_456'
''                    -> false ''            remaining ''
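To see why this is the expected result, it may help to model by hand what a bounded repetition does. The following is a plain C++ sketch, not Spirit code: `match_integer_prefix` is a hypothetical helper that imitates `cs >> x3::repeat(0, 5)[('_' >> cs) | cs]` and returns how many characters were consumed. The upper bound only limits how often the subject is applied; nothing in it checks what comes after the match.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical hand-written model of:  cs >> repeat(0, 5)[('_' >> cs) | cs]
// Returns the number of characters consumed, or 0 if the leading digit is missing.
inline std::size_t match_integer_prefix(std::string_view s)
{
    auto is_digit = [&](std::size_t i) {
        return i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])) != 0;
    };

    if (!is_digit(0))
        return 0; // the mandatory first digit failed, so the whole match fails

    std::size_t pos = 1;
    for (int n = 0; n < 5; ++n) { // at most 5 further repetitions
        if (pos < s.size() && s[pos] == '_' && is_digit(pos + 1))
            pos += 2; // alternative 1: '_' >> cs
        else if (is_digit(pos))
            pos += 1; // alternative 2: cs
        else
            break; // the repetition stops early; the match so far still succeeds
    }
    return pos; // anything after this position is simply left unconsumed
}
```

For "0123456" the model consumes 6 characters and stops, which corresponds to the `remaining '6'` line above: a successful match of a prefix, not a failure.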

Fixes

To assert that the complete input is parsed, either use x3::eoi or check the iterators:

Live On Coliru

bool const ok = x3::parse(
    f,
    l,
    x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]] >> x3::eoi,
    attr);

Prints

'0'                   -> true  '0'           remaining ''
'10'                  -> true  '10'          remaining ''
'1_0'                 -> true  '1_0'         remaining ''
'012345'              -> true  '012345'      remaining ''
'0123456'             -> false '012345'      remaining '0123456'
'1_2_3_4_5_6_7_8_9_0' -> false '1_2_3_4_5_6' remaining '1_2_3_4_5_6_7_8_9_0'
'1_2_3_4_5_6_'        -> false '1_2_3_4_5_6' remaining '1_2_3_4_5_6_'
'_0123_456'           -> false ''            remaining '_0123_456'
''                    -> false ''            remaining ''

Distinct Lexemes

If instead you want to allow the input to continue, just not with certain characters (e.g. when parsing many such "numbers"), add a negative lookahead inside the lexeme:

auto const number = x3::lexeme[ //
    x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
    // within the lexeme, assert that no digit or _ follows
    >> ! (cs | '_') //
];

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
#include <string>
#include <string_view>
#include <vector>
using namespace std::string_view_literals;

namespace Parser {
    namespace x3 = boost::spirit::x3;
    auto const cs = x3::digit;
    auto const number = x3::lexeme[ //
        x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
        // within the lexeme, assert that no digit or _ follows
        >> ! (cs | '_') //
    ];
    auto const ws_or_comment = x3::space | "//" >> *~x3::char_("\r\n");
    auto const numbers = x3::skip(ws_or_comment)[number % ','];
} // namespace Parser

int main()
{
    std::vector<std::string> attr;
    std::string_view const s =
        R"(0,
           10,
           1_0,
           012345,
           // too long
           0123456,
           1_2_3_4_5_6_7_8_9_0,
           // absolutely invalid
           1_2_3_4_5_6_,
           _0123_456)"sv;

    auto f = begin(s), l = end(s);
    bool const ok = parse(f, l, Parser::numbers, attr);

    fmt::print("{}: {}\nremaining '{}'\n", ok, attr, std::string(f, l));
}

Prints

true: ["0", "10", "1_0", "012345"]
remaining ',
           // too long
           0123456,
           1_2_3_4_5_6_7_8_9_0,
           // absolutely invalid
           1_2_3_4_5_6_,
           _0123_456'

Proving It

To drive home the point of checking inside the lexeme in the presence of otherwise insignificant whitespace, parse a whitespace-separated sequence instead of a comma-separated one:

auto const numbers = x3::skip(ws_or_comment)[*number];

With a slightly adjusted test input (removing the commas):

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
#include <string>
#include <string_view>
#include <vector>
using namespace std::string_view_literals;

namespace Parser {
    namespace x3 = boost::spirit::x3;
    auto const cs = x3::digit;
    auto const number = x3::lexeme[ //
        x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
        // within the lexeme, assert that no digit or _ follows
        >> ! (cs | '_') //
    ];
    auto const ws_or_comment = x3::space | "//" >> *~x3::char_("\r\n");
    auto const numbers = x3::skip(ws_or_comment)[*number];
} // namespace Parser

int main()
{
    std::vector<std::string> attr;
    std::string_view const s =
        R"(0
           10
           1_0
           012345
           // too long
           0123456
           1_2_3_4_5_6_7_8_9_0
           // absolutely invalid
           1_2_3_4_5_6_
           _0123_456)"sv;

    auto f = begin(s), l = end(s);
    bool const ok = parse(f, l, Parser::numbers, attr);

    fmt::print("{}: {}\nremaining '{}'\n", ok, attr, std::string(f, l));
}

Prints

true: ["0", "10", "1_0", "012345"]
remaining '0123456
           1_2_3_4_5_6_7_8_9_0
           // absolutely invalid
           1_2_3_4_5_6_
           _0123_456'

Answer 1 (2 votes, accepted, 2022-07-08 19:46:49)