
Using parser-combinators to parse string with escaped characters?

I'm trying to use the combine library in Rust to parse a string. The real data that I'm trying to parse looks something like this:

A79,216,0,4,2,2,N,"US\"PS"

So at the end of that data is a string in quotes, but the string will contain escaped characters as well. I can't figure out how to parse those escaped characters in between the other quotes.

extern crate parser_combinators;

use self::parser_combinators::*;

fn main() {
    let s = r#""HE\"LLO""#;
    let data = many(satisfy(|c| c != '"')); // Fails on escaped " obviously
    let mut str_parser = between(satisfy(|c| c == '"'), satisfy(|c| c == '"'), data);
    let result : Result<(String, &str), ParseError> = str_parser.parse(s);
    match result {
        Ok((value, _)) => println!("{:?}", value),
        Err(err) => println!("{}", err),
    }
}

//=> "HE\\"

The code above parses that string without reporting an error, but it stops at the escaped quote in the middle and prints "HE\\" in the end.

I want to change the code above so that it prints "HE\\\"LLO".
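The two printed strings above are Debug representations ({:?}), which is why every backslash and quote appears escaped. The following is a minimal, standard-library-only sketch of that formatting; debug_repr is a hypothetical helper introduced here for illustration and is not part of the question's code.

```rust
/// Hypothetical helper: returns what `println!("{:?}", s)` would print for `s`.
fn debug_repr(s: &str) -> String {
    format!("{:?}", s)
}

fn main() {
    // Value produced by the failing parser: H, E, and a single backslash.
    let truncated = "HE\\";
    // Value the question asks for: the escape sequence kept verbatim.
    let wanted = r#"HE\"LLO"#;

    // Debug formatting escapes each backslash and each double quote.
    println!("{}", debug_repr(truncated));
    println!("{}", debug_repr(wanted));
}
```

Note that a parser which also decodes the escape would yield the six-character value HE"LLO instead, whose Debug form contains only one backslash.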

How do I do that?

I have a mostly functional JSON parser, written as a benchmark for parser-combinators, which parses this sort of escaped character. I have included a link to it and a slightly simplified version of it below.

fn json_char(input: State<&str>) -> ParseResult<char, &str> {
    let (c, input) = try!(satisfy(|c| c != '"').parse_state(input));
    let mut back_slash_char = satisfy(|c| "\"\\nrt".chars().find(|x| *x == c).is_some()).map(|c| {
        match c {
            '"' => '"',
            '\\' => '\\',
            'n' => '\n',
            'r' => '\r',
            't' => '\t',
            c => c//Should never happen
        }
    });
    match c {
        '\\' => input.combine(|input| back_slash_char.parse_state(input)),
        _    => Ok((c, input))
    }
}


Since this parser may consume 1 or 2 characters, the primitive combinators are not enough, so we need to introduce a function that can branch on the character that was parsed.
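The same branching idea can be sketched without any parser library. This is only an illustration using the standard library: string_char and parse_quoted are hypothetical names, and the escape table mirrors the one in json_char above.

```rust
/// Parses one logical character of a quoted string body.
/// Consumes 1 input character normally, or 2 when the first is a backslash.
/// Returns None at end of input, on an unescaped quote, or on an unknown escape.
fn string_char(input: &str) -> Option<(char, &str)> {
    let mut chars = input.chars();
    match chars.next()? {
        // An unescaped quote ends the string body.
        '"' => None,
        // Branch: a backslash means one more character must be read and decoded.
        '\\' => {
            let decoded = match chars.next()? {
                '"' => '"',
                '\\' => '\\',
                'n' => '\n',
                'r' => '\r',
                't' => '\t',
                _ => return None,
            };
            Some((decoded, chars.as_str()))
        }
        c => Some((c, chars.as_str())),
    }
}

/// Parses an opening quote, any number of logical characters, and a closing quote.
/// Returns the decoded value and the unconsumed input.
fn parse_quoted(input: &str) -> Option<(String, &str)> {
    let mut rest = input.strip_prefix('"')?;
    let mut value = String::new();
    while let Some((c, next)) = string_char(rest) {
        value.push(c);
        rest = next;
    }
    let rest = rest.strip_prefix('"')?;
    Some((value, rest))
}

fn main() {
    println!("{:?}", parse_quoted(r#""HE\"LLO""#));
}
```

Unlike a real combinator library, this sketch reports failure as a bare None and carries no position or error message.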

I ran into the same problem and ended up with the following solution:

    (
        char('"'),
        many1::<Vec<char>, _>(choice((
            escaped_character(),
            satisfy(|c| c != '"'),
        ))),
        char('"')
    )

Or in other words, a string is opened by a " , followed by one or more (many1) escaped_characters or anything that isn't a closing " , and is closed by a closing " .
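The structure described above (quote, one or more of "escaped character, else any non-quote character", quote) can be sketched with plain functions and no external crate. The names PResult, not_quote, string_body and quoted_string are hypothetical and only illustrate the ordered-choice idea. One assumption to flag: this sketch falls back to the second alternative when an escape is invalid, whereas the combine-based code below reports an error in that case.

```rust
/// Hypothetical result type: the parsed value plus the remaining input.
type PResult<'a, T> = Option<(T, &'a str)>;

/// A backslash followed by one of the supported escape letters.
fn escaped_character(input: &str) -> PResult<'_, char> {
    let rest = input.strip_prefix('\\')?;
    let mut chars = rest.chars();
    let decoded = match chars.next()? {
        '0' => '\0',
        'n' => '\n',
        '\\' => '\\',
        '"' => '"',
        _ => return None,
    };
    Some((decoded, chars.as_str()))
}

/// Any single character that is not a double quote.
fn not_quote(input: &str) -> PResult<'_, char> {
    let mut chars = input.chars();
    let c = chars.next()?;
    if c != '"' { Some((c, chars.as_str())) } else { None }
}

/// One or more characters, trying escaped_character before not_quote.
fn string_body(mut input: &str) -> PResult<'_, String> {
    let mut value = String::new();
    loop {
        // Ordered choice: the first alternative that succeeds wins.
        let step = match escaped_character(input) {
            Some(hit) => Some(hit),
            None => not_quote(input),
        };
        match step {
            Some((c, rest)) => {
                value.push(c);
                input = rest;
            }
            None => break,
        }
    }
    // Mirror many1: an empty body is a failure.
    if value.is_empty() { None } else { Some((value, input)) }
}

/// Opening quote, body, closing quote.
fn quoted_string(input: &str) -> PResult<'_, String> {
    let rest = input.strip_prefix('"')?;
    let (value, rest) = string_body(rest)?;
    let rest = rest.strip_prefix('"')?;
    Some((value, rest))
}

fn main() {
    println!("{:?}", quoted_string(r#""foo \" bar \n baz \0""#));
}
```

The order of the alternatives matters: if not_quote were tried first, it would accept the backslash as an ordinary character and the escape would never be decoded.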

Here's a full example of how I'm using this:

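// Imports assumed for combine 3.x; module paths may differ in other versions.
use combine::error::{ParseError, StreamError};
use combine::parser::char::char;
use combine::stream::StreamErrorFor;
use combine::{any, choice, many1, satisfy, Parser, Stream};

// Debug and PartialEq are required by the assert_eq! calls in the tests.
#[derive(Debug, PartialEq)]
pub enum Operand {
    String { value: String },
}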

fn escaped_character<I>() -> impl Parser<Input = I, Output = char>
    where
        I: Stream<Item = char>,
        I::Error: ParseError<I::Item, I::Range, I::Position>,
{
    (
        char('\\'),
        any(),
    ).and_then(|(_, x)| match x {
        '0' => Ok('\0'),
        'n' => Ok('\n'),
        '\\' => Ok('\\'),
        '"' => Ok('"'),
        _ => Err(StreamErrorFor::<I>::unexpected_message(format!("Invalid escape sequence \\{}", x)))
    })
}

#[test]
fn parse_escaped_character() {
    let expected = Ok(('\n', " foo"));
    assert_eq!(expected, escaped_character().easy_parse("\\n foo"))
}

fn string_operand<I>() -> impl Parser<Input = I, Output = Operand>
    where
        I: Stream<Item = char>,
        I::Error: ParseError<I::Item, I::Range, I::Position>,
{
    (
        char('"'),
        many1::<Vec<char>, _>(choice((
            escaped_character(),
            satisfy(|c| c != '"'),
        ))),
        char('"')
    )
        .map(|(_,value,_)| Operand::String { value: value.into_iter().collect() })
}

#[test]
fn parse_string_operand() {
    let expected = Ok((Operand::String { value: "foo \" bar \n baz \0".into() }, ""));
    assert_eq!(expected, string_operand().easy_parse(r#""foo \" bar \n baz \0""#))
}
