Sprache parser and characters escaping

I haven't found an example showing how to handle escaped characters with Sprache. The closest thing I have found is this code:

static void Main(string[] args)
{
    string text = "'test \\\' text'";
    var result = Grammar.QuotedText.End().Parse(text);
}

public static class Grammar
{
    private static readonly Parser<char> QuoteEscape = Parse.Char('\\');
    private static Parser<T> Escaped<T>(Parser<T> following)
    {
        return from escape in QuoteEscape
               from f in following
               select f;
    }

    private static readonly Parser<char> QuotedTextDelimiter = Parse.Char('\'');

    private static readonly Parser<char> QuotedContent =
        Parse.AnyChar.Except(QuotedTextDelimiter).Or(Escaped(QuotedTextDelimiter));

    public static Parser<string> QuotedText = (
        from lquot in QuotedTextDelimiter
        from content in QuotedContent.Many().Text()
        from rquot in QuotedTextDelimiter
        select content
        ).Token();
}

It parses the text successfully when it contains no escape sequences, but it fails on text that contains an escaped delimiter.

I had a similar problem: parsing strings that use " as the delimiter and \ as the escape character. I wrote a simple parser for this (it may not be the most elegant solution), and it seems to work nicely.

You should be able to adapt it, since the only difference appears to be the delimiter.

var escapedDelimiter = Parse.String("\\\"").Text().Named("Escaped delimiter");
var singleEscape = Parse.String("\\").Text().Named("Single escape character");
var doubleEscape = Parse.String("\\\\").Text().Named("Escaped escape character");
var delimiter = Parse.Char('"').Named("Delimiter");
var simpleLiteral = Parse.AnyChar.Except(singleEscape).Except(delimiter).Many().Text().Named("Literal without escape/delimiter character");

var stringLiteral = (from start in delimiter
            from v in escapedDelimiter.Or(doubleEscape).Or(singleEscape).Or(simpleLiteral).Many()
            from end in delimiter
            select string.Concat(start) + string.Concat(v) + string.Concat(end));

The key part is the from v in ... line. It tries escaped delimiters first, then double escape characters, then single escape characters, and only then falls back to simpleLiteral, which matches text containing no escape or delimiter characters. Changing this order would result in parse errors: for example, if single escapes were tried before escaped delimiters, the single-escape parser would consume the backslash and the escaped delimiter would never match. The same applies to double escapes versus single escapes. This step repeats until an unescaped delimiter is reached. The from v in ... line does not consume unescaped delimiters; from end in delimiter does.
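The same ordering idea can be applied to the single-quote grammar from the question. The following is only a minimal sketch, assuming the Sprache NuGet package is referenced; the names Escape, Delimiter, EscapedChar, PlainChar and Content are chosen here for illustration and do not come from the original posts. Unlike the snippet above, which returns the literal with its delimiters and escape sequences intact, this sketch unescapes the content and drops the surrounding quotes.

```csharp
using System;
using Sprache;

public static class Grammar
{
    private static readonly Parser<char> Escape = Parse.Char('\\');
    private static readonly Parser<char> Delimiter = Parse.Char('\'');

    // An escape followed by a delimiter or another escape yields the second character.
    private static readonly Parser<char> EscapedChar =
        from esc in Escape
        from c in Delimiter.Or(Escape)
        select c;

    // Any character that is neither the delimiter nor the escape character.
    private static readonly Parser<char> PlainChar =
        Parse.AnyChar.Except(Delimiter.Or(Escape));

    // Escaped sequences are tried before plain characters.
    private static readonly Parser<char> Content = EscapedChar.Or(PlainChar);

    public static readonly Parser<string> QuotedText =
        (from open in Delimiter
         from content in Content.Many().Text()
         from close in Delimiter
         select content).Token();
}

public static class Program
{
    public static void Main()
    {
        // The C# literal below represents the input: 'test \' text'
        var result = Grammar.QuotedText.End().Parse("'test \\' text'");
        Console.WriteLine(result); // prints: test ' text
    }
}
```

Because PlainChar excludes the escape character, a backslash that is not followed by a quote or another backslash is rejected in this sketch. If such input should be accepted, one option would be to let the character after the escape be any character instead.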

I needed to parse string literals that can be delimited by either single or double quotes, and that also support escaping of those delimiters.

The method generating the string literal parser:

private readonly StringBuilder _reusableStringBuilder = new StringBuilder();

private Parser<string> BuildStringLiteralParser(char delimiterChar)
{
    var escapeChar = '\\';

    var delimiter = Sprache.Parse.Char(delimiterChar);
    var escape = Sprache.Parse.Char(escapeChar);
    var escapedDelimiter = Sprache.Parse.String($"{escapeChar}{delimiterChar}");
    var splitByEscape = Sprache.Parse.AnyChar
        .Except(delimiter.Or(escape))
        .Many()
        .Text()
        .DelimitedBy(escapedDelimiter);

    string BuildStr(IEnumerable<IEnumerable<string>> splittedByEscape)
    {
        _reusableStringBuilder.Clear();

        var i = 0;

        foreach (var splittedByEscapedDelimiter in splittedByEscape)
        {
            if (i > 0)
            {
                _reusableStringBuilder.Append(escapeChar);
            }

            var j = 0;

            foreach (var str in splittedByEscapedDelimiter)
            {
                if (j > 0)
                {
                    _reusableStringBuilder.Append(delimiterChar);
                }

                _reusableStringBuilder.Append(str);

                j++;
            }

            i++;
        }

        return _reusableStringBuilder.ToString();
    }

    return (from ln in delimiter
            from splittedByEscape in splitByEscape.DelimitedBy(escape)
            from rn in delimiter
            select BuildStr(splittedByEscape)).Named("string");
}

Usage:

var stringParser = BuildStringLiteralParser('\"').Or(BuildStringLiteralParser('\''));

var str1 = stringParser.Parse("\"'Hello' \\\"John\\\"\"");
Console.WriteLine(str1);

var str2 = stringParser.Parse("\'\\'Hello\\' \"John\"\'");
Console.WriteLine(str2);

Output:

'Hello' "John"
'Hello' "John"

Check the working demo: https://dotnetfiddle.net/8wFNbj
