
Superpower: Match any non-whitespace character except for tokenizer

I want to use the NuGet package Superpower to match all non-whitespace characters unless they are part of a tokenized value. For example,

var s = "some random text{variable}";

should result in:

["some", "random", "text", "variable"]

But what I have right now is:

["some", "random", "text{variable}"]

The parser looks like this:

    public static class TextParser
    {
        public static TextParser<string> EncodedContent =>
            from open in Character.EqualTo('{')
            from chars in Character.Except('}').Many()
            from close in Character.EqualTo('}')
            select new string(chars);

        public static TextParser<string> HtmlContent =>
            from content in Span.NonWhiteSpace
            select content.ToString();
    }

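Of course, in the real parser I return the strings in a different type. This is just a simplified version.

Hopefully that is enough information. If not, I do have the whole repo on GitHub: https://github.com/jon49/FlowSharpHtml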

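There are probably many ways to parse your input, and depending on how much more complex your real inputs are (you say this one is simplified), you will likely need to tweak this. The best way to approach Superpower, though, is to create small parsers and then build on top of them. See the parsers and their descriptions below (each one builds on the previous one):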

/// <summary>
/// Parses any character other than whitespace or brackets.
/// </summary>
public static TextParser<char> NonWhiteSpaceOrBracket =>
    from c in Character.Except(c => 
        char.IsWhiteSpace(c) || c == '{' || c == '}',
        "Anything other than whitespace or brackets"
    )
    select c;

/// <summary>
/// Parses any piece of valid text, i.e. any text other than whitespace or brackets.
/// </summary>
public static TextParser<string> TextContent =>
    from content in NonWhiteSpaceOrBracket.Many()
    select new string(content);

/// <summary>
/// Parses an encoded piece of text enclosed in brackets.
/// </summary>
public static TextParser<string> EncodedContent =>
    from open in Character.EqualTo('{')
    from text in TextContent
    from close in Character.EqualTo('}')
    select text;

/// <summary>
/// Parse a single content, e.g. "name{variable}" or just "name"
/// </summary>
public static TextParser<string[]> Content =>
    from text in TextContent
    from encoded in EncodedContent.OptionalOrDefault()
    select encoded != null ? new[] { text, encoded } : new[] { text };

/// <summary>
/// Parses multiple contents separated by whitespace and flattens the result.
/// </summary>
public static TextParser<string[]> AllContent =>
    from content in Content.ManyDelimitedBy(Span.WhiteSpace)
    select content.SelectMany(x => x).ToArray(); // needs "using System.Linq;"

And then to run it:

string input = "some random text{variable}";
var result = AllContent.Parse(input);

Which outputs:

["some", "random", "text", "variable"]
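If you would rather inspect a failure than catch an exception, a minimal sketch along these lines may help. It assumes the `AllContent` parser above is passed in as an argument; the `ParseDemo` class and `Describe` method are hypothetical names used only for illustration. `AtEnd()` is added because, as far as I know, `Parse` and `TryParse` do not by themselves require the whole input to be consumed, so something like `text{a}{b}` could otherwise be accepted with part of the input silently left over.

```csharp
using System;
using Superpower;

public static class ParseDemo // hypothetical helper, not part of Superpower
{
    // Runs the given parser over the whole input and returns either the
    // parsed pieces joined by ", " or Superpower's error description.
    public static string Describe(TextParser<string[]> allContent, string input)
    {
        // AtEnd() makes the parse fail if any input is left unconsumed.
        var result = allContent.AtEnd().TryParse(input);

        return result.HasValue
            ? string.Join(", ", result.Value)
            : result.ToString(); // describes what was expected and where
    }
}
```

Usage would be something like `Console.WriteLine(ParseDemo.Describe(AllContent, "some random text{variable}"));`.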

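The idea here is to build a parser that parses a single piece of content, and then use Superpower's built-in ManyDelimitedBy combinator to simulate a "split" on the whitespace between the pieces you actually want. The result is an array of "content" pieces.

You may also want to use Superpower's token functionality to produce better error messages when parsing fails. It is a slightly different approach; take a look at this blog post to read more about how to use the tokenizer. It is completely optional if you don't need friendlier error messages.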
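To give a rough idea of what the token-based approach could look like, here is a minimal sketch using `TokenizerBuilder`. The `ContentKind` enum, the `ContentTokenizer` class and its `Split` method are names made up for this example (they are not from the question or the blog post), and a real grammar would normally feed the token list into token parsers rather than just flattening it to strings.

```csharp
using System.Linq;
using Superpower;
using Superpower.Model;
using Superpower.Parsers;
using Superpower.Tokenizers;

public enum ContentKind { Text, Encoded } // hypothetical token kinds

public static class ContentTokenizer // hypothetical name
{
    // A run of one or more characters that are neither whitespace nor brackets.
    static readonly TextParser<TextSpan> Text = Span.MatchedBy(
        Character.Except(
            c => char.IsWhiteSpace(c) || c == '{' || c == '}',
            "whitespace or brackets").AtLeastOnce());

    // '{', anything up to the next '}', then '}'.
    static readonly TextParser<TextSpan> Encoded = Span.MatchedBy(
        Character.EqualTo('{')
            .IgnoreThen(Character.Except('}').Many())
            .IgnoreThen(Character.EqualTo('}')));

    // Whitespace is skipped; everything else must be one of the two kinds.
    static readonly Tokenizer<ContentKind> Instance =
        new TokenizerBuilder<ContentKind>()
            .Ignore(Span.WhiteSpace)
            .Match(Encoded, ContentKind.Encoded)
            .Match(Text, ContentKind.Text)
            .Build();

    // Tokenizes the input and returns the token texts, with the
    // surrounding brackets stripped from encoded tokens.
    public static string[] Split(string input) =>
        Instance.Tokenize(input)
            .Select(t =>
            {
                var value = t.ToStringValue();
                return t.Kind == ContentKind.Encoded
                    ? value.Substring(1, value.Length - 2)
                    : value;
            })
            .ToArray();
}
```

With this sketch, `Tokenize` throws a `ParseException` carrying the position of the first character that fits neither token kind (for example a stray `}`), which is where the friendlier error messages come from.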

Maybe it can be written more simply, but this was my first thought. I hope it helps:

    // Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
    Regex tokenizerRegex = new Regex(@"\{(.+?)\}");
    var s = "some random text{variable}";
    string[] splitted = s.Split(' ');
    List<string> result = new List<string>();
    foreach (string word in splitted)
    {
        if (tokenizerRegex.IsMatch(word)) // the word contains at least one tokenized value
        {
            int nextIndex = 0;
            foreach (Match match in tokenizerRegex.Matches(word)) // loop through all matches
            {
                if (nextIndex < match.Index) // plain text before this token (at the start or between two tokens)
                    result.Add(word.Substring(nextIndex, match.Index - nextIndex));
                result.Add(match.Value);
                nextIndex = match.Index + match.Length; // save the end position of the token
            }
            if (nextIndex < word.Length) // plain text after the last token
                result.Add(word.Substring(nextIndex));
        }
        else
            result.Add(word); // no token found, just add the word
    }
    Console.WriteLine("[\"{0}\"]", string.Join("\", \"", result));

Examples

Text: some random text{variable}

["some", "random", "text", "{variable}"]

Text: some random text{variable}{next}

["some", "random", "text", "{variable}", "{next}"]

Text: some random text{variable}and{next}

["some", "random", "text", "{variable}", "and", "{next}"]
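Note that the results above keep the curly brackets, whereas the question asks for `variable` without them. As one possible simplification, a single regular expression with two alternatives can do the splitting and the bracket-stripping in one pass. This is only a sketch; the `ContentSplitter` class, its `Split` method and the `enc` group name are made up for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ContentSplitter // hypothetical name
{
    // Alternative 1: "{...}", with the text between the brackets captured in "enc".
    // Alternative 2: a run of characters that are neither whitespace nor brackets.
    private static readonly Regex Piece =
        new Regex(@"\{(?<enc>[^}]*)\}|[^\s{}]+");

    // Returns every piece of the input; encoded pieces come back without brackets.
    public static string[] Split(string input) =>
        Piece.Matches(input)
             .Cast<Match>()
             .Select(m => m.Groups["enc"].Success ? m.Groups["enc"].Value : m.Value)
             .ToArray();
}
```

For example, `ContentSplitter.Split("some random text{variable}and{next}")` yields `some`, `random`, `text`, `variable`, `and`, `next`. Unlike a parser, this silently skips unbalanced brackets instead of reporting them, so it only suits inputs where that is acceptable.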
