Using a preprocessing function with the identifier parser in FParsec?
I am using the identifier parser from FParsec to parse the names of variables and functions, which are normally a mixture of Unicode and ASCII characters. But sometimes I have escaped Unicode characters at the beginning (like \π) or within the identifier (like swipe_board\:_b). I can still make them parseable using the isAsciiIdStart and isAsciiIdContinue options, but I can't define my own custom function for preprocessing before normalization. What could be a solution here?
The identifier parser internally first parses a string and then passes it to an IdentifierValidator instance for validation. Since the C# IdentifierValidator class is publicly accessible (though not documented), you could easily adapt the identifier parser to your needs by making the initial string parsing step also recognize the escapes.
Identifier parsing is a bit complicated due to support for UTF-16 surrogate pairs, normalization, and the Unicode XID character category, which is not natively supported on .NET. Maybe you only need to support ASCII or UCS-2 identifiers specified in terms of the character categories supported by CharUnicodeInfo.GetUnicodeCategory, in which case you could probably implement the parsing and validation in just one step using many1Satisfy2 or many1Chars2.
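A minimal sketch of the one-step approach with many1Chars2 might look like the following. It rests on several assumptions that are not from the question: the escape rule (a backslash followed by any single UTF-16 code unit, taken literally), the choice of Unicode categories accepted as start and continue characters, and the helper names escapedChar, isIdStart, isIdContinue and escapedIdentifier, which are all hypothetical. It performs no normalization and does not handle surrogate pairs, so it only covers the UCS-2 case.

```fsharp
open System.Globalization
open FParsec

// Assumed rule: letters, letter numbers and connector punctuation
// (which includes '_') may start an identifier.
let isIdStart (c: char) =
    match CharUnicodeInfo.GetUnicodeCategory c with
    | UnicodeCategory.UppercaseLetter
    | UnicodeCategory.LowercaseLetter
    | UnicodeCategory.TitlecaseLetter
    | UnicodeCategory.ModifierLetter
    | UnicodeCategory.OtherLetter
    | UnicodeCategory.LetterNumber
    | UnicodeCategory.ConnectorPunctuation -> true
    | _ -> false

// Assumed rule: continue characters additionally allow digits and marks.
let isIdContinue (c: char) =
    isIdStart c ||
    (match CharUnicodeInfo.GetUnicodeCategory c with
     | UnicodeCategory.DecimalDigitNumber
     | UnicodeCategory.NonSpacingMark
     | UnicodeCategory.SpacingCombiningMark -> true
     | _ -> false)

// Hypothetical escape: '\' followed by any char yields that char as-is.
let escapedChar : Parser<char, unit> = pchar '\\' >>. anyChar

// Parse and validate in one pass: each position is either an escape
// or a character that satisfies the category predicate.
let escapedIdentifier : Parser<string, unit> =
    many1Chars2 (escapedChar <|> satisfy isIdStart)
                (escapedChar <|> satisfy isIdContinue)
    <?> "identifier"

match run escapedIdentifier @"swipe_board\:_b" with
| Success (name, _, _) -> printfn "parsed: %s" name
| Failure (msg, _, _) -> printfn "error: %s" msg
```

With these rules the backslash is dropped from the result and the escaped character is kept, so the place to put a custom preprocessing function would be inside escapedChar (for example by mapping its result with |>>). If no escapes were needed, many1Satisfy2 isIdStart isIdContinue would do the same job directly on the predicates.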