
Using a preprocessing function with the identifier parser in FParsec?

I am using the identifier parser from FParsec to parse the names of variables and functions, which are usually a mixture of Unicode and ASCII characters. But sometimes an identifier contains escaped Unicode characters, either at the beginning or in the middle (like swipe_board\:_b). I can still make such identifiers parseable using the isAsciiIdStart and isAsciiIdContinue options, but I can't define my own custom function for preprocessing before normalization. What could be a solution here?

Internally, the identifier parser first parses a string and then passes it to an IdentifierValidator instance for validation. Since the C# IdentifierValidator class is publicly accessible (though not documented), you could easily adapt the identifier parser to your needs by making the initial string-parsing step also recognize the escapes.
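A minimal sketch of that "decode first, then validate" idea, under two assumptions: escapes are written as \uXXXX (exactly four hex digits), since the question's exact escape syntax isn't recoverable, and the code is an F# script that pulls FParsec from NuGet. Because IdentifierValidator is undocumented, this sketch does not call it directly; instead it swaps in a nested run of FParsec's documented identifier parser (configured through IdentifierOptions) on the already-decoded text. The names unicodeEscape, rawIdentifier, identifierWithEscapes and tryParseIdentifier are hypothetical.

```fsharp
// F# script (.fsx), run with `dotnet fsi`; fetches FParsec from NuGet.
#r "nuget: FParsec"

open System
open System.Text
open FParsec

// ASSUMPTION: escapes look like \uXXXX (four hex digits = one UTF-16 code unit).
let unicodeEscape : Parser<char, unit> =
    pstring "\\u" >>. manyMinMaxSatisfy 4 4 isHex
    |>> fun digits -> char (Convert.ToInt32(digits, 16))

// Step 1: a loose pre-scan that collects candidate characters and decodes
// escapes on the way. Every non-ASCII char is accepted here; real validation
// is left to step 2.
let isCandidateChar (c: char) =
    isAsciiLetter c || isDigit c || c = '_' || int c > 127

let rawIdentifier : Parser<string, unit> =
    many1Chars (satisfy isCandidateChar <|> unicodeEscape)

// Step 2: FParsec's own identifier parser (XID rules, surrogate pairs,
// normalization), applied to the decoded text.
let identifierOptions =
    IdentifierOptions(
        isAsciiIdStart = (fun c -> isAsciiLetter c || c = '_'),
        isAsciiIdContinue = (fun c -> isAsciiLetter c || isDigit c || c = '_'),
        normalization = NormalizationForm.FormC,
        normalizeBeforeValidation = true,
        allowAllNonAsciiCharsInPreCheck = true)

// The whole decoded string must be one valid identifier.
let validateDecoded : Parser<string, unit> = identifier identifierOptions .>> eof

// Decode first, then validate the decoded string with a nested `run`.
let identifierWithEscapes : Parser<string, unit> =
    rawIdentifier >>= fun decoded ->
        match run validateDecoded decoded with
        | Success (name, _, _) -> preturn name
        | Failure (msg, _, _) -> fail ("invalid identifier after decoding escapes: " + msg)

// Convenience wrapper: the entire input must be a single identifier.
let tryParseIdentifier (input: string) : string option =
    match run (identifierWithEscapes .>> eof) input with
    | Success (name, _, _) -> Some name
    | Failure (_, _, _) -> None

printfn "%A" (tryParseIdentifier "swipe_board\\u00e9_b")
```

Delegating to FParsec's identifier keeps its XID rules, surrogate-pair handling and normalization. The trade-off is that error positions reported by the nested run refer to the decoded string, not to the original input.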

Identifier parsing is somewhat complicated because of the support for UTF-16 surrogate pairs, normalization and the Unicode XID character categories, which .NET does not support natively. If you only need to support ASCII or UCS-2 identifiers specified in terms of the character categories reported by CharUnicodeInfo.GetUnicodeCategory, you could probably implement parsing and validation in a single step using many1Satisfy2 or many1Chars2.
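A one-step sketch of that simpler route with many1Chars2: the first-character parser and the continuation parser each accept either a plain character or a \uXXXX escape (again an assumed syntax) whose decoded character passes the same category check. The category choices (letters, letter numbers and '_' to start; additionally combining marks, decimal digits and connector punctuation to continue) are an assumption loosely modelled on common identifier rules, not FParsec's XID rules, and all helper names are hypothetical.

```fsharp
// F# script (.fsx), run with `dotnet fsi`; fetches FParsec from NuGet.
#r "nuget: FParsec"

open System
open System.Globalization
open System.Text
open FParsec

// ASSUMPTION: UCS-2 identifier rules based on .NET Unicode categories
// (no surrogate pairs, no XID tables). Start: letters, letter numbers, '_'.
let isIdStartChar (c: char) =
    c = '_' || Char.IsLetter c
    || CharUnicodeInfo.GetUnicodeCategory c = UnicodeCategory.LetterNumber

// Continue: start chars plus combining marks, decimal digits, connector punctuation.
let continueCategories =
    set [ UnicodeCategory.NonSpacingMark
          UnicodeCategory.SpacingCombiningMark
          UnicodeCategory.DecimalDigitNumber
          UnicodeCategory.ConnectorPunctuation ]

let isIdContinueChar (c: char) =
    isIdStartChar c || continueCategories.Contains(CharUnicodeInfo.GetUnicodeCategory c)

// ASSUMPTION: escapes look like \uXXXX (four hex digits = one UTF-16 code unit).
let unicodeEscape : Parser<char, unit> =
    pstring "\\u" >>. manyMinMaxSatisfy 4 4 isHex
    |>> fun digits -> char (Convert.ToInt32(digits, 16))

// An escape is accepted only if the decoded char is valid at this position;
// otherwise the parser fails after consuming the escape (a hard error).
let escapedChar (isValid: char -> bool) : Parser<char, unit> =
    unicodeEscape >>= fun c ->
        if isValid c then preturn c
        else fail (sprintf "escaped character U+%04X is not allowed here" (int c))

let idStart : Parser<char, unit> = satisfy isIdStartChar <|> escapedChar isIdStartChar
let idContinue : Parser<char, unit> = satisfy isIdContinueChar <|> escapedChar isIdContinueChar

// Parsing and validation in one step; NFC normalization runs after decoding.
let identifierWithEscapes : Parser<string, unit> =
    many1Chars2 idStart idContinue
    |>> fun (s: string) -> s.Normalize(NormalizationForm.FormC)

// Convenience wrapper: the entire input must be a single identifier.
let tryParseIdentifier (input: string) : string option =
    match run (identifierWithEscapes .>> eof) input with
    | Success (name, _, _) -> Some name
    | Failure (_, _, _) -> None

printfn "%A" (tryParseIdentifier "\\u00e9t\\u00e9")
```

Because the escape is consumed before the category check, an escape that decodes to a disallowed character is reported as an error instead of simply ending the identifier. Normalizing after decoding means an escaped combining mark composes with the preceding letter.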

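Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM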