
Using preprocessing function with identifier parser in FParsec?

Original post: 2012-02-10 14:37:13 | Tags: parsing, f#, fparsec

I am using the identifier parser from FParsec to parse the names of variables and functions, which are normally a mixture of Unicode and ASCII characters. But sometimes I have escaped Unicode characters at the beginning (like \π) or within the identifier (like swipe_board\:_b). I can still make them parseable using the isAsciiIdStart and isAsciiIdContinue options, but I can't define my own custom function for preprocessing before normalization. What could be a solution here?

1 Answer

The identifier parser internally first parses a string and then passes it to an IdentifierValidator instance for validation. Since the C# IdentifierValidator class is publicly accessible (though not documented), you could easily adapt the identifier parser to your needs by making the initial string-parsing step also recognize the escapes.

Identifier parsing is a bit complicated because it supports UTF-16 surrogate pairs, normalization and the Unicode XID character categories, which are not natively supported on .NET. Maybe you only need to support ASCII or UCS-2 identifiers specified in terms of the character categories supported by CharUnicodeInfo.GetUnicodeCategory, in which case you could probably implement the parsing and validation in just one step using many1Satisfy2 or many1Chars2.
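The one-step approach could look roughly like the sketch below. It uses many1Chars2 rather than many1Satisfy2, because an escape spans two characters and a per-character predicate cannot see the preceding backslash. The names escapedChar, isIdStart, isIdContinue and escapedIdentifier are hypothetical, and the escape rule (a backslash followed by any single character, with the backslash dropped from the result) is an assumption about the intended syntax, not something the question specifies. The sketch does no normalization and does not handle surrogate pairs.

```fsharp
open System.Globalization
open FParsec

// Assumption: an escape is '\' followed by any one character;
// the escaped character is kept and the backslash is dropped.
let escapedChar : Parser<char, unit> = pchar '\\' >>. anyChar

// Letters, letter numbers and '_' may start an identifier (UCS-2 only).
let isIdStart (c: char) =
    match CharUnicodeInfo.GetUnicodeCategory c with
    | UnicodeCategory.UppercaseLetter
    | UnicodeCategory.LowercaseLetter
    | UnicodeCategory.TitlecaseLetter
    | UnicodeCategory.ModifierLetter
    | UnicodeCategory.OtherLetter
    | UnicodeCategory.LetterNumber -> true
    | _ -> c = '_'

// Continuation characters additionally allow digits, marks and connectors.
let isIdContinue (c: char) =
    isIdStart c ||
    (match CharUnicodeInfo.GetUnicodeCategory c with
     | UnicodeCategory.DecimalDigitNumber
     | UnicodeCategory.NonSpacingMark
     | UnicodeCategory.SpacingCombiningMark
     | UnicodeCategory.ConnectorPunctuation -> true
     | _ -> false)

// The first character and the following characters each accept
// either an escape or a plain character of the right category.
let escapedIdentifier : Parser<string, unit> =
    many1Chars2 (escapedChar <|> satisfy isIdStart)
                (escapedChar <|> satisfy isIdContinue)
    <?> "identifier"

// Usage: parse the two examples from the question.
let show input =
    match run escapedIdentifier input with
    | Success (name, _, _) -> printfn "parsed: %s" name
    | Failure (msg, _, _) -> printfn "error: %s" msg

show "\\π"
show "swipe_board\\:_b"
```

Since escapedChar consumes the backslash before looking at the next character, a trailing backslash at the end of the input is reported as an error instead of silently ending the identifier. If normalization or XID validation is needed after unescaping, that would be the place to hand the resulting string to the validator mentioned above.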