Parsing and generating code with preprocessor directives

I'm experimenting with Roslyn, parsing and generating C# code. I'm trying to figure out how the CSharpSyntaxTree.ParseText method handles preprocessor symbols.

Here is my test method. It takes some C# code as a string, extracts the using directives, and returns a new string containing those using directives, taking preprocessor directives into account.

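using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

private static string Process(string input, string[] preprocessorSymbols)
{
    // Parse with the given symbols treated as defined.
    var options = CSharpParseOptions.Default.WithPreprocessorSymbols(preprocessorSymbols);
    var syntaxTree = CSharpSyntaxTree.ParseText(input, options);
    var compilationUnit = (CompilationUnitSyntax)syntaxTree.GetRoot();

    // Copy the parsed using directives, along with any trivia attached to them.
    var usings = compilationUnit.Usings.ToArray();
    var cs = SyntaxFactory.CompilationUnit()
            .AddUsings(usings)
            .NormalizeWhitespace();
    return cs.ToString();
}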

When I feed this method the following input, it works as expected:

var input = "using MyUsing1;\r\nusing MyUsing2;";
string result = Process(input, new[] { "" });
Assert.AreEqual("using MyUsing1;\r\nusing MyUsing2;", result);

When I add a preprocessor directive to the input but do not pass its symbol to the parser, the result is still as expected (the conditional using directive is stripped):

var input =
    "using MyUsing1;\r\n" +
    "#if CONDITIONAL\r\n" +
    "using MyUsing2;\r\n" +
    "#endif";
string result = Process(input, new[] { "" });
Assert.AreEqual("using MyUsing1;", result);

However, when I add the CONDITIONAL preprocessor symbol to the CSharpParseOptions, I get a strange result:

var input = 
    "using MyUsing1;\r\n" +
    "#if CONDITIONAL\r\n" +
    "using MyUsing2;\r\n" +
    "#endif";
string result = Process(input, new[] { "CONDITIONAL" });
Assert.AreEqual("using MyUsing1;\r\nusing MyUsing2;", result); // fails??

The actual return value is "using MyUsing1;\r\n#if CONDITIONAL\r\nusing MyUsing2;". The #if CONDITIONAL part is retained, and the #endif is removed.

Is this a bug, or am I doing something wrong?

In trying to understand this behavior, I added another test case to consider:

var input =
    "using MyUsing1;\r\n" +
    "#if CONDITIONAL\r\n" +
    "using MyUsing2;\r\n" +
    "#endif\r\n" +
    "using MyUsing3;\r\n";
string result = Process(input, new[] { "CONDITIONAL" });

And in this case, both the #if and the #endif are preserved.

If you break in the debugger and look at the usings array, it appears that each UsingDirectiveSyntax knows both the minimal range of characters for the using directive (Span) and a "wider" range of characters from the original text (FullSpan), which in this case includes things like the #if directive.

Digging a little deeper, the docs refer to preceding text such as the preprocessor directive as "leading trivia", and it is attached to the using node (strictly speaking, to the node's first token, the using keyword).
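To see this without a debugger, something like the following sketch can list each using directive's Span, FullSpan, and leading trivia. The class and method names (TriviaInspector, Describe) are made up for illustration; the Roslyn members used (Span, FullSpan, GetLeadingTrivia, Kind, IsDirective) are existing APIs.

```csharp
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class TriviaInspector
{
    // Hypothetical helper: reports the spans and leading trivia of every
    // top-level using directive in the input.
    public static string Describe(string input, params string[] symbols)
    {
        var options = CSharpParseOptions.Default.WithPreprocessorSymbols(symbols);
        var root = (CompilationUnitSyntax)CSharpSyntaxTree.ParseText(input, options).GetRoot();

        var report = new StringBuilder();
        foreach (UsingDirectiveSyntax u in root.Usings)
        {
            // Span covers the directive itself; FullSpan also covers attached trivia.
            report.AppendLine($"{u.Name}: Span={u.Span}, FullSpan={u.FullSpan}");
            foreach (SyntaxTrivia trivia in u.GetLeadingTrivia())
            {
                report.AppendLine($"  leading trivia: {trivia.Kind()} (IsDirective={trivia.IsDirective})");
            }
        }
        return report.ToString();
    }
}
```

Calling Describe on the failing input with "CONDITIONAL" defined should show an IfDirectiveTrivia entry under MyUsing2 and no leading trivia under MyUsing1.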

Interestingly, if you pass .AddUsings() just one of the using directives, the output seems to omit its leading trivia; but if you give it an array of multiple UsingDirectiveSyntax nodes, then each one except the first includes its leading trivia. (That's probably not exactly right; I'm working from black-box observations only.)

I'm not going to pretend to understand the reasoning for that behavior. The upshot is that many bits of code that look OK, like your example, will produce troubling output. (If you pass in new[] { usings[0], usings[2], usings[1] } you get even worse-looking output, with the #endif before the #if. But, you know, why would you do that?)

So if you want to use these tools to generate source code to be fed back into a full build pipeline, you could see this as a bug (or at least as a weird behavior that could easily be a source of bugs). If there's an intended usage that would keep you clear of this, I can't find straightforward documentation of it. In this case, you could remove the trivia from the usings before adding them to the output; but in other cases, I would think that might drop something you want to preserve.
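One plausible reading of these observations (my assumption, not something verified against the Roslyn source): SyntaxNode.ToString() is documented to leave out the node's own leading and trailing trivia, while ToFullString() includes them. In the rebuilt compilation unit, the first using's leading trivia is also the leading trivia of the whole unit, so ToString() drops it; later usings sit in the interior, so their trivia survives. Likewise, a final #endif with nothing after it belongs to the original tree's end-of-file token and is never copied. The sketch below, with a made-up TriviaDemo class, shows the ToString/ToFullString difference on a single node.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class TriviaDemo
{
    // Hypothetical helper: returns the second using directive rendered
    // without and with its attached trivia.
    public static (string Text, string FullText) SecondUsing(string input, params string[] symbols)
    {
        var options = CSharpParseOptions.Default.WithPreprocessorSymbols(symbols);
        var root = (CompilationUnitSyntax)CSharpSyntaxTree.ParseText(input, options).GetRoot();
        UsingDirectiveSyntax second = root.Usings[1];

        // ToString() omits the node's leading/trailing trivia; ToFullString() keeps it.
        return (second.ToString(), second.ToFullString());
    }
}
```

For the failing input, Text should be just the using directive, and FullText should start with the #if line.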
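A minimal sketch of the "remove the trivia" workaround, assuming you are happy to discard everything attached to each using directive (comments as well as preprocessor directives). UsingExtractor and ProcessWithoutTrivia are hypothetical names; WithoutTrivia() is an existing Roslyn extension method that strips a node's leading and trailing trivia.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class UsingExtractor
{
    // Hypothetical variant of Process that strips trivia from each using
    // directive before rebuilding the compilation unit.
    public static string ProcessWithoutTrivia(string input, string[] preprocessorSymbols)
    {
        var options = CSharpParseOptions.Default.WithPreprocessorSymbols(preprocessorSymbols);
        var root = (CompilationUnitSyntax)CSharpSyntaxTree.ParseText(input, options).GetRoot();

        // Drop leading/trailing trivia (including #if/#endif directives and
        // comments) so only the directives the parser treated as active remain.
        var usings = root.Usings.Select(u => u.WithoutTrivia()).ToArray();

        return SyntaxFactory.CompilationUnit()
            .AddUsings(usings)
            .NormalizeWhitespace()
            .ToString();
    }
}
```

With the failing input and "CONDITIONAL" defined, this should return the two using directives without the stray #if. The trade-off is the one noted above: a comment or #pragma attached to a using directive is lost too.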
