Parsing and generating code with preprocessor directives

Question

I'm experimenting with Roslyn, parsing and generating C# code. I'm trying to figure out how the CSharpSyntaxTree.ParseText method handles preprocessor symbols.

Here is my test method. It takes some C# code as a string, extracts the using directives, and returns a new string containing those using directives, taking preprocessor directives into account.

private static string Process(string input, string[] preprocessorSymbols)
{
    var options = CSharpParseOptions.Default.WithPreprocessorSymbols(preprocessorSymbols);
    var syntaxTree = CSharpSyntaxTree.ParseText(input, options);
    var compilationUnit = (CompilationUnitSyntax)syntaxTree.GetRoot();
    var usings = compilationUnit.Usings.ToArray();
    var cs = SyntaxFactory.CompilationUnit()
            .AddUsings(usings)
            .NormalizeWhitespace();
    var result = cs.ToString();
    return result;
}

When I feed this method the following input, it works as expected:

var input = "using MyUsing1;\r\nusing MyUsing2;";
string result = Process(input, new[] { "" });
Assert.AreEqual("using MyUsing1;\r\nusing MyUsing2;", result);

When I add a preprocessor directive but do not pass its symbol to the parser, the result is still as expected (the conditional using directive is stripped):

var input =
    "using MyUsing1;\r\n" +
    "#if CONDITIONAL\r\n" +
    "using MyUsing2;\r\n" +
    "#endif";
string result = Process(input, new[] { "" });
Assert.AreEqual("using MyUsing1;", result);

However, when I add the CONDITIONAL preprocessor symbol to the CSharpParseOptions, I get a strange result:

var input = 
    "using MyUsing1;\r\n" +
    "#if CONDITIONAL\r\n" +
    "using MyUsing2;\r\n" +
    "#endif";
string result = Process(input, new[] { "CONDITIONAL" });
Assert.AreEqual("using MyUsing1;\r\nusing MyUsing2;", result); // fails??

The actual return value is "using MyUsing1;\r\n#if CONDITIONAL\r\nusing MyUsing2;". The #if CONDITIONAL part is retained, and the #endif is removed.

Is this a bug, or am I doing something wrong?

Answer 1

In trying to understand this behavior, I added another test case to consider:

var input =
    "using MyUsing1;\r\n" +
    "#if CONDITIONAL\r\n" +
    "using MyUsing2;\r\n" +
    "#endif\r\n" +
    "using MyUsing3;\r\n";
string result = Process(input, new[] { "CONDITIONAL" });

And in this case, both the #if and the #endif are preserved.

If you break in the debugger and look at the usings array, it appears that each UsingDirectiveSyntax knows both the minimal range of characters for the using directive (Span) and a "wider" range of characters from the original text (FullSpan), which includes things like, in this case, the #if directive.

Digging a little deeper, the docs refer to preceding text such as the preprocessor directive as "leading trivia", and it is attached to the using node (strictly speaking, to the node's first token). That would also explain the missing #endif in your example: there the #endif precedes no using directive at all, so it does not belong to any element of the usings array.
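To see the Span/FullSpan difference and the leading trivia without stepping through the debugger, something like the following sketch may help. The class name TriviaDump and the method name Describe are made up for illustration; the Roslyn calls (GetLeadingTrivia, Kind, IsDirective) are the regular public API, and the Microsoft.CodeAnalysis.CSharp package is assumed to be referenced.

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper, not part of Roslyn.
internal static class TriviaDump
{
    // Returns one line per using directive, followed by one line per piece
    // of leading trivia attached to that directive.
    public static List<string> Describe(string input, params string[] preprocessorSymbols)
    {
        var options = CSharpParseOptions.Default.WithPreprocessorSymbols(preprocessorSymbols);
        var root = (CompilationUnitSyntax)CSharpSyntaxTree.ParseText(input, options).GetRoot();

        var lines = new List<string>();
        foreach (var u in root.Usings)
        {
            // Span covers only the directive; FullSpan also covers its trivia.
            lines.Add($"{u.Name}: Span={u.Span}, FullSpan={u.FullSpan}");
            foreach (var trivia in u.GetLeadingTrivia())
            {
                lines.Add($"  leading {trivia.Kind()} (directive: {trivia.IsDirective})");
            }
        }
        return lines;
    }
}
```

Calling Describe on the failing input with "CONDITIONAL" defined should list an IfDirectiveTrivia entry under the second using, which is the piece that ends up in the generated output.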

Interestingly, if you pass .AddUsings() just one of the using directives, the output seems to omit the leading trivia; but if you give it an array of multiple UsingDirectiveSyntax nodes, then for each except the first, the output includes the leading trivia. (That's probably not exactly right; I'm working from black-box observations only. A likely explanation is that ToString() leaves out the leading and trailing trivia of the node it is called on, and the first using's leading trivia is also the leading trivia of the whole compilation unit; ToFullString() would include it.)

I'm not going to pretend to understand the reasoning for that behavior. The upshot is that many bits of code that look OK, like your example, will produce troubling output. (If you pass in new[] {usings[0], usings[2], usings[1]} you get even worse-looking output, with the #endif before the #if. But then, why would you do that?)

So if you want to use these tools to generate source code to be fed back into a full build pipeline, you could see this as a bug (or at least as weird behavior that could easily be a source of bugs). If there's an intended usage that would keep you clear of this, I can't find straightforward documentation of it. In this case, you could remove the trivia from the usings before adding them to the output; but in other cases, I would think that might drop something you want to preserve.
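As a rough sketch of the trivia-stripping idea, the Process method from the question could be adapted as below. The class name UsingExtractor is made up for illustration; WithoutTrivia() is a standard Roslyn extension method that removes a node's leading and trailing trivia. Note that it also discards any comments attached to the usings, which is the "might drop something you want to preserve" caveat.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical wrapper class; the method body follows the question's Process.
internal static class UsingExtractor
{
    public static string Process(string input, string[] preprocessorSymbols)
    {
        var options = CSharpParseOptions.Default.WithPreprocessorSymbols(preprocessorSymbols);
        var syntaxTree = CSharpSyntaxTree.ParseText(input, options);
        var compilationUnit = (CompilationUnitSyntax)syntaxTree.GetRoot();

        // Drop leading/trailing trivia (directives, comments, whitespace)
        // from each active using so no stray #if or #endif is carried over.
        var usings = compilationUnit.Usings
            .Select(u => u.WithoutTrivia())
            .ToArray();

        return SyntaxFactory.CompilationUnit()
            .AddUsings(usings)
            .NormalizeWhitespace()
            .ToString();
    }
}
```

With this change the parser still decides which usings are active for the given symbols; only the directive text itself is kept out of the generated code.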

Answer 1
1 2018-08-27 16:38:05

使用预处理程序指令解析和生成代码

问题描述

1 个解决方案

解决方案1 1 2018-08-27 16:38:05

解决方案1
1 2018-08-27 16:38:05