Parsing and generating code with preprocessor directives
I'm experimenting with Roslyn, parsing and generating C# code. I'm trying to figure out how the CSharpSyntaxTree.ParseText method handles preprocessor symbols.
Here is my test method. It takes in some C# code as a string, extracts the using statements, and returns a new string with those using statements, taking into account preprocessor directives.
private static string Process(string input, string[] preprocessorSymbols)
{
    var options = CSharpParseOptions.Default.WithPreprocessorSymbols(preprocessorSymbols);
    var syntaxTree = CSharpSyntaxTree.ParseText(input, options);
    var compilationUnit = (CompilationUnitSyntax)syntaxTree.GetRoot();
    var usings = compilationUnit.Usings.ToArray();
    var cs = SyntaxFactory.CompilationUnit()
        .AddUsings(usings)
        .NormalizeWhitespace();
    var result = cs.ToString();
    return result;
}
When feeding this method with the following input, it works as expected:
var input = "using MyUsing1;\r\nusing MyUsing2;";
string result = Process(input, new[] { "" });
Assert.AreEqual("using MyUsing1;\r\nusing MyUsing2;", result);
When adding a preprocessor directive, but not passing its symbol to the parser, the result is still as expected (the conditional using statement is stripped):
var input =
"using MyUsing1;\r\n" +
"#if CONDITIONAL\r\n" +
"using MyUsing2;\r\n" +
"#endif";
string result = Process(input, new[] { "" });
Assert.AreEqual("using MyUsing1;", result);
However, when adding the CONDITIONAL preprocessor symbol to the CSharpParseOptions, I get a strange result:
var input =
"using MyUsing1;\r\n" +
"#if CONDITIONAL\r\n" +
"using MyUsing2;\r\n" +
"#endif";
string result = Process(input, new[] { "CONDITIONAL" });
Assert.AreEqual("using MyUsing1;\r\nusing MyUsing2;", result); // fails??
The actual return value is "using MyUsing1;\r\n#if CONDITIONAL\r\nusing MyUsing2;". The #if CONDITIONAL part is retained, and #endif is removed.
Is this a bug, or am I doing something wrong?
In trying to understand this behavior, I added another test case to consider:
var input =
"using MyUsing1;\r\n" +
"#if CONDITIONAL\r\n" +
"using MyUsing2;\r\n" +
"#endif\r\n" +
"using MyUsing3;\r\n";
string result = Process(input, new[] { "CONDITIONAL" });
And in this case, both the #if and the #endif are preserved.
If you break in the debugger and look at the usings array, it appears that each UsingDirectiveSyntax knows both the minimal range of characters for the using statement (Span) and a "wider" range of characters from the original stream (FullSpan), which includes things like, in this case, the #if directive.
Digging a little deeper, the docs refer to preceding content such as the preprocessor directive as "leading trivia", and it is attached to the using node (more precisely, to the node's first token).
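To see this without a debugger, here is a minimal sketch that lists each using directive's Span, FullSpan, and the kinds of its leading trivia. The class and method names (TriviaInspector, Describe) are hypothetical and only for illustration; the sketch assumes the Microsoft.CodeAnalysis.CSharp package is referenced.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class TriviaInspector
{
    // Hypothetical helper: returns one description line per using directive.
    public static string[] Describe(string input, params string[] preprocessorSymbols)
    {
        var options = CSharpParseOptions.Default.WithPreprocessorSymbols(preprocessorSymbols);
        var root = (CompilationUnitSyntax)CSharpSyntaxTree.ParseText(input, options).GetRoot();

        return root.Usings
            .Select(u =>
                // Span covers only the directive itself; FullSpan also covers attached trivia.
                $"{u.Name}: Span={u.Span}, FullSpan={u.FullSpan}, " +
                $"leading trivia=[{string.Join(", ", u.GetLeadingTrivia().Select(t => t.Kind()))}]")
            .ToArray();
    }
}
```

Calling Describe with the question's input and "CONDITIONAL" as the only symbol should show the #if directive among the leading trivia of the second using, while the #endif does not belong to any using directive when nothing follows it.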
Interestingly, if you pass .AddUsings() just one of the using directives, it seems to omit the leading trivia; but if you give it an array of multiple UsingDirectiveSyntax nodes, then for each except the first, it includes the leading trivia. (That's probably not exactly right; I'm working from black-box observations only.)
I'm not going to pretend to understand the reasoning for that behavior. The upshot is that many bits of code that look OK, like your example, will produce troubling output. (If you pass in new[] {usings[0], usings[2], usings[1]} you get even worse-looking output, with the #endif before the #if. But... you know... I guess why would you do that?)
So if you want to use these tools to generate source code to be fed back into a full build pipeline, you could see this as a bug (or at least, a weird behavior that could easily be a source of bugs). If there's an intended usage that would keep you clear of this, I can't find straightforward documentation of it. In this case, you could remove the trivia from the usings before adding them to the output; but in other cases, I would think that might drop something you want to preserve.
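As a rough sketch of that workaround, the variant below strips the trivia from each using directive with the WithoutTrivia() extension method before rebuilding the compilation unit. The names UsingExtractor and ProcessWithoutTrivia are hypothetical; the rest mirrors the Process method from the question.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class UsingExtractor
{
    // Hypothetical variant of Process that discards trivia on each using directive.
    public static string ProcessWithoutTrivia(string input, params string[] preprocessorSymbols)
    {
        var options = CSharpParseOptions.Default.WithPreprocessorSymbols(preprocessorSymbols);
        var compilationUnit = (CompilationUnitSyntax)CSharpSyntaxTree.ParseText(input, options).GetRoot();

        // WithoutTrivia() drops the node's leading and trailing trivia,
        // which includes any #if/#endif directives attached to it.
        var usings = compilationUnit.Usings
            .Select(u => u.WithoutTrivia())
            .ToArray();

        return SyntaxFactory.CompilationUnit()
            .AddUsings(usings)
            .NormalizeWhitespace()
            .ToString();
    }
}
```

Note that this also discards comments attached to the using directives, which is the kind of loss the caveat above is about.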