
How to modify C++ code from user input

I am currently writing a program that sits on top of a C++ interpreter. The user enters C++ commands at runtime, which are then passed to the interpreter. For certain patterns, I want to replace the given command with a modified form so that I can provide additional functionality.

I want to replace anything of the form

A->Draw(B1, B2)

with

MyFunc(A, B1, B2).

My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be an arbitrary C++ expression. Since these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested instances of this expression.

My next thought was to call clang as a subprocess, use "-ast-dump" to get the abstract syntax tree, modify that, and then rebuild it into a command to pass to the C++ interpreter. However, this would require keeping track of every environment change, such as included files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.

My third thought was to use the C++ interpreter's own internal parser to convert the command to an abstract syntax tree and then build from there. However, as far as I can find, this interpreter does not expose the AST in any way.

Are there any suggestions on how to proceed, either along one of these routes or along a different route entirely?

What you want is a Program Transformation System (PTS). These are tools that generally let you express changes to source code as source-level patterns that essentially say:

 if you see *this*, replace it by *that*

but they operate on abstract syntax trees, so the matching and replacement process is far more trustworthy than what you get with string hacking.

Such tools must have parsers for the source language of interest. The source language being C++ makes this fairly difficult.

Clang sort of qualifies; after all, it can parse C++. OP objects that it cannot do so without the full environment context. To the extent that OP is typing (well-formed) program fragments (statements, etc.) into the interpreter, Clang may [I don't have much experience with it myself] have trouble determining what kind of fragment it is (statement? expression? declaration? ...). Finally, Clang isn't really a PTS; its tree-modification procedures are not source-to-source transforms. That matters for convenience but might not stop OP from using it; surface-syntax rewrite rules are convenient, but you can always substitute procedural tree hacking with more effort. When there are more than a few rules, this starts to matter a lot.

GCC with Melt sort of qualifies in the same way that Clang does. I'm under the impression that Melt makes GCC, at best, a bit less intolerable for this kind of work. YMMV.

Our DMS Software Reengineering Toolkit, with its full C++14 [EDIT July 2018: C++17] front end, absolutely qualifies. DMS has been used to carry out massive transformations on large-scale C++ code bases.

DMS can parse arbitrary (well-formed) fragments of C++ without being told in advance what the syntax category is, and return an AST of the proper grammar nonterminal type, using its pattern-parsing machinery. [You may end up with multiple parses, e.g. ambiguities, that you'll have to decide how to resolve; see Why can't C++ be parsed with a LR(1) parser? for more discussion.] It can do this without resorting to "the environment" if you are willing to live without macro expansion while parsing, and insist that the preprocessor directives (they get parsed too) are nicely structured with respect to the code fragment (#if foo{#endif is not allowed), but that's unlikely to be a real problem for interactively entered code fragments.

DMS then offers a complete procedural AST library for manipulating the parsed trees (search, inspect, modify, build, replace) and can then regenerate surface source code from the modified tree, giving OP text to feed to the interpreter.

Where it shines in this case is that OP can likely write most of his modifications directly as source-to-source syntax rules. For his example, he can provide DMS with a rewrite rule (untested but pretty close to right):

rule replace_Draw(A:primary,B1:expression,B2:expression):
        primary->primary
    "\A->Draw(\B1, \B2)"     -- pattern
rewrites to
    "MyFunc(\A, \B1, \B2)";  -- replacement

and DMS will take any parsed AST containing the left-hand side "...Draw..." pattern and replace that subtree with the right-hand side, after substituting the matches for A, B1, and B2. The quote marks are metaquotes and are used to distinguish C++ text from rule-syntax text; the backslash is a metaescape used inside metaquotes to name metavariables. For more details of what you can say in the rule syntax, see DMS Rewrite Rules.

If OP provides a set of such rules, DMS can be asked to apply the entire set.

So I think this would work just fine for OP. It is a rather heavyweight mechanism to "add" to the package he wants to provide to a third party; DMS and its C++ front end are hardly "small" programs. But modern machines have lots of resources, so I think it's a question of how badly OP needs to do this.

Try modifying the headers to suppress the method. When you then compile, you'll get an error at every call site and will be able to replace all of them.

Since you have a C++ interpreter (such as CERN's Root), I guess you must use the compiler to intercept all the Draw calls. An easy and clean way to do that is to declare the Draw method as private in the headers, using a preprocessor define:

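 class ItemWithDrawMethod
 {
     // ... other members ...
 public:
 #ifdef CATCHTHEMETHOD
 private:   // with -DCATCHTHEMETHOD, any outside call to Draw() fails to compile
 #endif
     void Draw(int b1, int b2);  // parameter types are placeholders
 #ifdef CATCHTHEMETHOD
 public:    // restore public access for the members that follow
 #endif
     // ... other public members ...
 };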

Then compile with:

 g++ -DCATCHTHEMETHOD=1 yourfilein.cpp

If the user wants to input complex algorithms into the application, I suggest integrating a scripting language into the app, so that the user can write code (functions/algorithms in a defined way) that the app executes in the interpreter to get the final results. Examples: Python, Perl, JS, etc.

Since you need C++ in the interpreter, ChaiScript (http://chaiscript.com/) would be a suggestion.

What happens when someone gets hold of the Draw member function (auto draw = &A::Draw;) and then starts using draw? Presumably you'd want the same improved Draw functionality to be called in this case too. Thus I think we can conclude that what you really want is to replace the Draw member function with a function of your own.

Since it seems you are not in a position to modify the class containing Draw directly, a solution could be to derive your own class from A and override Draw there. Then your problem reduces to getting your users to use your new, improved class.
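A minimal sketch of the derive-and-override idea, assuming the library's Draw is declared virtual (if it is not, a derived Draw only hides it for calls made through the derived type). Drawable and MyDrawable are hypothetical stand-ins for the real class and your replacement, and the int parameters are placeholders for whatever Draw actually takes. Because the call goes through virtual dispatch, this also covers the member-pointer case (&A::Draw) raised above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the library class you cannot edit. The sketch
// assumes its Draw is virtual; the int parameters are placeholders.
struct Drawable {
    virtual ~Drawable() = default;
    virtual std::string Draw(int b1, int b2) {
        return "base(" + std::to_string(b1) + "," + std::to_string(b2) + ")";
    }
};

// Your replacement: adds behavior, then defers to the original Draw.
struct MyDrawable : Drawable {
    std::string Draw(int b1, int b2) override {
        // Extra functionality would go here (the role MyFunc plays in the question).
        return "extra+" + Drawable::Draw(b1, b2);
    }
};
```

Given `MyDrawable d; Drawable* p = &d;`, both `p->Draw(1, 2)` and `(d.*draw)(1, 2)` with `auto draw = &Drawable::Draw;` return "extra+base(1,2)", because a pointer to a virtual member function still dispatches virtually.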

You may again face the problem of automatically translating uses of class A to your new derived class, but this still seems pretty difficult without the help of a full C++ implementation. Perhaps there is a way to hide the old definition of A and present your replacement under that name instead, via clever use of header files, but I cannot determine whether that's possible from what you've told us.

Another possibility might be to use some dynamic-linker hackery with LD_PRELOAD to replace the Draw function that gets called at runtime.

There may be a way to accomplish this mostly with regular expressions.

Since anything that appears after Draw( is already correctly formatted as parameters, you don't need to fully parse them for the purpose you have outlined.

Fundamentally, the part that matters is the "SYMBOL->Draw(" prefix.

SYMBOL could be any expression that resolves either to an object that overloads -> or to a pointer to a type that implements Draw(...). If you reduce this to two cases (a plain symbol, or a more complex expression), you can short-cut the parsing.

For the first case, use a simple regular expression that searches for any valid C++ symbol, something similar to "[A-Za-z_][A-Za-z0-9_.]*", followed by the literal text "->Draw(". This gives you the portion that must be rewritten, since the code following it is already formatted as valid C++ parameters.
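A minimal sketch of this first (plain-symbol) case using std::regex. The function name rewriteSimpleDraw is hypothetical, and MyFunc is the replacement from the question. Like any regex approach it is a heuristic: it also matches inside string literals and comments, it turns a zero-argument p->Draw() into MyFunc(p, ), and it leaves the second link of a chain such as A->Draw(1)->Draw(2) alone, because a ")" precedes that arrow.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rewrites `SYMBOL->Draw(` into `MyFunc(SYMBOL, ` when SYMBOL is a plain
// identifier or dotted member path (e.g. `A`, `obj.ptr`). Everything after
// `Draw(` is copied through untouched, since it is already an argument list.
std::string rewriteSimpleDraw(const std::string& code) {
    static const std::regex drawCall(R"(([A-Za-z_][A-Za-z0-9_.]*)->Draw\()");
    return std::regex_replace(code, drawCall, "MyFunc($1, ");
}
```

For example, rewriteSimpleDraw("A->Draw(B1, B2)") yields "MyFunc(A, B1, B2)", and the arguments may contain arbitrary parentheses or strings because they are never inspected.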

The second case is for complex expressions that return an overloaded object or pointer. This requires a bit more effort, but a short parsing routine that walks backward through just one complex expression can be written surprisingly easily, since you don't have to support blocks (blocks in C++ cannot return objects: lambda definitions do not call the lambda themselves, and actual nested code blocks {...} can't return anything directly inline that would apply here). Note that if the expression doesn't end in ")", then it has to be a valid symbol in this context; so if you do find a ")", just match the nested ")" with its "(" and extract the symbol preceding it, following the SYMBOL(...(...)...)->Draw( pattern. This may be possible with regular expressions, but it should be fairly easy in normal code as well.
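The backward walk for this second case might look like the following sketch. objectExpressionStart is a hypothetical helper: given the index of the "->" in "->Draw(", it steps left over identifier characters, ".", "::", earlier "->" links, and balanced (...) or [...] groups, and returns where the object expression begins. It does not understand string literals, template arguments, or the conditional operator, so treat it as a heuristic.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the index at which the object expression ending just before
// `arrow` (the position of "->" in "->Draw(") begins. Heuristic: it steps
// left over identifier characters, '.', ':', "->", and balanced (...) or
// [...] groups, and stops at anything else (spaces, commas, operators).
std::size_t objectExpressionStart(const std::string& s, std::size_t arrow) {
    assert(arrow <= s.size());
    std::size_t i = arrow;  // the expression occupies [i, arrow)
    while (i > 0) {
        const char c = s[i - 1];
        if (c == ')' || c == ']') {
            // Skip back over a balanced group to its matching opener.
            const char open = (c == ')') ? '(' : '[';
            int depth = 0;
            while (i > 0) {
                const char d = s[--i];
                if (d == c) {
                    ++depth;
                } else if (d == open && --depth == 0) {
                    break;  // i now indexes the matching opener
                }
            }
        } else if (std::isalnum(static_cast<unsigned char>(c)) || c == '_' ||
                   c == '.' || c == ':') {
            --i;  // part of an identifier, member path, or qualified name
        } else if (c == '>' && i >= 2 && s[i - 2] == '-') {
            i -= 2;  // an earlier "->" in a member chain such as a->b->Draw(
        } else {
            break;
        }
    }
    return i;
}
```

For "x + arr[i]->Draw(2)" it returns 4, the index of "arr", and for "get(a, (b))->Draw(1)" it returns 0, having skipped the nested parentheses.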

As soon as you have the symbol or expression, the replacement is trivial, going from

SYMBOL->Draw(...

to

YourFunction(SYMBOL, ...

without having to deal with the additional parameters to Draw().

As an added benefit, chained function calls are handled for free with this model, since you can iterate over the code repeatedly, as in

A->Draw(B...)->Draw(C...)

The first iteration identifies the first A->Draw( and rewrites the whole statement as

YourFunction(A, B...)->Draw(C...)

The next iteration then identifies the second ->Draw( with the expression "YourFunction(A, B...)" preceding it, and rewrites it as

YourFunction(YourFunction(A, B...), C...)

where B... and C... are well-formed C++ parameters, including nested calls.

Without knowing which C++ version your interpreter supports, or the kind of code you will be rewriting, I really can't provide any sample code that is likely to be worthwhile.
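Still, the approach in this answer can be sketched end to end under stated assumptions. rewriteDrawCalls and its inline backward scan are illustrative, not a tested tool, and the scan shares the blind spots of any lexical heuristic (string literals, templates, ternaries). It always rewrites the leftmost remaining "->Draw(", so by the time a chained call is reached, everything to its left has already become a MyFunc(...) call, and only the prefix is ever replaced, never the arguments.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace {

// Backward scan: index where the object expression before "->" at `arrow`
// starts (identifiers, '.', ':', "->", balanced (...) / [...] groups).
std::size_t exprStart(const std::string& s, std::size_t arrow) {
    std::size_t i = arrow;
    while (i > 0) {
        const char c = s[i - 1];
        if (c == ')' || c == ']') {
            const char open = (c == ')') ? '(' : '[';
            int depth = 0;
            while (i > 0) {
                const char d = s[--i];
                if (d == c) ++depth;
                else if (d == open && --depth == 0) break;
            }
        } else if (std::isalnum(static_cast<unsigned char>(c)) || c == '_' ||
                   c == '.' || c == ':') {
            --i;
        } else if (c == '>' && i >= 2 && s[i - 2] == '-') {
            i -= 2;
        } else {
            break;
        }
    }
    return i;
}

}  // namespace

// Rewrites every OBJ->Draw(ARGS) into fn(OBJ, ARGS), leftmost first, so
// chained and nested calls end up as nested fn(...) calls. ARGS are never
// parsed: only the "OBJ->Draw(" prefix is replaced.
std::string rewriteDrawCalls(std::string code, const std::string& fn = "MyFunc") {
    assert(!fn.empty());
    const std::string needle = "->Draw(";
    std::size_t pos = 0;
    while ((pos = code.find(needle, pos)) != std::string::npos) {
        const std::size_t start = exprStart(code, pos);
        if (start == pos) {           // nothing recognizable before "->"
            pos += needle.size();
            continue;
        }
        const std::size_t argsBegin = pos + needle.size();
        const bool noArgs = argsBegin < code.size() && code[argsBegin] == ')';
        const std::string head =
            fn + "(" + code.substr(start, pos - start) + (noArgs ? "" : ", ");
        code.replace(start, argsBegin - start, head);
        pos = start + head.size();    // resume scanning just after the new prefix
    }
    return code;
}
```

With this sketch, "A->Draw(B)->Draw(C)" becomes "MyFunc(MyFunc(A, B), C)", "A->Draw(B->Draw(1, 2), 3)" becomes "MyFunc(A, MyFunc(B, 1, 2), 3)", and "p->Draw()" becomes "MyFunc(p)".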

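One approach is to load the user code as a DLL (similar to a plugin). That way you don't need to recompile the actual application, only the user code, and the application loads it dynamically.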
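The answer is phrased in terms of Windows DLLs; on POSIX systems the equivalent is a shared library loaded with dlopen/dlsym (on Windows the corresponding calls are LoadLibrary, GetProcAddress, and FreeLibrary). A minimal sketch, assuming each user plugin is compiled separately as a shared library and exports a function with C linkage named run_user_code, which is a hypothetical convention, not an existing API. runPlugin is likewise a hypothetical name. On glibc older than 2.34 the host program must also be linked with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical plugin convention: every user library exports
//   extern "C" int run_user_code();
using EntryFn = int (*)();

// Loads the shared library at `path`, runs its entry point, and unloads it.
// Returns the entry point's result, or -1 with `error` set on failure.
int runPlugin(const std::string& path, std::string& error) {
    void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        error = dlerror();                 // e.g. file not found
        return -1;
    }
    dlerror();                             // clear any stale error state
    void* symbol = dlsym(handle, "run_user_code");
    if (const char* err = dlerror()) {     // symbol missing from the library
        error = err;
        dlclose(handle);
        return -1;
    }
    auto entry = reinterpret_cast<EntryFn>(symbol);
    const int result = entry();
    dlclose(handle);
    return result;
}
```

Using extern "C" for the entry point avoids having to look up a compiler-specific mangled name; the host only needs to agree with the plugins on that one symbol and its signature.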
