What is the best way to parse a Python script file in C/C++ code?

I am embedding Python in a C/C++ program.

What I am trying to do is parse the Python script file from the C/C++ program and break the file into "blocks" so that each "block" is a valid command in Python code. I need to put each block into a std::string. For example:

#PythonScript.py

import math

print "Hello Python"
i = 0;
while (i < 10):
    print "i = " , i;
    i = i + 1;

print "GoodBye Python"

In this script there are 5 different "blocks":

  • the first one is "import math"
  • the second is "print "Hello Python""
  • the third is "i = 0;"
  • and the fourth is

     while (i < 10):\n\tprint "i = " , i;\n\ti = i + 1;

My knowledge of Python is very basic and I am not familiar with the Python code syntax. What is the best way to do this? Is there any Python C/C++ API function that supports this?


Why I need it: for GUI purposes. My program, which is written in C, uses Python to make some calculations. From the C code I run a Python script through the Python C API, and what I need is a way to capture Python's output in my program. I capture it and everything is OK; the problem is when the script involves user input. What happens is that I capture Python's output only after the script has finished. Therefore, when there is an input command in the script, I get a black screen. I need to get all the printed output before the input command.

The first solution I tried is to parse the script into valid commands and run each command separately, one after the other. For this I need to parse the script and decide what is a command and what is not. The question is: what is the best way to do this, and is there something that already does it?

I've no idea why you want to do this, but the safest way is to let Python itself do the parsing work. If you're using Python earlier than 2.6, you can use the compiler module. For 2.6 and later, use the built-in compile function and the ast module. In 3.x you have to use these, as the compiler module has been removed.
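As a rough illustration of the compile-plus-ast route, the sketch below parses a script once and then compiles and executes each top-level statement on its own, so a host could collect output between statements. It assumes Python 3.8 or later (the question's script uses Python 2 print syntax), and the function name run_statement_by_statement and the on_done callback are made up for this example.

```python
import ast


def run_statement_by_statement(source, on_done=None, filename="<script>"):
    """Execute each top-level statement separately (illustrative helper).

    on_done is a hypothetical hook: it is called with the AST node after
    every statement, which is where a host could flush captured output.
    """
    tree = ast.parse(source, filename=filename)
    namespace = {"__name__": "__main__"}
    executed = []
    for node in tree.body:
        # Wrap the single statement in its own Module so compile() accepts it.
        module = ast.Module(body=[node], type_ignores=[])
        code = compile(module, filename, "exec")
        exec(code, namespace)  # all statements share one namespace
        executed.append(type(node).__name__)
        if on_done is not None:
            on_done(node)
    return namespace, executed
```

From an embedding program the same function could be called through the Python/C API like any other Python callable; that part is not shown here.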

I think you're trying to do extra work, because there is (at least) the "Embedding Python in Another Application" facility, and you can just execute your script via the Python/C API. In my mind, you don't want to code a Python interpreter from scratch, do you?

If you want to do syntax analysis, you should look into Python's grammar (and maybe use Bison as a parser generator).

Python grammar specs:

Why do you need this? If you're embedding Python, you don't need to parse Python code yourself - not even remotely.

But to answer the question: you could use Python's ast module (which uses a builtin module _ast internally - I don't know if and how you can use it from C). ast.parse("""... your code ...""") gives a Module object, which has a body attribute, which is a list of the AST nodes the module consists of. In this example, with Python 3 (I don't have Python 2 at hand), it's (naming the classes only) [Import, Expr, Assign, While, Expr]. Not quite what you asked for, but as close as you will get.
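A minimal sketch of that idea, assuming Python 3.8 or later (for ast.get_source_segment) and the question's script rewritten with Python 3 print calls. The names SCRIPT and split_into_blocks are invented for this illustration.

```python
import ast

# The question's script, rewritten in Python 3 syntax for this sketch.
SCRIPT = '''import math

print("Hello Python")
i = 0
while i < 10:
    print("i = ", i)
    i = i + 1

print("GoodBye Python")
'''


def split_into_blocks(source):
    """Return (node class names, source text) for each top-level statement."""
    tree = ast.parse(source)
    names = [type(node).__name__ for node in tree.body]
    # get_source_segment() cuts the exact text that produced each node.
    blocks = [ast.get_source_segment(source, node) for node in tree.body]
    return names, blocks
```

Each string in the returned list is one "block" in the question's sense, so the while loop comes back as a single multi-line string together with its body.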

Okay, with the addition: there are much easier ways than this. Proving that nothing reads from stdin is very hard; it requires extensive static analysis (and yes, if you chose that path, using CPython's AST would still be a hundred times easier than building your own parser). That's the general case, so you might be able to get it almost working for your particular use case, with a lot of work. However, it will be much easier to just prevent it in the first place - I don't know the C API very well, but there must be some way to adjust the __builtins__ and remove input, raw_input, sys.stdin, etc.
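On the Python side, one way this could look is sketched below: the script runs with a copy of the builtins in which input is replaced by a function that raises. It assumes Python 3 (where raw_input no longer exists), the name run_without_input is made up, and it does not stop a script from reading sys.stdin directly, so it is a convenience rather than a sandbox.

```python
import builtins


def run_without_input(source, filename="<script>"):
    """Run a script whose input() builtin has been disabled (illustrative)."""

    def blocked_input(*args, **kwargs):
        raise RuntimeError("interactive input is disabled in this host")

    # Copy the real builtins and swap out only input().
    safe_builtins = dict(vars(builtins))
    safe_builtins["input"] = blocked_input

    # exec() uses the "__builtins__" entry of the globals dict for lookups.
    namespace = {"__builtins__": safe_builtins, "__name__": "__main__"}
    exec(compile(source, filename, "exec"), namespace)
    return namespace
```

A host could catch the RuntimeError and show its own input dialog instead of letting the script block on the console.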

@genesiss gave all the information you need.

I learned Python 10 years ago, so my knowledge is no better than yours. But I do remember that whitespace and newlines are actual syntax elements in Python.

Looking at the official Python grammar, the closest matching syntax element to your "block" would be statement.

statement ::= 
             stmt_list NEWLINE | compound_stmt

So, to a first approximation, you can separate Python statements by looking only at newline characters. Note that a compound statement such as while also includes the indented lines that follow its header.

Also note point 4 of the lexical structure:

Outside of string literals, newlines (denoted NEWLINE below) are significant except when:

  • They are immediately preceded by a backslash ("\") character, in which case both backslash and newline are (in effect) replaced by a space, joining the two lines they separate.

  • They are enclosed in matching opening and closing brackets: "(" and ")", "[" and "]", or "{" and "}". In this case, also, newline is treated as space.

So, read the input char by char and look for backslash ('\\'), newline ('\n'), and bracket delimiters.

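Sample code below (just a concept sketch):

#include <string>

// Supplied by the caller: handles the contents of one complete block.
void write_block(const std::string& block);

void split_blocks(const std::string& input)
{
    int delim = 0;        // nesting depth of (), [] and {}
    bool escape = false;  // true if the previous character was a backslash
    std::string block;

    for (std::string::const_iterator it = input.begin(); it != input.end(); ++it)
    {
        char c = *it;

        switch (c) {
        case '\\':
            if (!delim) escape = true;
            break;
        case '\n':
            if (!delim && !escape) {
                if (!block.empty()) write_block(block);
                block.clear();
                continue; // the terminating newline is not part of the block
            }
            escape = false;
            break;
        case '(': case '[': case '{':
            ++delim; escape = false;
            break;
        case ')': case ']': case '}':
            if (delim > 0) --delim;
            escape = false;
            break;
        default:
            escape = false; // a backslash joins lines only right before '\n'
            break;
        }

        block.push_back(c);
    }

    if (!block.empty()) write_block(block); // last block without a trailing newline
}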

EDITED

String literal handling is missing (as are comments and the grouping of indented lines into compound statements), but I believe you could roll a complete lexical analysis along these lines.
