
What is the best way to parse a Python script file in C/C++ code?

I am embedding Python in a C/C++ program.

What I am trying to do is parse the Python script file from the C/C++ program and break the file into "blocks", so that each "block" is a valid Python command. I need to put each block into a std::string. For example:

#PythonScript.py

import math

print "Hello Python"
i = 0;
while (i < 10):
    print "i = " , i;
    i = i + 1;

print "GoodBye Python"

This script contains 5 different "blocks":

  • the first is: import math
  • the second is: print "Hello Python"
  • the third is: i = 0;
  • the fourth is:

     while (i < 10):\n\tprint "i = " , i;\n\ti = i + 1;

  • and the fifth is: print "GoodBye Python"

My knowledge of Python is very basic and I am not familiar with its syntax. What is the best way to do this? Is there any Python C/C++ API function that supports it?


Why I need it: for GUI purposes. My program, which is written in C, uses Python to make some calculations. From the C code I run a Python script using the Python C API, and I need a way to capture Python's output in my program. Capturing works and everything is OK; the problem is when the script involves user input. I capture Python's output only after the script has finished, so when there is an input command in the script I get a black screen. I need to get all the output printed before the input command.

The first solution I tried is to parse the script into valid commands and run each command separately, one after the other. For this I need to parse the script and decide what is a command and what is not. The question is: what is the best way to do this, and is there something that already does it?

I've no idea why you want to do this, but the safest way is to let Python itself do the parsing work. If you're using Python earlier than 2.6, you can use the compiler module. For 2.6 and later, use the built-in compile function and the ast module. In 3.x you have to use these, as the compiler module has been removed.
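A minimal sketch of letting Python do the parsing, assuming Python 3.8 or later: ast.parse splits the script into top-level statements, and the built-in compile turns each one into a code object that can be executed on its own, which is the one-command-at-a-time execution the question describes. The function name run_statement_by_statement is hypothetical.

```python
import ast


def run_statement_by_statement(source, filename="<script>"):
    """Parse `source` once, then compile and run each top-level statement
    separately, sharing one namespace between them."""
    module = ast.parse(source, filename=filename, mode="exec")
    namespace = {"__name__": "__main__"}
    for node in module.body:
        # compile() expects a whole Module, so wrap the single statement in one.
        single = ast.Module(body=[node], type_ignores=[])
        code = compile(single, filename, "exec")
        # A host program could forward any captured output at this point,
        # before the next statement runs.
        exec(code, namespace)
    return namespace


script = (
    "import math\n"
    "print('Hello Python')\n"
    "i = 0\n"
    "while i < 10:\n"
    "    i = i + 1\n"
    "print('GoodBye Python')\n"
)
ns = run_statement_by_statement(script)
print(ns["i"])  # 10
```

Splitting only helps at top-level boundaries: an input() call inside the while body would still run in the middle of a single block.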

I think you're doing extra work: there is (at least) the Embedding Python in Another Application facility, and you can simply execute your script via the Python/C API. Surely you don't want to write a Python interpreter from scratch, do you?

If you want to do syntax analysis, you should look into Python's grammar (and maybe use Bison as a parser generator).

Python grammar specs:

Why do you need this? If you're embedding Python, you don't need to parse Python code yourself - not even remotely.

But to answer the question: you could use Python's ast module (which uses the built-in module _ast internally; I don't know if and how you can use it from C). ast.parse("""... your code ...""") gives a Module object whose body attribute is the list of AST nodes the module consists of. For this example, with Python 3 (I don't have Python 2 at hand), it's (naming the classes only) [Import, Expr, Assign, While, Expr]. Not quite what you asked for, but as close as you'll get.

Okay, with the added context: there are much easier ways than this. Proving that nothing reads from stdin is very hard; it requires extensive static analysis (and yes, if you did choose that path, using CPython's AST would still be a hundred times easier than building your own parser). That's the general case, so you might be able to get it almost working for your particular use case, with a lot of work. However, it will be much easier to just prevent it in the first place. I don't know the C API very well, but there must be some way to adjust __builtins__ and remove input, raw_input, sys.stdin, etc.
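A minimal Python 3 sketch of that idea, run through exec rather than the C API: the script gets a copy of the builtins without input (and without raw_input, which only exists in Python 2), and sys.stdin is temporarily replaced by an empty io.StringIO, so reading from it returns end-of-file instead of blocking. The function name run_without_input is hypothetical. This only guards against accidental reads and is not a sandbox: the script can still import builtins and reach the original input.

```python
import builtins
import io
import sys


def run_without_input(source, filename="<script>"):
    """Run `source` with input()/raw_input() removed and an empty stdin."""
    restricted = dict(vars(builtins))
    for name in ("input", "raw_input"):   # raw_input exists only in Python 2
        restricted.pop(name, None)
    namespace = {"__builtins__": restricted, "__name__": "__main__"}

    saved_stdin = sys.stdin
    sys.stdin = io.StringIO("")           # any read now sees end-of-file
    try:
        exec(compile(source, filename, "exec"), namespace)
    finally:
        sys.stdin = saved_stdin           # restore the host's stdin
    return namespace


ns = run_without_input("x = 6 * 7")
print(ns["x"])  # 42

try:
    run_without_input("name = input('Your name? ')")
except NameError as exc:
    print("blocked:", exc)
```

Calling input() in the script raises NameError immediately, so the host gets control back (and can show the output captured so far) instead of waiting on a prompt the user never sees.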

@genesiss gave all the information you need.

I learned Python 10 years ago, so my knowledge is no better than yours. But I do remember that whitespace and newlines are actual syntax elements in Python.

Looking at the official Python grammar, the syntax element that most closely matches your "block" is the statement:

statement ::= stmt_list NEWLINE | compound_stmt

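So you can separate simple statements by looking at the newline character that ends each logical line; a compound statement such as your while loop also takes in the indented lines that follow its header.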

Also note point 4 of the lexical structure:

Outside of string literals, newlines (denoted NEWLINE below) are significant except when

  • They are immediately preceded by a backslash ("\") character, in which case, both backslash and newline are (in effect) replaced by a space, joining the two lines they separate.

  • They are enclosed in matching opening and closing brackets: "(" and ")", "[" and "]", or "{" and "}". In this case, also, newline is treated as space.

So read the input character by character, looking for '\', '\n', and bracket delimiters.

Sample code below (just a concept sketch):

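#include <iostream>
#include <string>
#include <vector>

// Split Python source into logical lines. Concept sketch only:
// string literals and comments are not handled.
std::vector<std::string> splitLogicalLines(const std::string& input)
{
    std::vector<std::string> blocks;
    std::string block;
    int delim = 0;        // nesting depth of (, [ and {
    bool escape = false;  // previous character was a backslash

    for (char c : input) {
        switch (c) {
        case '\\':
            escape = true;
            block += c;
            continue;     // keep escape set for the next character
        case '\n':
            if (!delim && !escape) {
                // End of a logical line: store it unless it is blank.
                if (block.find_first_not_of(" \t\r") != std::string::npos)
                    blocks.push_back(block);
                block.clear();
                continue; // the newline itself is not stored
            }
            break;        // joined line: keep the newline in the block
        case '(': case '[': case '{':
            ++delim;
            break;
        case ')': case ']': case '}':
            if (delim > 0) --delim;
            break;
        }
        escape = false;
        block += c;       // append one character (not append(c, 1))
    }
    // The last line may not end with a newline.
    if (block.find_first_not_of(" \t\r") != std::string::npos)
        blocks.push_back(block);
    return blocks;
}

int main()
{
    const std::string script =
        "import math\n"
        "x = (1 +\n"
        "     2)\n"
        "print(x)\n";
    for (const std::string& b : splitLogicalLines(script))
        std::cout << "[" << b << "]\n";
    return 0;
}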


String literal handling is missing (and so is comment handling), but I believe you could roll a complete lexical analysis along these lines.
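Before rolling your own, note that Python's standard tokenize module already performs this lexical analysis, including string literals and comments. The sketch below (Python 3, function name logical_lines is hypothetical) relies on its NEWLINE tokens, which mark the end of each logical line, while NL tokens mark newlines that do not end a statement (inside brackets, or on blank and comment-only lines). Like the C++ sketch, it yields logical lines, so the body of a while loop still comes out line by line; grouping a body with its header is what the ast approach provides.

```python
import io
import tokenize

# Token types that never start a logical line.
_SKIP = {tokenize.NL, tokenize.COMMENT, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER}


def logical_lines(source):
    """Return the source text of each logical line, as Python's tokenizer sees it."""
    lines = source.splitlines(keepends=True)
    result = []
    first_row = None  # 1-based row where the current logical line starts
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NEWLINE:
            if first_row is not None:
                # tok.start[0] is the row on which the logical line ends.
                result.append("".join(lines[first_row - 1:tok.start[0]]))
            first_row = None
        elif first_row is None:
            first_row = tok.start[0]
    return result


# Brackets inside strings and comments would confuse the character scanner.
source = '''\
print("(", "#")
note = """a ( string
spanning two lines"""
total = (1 +
         2)
x = 1  # ( in a comment
'''

for line in logical_lines(source):
    print(repr(line))
```

The "(" characters inside the string literals and the comment do not open a bracket here, so the input splits into four logical lines, with the triple-quoted string and the bracketed sum each kept whole.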
