
SimpleParse non-deterministic grammar until runtime

I'm working on a basic networking protocol in Python that should be able to transfer both ASCII strings (read: EOL-terminated) and binary data. To make the latter possible, I chose to design the grammar so that it includes the number of binary bytes that follow.

For SimpleParse, the grammar would look like this [1] so far:

EOL := [\n]
IDENTIFIER := [a-zA-Z0-9_-]+
SIZE_INTEGER := [1-9]*[0-9]+
ASCII_VALUE := [^\n\0]+, EOL
BINARY_VALUE := .*+
value := (ASCII_VALUE/BINARY_VALUE)

eol_attribute := IDENTIFIER, ':', value
binary_attribute := IDENTIFIER, [\t], SIZE_INTEGER, ':', value
attributes := (eol_attribute/binary_attribute)+ 

command := IDENTIFIER, EOL
command := IDENTIFIER, '{', attributes, '}'

The problem is that I don't know how to tell SimpleParse, at runtime, that what follows is going to be a chunk of binary data of SIZE_INTEGER bytes.

The cause of this is the definition of the terminal BINARY_VALUE, which fulfills my needs as it is now, so it cannot be changed.

Thanks

Edit

I suppose the solution would be to tell it to stop when it matches the production binary_attribute and let me populate the AST node manually (via socket.recv()), but how do I do that?

Edit 2

Base64-encoding or similar is not an option.

[1] I haven't tested it, so I don't know whether it works in practice; it's only meant to give you an idea.

If the grammar is as simple as the one you quoted, then perhaps using a parser generator is overkill? You might find that rolling your own recursive parser by hand is simpler and quicker.
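As a rough illustration of the hand-rolled approach, here is a minimal sketch that reads one command from a buffered binary stream (for example the object returned by `socket.makefile("rb")`). The wire layout is inferred from the grammar in the question, the helper names `read_until`, `read_exact` and `parse_command` are made up for this example, and a binary attribute is assumed to be followed directly by the next attribute or the closing `}`.

```python
import io

def read_until(stream, delimiters):
    """Read up to the first byte found in `delimiters`; return (token, delimiter)."""
    token = bytearray()
    while True:
        ch = stream.read(1)
        if not ch:
            raise EOFError("stream ended inside a token")
        if ch in delimiters:
            return bytes(token), ch
        token += ch

def read_exact(stream, size):
    """Read exactly `size` raw bytes, with no interpretation of their content."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream ended inside a binary value")
    return data

def parse_command(stream):
    """Parse `IDENTIFIER EOL` or `IDENTIFIER '{' attributes '}'` (assumed layout)."""
    name, delim = read_until(stream, b"\n{")
    attributes = {}
    if delim == b"{":
        while True:
            key, delim = read_until(stream, b":\t}")
            if delim == b"}":
                if key:
                    raise ValueError("unexpected text before '}'")
                break
            if delim == b"\t":
                # binary_attribute: the size is only known once it has been read
                size_text, _ = read_until(stream, b":")
                value = read_exact(stream, int(size_text.decode("ascii")))
            else:
                # eol_attribute: the value runs up to the end of the line
                raw, _ = read_until(stream, b"\n")
                value = raw.decode("ascii")
            attributes[key.decode("ascii")] = value
    return name.decode("ascii"), attributes
```

Because the size is read before the payload, the binary bytes may contain `\n`, `}` or any other byte without confusing the parser; `io.BytesIO` can stand in for the socket stream when testing.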

If you want your application to be portable and reliable, I would suggest you pass only standard ASCII characters over the wire.

Different computer architectures have different binary representations, different word sizes, and different character sets. There are three approaches to dealing with this.

First, you can ignore the issues and hope you only ever have to implement the protocol on a single platform.

Second, you can go all computer-sciency and come up with a "canonical form" for each possible data type, à la CORBA.

Third, you can be practical and use the magic of "sprintf" and "scanf" to translate your data to and from plain ASCII characters when sending it over the network.
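In Python, the rough equivalent of the third approach is the `%` formatting operator on the way out and `int()`/`float()` on the way in. The sketch below is only an illustration; the record layout (an integer id and a temperature separated by a space) and the function names are invented for the example.

```python
def encode_reading(sensor_id, temperature):
    """Format the values as one ASCII line, much like sprintf("%d %.3f\n", ...)."""
    line = "%d %.3f\n" % (sensor_id, temperature)
    return line.encode("ascii")

def decode_reading(line):
    """Parse the ASCII line back into Python values, much like sscanf."""
    id_text, temperature_text = line.decode("ascii").split()
    return int(id_text), float(temperature_text)
```

The text form does not depend on byte order or word size, at the cost of a few extra bytes and whatever precision the chosen format string keeps.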

I would also suggest that your protocol include a message length at or near the beginning of the message. The commonest bug in home-made protocols is the receiving partner expecting more data than was sent and subsequently waiting forever for data that never arrives.
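One possible way to frame messages with a leading length is sketched below. The header format (the payload length as ASCII decimal digits followed by a newline) and the function names are assumptions made for this example, not something the question prescribes.

```python
import socket

def send_message(sock, payload):
    """Send the payload length as an ASCII decimal line, then the payload itself."""
    header = b"%d\n" % len(payload)
    sock.sendall(header + payload)

def recv_exact(sock, size):
    """Keep calling recv() until exactly `size` bytes have arrived."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty result means the peer closed the connection
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    """Read the length line, then exactly that many payload bytes."""
    digits = bytearray()
    while True:
        ch = recv_exact(sock, 1)
        if ch == b"\n":
            break
        digits += ch
    return recv_exact(sock, int(digits.decode("ascii")))
```

Since the receiver never asks for more bytes than the header announced, it cannot block waiting for data the sender never intended to send; calling `sock.settimeout()` beforehand additionally guards against a peer that stalls in the middle of a message.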

I strongly recommend you consider using the construct library for parsing the binary data. It also has support for text (ASCII), so when it detects text you can pass that to your SimpleParse-based parser, while the binary data is parsed with construct. It's very convenient and powerful.
