Recognizing Ruby code in a Treetop grammar

Question

I'm trying to use Treetop to parse an ERB file. I need to be able to handle lines like the following:

<% ruby_code_here %>
<%= other_ruby_code %>

Since Treetop is written in Ruby, and you write Treetop grammars in Ruby, is there already some way in Treetop to say "hey, look for Ruby code here, and give me its breakdown" without my having to write separate rules for every part of the Ruby language? I'm looking for a way, in my .treetop grammar file, to have something like:

rule erb_tag
  "<%" ruby_code "%>" {
    def content
      ...
    end
  }
end

Where ruby_code is handled by some rules that Treetop provides.

Edit: someone else parsed ERB using Ruby-lex, but I got errors trying to reproduce what he did. The rlex program did not produce a full class when it generated the parser class.

Edit: right, so you lot are depressing, but thanks for the info. :) For my Master's project, I'm writing a test case generator that needs to work with ERB as input. Fortunately, for my purposes, I only need to recognize a few things in the ERB code, such as if statements and other conditionals, as well as loops. I think I can come up with a Treetop grammar to match that, with the caveat that it won't be complete for Ruby.

Answer 1

As far as I know, nobody has yet created a Treetop grammar for Ruby. (In fact, nobody has ever been able to create any grammar for Ruby other than the YACC grammar that ships with MRI and YARV.) I know that the author of Treetop has been working on one for several years, but it's not a trivial undertaking. Getting the ANTLR grammar that is used in XRuby right took about 5 years, and it is still not fully compliant.

Ruby's syntax is insanely, mind-bogglingly complex.

Answer 2

No

I don't think so. Specifying the complex and subtle Ruby grammar in Treetop would be a major accomplishment, but it should be possible.

The actual Ruby grammar is written in yacc. Now, yacc is a legendary tool, but Treetop generates a more powerful class of parsers, so it should be possible, and perhaps someone has done it.

It's not an afternoon project.

Answer 3

I may be kidding, but if yacc is less complex than Ruby, then you could implement yacc in Treetop and then use the Ruby grammar created for yacc.

Answer 4

For your purposes, you can probably get away without parsing all of Ruby. What you actually need is a way to detect the %> that closes off a Ruby block. If you never want to fail when the Ruby code itself contains those closing characters, you must detect every place those characters can occur inside the Ruby text, which means you need to detect all forms of literals.

However, for your purposes you can probably get away with recognising the most likely cases where %> would occur in Ruby text, and ignoring just those cases. This assumes, of course, that any remaining failure can be handled by getting your user to write the ERB a little differently.
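As a rough illustration of that approach, here is a minimal sketch in plain Ruby using StringScanner from the standard library. The method name erb_tags is made up for this example, and it assumes that the only "likely case" worth handling is a %> inside a single- or double-quoted string; %q() literals, heredocs, regexes and comments are deliberately not handled.

```ruby
require "strscan"

# Hypothetical helper (not part of Treetop or ERB): returns the Ruby source
# found inside each <% ... %> or <%= ... %> tag of `text`.
# Assumption: a "%>" inside '...' or "..." is the only case worth skipping.
def erb_tags(text)
  scanner = StringScanner.new(text)
  tags = []
  # Jump to the next opening delimiter, with or without the "=".
  while scanner.skip_until(/<%=?/)
    code = +""
    until scanner.eos?
      if scanner.scan(/%>/)
        # A "%>" outside any quoted literal closes the tag.
        tags << code.strip
        break
      elsif (literal = scanner.scan(/"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'/m))
        # Copy a whole quoted literal, so a "%>" inside it is ignored.
        code << literal
      else
        code << scanner.getch
      end
    end
  end
  tags
end

tags = erb_tags('<% if x %>a<%= "50%>" + y %><% end %>')
```

An unterminated tag is silently dropped here; a real tool would probably want to report it instead.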

For what it's worth, Treetop itself "parses" Ruby blocks this way; it just counts { and } characters until the closing one is found. So if your block contains a } in a literal string, you're broken (but you can work around that by including the matching one in a comment).
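To make the brace-counting behaviour concrete, here is a small sketch in plain Ruby. It is not Treetop's actual code, only an illustration of the counting strategy described above, and closing_brace_index is a hypothetical name.

```ruby
# Hypothetical helper: index of the "}" that balances the "{" at `open_index`,
# found purely by counting braces, or nil if the count never returns to zero.
# Strings and comments get no special treatment, which is the whole point.
def closing_brace_index(source, open_index)
  depth = 0
  (open_index...source.length).each do |i|
    case source[i]
    when "{" then depth += 1
    when "}"
      depth -= 1
      return i if depth.zero?
    end
  end
  nil
end

broken = '{ "}" }'        # the "}" inside the string ends the block too early
fixed  = "{ # {\n \"}\" }" # an extra "{" in a comment rebalances the count

early = closing_brace_index(broken, 0)
last  = closing_brace_index(fixed, 0)
```

With the first input the count reaches zero at the brace inside the string literal; with the second, the { in the comment absorbs it and the real closing brace is found. Note that in this sketch the compensating brace has to appear before the stray one.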

4 answers

Answer 1: 11 (accepted), 2010-10-29 20:41:44
Answer 2: 2, 2010-10-29 18:43:47
Answer 3: 1, 2010-10-31 18:06:18
Answer 4: 0, 2015-05-06 02:50:53
