Matching tag pairs in a Treetop grammar

Question

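I don't want to summon Cthulhu, but I'd like to use Treetop to match pairs of opening and closing HTML tags. With this grammar I can match opening tags and closing tags, but now I want a rule that ties them together. I've tried the following, but it makes my parser run forever (an infinite loop):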

rule html_tag_pair
  html_open_tag (!html_close_tag (html_tag_pair / '' / text / newline /
    whitespace))+ html_close_tag <HTMLTagPair>
end
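A likely reason for the hang: '' always succeeds without consuming input, and PEG ordered choice commits to the first alternative that succeeds. So whenever html_tag_pair fails, the '' branch wins, text, newline and whitespace are never tried, and the enclosing ( ... )+ keeps matching nothing at the same position. The sketch below is plain Ruby, not Treetop's generated code; EMPTY, TEXT, choice and one_or_more are hypothetical stand-ins for '', text, ordered choice and +. It contrasts ( text )+ with ( '' / text )+, adding a zero-progress check so the demo raises instead of hanging.

```ruby
# Hypothetical mini-PEG helpers (not Treetop's API). Each parser is a lambda
# taking (input, pos) and returning the new position on success, or nil.

# Behaves like the '' terminal: always succeeds, consumes nothing.
EMPTY = ->(_input, pos) { pos }

# Behaves like the question's text rule, [^<]+.
TEXT = lambda do |input, pos|
  m = /\A[^<]+/.match(input[pos..-1])
  m && pos + m[0].length
end

# Ordered choice (a / b / ...): the first alternative that succeeds wins.
def choice(*alts)
  lambda do |input, pos|
    result = nil
    alts.each do |alt|
      result = alt.call(input, pos)
      break if result
    end
    result
  end
end

# One-or-more repetition (e)+. A naive loop never terminates when e succeeds
# without consuming input; this version detects that case and raises instead.
def one_or_more(expr)
  lambda do |input, pos|
    count = 0
    loop do
      nxt = expr.call(input, pos)
      break unless nxt
      raise ArgumentError, "repetition matched empty string at #{pos}" if nxt == pos
      pos = nxt
      count += 1
    end
    count.zero? ? nil : pos
  end
end

input  = "hello <b>"
fixed  = one_or_more(choice(TEXT))          # like ( text )+
broken = one_or_more(choice(EMPTY, TEXT))   # like ( '' / text )+; TEXT is never reached

puts fixed.call(input, 0)                   # prints 6
begin
  broken.call(input, 0)
rescue ArgumentError => e
  puts e.message                            # prints "repetition matched empty string at 0"
end
```

In the grammar itself, removing the '' alternative (so every branch inside the repetition consumes at least one character) should stop the loop. Note also that + rather than * means an empty pair such as <b></b> will not match. Checking that the closing tag's name equals the opening tag's still needs something like the semantic predicates discussed in the answers.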

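I based this on the recursive parentheses example and the negative lookahead example on the Treetop GitHub page. The other rules I reference are as follows: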

rule newline
  [\n\r] {
    def content
      :newline
    end
  }
end

rule tab
  "\t" {
    def content
      :tab
    end
  }
end

rule whitespace
  (newline / tab / [\s]) {
    def content
      :whitespace
    end
  }
end

rule text
  [^<]+ {
    def content
      [:text, text_value]
    end
  }
end

rule html_open_tag
  "<" html_tag_name attribute_list ">" <HTMLOpenTag>
end

rule html_empty_tag
  "<" html_tag_name attribute_list whitespace* "/>" <HTMLEmptyTag>
end

rule html_close_tag
  "</" html_tag_name ">" <HTMLCloseTag>
end

rule html_tag_name
  [A-Za-z0-9]+ {
    def content
      text_value
    end
  }
end

rule attribute_list
  attribute* {
    def content
      elements.inject({}){ |hash, e| hash.merge(e.content) }
    end
  }
end

rule attribute
  whitespace+ html_tag_name "=" quoted_value {
    def content
      {elements[1].content => elements[3].content}
    end
  }
end

rule quoted_value
  ('"' [^"]* '"' / "'" [^']* "'") {
    def content
      elements[1].text_value
    end
  }
end

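I know I need to allow a single opening or closing tag to match on its own, but if there is a pair of HTML tags, I'd like them grouped together as a pair. Matching them in my grammar seems cleanest, but perhaps there's a better way?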

Answer 1

Here's a very simple grammar that uses a semantic predicate to match the closing tag to the opening tag.

grammar SimpleXML
  rule document
    (text / tag)*
  end

  rule text
    [^<]+
  end

  rule tag
    "<" [^>]+ ">" (text / tag)* "</" [^>]+ &{|seq| seq[1].text_value == seq[5].text_value } ">"
  end
end
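To see what the predicate contributes, here is a hypothetical plain-Ruby recursive-descent version of the same three rules (not Treetop itself): it parses an opening tag, its children and a closing tag, and rejects the closing tag unless its name equals the opening one, which is the comparison seq[1].text_value == seq[5].text_value performs. Like the grammar, it compares everything between < and >, so it assumes tags without attributes; the class and method names are illustrative.

```ruby
# Hypothetical plain-Ruby mirror of the SimpleXML grammar above (not Treetop).
# Each parse_* method returns [node, new_pos] on success or nil on failure.
class SimpleXML
  def initialize(input)
    @input = input
  end

  # document <- (text / tag)*   ; succeeds only if all input is consumed
  def parse_document
    nodes, pos = parse_children(0)
    pos == @input.length ? nodes : nil
  end

  private

  # (text / tag)*
  def parse_children(pos)
    nodes = []
    while (r = parse_text(pos) || parse_tag(pos))
      node, pos = r
      nodes << node
    end
    [nodes, pos]
  end

  # text <- [^<]+
  def parse_text(pos)
    m = /\A[^<]+/.match(@input[pos..-1])
    m && [m[0], pos + m[0].length]
  end

  # tag <- "<" [^>]+ ">" (text / tag)* "</" [^>]+ &{ names equal } ">"
  def parse_tag(pos)
    open_m = /\A<([^>]+)>/.match(@input[pos..-1])
    return nil unless open_m

    children, pos = parse_children(pos + open_m[0].length)
    close_m = /\A<\/([^>]+)>/.match(@input[pos..-1])
    # The semantic-predicate step: reject the close tag unless the names match.
    return nil unless close_m && close_m[1] == open_m[1]

    [[open_m[1], children], pos + close_m[0].length]
  end
end

p SimpleXML.new("<a><b>hi</b></a>").parse_document  # prints [["a", [["b", ["hi"]]]]]
p SimpleXML.new("<a></b>").parse_document           # prints nil
```

Returning nil from parse_tag plays the role of the predicate failing: the caller just stops collecting children, much as a PEG backtracks out of the tag alternative.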

Answer 2

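You can only do this with a separate rule for each HTML tag pair, or by using semantic predicates: save the opening tag (in one semantic predicate), then accept a closing tag (in another) only if it is the same tag. This is much harder to do in Treetop than it should be, because there's no convenient place to save the context and you can't peek up the parser stack, but it is possible.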
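The "save the opening tag, then accept only a matching closing tag" idea can be sketched outside Treetop with an explicit stack standing in for the saved context. This is a different technique from sempreds (a post-pass over already-tokenized tags rather than a grammar rule), and the token format, method name and mismatch policy are assumptions: a closing tag pairs with the nearest still-open tag of the same name, and anything that never pairs stays a single tag, as the question wants.

```ruby
# Hypothetical token format: [:open, name], [:close, name] or [:text, string].
# Returns { open_index => close_index } for every tag pair; tags left out of
# the hash are treated as single (unpaired) tags.
def match_pairs(tokens)
  stack = []   # saved context: indices of opening tags not yet closed
  pairs = {}
  tokens.each_with_index do |(kind, name), i|
    case kind
    when :open
      stack.push(i)
    when :close
      # Accept the close only if an open tag with the same name is still pending.
      j = stack.rindex { |open_i| tokens[open_i][1] == name }
      next unless j

      pairs[stack[j]] = i
      stack.slice!(j..-1)   # any opens above it stay unpaired (e.g. <br>)
    end
  end
  pairs
end

tokens = [[:open, "p"], [:text, "hi"], [:open, "br"], [:close, "p"], [:close, "i"]]
p match_pairs(tokens).to_a   # prints [[0, 3]]
```

Here <p> and </p> are grouped, while <br> and the stray </i> remain single tags.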

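Incidentally, the same problem comes up when parsing MIME boundaries (and in Markdown). I haven't checked Mikel's implementation in ActionMailer (he probably uses a nested MIME parser for this), but it is possible in Treetop.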

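In http://github.com/cjheath/activefacts/blob/master/lib/activefacts/cql/parser.rb I save the context in a fake input stream (you can see which methods it has to support), because "input" is available on every SyntaxNode. My reason for using sempreds there is different, but some of the techniques apply.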

2 answers

Answer 1
score 5, 2012-11-22 18:24:00

Answer 2
score 1, accepted, 2012-08-31 00:45:11
