简体   繁体   中英

Is it possible to convert ANTLR3 grammar into regular expression?

I have an ANTLR3 simple grammar parser that takes short lines of text and convert them to Java objects. Next, I have a big list of text lines. Some of them (less than 1%) can be converted because they match the grammar.

I need to pass all of them through the parser in order to understand which are convertible, and create a collection of Java objects. Very time-consuming operation. Would be much more effective to pass them through a regular expression before sending to ANTLR3.

I can create such a regex myself, but would be much better to get it dynamically from ANTLR3 parser. Is it possible to do?

In general, only regular languages can have a regular expression describing them. (Most RE-engines support features that match also some non-regular languages (for example by back-references), but still not all context-free or even general formal languages.)

I'm not really an antlr-expert, but I suppose its grammars can match languages which are not RE-matchable.

So, even the theoretical solvability of your problem is not really given, it depends on the grammar.

It may be that

  • your grammar is in fact a regular grammar, or
  • that there is a regular language which is a slight superset of your grammars language - thus the RE can filter out most non-matching lines, while some will only be filtered out by the grammar/parser.

There most probably is no certain way to be sure without seeing your grammar.

Also, a regular expression is not necessarily faster than your parser would be.

To be specific of ANTLR can parse LL(*) subclass of context free grammars .

To tell if your language is Regular, see if you can apply the pumping lemma for regular languages .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM