
Computation of DFA states

I want to compute the total number of DFA states for a given regular expression using flex. Which C files or functions would help me do this?

If you look in the file generated by flex, the number of entries in yy_accept (and yy_base) will probably give a good indication of the number of states used by the generated DFA. If you use the -Cf option, then yy_nxt contains the transition function of the DFA, and the number of rows in that table is again the number of states used.

You may have a different version of flex where the tables are named differently, but most likely their names will be very similar.
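As a rough way to read those sizes without opening the file by hand, here is a minimal sketch that pulls the declared array sizes out of the generated source. The function name declared_table_sizes is made up for this illustration, and the regular expression assumes the tables are declared with an explicit size such as `yy_accept[12]`, which may not hold for every flex version or option set. The declared size is only an indication of the state count, as said above.

```python
import re


def declared_table_sizes(source_text, names=("yy_accept", "yy_base")):
    """Return {table_name: declared_size} for tables found in source_text.

    Hypothetical helper: it only looks for the first `name[<number>]`
    occurrence and does not parse C properly.
    """
    sizes = {}
    for name in names:
        match = re.search(r"\b" + re.escape(name) + r"\s*\[\s*(\d+)\s*\]",
                          source_text)
        if match:
            sizes[name] = int(match.group(1))
    return sizes


# Hand-written fragment imitating a table declaration in a generated scanner.
sample = "static const flex_int16_t yy_accept[12] =\n    {   0, 0, 0 };"
print(declared_table_sizes(sample))
```

To use it on a real scanner, read the generated file (for example lex.yy.c) into a string and pass that string in; tables that are not found are simply left out of the result.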

In reaction to your questions below: the number of states in a DFA could be considered quite well defined, assuming the DFA has been minimized. The number of transitions is however much less well defined.

First, flex has a transition for every input character, because it will ECHO any character that is not part of the defined language. This is implemented with an additional state that handles that case. Using a debugger you could reverse-engineer which state this is. Beware, though, that if you use start conditions there may be several such states. If you want to analyze many regular expressions, you may want to look into other tools, or take the flex sources and go from there.

Second, flex has strategies to minimize the total size of all the tables. The -Cf option instructs it not to do that. One such optimization is finding equivalence classes of characters and storing transitions per character class only. An input character is first translated to its class, which in turn is used to determine the transition. As a consequence the number of transitions is much lower, but an additional table (see yy_ec) is required for determining the character class.
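To make the character-class idea concrete, here is a small sketch. It is not flex's algorithm (flex derives its classes from the patterns, not from a finished table); it only illustrates the principle that characters which behave identically in every state can share one column. The function name and the toy DFA are invented for this example.

```python
def char_equivalence_classes(table, alphabet):
    """Group characters whose transition columns are identical.

    table[state][ch] is the next state. Returns (ec, compact) where
    ec maps a character to its class number (starting at 1) and
    compact[state][cls] is the next state for that class.
    """
    class_of_column = {}   # column of next states -> class number
    representative = {}    # class number -> one character of that class
    ec = {}
    for ch in alphabet:
        column = tuple(row[ch] for row in table)
        if column not in class_of_column:
            class_of_column[column] = len(class_of_column) + 1
            representative[class_of_column[column]] = ch
        ec[ch] = class_of_column[column]
    compact = [{cls: row[rep] for cls, rep in representative.items()}
               for row in table]
    return ec, compact


# Toy DFA over "abc": 'b' and 'c' behave the same in every state.
table = [
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 0, "c": 0},
]
ec, compact = char_equivalence_classes(table, "abc")

# Lookup goes through the class table first, as described above.
state = 0
for ch in "bca":
    state = compact[state][ec[ch]]
```

In this toy case three columns shrink to two, at the cost of the extra ec lookup per character. That is the same trade-off described above for yy_ec.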

As a consequence, the number of transitions is not a well-defined concept. If you are interested in the memory footprint of the scanner, I would look at the size of the data section of the scanner. Use, for example, objdump -h on the lex.yy.o file. The size of the .rodata section will give a fairly accurate estimate of the total size of the tables.
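If you want to collect that number for many scanners, a sketch like the following can read the size out of the objdump -h output. The helper name section_size is made up, and it assumes the usual GNU objdump layout where each section line starts with an index, then the section name, then the size in hexadecimal. The sample text is hand-written, not real output.

```python
def section_size(objdump_output, section=".rodata"):
    """Return the size in bytes of `section`, or None if it is not listed.

    Assumes lines of the form: "<idx> <name> <size-hex> ..." as printed
    by GNU objdump -h.
    """
    for line in objdump_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1] == section:
            return int(fields[2], 16)
    return None


# Hand-written sample in the shape of `objdump -h lex.yy.o` output.
sample = """\
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000abc  0000000000000000  0000000000000000  00000040  2**4
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .rodata       000001a0  0000000000000000  0000000000000000  00000b00  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
"""
print(section_size(sample))
```

For a real object file, the text could come from subprocess.run(["objdump", "-h", "lex.yy.o"], capture_output=True, text=True).stdout. Keep in mind that .rodata may also hold string literals from your own actions, so the figure remains an estimate.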

You seem to have already found the -v option of flex, which reports the number of states in the DFA in a more verbose form. As to why "a" {} gives 5 states: you may also use the --trace option, as it shows the DFA while it is being generated. Apparently there is also an End Marker rule, which I assume is used for end-of-file. For each start condition there are two states, one that is used at the start of a line and one in the middle of a line. That makes 3 accepting states (one for "a", one for End Marker and one for (.|"\\n")) plus two states for the single start condition, 5 in total.

The source file dfa.c is not part of the generated code, but if you feel brave you could of course change the sources of flex to do further analysis of your own. I had a quick look, and it does seem that code generation is intertwined with the transformations, which makes it less modular than one would like for an experimentation platform. Also beware of the K&R-style prototypes, which effectively disable any type checking on function arguments.


 