简体   繁体   中英

Algorithm for extracting a language from DFA

I am trying to find an algorithm to extract a language from a DFA. If that language is infinite then I only need to report up to : maxcount <= 1000 strings. there are no restrictions on what order the are reported.

I have tried:(I don't know how to write algorithms so I just explain in words what I did)

a recursive algorithm - for each transition from one state to the next state/states save the transition character and then make a recursive call for each transition,if that transition was to an accepting state then report .if we have reached a "dead end" then there is noting to recurse on.

suppose the start state is state 1 and there are two transitions to: state 2 and state 3 with transition characters 'a' , 'b' respectively. Then there is a transition from state 3 to state 4 with transition character 'c' , only state 4 is an accepting state.

going from state 1 to state 2 will give save 'a' as transition character , but since state 2 is a dead end there will be noting to recurse on and since it is not an accepting state then there is noting to report.

Going from state 1 to state 3 will save 'b' and then go from state 3 to state 4 will save 'c' so that we have a total of "bc" and since state 4 does not have any transitions the recursion ends. We report "bc" because state 4 is an accepting state


If I have DFA with a loop - then in order to not get "stuck" in that loop I have made sure that "if" there is a another possible transition then we will always pick that transition and not the transition we made last time/times we where at that state ( a kind of memory for which transition we made at which states )

The algorithm works for small DFAs but it will give a Stack overflow on bigger DFA:s (imagine 20 transitions from state 1 to state 2 and 30 transitions from state 2 to state 3 and so on ).

Can anybody recommend more efficient algorithm?

  1. Run a cycle detection algorithm.
  2. If there are no cycles, run any path finding algorithm (BFS, DFS, Dijkstra, A*, etc) to list all paths from start to finish.
  3. If there is a cycle run pathfinding to find a path from the start node to the cycle. Then output (start node to the cycle) + (the cycle starting at the connecting node) * N for N = 1,2,3,...,1000.

Alternately:

  1. Convert to regex.
  2. Generate words from regex.

I'd do this with a breadth-first search over the DFA, which will produce strings ordered by length.

Define an object consisting of a state and a string (there is a more memory-efficient solution here, but I think that if you only need to produce 1000 strings, this will be fine). Then create a work-queue of such objects. Initialize the workqueue with a single object whose state is the start state and whose string is empty.

Now repeat the following three steps until you have found maxcount strings or the queue becomes empty:

  1. Remove the first object in the queue.

  2. If its state is an accepting state, output its string.

  3. For each possible out transition (which consists of a character and a new state), add to the end of the queue a new object with the transition's state and the concatenation of the object's string with the transition's character.

If you use a BFS, then cycles don't matter. The idea is to define search nodes that contain the current state and a pointer to the predecessor node. Therefore when you visit a node for an accepting state, you can trace the previous pointers backward to determine the accepted string (in reverse). It turns out that its elegant if a search node also contains the character that caused the transition from the previous node's state to the current one.

Here's an approach in Java:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

class Experimental {
  // DFA state with its transitions, possibly accepting.
  static class State {
    final Map<Character, State> transitions = new HashMap<>();
    final boolean accept;    
    State(boolean accept) {
      this.accept = accept;
    }
  }

  // A little DFA.
  static final State s0 = new State(false);
  static final State s1 = new State(false);
  static final State s2 = new State(true);
  static final State s3 = new State(true);
  static {
    s0.transitions.put('a', s1);
    s0.transitions.put('b', s2);
    s0.transitions.put('c', s3);
    s1.transitions.put('d', s3);
    s2.transitions.put('e', s0);
    s2.transitions.put('f', s1);
  }

  // An enumerator of strings accepted by the DFA in order of length.
  static class Enumerator {
    static class Node {
      final Node prev;
      final char prevCh;
      final State state;
      Node(State start) {
        this(null, Character.MIN_VALUE, start);
      } 
      Node(Node prev, char ch, State state) {
        this.prev = prev;
        this.prevCh = ch;
        this.state = state;
      }
    }

    final Deque<Node> queue = new ArrayDeque<>();
    final List<String> output = new ArrayList<>();
    final State start;

    Enumerator(State start) {
      this.start = start;
    }

    Enumerator enumerate(int outputLimit) {
      queue.clear();
      output.clear();
      // Enqueue a search node for the start state.
      queue.add(new Node(start));
      while (!queue.isEmpty() && output.size() < outputLimit) {
        Node current = queue.pollFirst();
        if (current.state.accept) {
          // Follow prev pointers to build the accepted string.
          StringBuilder sb = new StringBuilder();
          for (Node p = current; p.prev != null; p = p.prev) {
            sb.append(p.prevCh);
          }
          output.add(sb.reverse().toString());
        }
        // Enqueue nodes for the successors of current state.
        for (Entry<Character, State> transition : current.state.transitions.entrySet()) {
          queue.addLast(new Node(current, transition.getKey(), transition.getValue()));
        }
      }
      return this;
    }
  }

  public static void main(String[] args) {
    System.out.println(new Enumerator(s0).enumerate(20).output);
  }
}

Output:

[b, c, ad, beb, bec, bfd, bead, bebeb, bebec, bebfd, bebead, bebebeb, bebebec, bebebfd, bebebead, bebebebeb, bebebebec, bebebebfd, bebebebead, bebebebebeb]

Note : I am guessing you can model this problem as finding all paths between start state and all accepting states. So, i believe this can be accomplished with a depth-first search of the graph. The depth-first search will find all non-cyclical paths between two nodes.

If you are familiar with Depth First Search (DFS) algorithm which is a graph search/traversal algorithm can be helpful in your problem. Let me give you a very simple example.

Given a directed graph, a source vertex 's' and a destination vertex 'd', print all paths from given 's' to 'd'. Consider the following directed graph. Let the s be 2 and d be 3. There are 4 different paths from 2 to 3.

在此输入图像描述

The idea is to do Depth First Traversal of given directed graph. Start the traversal from source. Keep storing the visited vertices in an array say path[] . If we reach the destination vertex, print contents of path[] . The important thing is to mark current vertices in path[] as visited also, so that the traversal doesn't go in a cycle.

Sample Code : You can find a very simple code in Java here to find all paths between two nodes in a graph.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM