
How to internally represent a Deterministic Finite Automaton

I have a big Deterministic Finite Automaton with ~15M states, and the current Java implementation is rather slow and memory-consuming. I am looking for a compact and fast representation that could replace the current code.

The automaton consists of the following parts:

  1. States identified by integers. The initial state is always state 0.
  2. State transitions, each a triple of source state id, transition character or wildcard (max 32 values), and target state id.
  3. A set of accepting states.

I am experimenting with the following approaches, and I am looking for other ideas.

1. Using Java Collections

The state transitions are represented in the following way:

final List<Map<Character, Integer>> transitions = new ArrayList<>();
final Set<Integer> acceptingStates = new HashSet<>();

The i'th item in the list contains the state transitions for the i'th state. Profiling showed me that most of the execution time is spent accessing the map.
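To make the cost of this layout concrete, here is a minimal sketch of how matching could run against those two fields. The class name `MapDfaSketch` and the helpers `addTransition` and `accepts` are hypothetical names introduced for illustration, and wildcards are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a tiny DFA using the same field shapes as above.
final class MapDfaSketch {
    final List<Map<Character, Integer>> transitions = new ArrayList<>();
    final Set<Integer> acceptingStates = new HashSet<>();

    // Hypothetical helper: adds source --symbol--> target, growing the list as needed.
    void addTransition(int source, char symbol, int target) {
        while (transitions.size() <= Math.max(source, target)) {
            transitions.add(new HashMap<>());
        }
        transitions.get(source).put(symbol, target);
    }

    // Runs the automaton from state 0 and reports whether it ends in an accepting state.
    boolean accepts(String input) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            if (state >= transitions.size()) {
                return false; // state has no outgoing transitions recorded
            }
            // Every step goes through a Character key, a hash lookup and an Integer value.
            Integer next = transitions.get(state).get(input.charAt(i));
            if (next == null) {
                return false; // no transition for this character
            }
            state = next;
        }
        return acceptingStates.contains(state);
    }

    public static void main(String[] args) {
        MapDfaSketch dfa = new MapDfaSketch();
        dfa.addTransition(0, 'a', 1);
        dfa.addTransition(1, 'b', 0);
        dfa.acceptingStates.add(1);
        System.out.println(dfa.accepts("aba"));
    }
}
```

Each input character costs one `HashMap` lookup with boxed keys and values, which is consistent with the profile pointing at map access.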

2. Arrays of arrays

final int[][] states = new int[STATE_COUNT][32];
Set<Integer> acceptingStates = new HashSet<>();

The i'th row contains the state transitions for the i'th state. The j'th column of the inner array contains the id of the target state for the j'th character from the i'th state, or -1 when there is no such transition.
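A minimal sketch of matching against such a table follows. The post does not say how characters map to columns, so the `column` helper below (lowercase letters starting at 'a') is purely an assumption, as are the names `TableDfaSketch`, `addTransition` and `accepts`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a dense transition table with -1 marking missing transitions.
final class TableDfaSketch {
    static final int ALPHABET_SIZE = 32;
    final int[][] states;
    final Set<Integer> acceptingStates = new HashSet<>();

    TableDfaSketch(int stateCount) {
        if (stateCount < 1) {
            throw new IllegalArgumentException("need at least the initial state 0");
        }
        states = new int[stateCount][ALPHABET_SIZE];
        for (int[] row : states) {
            Arrays.fill(row, -1); // Java zero-fills arrays, so -1 must be set explicitly
        }
    }

    // Assumed mapping from character to column; returns -1 for unsupported characters.
    static int column(char c) {
        int j = c - 'a';
        return (j >= 0 && j < ALPHABET_SIZE) ? j : -1;
    }

    // Hypothetical helper; throws ArrayIndexOutOfBoundsException for unsupported symbols.
    void addTransition(int source, char symbol, int target) {
        states[source][column(symbol)] = target;
    }

    // Runs the automaton from state 0 with one array read per input character.
    boolean accepts(String input) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            int j = column(input.charAt(i));
            if (j < 0) {
                return false; // character outside the assumed alphabet
            }
            state = states[state][j];
            if (state < 0) {
                return false; // missing transition
            }
        }
        return acceptingStates.contains(state);
    }

    public static void main(String[] args) {
        TableDfaSketch dfa = new TableDfaSketch(2);
        dfa.addTransition(0, 'a', 1);
        dfa.addTransition(1, 'b', 0);
        dfa.acceptingStates.add(1);
        System.out.println(dfa.accepts("aba"));
    }
}
```

Note that a freshly allocated `int[][]` contains zeros, which would read as transitions to state 0, so the explicit fill with -1 is needed for the "missing" convention to hold.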

This representation is much faster, but it still takes up at least N * 32 * 4 bytes, where N is the number of states.
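As a quick worked check of that lower bound for the ~15M states mentioned earlier, the arithmetic can be sketched as below. The class and method names are hypothetical, and the figure covers only the `int` payload: each row of an `int[][]` is a separate array object, so the real footprint is somewhat higher by a JVM-dependent amount.

```java
// Illustrative only: lower bound on the table size, ignoring per-array overhead.
final class TableSizeEstimate {
    static final int TRANSITIONS_PER_STATE = 32;
    static final int BYTES_PER_INT = Integer.BYTES; // 4

    // N * 32 * 4, computed in long so larger state counts cannot overflow int.
    static long payloadBytes(long stateCount) {
        return stateCount * TRANSITIONS_PER_STATE * BYTES_PER_INT;
    }

    public static void main(String[] args) {
        long bytes = payloadBytes(15_000_000L);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("%d bytes (about %.2f GiB)%n", bytes, gib);
    }
}
```

For 15,000,000 states this comes to 1,920,000,000 bytes, roughly 1.79 GiB, before any object headers or the accepting-state set are counted.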

I would propose a framework to implement your state machine, but one statement in your definition puzzles me: do you have 15M states or 15M state machine instances?

If your application is a server-side application that can be clustered to host 15M state machine instances or 15M states, I would suggest looking at the Akka Finite State Machine framework.

The Akka framework has a very low memory footprint for state machine instances, which can satisfy your requirement. I personally developed systems with 50M state machine instances, and the system could easily hold many more than that.

If you need an implementation example, you can check my blog posts: blog1, blog2.

I hope this helps.
