# 为什么需要 NFA 转 DFA

Regex 正则表达式原理及如何从零实现 中我们已经构建了一个可用的正则表达式引擎。

# DFA和NFA

## 等价性

DFA和NFA存在等价性，也就是说任何NFA都可以转化为等价的DFA。

DFA的匹复杂度只有O(n)，但因为要递归和回溯NFA的匹配复杂度达到了O(n^2)。

# NFA转DFA算法

NFA转DFA的算法叫做子集构造法，其具体流程如下。

## 例子

from to input
0-> 1  a
1-> 8  Epsilon
8-> 9  Epsilon
8-> 6  Epsilon
6-> 2  Epsilon
6-> 4  Epsilon
2-> 3  b
4-> 5  c
3-> 7  Epsilon
5-> 7  Epsilon
7-> 9  Epsilon
7-> 6  Epsilon


# 代码实现

 private static DFAGraph convertNfa2Dfa(NFAGraph nfaGraph) {
DFAGraph dfaGraph = new DFAGraph();
Set<State> startStates = new HashSet<>();
// 用NFA图的起始节点构造DFA的起始节点 步骤1
if (startStates.size() == 0) {
}
dfaGraph.start = dfaGraph.getOrBuild(startStates);
Set<State> finishedStates = new HashSet<>();
// 如果BFS的方式从已找到的起始节点遍历并构建DFA
while (!queue.isEmpty()) {  // 步骤2
DFAState curState = queue.poll();
for (State nfaState : curState.nfaStates) {
Set<State> nextStates = new HashSet<>();
Set<String> finishedEdges = new HashSet<>();
for (String edge : nfaState.next.keySet()) {
if (finishedEdges.contains(edge)) {
continue;
}
Set<State> efinishedState = new HashSet<>();
for (State state : curState.nfaStates) {
Set<State> edgeStates = state.next.getOrDefault(edge, Collections.emptySet());
for (State eState : edgeStates) {
// 添加E可达节点
if (efinishedState.contains(eState)) {
continue;
}
}
}
// 将NFA节点列表转化为DFA节点，如果已经有对应的DFA节点就返回，否则创建一个新的DFA节点
DFAState nextDFAstate = dfaGraph.getOrBuild(nextStates);
if (!finishedStates.contains(nextDFAstate)) {
}
}
}
}
return dfaGraph;
}

• DFAState
public class DFAState extends State {
public Set<State> nfaStates = new HashSet<>();
// 保存对应NFAState的id,一个DFAState可能是多个NFAState的集合,所以拼接成String
private String allStateIds;
public DFAState() {
this.stateType = 2;
}

public DFAState(String allStateIds, Set<State> states) {
this.allStateIds = allStateIds;
//这里我将步骤五直接集成在创建DFA节点的过程中了
for (State state : states) {
if (state.isEndState()) {
this.stateType = 1;
}
}
}

public String getAllStateIds() {
return allStateIds;
}
}


public class DFAGraph {

private Map<String, DFAState> nfaStates2dfaState = new HashMap<>();
public DFAState start = new DFAState();

// 这里用map保存NFAState结合是已有对应的DFAState, 有就直接拿出来用
public DFAState getOrBuild(Set<State> states) {
String allStateIds = "";
int[] ids = states.stream()
.mapToInt(state -> state.getId())
.sorted()
.toArray();
for (int id : ids) {
allStateIds += "#";
allStateIds += id;
}
if (!nfaStates2dfaState.containsKey(allStateIds)) {
DFAState dfaState = new DFAState(allStateIds, states);
nfaStates2dfaState.put(allStateIds, dfaState);
}
return nfaStates2dfaState.get(allStateIds);
}
}


# DFA引擎匹配过程

dfa引擎的匹配也可以完全复用NFA的匹配过程，所以对之前NFA的匹配代码，可以针对DFA模式取消回溯即可(不取消也没问题，但会有性能影响)。

private boolean isMatch(String text, int pos, State curState) {
if (pos == text.length()) {
if (curState.isEndState()) {
return true;
}
for (State nextState : curState.next.getOrDefault(Constant.EPSILON, Collections.emptySet())) {
if (isMatch(text, pos, nextState)) {
return true;
}
}
return false;
}
for (Map.Entry<String, Set<State>> entry : curState.next.entrySet()) {
String edge = entry.getKey();
// 如果是DFA模式,不会有EPSILON边
if (Constant.EPSILON.equals(edge)) {
for (State nextState : entry.getValue()) {
if (isMatch(text, pos, nextState)) {
return true;
}
}
} else {
MatchStrategy matchStrategy = MatchStrategyManager.getStrategy(edge);
if (!matchStrategy.isMatch(text.charAt(pos), edge)) {
continue;
}
// 遍历匹配策略
for (State nextState : entry.getValue()) {
// 如果是DFA匹配模式,entry.getValue()虽然是set,但里面只会有一个元素,所以不需要回溯
if (nextState instanceof DFAState) {
return isMatch(text, pos + 1, nextState);
}
if (isMatch(text, pos + 1, nextState)) {
return true;
}
}
}
}
return false;
}


private boolean isDfaMatch(String text, int pos, State startState) {
State curState = startState;
while (pos < text.length()) {
boolean canContinue = false;
for (Map.Entry<String, Set<State>> entry : curState.next.entrySet()) {
String edge = entry.getKey();
MatchStrategy matchStrategy = MatchStrategyManager.getStrategy(edge);
if (matchStrategy.isMatch(text.charAt(pos), edge)) {
curState = entry.getValue().stream().findFirst().orElse(null);
pos++;
canContinue = true;
break;
}
}
if (!canContinue) {
return false;
}
}
return curState.isEndState();
}


# 个人收获

## paper

Efficient Regular Expression Matching Algorithm Based on DoLFA 基于DoLFA的高效正则表达式匹配算法

Regular Expression Optimization in Java Java正则表达式优化

# 拓展阅读

01-Regex 正则表达式入门

02-Regex 正则表达式与 DFA

03-Regex 正则表达式原理及如何从零实现

# 参考资料

automata-conversion-from-nfa-with-null-to-dfa