Leetcode: 10. Regular Expression Matching
Problem Statement: Given two string \(s\) and a pattern \(p\), return if \(s\) matches \(p\). Note that \(p\) contains English letters, '.'
and '*'
(Kleene start). The character '.'
can be matched with any other character.
Simulating all transitions
Let's use an example to explain the idea (kind of how I saw the solution). Suppose that \(s=aabb\) and \(p=aa*a.*\). Be \(i\) the position on \(s\) that was matched so far, and \(j\) the position on the pattern that we are matching \(s_i\) against. initially \(i=j=0\). as \(s_i=p_i\), we can move forward in \(s\) and \(p\) by adding one to \(i\) and \(j\). Now, \(s_1=a\), \(p_1=a\) and \(p_1\) is a zero or more pattern. If we match \(s_1\) with \(p_1\), we won't be able to match the rest of \(s\) because \(p_3\) must be matched with an letter a in \(s\) and \(s_1\) is the last one. Even thought we can't perform this match, it isn't always true for other inputs (e.g. \(s=aaabb\) and \(p=aa*a.*\)). The conclusion is that we have to branch the matching every time that \(p_j\) is a Kleene star. In this example, we would have to try matching \(s_1\) with \(p_1\) and \(s_1\) with \(p_3\) (we have to jump the *
). So, for every \(i\), we have to keep a set of position \(js\) to match \(s_i\) against. After performing all matches, we would be able to generate the new \(js\) to match \(s_{i+1}\) against. The following code implements this idea.
With \(N\) and \(M\) being the length of \(s\) and \(p\) respectively,
- Time complexity: \(O(N\times M)\), since
transition
is \((M)\) andjs
has at most \(N\) elements on every iteration of the loop. - Space complexity: \(O(N + M^2)\). The function
transition
uses \(O(M^2)\) space since it stores a list of \(M\) values for each position \(j\). The variable \(js\) contains at most \(N\) values.
from functools import cache def solve(s, p): def is_zero_or_more(j): return p[j + 1 : j + 2] == "*" def expand(j): """Return list of all consecutives zero or more patterns starting on j.""" ans = [j] k = 0 while k < len(ans): if is_zero_or_more(ans[k]): ans.append(ans[k] + 2) k += 1 return ans @cache def transition(j): if j == len(p): return [] if not is_zero_or_more(j): return expand(j + 1) return [j] + expand(j + 2) i, js = 0, [0] if p[1:2] != "*" else [0] + expand(2) while i < len(s) and len(js) > 0: next_js = set() for j in js: if j == len(p): continue assert p[j] != "*" if p[j] == "." or p[j] == s[i]: next_js.update(transition(j)) if len(next_js) > 0: i += 1 js = next_js return len(p) in js and i == len(s) assert solve("aa", "a") == False assert solve("aa", "a*") == True assert solve("ab", ".*") == True assert solve("aabb", "a.*b") == True assert solve("aabb", "a.*bc") == False assert solve("a", ".*") == True assert solve("a", "b") == False assert solve("aaaaaaaaaaaaaaaaaaaa", ".*.*.*.*.*.*.*.*.*.*") == True assert solve("a", ".*a") == True assert solve("b", ".*a") == False assert solve("aaa", ".*aaa") == True assert solve("aaa", "ab*ac*a") == True assert solve("aa", "ab*c*a") == True assert solve("mississippi", "mis*is*p*.") == False assert solve("aaabaaaababcbccbaab", "c*c*.*c*a*..*c*") == True class Solution: def isMatch(self, s: str, p: str) -> bool: return solve(s, p)
Dynamic Programming
We would like to compute a function \(f(i,j)\) that is true if it is possible to match \(s\) starting on \(i\) with \(p\) starting on \(j\), and false otherwise. To do so, we have to try matching \(s_i\) with \(p_j\) if possible, and also skipping zero or more pattern. The following code implements using top-down approach.
from functools import cache def solve(s, p): def match(i, j): return p[j] == "." or s[i] == p[j] def is_zero_or_more(j): return p[j + 1 : j + 2] == "*" @cache def dfs(i, j): if i == len(s): return j == len(p) or (is_zero_or_more(j) and dfs(i, j + 2)) if j == len(p): return False assert p[j] != "*" if is_zero_or_more(j): return dfs(i, j + 2) or ( match(i, j) and (dfs(i + 1, j) or dfs(i + 1, j + 2)) ) elif match(i, j): return dfs(i + 1, j + 1) return False return dfs(0, 0) assert solve("aa", "a") == False assert solve("aa", "a*") == True assert solve("ab", ".*") == True assert solve("aabb", "a.*b") == True assert solve("aabb", "a.*bc") == False assert solve("a", ".*") == True assert solve("a", "b") == False assert solve("aaaaaaaaaaaaaaaaaaaa", ".*.*.*.*.*.*.*.*.*.*") == True assert solve("a", ".*a") == True assert solve("b", ".*a") == False assert solve("aaa", ".*aaa") == True assert solve("aaa", "ab*ac*a") == True assert solve("aa", "ab*c*a") == True assert solve("mississippi", "mis*is*p*.") == False assert solve("aaabaaaababcbccbaab", "c*c*.*c*a*..*c*") == True class Solution: def isMatch(self, s: str, p: str) -> bool: return solve(s, p)
Sometimes, top-down approach doesn't work due stack limit. So, it is a good training to also write a bottom-up solution.
from functools import cache def solve(s, p): def match(i, j): return p[j] == "." or s[i] == p[j] def is_zero_or_more(j): return p[j + 1 : j + 2] == "*" dp = [[False for _ in range(len(p) + 1)] for _ in range(len(s) + 1)] for j in range(len(p), -1, -1): dp[len(s)][j] = j == len(p) or (is_zero_or_more(j) and dp[len(s)][j + 2]) for i in range(len(s) - 1, -1, -1): # reversed(range(len(s))): for j in range(len(p) - 1, -1, -1): # reversed(range(len(p))): if is_zero_or_more(j): dp[i][j] = dp[i][j + 2] or ( match(i, j) and (dp[i + 1][j] or dp[i + 1][j + 2]) ) elif match(i, j): dp[i][j] = dp[i + 1][j + 1] return dp[0][0] assert solve("aa", "a") == False assert solve("aa", "a*") == True assert solve("ab", ".*") == True assert solve("aabb", "a.*b") == True assert solve("aabb", "a.*bc") == False assert solve("a", ".*") == True assert solve("a", "b") == False assert solve("aaaaaaaaaaaaaaaaaaaa", ".*.*.*.*.*.*.*.*.*.*") == True assert solve("a", ".*a") == True assert solve("b", ".*a") == False assert solve("aaa", ".*aaa") == True assert solve("aaa", "ab*ac*a") == True assert solve("aa", "ab*c*a") == True assert solve("mississippi", "mis*is*p*.") == False assert solve("aaabaaaababcbccbaab", "c*c*.*c*a*..*c*") == True class Solution: def isMatch(self, s: str, p: str) -> bool: return solve(s, p)
Both, implementations have
- time complexity: \(O(N \times N)\), and
- space complexity: \(O(N \times N)\).