Leetcode: 1987. Number of Unique Good Subsequences
- Mistake: Bug caused by incorrect assumption. I re-wrote part of the code, but didn't check if I had to update the other part of the code.
Can we derive an invariant based on the smallest possible examples? First, I generated all unique substrings of the string "1001" to see if a pattern emerge:
1 xxx1 2 xx0x xx01 2 x00x x001 5 1xx1 1x0x 1x01 100x 1001
Let's compute \(dp[i]\) which is the number of unique subsequences starting on \(i\). Let's do it from left to right. Therefore, \(dp[N - 1]=1\). Be \(i, j\) indexes where \(i < j\), \(binary[i]=binary[j]\) and \(binary[i] \neq binary[k]\) for \(i < k < j\). Suppose that \(binary[i]=1\). Note that we can extend by one all unique subsequence starting with 1 (i.e. \(binary[i]\)) and they will all be unique. Besides that, we can also extend all subsequences starting with 0 (i.e. \(binary[i+1], binary[i+2], .., binary[j-1]\)) by one and they will all be unique. So, in this case, \(dp[i]=dp[i+1]+dp[i+2]+...+dp[j]\). We can't compute this sum, since \(O(n^2)\) will not make the cut. Note that if there are no zeros between \(i\) and \(j\), \(dp[i]=dp[j]\). This means that \(dp[i+1]=dp[i+2]=...=dp[j-1]\), since they are computed similarly to the way we compute \(dp\) for ones. Therefore, \(dp[i]=dp[i+1] \times (j - i) + dp[j]\). To compute \(dp\) efficiently, we can keep track of the last one and last zero and use it to compute \(dp\). Time and space is \(O(n)\).
class Solution: def numberOfUniqueGoodSubsequences(self, binary: str) -> int: MOD = 10**9 + 7 N = len(binary) dp = [0] * N dp[N - 1] = 1 last = {} last["1"], last["0"] = (N - 1, None) if binary[-1] == "1" else (None, N - 1) for i in range(N - 2, -1, -1): b = binary[i] if last[b] is None: dp[i] = N - i else: if last[b] - i == 1: dp[i] = dp[i + 1] else: dp[i] = (dp[last[b]] + dp[i + 1] * (last[b] - i - 1)) % MOD last[b] = i extra = 1 if "0" in binary else 0 return (sum(dp[i] for i in range(N) if binary[i] == "1") + extra) % MOD assert Solution().numberOfUniqueGoodSubsequences("001") == 2 assert Solution().numberOfUniqueGoodSubsequences("11") == 2 assert Solution().numberOfUniqueGoodSubsequences("101") == 5
How can we extend the solution for \(i\) to \(i+1\)? Other way to solve the problem is counting the unique subsequences from left to right which means that extend the unique subsequences that ended with 1 and 0 so far. Be \(e0\) and \(e1\) the number of unique subsequences that end with zero and one respectively. If the current number is 0, we will have all subsequences that ended with zero plus all subsequences that ended with one but now ending on zero. So, \(e0 = e0 + e1\). If the current number is 1, we will have the same but we have to re-add 1
to the count. So, \(e1 = e0 + e1 + 1\). Time complexity is \(O(n)\) and space is \(O(1)\).
class Solution: def numberOfUniqueGoodSubsequences(self, binary: str) -> int: MOD = 10**9 + 7 count_zero = False e1 = 0 e0 = 0 for b in binary: if b == "0": e0 = (e0 + e1) % MOD count_zero = True else: e1 = (e1 + e0 + 1) % MOD return (e1 + e0 + (1 if count_zero else 0)) % MOD assert Solution().numberOfUniqueGoodSubsequences("001") == 2 assert Solution().numberOfUniqueGoodSubsequences("11") == 2 assert Solution().numberOfUniqueGoodSubsequences("101") == 5