Leetcode: 1987. Number of Unique Good Subsequences

Mistake: Bug caused by incorrect assumption. I re-wrote part of the code, but didn't check if I had to update the other part of the code.

Can we derive an invariant based on the smallest possible examples? First, I generated all unique substrings of the string "1001" to see if a pattern emerge:

1
xxx1
2
xx0x
xx01
2
x00x
x001
5
1xx1
1x0x
1x01
100x
1001

Let's compute \(dp[i]\) which is the number of unique subsequences starting on \(i\). Let's do it from left to right. Therefore, \(dp[N - 1]=1\). Be \(i, j\) indexes where \(i < j\), \(binary[i]=binary[j]\) and \(binary[i] \neq binary[k]\) for \(i < k < j\). Suppose that \(binary[i]=1\). Note that we can extend by one all unique subsequence starting with 1 (i.e. \(binary[i]\)) and they will all be unique. Besides that, we can also extend all subsequences starting with 0 (i.e. \(binary[i+1], binary[i+2], .., binary[j-1]\)) by one and they will all be unique. So, in this case, \(dp[i]=dp[i+1]+dp[i+2]+...+dp[j]\). We can't compute this sum, since \(O(n^2)\) will not make the cut. Note that if there are no zeros between \(i\) and \(j\), \(dp[i]=dp[j]\). This means that \(dp[i+1]=dp[i+2]=...=dp[j-1]\), since they are computed similarly to the way we compute \(dp\) for ones. Therefore, \(dp[i]=dp[i+1] \times (j - i) + dp[j]\). To compute \(dp\) efficiently, we can keep track of the last one and last zero and use it to compute \(dp\). Time and space is \(O(n)\).

class Solution:
    def numberOfUniqueGoodSubsequences(self, binary: str) -> int:
        MOD = 10**9 + 7
        N = len(binary)

        dp = [0] * N
        dp[N - 1] = 1
        last = {}
        last["1"], last["0"] = (N - 1, None) if binary[-1] == "1" else (None, N - 1)

        for i in range(N - 2, -1, -1):
            b = binary[i]
            if last[b] is None:
                dp[i] = N - i
            else:
                if last[b] - i == 1:
                    dp[i] = dp[i + 1]
                else:
                    dp[i] = (dp[last[b]] + dp[i + 1] * (last[b] - i - 1)) % MOD
            last[b] = i

        extra = 1 if "0" in binary else 0
        return (sum(dp[i] for i in range(N) if binary[i] == "1") + extra) % MOD


assert Solution().numberOfUniqueGoodSubsequences("001") == 2
assert Solution().numberOfUniqueGoodSubsequences("11") == 2
assert Solution().numberOfUniqueGoodSubsequences("101") == 5

How can we extend the solution for \(i\) to \(i+1\)? Other way to solve the problem is counting the unique subsequences from left to right which means that extend the unique subsequences that ended with 1 and 0 so far. Be \(e0\) and \(e1\) the number of unique subsequences that end with zero and one respectively. If the current number is 0, we will have all subsequences that ended with zero plus all subsequences that ended with one but now ending on zero. So, \(e0 = e0 + e1\). If the current number is 1, we will have the same but we have to re-add 1 to the count. So, \(e1 = e0 + e1 + 1\). Time complexity is \(O(n)\) and space is \(O(1)\).

class Solution:
    def numberOfUniqueGoodSubsequences(self, binary: str) -> int:
        MOD = 10**9 + 7

        count_zero = False
        e1 = 0
        e0 = 0
        for b in binary:
            if b == "0":
                e0 = (e0 + e1) % MOD
                count_zero = True
            else:
                e1 = (e1 + e0 + 1) % MOD

        return (e1 + e0 + (1 if count_zero else 0)) % MOD


assert Solution().numberOfUniqueGoodSubsequences("001") == 2
assert Solution().numberOfUniqueGoodSubsequences("11") == 2
assert Solution().numberOfUniqueGoodSubsequences("101") == 5