Wednesday, March 9, 2022

Finding flaws in an algorithm and constructing the counterexamples

Here is the current fastest solution on submissions page to Leetcode 49. Group Anagrams.

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

class Solution {
public:
    vector<vector<string>> groupAnagrams(vector<string>& strs)
    {
        unordered_map<uint64_t, std::vector<std::string>> map;
        for (auto& word : strs)
            map[encode(word)].emplace_back(std::move(word));
       
        std::vector<std::vector<std::string>> result;
        for (auto& pair : map)
            result.emplace_back(std::move(pair.second));
           
        return result;
    }

    uint64_t encode(const std::string& word)
    {
        uint16_t counter[26] = {0};
        for (char c : word) ++counter[c - 'a'];
       
        uint64_t result = 0;
        uint64_t degree = 1;
       
        for (int i = 0; i < 26; ++i)
        {
            result += degree * (counter[i]);
            degree *= 26;
        }
        return result;
    }
};

Take a look at the "encode" function, which serves as a hash map from a string to an integer. Do you see any flaws?

I explained the flaw and gave a counterexample and a patched solution here. I spotted the problem immediately when I saw it, because I thought about something similar when I was solving the problem myself, and I noticed that it didn't work unless I can find a workaround. But somehow, the solution above passed all the test cases. So, yeah, in most cases it would work correctly, but with a certain probability it would fail, and it's guaranteed to fail when it's analyzed and targeted.

Here's a solution to a different problem which passed all the test cases but is also wrong: link. The problem is called "Flip Game II",
Description
You are playing the following Flip Game with your friend: Given a string that contains only these two characters: + and -, you and your friend take turns to flip two consecutive "++" into "--". The game ends when a person can no longer make a move and therefore the other person will be the winner.
Write a function to determine if the starting player can guarantee a win.
The solution in question is
class Solution {
public:
    bool canWin(string &s) {
        int _true = 0, _false = 0;
        if(s.length() < 2) {
            return false;
        }
        else {
            int i = 0, t = 0;
            while(i < s.length()) {
                if(!(s[i] == '+' && s[i + 1] == '+')) {
                    i++;
                }
                else {
                    t++;
                    int j = i;
                    for(; j < s.length(); ++j) {
                        if(s[j] != '+') {
                            break;
                        }
                    }
                    int len = j - i;
                    i = j;
                    bool tmp = true;
                    if(len != 3 && (len % 2)) {
                        tmp = false;
                    }
                    if(tmp) {
                        _true++;
                    }
                    else {
                        _false++;
                    }
                }
            }
        }
        if(_true == 2) {
            return false;
        }
        else if(_true > 0 || _false > 1) {
            return true;
        }
        else {
            return false;
        }
    }
};
This is linear in the length of the string, and it passed all the test cases, no wonder it's the first place on the leaderboard... except that, unfortunately, it's wrong.
It's because that I was looking for a linear solution myself, and I noticed that odd substrings and even substrings have different properties. But I figured that it's much more complicated than what the code above can compute. Can you give a counterexample such that the code above would give a wrong answer?
(Spoiler below)












I found interesting situations about the length of "+...+" substrings, especially with the even length ones. The strings that have odd length or length 2 are easy to deal with, but even length which >=4 are much more complicated. It seems that if the length mod 8 is 0 or 2, then it's also simple, but if it's 4 or 6 then it's complicated. (I wonder if the number of strings that have length mod 8 == 4 or 6 determine the result.)
E.g, if you give your opponent two length 4 substrings, you'll win, but give him one length 4 and one length 6, you'll lose, 6 6 you'll win, 4 8 you'll lose.

So, take a look at the solution above again. Do you see what he did?
He computed the number of odd and even length "+" substrings, and made the decision. But I already thought about this, it didn't work.
He decided that if the number of even strings is 2, then player 1 would fail for sure. But I already gave multiple examples where the number of even strings is 2, the situations are not all the same. Here's an counterexample: 4 6. If the string has a length 4 "+" substring and a length 6 one, then player 1 should win.

Testing it with input "++++-++++++" gives the wrong answer, as expected.
I wonder how the solution passed all the test cases... Is my counterexample a minimal one? What's the probability that the solution above would fail, given a random "+-" string? I'll leave these questions for future... if I have time...

Update 230827, another counter-example to the top solution to H403. Frog Jump

No comments:

Post a Comment

A Recursive Geometric Proof for 1962 IMO Problem 6

Youtube recommended a video to me, where the following problem is solved: Consider an isosceles triangle. Let $R$ be the radius of its circ...