Friday, June 9, 2023

Leetcode M1318 Minimum Flips to Make a OR b Equal to c, E1351 Count Negative Numbers in a Sorted Matrix

M1318 description:
  Given 3 positive numbers a, b and c, return the minimum number of flips required in some bits of a and b to make ( a OR b == c ) (bitwise OR operation).
  A flip operation consists of changing any single bit from 1 to 0 or from 0 to 1 in their binary representation.
Constraints:
  1 <= a <= $10^9$
  1 <= b <= $10^9$
  1 <= c <= $10^9$
This problem is very easy; a typical algorithm is to go through each bit position of the three numbers. If $c$ is 1 at that position but $a|b$ is 0, we add one flip; if $c$ is 0, we add one flip if $a$ is 1 at that position and another if $b$ is 1. (Since the numbers are positive and at most $10^9 < 2^{31}$, we don't need to check the 32nd bit.)

Putting these cases together, let's make a table:
c 1, a|b 0 : add 1
c 0, a&b 1 : add 2
c 0, a|b 1 : add 1
We can apply the bitwise operations to the whole numbers beforehand, instead of checking each bit combination inside the loop. So here's the first version that I submitted:
```cpp
class Solution {
public:
    int minFlips(int a, int b, int c) {
        int r=0;
        int t=1;
        int abc=(a|b)^c,aAb=a&b;
        for(int i=0;i<31;++i){
            if(abc&t){
                r++;
                if(aAb&t) r++;
            }
            t<<=1;
        }
        return r;
    }
};
```
Then I noticed that I can also fold the '&' result into the precomputed masks to get rid of the branching. The branch-free version reads,
```cpp
class Solution {
public:
    int minFlips(int a, int b, int c) {
        int r=0;
        int abc=(a|b)^c,aAb=a&b&abc;
        for(int i=0;i<31;++i) r+=(abc>>i&1)+(aAb>>i&1);
        return r;
    }
};
```
At this point, it's obvious that we are simply counting the number of '1' bits in the two numbers' binary representations, which is known as the Hamming weight or population count. GCC has a built-in function for that, "__builtin_popcount(int)", so we can write it like this:
```cpp
class Solution {
public:
    int minFlips(int a, int b, int c) {
        unsigned abc=(a|b)^c,aAb=a&b&abc;
        #ifdef __GNUG__
        return __builtin_popcount(abc)+__builtin_popcount(aAb);
        #else
        int r=0;
        for(int i=0;i<31;++i) r+=(abc>>i&1)+(aAb>>i&1);
        return r;
        #endif
    }
};
```
I did some research and noticed that LLVM also supports "__builtin_popcount", and MSVC has the intrinsics "__popcnt16", "__popcnt", and "__popcnt64". We can pick the appropriate function according to the compiler.
(Update: starting from C++20, there's an "std::popcount()" function in the header <bit>.)

Then I found this webpage, which collects the best currently known bit hacks for many computational tasks. I'm fascinated by the esoteric beauty of the algorithms and magic numbers. Using the population-count algorithm from the list, we can write the solution as
```cpp
#ifndef __GNUG__
// SWAR popcount from the list above: count bits in pairs, then nibbles,
// then sum the byte counts into the top byte via one multiply.
int popC_32(unsigned v){
    v = v - ((v >> 1) & 0x55555555);
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
    return ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
}
#endif
class Solution {
public:
    int minFlips(int a, int b, int c) {
        ios::sync_with_stdio(false);cin.tie(0);cout.tie(0);
        unsigned abc=(a|b)^c,aAb=a&b&abc;
        #ifdef __GNUG__
        return __builtin_popcount(abc)+__builtin_popcount(aAb);
        #else
        return popC_32(abc)+popC_32(aAb);
        #endif
    }
};
```
But since many CPUs support the instruction "POPCNT", it's probably better to use the built-in functions if they are available.
---
E1351 description:
  Given an m x n matrix grid which is sorted in non-increasing order both row-wise and column-wise, return the number of negative numbers in grid.
Constraints:
  m == grid.length
  n == grid[i].length
  1 <= m, n <= 100
  -100 <= grid[i][j] <= 100
An O(m+n) algorithm is easy to find,
```cpp
class Solution {
public:
    int countNegatives(vector<vector<int>>& grid) {
        const int m=grid.size(),n=grid[0].size();
        int k=n-1,r=0;
        for(int i=0;i<m;++i){
            while(k>=0&&grid[i][k]<0) k--;
            if(k<0){r+=(m-i)*n;return r;}
            r+=n-1-k;
        }
        return r;
    }
};
```
We may try to use binary search to increase speed,
```cpp
class Solution {
public:
    int countNegatives(vector<vector<int>>& grid) {
        const int m=grid.size(),n=grid[0].size();
        int k=n,r=0;
        for(int i=0;i<m;++i){
            k=upper_bound(grid[i].begin(),grid[i].begin()+k,0,greater())-grid[i].begin();
            if(k==0){r+=(m-i)*n;return r;}
            r+=n-k;
        }
        return r;
    }
};
```
But now it takes O(m*log(n)), which may be slower than O(m+n).
Also, if $m>n$, it would be better to binary-search down the columns, in which case we must write our own upper-bound function (the column elements live in different vectors, so std::upper_bound on row iterators doesn't apply). And because we'd access elements across different vectors, the overhead may be larger.

Let's consider the complexity. For $m\log(n)$ to be less than $m+n$, we need $n>m(\log(n)-1)$. Since $n\le 100$, $\log(n)-1$ is at most about 5.6, so the condition roughly becomes $n>6m$. But also considering the overhead and the constant coefficient, we may want a larger threshold, something like this:
```cpp
// Binary search down column i among rows [0,k): returns the index of the
// first negative element, or k if there is none in that range.
int upperB(vector<vector<int>>& grid,int i,int k){
    if(grid[0][i]<0) return 0;
    int l=0,r=k,mid;
    while(r-l>1){
        mid=(l+r)/2;
        if(grid[mid][i]>=0) l=mid;
        else r=mid;
    }
    return r;
}
class Solution {
public:
    int countNegatives(vector<vector<int>>& grid) {
        const int m=grid.size(),n=grid[0].size();
        if(n>8*m){
            int k=n,r=0;
            for(int i=0;i<m;++i){
                k=upper_bound(grid[i].begin(),grid[i].begin()+k,0,greater())-grid[i].begin();
                if(k==0){r+=(m-i)*n;return r;}
                r+=n-k;
            }
            return r;
        }
        else if(m>64*n){
            int k=m,r=0;
            for(int i=0;i<n;++i){
                k=upperB(grid,i,k);
                if(k==0){r+=(n-i)*m;return r;}
                r+=m-k;
            }
            return r;
        }
        else{
            int k=n-1,r=0;
            for(int i=0;i<m;++i){
                while(k>=0&&grid[i][k]<0) k--;
                if(k<0){r+=(m-i)*n;return r;}
                r+=n-1-k;
            }
            return r;
        }
    }
};
```
But considering that the input size is quite small, it won't show much difference in this case, unless there are a lot of test cases where one dimension of the input is much larger than the other.

Then I thought, instead of doing binary search vertically or horizontally, what if we do it... diagonally?
The good thing about it is that we are guaranteed to eliminate at least half of the input after each step: if we pick any point on the diagonal and cut the rectangle into four smaller ones at that point, the total area of the top-left and bottom-right rectangles must be at least half the area of the original rectangle.

The following code implements this algorithm:
```cpp
int negCount(int tl_r,int tl_c,int br_r,int br_c,vector<vector<int>>& grid){
    //cout<<"tl is {"<<tl_r<<","<<tl_c<<"}, br is {"<<br_r<<","<<br_c<<"},";
    if(tl_r==br_r||tl_c==br_c) return 0;
    if(br_r-tl_r==1){
        if(grid[tl_r][tl_c]<0){
            //cout<<"\nadded col count="<<br_c-tl_c<<endl;
            return br_c-tl_c;
        }
        int c1=tl_c,c2=br_c,cm;
        while(c2-c1>1){
            cm=(c1+c2)/2;
            if(grid[tl_r][cm]>=0)c1=cm;
            else c2=cm;
        }
        //cout<<"\nadded col count="<<br_c-c2<<endl;
        return br_c-c2;
    }
    if(br_c-tl_c==1){
        if(grid[tl_r][tl_c]<0){
            //cout<<"\nadded row count="<<br_r-tl_r<<endl;
            return br_r-tl_r;
        }
        int r1=tl_r,r2=br_r,rm;
        while(r2-r1>1){
            rm=(r1+r2)/2;
            if(grid[rm][tl_c]>=0)r1=rm;
            else r2=rm;
        }
        //cout<<"\nadded row count="<<br_r-r2<<endl;
        return br_r-r2;
    }
    int r1=tl_r,r2=br_r,c1=tl_c,c2=br_c,rm,cm;
    if(grid[r1][c1]<0){r2=r1;c2=c1;}
    else{
        while(r2-r1>1&&c2-c1>1){
            rm=(r1+r2)/2;
            cm=(c1+c2)/2;
            if(grid[rm][cm]>=0){r1=rm;c1=cm;}
            else{r2=rm;c2=cm;}
        }
        while(r2-r1>1){
            rm=(r1+r2)/2;
            if(grid[rm][cm]>=0)r1=rm;
            else r2=rm;
        }
        while(c2-c1>1){
            cm=(c1+c2)/2;
            if(grid[rm][cm]>=0)c1=cm;
            else c2=cm;
        }
    }
    //cout<<"r1 is "<<r1<<",c1 is "<<c1<<"\n";
    //cout<<"r2 is "<<r2<<",c2 is "<<c2<<"\nadded count="<<(br_r-r2)*(br_c-c2)<<endl;
    return (br_r-r2)*(br_c-c2)+negCount(tl_r,c2,r2,br_c,grid)+negCount(r2,tl_c,br_r,c2,grid);
}
class Solution {
public:
    int countNegatives(vector<vector<int>>& grid) {
        const int m=grid.size(),n=grid[0].size();
        return negCount(0,0,m,n,grid);
    }
};
```
What's the complexity of this algorithm? Let's take a look. Without loss of generality, assume $m>n$. In the first call, the diagonal search takes $\log(n)$ steps until c1 and c2 meet, then $\log(m/n)$ more until r1 and r2 meet, so $O(\log(m))$ in total. Next, the complication starts. The time depends on the shapes of the two rectangles produced by the first call, so it's hard to tell in general. If the point on the diagonal where the positive and negative values meet has index $(m_1,n_1)$, then the cost of the next function calls is $\log(\max(m-m_1,n_1))+\log(\max(m_1,n-n_1))$. The maximum this can reach is $\log(m)+\log(n)$, which happens when the point is at the lower right corner; but in that case we only have two thin strips left and just need two more 1D upper-bound searches. Considering this, the worst case is probably when the point is at the center. Let's find out the complexity in that scenario.

Each time, the rectangle is cut into 4 equal smaller ones, and two of them become the input of the next calls. The complexity is $\log(m)+2\log(m/2)+4\log(m/4)+8\log(m/8)+\dots$, but the series doesn't go all the way down to the $2^{\log m}\log(m/2^{\log m})$ term; it stops at $2^{\log n}\log(m/2^{\log n})$, because at that point we only have 1D arrays left to search.

Summing up the terms, I got the complexity O(f(m,n)) where $f(m,n)=2n\log(m/n)+2n-\log m -2$ and $m\geq n$. When $m\gg n$, it's approximately O(nlog(m/n)), and when $m=n$, it becomes O(m+n). I think it's a pretty good algorithm, although it's hard to tell with such small input sizes.
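The closed form can be checked by summing the series directly (logs base 2; write $L=\log m$ and $s=\log n$, so $2^{s+1}=2n$):

$$\sum_{k=0}^{s}2^k(L-k)=L\,(2^{s+1}-1)-\sum_{k=0}^{s}k\,2^k=L\,(2n-1)-\big((s-1)\,2n+2\big)=2n\log(m/n)+2n-\log m-2,$$

using the identity $\sum_{k=0}^{s}k\,2^k=(s-1)2^{s+1}+2$.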

A variation of this algorithm: instead of finding the exact cell on the diagonal that separates the positive and negative values, we can stop once the search interval of the smaller dimension closes, which takes $O(\log\min(m,n))$ instead of $O(\log\max(m,n))$. The two rectangles passed to the next call will then overlap in the larger dimension by around $m/n$. The complexity analysis of this variant would be even more complicated. Anyway, I wonder how the performance would be on a larger scale.

Monday, June 5, 2023

Leetcode M1091 Shortest Path in Binary Matrix, M2101 Detonate the Maximum Bombs, E1232 Check If It Is a Straight Line

M1091 description:
  Given an n x n binary matrix grid, return the length of the shortest clear path in the matrix. If there is no clear path, return -1.
  A clear path in a binary matrix is a path from the top-left cell (i.e., (0, 0)) to the bottom-right cell (i.e., (n - 1, n - 1)) such that:
  All the visited cells of the path are 0.
  All the adjacent cells of the path are 8-directionally connected (i.e., they are different and they share an edge or a corner).
  The length of a clear path is the number of visited cells of this path.
Constraints:
  n == grid.length
  n == grid[i].length
  1 <= n <= 100
  grid[i][j] is 0 or 1
Apparently this is a BFS problem. The concept is simple, but there are a few places we can optimize. This code broke both the speed and the memory record at the same time:
```cpp
#pragma GCC optimize("Ofast","inline","-ffast-math")
#pragma GCC target("avx,mmx,sse2,sse3,sse4")
static const int SZMAX=10000;
int sz;
struct Co{
    int8_t x;
    int8_t y;
    Co operator+(const Co& other){
        return Co{int8_t(x+other.x),int8_t(y+other.y)};
    }
};
Co q[SZMAX];
Co* q2;
const Co dir[8]={{0,1},{0,-1},{1,0},{-1,0},{1,1},{1,-1},{-1,1},{-1,-1}};
class Solution {
    int checkCo(Co co,const vector<vector<int>>& grid){
        if(co.x<0||co.x>=sz||co.y<0||co.y>=sz) return 1;
        else return grid[co.x][co.y];
    }
    int q1l,q1r,q2l,q2r;
    int l1,l2;
    int processQ(vector<vector<int>>& grid,int which){
        if(which==1){
            int qsz=q1r-q1l;
            while(qsz--){
                auto& cur=q[q1l++];
                for(int i=0;i<8;++i){
                    auto nxt=cur+dir[i];
                    auto v=checkCo(nxt,grid);
                    if(v>1) return -l1+v-2;
                    if(v==0){q[q1r++]=nxt;grid[nxt.x][nxt.y]=l1;}
                }
            }
            l1--;
        }
        else{
            int qsz=q2r-q2l;
            while(qsz--){
                auto& cur=q2[q2r--];
                for(int i=0;i<8;++i){
                    auto nxt=cur+dir[i];
                    auto v=checkCo(nxt,grid);
                    if(v<0) return l2-2-v;
                    if(v==0){q2[q2l--]=nxt;grid[nxt.x][nxt.y]=l2;}
                }
            }
            l2++;
        }
        return 0;
    }
public:
    int shortestPathBinaryMatrix(vector<vector<int>>& grid) {
        static auto _=[](){ios_base::sync_with_stdio(false);cin.tie(0);cout.tie(0);return 0;}();
        sz=grid.size();
        if(grid[0][0]||grid[sz-1][sz-1]) return -1;
        if(sz==1) return 1;
        grid[0][0]=-1;grid[sz-1][sz-1]=2;
        l1=-2;l2=3;
        q1l=0;q1r=1;q2l=-1;q2r=0;
        q2=&q[0]+SZMAX-1;
        q[0]=Co{0,0};*q2=Co{int8_t(sz-1),int8_t(sz-1)};
        while(q1l<q1r&&q2l<q2r){
            int r=(q1r-q1l)<=(q2r-q2l)?processQ(grid,1):processQ(grid,2);
            if(r) return r;
        }
        return -1;
    }
};
```
It combines a few tricks to increase speed. One is that the return value of "checkCo()" for out-of-range inputs matches the value of blocked cells, so it takes one condition check instead of two, as I mentioned before. Another is that we can run BFS from either side; if we start from both sides and always advance the side with fewer elements in its queue, it may save space and time. (The extreme case would be a tree: starting from a leaf is much more efficient than starting from the root.) We could use two queues for that, but since the maximum total number of elements in the two queues is known, we can fit both in one array, with one queue growing toward increasing addresses and the other toward decreasing addresses.
---
M2101 description:
  You are given a list of bombs. The range of a bomb is defined as the area where its effect can be felt. This area is in the shape of a circle with the center as the location of the bomb.
  The bombs are represented by a 0-indexed 2D integer array bombs where bombs[i] = [xi, yi, ri]. xi and yi denote the X-coordinate and Y-coordinate of the location of the ith bomb, whereas ri denotes the radius of its range.
  You may choose to detonate a single bomb. When a bomb is detonated, it will detonate all bombs that lie in its range. These bombs will further detonate the bombs that lie in their ranges.
  Given the list of bombs, return the maximum number of bombs that can be detonated if you are allowed to detonate only one bomb.
Constraints:
  1 <= bombs.length <= 100
  bombs[i].length == 3
  $1 <= xi, yi, ri <= 10^5$
So we are given a directed graph, and our task is to find the largest number of vertices reachable from a single vertex. At first I thought there must be an $O(|V|+|E|)$ algorithm, because I thought the difficulty was dealing with cycles, and there are algorithms that find strongly connected components in $\Theta(|V|+|E|)$. After that we have a DAG, which should be easier to deal with, right? But the answer is no, as pointed out in a paper posted in the discussion: a DAG doesn't make it any easier. The only plausible way to lower the complexity is probably to exploit the geometry of circles in the 2D Euclidean plane, which yields a graph that is not arbitrary, but I haven't found such an exploit. So we simply do a DFS from every vertex to find the maximum.
```cpp
#pragma GCC optimize("Ofast","inline","-ffast-math")
#pragma GCC target("avx,mmx,sse2,sse3,sse4")
class Solution {
    int8_t adj[100][100];
    int8_t by[100];
    int8_t dfs(int8_t i,int8_t boom){
        int8_t r=1;
        by[i]=boom;
        for(int j=0;j<adj[i][99];j++){
            if(by[adj[i][j]]!=boom) r+=dfs(adj[i][j],boom);
        }
        return r;
    }
public:
    int maximumDetonation(const vector<vector<int>>& bombs) {
        static auto _=[](){ios_base::sync_with_stdio(false);cin.tie(0);cout.tie(0);return 0;}();
        const int8_t sz=bombs.size();
        for(int8_t i=1;i<sz;++i){
            long long x1=bombs[i][0],y1=bombs[i][1],r1=bombs[i][2];
            r1*=r1;
            for(int8_t j=0;j<i;++j){
                long long dx=x1-bombs[j][0],dy=y1-bombs[j][1];
                long long dis=dx*dx+dy*dy;
                long long r2=bombs[j][2];
                if(dis<=r1) {auto& _=adj[i][99];adj[i][_++]=j;}
                if(dis<=r2*r2) {auto& _=adj[j][99];adj[j][_++]=i;}
            }
        }
        int8_t m=0;
        memset(by,-1,sz);
        for(int8_t i=0;i<sz;++i){
            if(by[i]==-1) m=max(m,dfs(i,i));
        }
        return m;
    }
};
```
Here I used a 2D array instead of a 2D vector to store the adjacency list, just to make it a little faster. The last element of each row stores the row's size. The "by[]" array records which bomb the current one was detonated by.
---
E1232 description:
  You are given an array coordinates, coordinates[i] = [x, y], where [x, y] represents the coordinate of a point. Check if these points make a straight line in the XY plane.
Constraints:
  2 <= coordinates.length <= 1000
  coordinates[i].length == 2
  $-10^4 <= coordinates[i][0], coordinates[i][1] <= 10^4$
  coordinates contains no duplicate point.
The problem is indeed very easy, but I found quite a bit of bad code on the submission page. Let's take a look at one:
Sample 0 ms submission
```cpp
class Solution {
public:
    bool checkStraightLine(vector<vector<int>>& coordinates) {
        float m,c;
        int v=0;
        for(int i =0;i<coordinates.size();i++){  
            int k=coordinates[0][0];
            int t=coordinates[i][0];
            if(k==t)
              v++;
        }
        if(v==coordinates.size())
        return true;
        m=float(coordinates[0][1]-coordinates[1][1])/float(coordinates[0][0]-coordinates[1][0]);      
        for(int i=1;i<coordinates.size()-1;i++){                
            c=float(coordinates[i][1]-coordinates[i+1][1])/float(coordinates[i][0]-coordinates[i+1][0]);
            cout<<c<<" ";
            if(m==c){                    
             continue;
            }
            else{                
             return false;
            }
        }    
        return true;
     
    }
};
```
First, multiplication is better than division, and addition or subtraction is better than multiplication, in most scenarios, as long as it doesn't take too many more operations. So if it can be done with multiplications, we should do that, because 1. it avoids converting integers to floating-point numbers, 2. it avoids division-by-zero problems, 3. it's faster, even though we may need to convert to a larger integer type.
Second, using "==" to check floating-point equality is a terrible idea due to roundoff errors. The problem may not manifest in this example, since the submission passed all tests, probably because the computation is just a single division. But with one more step, e.g. a multiplication or an addition/subtraction, it may produce errors. Here's an example: try the line "cout<<((double)38/100+(double)65/200);". What do you get? 0.705. Now try "cout<<((double)38/100+(double)65/200==0.705);". What do you get? 0! It's false!
Just how bad is it? Let's test it with a simple program:
```cpp
  int cnt=0;
  for(int i=0;i<100;++i){
      for(int j=0;j<200;++j){
          if((double)i/100+(double)j/200!=double(10*i+5*j)/1000) cnt++;
      }
  }
  cout<<cnt;
```
I got 4239 on multiple platforms. So out of 20000 comparisons, 4239 come out wrong, more than 20%.
So how do you check whether two floating-point numbers are equal? You may either 1. express them in lowest terms as fractions, which only works for rational numbers and can get complicated, or 2. use a criterion $\epsilon$: if the difference between the two numbers is less than $\epsilon$, they are considered equal. But numeric errors can still cause false positives or false negatives in certain cases. So this is another reason for the first rule of thumb: use integer multiplications instead of floating-point divisions when possible.

The third one is a small thing: instead of computing the slope from each pair of adjacent points, it's faster to fix the first point and compare every other point against it. The first point doesn't change, and reading it saves an address computation: accessing the $i$th element of an array adds $i$ to the base address, while reading from a fixed address doesn't.

Putting all these together, this is the code that I submitted:
```cpp
class Solution {
public:
    bool checkStraightLine(const vector<vector<int>>& coordinates) {
        ios::sync_with_stdio(false);cin.tie(0);cout.tie(0);
        const int sz=coordinates.size();
        const long long x0=coordinates[0][0],y0=coordinates[0][1],dy01=coordinates[1][1]-y0,dx01=coordinates[1][0]-x0;
        for(int i=2;i<sz;++i){
            if((coordinates[i][1]-y0)*dx01!=dy01*(coordinates[i][0]-x0)) return false;
        }
        return true;
    }
};
```
It's probably the most reliable and optimized way to solve this problem. If the last constraint, "coordinates contains no duplicate point.", were loosened, the only extra step would be to find a point whose coordinates differ from the first one's, and start from there.
