Sunday, May 28, 2023

Tuscaridium cygneum and the maximum 8 vertex polyhedron -Follow up

Previously...

So I wrote a short program to calculate the shapes that correspond to those problems that were mentioned in the previous post. Actually, the first one that I ran did not correspond to any of them. I was thinking, maybe the mechanism is on the surface, so instead of using a function of the distances between the points, maybe I should use a function that corresponds to the distance on the surface - or equivalently, the angle between them? Suppose that the capsules keep releasing a certain chemical on the surface that repels other capsules, so that they will find the spot that has the least concentration of the chemical, and the chemical degrades in time. The concentration from each capsule would be inversely proportional to the distance on the surface. If I assume the "force" is proportional to the concentration, the energy would be log of the angle.

I thought the solution would be similar to the cases with the Euclidean distance - but it's totally different. I was surprised by the result!
function=$-\sum\ln\delta_{ij}$, 8 vertices:
 0.0      , 0.0     , 1.0     ,
 0.646104 , 0.621228,-0.443424,
-0.614415 ,-0.575319, 0.539911,
-0.0590085,-0.946303,-0.317851,
 0.0283015, 0.968172, 0.248679,
 0.923682 ,-0.269269, 0.272589,
-0.951041 , 0.240249,-0.194424,
 0.0052322, 0.000739,-0.999986,
It looks like two perpendicular rectangles, nothing like the solutions to the other problems.
The difference actually starts with $n=4$. The solutions to all the Euclidean distance problems are a regular tetrahedron, which is not the solution to this problem - it's a square! Indeed, the product of the vertex-center-vertex angles that a regular tetrahedron makes is about 48.6478, and the one for a square is $\left(\frac{\pi}{4}\right)^4\pi^2\approx 60.0868$. This is probably not the right model for this problem, I suppose.
The result of Thomson problem - i.e., vertices are equal charges and the force is inverse-square to the distance, is
function=$\sum 1/r_{ij}$ (Thomson problem), 8 vertices:
 0.0     , 0.0     , 1.0     ,
 0.67198 ,-0.675175,-0.304273,
 0.90106 , 0.297473, 0.315596,
 0.341148, 0.451193,-0.824647,
-0.970893, 0.172403, 0.166266,
-0.292125,-0.901176, 0.320226,
-0.506278,-0.341556,-0.791847,
-0.145709, 0.982264, 0.118012,
Which is indeed a square antiprism.

I also tried the Whyte's problem (I didn't find any reference about this problem, except that the name was mentioned on wikipedia), where the force is inversly proportional to the distance. The result is
function=$\sum -\ln r_{ij}$ (Whyte's problem), 8 vertices:
 0.0     , 0.0     , 1.0     ,
-0.96761 , 0.171449, 0.185302,
-0.082313, 0.97464 , 0.208092,
 0.482057, 0.538236,-0.69132 ,
 0.50466 ,-0.715044,-0.483767,
 0.933948, 0.019279, 0.356889,
-0.530202,-0.085225,-0.843577,
-0.337356,-0.901919, 0.269692,
It looks very similar to the max volume shape, but doesn't seem to be exactly the same.
And when I compared it with the shape of Tuscaridium cygneum, it seemed closer to the shape than the maximum volume shape.
I wonder if this is the actual shape, and how the organism builds it!

Friday, May 26, 2023

Tuscaridium cygneum and the maximum 8 vertex polyhedron -A curious shape

I saw this thing on a youtube video, and I thought I recognized the shape. Here is my original comment:
Wait... This shape... I think I've seen this shape, the interesting symmetry. The voice in the video says that it has "7 arms", but that's not true, there're actually 8 arms. And it looks surprisingly symmetric. If we take each "arm" as a vertex, It has 8 vertices and all of its faces are triangles. At first I thought of a regular octahedron whose faces are all equilateral triangles, but it has 6 vertices. It's definitely not a cube either. I'm pretty sure I've seen this shape before in my random research on polyhedrons, so I googled "eight vertices polyhedron on a sphere maximum". And there it is: https://mathoverflow.net/questions/73899/optimal-8-vertex-isoperimetric-polyhedron https://math.stackexchange.com/questions/979660/largest-n-vertex-polyhedron-that-fits-into-a-unit-sphere I believe this is the same shape. I wonder how this naturally happens. Maybe this shape is also the solution to "if I put 8 equal charges on a sphere and they can move freely on the sphere, what shape do they make in equilibrium?". Maybe that's exactly how it's formed: 8 "bud"s were released to the surface, and they repelled each other... eventually they ended up in this shape. We should give this polyhedron a name. How about "Tuscaridron"? (What does "Tuscaridium" mean anyway?) This is such an astonishing finding, I'm deeply intrigued by the beauty of the nature...
And it got deleted immediately. Probably because it was long and had links. But still, youtube's spam detection is crap. It happened quite a few times before, too, so I had the habbit to copy my comment somewhere else before posting them. After all, if you want to talk about something serious, youtube comment is not the best place to post it anyway.
I tried to see if the shapes actually match. The data was from Spherical Codes, graphed on GeoGebra.
Light color means the vertex is in the front, dark color means it's in the back.
I also found more images on the internet (Credit to Steven Haddock and MBARI), and I tried to match those, too:
They seem to line up in some degree, but not perfectly.

I also noticed that there could be more than 8 capsules, as the following picture shows:
I'm not sure what that shape is. From what I've read, it seems that the number of capsules is often a multiple of 8, so there're probably 16 in this one. Also, I don't think the shape of the bottom one matches the 8-vertex maximum volume polyhedron, even though there're exactly 8 capsules. I wonder what the shape is - the 4 on the bottom seem to form a square.

On the other hand, I also thought about the other problem: If I put $n$ point charges on a sphere and they can move freely on the sphere, what shape do they make in equilibrium? While I was writing a program to find out, I took a look at the links again and noticed that in another question they mentioned the Thomson problem, which is exactly what I thought about. Apparently the solution for 8 vertices is a square antiprism - which doesn't seem to match any of the shapes above (except that there seem to be a square at the bottom in the last picture above). A closely related problem, Tammes problem, has exactly the same solution to 8 vertices. The proof can be found here.

I wonder if the shape is indeed the one that maximizes the volume, and since the shape is not a square antiprism, I wonder what mechanism drives it to become this shape.

Leetcode M1140 Stone Game II - Push it to the limit

M1140 description:
  Alice and Bob continue their games with piles of stones.  There are a number of piles arranged in a row, and each pile has a positive integer number of stones piles[i].  The objective of the game is to end with the most stones. 
  Alice and Bob take turns, with Alice starting first.  Initially, M = 1.
  On each player's turn, that player can take all the stones in the first X remaining piles, where 1 <= X <= 2M.  Then, we set M = max(M, X).
  The game continues until all the stones have been taken.
  Assuming Alice and Bob play optimally, return the maximum number of stones Alice can get.

Example 1:
  Input: piles = [2,7,9,4,4]
  Output: 10
  Explanation:  If Alice takes one pile at the beginning, Bob takes two piles, then Alice takes 2 piles again. Alice can get 2 + 4 + 4 = 10 piles in total. If Alice takes two piles at the beginning, then Bob can take all three piles left. In this case, Alice get 2 + 7 = 9 piles in total. So we return 10 since it's larger. 
Example 2:
  Input: piles = [1,2,3,4,5,100]
  Output: 104

Constraints:
  $1 <= piles.length <= 100$
  $1 <= piles[i] <= 10^4$
The problem is not trivial, so we most likely need to iterate through all possibilities. But there are a lot of sub-problems that we will encounter over and over again, so we should record the intermediate results. We can keep a table as the maximum points that a player can get at an index $i$ and given the current $M$. Considering the two approaches in DP, the top-down recursive approach is probably easier to write, since it's not easy to tell what the range of $M$ is if we use the bottom-up approach. But bottom-up is usually a little faster, so let's take the challenge. 
```cpp
#pragma GCC optimize("Ofast","inline","-ffast-math")
#pragma GCC target("avx,mmx,sse2,sse3,sse4")
class Solution {
    int tab[100][49];
    int v[100];
    void generate(){
        memset(v,0,sizeof(v));
        v[0]=1;
        for(int i=0;i<100;++i){
            int x=1;
            for(;x<=v[i];++x){
                if(i+x>=100) break;
                v[i+x]=max(v[i+x],v[i]);
            }
            for(;x<=2*v[i];++x){
                if(i+x>=100) break;
                v[i+x]=max(v[i+x],x);
            }
        }
    }
public:
    Solution(){
        generate();
        for(int i=0;i<100;++i){
            tab[i][48]=v[i];
        }
    }
    int stoneGameII(const vector<int>& piles) {
        static auto _=[](){ios_base::sync_with_stdio(false);cin.tie(0);cout.tie(0);return 0;}();
        const int sz=piles.size();
        int sum=0;
        for(int i=sz-1;i>=0;--i){
            sum+=piles[i];
            for(int j=0;j<tab[i][48];++j){
                int canTake=2*(j+1);
                if(canTake>=sz-i){
                    tab[i][j]=sum; continue;
                }
                int minE=INT_MAX;
                for(int x=1;x<=j+1;++x){
                    minE=min(minE,tab[i+x][j]);
                }
                for(int x=j+2;x<=canTake;++x){
                    if(tab[i+x][48]<x) continue;
                    minE=min(minE,tab[i+x][x-1]);
                }
                tab[i][j]=sum-minE;
            }
        }
        return tab[0][0];
    }
};
```
The code above has Runtime 3 ms Beats 100% and Memory 8.3 MB Beats 91.12%, and the "Sample 5 ms submission" now takes 16ms. It breaks the record and I think it is pretty much optimized.

You may wonder where does the magic number "49" come from. Well, I ran the "generate" function and got the ranges of $M$ at each index from 0 to 99, and the largest is 48. Since I use index $j$ to represent $M-1$, I use the index 48 to store the range of $M-1$. If we want to have a more flexible way to write it, we could write
```cpp
static const int maxSize=100;
static const int mRange=(maxSize+1)/2;
//...
int tab[maxSize][mRange];
```
because the range of $M$ satisfies $v[n]<=(n+2)/2$. The proof is left as an exercise.

The code above can be easily modified to work as a C function,
```c
#define min(a,b) ((a)<(b)?(a):(b))
#define max(a,b) ((a)>(b)?(a):(b))
int tab[100][49];
int v[100];
bool first=true;
void generate(){
    memset(v,0,sizeof(v));
    v[0]=1;
    for(int i=0;i<100;++i){
        int x=1;
        for(;x<=v[i];++x){
            if(i+x>=100) break;
            v[i+x]=max(v[i+x],v[i]);
        }
        for(;x<=2*v[i];++x){
            if(i+x>=100) break;
            v[i+x]=max(v[i+x],x);
        }
    }
}
int stoneGameII(int* piles,const int sz){
    if(first){
        generate();
        for(int i=0;i<100;++i){
            tab[i][48]=v[i];
        }
        first=false;
    }
    int sum=0;
    for(int i=sz-1;i>=0;--i){
        sum+=piles[i];
        for(int j=0;j<tab[i][48];++j){
            int canTake=2*(j+1);
            if(canTake>=sz-i){
                tab[i][j]=sum; continue;
            }
            int minE=INT_MAX;
            for(int x=1;x<=j+1;++x){
                minE=min(minE,tab[i+x][j]);
            }
            for(int x=j+2;x<=canTake;++x){
                if(tab[i+x][48]<x) continue;
                minE=min(minE,tab[i+x][x-1]);
            }
            tab[i][j]=sum-minE;
        }
    }
    return tab[0][0];
}
```
Runtime 0 ms Beats 100%, Memory 5.9 MB Beats 85.71%.

Tuesday, May 23, 2023

Leetcode M934 Shortest Bridge Optimization

M934 Shortest Bridge description:
  You are given an n x n binary matrix grid where 1 represents land and 0 represents water.
  An island is a 4-directionally connected group of 1's not connected to any other 1's. There are exactly two islands in grid.
  You may change 0's to 1's to connect the two islands to form one island.
  Return the smallest number of 0's you must flip to connect the two islands.
Constraints:
  $n == grid.length == grid[i].length$
  $2 <= n <= 100$
  $grid[i][j]$ is either 0 or 1.
  There are exactly two islands in grid.
The first step is, of course, finding a '1'. Then the typical approach is, to record the first island of '1's, then do a bfs from there and look for another '1'.

The concept is actually quite simple, but there're a few things that we can modify to increase efficiency.
```cpp
#pragma GCC optimize("Ofast","inline","-ffast-math")
#pragma GCC target("avx,mmx,sse2,sse3,sse4")
class Solution {
    int sz;
    void findB(const vector<vector<int>>& grid,int& x,int& y){
        for(x=0;x<sz;++x){
            for(y=0;y<sz;++y){
                if(grid[x][y]==1) return;
            }
        }
    }
    int checkCo(const vector<vector<int>>& grid,int x,int y){
        if(x<0||x>=sz||y<0||y>=sz) return 3;
        else return grid[x][y];
    }
    const array<array<int,2>,4> directions={ { {-1,0},{0,-1},{0,1},{1,0} } };
    void dfs(vector<vector<int>>& grid,int x,int y){
        int val=checkCo(grid,x,y);
        if(val>=2) return;
        grid[x][y]=2;
        if(val==0){q[qM++]=128*x+y;return;}
        for(auto &dir:directions) {
            dfs(grid,x+dir[0],y+dir[1]);
        };
    }
    int q[10000];
    int qM=0;
    int bfs(vector<vector<int>>& grid) {
        int l=0;
        int level=1;
        while(qM>l){
            int lay=qM-l;
            while(lay--){
                int p=q[l++];
                for(auto &dir:directions) {
                    int t=p+128*dir[0]+dir[1];
                    int x=t/128;
                    int y=t%128;
                    int val=checkCo(grid,x,y);
                    if(val>=2) continue;
                    if(val==1) {qM=0;return level;}
                    grid[x][y]=2;
                    q[qM++]=t;
                }
            }
            level++;
        }
        qM=0;
        return level;
    }
    // void printG(const vector<vector<int>>& grid){
    //     for(auto& v:grid){
    //         for(auto i:v) cout<<i<<", ";
    //         cout<<endl;
    //     }
    // }
public:
    int shortestBridge(vector<vector<int>>& grid) {
    static auto _=[](){ios_base::sync_with_stdio(false);cin.tie(0);cout.tie(0);return 0;}();
        sz=grid.size();
        int x,y;
        findB(grid,x,y);
        dfs(grid,x,y);
        return bfs(grid);
    }
};
```
To save space and time, we can use the original grid to record the visited cells by changing the value to 2. It's easy to modify the program to record the information on a hashset if we are not allowed to modify the original grid.
The first small trick is, instead of returning -1 from "checkCo()" when the coordinates are out of range, it returns 3. This way, a single check is enough to determine if we need to proceed, instead of two.

Another one is, we can add the first layer of '0's outside the island into the queue while we do the dfs. Since we change its value to 2 right away, there will be no duplicates. So there is no need to add the entire island to the bfs queue.

Also, since the grid is at most 100*100, I decided to use 128*x+y to store the two coordinates. It can't deal with out of range cases, so I must make sure that only use this form when the coordinates are valid. Not sure if it's faster than pair<int,int>, though.

The shortest time I got from the code above is 23ms which breaks the record. I think it's pretty much optimized.

The story doesn't end here. I was thinking about if there is a more efficient algorithm for the first part - to find the surface of the island. I thought, we just needed to find the surface of the island, not the entire island. And as long as the island has a "normal" shape - i.e, not hyperbolic (which was a big part of my research) or fractal, the number of elements on the perimeter should be much less than the area. So, if I just follow the boundary of the island, it would be much more efficient when the island is large.

The following code implements this algorithm:
```cpp
class Solution {
    void printG(const vector<vector<int>>& grid){
        for(auto& v:grid){
            for(auto i:v) cout<<i<<", ";
            cout<<endl;
        }
        cout<<"----------------------"<<endl;
    }
    int sz;
    struct Edge{
        int d;//0=up,1=right,2=down,3=left
        int x;//above(h) or to the left of(v), range [0,sz)
        int y;//range [0,sz)
        bool operator!=(const Edge& a){
            return this->d!=a.d||this->x!=a.x||this->y!=a.y;
        }
    };
    void findB(const vector<vector<int>>& grid,int& x,int& y){
        for(x=0;x<sz;++x) for(y=0;y<sz;++y) if(grid[x][y]==1) return;
    }
    int q[10000];
    int qM=0;
    int checkCo(const vector<vector<int>>& grid,int x,int y){
        if(x<0||x>=sz||y<0||y>=sz) return 3;
        else return grid[x][y];
    }
    void up(Edge& edg,vector<vector<int>>& grid){
        int &x=edg.x,&y=edg.y,&d=edg.d;
        if(x==0){++d; return;}
        if(grid[x-1][y]==0){++d;grid[x][y]=2;q[qM++]=(x-1)*128+y; return;}
        if(y==0){--x;return;}
        if(grid[x-1][y-1]==0){--x;grid[x][y]=2;q[qM++]=x*128+y-1;return;}
        d+=3;
        x--;
        y--;
        grid[x][y]=2;
    }
    void right(Edge& edg,vector<vector<int>>& grid){
        int &x=edg.x,&y=edg.y,&d=edg.d;
        if(y==sz-1){++d; return;}
        if(grid[x][y+1]==0){++d;grid[x][y]=2;q[qM++]=x*128+y+1; return;}
        if(x==0){++y;return;}
        if(grid[x-1][y+1]==0){++y;grid[x][y]=2;q[qM++]=(x-1)*128+y;return;}
        d--;
        x--;
        y++;
        grid[x][y]=2;
    }
    void down(Edge& edg,vector<vector<int>>& grid){
        int &x=edg.x,&y=edg.y,&d=edg.d;
        if(x==sz-1){++d; return;}
        if(grid[x+1][y]==0){++d;grid[x][y]=2;q[qM++]=(x+1)*128+y; return;}
        if(y==sz-1){++x;return;}
        if(grid[x+1][y+1]==0){++x;grid[x][y]=2;q[qM++]=x*128+y+1;return;}
        d--;
        x++;
        y++;
        grid[x][y]=2;
    }
    void left(Edge& edg,vector<vector<int>>& grid){
        int &x=edg.x,&y=edg.y,&d=edg.d;
        if(y==0){d=0; return;}
        if(grid[x][y-1]==0){d=0;grid[x][y]=2;q[qM++]=x*128+y-1; return;}
        if(x==sz-1){--y;return;}
        if(grid[x+1][y-1]==0){--y;grid[x][y]=2;q[qM++]=(x+1)*128+y;return;}
        d--;
        x++;
        y--;
        grid[x][y]=2;
    }
public:
    int shortestBridge(vector<vector<int>>& grid) {
        sz=grid.size();
        cout<<"before:\n";
        printG(grid);
        Edge edg;
        edg.d=0;
        findB(grid,edg.x,edg.y);//pointing up, '1' is on the right, so island will always be on the right
        q[qM++]=edg.x*128+edg.y;
        Edge lastE=edg;
        do{
            switch(edg.d){
                case 0:up(edg,grid); break;
                case 1:right(edg,grid); break;
                case 2:down(edg,grid); break;
                case 3:left(edg,grid); break;
            }
            //cout<<"d is "<<edg.d<<",x is "<<edg.x<<", y is "<<edg.y<<endl;
        }while(edg!=lastE);
        cout<<"after:\n";
        printG(grid);
        
        
        
        return 0;
    }
};
```
Unfortunately, it doesn't work in certain cases. I noticed that in the third test case, 
```
1, 1, 1, 1, 1, 
1, 0, 0, 0, 1, 
1, 0, 1, 0, 1, 
1, 0, 0, 0, 1, 
1, 1, 1, 1, 1, 
```
The edge we find is on the left of the first '1', and after we go around the entire island, we don't find anything to continue to work with. So, the algorithm doesn't work when the second island is inside the first island.

In this particular case, we may cross the island and look for a '0' and start from there. But it doesn't work in general. The island can be in any strange shape, and the algorithm doesn't work if it's not singly connected.
![image](https://yjian012.github.io/Yi-blog/img/2023-05-23-Leetcode_M934-Island.png)
It works nicely when the islands are singly connected, though. E.g,
```
before:
0, 1, 
1, 0, 
----------------------
after:
0, 2, 
1, 0, 
----------------------
before:
0, 1, 0, 
0, 0, 0, 
0, 0, 1, 
----------------------
after:
0, 2, 0, 
0, 0, 0, 
0, 0, 1, 
----------------------
before:
1, 1, 1, 1, 1, 
1, 0, 0, 0, 1, 
1, 0, 1, 0, 0, 
1, 0, 0, 0, 1, 
1, 1, 1, 1, 1, 
----------------------
after:
1, 2, 2, 2, 1, 
2, 0, 0, 0, 2, 
2, 0, 1, 0, 0, 
2, 0, 0, 0, 2, 
1, 2, 2, 2, 1, 
----------------------
```
It may be useful somewhere else.

Leetcode M347 Top K Frequent Elements Optimization

M347. Top K Frequent Elements description:
  Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Constraints:
  $1 <= nums.length <= 10^5$
  $-10^4 <= nums[i] <= 10^4$
  $k$ is in the range [1, the number of unique elements in the array].
  It is guaranteed that the answer is unique.
(Update: Apparently using "nth_element" would be the most efficient algorithm, which is O(n), better than the algorithms below.)

A very easy problem for a medium. But I did a few modifications to the typical approach. The code below is the "fastest" 0ms submission, which takes 12~20ms currently.
```cpp
struct myComp {
    constexpr bool operator()(
        pair<int, int> const& a,
        pair<int, int> const& b)
        const noexcept
    {
        return a.second < b.second;
    }
};
class Solution {
public:   
    vector<int> topKFrequent(vector<int>& nums, int k) {
        unordered_map<int,int> mp;
        for(int i:nums) mp[i]++;
        
        priority_queue<pair<int,int>, vector<pair<int,int>>, myComp> pq;
        for(auto i:mp) pq.push(i);
        
        vector<int> ans;
        for(int i = 0; i < k; i++) {
            auto top = pq.top();
            ans.push_back(top.first);
            pq.pop();
        }
        return ans;
    }
};
```
First we count the frequencies, then we find the top $k$ ones with the highest frequency. The code above used a priority queue (pq) to record the order, but adding a number to a pq is $O(\log n)$, and adding $n$ numbers is $O(n\log n - n)$, which is not much faster than using a vector and sorting it. We can do much better with the following code:
```cpp
class Solution {
public:   
    vector<int> topKFrequent(vector<int>& nums, int k) {
        unordered_map<int,int> mp;
        for(int i:nums) mp[i]--;
        
        priority_queue<array<int,2>> pq;
        int i=0;
        auto it=mp.begin();
        for(;i<k;++i,++it) pq.emplace(array<int,2>{it->second,it->first});
        for(;it!=mp.end();++it){
            if(it->second>pq.top()[0]) continue;
            pq.emplace(array<int,2>{it->second,it->first});
            pq.pop();
        }
        vector<int> ans;
        while(!pq.empty()){
            ans.emplace_back(pq.top()[1]);
            pq.pop();
        }
        reverse(ans.begin(),ans.end());
        return ans;
    }
};
```
The trick is, since we only need the top $k$ ones, we can just store those in the pq. And each time we find a value that is larger than the smallest one in the pq, we remove the smallest one and add the new one. So we actually need a pq that implememts a minimum heap. We can do that by writing a custom comparator, but that's unnecessary - if we negate the numbers, the result of comparison would be inverted. So in the first step, instead of adding the frequency count, we subtract it.

Then we add the first $k$ elements, then each time if we encounter an element that has a higher frequency then the minimum frequency that we have, we remove the minimum frequency one and add the new one. The operation is only $O(\log k)$. In the end, we extract the corresponding values, but they will be in the order of small to large, so we need to reverse the order, which is $O(k)$. It's a much better algorithm when $n>>k$.

Leetcode M399 Evaluate Division

M399. Evaluate Division description:
  You are given an array of variable pairs equations and an array of real numbers values, where $equations[i] = [A_i, B_i]$ and $values[i]$ represent the equation $A_i / B_i = values[i]$. Each $A_i$ or $B_i$ is a string that represents a single variable.
  You are also given some queries, where $queries[j] = [C_j, D_j]$ represents the jth query where you must find the answer for $C_j / D_j = ?$.
  Return the answers to all queries. If a single answer cannot be determined, return $-1.0$.

Constraints:
  $1 <= equations.length <= 20$
  $equations[i].length == 2$
  $1 <= A_i.length, B_i.length <= 5$
  $values.length == equations.length$
  $0.0 < values[i] <= 20.0$
  $1 <= queries.length <= 20$
  $queries[i].length == 2$
  $1 <= C_j.length, D_j.length <= 5$
  $A_i, B_i, C_j, D_j$ consist of lower case English letters and digits.
There are quite a few graph related problems recently, this is an interesting one. The first intuition would be finding the shortest path, or just any path, and then multiplying the numbers. But come to think about it, it would be a waste of resources if we find the path for each query. Instead, we can store the previous results for future use.
Actually, there's an even easier approach: After we build the graph, we can simply assign a value for each variable based on the equations. Here is an example:
```cpp
class Solution {
    int parent[40];
    int size[40];
    int find_set(int v) {
        if (v == parent[v]) return v;
        return parent[v] = find_set(parent[v]);
    }
    void union_sets(int a, int b) {
        a = find_set(a);
        b = find_set(b);
        if (a != b) {
            if (size[a] < size[b])
                swap(a, b);
            parent[b] = a;
            size[a] += size[b];
        }
    }
    unordered_map<string,int> mp;
    double value[40];
    int q[40];
    double tab[40][40];
    int adj[40][40];
    int adjLen[40];
    void init(const vector<vector<string>>& equations,const vector<double>& values){
        memset(size,1,sizeof(size));
        //memset(tab,0.0,sizeof(tab));
        memset(adj,-1,sizeof(adj));
        memset(adjLen,0,sizeof(adjLen));
        memset(value,0.0,sizeof(value));
        mp.clear();
        int cnt=0;
        for(int i=0;i<equations.size();++i){
            auto it0=mp.find(equations[i][0]),it1=mp.find(equations[i][1]);
            int n0,n1;
            if(it0==mp.end()){
                if(it1==mp.end()){
                    it1=mp.insert({equations[i][1],cnt++}).first;
                    n1=it1->second;
                    parent[n1]=n1;
                }
                it0=mp.insert({equations[i][0],cnt++}).first;
                n0=it0->second;
                parent[n0]=n1;
            }
            else if(it1==mp.end()){
                n0=it0->second;
                it1=mp.insert({equations[i][1],cnt++}).first;
                n1=it1->second;
                parent[n1]=parent[n0];
            }
            else{
                n0=it0->second;
                n1=it1->second;
                union_sets(n0,n1);
            }
            adj[n0][adjLen[n0]++]=n1;
            adj[n1][adjLen[n1]++]=n0;
            tab[n0][n1]=values[i];
            tab[n1][n0]=1/values[i];
        }
        for(int i=0;i<cnt;++i){
            if(parent[i]!=i) continue;
            value[i]=1;
            q[0]=i;
            int l=0,r=1;
            while(l<r){
                int curr=q[l++];
                for(int j=0;j<adjLen[curr];++j){
                    int nei=adj[curr][j];
                    double& v=value[nei];
                    if(v==0.0){
                        v=value[curr]/tab[curr][nei];
                        q[r++]=nei;
                    }
                }
            }
        }
        // for(auto it=mp.begin();it!=mp.end();++it){
        //     cout<<it->first<<"("<<parent[it->second]<<")"<<"=";
        //     cout<<value[it->second]<<",";
        // }
        // cout<<endl;
    }
public:
    vector<double> calcEquation(const vector<vector<string>>& equations, const vector<double>& values, const vector<vector<string>>& queries) {
        init(equations,values);
        const int qsz=queries.size();
        vector<double> res(qsz);
        for(int i=0;i<qsz;++i){
            if(mp.find(queries[i][0])==mp.end()||mp.find(queries[i][1])==mp.end()){res[i]=-1;continue;}
            auto v0=mp[queries[i][0]],v1=mp[queries[i][1]];
            if(v0==v1){res[i]=1;continue;}
            if(find_set(v0)!=find_set(v1)){res[i]=-1;continue;}
            res[i]=value[v0]/value[v1];
        }
        return res;
    }
};
```
First, we initialize the adjacency list and the table of quotients. We read the equations and use disjoint set union algorithm to find the connected components. After that, we find the root of each component and assign $1$ to its value, then update the values of all the other variables in the component based on the table with bfs.
Then we process the queries, which we simply check if the variables are both known and if they are in the same component, if so we divide their values to get the result.

It's hard to tell how the performance is because the test cases are quite small, all the submissions are 0ms. But I think this algorithm is a good balance between code complexity and efficiency. There are a few places that we can improve, though. First, if there are few queries, we don't need to assign a value to all the variables. Second, imagine a different problem: The input format is {"string1","string2","number"}, where "number==0.0" means it's a query at the current input, otherwise it means it gives the value of "string1" divided by "string2". The algorithm above would be very inefficient in this case. Can we modify the algorithm, so that it updates the values when a new input is given?

It turns out that we can modify the disjoint sets union algorithm to achieve this goal:
```cpp
class Solution {
    pair<int,double> parent_value[40];
    int size[40];
    pair<int,double> find_set(int v) {
        auto& pv=parent_value[v];
        if (v == pv.first) return pv;
        auto rv=find_set(pv.first);
        return {pv.first=rv.first,pv.second*=rv.second};
    }
    void union_sets(int a, int b,double a_b) {
        auto pa = find_set(a);
        auto pb = find_set(b);
        if (pa.first != pb.first) {
            if (size[pa.first] < size[pb.first]){
                swap(pa, pb);
                a_b=1/a_b;
            }
            parent_value[pb.first]={pa.first,pa.second/pb.second/a_b};
            size[pa.first] += size[pb.first];
        }
    }
    unordered_map<string,int> mp;
    void init(const vector<vector<string>>& equations,const vector<double>& values){
        memset(size,1,sizeof(size));
        memset(parent_value,0.0,sizeof(parent_value));
        mp.clear();
        int cnt=0;
        for(int i=0;i<equations.size();++i){
            auto it0=mp.find(equations[i][0]),it1=mp.find(equations[i][1]);
            int n0,n1;
            if(it0==mp.end()){
                if(it1==mp.end()){
                    it1=mp.insert({equations[i][1],cnt++}).first;
                    n1=it1->second;
                    parent_value[n1]={n1,1};
                }
                it0=mp.insert({equations[i][0],cnt++}).first;
                n0=it0->second;
                parent_value[n0]={n1,parent_value[n1].second*values[i]};
            }
            else if(it1==mp.end()){
                n0=it0->second;
                it1=mp.insert({equations[i][1],cnt++}).first;
                n1=it1->second;
                parent_value[n1]={parent_value[n0].first,parent_value[n0].second/values[i]};
            }
            else{
                n0=it0->second;
                n1=it1->second;
                union_sets(n0,n1,values[i]);
            }
        }
    }
public:
    vector<double> calcEquation(const vector<vector<string>>& equations, const vector<double>& values, const vector<vector<string>>& queries) {
        init(equations,values);
        const int qsz=queries.size();
        vector<double> res(qsz);
        for(int i=0;i<qsz;++i){
            if(mp.find(queries[i][0])==mp.end()||mp.find(queries[i][1])==mp.end()){res[i]=-1;continue;}
            auto v0=mp[queries[i][0]],v1=mp[queries[i][1]];
            if(v0==v1){res[i]=1;continue;}
            if(find_set(v0).first!=find_set(v1).first){res[i]=-1;continue;}
            res[i]=parent_value[v0].second/parent_value[v1].second;
        }
        return res;
    }
};
```
This way, we can update the value upon each query.

Sunday, May 14, 2023

Leetcode M2466 Optimization

M2466 description:
Given the integers zero, one, low, and high, we can construct a string by starting with an empty string, and then 
at each step perform either of the following:
    Append the character '0' zero times.
    Append the character '1' one times.
This can be performed any number of times.
A good string is a string constructed by the above process having a length between low and high (inclusive).
Return the number of different good strings that can be constructed satisfying these properties. Since the answer 
can be large, return it modulo $10^9$ + 7.
Constraints:
    1 <= low <= high <= $10^5$
    1 <= zero, one <= low
An optimization to the typical solution is, noticing that if gcd(zero,one) is not one, we can skip a lot of unnecessary computations.
(And different combinations of ("zero","one")s may conform to the same entry in the table.)
We can save what we've calculated for future inputs.

# Code
```
class Solution {
    const int mo=1000000007;
    unordered_map<string,vector<int>> lookUp;
    unsigned int gcd(unsigned int a,unsigned int b){//make sure a,b>0
        while(a>0){
            b=b%a;
            swap(a,b);
        }
        return b;
    }
public:
    int countGoodStrings(int low, int high, int zero, int one) {
        int s=min(zero,one),l=max(zero,one);
        int g=gcd(s,l);
        s/=g;l/=g;low=(low-1)/g+1;high=high/g;
        string st=to_string(s)+','+to_string(l);
        auto it=lookUp.find(st);
        if(it==lookUp.end()){
            vector<int> table(high+1,0);
            for(int i=s;i<=l;i+=s){
                table[i]=1;
            }
            table[l]++;
            for(int i=l+1;i<=high;++i){
                table[i]=(table[i-s]+table[i-l])%mo;
            }
            int r=0;
            for(int i=low;i<=high;++i){
                r+=table[i];
                r%=mo;
            }
            table[0]=high;
            lookUp[st]=move(table);
            return r;
        }
        else{
            auto& table=it->second;
            if(high>table[0]){
                table.resize(high+1);
                for(int i=table[0]+1;i<=high;++i){
                    table[i]=(table[i-s]+table[i-l])%mo;
                }
                table[0]=high;
            }
            int r=0;
            for(int i=low;i<=high;++i){
                r+=table[i];
                r%=mo;
            }
            return r;
        }
    }
};
```
In this case we may change it to
```
unordered_map<long long,vector<int>> lookUp;
long long st=(long long)s*100000+l;
```
which may be faster than converting to a string? I'm just too lazy to write a hash for pair<int,int>...

Leetcode E2215 M1456 E1572 Optimization

E2215 description:
Given two 0-indexed integer arrays nums1 and nums2, return a list answer of size 2 where:
    answer[0] is a list of all distinct integers in nums1 which are not present in nums2.
    answer[1] is a list of all distinct integers in nums2 which are not present in nums1.
Note that the integers in the lists may be returned in any order.
Constraints:
    1 <= nums1.length, nums2.length <= 1000
    -1000 <= nums1[i], nums2[i] <= 1000
Almost all the solutions use two tables, but actually one table is enough.
# Code
```
char tab_b[2001];
char* tab=tab_b+1000;
class Solution {
public:
    vector<vector<int>> findDifference(vector<int>& nums1, vector<int>& nums2) {
        memset(tab_b,0,sizeof(tab_b));
        for(int n:nums1){
            tab[n]=1;
        }
        vector<int> r1,r2;
        r1.reserve(nums1.size());
        r2.reserve(nums2.size());
        for(int n:nums2){
            if(tab[n]==0){
                if(tab[n]!=2) r2.emplace_back(n);
            }
            tab[n]=2;
        }
        for(int n:nums1){
            if(tab[n]==1){r1.emplace_back(n); tab[n]=0;}
        }
        return vector<vector<int>>{r1,r2};
    }
};
```
Runtime 7 ms Beats 100%
Memory 26.7 MB Beats 94.43%

---
M1456 description:
Given a string s and an integer k, return the maximum number of vowel letters in any substring of s with length k.
Vowel letters in English are 'a', 'e', 'i', 'o', and 'u'.
Constraints:
    1 <= s.length <= $10^5$
    s consists of lowercase English letters.
    1 <= k <= s.length
Another very easy problem. But is your solution optimized? The following method may be 10 times faster:
# Code
```
class Solution {
    bool isVow['z'+1]={false};
    void isVowInit(){
        isVow['a']=isVow['e']=isVow['i']=isVow['o']=isVow['u']=true;
    }
public:
    Solution(){
        isVowInit();
    }
    int maxVowels(string s, int k) {
        int m=0,i=0;
        for(;i<k;++i) if(isVow[s[i]]) m++;
        int max=m;
        for(;i<s.size();++i){
            m+=isVow[s[i]];
            m-=isVow[s[i-k]];
            max=m>max?m:max;
        }
        return max;
    }
};
```
1 operation instead of (mostly and at most) 5, so I was expecting a 5x speedup. But somehow it's almost 10 times faster?
![M1456. benchmark.PNG](https://assets.leetcode.com/users/images/9f69c641-cc30-481e-bb9d-ff393965fa63_1683253659.5501282.png)

---
E1572 description:
Given a square matrix mat, return the sum of the matrix diagonals.
Only include the sum of all the elements on the primary diagonal and all the elements on the secondary diagonal 
that are not part of the primary diagonal.
Also very easy. But instead of checking if we are at the center every time, we can just check it once at the end. Also we can do it in one iterations instead of 2.
```
class Solution {
public:
    int diagonalSum(const vector<vector<int>>& mat) {
        int s=0;
        const int l=mat.size();
        for(int i=0;i<l;++i) s+=mat[i][i]+mat[l-1-i][i];
        return s-l%2*mat[l/2][l/2];
    }
};
```

Leetcode E1822 Optimization

I think I'll stop posting solutions on leetcode discussion because everyone just posts the same stuff there and anything novel doesn't seem to be recognized, so I'll just post them here instead. Starting from E1822. Sign of the Product of an Array.
Description:
    There is a function signFunc(x) that returns:
        1 if x is positive.
        -1 if x is negative.
        0 if x is equal to 0.
    You are given an integer array nums. Let product be the product of all values in the array nums.
    Return signFunc(product).
It's very easy. But can you find the most optimized way to solve it? Comparing with normal approach, a simple modification may speed up your code 70% to 100%.
# Code
```
class Solution {
public:
    int arraySign(vector<int>& nums) {
        int r=0;
        for(int i : nums){
            if(i==0) return 0;
            r^=i;
        }
        return r>=0?1:-1;
    }
};
```
---

edit:
So I took a look at the assembly code that the following two codes generate,

code1
```
int arraySign(int nums[],int len){
  int r=1;
  for(int i=0;i<len;++i){
    if(nums[i]==0) return 0;
    r^=nums[i];
  }
  return r>=0?1:-1;
}
```

code2
```
int arraySign(int nums[],int len){
  int r=1;
  for(int i=0;i<len;++i){
    if(nums[i]==0) return 0;
    if(nums[i]<0) r=-r;
  }
  return r;
}
```
without optimization:
code1

	.cfi_startproc
	endbr64
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movq	%rdi, -24(%rbp)
	movl	%esi, -28(%rbp)
	movl	$1, -8(%rbp)
	movl	$0, -4(%rbp)
.L5:
	movl	-4(%rbp), %eax
	cmpl	-28(%rbp), %eax
	jge	.L2
	movl	-4(%rbp), %eax
	cltq
	leaq	0(,%rax,4), %rdx
	movq	-24(%rbp), %rax
	addq	%rdx, %rax
	movl	(%rax), %eax
	testl	%eax, %eax
	jne	.L3
	movl	$0, %eax
	jmp	.L4
.L3:
	movl	-4(%rbp), %eax
	cltq
	leaq	0(,%rax,4), %rdx
	movq	-24(%rbp), %rax
	addq	%rdx, %rax
	movl	(%rax), %eax
	xorl	%eax, -8(%rbp)
	addl	$1, -4(%rbp)
	jmp	.L5
.L2:
	cmpl	$0, -8(%rbp)
	js	.L6
	movl	$1, %eax
	jmp	.L8
.L6:
	movl	$-1, %eax
.L8:
	nop
.L4:
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc

code2

	.cfi_startproc
	endbr64
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movq	%rdi, -24(%rbp)
	movl	%esi, -28(%rbp)
	movl	$1, -8(%rbp)
	movl	$0, -4(%rbp)
.L6:
	movl	-4(%rbp), %eax
	cmpl	-28(%rbp), %eax
	jge	.L2
	movl	-4(%rbp), %eax
	cltq
	leaq	0(,%rax,4), %rdx
	movq	-24(%rbp), %rax
	addq	%rdx, %rax
	movl	(%rax), %eax
	testl	%eax, %eax
	jne	.L3
	movl	$0, %eax
	jmp	.L4
.L3:
	movl	-4(%rbp), %eax
	cltq
	leaq	0(,%rax,4), %rdx
	movq	-24(%rbp), %rax
	addq	%rdx, %rax
	movl	(%rax), %eax
	testl	%eax, %eax
	jns	.L5
	negl	-8(%rbp)
.L5:
	addl	$1, -4(%rbp)
	jmp	.L6
.L2:
	movl	-8(%rbp), %eax
.L4:
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc


with -O1 (same result as -O):
code1

	.cfi_startproc
	endbr64
	testl	%esi, %esi
	jle	.L4
	movq	%rdi, %rax
	leal	-1(%rsi), %edx
	leaq	4(%rdi,%rdx,4), %rsi
	movl	$1, %ecx
.L3:
	movl	(%rax), %edx
	testl	%edx, %edx
	je	.L1
	xorl	%edx, %ecx
	addq	$4, %rax
	cmpq	%rsi, %rax
	jne	.L3
	sarl	$31, %ecx
	movl	%ecx, %edx
	orl	$1, %edx
.L1:
	movl	%edx, %eax
	ret
.L4:
	movl	$1, %edx
	jmp	.L1
	.cfi_endproc

code2

	.cfi_startproc
	endbr64
	testl	%esi, %esi
	jle	.L5
	movq	%rdi, %rax
	leal	-1(%rsi), %edx
	leaq	4(%rdi,%rdx,4), %rdi
	movl	$1, %ecx
.L4:
	movl	(%rax), %edx
	testl	%edx, %edx
	je	.L1
	movl	%ecx, %esi
	negl	%esi
	testl	%edx, %edx
	cmovs	%esi, %ecx
	addq	$4, %rax
	cmpq	%rdi, %rax
	jne	.L4
	movl	%ecx, %edx
.L1:
	movl	%edx, %eax
	ret
.L5:
	movl	$1, %edx
	jmp	.L1
	.cfi_endproc


with -Ofast (same result as -O2 and -O3):
code1

	.cfi_startproc
	endbr64
	testl	%esi, %esi
	jle	.L4
	leal	-1(%rsi), %eax
	movl	$1, %edx
	leaq	4(%rdi,%rax,4), %rcx
	jmp	.L3
	.p2align 4,,10
	.p2align 3
.L12:
	addq	$4, %rdi
	xorl	%eax, %edx
	cmpq	%rcx, %rdi
	je	.L11
.L3:
	movl	(%rdi), %eax
	testl	%eax, %eax
	jne	.L12
	ret
	.p2align 4,,10
	.p2align 3
.L11:
	movl	%edx, %eax
	sarl	$31, %eax
	orl	$1, %eax
	ret
.L4:
	movl	$1, %eax
	ret
	.cfi_endproc

code2

	.cfi_startproc
	endbr64
	testl	%esi, %esi
	jle	.L5
	leal	-1(%rsi), %eax
	movl	$1, %edx
	leaq	4(%rdi,%rax,4), %rsi
	jmp	.L4
	.p2align 4,,10
	.p2align 3
.L12:
	movl	%edx, %ecx
	negl	%ecx
	testl	%eax, %eax
	cmovs	%ecx, %edx
	addq	$4, %rdi
	cmpq	%rsi, %rdi
	je	.L11
.L4:
	movl	(%rdi), %eax
	testl	%eax, %eax
	jne	.L12
	ret
	.p2align 4,,10
	.p2align 3
.L11:
	movl	%edx, %eax
	ret
.L5:
	movl	$1, %eax
	ret
	.cfi_endproc

So, indeed code 1 has fewer operations inside the loop, with or without optimization... With optimization, it has 7/10 of the number of operations of code 2 inside the loop, it seems?
---
So I did some benchmark too,
![image](https://i.postimg.cc/k4tZ19vm/E1822-benchmark.png)
![image](https://i.postimg.cc/V6y4Ntnx/E1822-benchmark1.png)
![image](https://i.postimg.cc/FsPZL80g/E1822-benchmark2.png)
It indeed saves about 30% to 50% time.

---
Further optimization? If the 64bit operation is as fast as 32bit operation and "int" is 32bit, it may seem that using e.g. uint_fast64_t would further halve the time because we can do two "xor"s at the same time. But in this case, it doesn't help much because checking 0 would be much more complicated. You would have to define two more constants to check if the first half of the 64 bit int is all 0, then the second half. So it probably doesn't work. I think the code above is pretty much optimized for this problem.

An algorithm to iterate through all perfect matchings on a complete graph (Chord diagram)

I thought this function was helpful for solving another problem. It did work, but it was unnecessary since there were faster algorithms to solve it. But I was interested, so I ended up finishing it. The algorithm iterates through all possible ways to match $2n$ points, i.e. chord diagrams.

Input constraints: $v.size()$ is even, elements are distinct and comparable, $v[2i]<v[2i+1]$.
Complexity: $O(n)$

```cpp
template<typename T>
bool next_matching(vector<T>& v){
    if(v.size()<=2) {return false;}
    size_t s=v.size();
    int mini=v[s-2],maxi=v[s-1];
    size_t start=s-4;
    while(start>0&&v[start]<=v[start+2]&&v[start+1]>=v[start+3]){start-=2;mini=v[start];maxi=v[start+1];}
    if(start==0&&v[0]<=v[2]&&v[1]>=v[3]){
        vector<T> v1(s);
        for(size_t i=0;i<s/2;++i){v1[i]=v[2*i];v1[s-i-1]=v[2*i+1];}
        v=move(v1);
        return false;
    }
    vector<int> v1(s);
    for(size_t i=0;i<=start+1;++i){
        v1[i]=v[i];
    }
    for(size_t i=0;i<(s-start)/2-1;++i){
        v1[start+2+i]=v[start+2+2*i];v1[s-i-1]=v[start+2+2*i+1];
    }
    size_t u=upper_bound(v1.begin()+start+2,v1.end(),v1[start+1])-v1.begin();
    swap(v1[start+1],v1[u]);
    v=move(v1);
    return true;
}
```

Sample output:

```cpp
vector<vector<int>> allMatchings(int n){
    vector<int> v(n);
    for(int i=0;i<n;++i) v[i]=i;
    int cnt=1;
    for(int i=n-1;i>0;i-=2) cnt*=i;
    vector<vector<int>> r;
    r.reserve(cnt);
    r.emplace_back(v);
    while(next_matching(v)) r.emplace_back(v);
    return r;
}
int main(){
    vector<vector<int>> r=allMatching(6);
    for(auto&v:r){
        for(int i=0;i<v.size()/2;++i){
            cout<<"("<<v[2*i]<<","<<v[2*i+1]<<")";
        }
        cout<<endl;
    }
    return 0;
}
```
output:
(0,1)(2,3)(4,5)
(0,1)(2,4)(3,5)
(0,1)(2,5)(3,4)
(0,2)(1,3)(4,5)
(0,2)(1,4)(3,5)
(0,2)(1,5)(3,4)
(0,3)(1,2)(4,5)
(0,3)(1,4)(2,5)
(0,3)(1,5)(2,4)
(0,4)(1,2)(3,5)
(0,4)(1,3)(2,5)
(0,4)(1,5)(2,3)
(0,5)(1,2)(3,4)
(0,5)(1,3)(2,4)
(0,5)(1,4)(2,3)

The only other place that mentions the same problem is here, solved with a recursive algorithm. But the iterative is more efficient. Benchmarking shows that the recursive solution is 20 times slower for input size $2n=8$, and 30 times slower for input size $2n=10$.

Benchmarking result

Not sure what the applications might be, but I'll just share it here.

A Recursive Geometric Proof for 1962 IMO Problem 6

Youtube recommended a video to me, where the following problem is solved: Consider an isosceles triangle. Let $R$ be the radius of its circ...