{x|x∉x}

Sunday, July 13, 2025

IBM Ponder this 2025 06 solution

Problem description can be found here.
You can play the games below with a standard keyboard.
Choose main game or bonus game.
The current location of the frog is colored in green, to make it easier to tell.
Controls: 
  Use numpad keys 1 2 3 4 6 7 8 9 to move the frog in the corresponding directions.
  To make it easier to solve, you can use key 5 or 0 to undo previous moves.
  Make sure "Num lock" is on!

You won!

Input path (only characters 'A'-'H' are allowed):

My solutions are
CHCHCHCHCHCHCHCHCHCHCHCHCHCHCHCHCHCHADACECBCEFGDFCFCFCFCFCFCFCFCFCFCFCFCFCFCFEBECABEDADADADADADADADADADADADCHCAGHCBGBGBGBGBGBGBGBGBGBGBGBGBGBGBAFAGEFAHEHEHEHEHEHEHEHGFCFCFCFCFCFCFCFCFCFCFCFCFCFEBEBEDADADADADADCHCHCBGBGBGBGBGBGBGBGBAFAHEHEHEHEHGDGECDGFCFCFCFCFEDADADACAHBGBGBHC
to the main problem, and
EGFGFECBCECBADEBDCHCAHBGFGAHEHFGFGABCHGBHADACDEBADEDBHADACDEFDGH
to the bonus problem.

I've been quite busy lately, so I'll stop solving the PonderThis puzzles for a while. Don't know when I'll resume, anyway, good luck to the other challengers! Cheers!

Thursday, April 10, 2025

How smart are LLMs? The BrainF test

A few days ago I was thinking, "How smart are LLMs?" It's well known that LLMs are very knowledgeable - You ask them anything, and they will spit out a summarization of human knowledge on that topic. But there's hardly any creativity. So, to test how smart they are, I should find a question that not many people has asked or answered before.
Later, I remembered an interesting question that I tried to answer before. That was when I first heard of the "Brainfuck" programming language. It's a Turing Complete language with only eight commands. Programming in that language is not easy. The problem is, "What's the shortest Brainfuck program that outputs the word 'Brainfuck'?"
After thinking about it for a while, I came up with a program that contains only 94 characters. Later, I found this webpage which generates brainfuck programs that output the given text. The program that this page generates is 107 characters long - impressive for a general solver, although not optimal. And it only works with 8 bit cells, while my program works with any cell size at least 7 bit.
Today, I thought, "What if I ask the LLMs this question? How well will they perform?" The following is the answer from ChatGPT and deepseek, given the prompt "write a brainfuck program that outputs the word "Brainfuck", make it as short as possible."

ChatGPT quickly gave the following answer:

Here is a Brainfuck program that outputs the word "Brainfuck":
```brainfuck
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<<<.+++.------.--------.>>+.>++.

```
This program is optimized to output "Brainfuck" using a combination of loops and efficient value increments.

But when I tested this program, the output is "Hello HKE=Y"". So, apparently it just copied a "Hello World" program, maybe from wikipedia, then messed it up with its "optimizations". Kind of expected.
  
Deepseek thought for about a minute, then it stopped thinking. I clicked "continue thinking", then it started to output its thoughts.
It's pretty long, so I'll omit its thought process. After 495 seconds, it gave me an answer.
In its thought process, it found a length 135 solution:
+++++++[>++++++++++<-]>----.<++++++[>++++++++<-]>.<++[>--------<-]>-.++++++++.+++++.--------.<+++[>+++++<-]>.<+++[>------<-]>.++++++++.
This solution gives correct output. But its calculation of the length is wrong, somehow it said that the length is 108. It seems that it didn't include the characters '.','[' and ']'. This seems similar to the "strawberry" problem.
Then it thought about how to make it shorter. This time it came up with this solution:
++++++[>+++++++++++<-]>.<++++++[>++++++++<-]>.<++[>--------<-]>-.++++++++.+++++.--------.<+++[>+++++<-]>.<+++[>------<-]>.++++++++.
This solution still gives the correct output. The length is 131 characters, but deepseek still skipped those characters and said that the length is also 108 characters. And it considered this to be optimal.
Here's its final answer:

To output the word "Brainfuck" in Brainfuck with minimal code length, we can optimize by using loops to generate ASCII values efficiently. Here's the shortest solution:
```brainfuck
++++++[>+++++++++++<-]>.<++++++[>++++++++<-]>.<++[>--------<-]>-.++++++++.+++++.--------.<+++[>+++++<-]>.<+++[>------<-]>.++++++++.
```
**Explanation:**
1. **B (66):** 
   - `++++++[>+++++++++++<-]>.`
   - Initialize cell0 to 6, loop to add 11 six times (6*11=66), then print.
2. **r (114):** 
   - `<++++++[>++++++++<-]>.`
   - Move left to cell0 (now 0), set to 6, loop to add 8 six times (48) to cell1 (66+48=114), then print.
3. **a (97):** 
   - `<++[>--------<-]>-.`
   - Move left to cell0 (0), set to 2, loop to subtract 8 twice (16) from cell1 (114-16=98), subtract 1 more (97), then print.
4. **i (105):** 
   - `++++++++.`
   - Add 8 to cell1 (97+8=105) and print.
5. **n (110):** 
   - `+++++.`
   - Add 5 to cell1 (105+5=110) and print.
6. **f (102):** 
   - `--------.`
   - Subtract 8 from cell1 (110-8=102) and print.
7. **u (117):** 
   - `<+++[>+++++<-]>.`
   - Move left to cell0 (0), set to 3, loop to add 5 three times (15) to cell1 (102+15=117), then print.
8. **c (99):** 
   - `<+++[>------<-]>.`
   - Move left to cell0 (0), set to 3, loop to subtract 6 three times (18) from cell1 (117-18=99), then print.
9. **k (107):** 
   - `++++++++.`
   - Add 8 to cell1 (99+8=107) and print.
This approach efficiently uses loops and pointer navigation to minimize the number of commands, resulting in the shortest possible Brainfuck program to output "Brainfuck".

Comparing the results:
  Human designer (me): length=94
  Human created program: length=107 (only works with 8 bit cells)
  ChatGPT: wrong answer
  Deepseek: length=131
So, apparently some reasoning LLMs are able to give a correct answer, but it's unable to achieve human level optimizations, which requires some creativity.
If given a huge amount of data, the LLMs may be able to beat human on this as well, just from pure experience. But that makes it meaningless, because a human don't really need any experience at all! I just learned the rules and I found quite a few shortcuts. It's similar in research, one must find creative "tricks" to work around the obstacles that no one else has encountered before. I wonder how well the more advanced LLMs perform. If anyone has access to them, please let me know the result. I'd be very surprised if they come up with an even shorter solution without being extensively trained on this. Well, I'm also interested in their performance after being trained... How much improvement can that provide? Anyway, at least for now, they are far from beating my record.
If any human being is interested in this problem, let me know what's your best solution! I'll keep my 94 character solution as a secret, until it's beaten. :D

Sunday, December 29, 2024

A Recursive Geometric Proof to 1962 IMO Problem 6

Youtube recommended a video to me, where the following problem is solved:
Consider an isosceles triangle. Let $R$ be the radius of its circumscribed circle and $r$ the radius of its inscribed circle. Prove that the distance $d$ between the centers of these two circles is $d=\sqrt{R(R-2r)}$.
As usual, I tried to solve it myself before watching the rest of the video. And I found a really nice solution.
Let the triangle be $\triangle ABC$, where $AB=AC$. Let $O$ be the center of the circumscribed circle, and $I$ be the center of the inscribed circle. So, $d=IO=|AI-AO|=|AI-R|$. What we want to prove becomes
\begin{align}
d^2&=R(R-2r)\\
(AI-R)^2&=R(R-2r)\\
AI^2-2AI*R+R^2&=R^2-2rR\\
AI^2&=2R(AI-r)\\
\frac{AI}{2R}&=\frac{AI-r}{AI}
\end{align}
Let the inscribed circle touch $AB$ at $P$, and $AC$ at $Q$, and intersect $AI$ at $X$. So, $AI-r=AX$. We want to show that $\frac{AI}{2R}=\frac{AX}{AI}$. Notice that $2R$ is the diameter of the circumcircle, I wonder if $AI$ can be a diameter too...
Notice that $IP\perp AB$ and $IQ\perp AC$, if we create a circle with diameter $AI$, it will pass through points $P$ and $Q$, so it's the circumcircle of $\triangle APQ$. Because of symmetry, we must have $AP=AQ$. So $\triangle APQ \sim \triangle ABC$. If $X$ is the incenter of $\triangle APQ$, it's obvious that $\frac{AI}{2R}=\frac{AX}{AI}$ due to similarity.
$\angle APX=\frac{1}{2}\angle AIP=\frac{1}{2}\angle B=\frac{1}{2}\angle APQ$, so $\angle APX=\angle XPQ$. $\angle PAX=\angle QAX$. So, indeed, $X$ is the incenter of $\triangle APQ$, and the proof is complete.

This shows an interesting property of isosceles triangles: $AI$ is the diameter of the circumcircle of $\triangle APQ$, and $X$ is its incenter. We can do this recursively to $\triangle APQ$, and get an even smaller version of this problem.
On the other hand, we can also extend $AO$ to intersect its circumcircle at $D$, and draw a tangent line at $D$, then $DB\perp AB$ and $DC\perp AC$. If we extend $AD$ to $E$ so that $DE=DB=DC$, and create a line $l\perp AE$ at $E$ which intersects $AB$ at $M$ and $AC$ at $N$, $D$ would be the incenter of $\triangle AMN$. This can go down infinitely as well. $X$,$I$,$D$,..., they are both the incenter of the bigger triangle, and the bottom of the diameter of the smaller triangle.

After I found this cool recursive geometric proof, I went back to watch the rest of the video. Again, I was disappointed - basically it used trigonometry to "brute force" the proof. It's not wrong, of course, but boring.
It did mention that this is an IMO problem, so I wonder which one it was from. I searched it and it turned out to be the Problem 6 of 1962 IMO, and here is another solution. This solution not only solves this problem, but also proves that it applies to not only isosceles triangles, but ALL triangles!
The solution on that page is an elegant geometric proof, but since it doesn't demonstrate the recursive structure of the problem on isosceles triangles, I think my proof is a nice alternative solution.

Thursday, December 26, 2024

Traversing a Tree with Adjacecncy Sums - LeetCode H2872. Maximum Number of K-Divisible Components, H3203. Find Minimum Diameter After Merging Two Trees

Here's a simple question: How do you traverse a tree, which is represented as a set of edges in the form of pairs of vertices?
My first reaction used to be a typical graph traversing algorithm, like BFS or DFS. They are the most common algorithms for traversing graphs. But last week I found a better algorithm particularly suitable for trees which is more efficient than the typical BFS or DFS approach.

I invented this algorithm while solving the daily problem "H2872. Maximum Number of K-Divisible Components" on Dec 24, so let's use it as an example to see how this algorithm works.

2872. Maximum Number of K-Divisible Components:

There is an undirected tree with n nodes labeled from 0 to n - 1. You are given the integer n and a 2D integer array edges of length n - 1, where edges[i] = [a_i, b_i] indicates that there is an edge between nodes a_i and b_i in the tree.

You are also given a 0-indexed integer array values of length n, where values[i] is the value associated with the i^th node, and an integer k.

A valid split of the tree is obtained by removing any set of edges, possibly empty, from the tree such that the resulting components all have values that are divisible by k, where the value of a connected component is the sum of the values of its nodes.

Return the maximum number of components in any valid split.
Constraints:

1 <= n <= 3 * 10⁴

0 <= values[i] <= 10⁹

1 <= k <= 10⁹

Sum of values is divisible by k.

...

Let's look at the example:
Input: n = 5, edges = [[0,2],[1,2],[1,3],[2,4]], values = [1,8,1,4,4], k = 6
Output: 2

We can cut it into two parts, where the sum of the values of each part, 6 and 12 in this example, is a multiple of $k=6$. It's impossible to cut it into more parts without violating the condition. Fairly easy to understand.

Before jumping right into the solution, I have a comment on if it's not limited to trees:
If it's on any graph, then it can be NP-Hard.
Example: A complete graph. You can choose any set of vertices, and they are all connected.
So, basically the question becomes, there is a multiset of values, and you must find the maximum number of non-overlapping subsets that you can separate it into where the sum of each of them is a multiple of $k$.
You're guaranteed that the sum of the multiset is a multiple of $k$, but it doesn't really matter. It can be an arbitrary multiset, and to make the sum a multiple of $k$, just add another number $-sum$ in it (or $mk-sum$ with a large enough $m$, if it has to be positive), where $sum$ is the sum of the original multiset.
To determine if it's possible to separate it into two subsets is already NP-Complete, see Subset sum problem. Finding the maximum number of subsets should be harder, so I believe it's NP-Hard.

Now it's time to solve this problem.
First, we look for just one edge to cut. Where do we start?
The situation is the simplest if the edge is in a long chain which ends at a leaf. In that case, we just need the sum of the values from the leaf to the cut to be a multiple of $k$. And if we find such an edge, we can simply cut it out and keep going. The tree becomes smaller, and we make progress.
But what happens when the next vertex is not on a chain, instead it has at least 3 branches? If we can cut the edge that connects the current vertex and the branching one, then it's easy, this branch is simply removed and the branching vertex has its degree decreased by one.
What happens if we can't cut that edge? That means the branch must be connected to that vertex forever. So, the branch is basically "attached" to it. In that case, we can simply "absorb" that branch into the vertex - in this problem, we absorb the sum of the values in that branch into the value of that vertex, and remove the branch.
So, either way, we can remove branches after branches, until the branching vertex has degree 2, which turns it into a chaining vertex. At last, the tree will be reduced to a single chain, and when we reach another leaf, the whole tree is processed.

I came up with this algorithm myself. A few days later, @Sergei mentioned that this was called "Kahn's algorithm". Then I checked the editorial, which called it "Topological Sort / Onion Sort".
At this point, the algorithm is pretty straight forward. The only thing left is to implement it.

First, we must construct the tree from the list of edges so that we can traverse it. There are many ways to store a graph. A typical approach is to use an adjacency list, where for each vertex we store an array of its neighboring vertices. A 2d C-array would be fast, but it must take $O(N_{max}^2)$ space, which may not be feasible. To save space, typically a vector of vectors or array of vectors is used. But that requires dynamic allocation for each vertex.
Next, a problem is, I want to be able to remove a branch when I reach a branching vertex. In other words, I want to be able to modify the adjacency list to remove edges. It can be done with a vector of vectors, i.e, in the vector of neighbors of the branching vertex, find the vertex that we come from and delete it by swapping it with the last element, then pop_back(). But it's not very efficient (although the compexity is the same). A better way may be to use a hash set to store the list of neighbors, but it's also a bit complicated.
So, I thought, is there any simpler ways to remove a neighbor of a vertex from the stored data? Imagine if we can just subtract the vertex id from the total of the neighbors...

And that's where things start to become interesting. I explored this idea further: If, for every vertex, I only store the sum of the ids of its neighbors, how much can I achieve? We start from a leaf, and the sum gives us its only neighbor. Next, if we are at a vertex of degree 2, the sum is of its two neighbors. If we subtract the previous vertex from the sum, the result is the vertex that we need to go next!
But what happens if the next vertex has a degree more than 2? In that case, we can't tell where to go next.
But we don't need to go anywhere! The branch that we come from should be removed, and that can be done by subtracting the current vertex from the sum of the neighbors of the branching vertex, then decreasing its degree by one. Then, we simply start from another leaf, and repeat this process.
Let's take a look at the example with this implementation.

The numbers inside the circles are the vertex ids, to the right are the values. The blue numbers to the left are the sum of neighbors.
Suppose we start from the leaf #4. The current value is 4, we can't cut this edge. The sum of its neighbor is 2, so we check vertex #2. Its degree is 3, greater than 2, so we proceed by absorbing the current value into its value, i.e, update its value to 1+4=5 (black number). Then, we want to remove this branch. We do this by subtracting the vertex that we come from, 4, from the sum of neighbors of that vertex, 5, so the updates sum of neighbors of #2 becomes 5-4=1 (blue number). We also decrease the degree of #2 by one, so now its degree is 2.
This branch is removed, so we start from another leaf, either #0 or #3. Now the tree is reduced to a chain, it's easy to traverse it and get the result.

Basically, the algorithm goes like this:

1.Prepare the adjacency sums representation of the tree, i.e, an array of pairs of unsigned integers, one for the sum of its neighbors, the other for its degree, both initialized to be 0. Read the list of edges, and update the sums and degrees accordingly.

2.Go through the adjacency sums, until we find a degree one vertex - a leaf. Set "current vertex" to its id, and "previous vertex" to 0. Add its value to the sum of values.
Check if the sum of values %k is 0. If so, add 1 to the result.
Check its neighbor:
If it has degree 1, it's over, add 1 to the result and return it.
If it has degree 2, we're in a chain. Assign "current vertex" to "previous vertex", and subtract "current vertex" from the sum of neighbors of that vertex to get where to go next. Set that neighbor to "current vertex". Basically we moved one step along the chain.
If its degree is more, we add our current sum of values to the value of that vertex. Then we subtract our "current vertex" from the sum of neighbors of that vertex, and decrease its degree by one. Then we go back to Step 2., and look for the next leaf and start from there.

This way, we only need a single array of pairs of unsigned integers to process the entire tree! Although the complexity is still O(n), It's extremely fast and space efficient, because we only need one dynamic allocation - or even zero if we know the maximum input size and simply use a C-array. Also, the space is contiguous, which gives higher cache hit.

The following is my solution:

```cpp
const int nmax=30000;
struct pa{int sn,d;};
pa adjl[nmax];
int d1vs[nmax];
class Solution {
public:
    int maxKDivisibleComponents(int n,const vector<vector<int>>& edges, vector<int>& values, int k) {
        if(k==1) return n;
        memset(adjl,0,n*sizeof(adjl[0]));
        for(const auto &v:edges){
            auto &v0=adjl[v[0]];
            v0.sn+=v[1];
            v0.d++;
            auto &v1=adjl[v[1]];
            v1.sn+=v[0];
            v1.d++;
        }
        int d1sz=0;
        for(int i=0;i<n;++i) if(adjl[i].d==1) d1vs[d1sz++]=i;
        int cnt=0;
        int pre=0;
        while(1){
            int cur=d1vs[--d1sz],nei=adjl[cur].sn-pre;
            //cout<<"pre="<<pre<<",cur="<<cur<<",nei="<<nei<<endl;
            int sum=values[cur]%k;
            while(sum){
                auto &neis=adjl[nei];
                //cout<<"degree of "<<nei<<" is "<<neis.d<<endl;
                if(neis.d==1) return cnt+1;
                if(neis.d==2){
                    sum=(sum+values[nei])%k;
                    neis.sn-=cur;
                    pre=cur;
                    cur=nei;
                    nei=neis.sn;
                    continue;
                }
                values[nei]=(values[nei]+sum)%k;
                neis.sn-=cur;
                neis.d--;
                break;
            }
            if(!sum){
                cnt++;
                //cout<<"sum==0;next cur="<<nei<<",pre="<<cur<<endl;
                if(adjl[nei].d==1) return cnt+1;
                if(adjl[nei].d==2) {d1vs[d1sz++]=nei;pre=cur;}
                else adjl[nei].sn-=cur,adjl[nei].d--,sum=0,pre=0;
            }
            else sum=0,pre=0;
        }
        return 0x1337C0DE;
    }
};
```

The previous runtime record was 70ms. The best runtime I got with this program was... 3ms. A 23 times speed up. And the memory record before was 158.6MB. My submission was 139.79MB. A huge difference. Here's my solution page.
You may have noticed that the algorithm that I used in this solution was slightly different from the one decribed above. That's because that later I noticed that I didn't really need to store all the degree one vertices in a stack ("d1vs"). I can simply go through each leaf one by one in the adjacency sums. Also, checking the sum of the value can be done before checking the neighbor, which simplifies the logic. Anyway, it's easy to simplify it.

There's another thing that can be improved. Notice that here I used a pair of "int". It's fine when the maximum vertex id is 3*10^4-1, because the sum must be smaller than 10^9. But if the maximum is larger, like in the next problem, it may overflow. Anyway, let's take a look at the problem first.

3203. Find Minimum Diameter After Merging Two Trees:

There exist two undirected trees with n and m nodes, numbered from 0 to n - 1 and from 0 to m - 1, respectively. You are given two 2D integer arrays edges1 and edges2 of lengths n - 1 and m - 1, respectively, where edges1[i] = [a_i, b_i] indicates that there is an edge between nodes a_i and b_i in the first tree and edges2[i] = [u_i, v_i] indicates that there is an edge between nodes u_i and v_i in the second tree.

You must connect one node from the first tree with another node from the second tree with an edge.

Return the minimum possible diameter of the resulting tree.

The diameter of a tree is the length of the longest path between any two nodes in the tree.
Constraints: 1 <= n, m <= 10⁵

The intuition is, the vertices that we want to choose to connect the two trees should be the "center"s of them. But how do we define this "center", and how do we find it?
The center vertex should be the vertex that is the furthest away from the leaves. To find it, we can simply remove the vertices layer by layer, where the first layer comprises all the leaves.
But what happens when we approach a branching point?
Notice that the diameter of the tree is the longest path. We want to find the vertex that has the smallest radius, which is the maximum distance from it to any leaf. So during the process of removing the vertices layer by layer, if we encounter a branching point, the first branch that approaches it - meaning that it's the shorter than the others - should just be ignored, because only the longest branch counts.
That means, this problem is perfect for the adjacency sums representation! When we encounter a branching point, we simply remove this branch, and continue with another leaf.

Here's the algorithm:

0. If the edge list is empty, return 0 as its diameter.
1. Prepare the adjacency sums.
2. Put all the leaves in an array. Initialize the layer count to be 0.
3. While the size of the array is greater than 2:
Increment layer count by 1
Go through the elements in the array:
If its neighbor has degree == 2, we move forward, replace this element with its next neighbor and update the array.
Otherwise, we subtract this vertex from the neighbor's sum of neighbors, and decrease its degree, then remove this vertex from the array by assigning it with the last element, then pop_back.
4. Finally the tree has only two leaves, meaning that it's reduced to a chain. The diameter of the tree is just the layer count times two, plus the length of this chain, where is easy to compute. We just go from one leaf to the other and find the length.
5. Now we can find the diameters of both trees. After we connect their "center" vertices, if the new longest path goes through this new edge, the diameter is the sum of the radii of the two trees plus one. But it's also possible that one tree has much smaller radius than the other, so the diameter of the new tree is simply the one of the larger tree. We can simply take the maximum of all the 3 possiblilties.

Here is my implementation:

```cpp
struct pa{uint sn,d;} adjl[100000];
struct pr{uint ind,pre;} q[100000];
class Solution {
    int getD(const vector<vector<int>>& edges){
        const uint n=edges.size()+1;
        if(n==1) return 0;
        memset(adjl,0,n*sizeof(adjl[0]));
        for(const auto &v:edges){
            adjl[v[0]].sn+=v[1];
            adjl[v[0]].d++;
            adjl[v[1]].sn+=v[0];
            adjl[v[1]].d++;
        }
        int qsz=0,lay=0;
        for(uint i=0;i<n;++i) if(adjl[i].d==1) q[qsz++]=pr{i,0};
        while(qsz>2){
            lay++;
            uint i=0;
            while(i<qsz){
                int nei=adjl[q[i].ind].sn-q[i].pre;
                if(adjl[nei].d<=2){
                    q[i].pre=q[i].ind;
                    q[i].ind=nei;
                    ++i;
                    continue;
                }
                adjl[nei].sn-=q[i].ind;
                adjl[nei].d--;
                q[i]=q[--qsz];
            }
        }
        int l=0,ind=q[0].ind,pre=q[0].pre,tar=q[1].ind;
        while(ind!=tar){
            l++;
            int nxt=adjl[ind].sn-pre;
            pre=ind;
            ind=nxt;
        }
        return 2*lay+l;
    }
public:
    int minimumDiameterAfterMerge(const vector<vector<int>>& edges1,const vector<vector<int>>& edges2) {
        int d1=getD(edges1),d2=getD(edges2);
        return max(max(d1,d2),((d1+1)>>1)+((d2+1)>>1)+1);
    }
};
```

Here, I used "uint" for the sum. You may wonder, isn't it possible to overflow, since the maximum number of vertices is 10^5 this time, and the sum can be at most 1+2+...+99999=99999*50000 which is about 5*10^9? Or is it simply because this doesn't happen in the test cases?
It's true that the test cases don't trigger any overflows, because I tested with "int" as well. But later, I realized that it doesn't even matter if it overflows.
The reason is, we don't use its value until the degree is one. So, as we accumulate the sum, if it overflows, basically it becomes mod 2^32. While we run the program, we subtract the vertices from this sum and decrease the degree. When degree is one, the sum must be the id of the vertex mod 2^32 - which is simply its id as long as it's under 2^32! So, it works as long as the vertex ids are unique numbers within the range of "uint".
Here is my solution page.
As @Suboptimal Coder mentioned in the comment, we may use "xor" instead for this sum. In modern computer processors, they should both be one clock cycle. But there may be situations where "xor" is faster/more feasible on certain hardwares, since it requires less transistors. So, it's probably a good idea to replace the addition with xor.

The previous runtime record was 209ms. An optimized implementation of BFS by @Sergei, which used a custom queue, had a runtime of 16ms. At the time of writing this article, the best runtime of my algorithm is 3ms, as shown on the submission page. The old memory record was 282.3MB, while my program used 217.75MB, again a huge improvement.

If the edges of the tree are directed instead, the logic is actually simpler. We again start from the leaves, go up the chain, remove the branch when encountering a branching vertex. For each vertex, we need to store its parent vertex and its out-degree. It's more straight forward than the undirected version.

I don't know if this data structure / implementation has ever been proposed. Anyway, I'll just call it "Jiang's Adjacency Sums Algorithm for Tree Traversing" for now, until someone points out that it's already invented by someone else, XD.

Update 250129: Edge sum variation
Today's daily problem, M684. Redundant Connection, is another one about trees. A list of edges is given, where they form a tree plus one extra edge which creats a cycle. The goal is to find the edge in the cycle that occurs last in the input.
I can use the algorithm above to remove all the braches from the graph, and only the cycle will be left. But then I won't be able to identify any of the edges in the cycle, because I will only know the sum of the two neighboring vertices for each vertex.
After giving it some thoughts, I figured out how to do it. Instead of storing the sum of the neighboring vertex indices, I store the sum of the incident edge indices. This way, I can tell which edges are removed, and I can still look up the next vertex through the input list of edges.
Here's an example of this implementation:

```
struct pa{uint s,d;} adjs[1001];
bool mark[1001];
class Solution {
public:
    vector<int> findRedundantConnection(const vector<vector<int>>& edges) {
        const uint sz=edges.size();
        memset(adjs+1,0,sz*sizeof(adjs[0]));
        memset(mark,1,sz);
        for(int i=0;i<sz;++i){
            const auto &v=edges[i];
            adjs[v[0]].d++;
            adjs[v[0]].s^=i;
            adjs[v[1]].d++;
            adjs[v[1]].s^=i;
        }
        for(int i=1;i<=sz;++i){
            if(adjs[i].d==1){
                int curV=i,curE=adjs[curV].s,nxtV=edges[curE][0];
                mark[curE]=0;
                if(nxtV==curV) nxtV=edges[curE][1];
                while(adjs[nxtV].d==2){
                    int nxtE=adjs[nxtV].s^curE;
                    curV=nxtV;
                    nxtV=edges[nxtE][0];
                    if(nxtV==curV) nxtV=edges[nxtE][1];
                    curE=nxtE;
                    mark[curE]=0;
                }
                adjs[nxtV].d--;
                adjs[nxtV].s^=curE;
            }
        }
        for(int i=sz;i>=0;--i) if(mark[i]) return edges[i];
        return edges[0];
    }
};
```

Wednesday, May 8, 2024

The limit of an averaging sequence

Consider this sequence of length $n$: $P_n=(p_0,p_1,p_2,\dots,p_{n-1})$, where $p_0=2$, and each time the next number $p_k$ can be any number within $[2,1+p_{k-1}]$ for $k\in [1,n-1]$. Consider the set of all such sequences of length $n$, $\{P_n\}$. The first few sets are:
\begin{align*}n=1:& (2)\\
n=2:& (2,2),(2,3)\\
n=3:& (2,2,2),(2,2,3),(2,3,2),(2,3,3),(2,3,4)\\
n=4:& (2,2,2,2),(2,2,2,3),(2,2,3,2),(2,2,3,3),(2,2,3,4),(2,3,2,2),(2,3,2,3),(2,3,3,2),\\
 &(2,3,3,3),(2,3,3,4),(2,3,4,2),(2,3,4,3),(2,3,4,4),(2,3,4,5)
\end{align*}
Let $P_{nj}$ be any path in $\{P_n\}$, and $\prod P_{nj}$ be the product of all the numbers in that path. The question is, what is this limit: $\text{lim}_{n\rightarrow\infty} \sum_{P_{nj}\in \{P_n\}} (\prod P_{nj})^{-1}$?

The first few sums are

$$
\renewcommand\arraystretch{1.5}
\begin{matrix}
    n=1:& \frac{1}{2}\\
    n=2:& \frac{1}{2}\frac{1}{2}+\frac{1}{2}\frac{1}{3}\\
    n=3:& \frac{1}{2}\frac{1}{2}\frac{1}{2}+\frac{1}{2}\frac{1}{2}\frac{1}{3}+\frac{1}{2}\frac{1}{3}\frac{1}{2}+\frac{1}{2}\frac{1}{3}\frac{1}{3}+\frac{1}{2}\frac{1}{3}\frac{1}{4}
\end{matrix}$$


Answer:
The limit is $e^{-1}$.


How do you prove that it converges to this value?

IBM Ponder this 2024 04 solution

Problem description, and the solution.
Hey, this time I got first place! Although there's not much to talk about.
The problem looks like a finite state machine problem. The states can be represented by a length $n$ ternary number, where the lowest digit represents the location of the largest disk, the second digit is the location of the second largest one, and so on. After each step, it transits to the next state, depending on the move.
The solution can be found by, like the solution page says, finding the periods of the sequences. Both the state and the move must be considered.

Or, just simulate the entire process. Out of all the problems that I've solved so far, this one is the most brute-forcible, if that's a word.
Each game is represented by 3 integer arrays, which correspond to the disks on the 3 rods, and a index of the move. Each time, read the move and change the arrays accordingly.
Usually the simulation of the periodic increment of the index is implemented as something like
  i=(i+1)%size;
or
  i=i+1;
  if(i==size) i=0;
But modulo and branching are costly operations, and the routine is called many times. To make it faster, I replaced it with a lookup table. The speedup is not huge, 90s to 63s, but still quite noticeable.

Wednesday, April 24, 2024

A nice identity of the sum of a certain fraction series

I saw this question on stackexchange, and I was intrigued. The equation in question is,
$$\sum_{k=1}^n\frac{2^{2k-1}}{k}\frac{\binom{2n-2k}{n-k}}{\binom{2n}{n}}=\sum_{k=1}^n\frac{1}{2k-1}$$
Are you able to prove this identity without looking at the proof that I gave under the question?





After simplifying the expression, it turned into
$$\sum_{k=1}^n\frac{1}{2k}\frac{2n}{2n-1}\frac{2n-2}{2n-3}\dots\frac{2n-2k+2}{2n-2k+1}=\sum_{k=1}^n\frac{1}{2k-1}$$
The first few $n$ gives:
$$\frac{1}{2}\frac{2}{1}=1$$
$$\frac{1}{2}\frac{4}{3}+\frac{1}{4}\frac{4}{3}\frac{2}{1}=1+\frac{1}{3}$$
$$\frac{1}{2}\frac{6}{5}+\frac{1}{4}\frac{6}{5}\frac{4}{3}+\frac{1}{6}\frac{6}{5}\frac{4}{3}\frac{2}{1}=1+\frac{1}{3}+\frac{1}{5}$$
$$\frac{1}{2}\frac{8}{7}+\frac{1}{4}\frac{8}{7}\frac{6}{5}+\frac{1}{6}\frac{8}{7}\frac{6}{5}\frac{4}{3}+\frac{1}{8}\frac{8}{7}\frac{6}{5}\frac{4}{3}\frac{2}{1}=1+\frac{1}{3}+\frac{1}{5}+\frac{1}{7}$$
By grouping the terms, I came up with this expression:
$$\frac{8}{7}\left(\frac{1}{2}+\frac{6}{5}\left(\frac{1}{4}+\frac{4}{3}\left(\frac{1}{6}+\frac{2}{1}\left(\frac{1}{8}\right)\right)\right)\right)=1+\frac{1}{3}+\frac{1}{5}+\frac{1}{7}$$
An interesting thing that I noticed is, if I replace the numbers $\frac{1}{2},\frac{1}{4},\frac{1}{6},\frac{1}{8}$ with ones, it suddenly becomes obvious. We get this identity:
$$\frac{8}{7}+\frac{8}{7}\frac{6}{5}+\frac{8}{7}\frac{6}{5}\frac{4}{3}+\frac{8}{7}\frac{6}{5}\frac{4}{3}\frac{2}{1}=\frac{8}{7}\left(1+\frac{6}{5}\left(1+\frac{4}{3}\left(1+\frac{2}{1}1\right)\right)\right)=8=2+2+2+2$$
Then I wondered, what if I inverse the order of them? And I found that
$$\frac{1}{8}\frac{8}{7}+\frac{1}{6}\frac{8}{7}\frac{6}{5}+\frac{1}{4}\frac{8}{7}\frac{6}{5}\frac{4}{3}+\frac{1}{2}\frac{8}{7}\frac{6}{5}\frac{4}{3}\frac{2}{1}=\frac{8}{7}\left(\frac{1}{8}+\frac{6}{5}\left(\frac{1}{6}+\frac{4}{3}\left(\frac{1}{4}+\frac{2}{1}\left(\frac{1}{2}\right)\right)\right)\right)=\left(1+\frac{1}{1}\right)\left(1+\frac{1}{3}\right)\left(1+\frac{1}{5}\right)\left(1+\frac{1}{7}\right)-1$$
which is very easy to prove.

The proof of the first equation, though, is much harder to construct, because the coefficient changes in the opposite direction of $n$ from the inside to the outside, so each time $n$ changes, the previous expression is no longer a part of the new expression, making the induction argument difficult to find. But after some manipulation, I found the proof, which I posted on the stackexchange page, so I won't repeat it here.
I wonder if more simple sequences can be constructed with other sequences as the coefficients

This reminds me of continued fraction. I think maybe this should have a name, like, continued product? But it seems that this name has already been taken by something different. So, maybe "rolling product"?
A similar concept seems to be Engel expansion, but here we have a way more general form, $a_1(b_1+a_2(b_2+a_3(b_3+\dots)))$, where $a_i$ and $b_j$ are rational, while the result is the sum of a different series. Or maybe this can further be generalized to any real sequences, or even complex sequences? I wonder...

Wednesday, April 10, 2024

IBM Ponder this 2024 03 solution

Problem description can be found here. The solution is posted here.
Not much to talk about this time. The result from $X_1$ to $X_{100}$ is,

1,8,9,9,15,15,15,24,24,24,90,90,90,105,105,105,114,114,114,114,
114,114,225,225,225,225,225,225,264,264,264,264,264,264,300,300,300,300,300,300,
300,300,300,300,300,300,945,945,945,945,945,945,945,945,945,945,945,945,945,945,
945,945,945,945,945,945,945,945,945,945,945,945,945,945,945,945,945,945,945,945,
945,945,945,945,945,945,945,945,5349,5349,5349,5349,5349,5349,5349,5349,5349,5349,5349,5349,

Recording only when the result changes, the indices and values of $X_i$ for $i\leq 300$ are,

inds:
0,1,2,4,7,10,13,16,22,28,34,46,88,100,103,124,157,247,274,283,
vals:
1,8,9,15,24,90,105,114,225,264,300,945,5349,7035,11739,17280,35475,46914,190365,351645,
idiffs:
1,1,2,3,3,3,3,6,6,6,12,42,12,3,21,33,90,27,9,
vdiffs:
7,1,6,9,66,15,9,111,39,36,645,4404,1686,4704,5541,18195,11439,143451,161280,

where "inds" are the indices (0-indexed, so it's $i-1$), "vals" are the values $X_i$, "idiffs" are the differences in the indices, and "vdiffs" are the differences in the values.
There's no obvious pattern here. So, the only way is to test every number.

To find $X_{1000}$, it's alright to use the slow way of generating a list of primes, i.e., testing if the number $n$ divides any primes $\leq \sqrt n$. The result is 115192665.

But this method is way too slow for the bonus question.
To test the primality of a large range of numbers, the most efficient way is also the most ancient way, by sieving. With sieving, the larger the range, the more efficient it is. But what range should I use?
If I use one bit for each number, 8GB RAM can hold $2^{36}$ numbers. Is that enough?

I decided to try that with CUDA. And I found this very nice GPU prime number generating algorithm. But it turns out that the result is beyond $2^{36}$.

Let's take a look at the trend of the values. The following plot is the values vs indices in the result above plus the point (1000, 115192665) in log-log scale:

It seems that it still curves up, so at $n=2000$ it seems that it should be about $10^{11}$, which is about $2^{36.5}$. But it's hard to tell if that's enough, it may very well be much larger. What can I do?
Noticing that with primes list up to $2^n$ as seed, I can sieve primes up to $2^{2n}$. So, since I can already generate primes up to $2^{36}$, theoretically I can use that to generate primes up to $2^{72}$. But there's not enough ram or even storage to hold that many numbers.

Well, I can't hold all the numbers, but I can cut the range into sections, and sieve them one by one.

To save space, I only kept the odd numbers. The shifts, $i(i-1)/2$, are also separated into two lists, the odd ones and the even ones, for quick look up. It's not hard to come up with the formulas, i.e.

odd_sift[i]=(2*i+1)*(i+1-(i&1)), where i $\in$ [0, n/2+((n&3)==3) )
even_shift[i]=(2*i+1)*(i+(i&1)), where i $\in$ [0, n/2+((n&3)==1) )

So if the current number is odd, I only check even shifts, and odd shifts for even numbers.

The algorithm goes like this:

-Generate primes list up to $2^k$ on CPU.
-Using that as seed, generate primes list up to $2^{2k}$ with CUDA.
-Copy that to GPU. With that as seed, I can sieve primes up to $2^{4k}$.
-Determine the sizes of the bit arrays "good_numbers", "sieve_buffer", and integer arrays "odd_sift" and "even_shift", and allocate them on GPU. The sieve range must be greater than the good_numbers range plus $n(n-1)/2$. The larger the range, the more efficient it is. Initialize the bit arrays to 1 bits.
-Use the GPU prime sieving algorithm to sieve each range [sieve_start, sieve_end). Use binary search to find the starting index in the primes list. Also, since the minimum number to multiply is 3, and only odd numbers are considered, the number of threads is only 1/6 of the range.
-Check if the end of the range of good_numbers plus $n(n-1)/2$ is beyond the sieve range. If so, change sieve_start to the current starting point of good_numbers, and sieve this new range.
-For each bit index in good_numbers (offset by the current starting point), check if the number shifted by odd_shift or even_shift is a prime. If it hits a prime, mark the bit to 0.
-Use cub::DeviceReduce::Max() to check if there's any non-zero values in good_numbers. If not, move current start point to the previous end point. Otherwise, the answer is found, return the location of the first 1 bit.
(If cub::DeviceReduce has something like an ArgReduce() function which takes a custom reduce function, this could be done more efficiently, but I didn't find any. I've submitted a feature request, don't know if it'll be added one day.)
(Update: I was informed that the function that matches this description is thrust::find_if. Don't need to worry about the "thrust::device", just pass the C array and it works, like this:

    struct is_non_zero{
        __host__ __device__
        bool operator()(uint64_t x){return x!=0;}
    };
            auto iter=thrust::find_if(thrust::device,good_num_device,good_num_device+good_num_size,is_non_zero());
            int byteshift=iter-good_num_device;

This solves the problem at hand, but still, I think it'd be nice to have an argument version of reduce. Maybe it'll be useful somewhere else.
)

When I wrote this program, I fell into a pitfall. I tested the program for $n\leq 300$ and $n=1000$, it worked pretty well. But when I tried $n=2024$, it just kept going, until my GPU runtime got disconnected. Then I switched to the CPU version, and very quickly, an answer popped out, 1412159754600. I was surprised that the answer wasn't found by the GPU, but it's right after where it left off. I didn't think too much about it, so I just submitted it. Since you've already seen the answer, you probably noticed that this is not the correct answer. Indeed, this is about 12 times larger than the actual answer.

So, where did it go wrong? It took me quite some time to find out where the bug is. It turned out, it was not in the logic. It was not even in my implementation. It was in this line:

uint64_t i = (blockIdx.x * blockDim.x + threadIdx.x)*2+3;

It turned out, there were a lot of indices that $i$ didn't reach, meaning that many composite numbers were not sieved, instead they were left as prime. That's why the program never ended, it saw prime numbers everywhere, and every number hit a prime within only a few steps.
So what's wrong with this line? When I finally tried to output the indices "blockIdx.x, blockDim.x, threadIdx.x", a warning caught my attention. It said that I was trying to print an unsigned int type as an unsigned long long type. So, that's it. The indices, blockIdx.x etc, are all 32 bit unsigned int. But the number of threads can be greater than $2^{32}$. When that happens, it doesn't give a runtime error like a signed integer. It just quietly overflows. Very sneaky. Since these variables are not defined by me, it's not easy for me to check what type they are in.
I unconsciously neglected this possibility because I saw "blockIdx.x * blockDim.x + threadIdx.x" as the index in an array in many CUDA program examples, so I naturally thought, "they must be in size_t type, so that it can be used to represent any index in an array." Well, how wrong I was.
Anyway, at least I learned my lessons. Never assume the type of a variable not defined by myself, always check them. And, CUDA indices are 32 bit unsigned int. Guess I will remember this for the rest of my life...
Anyway, converting blockIdx.x to uint64_t fixed it.

With seed primes up to $2^{20}$, sieve_range=$2^{34}$, good_numbers range=$2^{22}$, the program finds the solution in 2m40s.



Miscellaneous thoughts:

Primality test:
While I was debugging the program, I looked for primality test algorithms to double check the number of steps.
There are a few primality test algorithms available. I heard about the AKS test 10 years ago. This time, I learned more about other algorithms, and I realized that although AKS algorithm is the first deterministic primality test that runs in polynomial time, in practice we may use a faster probabilistic test. The Baillie–PSW primality test is a good candidate. Even though it's probabilistc, no counter-examples have ever been found. I found the FLINT implememtation of the BPSW algorithm and learned how to install and use it. It was pretty easy to use (although the installation is very slow...).

Generalization to other sequences:
In general, we may consider the function
$f(a_0)=\min\{n|a_0+s_n \text{is prime}\}$
We get a different function $f$ for every different sequence $s_n$. The problem above is the case where $s_n=0+1+\dots+(n-1)=n(n-1)/2$.
A similar problem is when $s_n=(n-1)^2$.
A simpler sequence, $s_n=n-1$, results in $f(a_0)$ being the distance from $a_0$ to the next prime number, which gives this list.
We may also consider exponential sequencies, like $s_n=2^{n-1}$.

An interesting sequence is $s_n=(n-1)!$. Here, 0! is better defined as 0, I think. The biggest difference between this sequence and those above is, for any $a_0$, the function $f(a_0)$ with $s_n$ given above should be finite (except when $s_n=k^{n-1}$ and $\gcd(a_0,k)\neq 1$ or both $a_0$ and $k$ are odd), but it's not at all clear where it will stop until we find the solution. But with factorial sequence, it's the opposite: we must have $f(a_0)<m(a_0)$ or $f(a_0)=\infty$, where $m(1)=1$ and for $x\neq 1$, $m(x)$ is the minimum prime factor of $x$.

I did some calculations on what numbers will give infinite $f(a_0)$, grouped by their minimum prime factor.
min prime factor = 2: 8,14,20,24,26,... Which is obvious, those that equals an odd composite minus one.
min prime factor = 3: 33,63,75,...
min prime factor = 5: 295,445,505,665,... I didn't find any up to 100 by hand, so I wrote a program to find them.

To calculate this sequence efficiently, we need a sieve algorithm that leave the minimum prime factor of a number. This is not hard to construct, just initialize the values to uint(-1), and use min() function in the sieving. The problem is when the composite number is a product of very large primes, in which case we must test the primality of $n+k!$ for $k$ from 1 to $p-1$, where $p$ is $n$'s minimum prime factor. The number can be very large and there are a lot of tests to do.

Another similar sequence is the primorial sequence.

I wonder if these functions give any interesting result...

Thursday, March 14, 2024

IBM Ponder this 2024 02 solution

Problem description can be found here. The solution is posted here.
The first thing to do is, finding the probabilities that A wins a round and B wins a round. The range of the sum of the numbers is [5,59]. There are $total=$4*6*8*10*12*20=460800 combinations in total. Now I just need to find out how many combinations result in each sum, which can be done by repeatedly shifting and adding. Then the counts of combinations for A to win are summed, the same is done for B. The result is countA=114399, countB=230400, so the probability that A wins a round is $p_a$=countA/total, and for B it is $p_b$=countB/total. (For the bonus question, countB=116001.)

Next is a Markov process, with 2n+1 states: $N$ - the neutral state, $A_1$ - A wins one round, $A_2$ - A wins two rounds, ..., $A_n$ - A wins $n$ rounds, and $B_1$ to $B_n$ with similar meanings. The transition matrix is not hard to construct, basically, $A_n$ and $B_n$ results in themselves no matter the result (probability=1), for the other $A_i$s, the probability that it goes to $A_{i+1}$ is $p_a$, that it goes to $B_1$ is $p_b$, to $N$ is $1-p_a-p_b$, similar for $B_i$ and $N$.
The goal is to find out the probability that it ends up at $A_n$ or $B_n$ eventually.

The standard way to solve this type of problem is, to do an eigen decomposition on the matrix, and remove any eigen values that's less than one, then multiply the matricies back to get the probability. The reason is simple: After each round, the new probabilities vector is the old one multiplied by the Markov matrix, and the goal is finding the result after multiplying the matrix for infinite times, i.e. taking limit of the power of the matrix to infinity. With eigen decomposition, the pair of inverse matrices cancel out, only the eigen values are powered, and any eigen values that are less than one become zero when the power is taken to infinity.

By doing that, the solution to the first question is easy to find, which is about 0.000167566680157.

The bonus question, though, cannot be solved with this technique. The reason is that there is not enough numerical precision. The method above gives nonsensical results like probability larger than 1, and sum of the probabilities has an order of magnitude $10^{-17}$. It started to break down at around $n=26$, let alone when $n=300$. What should I do, then?

I thought, the reason for the eigen decomposition is to find out the limit of the power of the matrix, but what if I just power the matrix itself and see how that goes? Numerical precision might still be an issue, but it was worth a try.
The naive matrix powering algorithm that repeatedly muliplies the matrix is too slow. The efficient matrix powering algorithm has $O(\log(n))$ complexity, where $n$ is the power. Since I just need to take the limit to infinity, I simply repeatedly squared the matrix. After $n$ squares, the transition matrix $M$ is raised to $M^{2^n}$.
After 65 square operations, the result $p=prA/(prA+prB)$ converges to 0.01525753293679526. But the numerical error still appears, because $prA+prB=2.861639890862583e-163$ instead of 1. I wasn't sure if this invalidates the result. Maybe I can normalize the sum every time after I square it, so that it remains to be a Markov matrix, I thought. And this time, it converged to $p=0.01525753293679551$ after 603 iterations. Using "long double" type in python doesn't change the result by much.

At this point I believed that the result is accurate enough, so I submitted it. Then I got a reply from Ponder This, which informed me that there was a formula of the exact solution. So I took another look and figured it out!

A typical approach to find the final state probabilities is, finding a martingale. A martingale, as I understand it, is like a conserved quantity in physics. In physics, when we want to know the final state of an object, sometimes we don't need to know the dynamics at every moment in the process. Instead, we just need to use the conservation laws, which states that certain quantities must remain the same in the initial state and the final state, which provides a shortcut to the calculation. A martingale in a stochastic process is a conservation of expectation. If the expectation of a quantity doesn't change after one step in the process, then the expectation in the beginning must be the same of that in the end, which makes it very suitable for solving final probabilities.

We may rewrite this problem as a random walk problem. Let's use point 0 to represent the neutral state, and negative points to represent $A_i$, positive points for $B_i$. We start from point 0, and if we roll an A, we move to -1, and if we roll a B, we move to 1, and we stay at 0 if we roll a draw. At any positive (B) point except the last one which is the absorbing state, if we roll an A, we go straight to -1, and if we roll a B, we move to the right by 1, and a draw moves us to 0. Similar for negative points. At the absorbing states, we don't move.

I wasn't able to find a martingale in this model, because the distance that we travel depends on where we are, so the expectation of our location can't be conserved. But then I thought, the expectation of the location depends on both the probability and the distance. The probabilities are fixed, but I'm free to change the distances anyway I want!
Now, if I want the expectation to be conserved, the expectation of the change in the location at any point must be zero. Let the coordinate of $A_1,A_2,\dots,A_n$ be $-a_1,-a_2,\dots,-a_n$ respectively, and similar for $B_i$. Denote $p_a,p_b,p_n$ for the probability that $A$ wins the round, $B$ wins, and a draw respectively. The expectation after one step from point 0 is $p_a(-a_1)+0+p_b(b_1)$ which must equal zero. At $A_1$, the expectation of the displacement after one step is $p_a(a_1-a_2)+p_na_1+p_b(a_1+b_1)$ which again must be zero. Similarly we get $p_a(a_{i-1}-a_i)+p_na_{i-1}+p_b(a_{i-1}+b_1)=0$. To get the equations for $B_i$, simply exchange $a_i$ and $b_i$, $p_a$ and $p_b$. At the two end points $-a_n$ and $b_n$, we never move, so it's guaranteed that the expectation doesn't change. Now, our coordinate is a martingale, because no matter where we are, the expectation of our coordinate after one step is always the same. Which means, if we start from 0, the expectation of our coordinate after infinite steps is still 0. Since we must be either at $-a_n$ with probability $prA$ or at $b_n$ with probability $prB$, we must have $prA\cdot(-a_n)+prB\cdot b_n=0$. Now we just need to find the coordinates of the two end points, which can be solved by solving the recursive equations.

The only constraint on $a_1$ and $b_1$ is the equation at point 0. So, as long as $a_1:b_1=p_b:p_a$, we can scale them by any constant. For simplicity, let's set $a_1=p_b, b_1=p_a$. With $p_a+p_b+p_n=1$, we can simplify the equations to $p_aa_i=a_{i-1}+p_ap_b$.
Let $D_i=a_i-a_{i-1}$, so we have $p_aD_i=D_{i-1}$.
$D_i=\frac{D_1}{p_a^{i-1}}$.
Here we define $D_1=a_1=p_b$. We then have $D_i=\frac{p_b}{p_a^{i-1}}$.
Summing it up,
$$a_n=\sum_{i=1}^n D_i = p_b\frac{\frac{1}{p_a^{n}}-1}{\frac{1}{p_a}-1}$$
The final result, the probability that $A$ first wins $n$ consecutive rounds, is
$$prA=\frac{b_n}{a_n+b_n}$$
$$=\frac{1}{\frac{a_n}{b_n}+1}$$
$$=\frac{1}{\frac{p_b}{p_a}\frac{\frac{1}{p_a^n}-1}{\frac{1}{p_a}-1}\frac{\frac{1}{p_b}-1}{\frac{1}{p_b^n}-1} +1}$$
$$=\frac{1}{\frac{1-p_b}{1-p_a}\frac{\frac{1}{p_a^n}-1}{\frac{1}{p_b^n}-1} +1}$$
Plugging in $p_a,p_b,n$, we get the result.

Tuesday, February 20, 2024

A Formula for Accumulated XOR Operations in a Continuous Range (LC E268)

I came up with this formula when I was solving E268. Missing Number again.

Description:
Given an array nums containing n distinct numbers in the range [0, n], return the only number in the range that is missing from the array.
Constraints:
    n == nums.length
    1 <= n <= $10^4$
    0 <= nums[i] <= n
    All the numbers of nums are unique.


Here is my original solution:

```
class Solution {
public:
    int missingNumber(vector<int>& nums) {
        int s=0,l=nums.size();
        for(int &i:nums)s+=i;
        return l*(l+1)/2-s;
    }
};
```

And of course there's the explicit sum solution, without using the formula:

```
class Solution {
public:
    int missingNumber(vector<int>& nums) {
        int result = nums.size();
        for (int i = 0; i < nums.size(); ++i) {
            result += i;
            result -= nums[i];
        }
        return result;
    }
};
```

But it should be slower than using the formula, since there are n additions, instead of one multiplication and one division.

And then I saw the bitwise operation solution, like this:

```
class Solution {
public:
    int missingNumber(vector<int>& nums) {
        int result = nums.size(),i=0;
        for(int num:nums){
            result ^= num;
            result ^= i;
            i++;
        }
        return result;
    }
};
```

Which avoids possible integer overflows. But it suffers from the same issue as the explicit summation solution: it takes O(n) operations to find the accumulated result.
So, I wonder, is there an easier way to get the accumulated result?
And I found a surprisingly simple form of it. I'm surprised nobody mentioned it so far (at least not in the top solutions).
Let's see what the accumulated results are for n=0,1,2,...
      n   Xor from 0 to n
      0    0
      1    1
     10   11
     11    0
    100  100
    101    1
    110  111
    111    0
   1000 1000
   
There is clearly a pattern. After giving it some thought, it's not hard to see why the result is simply a 1 or 0 if n is odd. If you xor an odd number with its previous number, you'll just get a 1. So it simply depends on how many 1s are there. So the result is 1 if n=4k+1, and 0 if n=4k+3.
When n is even, we just need to xor it with its previous accumulated result, which ends with an odd number.

So, we can get the accumulated result in O(1), similar to the summation with the formula.

```
#pragma GCC target("avx,mmx,sse2,sse3,sse4")
auto _=[]()noexcept{ios::sync_with_stdio(0);cin.tie(0);cout.tie(0);return 0;}();
class Solution {
public:
    int missingNumber(const vector<int>& nums) {
        int s,n=nums.size();
        switch(n&3){
            case 0:s=n;break;
            case 1:s=1;break;
            case 2:s=n^1;break;
            case 3:s=0;
        }
        for(int i:nums)s^=i;
        return s;
    }
};
```

This can easily be generalized to the accumulation of an arbituary continuous range of non-negative integers [a,b], it's just XorAccumulated(a-1)^XorAccumlated(b).
Actually, it works for negative range as well, given the nice properties of two's complement expression. So this works for all integer ranges.