Sunday, October 29, 2023

Probability of line segments intersecting - A surprisingly simple result

I was drawing random stuff on a piece of paper when suddenly this problem came to mind:
If I draw a length 1 line segment randomly, then draw another one, what's the probability that they'll intersect?
Of course, this description is ambiguous. To give a better description, we need to define things more precisely. First, we should limit the area in which we can draw the line segments. Second, we should define what "drawing a line segment randomly" means. To avoid annoying boundary situations, let's use periodic boundary conditions. But then the size can't be too small, otherwise we may encounter complicated situations like crossing multiple times. Apparently, if the second line segment intersects the first, both end points of the second line segment must lie within the envelope curve at distance 1 from the first line segment. This shape is called a "stadium", with $a=r=1$ in this case. If we choose a rectangular area $w\times l$ where $w\ge2$ and $l\ge2$, multiple intersections won't happen. The formal definition of the problem is as follows:
Consider a rectangular area of size $w\times l$ with periodic boundary conditions, where $w\ge2,l\ge2$. Draw two length 1 line segments randomly. Here, by randomly we mean: first choose a random point with uniform distribution over the rectangular area, then from the circle centered at that point with radius one, choose a point with uniform distribution. These two points are the end points of the line segment. Question: What's the probability that the two line segments intersect?
Let's solve this problem! Because of the periodic boundary conditions, translations don't matter. We can move the first line segment to the center of the rectangular area. Then, the probability density is only non-zero if the second line segment lies within the stadium shape centered at the first line segment. It doesn't matter how the stadium shape is oriented, so we can assume that the first line segment lies horizontally. To find the probability, we can just integrate the probability that the second line segment intersects the first one over all possible first end points of the second line segment. The area outside the stadium has zero probability density. Due to the symmetry of the shape, we only need to consider a quarter of the stadium and multiply the answer by 4. The situation differs depending on which part of the stadium the first end point is in. We can split it into 4 parts:
(Actually, $p_1$ and $p_3$ can be combined in the calculations in polar coordinates. I started with Cartesian, so I didn't notice until later.) The probability is $p=\frac{1}{A}\times4\times\frac{1}{2\pi}(p_1+p_2+p_3+p_4)$, where
$$p_1=\int_0^1 r\,dr\int_0^{\frac{\pi}{2}}\left(\theta-\sin^{-1}(r\sin\theta)\right)d\theta$$
$$p_2=\int_{\frac12}^1 dx\int_0^{\sqrt{1-x^2}}\left(\tan^{-1}\frac{x}{y}+\tan^{-1}\frac{1-x}{y}\right)dy=\frac12\int_0^1 r\,dr\int_{\sin^{-1}\frac r2}^{\frac\pi2}\left(\theta+\tan^{-1}\frac{1-r\sin\theta}{r\cos\theta}\right)d\theta$$
$$p_3=\int_0^1 r\,dr\int_0^{\sin^{-1}\frac r2}\left(\theta+\cos^{-1}(r\cos\theta)\right)d\theta$$
$$p_4=\int_0^{\frac12}dx\int_{\sqrt{1-x^2}}^1 2\cos^{-1}y\,dy$$
(We may combine $p_1$ and $p_3$ and just calculate $\int_0^1 r\,dr\int_{-\frac\pi2}^{\sin^{-1}\frac r2}\left(\theta+\cos^{-1}(r\cos\theta)\right)d\theta$ instead.) I'm not sure whether the Cartesian or the polar coordinate system makes them easier to integrate manually... I put them on Wolfram Alpha, and here's the result: $p_1=\frac14$, $p_2=0.583664$, $p_3=0.155138$, $p_4=0.011197=\frac{1}{72}\left(27-3\sqrt3\pi-\pi^2\right)$. Then I added them together, and the sum is... $0.999999$. Well, this can't be a coincidence, can it? I'll get the exact result when I have more time... So, the probability is $\frac{2}{\pi A}$, where $A$ is the area that the line segment "occupies". I quickly wrote a program to simulate the problem in a 3×3 square. Averaging 100000000 results, the probability is 0.0707416, while the theoretical value is $\frac{2}{9\pi}=0.07073553$. This reminds me of the result of Buffon's needle problem, which has probability $\frac{2l}{\pi t}$, where $l$ is the length of the needle and $t$ is the spacing between the lines, $l\le t$. This result can be taken as a generalization of Buffon's needle problem. It can be described as
Assume there is a random uniform distribution of length 1 line segments on the 2D plane with density $\frac1A$. More precisely: the middle points of the line segments are uniformly distributed on the plane with density $\frac1A$, and the angle is also uniformly distributed from $0$ to $2\pi$. Then, if we draw another line segment of length one randomly, as described in the beginning, the expected number of intersections between this line segment and the line segments in the "background" is $\frac{2}{\pi A}$.
I wonder if this can be further generalized. An obvious first step is changing the length of the line segments, which will probably give something similar to the result of Buffon's needle problem. How else can we generalize this?

2023.10.30 Update: I think, indeed, Buffon's needle problem can be seen as a special case of this problem. If we see each line as an infinite chain of length 1 line segments, and the distance between the lines is $t$, then each line segment occupies an area $t$, thus the expected number of intersections between a length 1 needle and the lines is $\frac{2}{\pi A}=\frac{2}{\pi t}$. Due to the additivity of expectation, a length $l$ needle will have expectation $\frac{2l}{\pi t}$. When $l\le t$, there can be at most one intersection, thus this is the probability of them intersecting.

Regarding the integrations, Claude Leibovici gives the exact expressions for them, except for $p_3$, which is still based on numerical evidence (albeit a very strong one). It seems very complicated, so I thought maybe it would be easier to do it in the other direction. If we integrate $r$ first instead, the interval of $\theta$ is $[0,\frac\pi6]$ and the interval of $r$ is $[2\sin\theta,1]$. The integration is
$$p_3=\int_0^{\frac\pi6}d\theta\int_{2\sin\theta}^1\left(\theta+\cos^{-1}(r\cos\theta)\right)r\,dr$$
$$=\int_0^{\frac\pi6}\frac\theta2\left(1-4\sin^2\theta\right)+\left[\frac{r^2\cos^2\theta}{2}\cos^{-1}(r\cos\theta)+\frac14\sin^{-1}(r\cos\theta)-\frac{r\cos\theta}{4}\sqrt{1-r^2\cos^2\theta}\right]_{2\sin\theta}^1\frac{1}{\cos^2\theta}\,d\theta$$
$$=\int_0^{\frac\pi6}\frac\theta2\left(1-4\sin^2\theta\right)+\left(\frac{\theta\cos^2\theta}{2}+\frac14\left(\frac\pi2-\theta\right)-\frac{\sin\theta\cos\theta}{4}-\frac{\sin^2 2\theta}{2}\left(\frac\pi2-2\theta\right)-\frac14(2\theta)+\frac{\sin2\theta\cos2\theta}{4}\right)\frac{1}{\cos^2\theta}\,d\theta$$
I got lazy again and just lumped it into Wolfram Alpha. This time it gives the exact result, $\frac{1}{36}\left(9+3\sqrt3\pi-2\pi^2\right)$. Combined with the result $p_2=\frac{1}{72}\left(9-3\sqrt3\pi+5\pi^2\right)$ and the other two known results, I think the proof that the probability equals $\frac{2}{\pi A}$ is complete.

Wednesday, October 18, 2023

The long journey from O(n^2) to O(n^1.5), Leetcode H1269. Number of Ways to Stay in the Same Place After Some Steps

H1269 description:
You have a pointer at index 0 in an array of size arrLen. At each step, you can move 1 position to the left, 1 position to the right in the array, or stay in the same place (the pointer should not be placed outside the array at any time). Given two integers steps and arrLen, return the number of ways such that your pointer is still at index 0 after exactly steps steps. Since the answer may be too large, return it modulo $10^9+7$. Constraints: 1 <= steps <= 500, 1 <= arrLen <= $10^6$.
An $O(steps\cdot\min(steps,arrLen))$ time complexity, $O(\min(steps,arrLen))$ space complexity solution is not hard to find; it is the typical approach and is featured in the Editorial.

Basically we build up a table from the first step to the last step, where each row records the number of ways to be at each position after that step. It's easy to see that one can move at most steps/2 positions to the right, so if arrLen>steps/2 we just need to calculate steps/2+1 positions. And since the positions after each step only depend on the positions at the previous step, one can use just two arrays to represent them and swap them after each step.
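A minimal two-array version of this scheme could look like the following (my own sketch of the typical approach, before any of the optimizations below; `numWaysBaseline` is a name I made up):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Baseline DP: O(steps*min(steps,arrLen)) time, two rolling rows.
int numWaysBaseline(int steps, int arrLen) {
    const long long MOD = 1000000007;
    // One can move at most steps/2 positions to the right and still return.
    int b = std::min(steps / 2 + 1, arrLen);
    std::vector<long long> cur(b, 0), nxt(b, 0);
    cur[0] = 1;
    for (int s = 0; s < steps; ++s) {
        for (int j = 0; j < b; ++j) {
            long long v = cur[j];                 // stay
            if (j > 0) v += cur[j - 1];           // came from the left
            if (j + 1 < b) v += cur[j + 1];       // came from the right
            nxt[j] = v % MOD;
        }
        std::swap(cur, nxt);
    }
    return (int)cur[0];
}
```

For example, `numWaysBaseline(3, 2)` returns 4 (stay-stay-stay, stay-right-left, right-left-stay, right-stay-left).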

Here is a slightly improved version based on that approach:
#pragma GCC target("avx,mmx,sse2,sse3,sse4")
auto _=[]()noexcept{ios::sync_with_stdio(0);cin.tie(0);cout.tie(0);return 0;}();
unsigned int cnt[251];
const int mod=1000000007;
class Solution {
public:
    int numWays(int steps, int arrLen){
        if(arrLen<=2){
            if(arrLen==1) return 1;
            int p=(steps-1)/63,q=(steps-1)%63;
            unsigned long long r=(1ULL<<q)%mod,c=(1ULL<<63)%mod;
            while(p--) r=(r*c)%mod;
            return r;
        }
        if(steps==1) return 1;//without this, the rolling update below would count the boundary cell twice when steps==1
        int b=min(steps/2+1,arrLen),tmp2;
        unsigned long long tmp1;
        memset(cnt,0,b*sizeof(cnt[0]));
        cnt[0]=1;
        for(int i=0;i<steps;++i){
            tmp1=cnt[0];
            cnt[0]=(cnt[0]+cnt[1])%mod;
            int s=min(min(i+1,steps-i),b-1);
            for(int j=1;j<s;++j){tmp2=cnt[j]; cnt[j]=(tmp1+cnt[j]+cnt[j+1])%mod; tmp1=tmp2;}
            cnt[s]=(tmp1+cnt[s])%mod;
        }
        return cnt[0];
    }
};
Since each new position only depends on 3 array elements of the previous step, a fixed number, we don't really need a second array to store them. Just use two integers to hold two of the elements and roll them forward while updating a single array in place. This saves half of the space.

Also, noticing that the shape of the relevant elements in the array forms a triangle (the first row has one non-zero element, the second row has two, ...; after halfway we must head back, so positions farther out are irrelevant and there's no need to compute them... the second to last row has two relevant elements, the last has one), we can skip those unnecessary calculations, which saves at most half of the time.

When arrLen==2, the result has a very simple form: $2^{steps-1}$ (at every step there are exactly two legal moves, and by symmetry half of the $2^{steps}$ walks end at index 0), so we just need to find $2^{steps-1}\bmod 10^9+7$. The power function can be done in $O(\log n)$, but here, since steps<=500, if we just factor $2^{steps-1}$ into powers of $2^{63}$ and the remainder, it can be done in at most 8 operations, which is probably more efficient than calling a power function.
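In isolation, the trick might look like this (a standalone sketch of the same idea used in the code above, checked against plain repeated doubling):

```cpp
#include <cassert>

const unsigned long long MOD = 1000000007ULL;

// 2^(steps-1) mod MOD by splitting the exponent as 63*p + q:
// one shift for the remainder, then at most (steps-1)/63 <= 7 modular multiplications.
unsigned long long pow2Fast(int steps) {
    int p = (steps - 1) / 63, q = (steps - 1) % 63;
    unsigned long long r = (1ULL << q) % MOD, c = (1ULL << 63) % MOD;
    while (p--) r = (r * c) % MOD;
    return r;
}

// Reference implementation: repeated doubling.
unsigned long long pow2Slow(int steps) {
    unsigned long long r = 1;
    for (int i = 1; i < steps; ++i) r = r * 2 % MOD;
    return r;
}
```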

This is an optimization to the typical approach. So far so good.

But, can this be optimized further?

Let's see what actually happens when we evolve the array after each step: $a_{j+1}[0]=a_j[0]+a_j[1]$, $a_{j+1}[1]=a_j[0]+a_j[1]+a_j[2]$, ...
If we write the array as a column vector, the new vector is the old one multiplied by a matrix:
$$a_{j+1}=Ha_j$$
where H is
1 1 0 0 0 ... 0
1 1 1 0 0 ... 0
0 1 1 1 0 ... 0
0 0 1 1 1 ... 0
...
0 ... 0 1 1 1 0
0 ... 0 0 1 1 1
0 ... 0 0 0 1 1
After $s$ operations, the resulting vector is $a_s=H^sa_0$, where $a_0$ is a vector whose first element equals one and the rest zero. The answer to the problem is just $a_s[0]$, which equals the product of the first row of $H^s$ and $a_0$. Since the only non-zero element of $a_0$ is the first element, which is one, the answer is just $H^s[0][0]$.
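For small cases, this formulation is easy to sanity-check with a naive matrix power (my own snippet; no modular arithmetic, so only for small $s$, and it assumes $s\ge1$):

```cpp
#include <cassert>
#include <vector>

typedef std::vector<std::vector<long long> > Mat;

// Naive O(n^3) multiply of two n x n matrices.
static Mat mul(const Mat& A, const Mat& B) {
    int n = A.size();
    Mat C(n, std::vector<long long>(n, 0));
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// H^s[0][0] for the n x n tridiagonal all-ones matrix H (s >= 1).
long long matPow00(int n, int s) {
    Mat H(n, std::vector<long long>(n, 0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (j == i - 1 || j == i || j == i + 1) H[i][j] = 1;
    Mat R = H;  // H^1
    for (int t = 1; t < s; ++t) R = mul(R, H);
    return R[0][0];
}
```

For instance, `matPow00(2, 3)` gives 4, matching the arrLen==2 formula $2^{steps-1}$.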

A simple implementation of matrix multiplication has $O(n^3)$ complexity. The Strassen algorithm can bring it down to $O(n^{\log_2 7})\approx O(n^{2.81})$, and recent research has brought the bound as low as $O(n^{2.372})$. But those are considered galactic algorithms. The matrix $H$ is at most 500*500; even the not-so-complicated Strassen algorithm probably won't accelerate it by much. Furthermore, it must be multiplied by $O(\log s)$ to include the complexity of computing the power $s$. So this does not improve the performance at all.

But noticing that the matrix is a symmetric tridiagonal toeplitz matrix, maybe we can exploit this symmetry and find a better algorithm.

A Hermitian matrix is always diagonalizable, $H=PDP^\dagger$, where $D$ is a diagonal matrix whose elements are the eigenvalues of $H$, and $P$ is a unitary matrix, $P^{-1}=P^\dagger$. If $H$ is real symmetric, $P$ is orthogonal, $P^{-1}=P^T$. This property makes it very easy to calculate powers of Hermitian/symmetric matrices, since all the $P$ and $P^{-1}$ in the middle cancel out, $H^s=PD^sP^\dagger$, and $D^s$ is very easy to calculate - just raise every element on the diagonal to the power $s$.

This property alone doesn't help much, since eigendecomposition typically has the same complexity as matrix multiplication. The only thing it improves is that the power becomes easy, so the $\log s$ factor is gone. This is not good enough.
  
But this matrix is not only symmetric, it's also Toeplitz. A Toeplitz matrix can be decomposed in $O(n^2)$. A tridiagonal Toeplitz matrix has known eigenvalues, namely $a+2\sqrt{bc}\cos\left(\frac{k\pi}{n+1}\right),k=1,\dots,n$, where $a=b=c=1$ in this case.
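These eigenpairs can be verified directly: with $v_k[j]=\sin\frac{(j+1)k\pi}{n+1}$, the identity $\sin x+\sin y=2\sin\frac{x+y}{2}\cos\frac{x-y}{2}$ gives $Hv_k=\left(1+2\cos\frac{k\pi}{n+1}\right)v_k$, and the boundary terms vanish since $\sin0=\sin k\pi=0$. A quick numeric check (my own snippet):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Verify H v = (1 + 2cos(k*pi/(n+1))) v for the n x n tridiagonal all-ones H,
// with eigenvector v[j] = sin((j+1)*k*pi/(n+1)).
bool eigenpairsOk(int n) {
    const double PI = std::acos(-1.0);
    for (int k = 1; k <= n; ++k) {
        double lam = 1 + 2 * std::cos(k * PI / (n + 1));
        std::vector<double> v(n);
        for (int j = 0; j < n; ++j) v[j] = std::sin((j + 1) * k * PI / (n + 1));
        for (int j = 0; j < n; ++j) {
            double hv = v[j] + (j > 0 ? v[j - 1] : 0.0) + (j + 1 < n ? v[j + 1] : 0.0);
            if (std::fabs(hv - lam * v[j]) > 1e-9) return false;
        }
    }
    return true;
}
```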

Since $D^s$ is diagonal, it's easy to see that $H^s[0][0]=\sum_i P[0][i]D^s[i][i]P^T[i][0]=\sum_i P[0][i]^2D^s[i][i]$. We already have the eigenvalues; now, if we can find the first row of the matrix $P$ without actually decomposing the matrix, it would be an $O(n)$ algorithm!

Here are the eigenvalues and the first rows of $P$ for various sizes of $H$ that I found. In each block below, the first row is the size, the second row the eigenvalues, the third row the first row of the corresponding $P$:
(If the table below is all messed up, try refreshing. For speed (I guess) or whatever reason, sometimes browsers don't load the scripts in order, which messes up the table.)
3*3
$1+\sqrt2$, $1$, $1-\sqrt2$
$1/2$, $1/\sqrt2$, $1/2$
4*4
$(3+\sqrt5)/2$, $(1+\sqrt5)/2$, $(1-\sqrt5)/2$, $(3-\sqrt5)/2$
$1/\sqrt{5+\sqrt5}$, $1/\sqrt{5-\sqrt5}$, $1/\sqrt{5+\sqrt5}$, $1/\sqrt{5-\sqrt5}$
5*5
$1+\sqrt3$, $2$, $1$, $1-\sqrt3$, $0$
$1/(2\sqrt3)$, $1/2$, $1/\sqrt3$, $1/(2\sqrt3)$, $1/2$
6*6
2.80194, 2.24698, 1.44504, -0.801938, 0.554958, -0.24698
0.231921, -0.417907, 0.521121, 0.231921, 0.521121, 0.417907
7*7
2.84776, 2.41421, 1.76537, 1, -0.847759, -0.414214, 0.234633
0.191342, 0.353553, 0.46194, 0.5, 0.191342, 0.353553, 0.46194
Noticing the symmetry of the elements in $P$ and the corresponding eigenvalues, and remembering that the eigenvalues are $1+2\cos\theta$, it's easier to tell by converting the eigenvalues back to the cosine values, i.e., subtract 1 from the values and then divide by 2.
3*3
$\sqrt2/2$, $0$, $-\sqrt2/2$
$1/2$, $1/\sqrt2$, $1/2$
4*4
$(1+\sqrt5)/4$, $(-1+\sqrt5)/4$, $(-1-\sqrt5)/4$, $(1-\sqrt5)/4$
$1/\sqrt{5+\sqrt5}$, $1/\sqrt{5-\sqrt5}$, $1/\sqrt{5+\sqrt5}$, $1/\sqrt{5-\sqrt5}$
5*5
$\sqrt3/2$, $1/2$, $0$, $-\sqrt3/2$, $-1/2$
$1/(2\sqrt3)$, $1/2$, $1/\sqrt3$, $1/(2\sqrt3)$, $1/2$
6*6
1.80194/2, 1.24698/2, 0.44504/2, -1.801938/2, -0.44504/2, -1.24698/2
0.231921, -0.417907, 0.521121, 0.231921, 0.521121, 0.417907
7*7
1.84776/2, 1.41421/2, 0.76537/2, 0, -1.847759/2, -1.414214/2, -0.76537/2
0.191342, 0.353553, 0.46194, 0.5, 0.191342, 0.353553, 0.46194
I noticed that for the 3*3, 5*5 and 7*7 matrices, the elements corresponding to cosine value 0 are $1/\sqrt2$, $1/\sqrt3$, $1/\sqrt4$ respectively, and for 4*4 all elements have a factor of $\sqrt5$, so there could be a factor of $1/\sqrt{n+1}$. I guessed that it may also have something to do with $\sin\theta$, since as the absolute value of the cosine decreases, the element value increases. And I figured it out!
The formula is,
$$\theta_i=\frac{i\pi}{n+1},\quad i=1,2,...,n$$
$$e_i=1+2\cos\theta_i$$
$$p_i=\sqrt{\frac{2}{n+1}}\sin\theta_i$$
$$ans=\sum_i e_i^s p_i^2,\quad s=steps$$
The following code implements this algorithm. The only problem now is, the result could be very large, so how do we find its remainder mod $10^9+7$ accurately?
#pragma GCC target("avx,mmx,sse2,sse3,sse4")
auto _=[]()noexcept{ios::sync_with_stdio(0);cin.tie(0);cout.tie(0);return 0;}();
unsigned int cnt[251];
const int mod=1000000007;
class Solution {
public:
    int numWays(int steps, int arrLen){
        if(arrLen<=2){
            if(arrLen==1) return 1;
            int p=(steps-1)/63,q=(steps-1)%63;
            unsigned long long c=(1ULL<<63)%mod,r=(1ULL<<q)%mod;
            while(p--) r=(r*c)%mod;
            return r;
        }
        int b=min(steps/2+1,arrLen),tmp2;

        //new algorithm
        double r=0.0;
        for(int i=1;i<=b;++i){
            double c=cos(M_PI*i/(b+1));
            r+=pow(1+2*c,steps)*(1-pow(c,2));
        }
        r=r*2/(b+1);
        if(r<mod) return round(r);

        //may not work due to rounding errors if result is too large, so back to old algorithm
        unsigned long long tmp1;
        memset(cnt,0,b*sizeof(cnt[0]));
        cnt[0]=1;
        for(int i=0;i<steps;++i){
            tmp1=cnt[0];
            cnt[0]=(cnt[0]+cnt[1])%mod;
            int s=min(min(i+1,steps-i),b-1);
            for(int j=1;j<s;++j){tmp2=cnt[j]; cnt[j]=(tmp1+cnt[j]+cnt[j+1])%mod; tmp1=tmp2;}
            cnt[s]=(tmp1+cnt[s])%mod;
        }
        return cnt[0];
    }
};
Simply changing to 
        r=remainder(r,mod);
        return round(r);
gives numeric errors:
Input
steps = 47
arrLen = 38
Output 318653590
Expected 318671228
(In this example, the result is "r=1.149565e+20", which is accurate to about 15 significant digits.
That means we can safely do this when the result is less than $10^{15}$, which is about $2^{50}$, so we can change
if(r<mod) return round(r);
to
if(r<(1ULL<<50)) return round(fmod(r,mod));//fmod() avoids negative results that remainder() may return
and the result is precise enough.)

So it requires too much precision for very large outputs. But in general, it gives a very good approximation in O(n) complexity. Amazing!

But can we improve the complexity for the original problem? Let's go down this path further and see where it leads us!

If we apply the trivial result that answer=1 when steps=1, we get an interesting identity,
$$\sum_i(1+2\cos\theta_i)\sin^2\theta_i=\frac{n+1}{2}$$
where $\theta_i=\frac{i\pi}{n+1},i=1,2,\dots,n$.
One way to prove it is to simply write $\cos\theta_k=(e^{i\theta_k}+e^{-i\theta_k})/2$ and $\sin\theta_k=(e^{i\theta_k}-e^{-i\theta_k})/(2i)$ and expand the expression; most of the summations cancel out, only the number $1/2$ repeats $n+1$ times, which completes the proof.

There is an easier way to prove it, though. Noticing that $\cos(\pi-\theta)=-\cos\theta$ and $\sin(\pi-\theta)=\sin\theta$, the terms $\cos\theta_k\sin^2\theta_k$ cancel out! Now we're left with $\sum_k\sin^2\theta_k=\sum_k\frac{1-\cos(2\theta_k)}{2}=\frac n2-\frac12\sum_k\cos(2\theta_k)$. $\cos(2\theta_k)$ can be taken as the real part of $e^{i2\theta_k}$, where $\theta_k=\frac{k\pi}{n+1},k=1,\dots,n$. So it becomes $\sum_{k=1}^n e^{ik\frac{2\pi}{n+1}}$. It would make a full circle if we included $k=0$, which contributes an extra 1, thus the summation must equal $-1$. This gives the expected result $\frac{n+1}{2}$.
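The identity is also trivial to confirm numerically (my own snippet):

```cpp
#include <cassert>
#include <cmath>

// sum_{k=1}^{n} (1 + 2cos(theta_k)) sin^2(theta_k), theta_k = k*pi/(n+1);
// should equal (n+1)/2.
double identitySum(int n) {
    const double PI = std::acos(-1.0);
    double s = 0.0;
    for (int k = 1; k <= n; ++k) {
        double th = k * PI / (n + 1);
        s += (1 + 2 * std::cos(th)) * std::sin(th) * std::sin(th);
    }
    return s;
}
```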

Can we generalize this to the general case with power s? Let's find out!

The function is zero for $\theta=0$, so adding $k=0$ doesn't change the result.
Then, if we also include $\theta_k=\frac{k\pi}{n+1}$ where $k=n+1,n+2,\dots,2n+1$, they are just a repeat of the values for $k=0,1,\dots,n$, thus the result is doubled.
Applying these changes and expressing the sin and cos in exponents, we get
$$ans=-\frac{2}{n+1}\sum_{k=0}^{2n+1}\left(e^{i\theta_k}+e^{-i\theta_k}+1\right)^s\frac{e^{2i\theta_k}+e^{-2i\theta_k}-2}{8}$$
If we expand it as a polynomial of $e^{i\theta_k}$, $ans=\sum_k\sum_m a_m e^{im\theta_k}$, most of the terms will cancel out after the summation over $k$, since that summation becomes $\left(1-e^{i(2n+2)\frac{m\pi}{n+1}}\right)/\left(1-e^{i\frac{m\pi}{n+1}}\right)$, which is zero unless $\frac{m}{n+1}$ is a multiple of 2, in which case the summation is $2n+2$, since it's 1 repeated $2n+2$ times.

Thus, only the terms with power $m$ being a multiple of $2n+2$ contribute to the result. Each one of those contributes $a_m(2n+2)$. How do we find $a_m$? The coefficient of the term $e^{im\theta_k}$ in $\left(e^{i\theta_k}+e^{-i\theta_k}+1\right)^s$ is easy to find, namely $\sum_{j=0}^{\lfloor\frac{s-m}{2}\rfloor}\binom sj\binom{s-j}{j+m}$. Let's denote it by $c(m)$. Multiplying it by $e^{2i\theta_k}+e^{-2i\theta_k}-2$ will shift it to the powers $m+2$, $m-2$ and $m$ respectively.
Thus, the result is
$$ans=\frac12\sum_m\left(2c(m)-c(m+2)-c(m-2)\right)$$
where $c(m)=\sum_{j=0}^{\lfloor\frac{s-m}{2}\rfloor}\binom sj\binom{s-j}{j+m}$.
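The coefficient formula can be checked against a brute-force expansion of $(x+x^{-1}+1)^s$ (my own snippet; exponents are stored offset by $s$ so indices stay non-negative, and `binomExact`/`coeffOk` are names I made up):

```cpp
#include <cassert>
#include <vector>

// Exact small binomial coefficient (0 when out of range).
long long binomExact(int n, int r) {
    if (r < 0 || r > n) return 0;
    long long v = 1;
    for (int i = 0; i < r; ++i) v = v * (n - i) / (i + 1);  // stays integral at each step
    return v;
}

// Check c(m) = sum_j C(s,j) C(s-j, j+m) against the expansion of (x + 1/x + 1)^s.
bool coeffOk(int s) {
    // coef[e + s] = coefficient of x^e in (x + 1/x + 1)^s
    std::vector<long long> coef(2 * s + 1, 0);
    coef[s] = 1;  // the constant polynomial 1
    for (int t = 0; t < s; ++t) {
        std::vector<long long> nxt(2 * s + 1, 0);
        for (int e = 0; e <= 2 * s; ++e)
            for (int d = -1; d <= 1; ++d)
                if (e + d >= 0 && e + d <= 2 * s) nxt[e + d] += coef[e];
        coef = nxt;
    }
    for (int m = -s; m <= s; ++m) {
        long long c = 0;
        for (int j = 0; j <= s; ++j) c += binomExact(s, j) * binomExact(s - j, j + m);
        if (c != coef[m + s]) return false;
    }
    return true;
}
```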

Then, what are the $m$s? Since the power can be at most $s$, and $m$ must be a multiple of $2n+2$, the ratio $\frac{m}{2n+2}$ must be within $\left[-\frac{s}{2n+2},\frac{s}{2n+2}\right]$. We can then just iterate through that range and multiply by $2n+2$. This completes the algorithm.

This algorithm has complexity $O(steps\cdot\max(steps/arrLen,1))$. Interestingly, one would expect the complexity to grow with arrLen, but with this algorithm it's the opposite. When arrLen>steps, the old algorithm at the beginning takes $O(steps^2)$ while this one takes $O(steps)$! If we combine this algorithm with the old one, which has complexity $O(steps\cdot\min(steps,arrLen))$ - i.e., we use the new algorithm for large arrLen and the old one for small arrLen - we get an $O(steps^{1.5})$ algorithm! We choose the new one if $arrLen>\sqrt{steps}$ and the old one otherwise.

(Building the binomial table has complexity $O(n^2)$, but it is not included in the complexity, since it only needs to be built once. For multiple requests, the amortized complexity can be arbitrarily low. Explicitly, if the function is called $k$ times, while the maximum number of steps is $N$, then the amortized cost of building the table is $\frac{N^2}{k}$. If $k\ge\sqrt N$, the amortized complexity is still $O(N^{1.5})$.
On the other hand, one could use a table of factorials and inverse factorials instead, which can be built in $O(N)$, and $\binom ab$ can be calculated as $a!\,(b!)^{-1}\,((a-b)!)^{-1}$, at the cost of 2 extra multiplication operations on each call to binom. But this makes it exactly $O(N^{1.5})$.
)

The following code implements this algorithm:
#pragma GCC target("avx,mmx,sse2,sse3,sse4")
auto _=[]()noexcept{ios::sync_with_stdio(0);cin.tie(0);cout.tie(0);return 0;}();
unsigned int cnt[251];
const int mod=1000000007;
vector<vector<int>> binom{{1}};
void buildBinom(int n){
    for(int i=binom.size()-1;i<=n;++i){
        vector<int> v;
        v.reserve(i+2);
        v.emplace_back(1);
        const auto& b=binom.back();
        for(int j=0;j<i;++j) v.emplace_back((b[j]+b[j+1])%mod);
        v.emplace_back(1);
        binom.emplace_back(move(v));
    }
}
int getCoe(int s,int j,int t){
    if(j+t<0||s<2*j+t) return 0;
    return (long long)binom[s-j][j+t]*binom[s][j]%mod;
}
class Solution {
public:
    int numWays(int steps, int arrLen){
        if(arrLen<=2){
            if(arrLen==1) return 1;
            int p=(steps-1)/63,q=(steps-1)%63;
            unsigned long long c=(1ULL<<63)%mod,r=(1ULL<<q)%mod;
            while(p--) r=(r*c)%mod;
            return r;
        }
        if(steps==1) return 1;
        if(arrLen>steps||arrLen*arrLen>steps){
            buildBinom(steps);
            long long t=0;
            int tmp=2*arrLen+2;
            for(int m=-steps/tmp*tmp;m<=steps;m+=tmp){
                int upper=(steps-m)/2+2;
                for(int j=0;j<=upper;++j){
                    t=(t+2*getCoe(steps,j,m)-getCoe(steps,j,m+2)-getCoe(steps,j,m-2))%mod;
                }
            }
            return ((t*500000004)%mod+mod)%mod;
        }
        int tmp2;
        unsigned long long tmp1;
        memset(cnt,0,arrLen*sizeof(cnt[0]));
        cnt[0]=1;
        for(int i=0;i<steps;++i){
            tmp1=cnt[0];
            cnt[0]=(cnt[0]+cnt[1])%mod;
            for(int j=1;j<arrLen-1;++j){tmp2=cnt[j]; cnt[j]=(tmp1+cnt[j]+cnt[j+1])%mod; tmp1=tmp2;}
            cnt[arrLen-1]=(tmp1+cnt[arrLen-1])%mod;//the last valid position is arrLen-1
        }
        return cnt[0];
    }
};
Ah, one more thing. The number 500000004 here acts as $2^{-1}$, because we can't simply divide $t$ by 2. To divide a number mod $q$, we must multiply by the modular inverse of the divisor. Now, the $O(n^{1.5})$ algorithm is complete.

To get rid of this $2^{-1}$, noticing the symmetry of the coefficients, $c(-m)=c(m)$, it's easy to see that the sums of the terms corresponding to $m$ and $-m$ are the same. With this, the middle part can be written as
        if(arrLen>steps||arrLen*arrLen>steps){
            buildBinom(steps);
            long long t=0;
            for(int j=0;j<=steps/2;++j) t=(t+getCoe(steps,j,0)-getCoe(steps,j,2))%mod;
            int tmp=2*arrLen+2;
            for(int m=tmp;m<=steps;m+=tmp){
                int upper=(steps-m)/2+2;
                for(int j=0;j<=upper;++j){
                    t=(t+2*getCoe(steps,j,m)-getCoe(steps,j,m+2)-getCoe(steps,j,m-2))%mod;
                }
            }
            return (t+mod)%mod;
        }
which also cuts the time roughly in half.


I really enjoyed the journey to this surprising result. It reminds me of the first time I learned about the Strassen algorithm. The fact that "this is achievable!" and seeing it unfold in front of me is so satisfying!

(ps: I just learned that the solution for the case arrLen>=steps/2+1 is known as the Motzkin numbers, which satisfy the recurrence relation $M_n=\frac{2n+1}{n+2}M_{n-1}+\frac{3n-3}{n+2}M_{n-2}$. This gives an $O(n)$ algorithm for this special case without the need to build the binomial table. The only downside is that we must find $(i+2)^{-1}\bmod 10^9+7$ for $i=1\dots steps$, which takes about 30 operations each by computing $(i+2)^{10^9+5}$. But it's still a really good algorithm. I wonder if this can be generalized to the "correction terms" above, i.e., whether there is a recurrence relation for $M_{n,m}=c(m)-\left(c(m-2)+c(m+2)\right)/2$. The Motzkin numbers are the $M_{n,0}$ here. If so, the binomial table would not be necessary.)
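A sketch of that recurrence mod $10^9+7$ (my own snippet; the inverse comes from Fermat's little theorem, $(i+2)^{-1}=(i+2)^{p-2}\bmod p$):

```cpp
#include <cassert>

const long long MOD = 1000000007;

// Fast modular exponentiation, a^e mod MOD.
long long powMod(long long a, long long e) {
    long long r = 1;
    a %= MOD;
    while (e) {
        if (e & 1) r = r * a % MOD;
        a = a * a % MOD;
        e >>= 1;
    }
    return r;
}

// Motzkin numbers via M_n = ((2n+1) M_{n-1} + (3n-3) M_{n-2}) / (n+2), mod MOD.
long long motzkin(int n) {
    if (n <= 1) return 1;  // M_0 = M_1 = 1
    long long m2 = 1, m1 = 1;
    for (int i = 2; i <= n; ++i) {
        long long m = ((2LL * i + 1) * m1 + (3LL * i - 3) * m2) % MOD;
        m = m * powMod(i + 2, MOD - 2) % MOD;  // divide by (i+2) via its inverse
        m2 = m1;
        m1 = m;
    }
    return m1;
}
```

The first values 1, 1, 2, 4, 9, 21, 51 match, and `motzkin(steps)` should equal `numWays(steps, arrLen)` whenever arrLen >= steps/2+1.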
