H1269 description:
You have a pointer at index 0 in an array of size arrLen. At each step, you can move 1 position to the left, 1 position to the right in the array, or stay in the same place (The pointer should not be placed outside the array at any time).
Given two integers steps and arrLen, return the number of ways such that your pointer is still at index 0 after exactly steps steps. Since the answer may be too large, return it modulo $10^9 + 7$.
Constraints:
1 <= steps <= 500
1 <= arrLen <= 10^6
An $O\left(steps\cdot\min(steps, arrLen)\right)$ time complexity, $O\left(\min(steps, arrLen)\right)$ space complexity solution is not hard to find, which is the typical approach and the one featured in the Editorial.
Basically we build up a table from the first step to the last step, where each row records the number of ways to reach each position at that step. It's easy to see that one can move at most $\lfloor steps/2\rfloor$ steps to the right and still make it back, so we only need to calculate $\min(arrLen, \lfloor steps/2\rfloor + 1)$ positions. And since the row after each step only depends on the row at the previous step, one can use just two arrays to represent them and swap them after each step.
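For reference, here is a minimal sketch of that typical two-array DP (my own illustration of the Editorial-style approach; the name `numWaysTypical` is made up). The improved version below builds on the same idea:

```cpp
// Sketch of the typical DP: cur[i] = number of ways to be at index i after the current step.
#include <algorithm>
#include <vector>
using namespace std;

int numWaysTypical(int steps, int arrLen) {
    const long long MOD = 1000000007;
    int n = min(arrLen, steps / 2 + 1);          // positions beyond steps/2 can never get back to 0
    vector<long long> cur(n, 0), nxt(n, 0);
    cur[0] = 1;
    for (int s = 0; s < steps; ++s) {
        for (int i = 0; i < n; ++i) {
            long long ways = cur[i];             // stay
            if (i > 0) ways += cur[i - 1];       // arrive from the left
            if (i + 1 < n) ways += cur[i + 1];   // arrive from the right
            nxt[i] = ways % MOD;
        }
        swap(cur, nxt);
    }
    return (int)cur[0];
}
```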
Here is a slightly improved version based on that approach:
```cpp
#pragma GCC target("avx,mmx,sse2,sse3,sse4")
auto _=[]()noexcept{ios::sync_with_stdio(0);cin.tie(0);cout.tie(0);return 0;}();
unsigned int cnt[251];
const int mod=1000000007;
class Solution {
public:
int numWays(int steps, int arrLen){
if(arrLen<=2){
if(arrLen==1) return 1;
//arrLen==2: the answer is 2^(steps-1); compute it as (2^63)^p * 2^q mod 1e9+7
int p=(steps-1)/63,q=(steps-1)%63;
unsigned long long r=(1ULL<<q)%mod,c=(1ULL<<63)%mod;
while(p--) r=(r*c)%mod;
return r;
}
if(steps==1) return 1;//with steps==1 the window below has size 1 and the in-place update would double count the "stay" move
int b=min(steps/2+1,arrLen),tmp2;//only positions 0..b-1 can be visited and still return to index 0 in time
unsigned long long tmp1;
memset(cnt,0,b*sizeof(cnt[0]));
cnt[0]=1;
for(int i=0;i<steps;++i){
tmp1=cnt[0];
cnt[0]=(cnt[0]+cnt[1])%mod;
int s=min(min(i+1,steps-i),b-1);//"triangle": only positions reachable so far that can still get back to 0 in the remaining steps
for(int j=1;j<s;++j){tmp2=cnt[j]; cnt[j]=(tmp1+cnt[j]+cnt[j+1])%mod; tmp1=tmp2;}
cnt[s]=(tmp1+cnt[s])%mod;
}
return cnt[0];
}
};
```
Since each new value only depends on 3 array elements of the previous step, which is a fixed number, we don't really need a second array to store the previous step. Just use two integers to hold the two previous-step values we still need and roll them forward while updating a single array in place. This saves half of the space.
Also, noticing that the shape of the relevant elements in the array forms a triangle (the first row has one non-zero element, the second row has two, ...; after the halfway point we must head back, so positions too far to the right to return in the remaining steps are irrelevant and there is no need to compute them; the second-to-last row has two relevant elements, the last has one), we can skip those unnecessary calculations, which saves at most half of the time. (For example, with steps = 4 the relevant positions at steps 0..4 are {0}, {0,1}, {0,1,2}, {0,1}, {0}.)
When arrLen = 2, the result has a very simple form: $2^{steps-1}$, so we just need to find $2^{steps-1} \bmod (10^9+7)$. The power function can be done in $O(\log steps)$, but here, since $steps \le 500$, if we just factor $steps-1$ into a multiple of 63 plus a remainder, it can be done in at most 8 operations, which is probably more efficient than calling a generic power function.
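For example, with the largest allowed input, steps = 500: since $499 = 7\cdot 63 + 58$, we have $2^{499} \equiv \left(2^{63}\right)^{7}\cdot 2^{58} \pmod{10^9+7}$, so after the initial shift only 7 modular multiplications are needed.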
This is an optimization to the typical approach. So far so good.
But, can this be optimized further?
Let's see what actually happens when we evolve the array after each step: $c_0' = c_0 + c_1$, $c_1' = c_0 + c_1 + c_2$, $c_2' = c_1 + c_2 + c_3$, ...
If we write the array as a column vector $v$, the new vector is the old one multiplied by a matrix: $v' = Av$, where $A$ is
```
1 1 0 0 0 ... 0
1 1 1 0 0 ... 0
0 1 1 1 0 ... 0
0 0 1 1 1 ... 0
...
0 ... 0 1 1 1 0
0 ... 0 0 1 1 1
0 ... 0 0 0 1 1
```
After $steps$ such operations, the resulting vector is $A^{steps}v_0$, where $v_0$ is a vector whose first element equals one and the rest zero. The answer to the problem is just the first element of $A^{steps}v_0$, which equals the product of the first row of $A^{steps}$ and $v_0$. Since the only non-zero element of $v_0$ is the first element, which is one, the answer is just $(A^{steps})_{0,0}$.
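As a cross-check of this statement (and as the baseline whose cost is discussed next), here is a minimal sketch that computes $(A^{steps})_{0,0}$ directly by modular matrix exponentiation; the names `matMul` and `numWaysByMatrixPower` are my own, and $n$ should be taken as $\min(arrLen, steps/2+1)$:

```cpp
// Sketch: compute the answer as (A^steps)[0][0] by plain modular matrix exponentiation.
// This is only a cross-check of the claim above, not an efficient solution.
#include <algorithm>
#include <vector>
using namespace std;
using Mat = vector<vector<long long>>;
const long long MOD = 1000000007;

Mat matMul(const Mat& a, const Mat& b) {
    int n = a.size();
    Mat c(n, vector<long long>(n, 0));
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                c[i][j] = (c[i][j] + a[i][k] * b[k][j]) % MOD;
    return c;
}

int numWaysByMatrixPower(int steps, int n) {     // n = min(arrLen, steps/2 + 1)
    Mat a(n, vector<long long>(n, 0));           // symmetric tridiagonal matrix of ones
    Mat r(n, vector<long long>(n, 0));           // r starts as the identity
    for (int i = 0; i < n; ++i) {
        r[i][i] = 1;
        for (int j = max(0, i - 1); j <= min(n - 1, i + 1); ++j) a[i][j] = 1;
    }
    for (int e = steps; e > 0; e >>= 1) {        // binary exponentiation of A
        if (e & 1) r = matMul(r, a);
        a = matMul(a, a);
    }
    return (int)r[0][0];                         // the answer is the (0,0) element of A^steps
}
```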
A simple implementation of matrix multiplication has $O(n^3)$ complexity. The Strassen algorithm can bring it down to about $O(n^{2.81})$, and recent research has brought the exponent as low as roughly 2.37, though those asymptotically faster methods are considered galactic algorithms. The matrix here is at most 500*500, so even the not-so-complicated Strassen algorithm probably won't accelerate it by much. Furthermore, everything must be multiplied by $O(\log steps)$ to account for computing the power $A^{steps}$ by repeated squaring. So this does not improve the performance at all.
But noticing that the matrix is a symmetric tridiagonal Toeplitz matrix, maybe we can exploit this structure and find a better algorithm.
A Hermitian matrix is always diagonalizable, $A = U\Lambda U^\dagger$, where $\Lambda$ is a diagonal matrix whose entries are the eigenvalues of $A$, and $U$ is a unitary matrix ($UU^\dagger = I$). If $A$ is real symmetric, $U$ is orthogonal, $UU^T = I$. This property makes it very easy to calculate powers of Hermitian/symmetric matrices, since all the $U^\dagger U$ pairs in the middle cancel out (e.g. $A^2 = U\Lambda U^T U\Lambda U^T = U\Lambda^2 U^T$), so $A^{steps} = U\Lambda^{steps}U^T$, and $\Lambda^{steps}$ is very easy to calculate - just raise every element on the diagonal to the power $steps$.
This property alone doesn't help much, since eigendecomposition typically has the same asymptotic complexity as matrix multiplication. The only thing it improves is that the power becomes trivial, so the $\log(steps)$ factor is gone. This is not good enough.
But this matrix is not only symmetric, it's also Toeplitz. A Toeplitz matrix can be decomposed in $O(n^2)$. Better yet, a tridiagonal Toeplitz matrix has known eigenvalues, namely $a + 2\sqrt{bc}\cos\frac{k\pi}{n+1}$ for $k = 1..n$, where $a$ is the diagonal and $b$, $c$ are the off-diagonals; in this case $a = b = c = 1$, so the eigenvalues are $\lambda_k = 1 + 2\cos\frac{k\pi}{n+1}$.
Since $\Lambda$ is diagonal, it's easy to see that $(A^{steps})_{0,0} = \sum_{k} U_{0,k}^2\,\lambda_k^{steps}$. We already have the eigenvalues; now if we can find the first row of $U$ without actually performing the decomposition, it would be an $O(n)$ algorithm!
Here are the eigenvalues and the first rows of $U$ for various sizes of $A$ that I found; in each table the header row is the size, the first row is the eigenvalues, and the second row is the first row of the corresponding $U$:
(If the table below is all messed up, try refreshing. For speed (I guess) or whatever reason, sometimes browsers don't load the scripts in order, which messes up the table.)
| 3*3 | | |
|:--:|:--:|:--:|
| $1+\sqrt{2}$, | 1, | $1-\sqrt{2}$ |
| 1/2, | $\frac{1}{\sqrt{2}}$, | 1/2 |
|4*4| | | |
|:--:|:--:|:--:|:--:|
|2.618034,| 1.618034,| 0.381966,| -0.618034|
|0.371748,| 0.601501,| 0.601501,| 0.371748|
|5*5| | | | |
|:--:|:--:|:--:|:--:|:--:|
|2.732051,| 2,| 1,| 0,| -0.732051|
|0.288675,| 0.5,| 0.577350,| 0.5,| 0.288675|
|6*6| | | | | |
|:--:|:--:|:--:|:--:|:--:|:--:|
|2.80194,| 2.24698 ,| 1.44504 ,| -0.801938,| 0.554958,| -0.24698|
|0.231921 ,|-0.417907,| 0.521121,| 0.231921,| 0.521121,| 0.417907|
|7*7| | | | | | |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|2.84776 ,| 2.41421 ,| 1.76537 ,|1,| -0.847759 ,|-0.414214,| 0.234633|
|0.191342 ,|0.353553 ,|0.46194,| 0.5 ,| 0.191342,| 0.353553,| 0.46194|
Noticing the symmetry of the elements in each first row of $U$ and of the corresponding eigenvalues, and remembering that the eigenvalues are $1 + 2\cos\frac{k\pi}{n+1}$, the pattern is easier to see if we map the eigenvalues back to the cosine values, i.e., subtract 1 and then divide by 2.
|3*3| | |
|:--:|:--:|:--:|
| $\frac{\sqrt{2}}{2}$,| 0,| $-\frac{\sqrt{2}}{2}$|
|1/2 ,| $\frac{1}{\sqrt{2}}$,| 1/2|
|4*4| | | |
|:--:|:--:|:--:|:--:|
|1.618034/2,| 0.618034/2,| -0.618034/2,| -1.618034/2|
|0.371748,| 0.601501,| 0.601501,| 0.371748|
|5*5| | | | |
|:--:|:--:|:--:|:--:|:--:|
|1.732051/2,| 1/2,| 0,| -1/2,| -1.732051/2|
|0.288675,| 0.5,| 0.577350,| 0.5,| 0.288675|
|6*6| | | | | |
| :--: | :--: |:--: |:--:|:--:|:--:|
|1.80194/2,| 1.24698/2 ,| 0.44504/2 ,| -1.801938/2,| -0.44504/2,| -1.24698/2|
|0.231921 ,|-0.417907,| 0.521121,| 0.231921,| 0.521121,| 0.417907|
|7*7| | | | | | |
| :--: | :--: |:--: |:--:|:--:|:--:|:--:|
|1.84776/2 ,| 1.41421/2 ,| 0.76537/2 ,|0,| -1.847759/2 ,|-1.414214/2,| -0.76537/2|
|0.191342 ,|0.353553 ,|0.46194,| 0.5 ,| 0.191342,| 0.353553,| 0.46194|
I noticed that for the 3*3, 5*5 and 7*7 matrices, the elements corresponding to the eigenvalue 1 (cosine 0) are $\frac{1}{\sqrt{2}}$, $\frac{1}{\sqrt{3}}$ and $\frac{1}{2}$ respectively, and for the 4*4 all the elements share a factor of $\sqrt{2/5}$, so there could be a common factor of $\sqrt{\frac{2}{n+1}}$. I guessed that it may also have something to do with $\sin\frac{k\pi}{n+1}$, since as the absolute value of the cosine decreases, the element value increases. And I figured it out!
The formula is $U_{0,k} = \sqrt{\frac{2}{n+1}}\sin\frac{k\pi}{n+1}$, which gives
$$(A^{steps})_{0,0} = \frac{2}{n+1}\sum_{k=1}^{n}\left(1+2\cos\frac{k\pi}{n+1}\right)^{steps}\sin^2\frac{k\pi}{n+1}.$$
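As a quick sanity check against the 7*7 table above: $\sqrt{\tfrac{2}{8}}\sin\tfrac{\pi}{8} \approx 0.191342$ and $\sqrt{\tfrac{2}{8}}\sin\tfrac{4\pi}{8} = 0.5$, matching the first-row entries found numerically.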
The following code implements this algorithm. The only problem now is that the result could be very large, so how do we take its remainder accurately?
```cpp
#pragma GCC target("avx,mmx,sse2,sse3,sse4")
auto _=[]()noexcept{ios::sync_with_stdio(0);cin.tie(0);cout.tie(0);return 0;}();
unsigned int cnt[251];
const int mod=1000000007;
class Solution {
public:
int numWays(int steps, int arrLen){
if(arrLen<=2){
if(arrLen==1) return 1;
int p=(steps-1)/63,q=(steps-1)%63;
unsigned long long c=(1ULL<<63)%mod,r=(1ULL<<q)%mod;
while(p--) r=(r*c)%mod;
return r;
}
int b=min(steps/2+1,arrLen),tmp2;
//new algorithm
double r=0.0;
for(int i=1;i<=b;++i){
double c=cos(M_PI*i/(b+1));
r+=pow(1+2*c,steps)*(1-pow(c,2));
}
r=r*2/(b+1);
if(r<mod) return round(r);
//may not work due to rounding errors if result is too large, so back to old algorithm
unsigned long long tmp1;
memset(cnt,0,b*sizeof(cnt[0]));
cnt[0]=1;
for(int i=0;i<steps;++i){
tmp1=cnt[0];
cnt[0]=(cnt[0]+cnt[1])%mod;
int s=min(min(i+1,steps-i),b-1);
for(int j=1;j<s;++j){tmp2=cnt[j]; cnt[j]=(tmp1+cnt[j]+cnt[j+1])%mod; tmp1=tmp2;}
cnt[s]=(tmp1+cnt[s])%mod;
}
return cnt[0];
}
};
```
Simply changing the early return to
```
r=remainder(r,mod);
return round(r);
```
gives numeric errors:
```
Input
steps = 47
arrLen = 38
Output 318653590
Expected 318671228
```
(In this example, the result is "r=1.149565e+20", and a double is only accurate to about 15 significant digits, so the low-order digits that the remainder depends on are already lost.
Which means we can safely do that only when the result is less than $2^{50}$, which is about $10^{15}$, so we can change
```
if(r<mod) return round(r);
```
to
```
if(r<(1ULL<<50)) return round(fmod(r,mod));//fmod() avoids negative results that remainder() may return
```
and the result is precise enough.)
So it requires too much precision for very large outputs. But in general, it gives a very good approximation with only $O(\min(steps, arrLen))$ floating-point operations. Amazing!
But can we improve the complexity for the original problem? Let's go down this path further and see where it leads us!
If we apply the trivial result that the answer is 1 when steps = 1, we get an interesting identity,
$$\frac{2}{n+1}\sum_{k=1}^{n}\left(1+2\cos\theta_k\right)\sin^2\theta_k = 1, \quad\text{where } \theta_k = \frac{k\pi}{n+1}.$$
One way to prove it is to simply write $\cos\theta_k = \frac{e^{i\theta_k}+e^{-i\theta_k}}{2}$ and $\sin\theta_k = \frac{e^{i\theta_k}-e^{-i\theta_k}}{2i}$ and expand the expression; most of the summations cancel out, and only the number $\frac12$ survives, repeated $n+1$ times in total, which completes the proof.
There is an easier way to prove it, though. Noticing that $\cos\theta_{n+1-k} = -\cos\theta_k$ and $\sin\theta_{n+1-k} = \sin\theta_k$, the $2\cos\theta_k\sin^2\theta_k$ terms cancel out in pairs! Now we're left with $\sum_{k=1}^{n}\sin^2\theta_k = \sum_{k=1}^{n}\frac{1-\cos 2\theta_k}{2}$. $\cos 2\theta_k$ can be taken as the real part of $z_k^2$, where $z_k = e^{i\theta_k}$. So it becomes $\frac{n}{2} - \frac{1}{2}\operatorname{Re}\sum_{k=1}^{n} z_k^2$. The points $z_k^2$ would make a full circle (the $(n+1)$-th roots of unity) if we included $k=0$, which contributes an extra 1, thus the summation must equal $-1$. This gives the expected result $\frac{n}{2} + \frac{1}{2} = \frac{n+1}{2}$, and multiplying by $\frac{2}{n+1}$ gives 1.
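A minimal numeric sketch to check this identity in floating point (just an illustration, not part of the solution):

```cpp
// Sketch: check that (2/(n+1)) * sum_{k=1..n} (1+2cos(k*pi/(n+1))) * sin^2(k*pi/(n+1)) equals 1.
#include <cmath>
#include <cstdio>

int main() {
    for (int n = 2; n <= 10; ++n) {
        double s = 0.0;
        for (int k = 1; k <= n; ++k) {
            double th = M_PI * k / (n + 1);
            s += (1 + 2 * cos(th)) * sin(th) * sin(th);
        }
        printf("n=%2d  2/(n+1)*sum = %.12f\n", n, 2 * s / (n + 1));  // prints 1.000000000000 each time
    }
    return 0;
}
```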
Can we generalize this to the general case with power $steps$? Let's find out!
The summand $\left(1+2\cos\theta_k\right)^{steps}\sin^2\theta_k$ is zero for $k = 0$ (since $\sin 0 = 0$), so adding the $k=0$ term doesn't change the result.
Then, if we also include $k = n+1, \ldots, 2n+1$, where $\theta_{2n+2-k} = 2\pi-\theta_k$, they are just a repeat of the values for $k = 1, \ldots, n$ (the $k = n+1$ term is zero since $\sin\pi = 0$), thus the result is doubled.
Applying these changes and expressing the $\cos$ and $\sin$ in exponentials, $z_k = e^{i\theta_k}$, we get
$$(A^{steps})_{0,0} = \frac{1}{4(n+1)}\sum_{k=0}^{2n+1}\left(1 + z_k + z_k^{-1}\right)^{steps}\left(2 - z_k^2 - z_k^{-2}\right),$$
where the $z_k$ run over all the $(2n+2)$-th roots of unity.
If we expand it as a polynomial of $z_k$, $\sum_t a_t z_k^t$, most of the terms will cancel out after the summation over $k$, since the summation becomes $\sum_{k=0}^{2n+1} z_k^t = \sum_{k=0}^{2n+1} e^{itk\pi/(n+1)}$, which is zero unless $t$ is a multiple of $2(n+1)$, in which case the summation is $2(n+1)$ since it's 1 repeated $2(n+1)$ times.
Thus, only the terms whose power $t$ is a multiple of $2(n+1)$ contribute to the result, and each one of those contributes $\frac{2(n+1)}{4(n+1)}a_t = \frac{a_t}{2}$. How do we find $a_t$? The coefficient of the $z^t$ term in $(1+z+z^{-1})^{steps}$ is easy to find, namely $c_t = \sum_{j}\binom{steps}{j}\binom{steps-j}{j+t}$ (choose $j$ factors to contribute $z^{-1}$ and $j+t$ factors to contribute $z$). Multiplying by $2 - z^2 - z^{-2}$ then combines the coefficients at $t$, $t-2$ and $t+2$, giving $a_t = 2c_t - c_{t-2} - c_{t+2}$.
Thus, the result is
$$(A^{steps})_{0,0} = \frac{1}{2}\sum_{\substack{t \equiv 0 \pmod{2(n+1)} \\ |t| \le steps}}\left(2c_t - c_{t+2} - c_{t-2}\right),$$
where $c_t = \sum_{j}\binom{steps}{j}\binom{steps-j}{j+t}$.
Then, what are the relevant $t$'s? Since the power can be at most $steps$ in absolute value, and $t$ must be a multiple of $2(n+1)$, the ratio $\frac{t}{2(n+1)}$ must lie within $\left[-\frac{steps}{2(n+1)}, \frac{steps}{2(n+1)}\right]$. We can then just iterate through that range and multiply by $2(n+1)$ (in the code below, $n = arrLen$). This completes the algorithm.
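For a quick sanity check of the closed form for $c_t$, here is a small standalone sketch (the helper names `C` and `coefFormula` are mine) that compares it with a direct expansion of $(1+z+z^{-1})^s$ for small $s$, in exact integers without the modulus:

```cpp
// Sketch: check c_t = sum_j C(s,j)*C(s-j,j+t) against a direct expansion of (1 + z + 1/z)^s.
#include <cassert>
#include <vector>
using namespace std;

long long C(int n, int k) {                       // exact binomial coefficient for small n
    if (k < 0 || k > n) return 0;
    long long r = 1;
    for (int i = 1; i <= k; ++i) r = r * (n - k + i) / i;
    return r;
}

long long coefFormula(int s, int t) {             // the closed form derived above
    long long r = 0;
    for (int j = 0; 2 * j + t <= s; ++j)
        if (j + t >= 0) r += C(s, j) * C(s - j, j + t);
    return r;
}

int main() {
    for (int s = 0; s <= 12; ++s) {
        // brute force: coef[i] is the coefficient of z^(i-s); powers run from -s to s
        vector<long long> coef(2 * s + 1, 0), nxt;
        coef[s] = 1;
        for (int step = 0; step < s; ++step) {    // multiply by (1 + z + 1/z), s times
            nxt.assign(2 * s + 1, 0);
            for (int i = 0; i <= 2 * s; ++i) {
                if (!coef[i]) continue;
                nxt[i] += coef[i];                         // the 1 term
                if (i + 1 <= 2 * s) nxt[i + 1] += coef[i]; // the z term
                if (i - 1 >= 0) nxt[i - 1] += coef[i];     // the 1/z term
            }
            coef.swap(nxt);
        }
        for (int t = -s; t <= s; ++t) assert(coef[t + s] == coefFormula(s, t));
    }
    return 0;
}
```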
This algorithm has complexity $O\!\left(\frac{steps^2}{arrLen} + steps\right)$. Interestingly, one would expect the complexity to grow with $arrLen$, but with this algorithm it's the opposite. When $arrLen \ge steps$, the old algorithm at the beginning takes $O(steps^2)$ while this one takes $O(steps)$! If we combine this algorithm with the old one, which has complexity $O(steps\cdot\min(steps, arrLen))$, i.e., we use the new algorithm for large $arrLen$ and the old one for small $arrLen$, we get an $O(steps^{1.5})$ algorithm! We choose the new one if $arrLen^2 > steps$ and the old one otherwise.
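To see where the combined bound comes from: the old algorithm costs roughly $steps\cdot arrLen$, the new one roughly $\frac{steps^2}{arrLen}$, and the two are balanced when
$$steps\cdot arrLen = \frac{steps^2}{arrLen} \iff arrLen = \sqrt{steps},$$
so always choosing the cheaper side bounds the work by $steps\cdot\sqrt{steps} = steps^{1.5}$.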
(Building the binomial table has $O(steps^2)$ complexity, but it is not included in the complexity above, since it only needs to be built once. For multiple requests the amortized cost can be arbitrarily low. Explicitly, if the function is called $q$ times while the maximum number of steps is $S$, then the amortized complexity is $O\!\left(\frac{S^2}{q} + S^{1.5}\right)$ per call. If $q \ge \sqrt{S}$, the amortized complexity is still $O(S^{1.5})$.
On the other hand, one could use a factorial and inverse-factorial table instead, which can be built in $O(S)$, and $\binom{n}{k}$ can be calculated as $n!\cdot(k!)^{-1}\cdot((n-k)!)^{-1} \bmod (10^9+7)$, at the cost of 2 extra multiplication operations on each call to binom; a minimal sketch of this is shown after this note. But this makes the per-call complexity exactly $O(steps^{1.5})$.
)
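A minimal sketch of that alternative (the names `BinomTable`, `powMod` and `binomMod` are mine): factorials and inverse factorials are precomputed once, spending a single Fermat inverse on $(S!)^{-1}$.

```cpp
// Sketch: O(S) precomputation of factorials and inverse factorials mod p,
// so that each binomial coefficient afterwards costs two extra multiplications.
#include <vector>
using namespace std;

const long long MOD = 1000000007;

long long powMod(long long b, long long e, long long m) {
    long long r = 1;
    for (b %= m; e > 0; e >>= 1, b = b * b % m)
        if (e & 1) r = r * b % m;
    return r;
}

struct BinomTable {
    vector<long long> fact, invFact;
    explicit BinomTable(int n) : fact(n + 1), invFact(n + 1) {
        fact[0] = 1;
        for (int i = 1; i <= n; ++i) fact[i] = fact[i - 1] * i % MOD;
        invFact[n] = powMod(fact[n], MOD - 2, MOD);          // one modular inverse via Fermat
        for (int i = n; i > 0; --i) invFact[i - 1] = invFact[i] * i % MOD;
    }
    long long binomMod(int n, int k) const {                 // C(n, k) mod p
        if (k < 0 || k > n) return 0;
        return fact[n] * invFact[k] % MOD * invFact[n - k] % MOD;
    }
};
```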
The following code implements this algorithm:
```cpp
#pragma GCC target("avx,mmx,sse2,sse3,sse4")
auto _=[]()noexcept{ios::sync_with_stdio(0);cin.tie(0);cout.tie(0);return 0;}();
unsigned int cnt[251];
const int mod=1000000007;
vector<vector<int>> binom{{1}};
void buildBinom(int n){
for(int i=binom.size()-1;i<=n;++i){
vector<int> v;
v.reserve(i+2);
v.emplace_back(1);
const auto& b=binom.back();
for(int j=0;j<i;++j) v.emplace_back((b[j]+b[j+1])%mod);
v.emplace_back(1);
binom.emplace_back(move(v));
}
}
//one term of the coefficient of z^t in (1+z+1/z)^s: C(s,j)*C(s-j,j+t)
int getCoe(int s,int j,int t){
if(j+t<0||s<2*j+t) return 0;
return (long long)binom[s-j][j+t]*binom[s][j]%mod;
}
class Solution {
public:
int numWays(int steps, int arrLen){
if(arrLen<=2){
if(arrLen==1) return 1;
int p=(steps-1)/63,q=(steps-1)%63;
unsigned long long c=(1ULL<<63)%mod,r=(1ULL<<q)%mod;
while(p--) r=(r*c)%mod;
return r;
}
if(steps==1) return 1;
if(arrLen>steps||arrLen*arrLen>steps){
buildBinom(steps);
long long t=0;
int tmp=2*arrLen+2;//only powers that are multiples of 2*(arrLen+1) survive the summation over the roots of unity
for(int m=-steps/tmp*tmp;m<=steps;m+=tmp){
int upper=(steps-m)/2+2;
for(int j=0;j<=upper;++j){
t=(t+2*getCoe(steps,j,m)-getCoe(steps,j,m+2)-getCoe(steps,j,m-2))%mod;//2*c_m - c_{m+2} - c_{m-2}
}
}
return ((t*500000004)%mod+mod)%mod;//500000004 is the modular inverse of 2; +mod keeps the result non-negative
}
int tmp2;
unsigned long long tmp1;
memset(cnt,0,arrLen*sizeof(cnt[0]));
cnt[0]=1;
for(int i=0;i<steps;++i){
tmp1=cnt[0];
cnt[0]=(cnt[0]+cnt[1])%mod;
for(int j=1;j<arrLen-1;++j){tmp2=cnt[j]; cnt[j]=(tmp1+cnt[j]+cnt[j+1])%mod; tmp1=tmp2;}
cnt[arrLen-1]=(tmp1+cnt[arrLen-1])%mod;//the last valid index is arrLen-1
}
return cnt[0];
}
};
```
Ah, one more thing. The number 500000004 here acts as $\frac{1}{2}$, because we can't simply divide by 2 under the modulus. To divide by a number modulo a prime, we must multiply by its modular inverse; here $2 \times 500000004 = 1000000008 \equiv 1 \pmod{10^9+7}$. Now, the algorithm is complete.
To get rid of this $\frac{1}{2}$, notice the symmetry of the coefficients, $c_{-t} = c_t$: it's easy to see that the sums of the terms corresponding to $t$ and $-t$ are the same. With this, the middle part can be written as
```cpp
if(arrLen>steps||arrLen*arrLen>steps){
buildBinom(steps);
long long t=0;
for(int j=0;j<=steps/2;++j) t=(t+getCoe(steps,j,0)-getCoe(steps,j,2))%mod;
int tmp=2*arrLen+2;
for(int m=tmp;m<=steps;m+=tmp){
int upper=(steps-m)/2+2;
for(int j=0;j<=upper;++j){
t=(t+2*getCoe(steps,j,m)-getCoe(steps,j,m+2)-getCoe(steps,j,m-2))%mod;
}
}
return (t+mod)%mod;
}
```
which also saves about half of the time.
I really enjoyed the journey to this surprising result. It reminds me of the first time I learned about the Strassen algorithm. Realizing "this is achievable!" and seeing it unfold in front of me is so satisfying!
(ps: Just learned that the solution for the case when the boundary is never reached ($arrLen > steps/2$) is known as the Motzkin numbers, which satisfy the recurrence relation $M_n = \frac{(2n+1)M_{n-1} + 3(n-1)M_{n-2}}{n+2}$. This gives an $O(steps)$ algorithm without the need to build the binomial table for this special case. The only downside is, we must find the modular inverse of $n+2$ for each $n$, which takes about 30 operations each by computing $(n+2)^{10^9+5} \bmod (10^9+7)$ with fast exponentiation. But it's still a really good algorithm. I wonder if this can be generalized to the "correction terms" above, i.e., whether there is a similar recurrence relation for $2c_t - c_{t+2} - c_{t-2}$ for general $t$; the Motzkin numbers are the $t = 0$ term here. If so, the binomial table would not be necessary.)
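For completeness, a minimal sketch of that special case (the function name `motzkinMod` is mine; it assumes $arrLen > steps/2$, so the right boundary never matters):

```cpp
// Sketch: Motzkin numbers mod p via M_n = ((2n+1)*M_{n-1} + 3(n-1)*M_{n-2}) / (n+2),
// where the division is done by a modular inverse from Fermat's little theorem.
const long long MOD = 1000000007;

long long powMod(long long b, long long e) {
    long long r = 1;
    for (b %= MOD; e > 0; e >>= 1, b = b * b % MOD)
        if (e & 1) r = r * b % MOD;
    return r;
}

// Valid only when the boundary is never reached, i.e. arrLen > steps/2.
long long motzkinMod(int steps) {
    long long m0 = 1, m1 = 1;                                // M_0 = 1, M_1 = 1
    for (int n = 2; n <= steps; ++n) {
        long long inv = powMod(n + 2, MOD - 2);              // (n+2)^{-1} mod p
        long long mn = ((2LL * n + 1) % MOD * m1 + 3LL * (n - 1) % MOD * m0) % MOD * inv % MOD;
        m0 = m1;
        m1 = mn;
    }
    return m1;                                               // M_steps
}
```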