{x|x∉x}: April 2023

Thursday, April 27, 2023

Random Subarray Algorithm

After I discussed with someone else who also wrote a Digits solver, I looked into a little bit more of the codes that they used to generate the puzzles. I noticed an interesting function there:

function getRandomInArray(arr, n) {
  var len = arr.length
  if (n > len)
    throw new RangeError(
      'getRandom: more elements taken than available',
    )
  var result = new Array(n),taken = new Array(len)
  while (n--) {
    var x = Math.floor(Math.random() * len)
    result[n] = arr[x in taken ? taken[x] : x]
    taken[x] = --len in taken ? taken[len] : len
  }
  return result
}

Apparently the function takes an array and returns a length-$n$ random subarray of the argument. But I was confused because it uses a grammar that I was not familiar with - I thought the "in" operator here is similar to the "in" in Python, which misled me at first. I realized that that's definitely not what it does when I tested it, indeed, it returns if the property (in this case, the index) exists, not the value.

I wonder why it's written this way, instead of the more straightforward Fisher–Yates shuffle algorithm, because I noticed that this algorithm gives exactly the same result with the Fisher–Yates algorithm. I thought about this problem a few years ago in my research. After a few attempts, I re-discovered this algorithm myself. I was quite happy about it, but at the same time I thought that this must have already been discovered long ago. Years later, I finally searched about it and learned its official name. The original Fisher–Yates shuffle algorithm may be implemented this way:

function getRandomInArray(arr, n) {
  var len = arr.length
  if (n > len)
    throw new RangeError(
      'getRandom: more elements taken than available',
    )
  var result = arr;
  while (len-->n) {
    var x = Math.floor(Math.random() * len)
    let temp=result[x]
    result[x] = result[len]
    result[len]=temp
  }
  return result.slice(n)
}

But the first function has a few advantages in certain circumstances. One issue with the original algorithm is, if we are not allowed to modify the original array, we must make a copy of the entire array. Even if we are allowed to modify it, exchanging two elements may still consume a lot of resources if the elements are large objects. One way to solve this is, we can just record the indices that are exchanged instead of the elements, and construct the new array based on the indices. This avoids the copying issue, but if the array is very large and we only need a few elements, creating an integer array of the full length and initialize it may still be time consuming. That's probably why the function in the beginning uses this "in" operator - so it kind of serves as a map of int to int. I'm not sure about the overhead of a "new Array" in JS, I wonder if it would be more efficient with an actual map?

On another note, I noticed that this function is an exact copy of this answer. I wonder if it's by the same author... or someone just googled it and copied this. XD

Anyway, here's my implementation of this algorithm in C++:

#include<time.h>
#include<vector>
#include<unordered_map>
#include<random>
#include<chrono>

std::default_random_engine generator(chrono::system_clock::now().time_since_epoch().count());
using std::vector;
using std::unordered_map;
template<typename T>
vector<T> ranSub(const vector<T>& arr,size_t n,bool trunc=false){
    if(arr.size()<n){
        if(trunc) n=arr.size();
        else throw;
    }
    size_t len=arr.size();
    vector<T> res(n);
    unordered_map<size_t,size_t> taken;
    taken.reserve(n);
    while(n--){
        std::uniform_int_distribution<size_t> distribution(0,len-1);
        size_t x=distribution(generator);
        res[n]=arr[taken.find(x)!=taken.end()?taken[x]:x];
        taken[x]=taken.find(--len)!=taken.end()?taken[len]:len;
    }
    return res;
}

New York Times Digits Solver

Solver

Target is

Numbers are (seperate by space)

The closest solution is:

Complexity Analysis

This problem is quite similar to the 24 puzzle game, but slightly different. They both provide 4 binary operators that we can choose from, namely $+$ $-$ $*$ and $/$. Thus, any valid expression that you construct naturally forms a full binary tree, which has a nice recursive structure, where there're $m$ leaves, each takes the value of a number, and $m-1$ non-leaf nodes, each takes one of the operators. So, the question becomes, how many different trees can you construct from $n$ numbers?

Let's consider the case where we use all of the numbers. Any solutions that use less numbers would just be a subtree of a tree of $n$ leaves. Let's consider the structure of the tree first. Starting from 2 leaves, there is only one way to construct it: two numbers, one operator. When we include the third number, we must choose one of the leaves and replace it with an operator node, and attach two number nodes as its children. We repeat this process for the rest of the numbers, each time we choose a leaf and replace it with an operator node, then attach two number nodes. There are $k-1$ choices when we include the $k$th number, so over all there are $(n-1)!$ structures.

Next, the number of different permutations of the numbers is $n!$, and the number of all combinations of the operators is $4^{n-1}$. Before we multiply them, notice that this problem has a symmetry: if we exchange the left and right subtree of a '$+$' or '$*$' operator node, the value of the expression doesn't change. If we consider a '$-$' node, we can exchange the subtrees when the two value on the subtrees are the same, and if they are different, one of the expressions is invalid, because the game prevents you from subtracting a large number from a small number. (Even if it's allowed, it wouldn't make a difference if the target is positive and at least one of the given numbers is positive.) The same goes for '$/$' nodes. (Notice that in the case of the 24 puzzle, the situation is different. You have to use fractions in certain cases, e.g. 5 5 5 1, 3 3 7 7, etc.) So, for every ordering of the two subtrees of a operator node, only one needs to be considered. So the upper bound is $n!(n-1)!\frac{4^{n-1}}{2^{n-1}}=2^{n-1}n!(n-1)!$. In practice, though, the cases that one number is a multiple of the other is rare, so it would be closer to $\left(\frac{3}{2}\right)^{n-1}n!(n-1)!$, but it is a minor change on the complexity.

This is exactly how we are going to solve it. Each time we apply an operator on two of the numbers, record the new value (because we don't have to use all of the numbers, an intermediate result could be the solution. This is another difference between this problem and the 24 puzzle game) and its corresponding expression, then we take the new value and the rest of the numbers as the input to the same problem, but the number of the numbers is one less, so we can solve it recursively. If the solver finds the exact solution, we can return early. But if there's no exact solution, it returns the closest possible solution.

We can easily modify the function to return the first $k$ closest solutions, e.g. with a priority queue, but there will be many duplicated expressions with the same solution value. We can use a set to skip the expressions that give the same value that we've already found. Implementations are simple, so they're left as exercises! :)

An Example of Bad Code

I took a look at their JS code for puzzle generation, and I found this function:

function getSmartOperation(firstNum, secondNum) {
  var randInt = getRandomInt(0, 100)
  const quotient = firstNum / secondNum
  if (randInt < 20 && firstNum + secondNum <= 999) {
    // 20% chance of +
    return '+'
  } else if (
    randInt >= 21 &&
    randInt < 40 &&
    firstNum - secondNum >= 0
  ) {
    // 20% chance of -
    return 'âˆ’'
  } else if (
    randInt >= 41 &&
    randInt < 65 &&
    firstNum * secondNum <= 999
  ) {
    // 25% chance of x
    return 'Ã—'
  } else if (randInt >= 65 && Number.isInteger(quotient)) {
    // 35% chance of Ã·
    return 'Ã·'
  } else {
    return getSmartOperation(firstNum, secondNum)
  }
}

This is part of their algorithm to generate the target number from a randomly generated numbers' list. The problem with this function is in the last statement. If all the attempts to find an appropriate operator for the two numbers fail, it puts the same two numbers back into the function and repeats the process. Since the numbers are randomly generated, there's 50% chance that the first number is less than the second number, so when they are put back, if the RNG rolls to $-$ or $/$ again, it would just be a waste of time and we just roll the RNG again until it ends up being $+$ or $*$. The worse thing is, if the first number is smaller than the second number, and their summation and product are both larger than 999, the recursion will never end. (It doesn't happen in this particular scenario because the second number is always from the original list, which seems to be always small ($\leq 25$ from what I see), so if the first number $+$ the second one $\geq 999$, then the first number must be greater than the second one, so $-$ is always valid. But still, this function is bad in itself.)

A better approach is, exchange the two numbers if the first is less than the second, then attemp to assign the division operator with high probability. If they are not divisible, try multiplying and adding with their assigned probabilities. If both fail, just assign the $-$ operator, because the first number is larger or equal to the second number and the difference must be less or equal to the first number, it's guaranteed that the result is a valid number.