Friday, March 11, 2022

Rearrange String k Distance Apart very easy O(n) solution

Problem description:

Given a non-empty string s and an integer k, rearrange the string such that the same characters are at least distance k from each other.
All input strings are given in lowercase letters. If it is not possible to rearrange the string, return an empty string "".
 
Example 1:
Input: s = "aabbcc", k = 3
Output: "abcabc"
Explanation:
The same letters are at least distance 3 from each other.
Example 2:
Input: s = "aaabc", k = 3
Output: ""
Explanation:
It is not possible to rearrange the string.
Example 3:
Input: s = "aaadbbcc", k = 2
Output: "bacabcd"
Explanation:
Another possible answer is: "abcabcda"
The same letters are at least distance 2 from each other.
 

 (Update: There is a much simpler construction, which I posted here. I'll update the article when I have time.)

After I solved this problem, It's funny to see that people used all kinds of complicated data structures for this problem, e.g. heap (priority queue), hash map, etc.

In fact, there is an extremely simple algorithm that doesn't need any of them and runs in O(n) time by construction.

Let's look at the problem. We must separate the same characters by distance at least k, so let's simply cut it into n/k pieces, each has length k. The last one must have length n%k, and it looks like this:

The most problematic character is, of course, the one that shows up the most. So first we count the frequency of the characters, and sort in descending order (sorting has complexity O(n lg n), but since the number of characters in a string are upper-bounded by a constant, it's considered O(1) here).

We simply put the first character in the front of each substring. If the frequency is more than the number of substrings, then it's impossible.

Next, we simply put the second character in the next empty place, continuing from where the last character was put. E.g, if we have 6 of 'x' and 5 of 'y', and 8 substrings, then after filling in the first two characters, the substrings are

xy
xy
xy
x
x
x
y
y

It would violate the condition if the last 'y' reaches or goes beyond the substring above the first 'y' - but it won't happen, since the frequency of the second character must be equal or smaller than the first one, so if that happens, then it must happen to the first character which is before it. Since we must check if the frequency of the first character is more than the number of substrings, we would've already noticed that it's impossible. And if they all have the same length n/k(+1), then 'y' would just fill in all the substrings from the first to the last as well, so there won't be any problems either!

Now we can fill in the characters from top to bottom, when it's full we move to the right. If n%k==0, that's all we need to do!

It would be a little more complicated if n%k!=0. Let's see what happens.

When we reach the point where the first n%k columns are full, and part of the next column is occupied, if the next character has frequency n/k, it will become a failure, because the last character will be right above the row of the first character, and it's one space to the right, so the distance is k-1:
So, how do we prevent this?
Since we know that if this character has frequency n/k+1 then it's impossible (we simply return empty string), we can guarantee that the frequency of this character is at most n/k. So, if we start from a new column, there wouldn't be any problem like this, just like what we did in the beginning! And we can bring the last few characters back to where we skipped!
Does this work? Yes, if there are enough columns. But if not, it may still cause a failure:
Suppose after we filled in the first n%k columns, and we skip the partially filled column, and we fill the empty column next to it with the next character, x. Then we keep going with the next character, y, which is the last character. Now, if we bring it back to the empty space, they will be to the left of the first few y's!

So, how can we fix it?








Notice that it's because the last few characters are brought from the top to the bottom, but the first few characters were already in the bottom, maybe we can change the direction! If we go from bottom to top, is this going to fix the problem?
Yes! This time, even if the character has frequency n/k, its tail wouldn't reach its head because of the same reasoning that we stated in the beginning. And we can simply move the last part to the front! Even if there's only one extra column, it wouldn't cause any trouble:







But hey, remember that the rows are strings, it seems a little too complicated to fill in the first n%k+1 columns, then skip some of the characters, fill the columns after it, then come back and fill in the last few characters from where we skipped. Can we do it in a better way?

Since we already reversed the direction of going up or down, what if we also reverse the direction of left and right?
Now, after we filled the first n%k columns, we simply continue from the right-most column and start filling it down to up! There wouldn't be any conflict due to the same reasoning we did for the left side, and it will end up where we left at the (n%k+1)th column.

The better thing is, we don't even need to do it backwards - we can simply reverse the order of filling it! After filling the first n%k columns, we check if the next character has frequency <= n/k, if not we simply return empty string, but if it's true, we fill the rest of the spaces from the character with the lowest frequency, and keep using the characters in the ascending order of frequency!

This way, we don't need any fancy data structures or complicated algorithms, it's all straight forward.

No comments:

Post a Comment

A Recursive Geometric Proof for 1962 IMO Problem 6

Youtube recommended a video to me, where the following problem is solved: Consider an isosceles triangle. Let $R$ be the radius of its circ...