Tuesday, May 23, 2023

Leetcode M347 Top K Frequent Elements Optimization

M347. Top K Frequent Elements description:
  Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Constraints:
  $1 <= nums.length <= 10^5$
  $-10^4 <= nums[i] <= 10^4$
  $k$ is in the range [1, the number of unique elements in the array].
  It is guaranteed that the answer is unique.
(Update: Apparently using "nth_element" would be the most efficient algorithm, which is O(n), better than the algorithms below.)

A very easy problem for a medium. But I did a few modifications to the typical approach. The code below is the "fastest" 0ms submission, which takes 12~20ms currently.
```cpp
struct myComp {
    constexpr bool operator()(
        pair<int, int> const& a,
        pair<int, int> const& b)
        const noexcept
    {
        return a.second < b.second;
    }
};
class Solution {
public:   
    vector<int> topKFrequent(vector<int>& nums, int k) {
        unordered_map<int,int> mp;
        for(int i:nums) mp[i]++;
        
        priority_queue<pair<int,int>, vector<pair<int,int>>, myComp> pq;
        for(auto i:mp) pq.push(i);
        
        vector<int> ans;
        for(int i = 0; i < k; i++) {
            auto top = pq.top();
            ans.push_back(top.first);
            pq.pop();
        }
        return ans;
    }
};
```
First we count the frequencies, then we find the top $k$ ones with the highest frequency. The code above used a priority queue (pq) to record the order, but adding a number to a pq is $O(\log n)$, and adding $n$ numbers is $O(n\log n - n)$, which is not much faster than using a vector and sorting it. We can do much better with the following code:
```cpp
class Solution {
public:   
    vector<int> topKFrequent(vector<int>& nums, int k) {
        unordered_map<int,int> mp;
        for(int i:nums) mp[i]--;
        
        priority_queue<array<int,2>> pq;
        int i=0;
        auto it=mp.begin();
        for(;i<k;++i,++it) pq.emplace(array<int,2>{it->second,it->first});
        for(;it!=mp.end();++it){
            if(it->second>pq.top()[0]) continue;
            pq.emplace(array<int,2>{it->second,it->first});
            pq.pop();
        }
        vector<int> ans;
        while(!pq.empty()){
            ans.emplace_back(pq.top()[1]);
            pq.pop();
        }
        reverse(ans.begin(),ans.end());
        return ans;
    }
};
```
The trick is, since we only need the top $k$ ones, we can just store those in the pq. And each time we find a value that is larger than the smallest one in the pq, we remove the smallest one and add the new one. So we actually need a pq that implememts a minimum heap. We can do that by writing a custom comparator, but that's unnecessary - if we negate the numbers, the result of comparison would be inverted. So in the first step, instead of adding the frequency count, we subtract it.

Then we add the first $k$ elements, then each time if we encounter an element that has a higher frequency then the minimum frequency that we have, we remove the minimum frequency one and add the new one. The operation is only $O(\log k)$. In the end, we extract the corresponding values, but they will be in the order of small to large, so we need to reverse the order, which is $O(k)$. It's a much better algorithm when $n>>k$.

No comments:

Post a Comment

A Recursive Geometric Proof for 1962 IMO Problem 6

Youtube recommended a video to me, where the following problem is solved: Consider an isosceles triangle. Let $R$ be the radius of its circ...