M347. Top K Frequent Elements description: Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order. Constraints: $1 <= nums.length <= 10^5$ $-10^4 <= nums[i] <= 10^4$ $k$ is in the range [1, the number of unique elements in the array]. It is guaranteed that the answer is unique.
(Update: Apparently using "nth_element" would be the most efficient algorithm, which is O(n), better than the algorithms below.) A very easy problem for a medium. But I did a few modifications to the typical approach. The code below is the "fastest" 0ms submission, which takes 12~20ms currently.
```cpp struct myComp { constexpr bool operator()( pair<int, int> const& a, pair<int, int> const& b) const noexcept { return a.second < b.second; } }; class Solution { public: vector<int> topKFrequent(vector<int>& nums, int k) { unordered_map<int,int> mp; for(int i:nums) mp[i]++; priority_queue<pair<int,int>, vector<pair<int,int>>, myComp> pq; for(auto i:mp) pq.push(i); vector<int> ans; for(int i = 0; i < k; i++) { auto top = pq.top(); ans.push_back(top.first); pq.pop(); } return ans; } }; ```
First we count the frequencies, then we find the top $k$ ones with the highest frequency. The code above used a priority queue (pq) to record the order, but adding a number to a pq is $O(\log n)$, and adding $n$ numbers is $O(n\log n - n)$, which is not much faster than using a vector and sorting it. We can do much better with the following code:
```cpp class Solution { public: vector<int> topKFrequent(vector<int>& nums, int k) { unordered_map<int,int> mp; for(int i:nums) mp[i]--; priority_queue<array<int,2>> pq; int i=0; auto it=mp.begin(); for(;i<k;++i,++it) pq.emplace(array<int,2>{it->second,it->first}); for(;it!=mp.end();++it){ if(it->second>pq.top()[0]) continue; pq.emplace(array<int,2>{it->second,it->first}); pq.pop(); } vector<int> ans; while(!pq.empty()){ ans.emplace_back(pq.top()[1]); pq.pop(); } reverse(ans.begin(),ans.end()); return ans; } }; ```
The trick is, since we only need the top $k$ ones, we can just store those in the pq. And each time we find a value that is larger than the smallest one in the pq, we remove the smallest one and add the new one. So we actually need a pq that implememts a minimum heap. We can do that by writing a custom comparator, but that's unnecessary - if we negate the numbers, the result of comparison would be inverted. So in the first step, instead of adding the frequency count, we subtract it. Then we add the first $k$ elements, then each time if we encounter an element that has a higher frequency then the minimum frequency that we have, we remove the minimum frequency one and add the new one. The operation is only $O(\log k)$. In the end, we extract the corresponding values, but they will be in the order of small to large, so we need to reverse the order, which is $O(k)$. It's a much better algorithm when $n>>k$.
No comments:
Post a Comment