查找给定数组中出现次数最多的k个数字

原文: https://www.geeksforgeeks.org/find-k-numbers-occurrences-given-array/

给定n个数字和一个正整数k组成的数组。 问题是找到出现次数最多的k个数字,即找到频率最高的前k个数字。 如果两个数字具有相同的频率,则应优先选择较大的数字。 数字应按其频率降序显示。 假设该数组由出现次数最多的k个数字组成。

示例

Input: 
arr[] = {3, 1, 4, 4, 5, 2, 6, 1}, 
k = 2
Output: 4 1
Explanation:
Frequency of 4 = 2
Frequency of 1 = 2
These two have the maximum frequency and
4 is larger than 1.

Input : 
arr[] = {7, 10, 11, 5, 2, 5, 5, 7, 11, 8, 9},
k = 4
Output: 5 11 7 10
Explanation: 
Frequency of 5 = 3
Frequency of 11 = 2
Frequency of 7 = 2
Frequency of 10 = 10
These four have the maximum frequency and
10 is largest among rest.

方法 1

  • 方法:思维过程应从创建HashMap开始,以将元素频率对存储在HashMap中。 HashMap用于在固定时间内执行插入和更新。 然后以频率降序对元素-频率对进行排序。 这给出了有关每个元素及其在数组中存在的次数的信息。 要获取数组的k个元素,请打印已排序数组的前k个元素。

  • HashmapHashMap是 Java 1.2 以来的 Java 集合的一部分。 它提供了 Java Map接口的基本实现。 它以(键,值)对存储数据。 要访问一个值,必须知道其键。 HashMap被称为哈希映射,因为它使用了一种称为哈希的技术。 哈希是一种将大字符串转换为代表相同字符串的小字符串的技术。 较短的值有助于索引编制和更快的搜索。 HashSet还内部使用HashMap。 它在内部使用链表来存储已在HashSet中详细解释的键值对以及其他文章。

    有关HashMap的更多信息

  • 算法

    1. 创建一个Hashmap hm,以存储键值对,即元素频率对。

    2. 从头到尾遍历数组。

    3. 对于数组中的每个元素更新hm[array[i]]++

    4. 将元素频率对存储在向量中,然后按频率降序对向量进行排序。

    5. 打印排序数组的前k个元素。

  • 实现

    C++

    ```

    // C++ implementation to find k numbers with most // occurrences in the given array

    include

    using namespace std;

    // comparison function to sort the 'freq_arr[]' bool compare(pair p1, pair p2) {     // if frequencies of two elements are same     // then the larger number should come first     if (p1.second == p2.second)         return p1.first > p2.first;

    // sort on the basis of decreasing order     // of frequencies     return p1.second > p2.second; }

    // funnction to print the k numbers with most occurrences void print_N_mostFrequentNumber(int arr[], int n, int k) {     // unordered_map 'um' implemented as frequency hash table     unordered_map um;     for (int i = 0; i < n; i++)         um[arr[i]]++;

    // store the elements of 'um' in the vector 'freq_arr'     vector > freq_arr(um.begin(), um.end());

    // sort the vector 'freq_arr' on the basis of the     // 'compare' function     sort(freq_arr.begin(), freq_arr.end(), compare);

    // display the top k numbers     cout << k << " numbers with most occurrences are:\n";     for (int i = 0; i < k; i++)         cout << freq_arr[i].first << " "; }

    // Driver program to test above int main() {     int arr[] = { 3, 1, 4, 4, 5, 2, 6, 1 };     int n = sizeof(arr) / sizeof(arr[0]);     int k = 2;     print_N_mostFrequentNumber(arr, n, k);     return 0; }

    ```

    Java

    ```

    // Java implementation to find k elements with max occurence. import java.util.*; public class KFrequentNumbers {     static void print_N_mostFrequentNumber(int[] arr, int n, int k)     {

    Map mp = new HashMap();

    // Put count of all the distinct elements in Map         // with element as the key & count as the value.         for (int i = 0; i < n; i++) {

    // Get the count for the element if already              // present in the Map or get the default value             // which is 0.             mp.put(arr[i], mp.getOrDefault(arr[i], 0) + 1);         }

    // Create a list from elements of HashMap         List > list =            new ArrayList >(mp.entrySet());

    // Sort the list         Collections.sort(list, new Comparator >() {             public int compare(Map.Entry o1,                                Map.Entry o2)             {                 if (o1.getValue() == o2.getValue())                     return o2.getKey() - o1.getKey();                 else                     return o2.getValue() - o1.getValue();             }         });

    for (int i=0; i<k; i++)            System.out.println(list.get(i).getKey());     }

    // Driver Code to test the code.     public static void main(String[] args)     {         int arr[] = { 3, 1, 4, 4, 5, 2, 6, 1 };         int n = arr.length;         int k = 2;         print_N_mostFrequentNumber(arr, n, k);     } }

    ```

    Python3

    ```

    Python3 implementation to find k numbers

    with most occurrences in the given array

    funnction to print the k numbers with

    most occurrences

    def pr_N_mostFrequentNumber(arr, n, k):

    um = {}     for i in range(n):         if arr[i] in um:             um[arr[i]] += 1         else:             um[arr[i]] = 1     a = [0] * (len(um))     j = 0     for i in um:         a[j] = [i, um[i]]         j += 1     a = sorted(a, key = lambda x : x[0],                          reverse = True)     a = sorted(a, key = lambda x : x[1],                           reverse = True)

    # display the top k numbers      print(k, "numbers with most occurrences are:")     for i in range(k):         print(a[i][0], end = " ")

    Driver code

    if name =="main":     arr = [3, 1, 4, 4, 5, 2, 6, 1]     n = 8     k = 2     pr_N_mostFrequentNumber(arr, n, k)

    This code is contributed by

    Shubham Singh(SHUBHAMSINGH10)

    ```

    [

    输出

    ``` 2 numbers with most occurrences are: 4 1

    ```

  • 复杂度分析

    • 时间复杂度O(d log d),其中d是数组中不同元素的计数。 要对数组排序需要O(d log d)时间。

    • 辅助空间O(d),其中d是数组中不同元素的计数。 要将元素存储在HashMap中,需要O(d)空间复杂性。

方法 2

  • 方法: Create a HashMap to store element-frequency pair in the HashMap. HashMap is used to perform insertion and updation in constant time. Then use a priority queue to store the element-frequency pair (Max-Heap). This gives the element which has maximum frequency at the root of the Priority Queue. Remove the top or root of Priority Queue K times and print the element. To insert and delete the top of the priority queue O(log n) time is required.

    优先级队列:优先级队列是一种容器适配器,经过专门设计,使得队列中的第一个元素是队列中所有元素中最大的,并且元素的顺序是非递增的(因此我们可以看到,队列中的每个元素都具有优先级【固定顺序】)。

    有关优先级队列的更多信息:C++ STL priority_queue

  • 算法

    1. 创建一个Hashmap hm,以存储键值对,即元素频率对。

    2. 从头到尾遍历数组。

    3. 对于数组中的每个元素更新hm[array[i]]++

    4. 将元素频率对存储在优先级队列中并创建优先级队列,这需要O(n)时间。

    5. 运行k次循环,并在每次迭代中删除优先级队列的顶部并打印该元素。

  • 实现

    CPP

    ```

    // C++ implementation to find k numbers with most // occurrences in the given array

    include

    using namespace std;

    // comparison function defined for the priority queue struct compare {     bool operator()(pair p1, pair p2)     {         // if frequencies of two elements are same         // then the larger number should come first         if (p1.second == p2.second)             return p1.first < p2.first;

    // insert elements in the priority queue on the basis of         // decreasing order of frequencies         return p1.second < p2.second;     } };

    // funnction to print the k numbers with most occurrences void print_N_mostFrequentNumber(int arr[], int n, int k) {     // unordered_map 'um' implemented as frequency hash table     unordered_map um;     for (int i = 0; i < n; i++)         um[arr[i]]++;

    // store the elements of 'um' in the vector 'freq_arr'     vector > freq_arr(um.begin(), um.end());

    // priority queue 'pq' implemented as max heap on the basis     // of the comparison operator 'compare'     // element with the highest frequency is the root of 'pq'     // in case of conflicts, larger element is the root     priority_queue, vector\ >,                    compare>         pq(um.begin(), um.end());

    // display the top k numbers     cout << k << " numbers with most occurrences are:\n";     for (int i = 1; i <= k; i++) {         cout << pq.top().first << " ";         pq.pop();     } }

    // Driver program to test above int main() {     int arr[] = { 3, 1, 4, 4, 5, 2, 6, 1 };     int n = sizeof(arr) / sizeof(arr[0]);     int k = 2;     print_N_mostFrequentNumber(arr, n, k);     return 0; }

    ```

    Java

    ```

    // Java implementation to find k elements with max occurence. import java.util.*; public class KFrequentNumbers {     static void print_N_mostFrequentNumber(int[] arr, int n, int k)     {         Map mp = new HashMap();

    // Put count of all the distinct elements in Map         // with element as the key & count as the value.         for (int i = 0; i < n; i++) {

    // Get the count for the element if already present in the Map             // or get the default value which is 0.             mp.put(arr[i], mp.getOrDefault(arr[i], 0) + 1);         }

    // Create a Priority Queue to sort based on the count         // or on the key if the count is same         PriorityQueue > queue =             new PriorityQueue<>(             (a, b) -> a.getValue().equals(b.getValue()) ?                        Integer.compare(b.getKey(), a.getKey()) :                        Integer.compare(b.getValue(), a.getValue()));

    // Insert the data from the map to the Priority Queue.         for (Map.Entry entry : mp.entrySet())             queue.offer(entry);

    // Print the top k elements         for (int i = 0; i < k; i++) {             System.out.println(queue.poll().getKey());         }     }

    // Driver Code to test the code.     public static void main(String[] args)     {         int arr[] = { 3, 1, 4, 4, 5, 2, 6, 1 };         int n = arr.length;         int k = 2;         print_N_mostFrequentNumber(arr, n, k);     } }

    // This code is contributed by Shubham Kumar Shah

    ```

    输出

    ``` 2 numbers with most occurrences are: 4 1

    ```

  • 复杂度分析

    • 时间复杂度O(k log d + d),其中d是数组中不同元素的计数。

      要删除优先队列的顶部,需要O(log d)时间,因此,如果删除了k个元素,则需要O(k log d)时间,并且需要遍历不同元素O(d)时间。

    • 辅助空间O(d),其中d是数组中不同元素的计数。

      要将元素存储在HashMap中,需要O(d)空间复杂性。

在线性时间中查找最常见的k

参考https://www.careercup.com/question?id=5082885552865280