Heap
堆的本质是一个树型结构complete BT,具体实现不一定。complete BT, perfectly balanced except in the bottom level. Height is log N
- 可以用数组实现,对于任意index,他的parent节点坐标为(index-1)/2,左儿子为2index+1,右儿子为2index+2
- or the other way
- parent of node at k is at k/2
- children of node at k are at 2k and 2k+1
HeapSort
- promotion in a heap, when child's key becomes larger key than parent's key
void swim(int k) { while (k > 1 && less(k/2, k)) { exch(k, k/2); k = k/2; } } - demotion, when parent's key becomes smaller than one or both child's
void sink(int k) { while (2*k <= N) { int j = 2*k; // find the larger child node if (j < N && less(j, j+1)) j++ if (!less(k, j)) break; exch(k, j); k = j; } } - When delete the max from the heap, exchange the top with the last element in the array
- HeapSort, in place sorting and NlogN
- but makes poor use of cache memory
- not stable
- more inner compares than quicksort
void sort(Comparable[] a) { int N = a.length; // build heap using bottom-up method for (int k = N/2; k >= 1; k--) { sink(a, k, N) } // remove the max every time, after that array will be sorted while (N > 1) { exch(a, 1, N--); sink(a, 1, N); } }
PriorityQueue
push O(log n)
- 本质是先add到末尾,再进行shiftup,在一个complete BT 上最多向上log n次
poll O(log n)
- 本质是先与末尾元素交换,然后删掉,最后把被交换的元素进行shiftdown
remove O(n) 用HashHeap可以降为O(logn)
top O(1)
- 若有重复元素,则多加入一个count来判断,每次操作完进行相应的更新
转化为max heap
PriorityQueue<Integer> heap = new PriorityQueue<Integer>(Collections.reverseOrder());
- 另外当求第k个元素的时候,可以通过变成min/max heap来降低复杂度,即使得heap只需要保存k个元素
这么做有两个好处:
- 把时间复杂度从单纯的排序比较O(nlogn)降到了O(nlogk)
- 可以处理realtime stream (online) 具体实现起来有两种方法: 对于获得前k个大的元素可以
- max heap, 初始化为
PriorityQueue<Integer> heap = new PriorityQueue<Integer>(k, Collections.reverseOrder()); - min heap, 当heap.size() > k, heap.poll()这样较小的元素就被pop出去,只剩下最大的那些
- 可以根据排序的要求自己实现comparator,
class PointComp implements Comparator<Point> {
public int compare(Point a, Point b) {
}
}
PriorityQueue<Point> heap = new PriorityQueue<>(new PointComp());
PriorityQueue<Node> pq = new PriorityQueue<Node>(initCapacity, new Comparator<Node>() {
public int compare(Node n1, Node n2) {
// compare n1 and n2
}
});
Comparator
http://www.lintcode.com/accounts/signin/?next=/problem/kth-largest-in-n-arrays/
Find Median from Data Stream
- 还是有点quickselect的思想,在中位数的左边有x个数,则右边必须是x或者x+1个元素。如果当前的数不是中位数,那么就可以左右两边的个数差异知道,必须取左边的最大或者右边的最小来进行新的中位数的尝试。于是就想到了MaxHeap和MinHeap