9.3 Find Median from Data Stream
9.3.1 Problem Metadata
- Platform: LeetCode
- Problem ID: 295
- Difficulty: Hard
- URL: https://leetcode.com/problems/find-median-from-data-stream/
- Tags: Blind 75, NeetCode 150
- Techniques: Heap, Design
9.3.2 Description
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
For example:
- For arr = [2,3,4], the median is 3.
- For arr = [2,3], the median is (2 + 3) / 2 = 2.5.
Implement the MedianFinder class:
MedianFinder()initializes theMedianFinderobject.void addNum(int num)adds the integernumfrom the data stream to the data structure.double findMedian()returns the median of all elements so far. Answers within \(10^{-5}\) of the actual answer will be accepted.
9.3.3 Examples
Example 1:
Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]
Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.addNum(2); // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3); // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0
9.3.4 Constraints
- \(-10^5 \le \text{num} \le 10^5\)
- There will be at least one element in the data structure before calling
findMedian. - At most \(5 \times 10^4\) calls will be made to
addNumandfindMedian.
9.3.5 Solution - Two Heaps
9.3.5.1 Walkthrough
This solution uses a dual-heap approach to maintain the median efficiently. Instead of sorting all elements on every query, we partition numbers into two balanced groups:
Core Strategy:
- Maintain two heaps:
- Max heap (
maxHeap1): Stores the smaller half of numbers (largest of the small half at root) - Min heap (
minHeap2): Stores the larger half of numbers (smallest of the large half at root)
- Max heap (
- Key invariants:
- Ordering: All elements in
maxHeap1\(\le\) all elements inminHeap2 - Balancing:
|maxHeap1.size() - minHeap2.size()| $\le$ 1
- Ordering: All elements in
- Why this works:
- The median is always at the “boundary” between the two halves
- If balanced equally: median = average of both heap roots
- If one heap has 1 more: median = root of the larger heap
- This gives O(1) median access!
Step-by-step execution for [1, 2, 3]:
addNum(1):
1. maxHeap1.offer(1) → maxHeap1=[1]
2. minHeap2.offer(maxHeap1.poll()) → maxHeap1=[], minHeap2=[1]
3. No rebalancing needed
Result: maxHeap1=[], minHeap2=[1]
findMedian() → minHeap2.peek() = 1
addNum(2):
1. maxHeap1.offer(2) → maxHeap1=[2]
2. minHeap2.offer(maxHeap1.poll()) → maxHeap1=[], minHeap2=[1,2]
3. minHeap2.size()=2 > maxHeap1.size()+1? YES (2 > 1)
4. maxHeap1.offer(minHeap2.poll()) → move 1 back
Result: maxHeap1=[1], minHeap2=[2]
findMedian() → (1 + 2) / 2 = 1.5 ✓
addNum(3):
1. maxHeap1.offer(3) → maxHeap1=[3,1] (3 on top of max heap)
2. minHeap2.offer(maxHeap1.poll()) → move 3 to minHeap2
→ maxHeap1=[1], minHeap2=[2,3] (2 on top of min heap)
3. minHeap2.size()=2 > maxHeap1.size()+1? NO (2 ≤ 2)
Result: maxHeap1=[1], minHeap2=[2,3]
findMedian() → minHeap2.peek() = 2.0 ✓
Why the two-step insertion works:
The pattern maxHeap1.offer(num); minHeap2.offer(maxHeap1.poll()) ensures:
- Every element passes through maxHeap1 first
- Only the largest element from maxHeap1 moves to minHeap2
- This maintains the ordering invariant: all maxHeap1 elements \(\le\) all minHeap2 elements
- Then we rebalance sizes if needed
Key insight: By forcing every element through maxHeap1 first, we guarantee that minHeap2 only receives elements that are larger than everything currently in maxHeap1 (except during rebalancing).
9.3.5.2 Analysis
- Time Complexity:
addNum(int num): O(log n)maxHeap1.offer(num): O(log n) - one heap insertionmaxHeap1.poll(): O(log n) - one heap extractionminHeap2.offer(): O(log n) - one heap insertion- Conditional rebalancing (worst case): 2 more operations × O(log n)
- Total: at most 5 heap operations × O(log n) = O(log n)
findMedian(): O(1)- Just peek at heap roots (direct array access)
- Overall for n operations: O(n log n)
- Much better than naive O(n² log n) (sorting on every query)
- Space Complexity: O(n)
- Two heaps store all n elements
- No additional data structures needed
Why this is optimal: - You cannot do better than O(n) space (must store all elements) - O(log n) insertion is optimal for maintaining sorted order - O(1) median query is optimal (cannot be faster than constant time)
9.3.5.3 Implementation Steps
Initialization:
1. Create max heap for smaller half: PriorityQueue<>(Collections.reverseOrder())
2. Create min heap for larger half: PriorityQueue<>()
Adding a number:
3. Always insert into maxHeap1 first
4. Immediately move maxHeap1’s root to minHeap2 to maintain ordering
5. Rebalance if minHeap2 becomes too large:
- If minHeap2.size() > maxHeap1.size() + 1, move minHeap2’s root back to maxHeap1
Finding median:
6. If heaps are equal size: return average of both roots
7. Otherwise: return root of the larger heap (which is minHeap2 in our implementation)
Invariant maintenance:
- After addNum(), minHeap2.size() is either equal to or exactly 1 more than maxHeap1.size()
- All elements in maxHeap1 are \(\le\) all elements in minHeap2
9.3.5.4 Code - Java
class MedianFinder {
// Max heap: stores smaller half (largest of small half at root)
private PriorityQueue<Integer> maxHeap1 = new PriorityQueue<>(Collections.reverseOrder());
// Min heap: stores larger half (smallest of large half at root)
private PriorityQueue<Integer> minHeap2 = new PriorityQueue<>();
public MedianFinder() {
// No initialization needed
}
public void addNum(int num) {
// Always add to maxHeap1 first
maxHeap1.offer(num);
// Move largest element from maxHeap1 to minHeap2
// This maintains ordering: all maxHeap1 <= all minHeap2
minHeap2.offer(maxHeap1.poll());
// Rebalance: if minHeap2 gets too large, move smallest back
if (minHeap2.size() > maxHeap1.size() + 1) {
maxHeap1.offer(minHeap2.poll());
}
}
public double findMedian() {
// Equal sizes: median is average of both roots
if (maxHeap1.size() == minHeap2.size()) {
return (maxHeap1.peek() + minHeap2.peek()) / 2.0;
} else {
// minHeap2 has 1 extra element, its root is the median
return minHeap2.peek();
}
}
}