9.3 Find Median from Data Stream

9.3.1 Problem Metadata

9.3.2 Description

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.

For example: - For arr = [2,3,4], the median is 3. - For arr = [2,3], the median is (2 + 3) / 2 = 2.5.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far. Answers within \(10^{-5}\) of the actual answer will be accepted.

9.3.3 Examples

Example 1:

Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]

Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3);    // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0

9.3.4 Constraints

  • \(-10^5 \le \text{num} \le 10^5\)
  • There will be at least one element in the data structure before calling findMedian.
  • At most \(5 \times 10^4\) calls will be made to addNum and findMedian.

9.3.5 Solution - Two Heaps

9.3.5.1 Walkthrough

This solution uses a dual-heap approach to maintain the median efficiently. Instead of sorting all elements on every query, we partition numbers into two balanced groups:

Core Strategy:

  1. Maintain two heaps:
    • Max heap (maxHeap1): Stores the smaller half of numbers (largest of the small half at root)
    • Min heap (minHeap2): Stores the larger half of numbers (smallest of the large half at root)
  2. Key invariants:
    • Ordering: All elements in maxHeap1 \(\le\) all elements in minHeap2
    • Balancing: |maxHeap1.size() - minHeap2.size()| $\le$ 1
  3. Why this works:
    • The median is always at the “boundary” between the two halves
    • If balanced equally: median = average of both heap roots
    • If one heap has 1 more: median = root of the larger heap
    • This gives O(1) median access!

Step-by-step execution for [1, 2, 3]:

addNum(1):
  1. maxHeap1.offer(1) → maxHeap1=[1]
  2. minHeap2.offer(maxHeap1.poll()) → maxHeap1=[], minHeap2=[1]
  3. No rebalancing needed
  Result: maxHeap1=[], minHeap2=[1]
  findMedian() → minHeap2.peek() = 1

addNum(2):
  1. maxHeap1.offer(2) → maxHeap1=[2]
  2. minHeap2.offer(maxHeap1.poll()) → maxHeap1=[], minHeap2=[1,2]
  3. minHeap2.size()=2 > maxHeap1.size()+1? YES (2 > 1)
  4. maxHeap1.offer(minHeap2.poll()) → move 1 back
  Result: maxHeap1=[1], minHeap2=[2]
  findMedian() → (1 + 2) / 2 = 1.5 ✓

addNum(3):
  1. maxHeap1.offer(3) → maxHeap1=[3,1] (3 on top of max heap)
  2. minHeap2.offer(maxHeap1.poll()) → move 3 to minHeap2
     → maxHeap1=[1], minHeap2=[2,3] (2 on top of min heap)
  3. minHeap2.size()=2 > maxHeap1.size()+1? NO (2 ≤ 2)
  Result: maxHeap1=[1], minHeap2=[2,3]
  findMedian() → minHeap2.peek() = 2.0 ✓

Why the two-step insertion works:

The pattern maxHeap1.offer(num); minHeap2.offer(maxHeap1.poll()) ensures: - Every element passes through maxHeap1 first - Only the largest element from maxHeap1 moves to minHeap2 - This maintains the ordering invariant: all maxHeap1 elements \(\le\) all minHeap2 elements - Then we rebalance sizes if needed

Key insight: By forcing every element through maxHeap1 first, we guarantee that minHeap2 only receives elements that are larger than everything currently in maxHeap1 (except during rebalancing).

9.3.5.2 Analysis

  • Time Complexity:
    • addNum(int num): O(log n)
      • maxHeap1.offer(num): O(log n) - one heap insertion
      • maxHeap1.poll(): O(log n) - one heap extraction
      • minHeap2.offer(): O(log n) - one heap insertion
      • Conditional rebalancing (worst case): 2 more operations × O(log n)
      • Total: at most 5 heap operations × O(log n) = O(log n)
    • findMedian(): O(1)
      • Just peek at heap roots (direct array access)
    • Overall for n operations: O(n log n)
      • Much better than naive O(n² log n) (sorting on every query)
  • Space Complexity: O(n)
    • Two heaps store all n elements
    • No additional data structures needed

Why this is optimal: - You cannot do better than O(n) space (must store all elements) - O(log n) insertion is optimal for maintaining sorted order - O(1) median query is optimal (cannot be faster than constant time)

9.3.5.3 Implementation Steps

Initialization: 1. Create max heap for smaller half: PriorityQueue<>(Collections.reverseOrder()) 2. Create min heap for larger half: PriorityQueue<>()

Adding a number: 3. Always insert into maxHeap1 first 4. Immediately move maxHeap1’s root to minHeap2 to maintain ordering 5. Rebalance if minHeap2 becomes too large: - If minHeap2.size() > maxHeap1.size() + 1, move minHeap2’s root back to maxHeap1

Finding median: 6. If heaps are equal size: return average of both roots 7. Otherwise: return root of the larger heap (which is minHeap2 in our implementation)

Invariant maintenance: - After addNum(), minHeap2.size() is either equal to or exactly 1 more than maxHeap1.size() - All elements in maxHeap1 are \(\le\) all elements in minHeap2

9.3.5.4 Code - Java

class MedianFinder {
    // Max heap: stores smaller half (largest of small half at root)
    private PriorityQueue<Integer> maxHeap1 = new PriorityQueue<>(Collections.reverseOrder());
    // Min heap: stores larger half (smallest of large half at root)
    private PriorityQueue<Integer> minHeap2 = new PriorityQueue<>();

    public MedianFinder() {
        // No initialization needed
    }

    public void addNum(int num) {
        // Always add to maxHeap1 first
        maxHeap1.offer(num);

        // Move largest element from maxHeap1 to minHeap2
        // This maintains ordering: all maxHeap1 <= all minHeap2
        minHeap2.offer(maxHeap1.poll());

        // Rebalance: if minHeap2 gets too large, move smallest back
        if (minHeap2.size() > maxHeap1.size() + 1) {
            maxHeap1.offer(minHeap2.poll());
        }
    }

    public double findMedian() {
        // Equal sizes: median is average of both roots
        if (maxHeap1.size() == minHeap2.size()) {
            return (maxHeap1.peek() + minHeap2.peek()) / 2.0;
        } else {
            // minHeap2 has 1 extra element, its root is the median
            return minHeap2.peek();
        }
    }
}