LearnProductTeam
Stacks and Queues
Find Median From Data Stream
Omer Goldberg
January 22, 2020
17 min

Start now! 🚀🚀

Join our beta to get free access to our full syllabus 🎉 🥳

Problem 🤔

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

Design a data structure that supports the following two operations:

addNum(num: int) -> None

Add a integer number from the data stream to the data structure.

findMedian() -> double

Return the median of all elements so far.

Example 1

Input: [2,3,4]

Output: 3

Example 2

Input: [2,3]

Output: (2 + 3) / 2 = 2.5

Example 3

Real World Applications 🌎

For large scale systeam that stream a lot of data, we’d often want to keep track of the median observed. There are a lot of use cases for this! A simple one could be anomaly detection. So we could be monitoring a specific data type coming into the system and looking for large spikes or drops.

Anything out of the oridinary really

Also, fun story — this question was on one of my finals durng my undergrad degree. You can imagine our reactions…

I was definitley groaning out loud during this final

The Finer Details 🔎

In the example we can see a distinction in which the input is even or odd length.

Odd length - When the input is odd length, finding the median is easy, because our list has a middle.

Even length - When the input is even length, finding the median is simply the average of the “middle-most” items.

What would be the most naive way to do this?

We can store all items in a list, and simply sort it when getMedian is called. If the list is odd we return the middle elements, otherwise we return the average of the “two middle elements”.

Naive Solution 💡

Complexity Analysis 🧮

Time Complexity

addNum -> O(1)

findMedian -> O(n* log(n))

Space Complexity

O(n)

Optimized Solution

Sorted the list every time we call getMedian, seems sub-optimal to say the least. What can we observe from the naive implementation above? Well, we only need to really keep track of the middle elements!

Keeping track of middle elements… hmm.. sorta sounds similar to keeping track of min and max elements. This should remind us about using a heap!

But heaps are for min and max elements. So how can manipulate heaps to get the middle elements.

We have heaps for max and min. And we want the middle element.. So let’s keep two heaps!

1) lowerHalf will be a maxHeap to store the smaller numbers 2) higherHalf will be a minHeap to store the larger nuumbers

NOTE

  • Python only supports min heaps natively

  • To achieve a maxheap we’ll need to multiply each number by -1 on its way in and out! This is a bit hacky, but hey, life is short

This is how I excuse all my bad code, and my default answer for most of my character flaws

Our general apporach will be to always add an incoming number to the lowerHalf heap and then to check our validity conditions:

  • All elements in lowerHalf < all elements in higherHalf
  • If the difference in heap size is more than 1, than we balance them appropiatley

When getMedian is invoked we can simply return the average between the largest element in the maxHeap and the smallest element in the minHeap! If the amount of items observed until now is odd, than we can return the top element off the heap with the bigger size!

Takeaways 🎉🥳

  • Getting the middle element via 2 heaps

Resources 📚

  • Leetcode

Tags

#stacksqueues#leetcode

Omer Goldberg

Tech Lead

crafting mobile experiences @ instagram. tech Lead @ facebook. former algorithm and fullstack Teacher @ israel tech challenge. crypto crazy.

Expertise

Algorithms
System Design
Android
Blockchain

Social Media

instagramlinkedinwebsite

Related Posts
18

Implement A Stack Using Queues
Implement A Stack Using Queues
January 22, 2020
22 min
© 2024, All Rights Reserved.

Quick Links

HomeLearnProductInstructors

Social Media