Median of Data Stream
Jun 28, 2023
C++
Data Structures
Heaps
Multisets

Let's think of a question :-

Problem Statement

We are given a stream of integers that are being added and removed from the stream one at a time. We need to find the median of the integers at any given point of time.

Solution

Approach 1

The first approach that come to mind is to keep the integers in a list and sort the list every time we need to find the median. This approach is not efficient as sorting the list will take O(nlogn) time. Therefore the time complexity of this approach will be O(n^2logn).

Approach 2

The second approach (more optimised) is to use two heaps. But I prefer the use of multisets which basically serve the same purpose. One multi-set will store the integers less than the median and the other multi-set will store the integers greater than the median. Thus the complexity of adding and removing the integers will be O(logn) and finding the median will be O(1). Therefore the overall time complexity of this approach will be O(nlogn).

1#include <bits/stdc++.h>
2using namespace std;
3
4void addNumber(vector<int> &stream, int x){
5 stream.push_back(x);
6 sort(stream.begin(), stream.end());
7}
8
9void removeNumber(vector<int> &stream, int x){
10 stream.erase(find(stream.begin(), stream.end(), x));
11}
12
13int main(){
14 vector<int> stream;
15 int n;
16 cin >> n;
17 for(int i=0; i<n; i++){
18 int x;
19 cin >> x;
20 addNumber(stream, x);
21 cout << "Median : " << stream[stream.size()/2] << endl;
22 }
23 return 0;
24}
25

Some problems related to this question are :-

currently a placeholder for footer