Today, I was asked in an interview to build a data structure as follows:
Let there be some elements and some groups. Each element is associated with exactly one group and has a score. The data structure must support the following operations:
insert(el_id, grp_id, x): insert an element with id el_id and score x into the group with id grp_id.
set(el_id, x): change the score of the element with id el_id to x.
set(grp_id, x): change the score of all elements in the group with id grp_id to x.
print(grp_id): print the id of a max-score element in that group (return any one if multiple exist).
Constraints:
1 <= no_of_elements <= 1e6
1 <= no_of_groups <= 5
1 <= score <= 5
I couldn't solve it during the interview and also couldn't think of any solution later. Would someone please help?
Is it possible for one element to be in multiple groups? If so, does each element just have one score, or does each element-group association have its own score?
No, that is not possible; each element will be associated with exactly one group.
It is possible to support each operation in amortized constant time with linear memory, even if the number of groups is much larger. (I'm treating the maximum score as a constant, since it's only 5.)
For each group, we'll maintain a bucket queue, which is a type of priority queue where each score has a "bucket", in this case a circular doubly linked list. Each bucket will start with a sentinel node to make implementation easier.
For insert, add the element's node to its group's bucket for the corresponding score.
For the element version of set, remove the element's node from its current bucket and add it to its group's bucket for the new score.
For the group version of set, splice all of the group's other buckets onto the end of the bucket for the corresponding score.
For print, scan the group's buckets in decreasing score order until you find one that isn't empty and return its first element.
Operations 3 and 4 are linear in the number of buckets, but again, we're treating that as a constant.
Here, I'm assuming 1 <= el_id <= no_of_elements and 1 <= grp_id <= no_of_groups, to be consistent with score. If this is not true, the code below will need to be changed slightly.
For an interesting challenge, see if you can figure out how to achieve amortized logarithmic time complexity per operation even when both the number of groups and the maximum score can be large.
Thank you so much!
Not a very efficient approach, but I think it should work: think of the groups as max heaps containing their elements, and also remember each element's index in its heap after every operation. Inserting an element into a group is simply a heap push, O(log n). Changing an element's score uses the element's index in its heap and a decreaseKey()-style update on that heap, O(log n). Changing all elements in a group means changing every value in that heap, O(group size). Printing the max-score element of a group is heap.top(), O(1).
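For concreteness, here is a small sketch of that idea for a single group (my own illustrative code, not the commenter's): a binary max heap plus a pos array that remembers each element's index, so a score change can be sifted up or down in O(log n).

```cpp
#include <utility>
#include <vector>

// One group as an indexed binary max heap. pos[el_id] tracks where the element
// currently sits in the heap so its key can be changed in O(log n).
struct IndexedMaxHeap {
    std::vector<std::pair<int, int>> heap;   // (score, el_id)
    std::vector<int> pos;                    // pos[el_id] = index in `heap`, or -1

    explicit IndexedMaxHeap(int max_el_id) : pos(max_el_id + 1, -1) {}

    void swap_nodes(int i, int j) {
        std::swap(heap[i], heap[j]);
        pos[heap[i].second] = i;
        pos[heap[j].second] = j;
    }
    void sift_up(int i) {
        while (i > 0 && heap[(i - 1) / 2].first < heap[i].first) {
            swap_nodes(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }
    void sift_down(int i) {
        for (;;) {
            int best = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < (int)heap.size() && heap[l].first > heap[best].first) best = l;
            if (r < (int)heap.size() && heap[r].first > heap[best].first) best = r;
            if (best == i) break;
            swap_nodes(i, best);
            i = best;
        }
    }

    void push(int el_id, int score) {        // insert(el_id, grp_id, x): O(log n)
        heap.push_back({score, el_id});
        pos[el_id] = (int)heap.size() - 1;
        sift_up(pos[el_id]);
    }
    void change_key(int el_id, int score) {  // set(el_id, x): O(log n), handles increase or decrease
        heap[pos[el_id]].first = score;
        sift_up(pos[el_id]);
        sift_down(pos[el_id]);
    }
    void set_all(int score) {                // set(grp_id, x): O(group size)
        for (auto &node : heap) node.first = score;
    }
    int top_id() const {                     // print(grp_id): O(1); -1 if the group is empty
        return heap.empty() ? -1 : heap[0].second;
    }
};
```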
Sorry I read the comment in a hurry...
This is exactly what I did (except that I used std::set instead of heaps; for operation 2 I searched for the element and updated its score). But set(grp_id, x) would take O(group_size), which can be O(1e6) in the worst case, so it may be too costly in terms of time. Is there any way to do it faster, i.e., is there any data structure that supports this kind of group update quickly?
P.S.: I think there may be a better way to utilize the constraint 1 <= score <= 5, although I'm not sure...
Use timestamps for the operations set(el_id,x) and set(grp_id,x). Store the value changed by set(grp_id,x) on the group instead of on its elements, so changing all elements in a group can be done in O(1). Each element then has two values: one that belongs to itself (determined by insert(el_id,grp_id,x) and set(el_id,x)) and one that belongs to its group (determined by set(grp_id,x)). Choose whichever has the bigger timestamp when doing the other operations on the heap.
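A rough sketch of just the bookkeeping behind this timestamp idea (my own illustrative code; the names are assumptions): set(grp_id, x) records the value and a timestamp on the group in O(1), and an element's effective score is resolved by comparing timestamps. Wiring this into the heap operations is left as described above.

```cpp
#include <vector>

// Every update gets a global timestamp. set(grp_id, x) is stored on the group,
// not on its elements, so it is O(1); the effective score of an element is
// whichever of its two stored values was written more recently.
struct LazyGroupScores {
    struct Stamped { int value = 0; long long time = -1; };

    long long clock_ = 0;                 // global timestamp, incremented on each update
    std::vector<Stamped> element_score;   // indexed by el_id
    std::vector<Stamped> group_score;     // indexed by grp_id
    std::vector<int> group_of;            // group of each element

    LazyGroupScores(int max_el_id, int max_grp_id)
        : element_score(max_el_id + 1), group_score(max_grp_id + 1), group_of(max_el_id + 1) {}

    void insert(int el_id, int grp_id, int x) {
        group_of[el_id] = grp_id;
        element_score[el_id] = {x, clock_++};
    }
    void set_element(int el_id, int x) { element_score[el_id] = {x, clock_++}; }
    void set_group(int grp_id, int x)  { group_score[grp_id] = {x, clock_++}; }   // O(1)

    int effective_score(int el_id) const {    // pick the more recently written value
        const Stamped &own = element_score[el_id];
        const Stamped &grp = group_score[group_of[el_id]];
        return own.time > grp.time ? own.value : grp.value;
    }
};
```

Note that an element inserted after a group update keeps its own score, since its timestamp is newer than the group's.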
A self-balancing binary tree, where each node represents a group that is itself a binary tree of elements? All of the operations can be done in O(log(n) + log(m))... set(grp_id,x) will take O(log(n) + m).
Yeah same as before, many people use std::multiset instead of std::priority_queue and it does no harm in most cases.