Ordered Set/Multiset Alternatives for Languages Without Built-In/Standard Library Support for Codeforces

#	User	Rating
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

#	User	Contrib.
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	Dominater069	153
9	nor	153

I have been doing competitive programming for some months now. While attempting problems above 1800 in difficulty, I noticed that some of them required the storage of ordered sets/multisets that support lookup, addition and deletion of elements in O(log(n)) time. Their solutions usually involve the use of std::set or std::multiset in C++. However, I code primarily in Python and there are no built-in packages available on Codeforces. Normally outside of Codeforces I just import sortedcontainers.sortedlist which I use like an Order Statistic tree. Here I discuss some strategies which I have used to overcome these issues. I hope that it helps others who are stuck so that they do not give up on their favourite languages. For C++ users, this may not be as useful, but I hope that I can show you how you may use alternative data structures to solve problems.

The example I use here is CSES — Concert Tickets (for those with no CSES accounts, see here https://www.technicalkeeda.in/2020/10/concert-ticket-cses-problem-solution.html). This question requires the deletion of entries, and searching for the next largest available entry in sorted array arr smaller than a certain value x.

Method 1: Segment Trees A segment tree is a very versatile data structure that work for cumulative searches for binary functions with associative properties. Go read up on it if you don't know what it is.

For Concert Tickets, we need to find the largest non-deleted value closest to x. To do this, initialize a maximum segment tree with its values corresponding to its index value. i.e. the segmentTreeArray=[0,1,2,3,...n-1], and queries between [l,r] will give the largest value within the range [l,r]. Everytime we delete a value, we set the index corresponding to that value to -inf (or -1e9, if you prefer). This way, the index will not "interfere" with future query ranges that includes it. To find a value, first perform a binary search on the original array arr to find where the search should end such that arr[i]<x. To find the next largest available index, query for the largest from 0 to i.

If you want to search for next smallest to the right, use a minimum segment tree, set value to inf (1e9) on delete, and search from i to n-1 instead.

Complexity: O(log(n)) for search, insert, or delete. Adding elements is only possible if you know beforehand what are the elements you can add (and thereby allocate a "space" in your segment tree).

What the code might look like: ~~~~~ def main():

n,m=readIntArr()

actual=readIntArr()
actual.sort()
custMax=readIntArr()

availIndexes=getMaxSegTree(list(range(n)))
#if available, mark node as 1. Else, mark node as -inf
ans=[]
for cm in custMax:
    b=n
    i=-1
    while b>0: # binary search
        while i+b<n and actual[i+b]<=cm:
            i+=b
        b//=2
    if i==-1: # no smaller value
        ans.append(-1)
    else:
        idx=availIndexes.query(0,i)
        if idx==-inf: # no available smaller value
            ans.append(-1)
        else:
            ans.append(actual[idx])
            availIndexes.updateTreeNode(idx,-inf) # delete the node
multiLineArrayPrint(ans)

return


Method 2: Linked List with Path Compression (like DSU)
Note: From my (rather shallow) experience, this only works with item deletion.
Maintain an array **parent** in parallel to **arr** of same size. Originally **parent** looks like [0,1,2,...n-1]. Every time an item is deleted replace parent[deletedI] with deletedI-1 if you want to search for next smaller. If the search is for next larger, then replace with deletedI+1. When searching for a value, perform a binary search (like in segment tree example) for the index **i** such that arr[i]<x. Then, run a search for the parent of arr[i] with while(parent[i]!=i)i=parent[i]. Remember to perform a path compression after that, as this will make all searches faster.

Complexity: O(1) for deletion. O(log(n)) for search, because if the binary search. I don't know the complexity of path compression finding of parent (see https://mirror.codeforces.com/blog/entry/2381). This linked-list with path compression runs quickly. For my submission for this question, it runs close to 2x faster than segment tree.

What the code might look like:

def main():

n,m=readIntArr()
h=readIntArr() # n prices
p=[i for i in range(n)] #price parent
t=readIntArr() # m customers
h.sort()

ans=[]
for x in t:
    b=n
    i=-1
    while b>0: # binary search
        while i+b<n and h[i+b]<=x:
            i+=b
        b//=2
    if i==-1:
        ans.append(-1)
    else:
        eventualParent=i
        while eventualParent!=-1 and p[eventualParent]!=eventualParent:
            eventualParent=p[eventualParent]
        if eventualParent==-1: # no smaller item found
            ans.append(-1)
            eventualParent+=1
        else:
            ans.append(h[eventualParent])
        eventualParent-=1
        # path compression like DSU
        while i!=eventualParent:
            temp=p[i]
            p[i]=eventualParent
            i=temp
multiLineArrayPrint(ans)

return


Method 3: Actually using a self-implemented B-tree or Order Statistic Tree
This is straightforward: Just copy-paste ready written code. For me, I copy-pasted the import library of sortedcontainers.sortedlist which is 800+ lines of code after removing empty lines. I dislike this method because 1) it makes my code super long, 2) slows down my IDE which must parse all that code, 3) tends to have the largest overhead (usually slower than using segment tree). However, it doesn't require innovative thinking on how to use a segment tree etc. to perform equivalent functions. Time complexity: O(log(n)) for search, insert and delete.

What the code might look like:

def main():

n,m=readIntArr()
h=readIntArr() # n ticket prices
t=readIntArr() # m customers

h.sort()
ol=OrderedList(h)

ans=[]
for x in t:
    idx=ol.bisect_left(x)
    if idx==len(ol) or ol[idx]>x:
        idx-=1
    if idx==-1:
        ans.append(-1) #cant take anything
    else:
        ans.append(ol.pop(idx))
multiLineArrayPrint(ans)

return

~~~~~

Rev.	By	When	Δ	Comment
en16	YMSeah	2022-03-30 15:20:04	14	Minor grammatical correction
en15	YMSeah	2022-03-30 15:17:04	1086	Made a note about the time complexity of SortedList in Python. Also added another problem example.
en14	YMSeah	2021-03-31 14:26:00	6	Tiny change: ' element cnts (eg. how many ' -> ' element counts (i.e how many '
en13	YMSeah	2021-03-31 14:25:06	441
en12	YMSeah	2021-03-31 09:24:21	346
en11	YMSeah	2021-03-30 18:35:43	38
en10	YMSeah	2021-03-30 17:27:44	5
en9	YMSeah	2021-03-30 17:18:27	337
en8	YMSeah	2021-03-30 17:11:51	1	Tiny change: 'is deleted replace p' -> 'is deleted, replace p'
en7	YMSeah	2021-03-30 17:09:26	146	(published)
en6	YMSeah	2021-03-30 17:02:57	18
en5	YMSeah	2021-03-30 17:02:09	6	Tiny change: 'uired the storage of order' -> 'uired the use of order'
en4	YMSeah	2021-03-30 17:01:18	480
en3	YMSeah	2021-03-30 16:52:43	25
en2	YMSeah	2021-03-30 16:50:04	6
en1	YMSeah	2021-03-30 16:49:19	6698	Initial revision (saved to drafts)

History