unalive's blog

By unalive, 3 weeks ago, In English

title

looking for either one person or a small group (at a similar skill level, 2200-2300)

would just like to discuss after contests/share random problems/take VCs together for the next few months

also, im not an OIer but don’t mind grinding OI problems

edit: ok found some people, thanks everyone!

Full text and comments »

  • Vote: I like it
  • +54
  • Vote: I do not like it

By unalive, 2 months ago, In English

I am unfortunately not very good at writing code and can barely function without an easy way to debug said code. I therefore need a debug template at ICPC and spent some time reducing the length of the debug template I use normally. I think it's pretty short already, but it seems like it can be shortened further. I don't know how to do so, hence this blog.

Some considerations:

  1. Can only use features introduced in C++ 17 or earlier, as my region is weird.
  2. Need to be able to debug all STL containers, and any nested versions thereof.

Now, if C++ 20 were allowed, one could simply use the following:

Code

__print() works by repeatedly considering the elements that constitute x and calling __print() on them (whilst ensuring that the output of each __print() call is separated by ,) until << is defined for x by default.

Now, what's the problem with making this code compatible with C++17?

The problem is that there doesn't seem to be a short (in terms of code length) way in C++17 to differentiate between pairs and iterable containers.

I found two solutions, both of which aren't good at all:

1) Use is_literal_type_v to check if T is a pair
Code

This will work if we have pairs like std::pair<int, float> but not with something like std::pair<int, vector<int>>. This is a significant loss of functionality since we now cannot debug things like map<int, vector<int>> which are often used.

2) Just create a separate templated function for pairs
Code

This is also bad because:

  1. Much longer code.
  2. Notice that we now need to use a class/struct/namespace for the two __print() functions as they can call each other.

Can someone help me fix these issues and/or make the code shorter in general? Right now, I think the last one is the best. I can't believe I spent the last 3 hours shortening a piece of code by 5 lines.

For whatever reason, GPT and Claude seems to be unhelpful here. I ran out of access to 4o trying to get it to help me, despite purchasing GPT plus T_T

Full text and comments »

  • Vote: I like it
  • +16
  • Vote: I do not like it

By unalive, 2 months ago, In English
  • Vote: I like it
  • +72
  • Vote: I do not like it

By unalive, 3 months ago, In English

Disclaimer: It might have bugs, don't send me death threats if you FST.

I couldn't find a nice dynamic bitset template so I wrote one.

It can be found here.

It has additional functionality as compared to std::bitset (you can answer many kinds of range queries on it, for example: "Find $$$k$$$-th set bit in range $$$[l, r]$$$).

Some poor documentation

Efficiency:

Firstly, always use the following pragmas with it:

pragmas

They can reduce runtime by upto 50% (thanks to mr qmk for enlightening me on this).

I am too lazy to run any proper benchmarks, but I solved a few problems with it and it was always faster than std::bitset and tr2::dynamic_bitset. Here are some sets of submissions on the same problem with all 3:

1. Using &=, | and >>

  1. My bitset: 284156267
  2. std::bitset: 284156622
  3. tr2::dynamic_bitset: 284156883
Bitset Time Memory
My bitset 765 ms 944 KB
std::bitset 859 ms 1628 KB
tr2::dynamic_bitset 1077 ms 1240 KB

2. Using &=, >>=

edit: Redid these because apparently the server was under high load at the time of the initial submissions.

  1. My bitset: 284262107
  2. std::bitset: 284277251
  3. tr2::dynamic_bitset: 284267738
Bitset Time Memory
My bitset 343 ms 1124 KB
std::bitset 405 ms 1140 KB
tr2::dynamic_bitset 390 ms 844 KB

So it seems that my bitset is as good or slightly better in every manner. I have no idea why this is the case though, as there is nothing which seems particularly faster in my implementation.

Parting notes:

  1. If you use it and find some bugs, let me know.
  2. If you think it's missing some significant functionality, let me know.

Thanks for reading my blog.

Bitset Waifu

Full text and comments »

  • Vote: I like it
  • +113
  • Vote: I do not like it

By unalive, 4 months ago, In English

Notice the decline by 3 million last year? OpenAI kidnapped 3 million chinese kids and they are serving as the backend for GPT-o1 from a basement.

#FreeChineseKids

Full text and comments »

  • Vote: I like it
  • +630
  • Vote: I do not like it

By unalive, 5 months ago, In English

Outline

  • Problem
  • Algorithm
  • Complexity Analysis
  • Code
  • Example Problems

Hello, in this blog I'll share a funny way to construct a suffix tree in $$$O(n \log^2{n})$$$ time, for a given string $$$S$$$ of length $$$n$$$. I am going to call the underlying idea the "Leader Split Trick". It can probably be used to solve other problems too.

Problem

A suffix tree of a string $$$S$$$ is a radix tree constructed from all the suffixes of $$$S$$$. It's easy to see that it has $$$O(n)$$$ nodes. It can be constructed in $$$O(n)$$$ using this.

I am going to share a simple and practically useless way of building it in a worse time complexity, $$$O(n\log^2{n})$$$.

Algorithm

Notation

Initially, we start with an empty tree (with a virtual root node), and a set $$$G$$$ of all suffixes from $$$1$$$ to $$$n$$$, these suffixes will be stored in the form of their starting index.

It's easy to see that the paths from the root node to $$$l_u \forall (u \in G)$$$ will share some common prefix till an internal node $$$s_G$$$, after which these paths will split apart along some downward edges of the internal node. Let's define $$$d_G$$$ to be the longest common prefix across the paths $$$(\text{root}, l_u) \forall u \in G$$$.

Our algorithm will essentially do the following:

  1. Find $$$d_G$$$.

  2. Split apart $$$G$$$ into disjoint subsets $$$G'$$$ (each subset $$$G'$$$ will have suffixes whose leaves lie in the subtree of a unique child node of $$$s_G$$$).

  3. Solve the problem recursively for each subset, and add an edge in the suffix tree from $$$s_G$$$ to $$$s_{G'}$$$ for every $$$G'$$$.

Now, we define a recursive function $$$f(G, L, \text{dep}, \text{dis})$$$.

Definitions

In each call, $$$f(G, L, \text{dep}, \text{dis})$$$, we do the following:

  1. If the "Leader" element $$$L$$$ is undefined:

  2. Set $$$L$$$ to a random element of $$$G$$$.

  3. For every suffix $$$i \in G$$$, find $$$\text{dis[i]}$$$, the longest common prefix of the suffixes $$$i$$$ and $$$L$$$. This can be done in $$$O(\vert G \vert \cdot \log{n})$$$ using binary search + hashing. We store $$$\text{dis}$$$ in a sorted manner.

  4. Let $$$m$$$ be the minimum value in $$$\text{dis[]}$$$. It's easy to see that the internal node created from splitting $$$G$$$ will occur at depth $$$\text{dep} + m$$$. We create $$$s_G$$$, and add an edge corresponding to the substring $$$S[L + dep + 1, L + \text{dep} + m]$$$ from $$$s_{G_p}$$$ to $$$s_G$$$.

  5. Now, we delete all suffixes $$$i \in G : \text{dis[i]} = m$$$, from $$$G$$$ (and their corresponding elements from $$$\text{dis}$$$), and group them into disjoint subsets based on the character $$$S_{i + \text{dep} + m + 1}$$$ for suffix $$$i$$$ (basically the next character after the internal node).

  6. We call $$$f(G', \text{null}, \text{dep} + m, \text{null})$$$ for every newly created subset $$$G'$$$, and also call $$$f(G, L, \text{dep + m}, \text{dis})$$$ for the modified subset $$$G$$$.

Note: There might be some off-by-one errors.

Complexity Analysis

Consider the following problem:

We have $$$n$$$ singleton sets, and are given some set merge operations. When merging sets $$$A$$$ and $$$B$$$, we merge $$$B$$$ to $$$A$$$ with probability $$$\frac{\vert A \vert}{\vert A \vert + \vert B \vert}$$$ and $$$A$$$ to $$$B$$$ with the probability $$$\frac{\vert B \vert}{\vert A \vert + \vert B \vert}$$$.

The above problem is computationally equivalent to Randomized Quicksort, which has an expected time complexity of $$$O(n \log{n})$$$.

It's not difficult to see that our split operations are simply the operations that will occur in the above problem in a reversed manner (Formally, we can define a bijective relationship between the two sets of operations, such that related sets of operations will occur with the same probability) . Therefore, the time taken by all the split operations is $$$O(n \log{n})$$$.

However, every time we perform a split operation (merge in reverse), we also compute $$$\text{dis}$$$ for the child set $$$C$$$ (which gets merged into the parent set), and that takes $$$O(\vert C \vert \log{n})$$$ time. Thus, our entire algorithm has an expected time complexity of $$$O(n \log^2{n})$$$.

Code

My implementation can be found here.

Some thoughts

This trick seems to have some "online capability", as we can efficiently split a group of nodes into multiple groups (given that the information for query for a group can be processed mostly through a randomly chosen leader element). For example, consider the following problem:

Problem 1

You are given a tree on $$$n$$$ nodes. You also have a set containing all nodes, $$${1, 2, \dots , n}$$$. You have to process the following queries online:

  1. "$$$1\; m\; x\; v_1\; v_2\; \dots \; v_x$$$" : Remove the nodes $$$v_1, v_2 \dots, v_x$$$ from the set $$$S$$$ whose maximum element is $$$m$$$, and create a new set with these elements (it is guaranteed that there exists some set with maximum element $$$m$$$ and $$$v_i \in S \; \forall \; i$$$).
  2. "$$$2 \; m$$$" : Let the set whose maximum element is $$$m$$$ be $$$S$$$. Find some node $$$x \in S \mid \max_{y \in S}{\text{dis}(x, y)} = \max_{u, v \in S}{\text{dis}(u,v)} $$$.
Solution

Full text and comments »

  • Vote: I like it
  • +48
  • Vote: I do not like it