dmkz's blog

By dmkz, history, 6 months ago, translation, In English

Hello, codeforces!

Today, based on the article Triangulation of polygon from a book "Computational Geometry: Algorithms and applications", I implemented an algorithm of triangulation of any polygon without self intersections (non-convex polygons too) in C++.

$$$\text{ }$$$

Full text and comments »

  • Vote: I like it
  • +114
  • Vote: I do not like it

By dmkz, history, 11 months ago, In English

Hello, codeforces!

Do you have modern GPU and CPU and want to calculate something with them, but you don't know how to do it? Dont be upset, because you can read this blog and run your first GPU program on your AMD/Intel/NVIDIA GPU. I will describe an installation process and then we will dive into the C++ code. As an example, we will calculate a lot of mazes for Bug Game in parallel. For doing this in simple way, we will use library Boost.Compute, which is a C++ STL-style wrapper over OpenCL library.

Full text and comments »

  • Vote: I like it
  • +118
  • Vote: I do not like it

By dmkz, history, 12 months ago, In English

Hello Codeforces!

UPD. I_love_natalia crashed the checker by his maze with $$$10^9$$$ moves. The problem is resolved now and all of the solutions rejudged. Scoring is changed. More info

UPD 2: The official site https://buglab.ru/ was updated. See comment.

UPD 3: The checker has been speeded-up by mfv in $$$2.2$$$ times (in comparison with my checker. The new speed is equal to $$$4$$$ seconds for a $$$10^9$$$ moves in codeforces "Custom Invocation"). Current standings:

  1. $$$17\cdot 10^9$$$ — sas4eka;
  2. $$$14\cdot 10^9$$$ — I_love_natalia;
  3. $$$6.8 \cdot 10^9$$$ — maxplus (on the buglab: $$$11.3 \cdot 10^9$$$).

UPD 4: I updated the checker: now mazes up to $$$42$$$ billions are supported before checker got TL. Time limit of checker on codeforces platform is equal to $$$1$$$ minute. I submitted the maze with number of moves equal to $$$42.015.084.960$$$ and the result has been calculated in 59799 ms. Maybe someone can calculate the answer faster?

I'm happy to invite you to an unofficial mirror of Bug Game. In this game you need to generate a $$$21 \times 31$$$ maze with the maximum number of bug's moves to get out. The bug moves not optimally and you will see a description of algorithm of its movement in the statement of this problem. Based on given algorithm you will be able to create a maze and submit it.

Privacy: your solutions (your mazes) will be visible only for mfv.

Invitation link: click here

Date: April 18, 2023, 00:00 UTC+3

Duration: 2 weeks, then upsolving and virtual participation.

Scoring: if you will be able to generate the maze with $$$X$$$ moves to get out, then your solution will get $$$\frac{x}{10^5}$$$ points.

Official Russian site of Bug Game: click here

Full text and comments »

  • Vote: I like it
  • +76
  • Vote: I do not like it

By dmkz, history, 16 months ago, In English

Hello again!

We can use custom node_update structure with our __gnu_pbds::tree<...> (tree from set of policy-based data structures). For example, there is interval_node_update_policy, using of which allows us to build the Interval Tree. It is described here.

As you may already know, that tree_order_statistics_node_update stores a size of subtree for each node. Instead of it, we can store sum of keys in subtree for calculating sum of keys on segment in $$$O(\log{n})$$$. So, sum of keys on segment [L,R] is just set.order_of_key(R+1)-set.order_of_key(L).

Full text and comments »

  • Vote: I like it
  • +52
  • Vote: I do not like it

By dmkz, history, 16 months ago, translation, In English

Hi everyone!

I took an ordered_set in C++ (i.e. __gnu_pbds::tree<...> from GNU C++ Standard Template Library with implemented Policy-Based Data Structures) and added 6 new methods to derived class:

  1. lower_bound_with_order -> iterator to minimal element such >= than given key + its index in the set (its order);

  2. upper_bound_with_order -> iterator to minimal element such > than given key + its index in the set (its order);

  3. count_less -> number of elements which are less than given element;

  4. count_less_equal -> number of elements which are less or equal than given element;

  5. count_greater -> number of elements which are greater than given element;

  6. count_equal -> just a number of elements which are equal to given element.

Advantages of (1) and (2): it is implemented in a single traversal of a tree ($$$\log{N}$$$). If you will call lower_bound firstly, and call order_of_key secondly, then you will spend in two times more time for doing it ($$$\log{N} + \log{N}$$$) instead.

Source code
Result of testing

You can take it to your CP template, if you want. I will try to do same for a ordered_multiset. I'm trying ordered_set with a comparator std::less_equal instead of std::less...

Full text and comments »

  • Vote: I like it
  • +84
  • Vote: I do not like it

By dmkz, history, 2 years ago, In English

Hello! I want to show you the real life example where using of next features:

ios::sync_with_stdio(false);
cin.tie(0); cout.tie(0);

leads to slowdown of the program, written in C++, in two times.

Example is quite simple — just print binary representation of all numbers in range $$$[0,10^7)$$$:

#include <bits/stdc++.h>
using namespace std;
int main() {
    //ios::sync_with_stdio(false);
    //cin.tie(0); cout.tie(0);
    char buf[33]={0};
    buf[32] = '\n';
    for (int i = 0; i < 10'000'000; i++) {
        for (int bit = 30; bit >= 0; bit--) {
            buf[30-bit] = ((i >> bit) & 1) ? '1' : '0';
        }
        cout << buf;
    }
}

UPD. When I replaced buf[32] = '\n'; by buf[31] = '\n';, the situation wasn't changed.

If we run this example in codeforces custom invocation, then we will see Used: 1840 ms, 1184 KB. When we will uncomment speedup of C++ input/output, then we will see Used: 3759 ms, 1188 KB. It's a trap!

UPD2. AsGreyWolf suggested to change compiler to GNU G++20 11.2.0 (64 bit, winlibs) from GNU G++17 9.2.0 (64 bit, msys 2) and it resolved slowdown. Any another GNU compiler provides slowdown.

Full text and comments »

  • Vote: I like it
  • +33
  • Vote: I do not like it

By dmkz, history, 3 years ago, translation, In English

Hello, codeforces!

Everyone knows that operation a * b % mod is not calling div and mod assembler commands. Instead of it only integer multiplication (imul) and bit shift in right (sar) are used:

Link to this example

I remember from comments for one of the editorials on the codeforces that Intel and AMD have a special extended ASM command for it, that can calculate even faster for assumption that a and b are 32-bit integers and mod is constant that is known at compilation time.

Today I can't find this command, this editorial and this comment from red coder with his template where this command is used for modulo multiplication. Maybe someone knows this ASM command and can help with it?

Full text and comments »

  • Vote: I like it
  • +43
  • Vote: I do not like it

By dmkz, history, 4 years ago, translation, In English

In this blog you can found an efficient implementation of Karatsuba multiply of two polinomials, that sometimes can be used instead of FFT

Full text and comments »

  • Vote: I like it
  • +136
  • Vote: I do not like it

By dmkz, history, 4 years ago, translation, In English

To find an error in the code you just need...

Full text and comments »

  • Vote: I like it
  • +77
  • Vote: I do not like it

By dmkz, history, 4 years ago, In English

Hi! I created mashup, added mashup to private group as a contest (6 hours ago). In this group one user is the group manager (joined as a manager 6 hours ago). Why I can't set him as a contest manager? Error message: user is not a gym manager. Tried 20-30 times with different time interval.

Full text and comments »

  • Vote: I like it
  • +9
  • Vote: I do not like it

By dmkz, history, 4 years ago, In English

Hello everybody!

If you want to use binary search over array with known at compile time size, and this size can be extended to power of two, you can use next simple template:

template<int n, typename T>
int lowerBound(const T* __restrict arr, const T val) {
    if (n == 1) { return val > arr[0]; } 
    else {
        constexpr int mid = n / 2;
        if (val > arr[mid]) { return mid+lowerBound<n-mid, T>(arr+mid, val); }
        else { return lowerBound<mid, T>(arr,val); }
    }
};

Testing on ideone shows, that it is faster in 4 times in comparison with std::lower_bound.

Thanks for reading this.

UPD. Testing for doubles, in 3.5 times faster.

Full text and comments »

  • Vote: I like it
  • +34
  • Vote: I do not like it

By dmkz, history, 5 years ago, In English

Hi, I cant understand, why my solution to problem 1137C. Museums Tour is getting WA79. Testcase is large, I tried to found by brute force on small graphs for n <= 4 and d <= 4 and compare with other accepted solutions and did not found counter-test. Can anybody help with understanding why my solution got WA79?

UPD. Fixed. Looks like we can't use dynamic programming on topological order, where we will update dp like: dp[next] = std::max(dp[next], dp[curr] + value[next]) for all edges curr->next, we only allowed use dynamic programming like dp[curr] = std::max(dp[curr], dp[prev]+value[curr]) for all edges prev->curr. Accepted try, difference with wa79.

Full text and comments »

  • Vote: I like it
  • -23
  • Vote: I do not like it

By dmkz, history, 5 years ago, translation, In English

UPD. Everything that will be described below is already used by tourist for a long time, for example, in his last submission.

In ancient problems on russian competitive programming cites, memory constraints are often extremely small and equal to 16 MB, 32 MB, 64 MB. Most likely, for modern memory constraints of 256 MB and 512 MB, the method described below for achieving the minimum memory consumption for the segment tree is not relevant, but still consider a method that allows it to be achieved in the top-down implementation. We will use exactly n - 1 nodes, and number the vertices in the Euler Tour order:

                    0
         1                  8
   2          5        9         10
3     4    6     7

Then, if we are in the node v and this is not a leaf, then it has left and right subtrees. The vertex of the left subtree is v + 1, because, in the order of the Euler Tour, we will visit it immediately after the current node. As for the right subtree, we will go there only when we visit the current vertex and all the vertices of the left subtree. The number of vertices in a subtree depends only on the length of the segment corresponding to the tree node.

We assume that if we have a segment [l, r], then the left subtree corresponds to the segment [l, m], and the right one — [m + 1, r], where m = (l + r) / 2. Then the length of the left segment will be equal to m - l + 1, and the number of nodes in it is 2·(m - l + 1) - 1.

Finally, we find that for the node v corresponding to the segment [l, r], the left subtree in the numbering in the Euler Tour order will be v + 1, and the right one — the vertex v + 2·(m - l + 1), where m = (l + r) / 2 — the midpoint of the segment [l, r].

Full text and comments »

  • Vote: I like it
  • +141
  • Vote: I do not like it

By dmkz, history, 5 years ago, In English

Hi! Maybe someone wants to realize crazy ideas using uint128_t during contest. I implemented some functionality (<<,>>,-,+,*,%,/) and tested on problem 984C. Finite or not?. If there are any bugs or optimizations which can reduce runtime, please tell.

Full text and comments »

  • Vote: I like it
  • +34
  • Vote: I do not like it

By dmkz, history, 5 years ago, In English

Hello, I need some problems which I can solve with scanline + segment tree / fenwick. I want to study this approach and solve different problems but I can't easy find this problems. I hope that you can help me, thanks.

Full text and comments »

  • Vote: I like it
  • +3
  • Vote: I do not like it

By dmkz, history, 6 years ago, In English

Problem: same code submitted 5 times with different comments got different verdict on codeforces testing system: from accepted to tle. Link to original picture.

Screen

First group of equals submissions: 42063540, 42063549, 42063558, 42063574, 42063583

Second group of equals submissions: 42063251, 42063390, 42063403, 42063414, 42063421

You can compare the code and make sure it's the same, I'm scared and I do not know if it's possible in a real competition in codeforces rounds.

Is it dangerous to write comments to a solution in code? It looks like codeforces testing system spending time for reading my comments, so can codeforces testing system works slowly if I wrote too much comments?

UPD: This is the sixth submission from first group with time 3088 ms

Full text and comments »

  • Vote: I like it
  • +69
  • Vote: I do not like it

By dmkz, history, 6 years ago, translation, In English

I found two C# solutions: with multithreadingAccepted and without multithreadingTLE. The author of the solution uses Parallel.For in the functionMultiplyMatrixPow, due to which it achieves success. Question: how many cores can we load on codeforces servers when testing our solution?

UPD: In first program author takes remainder from integer division 3 times, in second program just 1 times. So, this is not multithreading hack.

Full text and comments »

  • Vote: I like it
  • +36
  • Vote: I do not like it