kimiyuki's blog

By kimiyuki, 3 years ago, In English

Hello Codeforces!

I'm developing a new tool, Jikka, a transpiler from Python to C++, intended to become an automatic solver for competitive programming problems.

Jikka takes simple Python code as input and optimizes it to reduce its time complexity, i.e., you can obtain an AC solution from a naive TLE solution. For example, the Python code below is obviously O(N^2), but Jikka turns it into O(N log N).

from typing import List

INF = float("inf")

def solve(n: int, c: int, h: List[int]) -> int:
    # dp[j] = minimum cost to reach index j
    dp = [INF for _ in range(n)]
    dp[0] = 0
    for i in range(n):
        for j in range(i + 1, n):
            dp[j] = min(dp[j], dp[i] + (h[i] - h[j]) ** 2 + c)
    return dp[n - 1]
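
For reference, here is a hand-written sketch of what an O(N log N) version of this DP looks like, using the convex hull trick with a Li Chao tree. The transition dp[j] = min(dp[i] + (h[i] - h[j]) ** 2 + c) expands to (dp[i] + h[i] ** 2) - 2 * h[i] * h[j] + (h[j] ** 2 + c), i.e. a minimum over lines y = a * x + b with a = -2 * h[i] and b = dp[i] + h[i] ** 2, queried at x = h[j]. This sketch only illustrates the target complexity; it is not Jikka's actual output, and the names (LiChaoTree, solve_fast) are just for illustration.

from bisect import bisect_left
from typing import List, Optional, Tuple

INF = float("inf")

class LiChaoTree:
    """Minimum of a set of lines, queried at a fixed set of x coordinates."""

    def __init__(self, xs: List[int]) -> None:
        self.xs = sorted(set(xs))  # the only x values that will be queried
        self.n = len(self.xs)
        self.line: List[Optional[Tuple[int, int]]] = [None] * (4 * self.n)

    def _value(self, ln: Tuple[int, int], x: int) -> int:
        return ln[0] * x + ln[1]

    def _add(self, node: int, lo: int, hi: int, ln: Tuple[int, int]) -> None:
        cur = self.line[node]
        if cur is None:
            self.line[node] = ln
            return
        mid = (lo + hi) // 2
        left_better = self._value(ln, self.xs[lo]) < self._value(cur, self.xs[lo])
        mid_better = self._value(ln, self.xs[mid]) < self._value(cur, self.xs[mid])
        if mid_better:
            self.line[node], ln = ln, cur  # keep the line that wins at mid here
        if lo == hi:
            return
        if left_better != mid_better:  # the losing line can still win on the left half
            self._add(2 * node, lo, mid, ln)
        else:                          # otherwise only on the right half
            self._add(2 * node + 1, mid + 1, hi, ln)

    def add_line(self, a: int, b: int) -> None:
        self._add(1, 0, self.n - 1, (a, b))

    def query(self, x: int) -> int:
        i = bisect_left(self.xs, x)  # x is assumed to be one of the xs
        node, lo, hi = 1, 0, self.n - 1
        best = INF
        while True:
            if self.line[node] is not None:
                best = min(best, self._value(self.line[node], x))
            if lo == hi:
                return best
            mid = (lo + hi) // 2
            if i <= mid:
                node, lo, hi = 2 * node, lo, mid
            else:
                node, lo, hi = 2 * node + 1, mid + 1, hi

def solve_fast(n: int, c: int, h: List[int]) -> int:
    tree = LiChaoTree(h)
    dp = [INF] * n
    dp[0] = 0
    tree.add_line(-2 * h[0], dp[0] + h[0] ** 2)
    for j in range(1, n):
        dp[j] = tree.query(h[j]) + h[j] ** 2 + c  # minimum over all i < j
        tree.add_line(-2 * h[j], dp[j] + h[j] ** 2)
    return dp[n - 1]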

The current version of Jikka is still under development and requires naive solutions as input, so it can actually solve only a few problems and is difficult to use in real contests yet. Also, you can do such easy rewritings by hand (right?). However, I think this is an interesting and ambitious project. I hope that Jikka will allow us to skip trivial implementation and focus only on the more essential parts of algorithms in the future, just as C++ compilers freed us from assembly and machine languages.

Are you interested? Please join the development!


»
3 years ago, # |

This tool is really cool, but the programmer needs to know the solution themselves, not get it from a solver or optimizer.

»
3 years ago, # |

Could you elaborate on the best-case scenario here? That is, let's find the hardest Codeforces/AtCoder problem that can be solved with the optimal complexity in the following two steps:

  1. Write the brute force solution in Python
  2. Run AST transformations to get the best solution.

(I worked on AST transformations for solving math problems a long time ago, so I'm aware of how well it can work, and of some of its limitations with respect to harder problems.)

»
3 years ago, # |

That feels like a very inefficient way to write a programming library

What is the intended usage of this tool? Am I supposed to intentionally write a slow solution and then hope that your tool will be able to optimize it? How do I then know that the complexity is good enough? How do I distinguish these 3 cases if it doesn't work:
1. The optimization is possible, but your tool didn't work because I had some useless stuff in my for-loops, or I wrote them in a weird way, or I calculate two DPs at once, etc.
2. The optimization is possible, just not yet available in your tool
3. The optimization is not possible and I need to come up with a new solution

And it turns out that I actually need to know which optimizations your tool provides and when I can use them. At that point I can just go to a blog like this and not limit myself to the format your tool can parse. I can also read such a list even before implementing my solution, to make sure it can be optimized.

Some DP optimizations actually need a bunch of code, but then I would just create a library, if it's something like CHT or matrix exponentiation.

About the conversion from Python to C++: I tried using Jikka and it took me some time to write code that doesn't give conversion errors. But even when it works, is it guaranteed to be correct? Personally, I wouldn't feel comfortable using a tool that converts code to a (possibly unknown to me) language with unreadable variable names when I can't be 100% sure that the conversion is correct, because there is a possibility that I will have to fix it by hand. For example, integers in Python don't have a limit, and any usage of that will produce wrong code in C++.
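
Concretely, something like the following is valid Python but would silently overflow if translated naively to C++ with 64-bit integers (a made-up snippet to illustrate the point, not actual Jikka input or output):

# Valid Python: integers are arbitrary-precision, so this never overflows.
# A direct translation to C++ using int64_t would overflow silently.
def factorial(n: int) -> int:
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

print(factorial(21))  # 51090942171709440000, larger than 2**63 - 1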

Also, it seems like your tool consists of two independent parts: optimization and conversion. Why not separate them? Often a solution in Python with the correct complexity is enough. And if you just don't like low-level C++, you're not the first one; there are languages like D or Rust.

Obviously it's still a very cool project, and I'd like to see it become better.

  • »
    »
    3 years ago, # ^ |

    Thank you for trying my tool!

    That feels like a very inefficient way to write a programming library

    Yeah, this is correct. I sometimes feel this too, when I'm porting my library into this tool...

    I intended this tool to be used to skip part of the implementation after finding an algorithm of sufficiently small complexity. Slower solutions are often easier to implement than the optimal solution, so we want to write only the slower one and extract the optimal one from it. So yes, I assume that we intentionally write a slow solution. And when the complexity is not enough after conversion (the complexity after conversion will be reported in a future version), we just need to implement it ourselves.

    About the language conversion part: the current conversion from Python to C++ is mainly due to implementation convenience. Python has a simple syntax, is sufficiently expressive, and has many users (no one wants to learn a new language just for this tool). C++ is fast (no need to care about constant factors) and has many libraries, but it has a complex syntax and is less expressive. As you said, it would be more convenient if the input language and the output language could be the same. Allowing C++ as input and Python as output are both planned future tasks.

    Related to these topics, I have a plan to make it callable from IDEs like VS Code. Rich IDEs have features to rewrite code, e.g. "insert type annotations here", "move this local variable to global scope", etc. I will make Jikka a plugin and add features like "make this O(N^2) loop O(N)" to IDEs.

    • »
      »
      »
      3 years ago, # ^ |

      The plugin idea sounds really awesome. Maybe insert a note with a description of the algorithm, from the perspective of actually learning these optimizations (or at least a name/link)? Also, when you only change a small piece of code, preserving variable names should be easier, and it will be possible to read and tweak the code after a conversion.

»
3 years ago, # |

This is really cool!

I have seen something similar before, for just the matrix exponentiation part, except that they rewrite the Python using a decorator instead of generating C++:

https://github.com/borzunov/cpmoptimize

https://kukuruku.co/post/automatic-algorithms-optimization-via-fast-matrix-exponentiation/

There's some interesting discussion about it in https://news.ycombinator.com/item?id=17592359 on why compilers (specifically GCC) don't already do these optimizations automatically, and it basically boils down to them being too rare to be worth it. But these sorts of symbolic algebraic optimizations are certainly not rare in CP! There are so many problems that boil down to writing a brute force and then looking it up in OEIS or simplifying it with Wolfram Alpha or another CAS.
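
To illustrate the kind of rewrite that decorator performs (a hand-rolled sketch, not cpmoptimize's actual API or output): a linear loop such as the Fibonacci recurrence can be replaced by exponentiation of its transition matrix, turning O(N) loop iterations into O(log N) matrix multiplications.

MOD = 10 ** 9 + 7

def mat_mul(a, b):
    # multiply two square matrices modulo MOD
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % MOD for j in range(n)]
            for i in range(n)]

def mat_pow(m, e):
    # binary exponentiation of a square matrix
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while e:
        if e & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        e >>= 1
    return result

def fib_naive(n: int) -> int:
    # the O(N) loop a tool like cpmoptimize would recognize
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) % MOD
    return a

def fib_fast(n: int) -> int:
    # the O(log N) rewrite: the loop body is the linear map (a, b) -> (b, a + b)
    m = mat_pow([[0, 1], [1, 1]], n)
    return m[0][1]  # starting from (a, b) = (0, 1), the answer is M^n[0][1]

assert all(fib_naive(i) == fib_fast(i) for i in range(300))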

Is the development of the project going to be in Japanese? I would love to contribute but I can't read the docs. :(

  • »
    »
    3 years ago, # ^ |

    Thank you for introducing this project! I didn't know about this decorator, and its discussions have many useful references. Among the things I found in the discussion, I like Herbie, a program that rewrites floating-point expressions so that they have smaller errors.

    Is the development of the project going to be in Japanese? I would love to contribute but I can't read the docs. :(

    I'm going to do the development in both English and Japanese. English is fine for issues and pull requests. I haven't been able to translate some of the old documents yet, but I will translate them in a few days.

»
3 years ago, # |

Regardless of whether this has practical applications, I think this is really cool as a "compiler research project".

For segment trees (to pick a data structure as an example), I wonder if the strategy of converting a "naive range data structure" into an "optimized segment tree class" might be better than converting a "naive algorithm" into an "optimized algorithm using segment trees".

Even for the standard Python structures, there are "free" augmentations (in the sense that the new data structure is much faster on some operations, say append, but is as fast as, or only O(log N) slower than, the old data structure on the others). E.g. http://stutzbachenterprises.com/blist/blist.html https://github.com/ldct/cp/blob/master/templates/IndexingList.py (I use this class myself quite often).
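
To make the idea concrete, here is a minimal sketch (my own example, not anything Jikka does today) of a naive range-minimum structure and a segment tree exposing the same set/range_min interface, so that a tool could in principle swap one class for the other:

from typing import List

class NaiveRangeMin:
    def __init__(self, a: List[int]) -> None:
        self.a = list(a)

    def set(self, i: int, x: int) -> None:  # O(1)
        self.a[i] = x

    def range_min(self, l: int, r: int) -> int:  # O(r - l), half-open [l, r)
        return min(self.a[l:r])

class SegTreeRangeMin:
    def __init__(self, a: List[int]) -> None:
        self.n = 1
        while self.n < len(a):
            self.n *= 2
        self.t = [float("inf")] * (2 * self.n)
        for i, x in enumerate(a):
            self.t[self.n + i] = x
        for i in range(self.n - 1, 0, -1):
            self.t[i] = min(self.t[2 * i], self.t[2 * i + 1])

    def set(self, i: int, x: int) -> None:  # O(log n)
        i += self.n
        self.t[i] = x
        while i > 1:
            i //= 2
            self.t[i] = min(self.t[2 * i], self.t[2 * i + 1])

    def range_min(self, l: int, r: int) -> int:  # O(log n), half-open [l, r)
        res = float("inf")
        l += self.n
        r += self.n
        while l < r:
            if l & 1:
                res = min(res, self.t[l])
                l += 1
            if r & 1:
                r -= 1
                res = min(res, self.t[r])
            l //= 2
            r //= 2
        return res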

  • »
    »
    3 years ago, # ^ |

    I agree with you; I feel the strategy of converting a naive data structure is easier to implement and more promising than the strategy of converting a naive algorithm. It is easy to recognize naive data structures (e.g. by checking the types of variables) and replace them with such augmented ones. On the other hand, recognizing and manipulating algorithms is a more difficult challenge.