Блог пользователя i_love_sqrt_decomp

Автор i_love_sqrt_decomp, история, 8 месяцев назад, По-английски

I have done a matrix multiplication program that runs ~50x faster than plain naive implementation and ~3x faster than IKJ-order with modular tricks in just 55 lines of code (see lines 256 to 310 in my submission).

Typically, to get this kind of speed (top 4 on Library Checker), you would have to spend 300+ lines of code for a Strassen implementation.

Here is the link: https://judge.yosupo.jp/submission/310249. A lot of the code was based on https://mirror.codeforces.com/blog/entry/101655, aside from the tmp part.

  • Проголосовать: нравится
  • +12
  • Проголосовать: не нравится

»
8 месяцев назад, скрыть # |
 
Проголосовать: нравится +6 Проголосовать: не нравится

Very cool, do you have advice on how to start programming for speed like this, like books to read or anything?

»
8 месяцев назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

Nice work! I think this kind of speed mainly from your excellent kernel.

»
7 месяцев назад, скрыть # |
 
Проголосовать: нравится +5 Проголосовать: не нравится

An extension of this to FP32: https://judge.yosupo.jp/submission/314813. It gets about 91% efficiency ($$$102/112$$$ GFLOPS for $$$m=n=k=5440$$$, assuming multiplication and addition are 2 distinct operations), on a 3.5 GHz Zen 3.