i_love_sqrt_decomp's blog

By i_love_sqrt_decomp, history, 8 months ago, In English

I have done a matrix multiplication program that runs ~50x faster than plain naive implementation and ~3x faster than IKJ-order with modular tricks in just 55 lines of code (see lines 256 to 310 in my submission).

Typically, to get this kind of speed (top 4 on Library Checker), you would have to spend 300+ lines of code for a Strassen implementation.

Here is the link: https://judge.yosupo.jp/submission/310249. A lot of the code was based on https://mirror.codeforces.com/blog/entry/101655, aside from the tmp part.

  • Vote: I like it
  • +12
  • Vote: I do not like it

»
8 months ago, hide # |
 
Vote: I like it +6 Vote: I do not like it

Very cool, do you have advice on how to start programming for speed like this, like books to read or anything?

»
8 months ago, hide # |
 
Vote: I like it 0 Vote: I do not like it

Nice work! I think this kind of speed mainly from your excellent kernel.

»
7 months ago, hide # |
 
Vote: I like it +5 Vote: I do not like it

An extension of this to FP32: https://judge.yosupo.jp/submission/314813. It gets about 91% efficiency ($$$102/112$$$ GFLOPS for $$$m=n=k=5440$$$, assuming multiplication and addition are 2 distinct operations), on a 3.5 GHz Zen 3.