I have written a matrix multiplication program that runs ~50x faster than the plain naive implementation and ~3x faster than an IKJ-order version with modular tricks, in just 55 lines of code (see lines 256 to 310 in my submission).
Typically, to reach this kind of speed (top 4 on Library Checker), you would need 300+ lines of code for a Strassen implementation.
Here is the link: https://judge.yosupo.jp/submission/310249. A lot of the code was based on https://mirror.codeforces.com/blog/entry/101655, aside from the tmp part.

Very cool! Do you have any advice on how to start programming for speed like this, like books to read or anything?
I learned about things like this by reading articles and figuring things out by myself. To get performance estimates, https://uops.info/ has a lot of per-instruction throughput and latency data, and https://godbolt.org/ is a useful tool for inspecting the generated assembly.
https://en.algorithmica.org/hpc/ is an excellent resource by sslotin (the author of the blog linked in the post).
Nice work! I think this kind of speed mainly from your excellent kernel.
An extension of this to FP32: https://judge.yosupo.jp/submission/314813. It reaches about 91% efficiency ($$$102/112$$$ GFLOPS for $$$m=n=k=5440$$$, counting multiplication and addition as two distinct operations) on a 3.5 GHz Zen 3.