I have been working on a RAM-only k32 plotter that is lock-free and scales linearly with thread count. It is extremely cache-friendly and, as the htop output shows, its threads spend barely any time in kernel mode. There is still some room for improvement. I am now moving on to Phase 3, with an optimization plan already in hand.
The video shows the plotter running on a 64-thread ARM CPU.
The Fx calculation does not use any SIMD. I did not see any significant gain (and sometimes saw a regression) from the NEON versions of BLAKE3 for the way it is used in the plotting mechanism.
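To illustrate what "lock-free and scaling with each thread" can look like in general, here is a minimal sketch, not the plotter's actual code. It assumes the work can be split into contiguous, disjoint slices so that each thread writes only to its own region and needs no locks or atomics. The names `mix` and `fill_partitioned` are hypothetical, and `mix` is only a cheap stand-in for the per-entry computation, not the real Fx or BLAKE3 step.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-entry computation (an assumption for illustration only;
// this is NOT the Fx function and does not use BLAKE3).
static std::uint64_t mix(std::uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

// Computes out[i] = mix(i) for i in [0, n) using up to thread_count workers.
// Each worker owns one contiguous, disjoint slice of the output buffer, so
// workers never touch the same element and no synchronization is needed
// beyond the final join.
std::vector<std::uint64_t> fill_partitioned(std::size_t n, unsigned thread_count) {
    std::vector<std::uint64_t> out(n);
    if (thread_count == 0) thread_count = 1;

    // Round up so the slices together cover all n entries.
    const std::size_t chunk = (n + thread_count - 1) / thread_count;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(n, static_cast<std::size_t>(t) * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) continue;  // nothing left for this worker

        workers.emplace_back([&out, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                out[i] = mix(i);
            }
        });
    }

    for (auto& w : workers) w.join();
    return out;
}
```

Calling `fill_partitioned(n, 64)` would give each of 64 threads one slice. Because each slice is contiguous, a thread walks memory sequentially, which is also the cache-friendly access pattern. The result is the same for any thread count.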