Reverse-Engineering the RK3588 NPU: Hacking Memory Limits to Run Vision Transformers
Reverse-engineering the Rockchip RK3588 NPU to run SmolVLM 15x faster by discovering hardware limits, defeating compiler optimizations, and building a custom sharding runtime