Jan 01, 2026

DeepSeek Introduces mHC Architecture to Improve Large Model Training

TL;DR: DeepSeek introduced Manifold-Constrained Hyper-Connections (mHC) to improve large-model training scalability and efficiency. The mHC method was tested on 3B, 9B, and 27B parameter models, showing stable performance without added computational cost. mHC builds on ByteDance’s 2024 hyper-connection architecture by adding a manifold constraint to reduce memory overhead. CEO Liang Wenfeng co-authored and uploaded the [...]
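To make the architecture reference concrete: hyper-connections replace a transformer's single residual stream with several parallel streams, mixed by small learnable weight matrices at each layer. The sketch below is purely illustrative and is not DeepSeek's published code; the names (`alpha`, `beta`, `gamma`), the stand-in `tanh` sublayer, and the use of a softmax (simplex) projection as the "manifold constraint" on the mixing weights are all assumptions for the sake of a minimal runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    # Numerically stable softmax; projects logits onto the probability simplex.
    e = np.exp(v - v.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def hc_layer(streams, W, alpha_logits, beta_logits, gamma):
    """One block with hyper-connections (illustrative sketch only).

    streams: (n, d) parallel residual streams; n=1 recovers a plain residual.
    alpha:   (n,)   depth weights — how streams mix into the sublayer input.
    beta:    (n, n) width weights — how old streams recombine into new ones.
    gamma:   (n,)   how the sublayer output is distributed back to streams.
    The softmax constraints stand in for mHC's manifold constraint (assumed).
    """
    alpha = softmax(alpha_logits)              # (n,): rows on the simplex
    beta = softmax(beta_logits)                # (n, n): row-stochastic mixing
    h = alpha @ streams                        # (d,): mixed sublayer input
    y = np.tanh(h @ W)                         # stand-in for attention/MLP
    return beta @ streams + np.outer(gamma, y)  # remix streams, add output

n, d = 4, 8
streams = rng.normal(size=(n, d))
W = rng.normal(size=(d, d)) / np.sqrt(d)
out = hc_layer(streams, W, rng.normal(size=n),
               rng.normal(size=(n, n)), rng.normal(size=n))
assert out.shape == (n, d)
```

Constraining the mixing matrices (here, to row-stochastic form) keeps the stream recombination well-conditioned; the article's claim is that mHC's specific manifold constraint achieves this while cutting the memory overhead of the original hyper-connection design.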

The post DeepSeek Introduces mHC Architecture to Improve Large Model Training appeared first on Blockonomi.
