Does size matter?
One significant benefit of reducing the precision of a model’s internal weights is the lowered memory requirement. For instance, the BitNet b1.58 model runs in only 0.4 GB of memory, whereas other open-weight models with a comparable number of parameters generally require between 2 and 5 GB.
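As a rough back-of-envelope illustration (the parameter count below is an assumption for the sake of the example, not a figure from the article), storing roughly two billion weights at about 1.58 bits each lands near 0.4 GB, while the same weights at 16-bit precision would need around 4 GB:

```python
# Illustrative memory estimate; assumes a ~2-billion-parameter model.
params = 2_000_000_000

bits_ternary = 1.58   # ~log2(3): bits needed per {-1, 0, +1} weight
bits_fp16 = 16        # bits per weight in a typical 16-bit "full precision" model

def gigabytes(bits_per_weight: float) -> float:
    """Convert total storage for all weights into gigabytes."""
    return params * bits_per_weight / 8 / 1e9

print(f"ternary weights: {gigabytes(bits_ternary):.2f} GB")  # ~0.40 GB
print(f"16-bit weights : {gigabytes(bits_fp16):.2f} GB")     # ~4.00 GB
```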
This streamlined weighting approach not only conserves memory but also makes inference cheaper. The model relies mostly on simple addition operations, eliminating much of the need for more resource-intensive multiplications. According to the researchers' estimates, these adjustments let BitNet b1.58 consume 85 to 96 percent less energy than comparable full-precision models.
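A minimal sketch of why that works (this is an illustration, not the actual BitNet kernel): when every weight is restricted to -1, 0, or +1, a dot product reduces to adding, subtracting, or skipping each activation, so no multiplications are required.

```python
import numpy as np

def ternary_dot(weights, activations):
    """Dot product with weights in {-1, 0, +1}: additions and subtractions only."""
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x      # add the activation
        elif w == -1:
            total -= x      # subtract the activation
        # w == 0: skip entirely, no work needed
    return total

# Quick check against an ordinary multiply-based dot product.
w = np.array([1, -1, 0, 1], dtype=np.int8)
x = np.array([0.5, 2.0, 3.0, -1.0], dtype=np.float32)
assert np.isclose(ternary_dot(w, x), float(w @ x))  # both give -2.5
```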
A demonstration of BitNet b1.58 running on an Apple M2 CPU shows its operational speed.
Using a highly optimized kernel designed specifically for the BitNet architecture, the b1.58 model runs significantly faster than similar models built on standard full-precision transformers. The researchers say it can reach speeds comparable to human reading rates (5 to 7 tokens per second) on a single CPU. Interested users can download the optimized kernels for various ARM and x86 CPUs or try the model via this web demo.
Importantly, the researchers assert that these efficiency gains do not compromise performance on benchmarks of reasoning, mathematics, and general knowledge—although independent verification of this claim is still pending. Averaging results across multiple benchmarks, they found that BitNet “achieves capabilities nearly on par with leading models in its size class while offering dramatically improved efficiency.”
Even with a reduced memory requirement, BitNet maintains performance levels comparable to “full precision” weighted models across various benchmarks.
While the initial achievements of the BitNet model are noteworthy, the researchers admit that they still do not fully understand why it performs as well as it does despite its simple weighting scheme. They note that exploring the theoretical foundations of effective 1-bit training at scale remains a crucial area for future research, and that further work is needed to make BitNet competitive with the sheer size and context length of today’s largest models.
Nonetheless, this research highlights a promising alternative for AI models facing the rising hardware and energy costs of running on expensive, high-performance GPUs. Today’s “full precision” models may be akin to high-performance vehicles that burn excessive energy and resources, whereas a well-designed, efficient model could achieve similar outcomes with considerably fewer resources.
Source: arstechnica.com