
Overtraining May Hurt Large Language Models Despite Increased Data Use


- Researchers from top US universities warn that extending pre-training can be detrimental to performance
- Too much pre-training can deliver worse performance due to something akin to the butterfly effect
- The more a model is pre-trained, the more sensitive it becomes to small changes that can disrupt the end result

Researchers from Carnegie Mellon, Stanford, Harvard, and Princeton are questioning a long-held belief in the field of AI development: that an increase in pre-training data invariably enhances performance.

The study, highlighted by HPCwire, introduces the notion of “catastrophic overtraining,” which suggests that excessive pre-training can negatively impact a model’s performance during the fine-tuning stage.

The research team evaluated two versions of the OLMo-1B model: one trained on 2.3 trillion tokens and another trained on 3 trillion. Contrary to expectations, the model exposed to the larger dataset performed up to 3% worse on various benchmarks, including AlpacaEval and ARC.
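In essence, the experiment reduces to fine-tuning two pre-training checkpoints with an identical recipe and scoring them on the same benchmarks. The sketch below illustrates that comparison under stated assumptions: it uses Hugging Face Transformers, the model identifier and checkpoint revisions are assumptions, and the `fine_tune` and `run_benchmarks` helpers are hypothetical placeholders rather than the authors' actual evaluation code.

```python
# Sketch: fine-tune two pre-training checkpoints the same way, then score them
# on the same benchmarks. fine_tune() and run_benchmarks() are hypothetical
# placeholders, and the checkpoint revisions are assumptions, not the paper's code.
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINTS = {
    "2.3T tokens": ("allenai/OLMo-1B-hf", None),  # revision for the 2.3T checkpoint (assumed)
    "3.0T tokens": ("allenai/OLMo-1B-hf", None),  # revision for the 3T checkpoint (assumed)
}

def fine_tune(model, tokenizer):
    """Placeholder: apply the identical instruction-tuning recipe to each model."""

def run_benchmarks(model, tokenizer):
    """Placeholder: return benchmark scores, e.g. {'AlpacaEval': ..., 'ARC': ...}."""

results = {}
for label, (name, revision) in CHECKPOINTS.items():
    tokenizer = AutoTokenizer.from_pretrained(name, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(name, revision=revision)
    fine_tune(model, tokenizer)
    results[label] = run_benchmarks(model, tokenizer)

# The paper reports the 3T-token model scoring up to ~3% lower after fine-tuning.
print(results)
```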

Reaching the inflection point

The decrease in performance, according to the study, can be attributed to a concept referred to as “progressive sensitivity.”

With an increased number of tokens, the model appears to become more susceptible to minor alterations. Even slight modifications during the fine-tuning process, or the addition of noise, have the potential to negate earlier improvements.

The authors illustrated this by adding Gaussian noise to their pre-trained models, reporting more pronounced performance declines with extended training durations.
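That sensitivity probe is straightforward to reproduce in spirit: add Gaussian noise to every parameter of a pre-trained model and measure how far its language-modelling loss degrades. The minimal sketch below assumes PyTorch and Hugging Face Transformers, uses GPT-2 as a stand-in for the checkpoints studied in the paper, and picks an arbitrary noise scale; a more "progressively sensitive" checkpoint would show a larger before/after gap.

```python
# Minimal sketch of a noise-sensitivity probe: perturb every parameter with
# Gaussian noise and compare the language-modelling loss before and after.
# The model choice (gpt2) and noise scale are illustrative assumptions,
# not the paper's actual setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for a pre-trained checkpoint under study
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

text = "Overtraining can make a model more sensitive to parameter noise."
batch = tokenizer(text, return_tensors="pt")

@torch.no_grad()
def lm_loss(m):
    # Cross-entropy loss of the model on the sample text.
    return m(**batch, labels=batch["input_ids"]).loss.item()

@torch.no_grad()
def add_gaussian_noise(m, sigma):
    # Perturb every parameter in place with N(0, sigma^2) noise.
    for p in m.parameters():
        p.add_(torch.randn_like(p) * sigma)

baseline = lm_loss(model)
add_gaussian_noise(model, sigma=1e-3)
perturbed = lm_loss(model)

print(f"loss before noise: {baseline:.4f}, after noise: {perturbed:.4f}")
```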

The point at which additional training begins to hinder performance is termed the "inflection point."

Upon reaching this threshold, the benefits of continued training are diminished by the increased risk of internal inconsistencies. The study indicates that this inflection point typically falls beyond 2.5 trillion tokens in smaller models like OLMo-1B.

“Catastrophic overtraining may be inevitable… especially when the pre-training and fine-tuning tasks are misaligned,” the authors note in their paper, which is accessible via the arXiv pre-print server.

While the researchers are not advocating for the cessation of pre-training altogether, they emphasize that developers should carefully evaluate the optimal amount of pre-training needed. The paper concludes with a call for a refreshed perspective on model scaling that encompasses the entirety of the training process.

For AI developers striving for scale, the takeaway is clear: in some cases, less may indeed be more.

Source: www.techradar.com
