Deep Cogito, a San Francisco-based AI research startup, has emerged from stealth with the launch of Cogito v1. This new line of large language models (LLMs), derived from Meta's Llama 3.2, offers hybrid reasoning: each model can answer quickly and directly, or switch into a self-reflective mode akin to OpenAI's "o" series and DeepSeek R1.
The primary goal of Deep Cogito is to extend AI capabilities beyond the limitations of current human oversight, allowing models to iteratively enhance and internalize their reasoning strategies. The company’s ultimate aim is the development of superintelligence—AI that surpasses human intelligence across all fields—while maintaining an open-source approach to all their models.
Drishan Arora, CEO and co-founder of Deep Cogito and former Senior Software Engineer at Google, asserted that their models are among the most robust at their scale, surpassing competitors like LLaMA, DeepSeek, and Qwen.
The company has initially released five models, at 3 billion, 8 billion, 14 billion, 32 billion, and 70 billion parameters. They can be downloaded from Hugging Face and Ollama, or accessed via APIs on Fireworks AI and Together AI.
The models are released under Meta's Llama license, which allows commercial use: enterprises can build them into commercial products as long as those products serve no more than 700 million monthly active users, beyond which a paid license from Meta is required.
Plans are underway to introduce even larger models, potentially reaching up to 671 billion parameters in the near future.
Arora detailed the company’s training methodology called iterated distillation and amplification (IDA), which presents an innovative alternative to traditional reinforcement learning from human feedback (RLHF) and teacher-model distillation. The essence of the IDA approach is to leverage more computational resources to allow a model to generate enhanced solutions, effectively creating a feedback loop for self-improvement.
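The shape of that loop can be sketched in a few lines of Python. This toy is an illustration of the amplify-then-distill cycle only; `ida_toy`, its one-number "policy," and every constant in it are invented for the example and are not Deep Cogito's actual training procedure.

```python
import random

def ida_toy(target: float = 0.9, steps: int = 6, k: int = 8, seed: int = 0) -> float:
    """Toy sketch of iterated distillation and amplification (IDA).

    The 'policy' is a single number standing in for a model's answer.
    Amplification spends extra compute (k samples) to find a better
    answer; distillation folds that improved answer back into the
    fast policy, and the cycle repeats.
    """
    rng = random.Random(seed)
    policy = 0.0  # the model's current best direct answer
    for _ in range(steps):
        # Amplification: generate k candidate answers with extra compute,
        # then keep the one an evaluator scores best.
        candidates = [policy + rng.uniform(-0.3, 0.3) for _ in range(k)]
        amplified = min(candidates, key=lambda c: abs(c - target))
        # Distillation: train the fast policy to reproduce the amplified
        # answer directly (here, simply adopt it).
        policy = amplified
    return policy
```

Each pass leaves the cheap policy closer to what the expensive, amplified search would have produced, which is the self-improvement feedback loop the article describes.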
Each model offers both a standard mode for direct answers and a reasoning mode that performs deeper internal reflection before responding.
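On the OpenAI-compatible APIs that Fireworks AI and Together AI expose, switching modes comes down to prepending a system prompt. A minimal sketch: the toggle phrase `"Enable deep thinking subroutine."` is an assumption based on the published model cards, so verify it against the card for your checkpoint.

```python
# Assumed toggle phrase; confirm against the model card before relying on it.
DEEP_PROMPT = "Enable deep thinking subroutine."

def build_messages(question: str, reasoning: bool = False) -> list[dict]:
    """Build an OpenAI-style chat message list for a Cogito model.

    With reasoning=True, a system prompt is prepended to switch the
    model into its self-reflective reasoning mode; without it, the
    model answers directly in standard mode.
    """
    messages = []
    if reasoning:
        messages.append({"role": "system", "content": DEEP_PROMPT})
    messages.append({"role": "user", "content": question})
    return messages
```

The resulting list is what you would pass as `messages` to any OpenAI-compatible chat-completions client.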
Benchmarks and Evaluations
The company released extensive evaluation data illustrating the performance of Cogito models against other open-source alternatives across various tasks such as general knowledge, mathematics, and multilingual capabilities. Key highlights include:
Cogito 3B (Standard) surpasses LLaMA 3.2 3B on MMLU by 6.7 percentage points (65.4% vs. 58.7%) and on HellaSwag by 18.8 points (81.1% vs. 62.3%).
In reasoning mode, Cogito 3B achieves 72.6% on MMLU and 84.2% on ARC, an improvement over its standard performance attributable to IDA-based self-reflection.
Cogito 8B (Standard) earns a score of 80.5% on MMLU, exceeding LLaMA 3.1 8B by 12.8 points and achieving 88.7% on ARC.
In reasoning mode, Cogito 8B scores 83.1% on MMLU and 92.0% on ARC, outperforming DeepSeek R1 Distill 8B in most areas aside from the MATH benchmark, where it scores lower (60.2% vs. 80.6%).
Cogito 14B and 32B models collectively exceed Qwen2.5 counterparts by 2-3 percentage points on average benchmarks, with Cogito 32B (Reasoning) achieving an impressive 90.2% on MMLU and 91.8% on MATH.
Cogito 70B (Standard) surpasses LLaMA 3.3 70B on MMLU by 6.4 points (91.7% vs. 85.3%) and even outperforms LLaMA 4 Scout 109B on overall benchmarks (54.5% vs. 53.3%).
When compared with DeepSeek R1 Distill 70B, Cogito 70B (Reasoning) demonstrates stronger results across general and multilingual assessments, highlighted by a 91.0% score on MMLU and a 92.7% on MGSM.
Generally, the Cogito models exhibit superior performance in reasoning mode, though inconsistencies arise, particularly in mathematical tasks. For example, while the standard version of Cogito 70B performs comparably with peers in MATH and GSM8K, the reasoning version lags behind DeepSeek R1 by more than five percentage points (83.3% vs. 89.0%).
Tool Calling Built-In
Deep Cogito has also assessed the native tool-calling capabilities of its models, an increasingly vital feature for integrated systems operating within various applications.
Cogito 3B supports four tool-calling task types (simple, parallel, multiple, and parallel-multiple), none of which LLaMA 3.2 3B offers. It posts a 92.8% success rate on simple tool calls and above 91% on multiple tool calls.
Meanwhile, Cogito 8B consistently achieves over 89% across all tool call types, marking a significant improvement over LLaMA 3.1 8B, which performs in the 35% to 54% range.
These advancements are attributed not only to the models’ architecture and training datasets but also to task-specific post-training methodologies that many baseline models currently lack.
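Because the hosted models sit behind OpenAI-compatible APIs, a tool-calling round trip can be sketched in that request format. The `get_weather` tool and the `dispatch` helper below are hypothetical illustrations for the example, not part of any Cogito SDK; only the schema shape follows the OpenAI-compatible convention.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format
# that providers such as Fireworks AI and Together AI accept.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call: dict, registry: dict) -> str:
    """Route one model-emitted tool call to a local Python function.

    The model returns the function name and a JSON string of arguments;
    we look up the function in a registry and invoke it. For parallel
    calls, the same routine runs once per returned tool call.
    """
    fn = registry[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)
```

A response's `tool_calls` entries would each be fed through `dispatch`, and the results sent back to the model as `tool`-role messages.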
Looking Ahead
Deep Cogito is set to introduce more expansive models in the upcoming months, including mixture-of-expert versions at 109B, 400B, and 671B parameter scales. The company will also continue to enhance its current model checkpoints through extended training efforts.
Deep Cogito’s IDA approach aims to provide a sustainable pathway for scalable self-improvement, reducing reliance on human intervention or fixed teacher models. Arora has emphasized the significance of real-world applicability and adaptability as crucial metrics for the effectiveness of these models, suggesting that the company is only at the beginning of a substantial scaling journey.
Partnerships with Hugging Face, RunPod, Fireworks AI, Together AI, and Ollama support Deep Cogito's research and infrastructure. All models are released as open source, encouraging further innovation and experimentation in the AI community.
Source
venturebeat.com