On NVIDIA, Open Source Models, Sam Altman, Google, Cursor, and many more.


AlphaSignal


Hey,

Welcome to today's edition of AlphaSignal, a newsletter for developers by developers.

We identify and summarize the top 1% of news, papers, models, and repos in the AI industry.

IN TODAY'S SIGNAL

Read time: 6 min 32 sec

🎖️ Top News


📌 Ray Summit

⚡️ Trending Signals

🧠 Top Papers


🧠 Lecture

If you're enjoying AlphaSignal, please forward this email to a colleague.

It helps us keep this content free.

TOP NEWS

LLM

NVIDIA just dropped Nemotron 51B: 2.2x faster inference and 4x the workload capacity of Llama 3.1 70B

⇧ 1,054 Likes

What's New

NVIDIA has introduced Llama-3.1-Nemotron-51B, a language model derived from Meta's Llama-3.1-70B. It achieves 2.2x faster inference and handles 4x larger workloads on a single GPU while maintaining comparable accuracy to its parent model.


Key highlights

  • Llama-3.1-Nemotron-51B is optimized with TensorRT-LLM engines and packaged as an NVIDIA NIM inference microservice for streamlined deployment; it's available through the NVIDIA AI API with free credits for testing.

  • The model was trained on 40 billion tokens from the FineWeb, Buzz-V1.2, and Dolma datasets.

  • Achieves the best accuracy per dollar of the models compared, placing it on the efficiency frontier.


Access and License

  • Available through the NVIDIA AI API with free trial credits; the model is also hosted on Hugging Face (a minimal request sketch follows below).

  • License: NVIDIA AI Foundation Models Community License Agreement; non-production use is allowed, while production use requires a subscription.
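
For reference, here is a minimal request sketch, assuming the NVIDIA AI API exposes an OpenAI-compatible chat completions endpoint and that the model id is nvidia/llama-3.1-nemotron-51b-instruct; verify both on build.nvidia.com before relying on them.

  # Minimal sketch, not official NVIDIA sample code.
  # Endpoint and model id below are assumptions; check build.nvidia.com.
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://integrate.api.nvidia.com/v1",  # NVIDIA AI API endpoint (assumed)
      api_key=os.environ["NVIDIA_API_KEY"],            # uses the free trial credits
  )

  response = client.chat.completions.create(
      model="nvidia/llama-3.1-nemotron-51b-instruct",  # assumed model id
      messages=[{"role": "user", "content": "Summarize the benefits of NAS-pruned LLMs in two sentences."}],
      temperature=0.5,
      max_tokens=256,
  )
  print(response.choices[0].message.content)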

Core innovation in this model

  • The core innovation lies in the use of Neural Architecture Search (NAS) and block-wise knowledge distillation (a toy sketch of the latter follows this list).

  • NAS optimizes the model's architecture for efficient inference on specific GPUs.
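
To make the block-wise distillation idea concrete, here is a toy PyTorch sketch in which a cheaper candidate block is trained to reproduce the outputs of the corresponding frozen parent block; the function, loss, and stand-in blocks are illustrative assumptions, not NVIDIA's actual pipeline.

  import torch
  import torch.nn as nn

  def distill_block(parent_block: nn.Module,
                    student_block: nn.Module,
                    hidden_states: torch.Tensor,
                    steps: int = 100,
                    lr: float = 1e-4) -> nn.Module:
      # Toy block-wise distillation: fit one candidate block to one frozen parent block.
      parent_block.eval()
      with torch.no_grad():
          target = parent_block(hidden_states)  # parent block's output on sample activations
      optimizer = torch.optim.AdamW(student_block.parameters(), lr=lr)
      for _ in range(steps):
          loss = nn.functional.mse_loss(student_block(hidden_states), target)
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()
      return student_block

  # Example with stand-in blocks: a standard transformer MLP vs. a slimmer replacement.
  parent = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
  student = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))
  distill_block(parent, student, torch.randn(64, 512))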

Performance metrics demonstrate significant improvements

  • Text generation throughput: 6,472 tokens/s/GPU vs 2,975 for Llama-3.1-70B

  • Summarization throughput: 653 tokens/s/GPU vs 339 for Llama-3.1-70B

  • MT Bench score: 8.99 vs 8.93 for Llama-3.1-70B

  • MMLU accuracy: 80.2% vs 81.66% for Llama-3.1-70B

  • The model preserves 98-100% accuracy across various benchmarks compared to Llama-3.1-70B.

  • NVIDIA also created Llama-3.1-Nemotron-40B, prioritizing speed over accuracy, achieving a 3.2x speed increase compared to the parent model.

This approach demonstrates the potential to create multiple efficient models from a single reference model, each optimized for specific hardware and inference scenarios. The technique could be applied to other language models or architectures, opening the door to smaller, more efficient models suitable for edge devices.

READ MORE

Ray Summit: The World’s Largest Gathering of Open Source AI Leaders is Only 8 Days Away

Ray Summit kicks off next week. Join developers and researchers from top companies to explore AI's cutting edge.


Topics include

  • LLM fine-tuning and inference at scale

  • Distributed computing challenges

  • Latest in AI infrastructure

Keynote speeches by:

  • Marc Andreessen (Andreessen Horowitz)

  • Mira Murati (CTO, OpenAI)

  • Anastasis Germanidis (Co-Founder & CTO, Runway)

  • And many more

GET 15% OFF

partner with us

TRENDING SIGNALS

LLM

Google introduces Michelangelo: long-context reasoning evaluation challenging frontier models beyond needle-in-haystack tasks

⇧ 1,000 Likes

Computer Vision

Researchers unveil Stable-Delight: open-source tool removes reflections from images and videos in real-time

⇧ 452 Likes

AI Code generation

Viral tweet reveals Cursor's AI efficiency hack: Extensive code commenting improves codegen accuracy

⇧ 1,800 Likes

Local LLM

Open-source project launches AI-powered file organizer using local LLMs, enhancing privacy and efficiency

⇧ 1,700 Likes

AGI

OpenAI CEO Sam Altman's new blog post goes viral, predicting that superintelligence may emerge within a few thousand days

⇧ 13,000 Likes

TOP PAPERS

LLM

LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench

⇧ 795 Likes

Problem
LLMs struggle with planning tasks, as demonstrated by poor performance on PlanBench. OpenAI's new o1 model claims to be a "Large Reasoning Model" (LRM) with improved planning abilities. This paper evaluates o1's performance on PlanBench compared to standard LLMs and classical planners.


Solution
The authors test o1-preview and o1-mini on PlanBench's Blocksworld and Mystery Blocksworld tasks, comparing results to top LLMs and the Fast Downward planner. They evaluate accuracy, efficiency, and ability to recognize unsolvable problems. The study also examines o1's performance on longer planning tasks and considers cost-effectiveness.


Results
o1-preview achieves 97.8% accuracy on 3-5 block Blocksworld problems and 52.8% on Mystery Blocksworld, vastly outperforming standard LLMs. However, performance degrades on longer tasks (23.63% for 6-20 blocks), o1 struggles to recognize unsolvable problems, and it is significantly more expensive to run than LLMs or classical planners. While impressive, o1's planning abilities are not yet robust or cost-effective.

RLHF

Training Language Models to Self-Correct via Reinforcement Learning

⇧ 2,300 Likes

Problem
LLMs lack effective self-correction abilities, struggling to improve their own responses without external input. Existing approaches rely on multiple models or oracle supervision. The paper aims to develop a method for training LLMs to self-correct using only self-generated data.


Solution
The authors propose SCoRe, a two-stage multi-turn reinforcement learning approach (a toy sketch of the reward shaping follows below):

  1. Train a model initialization that is less prone to collapse by optimizing second-attempt performance while constraining first-attempt responses.

  2. Run multi-turn RL with reward shaping that incentivizes self-correction behavior.

SCoRe addresses the distribution mismatch and mode collapse issues that affect supervised fine-tuning approaches.
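
As a rough illustration of the second stage, the reward for the second attempt can combine correctness of the final answer with a bonus for improving on the first attempt. This is a simplified assumption about the shaping, not the paper's exact formulation:

  def shaped_reward(first_correct: bool, second_correct: bool, alpha: float = 0.5) -> float:
      # Toy two-attempt reward: correctness of the final answer plus a bonus (or penalty)
      # proportional to the change in correctness across attempts. Constants are hypothetical.
      base = 1.0 if second_correct else 0.0
      progress = (1.0 if second_correct else 0.0) - (1.0 if first_correct else 0.0)
      return base + alpha * progress

  print(shaped_reward(False, True))   # 1.5  wrong -> right earns the bonus
  print(shaped_reward(True, True))    # 1.0  right -> right gets no extra credit
  print(shaped_reward(True, False))   # -0.5 right -> wrong is penalized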


Results
SCoRe achieves state-of-the-art self-correction performance on MATH and HumanEval benchmarks, improving base Gemini models by 15.6% and 9.1% respectively. It outperforms baselines in both direct generation and self-correction accuracy.

Transformers

Kolmogorov-Arnold Transformer

⇧ 1,170 Likes

Problem
Transformers rely heavily on MLP layers to mix information across channels, but MLPs face limitations in modeling complex functions. Kolmogorov-Arnold Networks (KANs) offer a potentially more expressive alternative, but integrating KANs into transformers has been challenging due to scalability issues.


Solution
The authors introduce Kolmogorov–Arnold Transformer (KAT), which replaces MLPs with Group-Rational KAN (GR-KAN) layers. Key innovations include:

  • Rational activation functions for GPU efficiency

  • Group KAN to reduce parameters and computation

  • Variance-preserving initialization for stable training

These enhancements allow KAT to scale effectively to large models.
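
For intuition, here is a minimal PyTorch sketch of a grouped rational activation in the spirit of GR-KAN: channels are split into groups, and each group applies its own learnable ratio of polynomials. Degrees, initialization, and naming are placeholder assumptions rather than the paper's implementation.

  import torch
  import torch.nn as nn

  class GroupedRationalActivation(nn.Module):
      # y = P(x) / (1 + |Q(x)|) applied per channel group, with learnable coefficients.
      def __init__(self, channels: int, groups: int = 4, p_degree: int = 3, q_degree: int = 2):
          super().__init__()
          assert channels % groups == 0
          self.groups = groups
          self.p = nn.Parameter(torch.randn(groups, p_degree + 1) * 0.1)  # numerator coefficients
          self.q = nn.Parameter(torch.randn(groups, q_degree) * 0.1)      # denominator coefficients

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          *lead, c = x.shape
          xg = x.view(*lead, self.groups, c // self.groups)  # split channels into groups
          num = sum(self.p[:, k].unsqueeze(-1) * xg ** k for k in range(self.p.shape[1]))
          den = 1.0 + torch.abs(sum(self.q[:, k].unsqueeze(-1) * xg ** (k + 1) for k in range(self.q.shape[1])))
          return (num / den).view(*lead, c)

  act = GroupedRationalActivation(channels=64, groups=4)
  out = act(torch.randn(2, 16, 64))  # (batch, tokens, channels)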


Results
KAT consistently outperforms traditional MLP-based transformers across various vision tasks:

  • ImageNet-1K: KAT-B achieves 82.3% accuracy, surpassing ViT-B by 3.2%

  • COCO object detection: KAT-S improves box AP by 3.0 over ViTDet-S

  • ADE20K semantic segmentation: KAT-S achieves 2.6% higher mIoU than DeiT-S

LECTURE

Llama

Learn how to convert GPT to the Llama 2 architecture, by Sebastian Raschka.

⇧ 1,414 Likes

Sebastian Raschka's tutorial breaks down the key differences between GPT and Llama 2 architectures. It provides a step-by-step guide to convert a GPT model into Llama 2, highlighting the fundamental changes in model structure and components. This practical walkthrough helps you understand the inner workings of these large language models.


You'll learn about

  • Replacing LayerNorm with RMSNorm

  • Switching from GELU to SiLU activation

  • Implementing rotary position embeddings (RoPE)

  • Updating the FeedForward module with SwiGLU

  • Loading and using pretrained Llama 2 weights

  • Adapting the tokenizer for Llama 2

The tutorial includes code snippets and explanations for each modification, allowing you to follow along and implement the changes yourself.
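
To give a flavor of the conversion, here is a minimal PyTorch sketch of two of the swaps the tutorial covers, RMSNorm in place of LayerNorm and a SwiGLU feed-forward in place of the GELU MLP; dimensions and naming are illustrative and not taken from Raschka's code.

  import torch
  import torch.nn as nn

  class RMSNorm(nn.Module):
      # Root-mean-square norm: rescales by the RMS only, with no mean subtraction or bias.
      def __init__(self, dim: int, eps: float = 1e-5):
          super().__init__()
          self.eps = eps
          self.weight = nn.Parameter(torch.ones(dim))

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
          return self.weight * (x / rms)

  class SwiGLUFeedForward(nn.Module):
      # Llama-style FFN: a SiLU-gated linear unit replaces GPT's GELU MLP.
      def __init__(self, dim: int, hidden_dim: int):
          super().__init__()
          self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
          self.w_up = nn.Linear(dim, hidden_dim, bias=False)
          self.w_down = nn.Linear(hidden_dim, dim, bias=False)

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          return self.w_down(nn.functional.silu(self.w_gate(x)) * self.w_up(x))

  x = torch.randn(2, 16, 512)
  print(SwiGLUFeedForward(512, 1376)(RMSNorm(512)(x)).shape)  # torch.Size([2, 16, 512])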

READ MORE
