IN TODAY'S SIGNAL
Read time: 6 min 32 sec

🎖️ Top News
📌 Ray Summit
⚡️ Trending Signals
🧠 Top Papers
🧠 Lecture
If you're enjoying AlphaSignal, please forward this email to a colleague. It helps us keep this content free.
TOP NEWS
LLM | NVIDIA just dropped Nemotron 51B: 2.2x faster, handling 4x the workload of Llama 3.1 70B
⇧ 1,054 Likes
What's New
NVIDIA has introduced Llama-3.1-Nemotron-51B, a language model derived from Meta's Llama-3.1-70B. It achieves 2.2x faster inference and handles 4x larger workloads on a single GPU while maintaining accuracy comparable to its parent model.
Key highlights
Optimized with TensorRT-LLM engines and packaged as an NVIDIA NIM inference microservice for streamlined deployment; available through the NVIDIA AI API with free credits for testing.
Trained on 40 billion tokens from the FineWeb, Buzz-V1.2, and Dolma datasets.
Achieves the best accuracy per dollar on the efficiency frontier.
Performance metrics show significant improvements:
Text generation throughput: 6,472 tokens/s/GPU vs 2,975 for Llama-3.1-70B
Summarization throughput: 653 tokens/s/GPU vs 339 for Llama-3.1-70B
MT-Bench score: 8.99 vs 8.93 for Llama-3.1-70B
MMLU accuracy: 80.2% vs 81.66% for Llama-3.1-70B
The model preserves 98-100% accuracy across various benchmarks compared to Llama-3.1-70B.
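As a quick sanity check, the headline speedup follows directly from the throughput figures reported above (numbers taken from this announcement):

```python
# Reported throughput (tokens/s/GPU) from the announcement above
nemotron_gen, llama_gen = 6472, 2975   # text generation
nemotron_sum, llama_sum = 653, 339     # summarization

gen_speedup = nemotron_gen / llama_gen
sum_speedup = nemotron_sum / llama_sum

print(f"generation speedup: {gen_speedup:.2f}x")    # ~2.18x, matching the ~2.2x claim
print(f"summarization speedup: {sum_speedup:.2f}x")  # ~1.93x
```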
NVIDIA also created Llama-3.1-Nemotron-40B, prioritizing speed over accuracy, achieving a 3.2x speed increase compared to the parent model.
This approach demonstrates the potential for creating multiple efficient models from a single reference model, each optimized for specific hardware and inference scenarios. The technique could be applied to other language models or architectures, opening possibilities for smaller, more efficient models suitable for edge devices.
READ MORE
Ray Summit: The World’s Largest Gathering of Open Source AI Leaders is Only 8 Days Away
Ray Summit kicks off next week. Join developers and researchers from top companies to explore AI's cutting edge.
Topics include:
LLM fine-tuning and inference at scale
Distributed computing challenges
Latest in AI infrastructure
Keynote speeches by:
Marc Andreessen (Andreessen Horowitz)
Mira Murati (CTO, OpenAI)
Anastasis Germanidis (Co-Founder & CTO, Runway)
And many more
GET 15% OFF
partner with us
TRENDING SIGNALS
LLM | ⇧ 1,000 Likes
Computer Vision | ⇧ 452 Likes
AI Code generation | ⇧ 1,800 Likes
Local LLM | ⇧ 1,700 Likes
AGI | ⇧ 13,000 Likes
TOP PAPERS
LLM | ⇧ 795 Likes
Problem
LLMs struggle with planning tasks, as demonstrated by poor performance on PlanBench. OpenAI's new o1 model claims to be a "Large Reasoning Model" (LRM) with improved planning abilities. This paper evaluates o1's performance on PlanBench compared to standard LLMs and classical planners.
Solution
The authors test o1-preview and o1-mini on PlanBench's Blocksworld and Mystery Blocksworld tasks, comparing results to top LLMs and the Fast Downward planner. They evaluate accuracy, efficiency, and the ability to recognize unsolvable problems. The study also examines o1's performance on longer planning tasks and considers cost-effectiveness.
Results
o1-preview achieves 97.8% accuracy on 3-5 block Blocksworld problems and 52.8% on Mystery Blocksworld, vastly outperforming standard LLMs. However, performance degrades on longer tasks (23.63% for 6-20 blocks). o1 struggles with unsolvable problems and is significantly more expensive than standard LLMs or classical planners. While impressive, o1's planning abilities are not yet robust or cost-effective.
RLHF | ⇧ 2,300 Likes
Problem
LLMs lack effective self-correction abilities, struggling to improve their own responses without external input. Existing approaches rely on multiple models or oracle supervision. The paper aims to develop a method for training LLMs to self-correct using only self-generated data.
Solution
The authors propose SCoRe, a two-stage multi-turn reinforcement learning approach:
Stage 1: Train a model initialization less prone to collapse by optimizing second-attempt performance while constraining first-attempt responses.
Stage 2: Run multi-turn RL with reward shaping to incentivize self-correction behavior.
SCoRe addresses the distribution-mismatch and mode-collapse challenges of supervised fine-tuning approaches.
Results
SCoRe achieves state-of-the-art self-correction performance on the MATH and HumanEval benchmarks, improving base Gemini models by 15.6% and 9.1% respectively. It outperforms baselines in both direct generation and self-correction accuracy.
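The reward-shaping idea can be sketched with a toy reward function: pay for second-attempt correctness, with a bonus for genuinely fixing a wrong first attempt and a penalty for breaking a correct one. This is an illustrative sketch, not the paper's actual reward; the function name and the weight `alpha` are made up here.

```python
def shaped_reward(first_correct: bool, second_correct: bool, alpha: float = 0.5) -> float:
    """Toy SCoRe-style reward shaping (illustrative only).

    Base reward: correctness of the second (revised) attempt.
    Bonus/penalty: +/- alpha for flipping the answer's correctness,
    discouraging the collapse mode where the model just repeats
    its first answer verbatim.
    """
    base = 1.0 if second_correct else 0.0
    bonus = alpha if (not first_correct and second_correct) else 0.0
    penalty = -alpha if (first_correct and not second_correct) else 0.0
    return base + bonus + penalty

print(shaped_reward(False, True))   # 1.5: rewarded for a real self-correction
print(shaped_reward(True, True))    # 1.0: kept a correct answer
print(shaped_reward(True, False))   # -0.5: penalized for breaking a correct answer
```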
Transformers | ⇧ 1,170 Likes
Problem
LLMs rely heavily on MLPs for information mixing, but MLPs face limitations in modeling complex functions. Kolmogorov-Arnold Networks (KANs) offer a potentially more expressive alternative, but integrating KANs into transformers has been challenging due to scalability issues.
Solution
The authors introduce the Kolmogorov–Arnold Transformer (KAT), which replaces MLPs with Group-Rational KAN (GR-KAN) layers. Key innovations include:
Rational activation functions for GPU efficiency
Group KAN to reduce parameters and computation
Variance-preserving initialization for stable training
These enhancements allow KAT to scale effectively to large models.
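A rational activation of the kind GR-KAN builds on can be sketched as a ratio of polynomials, with the denominator kept positive to avoid poles. The coefficients and degrees below are arbitrary placeholders for illustration, not the paper's learned values:

```python
import numpy as np

def rational_activation(x, p=(0.0, 1.0, 0.5), q=(1.0, 0.3)):
    """Rational function P(x)/Q(x): a learnable, more expressive
    alternative to fixed activations (illustrative coefficients).

    P(x) = p0 + p1*x + p2*x^2
    Q(x) = 1 + |q0*x + q1*x^2|   (abs keeps the denominator >= 1)
    """
    x = np.asarray(x, dtype=float)
    P = p[0] + p[1] * x + p[2] * x**2
    Q = 1.0 + np.abs(q[0] * x + q[1] * x**2)
    return P / Q

print(rational_activation(0.0))  # 0.0, since P(0) = p0 = 0 and Q(0) = 1
```

In GR-KAN the coefficients are learned per group of channels, which is where the parameter and compute savings come from.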
Results
KAT consistently outperforms traditional MLP-based transformers across various vision tasks:
ImageNet-1K: KAT-B achieves 82.3% accuracy, surpassing ViT-B by 3.2%
COCO object detection: KAT-S improves AP_box by 3.0 over ViTDet-S
ADE20K semantic segmentation: KAT-S achieves 2.6% higher mIoU than DeiT-S
LECTURE
Llama | Learn to convert a GPT model to the Llama 2 architecture, by Sebastian Raschka.
⇧ 1,414 Likes
Sebastian Raschka's tutorial breaks down the key differences between GPT and Llama 2 architectures. It provides a step-by-step guide to convert a GPT model into Llama 2, highlighting the fundamental changes in model structure and components. This practical walkthrough helps you understand the inner workings of these large language models.
You'll learn about:
Replacing LayerNorm with RMSNorm
Switching from GELU to SiLU activation
Implementing rotary position embeddings (RoPE)
Updating the FeedForward module with SwiGLU
Loading and using pretrained Llama 2 weights
Adapting the tokenizer for Llama 2
The tutorial includes code snippets and explanations for each modification, allowing you to follow along and implement the changes yourself.
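The first swap on the list, LayerNorm to RMSNorm, is small enough to sketch in a few lines. This is a minimal NumPy version without the learnable scale parameter; the tutorial itself works in PyTorch:

```python
import numpy as np

def layernorm(x, eps=1e-5):
    """Standard LayerNorm: center by the mean, then scale by the std dev."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rmsnorm(x, eps=1e-5):
    """RMSNorm (used by Llama 2): skip mean-centering entirely and
    normalize by the root-mean-square only -- cheaper per token."""
    rms = np.sqrt((x**2).mean(axis=-1, keepdims=True) + eps)
    return x / rms

x = np.array([1.0, 2.0, 3.0, 4.0])
print(layernorm(x))  # zero-mean output
print(rmsnorm(x))    # same direction as x, unit RMS
```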
READ MORE