Llama 4 is out. 👀

Llama 4 Maverick (400B) and Scout (109B) - natively multimodal and multilingual, with Scout scaled to a 10 MILLION token context! BEATS DeepSeek v3 🔥

Llama 4 Maverick:

> 17B active parameters, 128 experts, 400B total parameters

> Beats GPT-4o & Gemini 2.0 Flash, competitive with DeepSeek v3 with less than half the active parameters (quick math below)

> 1417 Elo on LMArena (chat performance)

> Optimized for image understanding, reasoning, and multilingual tasks
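
A quick back-of-envelope on the active-parameter comparison. The DeepSeek-V3 figures (671B total / 37B active) come from its public release; the Maverick numbers are from the spec above.

```python
# Back-of-envelope: active parameters and rough BF16 weight memory.
# Maverick figures from this post; DeepSeek-V3 (671B total / 37B active)
# from its public release.

maverick_total, maverick_active = 400e9, 17e9
deepseek_total, deepseek_active = 671e9, 37e9

print(f"active-param ratio: {maverick_active / deepseek_active:.2f}")   # ~0.46, i.e. under half
print(f"BF16 weights, Maverick: {maverick_total * 2 / 1e9:.0f} GB")     # ~800 GB
print(f"BF16 weights, DeepSeek-V3: {deepseek_total * 2 / 1e9:.0f} GB")  # ~1342 GB
```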

Llama 4 Scout:

> 17B active parameters, 16 experts, 109B total parameters

> Best-in-class multimodal model for its size, fits on a single H100 GPU with Int4 quantization (sizing math after this list)

> 10M token context window

> Outperforms Gemma 3, Gemini 2.0 Flash-Lite, Mistral 3.1 on benchmarks
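
Rough sizing behind the single-H100 claim above, treating Int4 as ~0.5 bytes per parameter and ignoring KV cache, activations, and quantization overhead:

```python
# Rough weight-memory sizing for Llama 4 Scout at different precisions,
# ignoring KV cache, activations, and quantization overhead.
params = 109e9          # total parameters (from the post)
h100_mem_gb = 80        # HBM on an 80 GB H100

for name, bytes_per_param in [("BF16", 2.0), ("Int8", 1.0), ("Int4", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.0f} GB  (fits on one H100: {gb < h100_mem_gb})")
# Int4: ~55 GB, which is why the post says it fits on a single H100.
```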

Architecture & Innovations

Mixture-of-Experts (MoE):

> First natively multimodal Llama models with MoE

> Llama 4 Maverick: 128 experts, a shared expert plus routed experts for better efficiency
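
A minimal sketch of the shared-expert + routed-experts idea, using top-1 routing. Dimensions, routing, and names are illustrative, not Meta's implementation.

```python
# Minimal sketch of an MoE block: one always-on shared expert plus 128
# routed experts with top-1 routing per token. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEBlock(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                       # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        top_gate, top_idx = gates.max(dim=-1)   # top-1 routed expert per token
        out = self.shared(x)                    # shared expert sees every token
        for e in top_idx.unique():              # routed experts see only their tokens
            mask = top_idx == e
            out[mask] = out[mask] + top_gate[mask, None] * self.experts[e](x[mask])
        return out

x = torch.randn(16, 64)
print(MoEBlock()(x).shape)   # torch.Size([16, 64])
```

Only the router, the shared expert, and one routed expert run per token, which is how 400B total parameters can cost only 17B active parameters per forward pass.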

Native Multimodality & Early Fusion:

> Jointly pre-trained on text, images, and video (30T+ tokens, 2x the Llama 3 training data)

> MetaCLIP-based vision encoder, optimized for LLM integration

> Supports multi-image inputs (pre-trained with up to 48 images, tested with up to 8)
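
A toy sketch of early fusion: vision features are projected into the text embedding space and concatenated with text token embeddings, so a single transformer processes one mixed sequence. The encoder stand-in, dimensions, and token counts are made up.

```python
# Toy early-fusion sketch: project image patch features into the text
# embedding space, concatenate with text embeddings, run one backbone.
# Llama 4 uses a MetaCLIP-based vision encoder; this Linear is a stand-in.
import torch
import torch.nn as nn

d_model, vocab = 512, 32000
text_embed = nn.Embedding(vocab, d_model)
vision_proj = nn.Linear(768, d_model)          # stand-in for vision-encoder output projection
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2
)

text_ids = torch.randint(0, vocab, (1, 32))    # 32 text tokens
image_feats = torch.randn(1, 2 * 144, 768)     # patch features for 2 images

tokens = torch.cat([vision_proj(image_feats), text_embed(text_ids)], dim=1)
print(backbone(tokens).shape)                  # (1, 320, 512): one fused sequence
```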

Long Context & iRoPE Architecture:

> 10M token support (Llama 4 Scout)

> Interleaved attention layers (no positional embeddings)

> Temperature-scaled attention for better length generalization
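
A simplified sketch of the iRoPE idea as publicly described: most layers use RoPE, interleaved layers skip positional embeddings, and attention logits get a length-dependent temperature at inference. The interleave period and the scaling formula below are assumptions, not Meta's published recipe.

```python
# Sketch of interleaved RoPE/NoPE layers plus length-dependent attention
# temperature at inference. Period and scaling formula are assumptions.
import math
import torch

def use_rope(layer_idx, period=4):
    # e.g. every 4th layer uses no positional embedding (NoPE)
    return (layer_idx + 1) % period != 0

def attn_temperature(seq_len, train_len=8192, alpha=0.1):
    # grow attention logit scale logarithmically past the training length
    return 1.0 + alpha * max(0.0, math.log(seq_len / train_len))

q = torch.randn(1, 8, 1024, 64)   # (batch, heads, seq, head_dim)
k = torch.randn(1, 8, 1024, 64)
scale = attn_temperature(seq_len=10_000_000) / math.sqrt(q.shape[-1])
logits = torch.matmul(q, k.transpose(-1, -2)) * scale

print([use_rope(i) for i in range(8)], logits.shape)
```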

Training Efficiency:

> FP8 precision (390 TFLOPs/GPU on 32K GPUs for Behemoth)

> MetaP technique: auto-tunes per-layer hyperparameters (learning rates, initialization scales)
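
For scale, multiplying the two quoted throughput numbers gives the aggregate compute rate:

```python
# Aggregate throughput implied by the quoted numbers:
# 390 TFLOP/s per GPU sustained across 32,000 GPUs.
per_gpu_flops = 390e12
n_gpus = 32_000
total = per_gpu_flops * n_gpus
print(f"~{total / 1e18:.1f} EFLOP/s aggregate")   # ~12.5 EFLOP/s
```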

Revamped Pipeline:

> Lightweight Supervised Fine-Tuning (SFT) → Online RL → Lightweight DPO

> Hard-prompt filtering (50%+ easy data removed) for better reasoning/coding

> Continuous Online RL: Adaptive filtering for medium/hard prompts
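
A hedged sketch of the easy-prompt filtering step: score each prompt for difficulty and drop the easy majority before SFT and online RL. The difficulty judge, threshold, and data here are placeholders, not Meta's pipeline.

```python
# Sketch of "remove easy prompts": score prompts for difficulty with some
# judge (placeholder below), drop the easy ones, keep medium/hard for
# SFT and online RL. Judge, threshold, and data format are assumptions.
def difficulty(prompt: str) -> float:
    # placeholder judge: in practice a model estimates how often a strong
    # checkpoint already solves the prompt; here, longer == "harder"
    return min(len(prompt) / 200.0, 1.0)

prompts = [
    "What is 2 + 2?",
    "Prove that the sum of two even integers is even.",
    "Write a Python function that merges overlapping intervals and explain its complexity.",
]

hard_pool = [p for p in prompts if difficulty(p) >= 0.3]   # drops the easy majority
print(f"kept {len(hard_pool)}/{len(prompts)} prompts for SFT / online RL")
```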

Discussion

Next level vibe coding.

Where can I use it? I’ll pay sats for queries. Haven’t been following the AI world.

I don't know what half of this means but it sounds like Maverick is beating the shit out of the anti-freedom corporations 🤙

nostr:nevent1qqs9026acs6zdg4gznv8rr5crpy8taskqpjn5dsa7cjue4zzugcj0hspz4mhxue69uhhyetvv9ujuerpd46hxtnfduhsygyehd2erjg3vcq0s3gs05clndv79a78uzdpl7qzap836s762472vspsgqqqqqqska4apm

Can you make music like on chatgpt?

Dang that’s a big context window