# Generative AI
Explorations into the frontier of Artificial Intelligence.
## Core Concepts
- Transformer & Attention Foundations: The architecture that started it all. A mathematical and code-level deep dive into Self-Attention, Multi-Head Attention, and Positional Encodings (a minimal self-attention sketch follows below).
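To ground this before the roadmap, here is a minimal single-head sketch of scaled dot-product self-attention in PyTorch. The weight matrices and toy shapes are illustrative; a real layer adds multi-head splitting, causal masking, and an output projection.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # project tokens into Q, K, V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # pairwise token affinities
    weights = F.softmax(scores, dim=-1)            # normalize over the keys
    return weights @ v                             # attention-weighted values

# Toy usage: 4 tokens, model dimension 8.
torch.manual_seed(0)
x = torch.randn(4, 8)
w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([4, 8])
```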
## SOTA Roadmap
We will cover the following cutting-edge topics:
### 1. Large Language Models (LLMs)
- Architectures: Mixture of Experts (MoE/Mixtral), Grouped Query Attention (GQA), Rotary Embeddings (RoPE; a minimal sketch follows this list).
- Optimization: FlashAttention-2, Memory-efficient Transformers.
- State-of-the-Art Models: Analysis of Llama 3, Gemini 1.5, Claude 3.5 Sonnet.
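As a taste of the architecture track, below is a minimal sketch of Rotary Embeddings (RoPE). The half-split pairing follows the Llama-style convention (interleaved pairing is the other common variant); the function would be applied to queries and keys before the attention score is computed.

```python
import torch

def rope(x, base=10000.0):
    """RoPE sketch: rotate each feature pair of every token by an angle
    proportional to its position, so Q.K dot products encode relative offsets."""
    seq_len, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    freqs = base ** (-torch.arange(half) / half)           # per-pair frequency
    angles = torch.arange(seq_len)[:, None] * freqs[None]  # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]                  # half-split pairing
    return torch.cat([x1 * cos - x2 * sin,                 # 2-D rotation per pair
                      x1 * sin + x2 * cos], dim=-1)

# Toy usage: 16 tokens, head dimension 64.
print(rope(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```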
### 2. Alignment & Instruction Tuning
- RLHF Alternatives: Direct Preference Optimization (DPO; sketched after this list), Identity Preference Optimization (IPO).
- Synthetic Data: Self-Instruct, Evol-Instruct, Constitutional AI.
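For the DPO entry, a minimal sketch of the objective from Rafailov et al. (2023): the policy is pushed to prefer the chosen response over the rejected one more strongly than a frozen reference model does. The log-probability inputs (summed token log-probs per response) and the beta value here are made up for illustration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO: -log sigmoid(beta * (chosen margin - rejected margin))."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps      # implicit reward
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage: a batch of 2 preference pairs with made-up log-probs.
loss = dpo_loss(torch.tensor([-12.0, -9.0]), torch.tensor([-15.0, -11.0]),
                torch.tensor([-13.0, -9.5]), torch.tensor([-14.0, -10.5]))
print(loss.item())
```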
### 3. Image & Video Generation
- Diffusion Evolution: Latent Diffusion (SDXL), Rectified Flow (Flux.1; sketched after this list), Consistency Models.
- Video: Spacetime Patches (Sora), Masked Generative Transformers.
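The rectified-flow idea above reduces to a strikingly small training rule, sketched here: interpolate linearly between a data sample and noise, and regress the model onto that straight line's constant velocity (sampling then Euler-integrates the learned ODE). The time convention (t=0 data, t=1 noise) and the stand-in model are assumptions for illustration.

```python
import torch

def rectified_flow_loss(model, x_data, x_noise, t):
    """One rectified-flow training step: regress onto straight-line velocity."""
    t = t.view(-1, 1)                      # broadcast time over feature dims
    x_t = (1 - t) * x_data + t * x_noise   # linear interpolation at time t
    target = x_noise - x_data              # constant velocity of that line
    return ((model(x_t, t) - target) ** 2).mean()

# Toy usage: the stand-in model would be a large U-Net or DiT in practice.
dummy = lambda x_t, t: torch.zeros_like(x_t)
print(rectified_flow_loss(dummy, torch.randn(2, 4), torch.randn(2, 4),
                          torch.rand(2)).item())
```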
### 4. Reasoning & Agents
- Prompt Engineering: Chain of Thought (CoT), Tree of Thoughts (ToT), Reflection.
- Tool Use: ReAct (a minimal loop is sketched below), Toolformer, Function Calling patterns.
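To make the agent loop concrete, here is a minimal ReAct-style sketch. The `llm` callable, the tool names, and the `Thought:`/`Action:`/`Final Answer:` text protocol are illustrative stand-ins; production agents typically use structured function calling rather than parsing free text.

```python
def parse_action(step):
    """Pull 'Action: tool[argument]' out of the model's latest output."""
    line = [l for l in step.splitlines() if l.startswith("Action:")][-1]
    name, arg = line[len("Action:"):].strip().split("[", 1)
    return name.strip(), arg.rstrip("]")

def react_agent(llm, tools, question, max_steps=5):
    """ReAct loop: alternate Thought -> Action -> Observation until done."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                 # model writes Thought/Action
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:
            name, arg = parse_action(step)
            transcript += f"Observation: {tools[name](arg)}\n"  # run the tool
    return None  # step budget exhausted

# Toy usage with a scripted "model" and one fake tool.
scripted = iter(["Thought: look it up.\nAction: search[RoPE]",
                 "Thought: done.\nFinal Answer: rotary embeddings."])
print(react_agent(lambda _: next(scripted),
                  {"search": lambda q: f"docs on {q}"}, "What is RoPE?"))
```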
## Key Resources
- Blog: Lilian Weng's blog (highly recommended for deep technical summaries).
- Visuals: The Illustrated Transformer (Jay Alammar).
- Video: Andrej Karpathy's Neural Networks: Zero to Hero.