This page is a comprehensive compilation of all the links and resources in the book “Building AI for Production: Enhancing LLM Abilities and Reliability with Fine-Tuning and RAG”. Here you’ll find the code notebooks, checkpoints, GitHub repositories, learning resources, and all other materials shared throughout the book, organized by chapter and presented in reading order for easy access.
If you notice discrepancies between the code in the book and the code in the Colab notebooks, or want to improve the notebooks with new updates, please feel free to open a pull request on the GitHub repository.
Chapter | Update Description |
---|---|
Chapter 1 | Expanded to include the latest benchmarks, such as LMSYS, and a run-through of recent models like GPT-4 and Gemini 1.5, as well as techniques like Infinite Attention. |
Chapter 2 | Extended industry applications of LLMs in sectors like media, education, finance, and medicine, with a deeper dive into specific use cases in each industry, such as autocompletion, code prediction, and debugging in technology and software. |
Chapter 3 | Minor restructuring to improve logical flow and progressive understanding of LLM challenges and solutions. |
Chapter 4 | A new section on prompt injection introduces this emerging security challenge, detailing its types and impact on reliability, along with solutions such as guardrails and safeguards that protect LLM integrity. |
Chapter 5: RAG (Previously, Introduction to LangChain and LlamaIndex) | Includes a step-by-step guide to building a basic Retrieval-Augmented Generation (RAG) pipeline from scratch, covering essentials like embeddings, cosine similarity, and vector stores (see the minimal sketch after this table). This foundation equips you to apply modern frameworks like LlamaIndex and LangChain more effectively, to build custom implementations on your own, and to keep pace as these frameworks evolve. |
Chapter 6: Introduction to LangChain & LlamaIndex (Previously Prompting with LangChain) | Introduces foundational elements of LangChain as part of a complete system, providing a comprehensive understanding of how each component functions within a broader context. This structured overview acts as a roadmap, enabling a clearer grasp of RAG pipelines in the upcoming chapters. |
Chapter 7: Prompting with LangChain (Previously RAG) | Includes LangChain Chains, previously part of the RAG chapter, for clarity. |
Chapter 8: Indexes, Retrievers, and Data Preparation (New Chapter) | Indexes, Retrievers, and Data Preparation are essential components of a RAG pipeline. While these concepts were introduced in the first edition, this updated edition includes a dedicated chapter that focuses on their foundational principles. This approach ensures that readers can effectively scale LLM applications, optimize performance, and enhance response quality. Additionally, by emphasizing the fundamentals, this edition allows readers to understand and implement RAG concepts independently, without relying exclusively on frameworks like LangChain. |
Chapter 9: Advanced RAG | Only structural updates |
Chapter 10: Agents | Only structural updates |
Chapter 11: Fine-tuning | Only structural updates |
Chapter 12: Deployment and Optimization | The updated version takes a deeper dive into essential techniques for LLM deployment and optimization, making it more practical and relevant for current AI development needs. For example, the book explores model distillation, a powerful technique to reduce inference costs and improve latency, with a detailed case study on Google’s Gemma 2, demonstrating its real-world impact. With open-source LLMs growing in popularity, this edition also covers the deployment of LLMs on various cloud platforms, including Together AI, Groq, Fireworks AI, and Replicate. This broader approach helps readers find cost-effective and scalable solutions for real-world applications. |
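To make the Chapter 5 essentials concrete, here is a minimal from-scratch retrieval sketch: toy embedding vectors, a cosine-similarity scorer, and a plain dictionary standing in for a vector store. The texts and vectors are invented for illustration; a real pipeline would compute embeddings with an embedding model.

```python
# Minimal sketch of the from-scratch retrieval step described in Chapter 5:
# score stored chunks against a query with cosine similarity, return the best.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" acting as an in-memory vector store.
vector_store = {
    "LLMs can hallucinate facts.": np.array([0.9, 0.1, 0.0]),
    "RAG grounds answers in retrieved documents.": np.array([0.2, 0.9, 0.1]),
}

query_embedding = np.array([0.3, 0.8, 0.2])  # stand-in for the embedded user question

# Rank stored chunks by similarity to the query and keep the top hit.
best_text = max(vector_store, key=lambda t: cosine_similarity(query_embedding, vector_store[t]))
print(best_text)  # -> "RAG grounds answers in retrieved documents."
```

Swapping the toy vectors for real embeddings and the dictionary for a vector database yields the basic pipeline the chapter builds, which the LangChain and LlamaIndex chapters then abstract.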
LLMs are advancing rapidly, but the core skills and tools covered in this book, such as fine-tuning, prompt engineering, and retrieval-augmented generation, will remain essential for adapting next-generation models to specific data, workflows, and industries. These principles will stay relevant across models, even as specific libraries evolve.
For seamless code execution, we’ve included a requirements file that pins library versions. If you’re running the notebooks on Google Colab, be aware that libraries like PyTorch and Transformers come pre-installed. Should compatibility issues arise, try uninstalling these libraries in Colab and reinstalling the specified versions from the requirements file, as sketched below.
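A Colab cell along the following lines can realign the versions; the `requirements.txt` path is an assumption and should point at the file shipped with the notebooks.

```python
# Hypothetical Colab cell: replace the pre-installed PyTorch and Transformers
# with the versions pinned in the book's requirements file.
!pip uninstall -y torch transformers  # remove Colab's pre-installed copies
!pip install -r requirements.txt      # reinstall the pinned versions
```

You may need to restart the Colab runtime afterwards so the newly installed versions are picked up.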
Switching to newer LLMs is straightforward. For instance, with OpenAI models, you can update the model simply by changing its name in the code; we recommend using GPT-4o mini over GPT-3.5 Turbo in the book examples. Regularly checking the documentation for LangChain, LlamaIndex, and OpenAI is also encouraged to stay aligned with updates and best practices.
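As a minimal sketch with the official OpenAI Python client (v1+), the upgrade is a one-string change; the prompt here is invented for illustration:

```python
# Minimal sketch using the official OpenAI Python client (v1+): upgrading to a
# newer model is just a change to the `model` string.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # was "gpt-3.5-turbo" in the first-edition examples
    messages=[{"role": "user", "content": "What is retrieval-augmented generation?"}],
)
print(response.choices[0].message.content)
```

The same one-line change applies when using LangChain’s wrapper, e.g. `ChatOpenAI(model="gpt-4o-mini")`.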
This approach ensures your skills remain applicable in the dynamic LLM field.
No Notebooks.