Learn AI essentials in a weekend. Build your first RAG app, understand LLMs and prompt engineering, explore AI agents, vector databases, and the Model Context Protocol.

Day 1: Morning — Core concepts

Get your bearings (4 hours)

Start with Chip Huyen's "LLM Engineering in Production" blog. It's a reality check on what actually matters in production. Then watch Andrej Karpathy's "Deep Dive into LLMs like ChatGPT" on YouTube (3h31m, February 2025). It covers the full modern stack including RLHF, reasoning models, and tool use. His follow-up "How I Use LLMs" is worth watching after, for the practical workflows.

For prompt engineering, Google has a surprisingly good guide. The DAIR.AI Prompt Engineering Guide on GitHub is also solid. Just skim the quick start.
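The core move both guides teach is structuring a prompt as role + instructions + few-shot examples. A minimal sketch of that pattern (the template and examples here are made up for illustration, not taken from either guide):

```python
# Assemble a few-shot prompt: role, task instructions, worked examples,
# then the real query. This is the pattern, not any guide's exact format.
def build_prompt(role: str, task: str, examples: list, query: str) -> str:
    lines = [f"You are {role}.", f"Task: {task}", ""]
    for inp, out in examples:                      # few-shot demonstrations
        lines += [f"Input: {inp}", f"Output: {out}", ""]
    lines += [f"Input: {query}", "Output:"]        # model completes from here
    return "\n".join(lines)

prompt = build_prompt(
    role="a sentiment classifier",
    task="Label the input as positive or negative.",
    examples=[("I loved it", "positive"), ("Total waste of time", "negative")],
    query="Best purchase I've made all year",
)
print(prompt)
```

Ending on a bare `Output:` is deliberate: it nudges the model to complete the pattern the examples establish.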

Day 1: Afternoon — RAG basics

Retrieval-Augmented Generation basics (4 hours)

AWS has a decent "What is RAG?" explainer. After that, check out the free DeepLearning.AI RAG overview video. It's actually pretty good.

For hands-on stuff, Pixegami has an excellent tutorial on GitHub. You'll build your first RAG app end-to-end. Good learning experience.
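Before diving into the tutorial, it helps to see the whole RAG loop in miniature: embed documents, retrieve the most similar ones for a question, and stuff them into the prompt as context. This toy version uses bag-of-words counts as a stand-in for a real embedding model, so every piece is illustrative:

```python
# Toy RAG: "embed" with word counts, retrieve by cosine similarity,
# then build the context-augmented prompt a real app would send to an LLM.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())           # bag-of-words stand-in

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "RAG retrieves documents and feeds them to the LLM as context.",
    "Vector databases store embeddings for fast similarity search.",
    "Streamlit makes quick demo UIs for Python apps.",
]

def answer(question: str, k: int = 1) -> str:
    q = embed(question)
    top = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
    # A real app sends this prompt to an LLM; here we just return it.
    return "Context:\n" + "\n".join(top) + f"\n\nQuestion: {question}"

print(answer("How does RAG use documents?"))
```

Swap `embed` for a real embedding model and the `docs` list for a vector database, and this is structurally the app the tutorial has you build.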

Day 1: Evening — AI agents

Understanding AI Agents (4 hours)

LangChain's agent tutorial is the best starting point. You'll build a ReAct agent that can actually use tools. Pretty cool when you see it work for the first time.

IBM has a good explainer on ReAct agents if you want the theory. Then watch Sam Witteveen's "Master CrewAI Tutorial" on YouTube. He's good at explaining multi-agent concepts and CrewAI is useful for multi-agent systems.

If you want something more production-oriented from day one, the OpenAI Agents SDK is worth a look alongside LangChain. It's more opinionated about structure, which turns out to be a feature when you're productionizing.
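The ReAct loop itself is simple enough to fit in a few lines: the model emits Thought/Action steps, the harness runs the action and feeds the observation back, until the model emits a final answer. Here the "LLM" is a scripted stub and the tool format is made up, not LangChain's actual API, but the control flow is the same:

```python
# Minimal ReAct loop with a scripted stand-in for the LLM.
import re

TOOLS = {"calculator": lambda expr: str(eval(expr))}   # toy tool registry (eval is demo-only)

SCRIPTED_LLM = iter([
    "Thought: I need to compute this.\nAction: calculator[2 + 3 * 4]",
    "Final Answer: 14",
])

def react_agent(question: str) -> str:
    transcript = f"Question: {question}"
    for _ in range(5):                                  # cap the loop
        step = next(SCRIPTED_LLM)                       # stand-in for an LLM call
        transcript += "\n" + step
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        m = re.search(r"Action: (\w+)\[(.+)\]", step)
        if m:                                           # execute the tool, feed back the result
            obs = TOOLS[m.group(1)](m.group(2))
            transcript += f"\nObservation: {obs}"
    return "gave up"

result = react_agent("What is 2 + 3 * 4?")
print(result)   # → 14
```

Everything the frameworks add — retries, structured tool schemas, memory, multi-agent routing — is layered on top of this loop.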

Day 2: Morning — Vector databases

Vector databases (4 hours)

Shakudo published a March 2026 comparison of the top 9 vector databases, including pgvector, which is worth reading first — it covers the tradeoffs honestly and includes the "just use Postgres" option. Then just pick one (Pinecone or Qdrant are good starting points) and do their quickstart: Pinecone, Qdrant.

Daily Dose of DS has a practical RAG course that's worth checking out too.
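Whichever quickstart you pick, the API surface is roughly the same: upsert (id, vector, metadata) records, then query by vector for the top-k nearest neighbors. A tiny in-memory sketch of that shape — real databases add persistence and approximate-nearest-neighbor indexes like HNSW, which is the part that actually makes them worth using:

```python
# In-memory mock of the upsert/query API shared by most vector DB quickstarts.
import math

class TinyVectorDB:
    def __init__(self):
        self.records = {}                               # id -> (vector, metadata)

    def upsert(self, rec_id, vector, metadata=None):
        self.records[rec_id] = (vector, metadata or {})

    def query(self, vector, top_k=3):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0
        scored = [(cos(vector, v), rec_id, meta)
                  for rec_id, (v, meta) in self.records.items()]
        return sorted(scored, reverse=True)[:top_k]     # brute force; real DBs use ANN indexes

db = TinyVectorDB()
db.upsert("a", [1.0, 0.0], {"text": "about cats"})
db.upsert("b", [0.0, 1.0], {"text": "about dogs"})
print(db.query([0.9, 0.1], top_k=1))                   # closest: record "a"
```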

Day 2: Afternoon — Model Context Protocol

Model Context Protocol (4 hours)

Model Context Protocol is Anthropic's standard for how AI agents connect to external tools and data sources. The "USB-C port for AI" analogy is apt: one protocol, any tool. Check out the introduction and DataCamp's tutorial on using it with Claude Desktop. The MCP servers repository has pre-built integrations for most tools you're already using.
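Under the hood, MCP is JSON-RPC 2.0: a client discovers a server's tools with `tools/list` and invokes one with `tools/call`. A sketch of the wire format — the tool name and arguments here are invented, and real clients build these messages through an SDK rather than by hand:

```python
# Shape of MCP's JSON-RPC requests (illustrative; see the spec for full schemas).
import json

def rpc(method, params, req_id=1):
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

list_req = rpc("tools/list", {})                 # ask the server what it offers
call_req = rpc("tools/call", {
    "name": "read_file",                         # hypothetical tool on some server
    "arguments": {"path": "notes/todo.md"},
})
print(json.dumps(call_req, indent=2))
```

The "USB-C" point falls out of this: because every server speaks the same two methods, any client can drive any server without bespoke integration code.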

For a hands-on example of MCP applied to data engineering, explore Altimate Code, an open-source agentic harness that uses MCP to connect AI agents to dbt, SQL, and cloud warehouses with 100+ pre-built tools.

MCP solves the connectivity layer: with 10,000+ public servers, any agent can now reach any tool through a standardized interface. What it doesn't solve is the organizational context layer. The agent can call the function, but it doesn't know why your team made the architectural decisions it did, which downstream models depend on the output, or what a schema change will cost against your specific query patterns. Every MCP harness hits this wall eventually. It's not a protocol gap. It's an organizational knowledge gap, and it compounds as your data estate grows.

Day 2: Evening — Production basics

Production basics (4 hours)

ZenML compiled 1,200 LLMOps production deployments and published the patterns. Skim the summary for what's actually working in production. Then set up Helicone for monitoring (literally one line of code for cost tracking). Then deploy your RAG app with a Streamlit UI for demos.
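It's worth understanding what a monitoring tool like Helicone is computing for you: cost is just token counts times a per-million-token price, accumulated per call. The prices below are placeholders, not real rates for any model — check your provider's pricing page:

```python
# Hand-rolled version of per-call cost tracking (placeholder prices).
PRICE_PER_M = {"example-model": {"input": 1.00, "output": 4.00}}   # $/1M tokens, hypothetical

class CostTracker:
    def __init__(self):
        self.total = 0.0

    def record(self, model, input_tokens, output_tokens):
        p = PRICE_PER_M[model]
        cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
        self.total += cost
        return cost

tracker = CostTracker()
tracker.record("example-model", 1_200, 300)     # e.g. one RAG call with stuffed context
tracker.record("example-model", 800, 150)
print(f"${tracker.total:.6f}")
```

Note how RAG inflates input tokens: every retrieved chunk you stuff into the prompt is billed, which is why context size shows up fast on a cost dashboard.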

Resources to bookmark