Field notes on AI & infrastructure
Practical writing on applied AI, GPU compute, platforms and the craft of shipping software that survives contact with production.
Coming soon
This is Admin, a brand-new site by Aleksandr Khaikin that's just getting started. Things will be up and running here shortly, but you can subscribe in the meantime if you'd like to stay up to date and receive emails when new content is published!
Building RAG Systems That Don't Hallucinate
Retrieval-augmented generation is easy to demo and hard to trust. Here is what separates a toy from a system you can put in front of customers.
LLM Agents Beyond the Demo: What Production Actually Looks Like
Agent demos run a perfect path once. Production agents face the other 200 paths. Here is how to design for the ones the demo never showed.
Vector Databases, Explained Without the Hype
Do you need a dedicated vector database, or is your existing one enough? A practical look at what these systems actually do and when they earn their keep.
Fine-Tuning vs Prompting: Choosing the Cheaper Path
Fine-tuning feels like the serious option. Most of the time it is the expensive answer to a question prompting already solved. Here is how to tell them apart.
How to Actually Evaluate an LLM Feature
You cannot ship what you cannot measure. Evaluating generative systems is harder than traditional software testing, and skipping it is how good demos become bad products.
The Real Economics of Running GPUs
GPU sticker prices get the headlines, but the bill that matters is driven by utilisation, power, and idle time. A field guide to what AI compute really costs.
Small Language Models and the Quiet Shift to the Edge
The race for ever-larger models grabbed the headlines. The more consequential trend may be the opposite: small models good enough to run on a phone.
MLOps Foundations: From Notebook to Reliable Service
A model that works in a notebook is a science project. A model that serves real traffic reliably is an engineering system. Bridging the two is what MLOps is for.
Prompt Injection and the New AI Attack Surface
When your application takes instructions in plain language, attackers can write instructions too. Prompt injection is the vulnerability class that traditional security never prepared us for.
Model Quantization: Smaller, Faster, Almost as Good
Quantization shrinks a model by storing its numbers with less precision. Done well, it cuts memory and cost dramatically while barely touching quality. Here is the intuition and the tradeoffs.