Ask ten senior AI engineers at Indian product companies what stack they're running and you'll get eleven different opinions and a twenty-minute argument. Good. That means the field is actually moving.
The debates inside India's best AI teams right now are sharper and more opinionated than anything you'll find in a vendor blog or a conference talk. This is an attempt to lay out the live ones: the questions without clean answers yet, where genuinely smart people are landing in different places.
Build vs Buy vs Fine-tune
Everyone assumed this question would be settled by now. It isn't.
The "just use the API" camp is still strong for good reason. Frontier model capabilities are moving fast enough that two months of fine-tuning work often gets leapfrogged by the next API release. The ROI math is genuinely hard to justify in a lot of cases.
But the "we need to own the model" camp has better arguments than it did a year ago:
- Latency and cost at production scale make frontier APIs painful for high-volume applications
- Data residency requirements are a real constraint for enterprise AI in India, not a theoretical one
- Fine-tuning smaller models on narrow, well-defined tasks is working better than most people expected
- The quality gap between open-source and frontier models has closed a lot
Camp A: Fine-tuning is a trap. You're optimising for yesterday's ceiling. Camp B: Prompt engineering your way to production reliability at scale is a fantasy. At some point you have to own what you ship.
The honest answer nobody says out loud is that it depends on your use case, your team's actual ML depth, the quality of your data, and how stable your requirements are over the next year. The people worth listening to on this are the ones who've shipped both and can tell you specifically where each one broke.
RAG: Simple vs Sophisticated
RAG went from exciting research concept to production standard to overengineered mess in about eighteen months. Classic AI build cycle.
The basic pipeline: chunk documents, embed them, retrieve relevant chunks at query time, put them in context. It works. Works well enough that most teams built it, shipped it, and called it done. Then they hit production and found out that "works" and "works reliably with real user queries at scale" are very different things.
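For concreteness, that basic pipeline fits in a few dozen lines. Everything here is a toy stand-in: the bag-of-words `embed` replaces a real embedding model, the documents are made up, and the final `prompt` would go to an LLM rather than stopping at string assembly.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an
    # embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(doc: str, size: int = 8) -> list[str]:
    # Fixed-size chunking by word count: the simplest of the
    # chunking strategies teams argue about.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

doc = ("Invoices are archived monthly. Refunds require manager approval. "
       "Approval requests go through the finance portal.")
context = retrieve("who approves refunds", chunk(doc))
prompt = "Answer using only this context:\n" + "\n".join(context)
```

The whole pipeline is this simple, which is exactly why so many teams shipped it and moved on. The hard part is everything the sketch glosses over.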
What teams are actually arguing about
- Chunking strategy: fixed-size vs semantic vs hierarchical. Sounds boring. Produces massive quality differences in practice.
- Hybrid search: dense retrieval alone misses lexical matches that sparse search catches. Most teams are running hybrid now but the weighting is still largely tribal knowledge.
- Re-ranking: cross-encoder re-ranking after initial retrieval helps quality a lot. It also adds latency. Where do you draw the line?
- Agentic RAG: letting the model decide what to retrieve instead of a fixed pipeline. More capable, much harder to debug and control.
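On the hybrid-search weighting point, one way to sidestep hand-tuned score weights entirely is reciprocal rank fusion, which merges ranked lists by rank position alone. A minimal sketch, with hypothetical document IDs standing in for real retrieval results:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: score each doc by summing 1/(k + rank)
    # across the input rankings. k=60 is the conventional default.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d7"]   # hypothetical dense-retrieval ranking
sparse = ["d1", "d9", "d3"]   # hypothetical BM25 ranking
fused = rrf([dense, sparse])  # d1 wins: ranked highly in both lists
```

RRF doesn't remove judgement from hybrid search, but it does replace a tuned dense-vs-sparse weight with a single, fairly insensitive constant.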
"Simple RAG is easy to build and hard to make good. Complex RAG is hard to build and hard to debug. Pick your suffering."
The teams shipping the best RAG in India right now share one thing: they've put serious work into evaluation infrastructure. They know exactly what their system gets wrong, how often, and why. That feedback loop is what separates production-grade from demo-grade. The architecture choices matter less than most people think.
The Agent Question
The agent debate is the loudest one in the Indian AI builder community right now, and part of why it's so noisy is that the term means completely different things to different people.
For some teams it's a ReAct loop where a model reasons and uses tools. For others it's multi-agent orchestration with specialised sub-agents. For others it's a marketing label on a function-calling wrapper.
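To make the first of those meanings concrete: a ReAct loop is just a model alternating between emitting actions and reading observations until it commits to an answer. In this sketch `call_model` is a hard-coded stub rather than a real LLM, and the single `calculator` tool is hypothetical:

```python
def calculator(expr: str) -> str:
    # Hypothetical tool. Never eval untrusted input in production.
    return str(eval(expr, {"__builtins__": {}}))

def call_model(transcript: str) -> str:
    # Hard-coded stub: in a real system an LLM generates these lines.
    if "Observation:" not in transcript:
        return "Action: calculator[17 * 3]"
    return "Final Answer: 51"

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = call_model(transcript)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action: calculator["):
            expr = step[len("Action: calculator["):-1]
            transcript += f"\n{step}\nObservation: {calculator(expr)}"
    return "gave up"  # step budget exhausted

answer = react("What is 17 * 3?")
```

Note that even this toy needs a step budget and a fallback for when the loop never terminates, which is a preview of the debugging problem at scale.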
The questions that actually matter:
- When does an agentic setup add real value versus just adding failure modes and latency?
- How do you evaluate agent behaviour at scale when outputs are non-deterministic?
- What's the right amount of human-in-the-loop for your specific risk profile?
- Which orchestration framework, if any, is actually worth the abstraction cost?
Most "agent" systems in production are not actually agentic in any real sense. They're deterministic pipelines with LLM steps dressed up with agent vocabulary. And that's fine. A reliable pipeline that does one thing well will beat a flaky agent trying to do everything, every time.
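In code, such a "pipeline with LLM steps" usually looks something like this: control flow fixed by ordinary functions, with model calls embedded as stages. Here `llm` is a stub for a real API call and the stage names are invented for illustration:

```python
from typing import Callable

def llm(prompt: str) -> str:
    # Stub standing in for a real model API call.
    return f"[model output for: {prompt[:24]}...]"

# Each stage is an ordinary function; some happen to call the model.
def extract(doc: str) -> str:
    return llm(f"Extract key facts:\n{doc}")

def summarise(facts: str) -> str:
    return llm(f"Summarise:\n{facts}")

def run_pipeline(doc: str, stages: list[Callable[[str], str]]) -> str:
    out = doc
    for stage in stages:  # control flow lives here, not in the model
        out = stage(out)
    return out

result = run_pipeline("Quarterly report text...", [extract, summarise])
```

The defining property is that the model never chooses what happens next, which is precisely what makes the system testable.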
Observability: The Unglamorous Problem That Kills Products
The most underrated debate in the Indian AI builder community has nothing to do with models or architectures. It's about observability.
Traditional software observability (metrics, logs, traces) is necessary but nowhere near sufficient for AI systems. You can have perfect infrastructure monitoring and still have no idea why your LLM app is producing bad outputs for 8% of queries.
The teams that have shipped AI to production and kept it alive all have some version of the same practices:
- Log inputs, outputs, and intermediate steps for every request. Not sampled. Every request.
- Build eval datasets from real failure cases, not synthetic benchmarks
- Track output quality metrics over time alongside infrastructure metrics
- Set up human review for the edge cases automated eval misses
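The first of those practices can start as a wrapper that records every model call against a request ID. This is a sketch, not a real tracing setup: `sink` is a plain list where a production system would use a log pipeline or tracing backend, and `fake_model` stands in for an API call.

```python
import time
import uuid

sink: list[dict] = []  # stand-in for a real log pipeline / tracing backend

def log_llm_call(step_name: str, fn):
    """Wrap a model-calling function so every request is logged in full."""
    def wrapped(prompt: str, request_id: str) -> str:
        start = time.monotonic()
        output = fn(prompt)
        sink.append({
            "request_id": request_id,
            "step": step_name,
            "input": prompt,   # full input, not sampled
            "output": output,  # full output
            "latency_ms": round((time.monotonic() - start) * 1000, 1),
        })
        return output
    return wrapped

def fake_model(prompt: str) -> str:
    # Stub for a real LLM API call.
    return "ok: " + prompt

traced = log_llm_call("draft_answer", fake_model)
traced("summarise this ticket", str(uuid.uuid4()))
```

Tagging every step with the same request ID is what lets you replay a bad output end to end later, and those replayed failures are where the eval datasets come from.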
Nobody gets excited about this work. There's no interesting architecture to show off. It's just engineering discipline applied to a non-deterministic system. And it's what separates a demo that impresses a conference from a product that users actually trust.
The Indian Context
Cost runs through every one of these debates. By necessity, India's AI builders are more cost-conscious than teams at well-funded US startups, and that constraint is producing real engineering creativity.
The teams doing the most interesting work on inference optimisation, on smaller fine-tuned models, on caching strategies that cut API costs by 60 to 70 percent: a lot of them are Indian teams who couldn't afford to build the expensive way and had to find the smarter way instead.
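The simplest of those caching strategies is exact-match response caching keyed on model plus prompt. A sketch with a stubbed `call_api`; whether this saves anything like 60 to 70 percent depends entirely on how repetitive your traffic actually is:

```python
import hashlib

cache: dict[str, str] = {}
calls = {"api": 0}  # counter, just to make the saving visible

def call_api(prompt: str) -> str:
    # Stub for a paid model API call.
    calls["api"] += 1
    return "answer to: " + prompt

def cached_completion(prompt: str, model: str = "some-model") -> str:
    # Key on model + prompt so a model upgrade invalidates old entries.
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key not in cache:
        cache[key] = call_api(prompt)  # pay only on cache misses
    return cache[key]

cached_completion("What is the refund policy?")
cached_completion("What is the refund policy?")  # served from cache
```

Real systems layer more on top: TTLs so stale answers expire, and semantic caching that matches paraphrases rather than exact strings, which trades a little correctness risk for a much higher hit rate.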
That's not a handicap. It's a forcing function that produces leaner, more defensible systems. The Indian AI stack coming out of this period is going to look different from the American one. More cost-opinionated, more pragmatic, and probably more resilient for it.
These are the conversations happening inside the Cabal. If you're deep in any of this, you should be in the room.