Enterprise AI Reliability
Noam Brown shared that coding agents helped him iterate faster, but they also made confident, repeated mistakes that required expert intervention to resolve.
In a poker-solver build, agent outputs looked plausible but were still wrong in key cases, showing how overconfidence becomes risky in specialized technical work.
Enterprise teams are moving from copilots to agents, but generic agents break down on large private codebases. They miss hidden repo rules, generate plausible-but-wrong changes, and increase review and security burden. What's missing is a repo-aligned adaptation and verification layer that turns agent output into trusted, shippable changes.
Best fit
1,000+ engineer organizations
Codebase
Long-lived, high-coupling repositories
Deployment
VPC or on-prem control boundary
A three-layer system that adapts to your codebase, verifies every change, and deploys safely inside enterprise controls.
Repo-Specific Post-Training
We adapt open coding models to your internal APIs, architecture boundaries, and engineering conventions using synthetic repo tasks.
Outcome
Learns your patterns
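To make "synthetic repo tasks" concrete, here is a minimal sketch of one task type: a fill-in-the-function exercise mined from real repository code. The `RepoTask` schema, the helper name, and the check commands are illustrative assumptions, not our actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class RepoTask:
    """One synthetic training example derived from a private repository."""
    prompt: str        # task description plus masked code context
    target: str        # the original code the model should reconstruct
    checks: list[str]  # commands that must pass for the output to count

def make_fill_in_task(file_text: str, fn_name: str, body: str) -> RepoTask:
    """Mask a real function body so the model must rewrite it in repo style."""
    masked = file_text.replace(body, "    # TODO: implement\n    ...")
    return RepoTask(
        prompt=f"Implement `{fn_name}` so this file passes its checks:\n{masked}",
        target=body,
        checks=["pytest -q", "ruff check ."],  # stand-ins for your repo's gates
    )
```

Because the target is code your engineers already wrote and the checks are your own tooling, the model is graded against your conventions rather than generic benchmarks.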
Verification Harness
Every output is evaluated against CI, tests, quality checks, and security controls, so trust is based on evidence, not optimism.
Outcome
Proof attached to each change
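As a rough sketch of what "proof attached to each change" means in practice: run every gate, record each result as evidence, and trust a change only on all-green. The specific tools here (pytest, ruff, bandit) are placeholders for whatever CI, quality, and security checks your teams already run.

```python
import subprocess

# Placeholder gates; swap in your real CI, policy, and security commands.
CHECKS = {
    "tests": ["pytest", "-q"],
    "lint": ["ruff", "check", "."],
    "security": ["bandit", "-r", "src/"],
}

def verify_change() -> dict[str, bool]:
    """Run every gate and record a pass/fail result: the evidence."""
    evidence = {}
    for name, cmd in CHECKS.items():
        result = subprocess.run(cmd, capture_output=True)
        evidence[name] = result.returncode == 0
    return evidence

def is_trusted(evidence: dict[str, bool]) -> bool:
    """Ship only when every gate passed; optimism is not a signal."""
    return all(evidence.values())
```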
Governed Deployment
Deploy in your VPC or on-prem with governance, audit logs, and phased rollout so adoption scales safely across teams.
Outcome
Governance built in
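One way to picture phased rollout, sketched below: each phase widens access only after the previous phase clears a measured quality bar. The phase names, teams, and thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    teams: list[str]      # who gets the agent in this phase
    min_pass_rate: float  # verified-change pass rate required to advance

# Hypothetical three-phase plan; teams and thresholds are illustrative.
ROLLOUT = [
    Phase("pilot", ["platform"], 0.90),
    Phase("expand", ["platform", "payments"], 0.92),
    Phase("general", ["all"], 0.95),
]

def next_phase(current: int, observed_pass_rate: float) -> int:
    """Advance one phase only when measured quality clears the current bar."""
    if observed_pass_rate >= ROLLOUT[current].min_pass_rate:
        return min(current + 1, len(ROLLOUT) - 1)
    return current
```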
Aryabhata 1.0 shows that our process is more than theory. It is a 7B-parameter model built on these exact principles that outperforms frontier models on elite mathematical-reasoning benchmarks.
| Model | JEE Main Jan 2025 (in-dist., % acc.) | JEE Main Apr 2025 (in-dist., % acc.) | Avg. tokens per response | MATH 500 (OOD, % acc.) | GSM8K (OOD, % acc.) |
|---|---|---|---|---|---|
| Aryabhata 1.0 | 86.0 | 90.2 | ~2K | 83.6 | 94.8 |
| Gemini 2.5 Flash | 83.0 | 83.5 | ~1.5K | 93.6 | 85.1 |
| GPT-4.1 | 75.0 | 80.0 | ~1.8K | 86.6 | 94.0 |
| GPT-4o | 46.5 | 44.0 | <1K | 69.2 | 94.6 |
Bigger models are not always better, and prompting alone is often not enough. Aryabhata 1.0 beat frontier models on its target benchmarks because we tuned the model for this exact use case.
The same process behind Aryabhata 1.0 applies to enterprise coding agents: expose agents to your workflow the same way you onboard human engineers, and let them learn through feedback rather than through prompts, as sketched below.
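A conceptual sketch of that feedback loop follows, with `model.generate` and `model.update` as stand-ins for a real RL fine-tuning stack: each repo task is an episode, and passing the repo's own checks is the reward.

```python
def feedback_loop(model, tasks, verify, epochs: int = 3):
    """Reinforce outputs that pass the repo's checks, not ones that merely look right."""
    for _ in range(epochs):
        for task in tasks:
            attempt = model.generate(task.prompt)       # agent proposes a change
            evidence = verify(attempt, task.checks)     # CI, tests, policy gates
            reward = 1.0 if all(evidence.values()) else 0.0
            model.update(task.prompt, attempt, reward)  # e.g. one policy-gradient step
    return model
```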
Bring us your repositories and workflows and we can discuss how custom agents can solve your specific challenges.
We are a team of passionate builders focused on making AI coding more reliable in real enterprise environments. Our goal is to improve the efficacy of these models and to reduce the risk and friction of adoption for engineering teams.
Sachin has 10+ years of experience training and building models with RL. A few highlights from the decade:
Rohith is a deep learning researcher and engineer with a track record of shipping SOTA models to production. Here are some highlights:
Paper · 1/3
Our 7B-parameter Aryabhata model outperforms OpenAI's o4-mini and Google's Gemini 2.5 Flash on mathematics benchmarks, and is designed to serve millions of students at scale.
Talk · 2/3
Session on how reward signals and RL techniques are used to shape language-model reasoning behavior.
Talk · 3/3
Technical deep dive on experimental learning loops for reasoning model training and evaluation.
FAQ
Quick answers on security, deployment, and measurable outcomes.
AthenaAgent is designed for private deployment in customer-controlled environments with strict access boundaries and auditable activity logs.
Every generated change runs through a verification harness that checks CI, tests, policy rules, and security tooling before it is trusted.
We start with a scoped workflow and expand gradually as measurable quality and cycle-time metrics demonstrate sustained gains.
Target outcomes include lower review burden, faster issue resolution, fewer regressions, and clearer governance over agent behavior.
Yes. AthenaAgent is designed to integrate with existing CI/CD, security scanners, and review workflows rather than replacing them.