AI Is Getting Expensive. Here's How Developers Should Adapt.

📅 June 5, 2026 | 📖 7 min read

Token costs are rising and rate limits are real. Here's how to build a sustainable AI workflow using smarter context management, agents.md files, and local models.

For many developers, AI has become as essential as an IDE.

Whether it's ChatGPT, Claude, Copilot, Cursor, or other AI assistants, these tools are now deeply integrated into everyday workflows. The problem is that as AI capabilities improve, usage limits and costs are becoming increasingly noticeable.

Many developers have already experienced it.

You're in the middle of solving a problem, refining an architecture, or debugging a complex issue when suddenly you hit a rate limit or message cap.

The challenge isn't that AI is becoming less useful. The challenge is learning how to get more value from every interaction.

Stop Treating AI Like a Chat App

One of the biggest reasons developers burn through tokens is excessive back-and-forth. Each follow-up message typically resends the conversation so far, so a long thread costs more than its length suggests. Instead of gradually feeding the AI context across ten messages, include everything up front in a single well-structured prompt:

Project overview
Tech stack and constraints
Expected outcome
Relevant code snippets

A single well-structured prompt can often replace ten smaller messages. As AI usage becomes more expensive, prompt quality matters more than prompt quantity.
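As a minimal sketch, the four parts listed above can be assembled into one message with a small helper. The `PromptSpec` class, the `build_prompt` function, and the example values (including the file path `src/routes/users.ts`) are hypothetical and only illustrate the shape of such a prompt.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the four parts of an up-front prompt."""
    overview: str
    stack: str
    outcome: str
    snippets: dict[str, str] = field(default_factory=dict)  # file path -> code


def build_prompt(spec: PromptSpec) -> str:
    """Join all sections into one message so nothing needs a follow-up."""
    parts = [
        "Project overview:\n" + spec.overview,
        "Tech stack and constraints:\n" + spec.stack,
        "Expected outcome:\n" + spec.outcome,
    ]
    for path, code in spec.snippets.items():
        parts.append(f"Relevant code ({path}):\n{code}")
    return "\n\n".join(parts)


# Example values are made up for illustration.
prompt = build_prompt(PromptSpec(
    overview="REST API for a task tracker.",
    stack="Node.js 22, TypeScript, Fastify. No new dependencies.",
    outcome="Add cursor-based pagination to GET /users.",
    snippets={"src/routes/users.ts": "export async function listUsers() { /* ... */ }"},
))
print(prompt)
```

The point of the structure is that every section is filled in before anything is sent, so a missing piece shows up while writing the prompt rather than three messages later.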

But doing this manually every time is tedious and error-prone. That's where agents.md comes in.

Use an agents.md File for Persistent Context

Many AI coding agents (including OpenCode and Cursor) support a project instructions file, usually named AGENTS.md and referred to as agents.md in this article. Some tools use their own filename instead, such as CLAUDE.md for Claude Code. The file sits at the root of your project, and tools that support it load it into every session automatically.

Instead of repeatedly explaining your project to the AI, write it once:

# Project: MyApp API

## Stack

- Node.js 22, TypeScript, Fastify
- PostgreSQL with Drizzle ORM
- Deployed on Railway

## Architecture

- REST API with JWT auth
- Services layer handles business logic
- Repositories layer handles all DB access

## Conventions

- Always use async/await, never callbacks
- Errors are thrown, not returned
- All endpoints must have Zod validation schemas

## Out of Scope

- Do not modify anything in /legacy
- Do not suggest switching to a different ORM

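Now every session starts with full context. You no longer spend messages re-explaining the stack, and the model has fewer gaps to fill with wrong assumptions about your architecture. The file itself still counts toward input tokens, so it pays to keep it short.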

Make Your agents.md Dynamic

A single static agents.md works well for small projects, but larger codebases have different contexts for different areas of the code. Loading everything upfront wastes tokens and clutters the model's context window.

The better approach is to have a lean root agents.md that only contains project-wide conventions, and then local agents.md files in subdirectories for module-specific context.

For example:

/agents.md              ← global conventions, stack overview
/src/api/agents.md      ← API-specific patterns, endpoint structure
/src/db/agents.md       ← DB schema, migration conventions
/src/auth/agents.md     ← Auth flow, token handling notes

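When the agent is working inside /src/db/, a tool that supports nested agents.md files picks up the database-specific context automatically, without loading the API or auth context it doesn't need. Check your tool's documentation, since not every agent resolves nested files the same way.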

This keeps every session lean and focused. The model gets what's relevant, not everything at once.
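To make the lookup concrete, here is a rough sketch of how nested context files could be resolved: walk from the working directory up to the project root and keep every agents.md found along the way. This is not how any particular tool is implemented. The `collect_context` function is hypothetical, and the demo builds a throwaway directory that mirrors the layout above.

```python
from pathlib import Path
import tempfile


def collect_context(project_root: Path, working_dir: Path, name: str = "agents.md") -> list[Path]:
    """Return context files from the project root down to working_dir.

    Root-level files come first so more specific files can refine them.
    """
    root = project_root.resolve()
    current = working_dir.resolve()
    if current != root and root not in current.parents:
        raise ValueError(f"{working_dir} is not inside {project_root}")

    # Walk upward from the working directory, then reverse to get root-first order.
    chain = [current]
    while chain[-1] != root:
        chain.append(chain[-1].parent)
    return [d / name for d in reversed(chain) if (d / name).is_file()]


# Demo on a temporary directory that mirrors the example layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["", "src/api", "src/db", "src/auth"]:
        (root / rel).mkdir(parents=True, exist_ok=True)
        (root / rel / "agents.md").write_text(f"notes for /{rel}\n")

    files = collect_context(root, root / "src" / "db")
    relative = [f.relative_to(root.resolve()).as_posix() for f in files]
    print(relative)  # ['agents.md', 'src/db/agents.md']
```

Only two of the four files are selected for work inside /src/db/, so the API and auth notes never enter the context.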

Structure Your Prompts Like a Spec, Not a Chat

Even with a good agents.md, the way you frame individual prompts matters.

Instead of:

"Can you help me add pagination to my users endpoint?"

Try:

"Add cursor-based pagination to GET /users. Use the existing PaginatedResponse<T> type from src/types/pagination.ts. The cursor should encode the user's id. Follow the same pattern used in GET /posts."

You've given the model the type to use, the pattern to follow, and the field to cursor on. The response is far more likely to be right on the first try instead of requiring three rounds of corrections, each of which burns more tokens.
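The cost of correction rounds can be estimated with some simple arithmetic. The sketch below assumes a chat setup where every turn resends the full history as input and where no prompt caching applies. All token counts are hypothetical, and `cumulative_input_tokens` is an illustrative helper, not part of any SDK.

```python
def cumulative_input_tokens(user_msgs: list[int], replies: list[int]) -> int:
    """Total input tokens when each turn resends the whole history.

    user_msgs[i] and replies[i] are token counts for turn i (hypothetical numbers).
    """
    total = 0
    history = 0
    for user, reply in zip(user_msgs, replies):
        history += user   # the new user message joins the history
        total += history  # the whole history is sent as input for this turn
        history += reply  # the model's reply is kept for the next turn
    return total


# One detailed spec-style prompt, answered once.
one_shot = cumulative_input_tokens([400], [600])

# A vague request followed by three rounds of corrections.
four_rounds = cumulative_input_tokens([50, 80, 80, 80], [600, 600, 600, 600])

print(one_shot)     # 400
print(four_rounds)  # 4280
```

Under these assumptions the short messages end up costing roughly ten times the input tokens of the single detailed prompt, because each correction pays again for everything said before it.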

Use Local Models for Smaller Tasks

Not every task requires the most advanced cloud model. Simple activities such as:

Refactoring a function
Generating unit tests
Explaining what a piece of code does
Writing inline documentation
Reviewing small files for obvious issues

can often be handled just as well by a local model running through Ollama or LM Studio, at zero cost per token.

With modern open-source models like qwen2.5-coder and deepseek-coder becoming increasingly capable, many routine tasks no longer need expensive cloud tokens.
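For a sense of what calling a local model looks like, the following sketch sends a prompt to Ollama's HTTP API using only the Python standard library. It assumes Ollama is running on its default address (localhost, port 11434) and that the qwen2.5-coder model has already been pulled. The helper names `build_request` and `ask_local` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(prompt: str, model: str = "qwen2.5-coder") -> urllib.request.Request:
    """Build a non-streaming generate request for a locally running Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str, model: str = "qwen2.5-coder") -> str:
    """Send the prompt and return the generated text (requires Ollama to be running)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Usage, with the Ollama server running and the model pulled:
# print(ask_local("Write a docstring for: def add(a, b): return a + b"))
```

Setting `stream` to false returns the whole answer as one JSON object, which keeps the example short. Interactive tools usually stream instead.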

If you want to set up a local AI coding workflow, we've written a step-by-step guide on configuring OpenCode with Ollama and LM Studio. It covers both the free cloud option and the fully local setup.

Learn to Choose the Right Model for the Right Task

One emerging skill for developers is model selection.

Using the most powerful model for every task is like using a production Kubernetes cluster to host a static HTML page. It works — but it's rarely the most efficient option.

The goal isn't to use more AI. The goal is to use the right AI.
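One way to turn model selection into a habit is to write the rule down. The sketch below is a deliberately simple example: routine task types with a small context go to a local model, and everything else goes to the cloud. The task names, the `choose_tier` function, and the 8,000-token threshold are all assumptions chosen for illustration, not recommendations.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


# Hypothetical labels for the routine tasks listed earlier in the article.
ROUTINE_TASKS = {"refactor_function", "unit_tests", "explain_code", "inline_docs", "small_review"}


def choose_tier(task: str, context_tokens: int, local_context_limit: int = 8_000) -> Tier:
    """Pick the cheapest tier that plausibly fits the task."""
    if task in ROUTINE_TASKS and context_tokens <= local_context_limit:
        return Tier.LOCAL
    return Tier.CLOUD


print(choose_tier("unit_tests", 1_200))           # Tier.LOCAL
print(choose_tier("architecture_review", 1_200))  # Tier.CLOUD
print(choose_tier("explain_code", 40_000))        # Tier.CLOUD
```

The right threshold depends on the local model and hardware in use, so treat the default as a placeholder to tune.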

The Future May Be More Open Than Closed

For the last few years, the AI conversation has largely been dominated by proprietary models. That may not remain true forever.

Open-source models continue to improve rapidly. Smaller models are becoming more efficient. Mixture-of-Experts (MoE) architectures are reducing computational costs. On-device AI is becoming increasingly practical.

The future AI stack for developers may look like:

Cloud models for complex reasoning and large context tasks
Local models for day-to-day coding work
Specialized models for specific workflows (SQL, documentation, testing)
Hybrid systems that route tasks automatically based on complexity

Rather than relying entirely on a single provider, developers may soon have access to a toolbox of AI systems optimized for different use cases.
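A hybrid setup does not need to guess complexity up front. One possible approach, sketched here under simplifying assumptions, is to try the cheapest model first and escalate only when its answer fails a check. The `answer_with_escalation` function and both stub models are hypothetical. Real models would call a local server or a cloud API instead.

```python
from typing import Callable

Model = Callable[[str], str]   # takes a prompt, returns a reply
Check = Callable[[str], bool]  # returns True if the reply is acceptable


def answer_with_escalation(prompt: str, models: list[tuple[str, Model]], ok: Check) -> tuple[str, str]:
    """Try models from cheapest to most expensive; return (model name, reply)."""
    if not models:
        raise ValueError("at least one model is required")
    name, reply = "", ""
    for name, model in models:
        reply = model(prompt)
        if ok(reply):
            break
    # If nothing passed the check, the last model's reply is returned as-is.
    return name, reply


# Stand-in models for the demo.
def local_stub(prompt: str) -> str:
    return ""  # pretend the small model produced nothing usable


def cloud_stub(prompt: str) -> str:
    return "SELECT id, email FROM users ORDER BY id LIMIT 20;"


used, text = answer_with_escalation(
    "Write a SQL query listing the first 20 users.",
    [("local", local_stub), ("cloud", cloud_stub)],
    ok=lambda reply: reply.strip().upper().startswith("SELECT"),
)
print(used)  # cloud
```

The check is the weak point of this design: it works best for tasks with a cheap, objective test, such as code that must parse or tests that must pass.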

Efficiency Is Becoming a Competitive Advantage

The conversation around AI often focuses on model capabilities.

A more important question may be: "How efficiently can you work with the tools available?"

The developers who thrive over the next few years won't necessarily be those with the largest AI budgets. They'll be the ones who know how to combine cloud models, local models, reusable context via agents.md, and focused prompts into a sustainable workflow.

As token costs rise and usage limits become more common, efficient AI usage is becoming just as important as efficient code.

And that may be one of the most valuable developer skills of the next decade.

For more guides on developer tooling, self-hosted AI workflows, and modern software development, visit our blog at madishtech.com/blog.