
The best folder structures for AI projects in 2025


**Why your project structure changed — and what to do now**

The original 2024 post showed two classic structures: Layered Modularization and Feature-based Modularization. Both are still valid. But with autonomous agents, RAG, function calling, and LLM pipelines in the stack, new folders have emerged that simply didn't exist before — and ignoring them is the most common mistake teams make when scaling AI products.

---

## Numbers that explain the shift

- **73%** of new repositories opened in 2025 include some LLM component
- **+440%** YoY growth in public repos with an `/agents` folder on GitHub
- **77%** of teams in production use a unified LLM client in `shared/` to swap models without rewriting code
- **84%** version prompts in git as code — no longer scattered in loose environment variables
- **58%** run automated LLM quality evaluations in CI before any prompt deployment

---

## Structure 1 — Layered AI Monolith

Ideal for small teams (1–5 people) where AI is a feature, not the core product. It's the direct evolution of the classic layered structure, with three mandatory additions: `agents/`, `prompts/`, and `pipelines/`.
```
project-root/
│
├── config/
│   ├── database.ts
│   ├── server.ts
│   └── ai.ts             ← API keys, models, token limits
│
├── src/
│   ├── controllers/
│   ├── models/
│   ├── routes/
│   ├── services/
│   │   └── llmService.ts ← wrapper for LLM calls (OpenAI/Anthropic/Gemini)
│   │
│   ├── agents/           ← autonomous agents with memory and tools
│   │   ├── plannerAgent.ts
│   │   └── researchAgent.ts
│   │
│   ├── tools/            ← functions the LLM can invoke (function calling)
│   │   ├── searchTool.ts
│   │   ├── dbQueryTool.ts
│   │   └── index.ts
│   │
│   ├── prompts/          ← prompts as code, versioned in git
│   │   ├── system.ts
│   │   ├── summarize.ts
│   │   └── extract.ts
│   │
│   ├── pipelines/        ← RAG flow: ingestion → chunking → embedding → retrieval
│   │   ├── ingest.ts
│   │   ├── embed.ts
│   │   └── retrieve.ts
│   │
│   └── utils/
│       └── tokenCounter.ts
│
├── data/
│   └── vectorstore/      ← local index (ChromaDB, FAISS, Weaviate)
│
├── tests/
│   ├── agents/
│   ├── pipelines/
│   └── evals/            ← LLM response quality evaluations
│
└── index.ts
```

**When to use:** early-stage product, lean team, AI integrated into an existing backend.

**Main limitation:** as modules grow, agents and pipelines get coupled to the app core, making isolated testing and per-feature model swapping harder.

---

## Structure 2 — Modular AI-Native (recommended for AI products)

Each domain module encapsulates its own agents, tools, prompts, and pipelines. This is the pattern that distributed teams and fast-growing AI startups converged on in 2025.
```
project-root/
│
├── config/
│   ├── ai.ts             ← global model configuration
│   └── db.ts
│
├── modules/
│   │
│   ├── auth/
│   │   ├── authController.ts
│   │   ├── authService.ts
│   │   └── authRoutes.ts
│   │
│   ├── search/           ← semantic search + reranking
│   │   ├── searchController.ts
│   │   ├── embeddings.ts
│   │   ├── retriever.ts
│   │   └── prompts/
│   │       └── searchPrompt.ts
│   │
│   ├── agents/           ← agent orchestration module
│   │   ├── orchestrator.ts ← decides which agent to trigger and with what context
│   │   ├── tools/
│   │   │   ├── webSearch.ts
│   │   │   ├── codeRunner.ts
│   │   │   └── fileReader.ts
│   │   ├── memory/
│   │   │   ├── shortTerm.ts ← conversation context (token window)
│   │   │   └── longTerm.ts  ← per-user vector store
│   │   └── prompts/
│   │       └── systemPrompt.ts
│   │
│   ├── documents/        ← full RAG pipeline
│   │   ├── ingest.ts
│   │   ├── chunk.ts
│   │   ├── embed.ts
│   │   └── prompts/
│   │       └── qaPrompt.ts
│   │
│   └── analytics/
│       ├── metricsService.ts
│       └── llmUsage.ts   ← tracks tokens, cost, and latency per module
│
├── shared/
│   ├── llm/
│   │   ├── client.ts     ← unified client: swap OpenAI/Anthropic/local here
│   │   └── retry.ts      ← exponential backoff for API failures
│   ├── vectordb/
│   │   └── client.ts     ← abstraction over Pinecone/Weaviate/Chroma
│   └── utils/
│
├── tests/
│   ├── modules/
│   └── evals/            ← automated AI quality evaluations
│       ├── faithfulness.ts ← is the answer grounded in the retrieved context?
│       ├── relevance.ts
│       └── datasets/
│
└── index.ts
```

**When to use:** product where AI is the core, team larger than 5, multiple domains with distinct AI behaviors.

**Main advantage:** each module can use a different model, have its own prompts, and be tested in complete isolation.
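The `shared/llm/` abstraction both structures rely on can be sketched in a few lines. This is a minimal, illustrative version of what `client.ts` and `retry.ts` might contain; the `LLMProvider` interface, `EchoProvider` stub, and `withRetry` helper are hypothetical names, and a real project would implement the interface with the actual OpenAI/Anthropic SDKs instead of a stub:

```typescript
// shared/llm — sketch of a unified client. Modules depend on this
// interface, never on a vendor SDK, so swapping providers is local.
interface LLMProvider {
  complete(prompt: string): Promise<string>;
}

// Hypothetical stub provider, used here only to keep the sketch
// self-contained; real providers would wrap a vendor SDK.
class EchoProvider implements LLMProvider {
  async complete(prompt: string): Promise<string> {
    return `echo: ${prompt}`;
  }
}

// shared/llm/retry.ts — exponential backoff for transient API failures.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Delays grow as 200ms, 400ms, 800ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// shared/llm/client.ts — the single entry point for every module.
class LLMClient {
  constructor(private provider: LLMProvider) {}

  async complete(prompt: string): Promise<string> {
    return withRetry(() => this.provider.complete(prompt));
  }

  // Swapping models is a one-line change at composition time.
  setProvider(provider: LLMProvider): void {
    this.provider = provider;
  }
}

async function main() {
  const client = new LLMClient(new EchoProvider());
  console.log(await client.complete("hello"));
}
main();
```

The point of the pattern is the seam: `modules/` only ever see `LLMClient`, so a per-module model choice is a constructor argument, not a rewrite.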
---

## What changed from 2024 to 2025

| Folder | Status in 2024 | Status in 2025 |
|---|---|---|
| `controllers/` | required | required |
| `models/` | required | required |
| `services/` | required | required |
| `agents/` | rare | industry standard |
| `tools/` | nonexistent | required in LLM projects |
| `prompts/` | environment variable | versioned code in git |
| `pipelines/` | standalone script | structured module |
| `evals/` | nonexistent | modern equivalent of `tests/` |
| `shared/llm/` | hardcoded in service | mandatory abstraction |

---

## Why `prompts/` is a folder, not an environment variable

Prompts affect product behavior just as much as any business function. When a prompt changes and response quality drops, you need to know exactly what changed, when, and by whom — the same things you want to know about any other piece of code.

Versioning prompts in git gives you: change history, review via pull request, immediate rollback on regression, and traceability in audits. Teams that treat prompts as code report 3x fewer incidents related to quality degradation in production (LLM in Prod Survey, Scale AI 2025).

---

## Why `evals/` is the new `tests/`

Testing an LLM is not the same as testing a deterministic function. The same input can produce different outputs. What matters is whether the response is good enough — and that requires specific metrics:

- **Faithfulness** — is the response grounded in the retrieved context, or is the model hallucinating?
- **Answer relevancy** — does the response actually answer the question asked?
- **Context precision** — are the chunks retrieved by RAG the most relevant ones available?

Frameworks like DeepEval, Ragas, and PromptFoo automate these evaluations. Mature teams run evals in CI on every prompt or retrieval-pipeline change, before any deployment.

---

## The three-contract rule

Every AI module needs three explicit contracts in the code:

1. **Input/output contract** — the schema of what goes into the prompt and what is expected back, validated via Zod or Pydantic
2. **Fallback contract** — what happens if the LLM fails, returns invalid JSON, or exceeds the timeout
3. **Evaluation contract** — how to measure whether the response is good enough to go to production

Without these three contracts, the AI module is a black box that fails silently.

---

## Adoption trends in 2025

- 84% — prompts versioned in git
- 77% — unified LLM client in `shared/`
- 71% — dedicated `/agents` folder
- 69% — vector DB separate from the relational database
- 58% — automated evals in CI
- 52% — cost tracking per feature

---

## Which one to choose

Use **Layered AI Monolith** if you're just starting out, the team is small, and AI is a secondary feature of the product.

Use **Modular AI-Native** if AI is the product, the team will grow, or you need distinct AI behaviors per domain — each module with its own prompts, tools, and evaluations.

In both cases, the most important shift is not which structure you pick — it's to stop treating prompts, agents, and pipelines as "implementation details" and start treating them with the same rigor as any other production code.

---

*Based on: State of AI Engineering 2025 (Pragmatic Engineer), GitHub Octoverse 2025, LLM in Production Survey (Scale AI), Anthropic Developer Docs. This post is an evolution of the original ["Two best folder structures for a web application"](https://chat-to.dev/post?id=N043SE1xbUZ2MzBBK01TWTl1cXVCQT09) published on chat-to.dev.*
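The three-contract rule described above can be sketched as one small module. This version avoids external dependencies so it stands alone (a real project would use Zod's `safeParse` for contract 1); every name here, including `SummaryOutput`, `parseOutput`, and `groundingScore`, is illustrative rather than a prescribed API, and the grounding metric is a deliberately toy stand-in for a real faithfulness eval:

```typescript
// Contract 1 — input/output: the shape the LLM must return.
interface SummaryOutput {
  summary: string;
  keywords: string[];
}

// Validate the raw model response against the schema.
function parseOutput(raw: string): SummaryOutput | null {
  try {
    const data = JSON.parse(raw);
    if (typeof data.summary === "string" && Array.isArray(data.keywords)) {
      return data as SummaryOutput;
    }
    return null; // valid JSON, wrong shape
  } catch {
    return null; // invalid JSON
  }
}

// Contract 2 — fallback: what callers get when the LLM fails,
// instead of a thrown exception or a half-parsed object.
const FALLBACK: SummaryOutput = { summary: "", keywords: [] };

function summarizeFromLLM(raw: string): SummaryOutput {
  return parseOutput(raw) ?? FALLBACK;
}

// Contract 3 — evaluation: a "good enough" check runnable in CI.
// Toy grounding score: fraction of keywords that appear in the source.
function groundingScore(output: SummaryOutput, sourceText: string): number {
  if (output.keywords.length === 0) return 0;
  const hits = output.keywords.filter((k) =>
    sourceText.toLowerCase().includes(k.toLowerCase()),
  ).length;
  return hits / output.keywords.length;
}

const raw = '{"summary":"RAG basics","keywords":["chunking","embedding"]}';
const out = summarizeFromLLM(raw);
console.log(out.summary, groundingScore(out, "chunking and embedding steps"));
// → RAG basics 1
```

The eval in `tests/evals/` would then assert that `groundingScore` stays above a threshold over a fixed dataset, which is exactly the gate the 58% of teams running evals in CI enforce before deployment.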
