George Jor

Agentic Engineering: Building Human-AI Synergy

Agentic Engineering is about combining human craftsmanship with AI tools to build better software. Quality remains the engineer's responsibility, but the craft now includes learning to work effectively with stochastic tools: mastering rapid feedback loops, running parallel agent conversations, and reviewing AI-suggested changes efficiently.

The Harness-First Approach

Large language models can feel incredibly smart one moment and frustratingly unreliable the next. The real issue isn't the model's intelligence; it's everything around it. Breaking down complex tasks, managing state, checking key steps, and recovering from mistakes are what actually decide whether an AI system works consistently.

Here’s how to fix it with three practical layers:

1. Prompt Engineering

Instead of just giving orders, you shape a clear “probability space” so the model focuses on the right ideas and produces useful output. It’s about making sure the model truly understands what you want.

The catch? It hits a wall fast. Great prompts can’t fix missing knowledge like internal docs, latest product configs, code styles, or complicated multi-tool workflows. In short, it improves communication, not the underlying information or long-term execution.
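To make "shaping a probability space" concrete, here is a minimal sketch of a structured prompt builder. The section names and fields are illustrative, not a standard; the point is that role, constraints, and output format are stated explicitly instead of implied.

```typescript
// Sketch: shaping the "probability space" with a structured prompt
// instead of a bare instruction. All field names are illustrative.
interface PromptSpec {
  role: string;          // who the model should act as
  task: string;          // what to do
  constraints: string[]; // what to enforce or avoid
  outputFormat: string;  // the shape of a useful answer
}

function buildPrompt(spec: PromptSpec): string {
  return [
    `You are ${spec.role}.`,
    `Task: ${spec.task}`,
    "Constraints:",
    ...spec.constraints.map((c) => `- ${c}`),
    `Respond as: ${spec.outputFormat}`,
  ].join("\n");
}

const prompt = buildPrompt({
  role: "a senior TypeScript reviewer",
  task: "review the diff for breaking API changes",
  constraints: ["cite file and line for every finding", "no style nitpicks"],
  outputFormat: "a Markdown list of findings",
});
```

Even this tiny structure narrows what the model will attempt, which is the whole job of this layer.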

2. Context Engineering

This covers multi-turn chats, tool calls, updating from partial results, handling feedback, and keeping track of history and state. It’s about feeding the model the right information at the right time.

Classic examples include RAG (retrieval-augmented generation), memory injection, and progressive disclosure in agents: start with the big picture, then reveal details only when needed. Agent Skills implement progressive disclosure in exactly this layered, on-demand manner, providing high-level information first and supplying deeper, more detailed guidance as context and requirements demand.

Like Prompt Engineering, it sharpens the input and thinking process, but it still doesn’t supervise the full execution or handle recovery when things drift.
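Progressive disclosure can be sketched in a few lines: the agent always sees cheap one-line overviews, and the expensive detail layer loads only when a task matches. Skill names and contents here are hypothetical.

```typescript
// Progressive disclosure sketch: overviews always in context,
// details loaded on demand. Skills shown are hypothetical.
interface Skill {
  name: string;
  overview: string;      // always in context (cheap)
  details: () => string; // loaded on demand (expensive)
}

const skills: Skill[] = [
  {
    name: "db-migrations",
    overview: "How we write and review database migrations.",
    details: () => "Full migration checklist: backward-compatible DDL first, ...",
  },
  {
    name: "release-notes",
    overview: "Template and tone for release notes.",
    details: () => "Full release note template: ...",
  },
];

// Stage 1: the agent sees only overviews.
function overviewContext(): string {
  return skills.map((s) => `${s.name}: ${s.overview}`).join("\n");
}

// Stage 2: details are pulled in only for the skill the task names.
function expand(taskKeywords: string[]): string {
  const hit = skills.find((s) => taskKeywords.includes(s.name));
  return hit ? hit.details() : overviewContext();
}
```

The same two-stage shape underlies RAG and memory injection: a small index up front, full content fetched only when relevant.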

3. Harness Engineering

You define clear “done” criteria, use checklists at every checkpoint, document changes, and automatically reset agents before they wander off-track. It’s about making the model perform reliably over long, repeated tasks.

Popular setups include Planner + Generator + Evaluator agents, grading rubrics, real testing loops, observability for self-diagnosis, and strong recovery rules.

Think of it as layered scaffolding: context management, tooling, execution flow, state/memory, evaluation, and constraints/recovery. This is the part OpenAI’s approach and LangChain’s Deep Agents have been quietly perfecting.
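A minimal harness loop can be sketched as follows, assuming stub functions in place of real Generator and Evaluator agents: explicit "done" criteria, a checkpoint after every attempt, feedback carried forward instead of accumulated drift, and a hard budget before handing back to a human.

```typescript
// Minimal harness sketch: done criteria, per-attempt evaluation,
// and a reset budget. generate/evaluate stand in for model calls.
interface Attempt { output: string; passed: boolean }

function runHarness(
  generate: (feedback: string | null) => string, // Generator agent (stubbed)
  evaluate: (output: string) => string | null,   // Evaluator: null means "done"
  maxAttempts = 3,                               // budget before escalating
): Attempt {
  let feedback: string | null = null;
  for (let i = 0; i < maxAttempts; i++) {
    const output = generate(feedback); // fresh attempt, prior feedback only
    const failure = evaluate(output);  // checkpoint against done criteria
    if (failure === null) return { output, passed: true };
    feedback = failure;                // carry the failure, not the drift
  }
  return { output: "", passed: false }; // recovery rule: hand back to a human
}

// Example: the evaluator demands tests; the generator fixes it on retry.
const result = runHarness(
  (fb) => (fb === null ? "draft" : "draft with tests"),
  (out) => (out.includes("tests") ? null : "missing tests"),
);
```

Real setups add a Planner in front and observability around the loop, but the control flow stays this shape.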

What's Next: Self-Evolving Harness

In the long run, as Anthropic has pointed out, the scaffolding around the model will matter less. Smarter models will solve many of these problems by default, and much of this harness will become built-in. Until that day arrives, building a solid harness remains the smartest way to ship dependable AI applications.

Expanded Flexible Approaches

Here are the flexible approaches I apply in practice, adapting them to the specific scenario rather than following a rigid sequence:

Spec Engineering

Use lightweight specifications to foster mutual understanding between human and AI. Tools have matured significantly (e.g., Kiro, GitHub spec-kit, or Cursor's plan feature), and there is no need to adhere strictly to any particular SDD (spec-driven development) format. You can start with Cursor's planning mode for initial ideas: it asks clarifying questions, keeps things lightweight, and produces predictable, efficient results. When you already understand your codebase well, instruct the IDE to plan ahead, answer its clarifying questions, and generate a concise Markdown file. This "mini-spec" is single-purpose, highly readable, and easy to refine; it is currently the most effective way to promote shared understanding.
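As a sketch of what such a mini-spec might contain, here is a hypothetical structure rendered to Markdown. The field names (goal, scope, non-goals, acceptance) are one reasonable choice, not a required format.

```typescript
// Hypothetical mini-spec shape: single-purpose, readable, easy to diff.
// Field names are illustrative, not a standard.
interface MiniSpec {
  goal: string;
  scope: string[];      // what is in
  nonGoals: string[];   // what is explicitly out
  acceptance: string[]; // how we know it is done
}

function renderMiniSpec(spec: MiniSpec): string {
  return [
    `# ${spec.goal}`,
    "## Scope",
    ...spec.scope.map((s) => `- ${s}`),
    "## Non-goals",
    ...spec.nonGoals.map((s) => `- ${s}`),
    "## Acceptance",
    ...spec.acceptance.map((s) => `- [ ] ${s}`),
  ].join("\n");
}

const md = renderMiniSpec({
  goal: "Add CSV export to the reports page",
  scope: ["export current filters", "UTF-8 output"],
  nonGoals: ["scheduled exports"],
  acceptance: ["download matches on-screen rows"],
});
```

The checkbox acceptance list doubles as the agent's done criteria later in the harness.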

Knowledge Engineering

Combat hallucinations through a combination of techniques and tools, including AGENTS.md for project-specific instructions, defining Agent Skills (with overviews and modular capabilities), proper memory handling, Model Context Protocol (MCP), llms.txt for LLM-friendly website documentation, frameworks like LangChain for function calls and tools, structured semantic schemas, IDE-based context selection tools, and context compression techniques. These efforts collectively ensure the AI receives accurate, relevant, and efficiently managed information.
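One of these techniques, context selection under a budget, can be sketched simply: score candidate snippets by keyword overlap with the query, then pack the best ones until the budget runs out. Real systems use embeddings and token counts rather than character lengths; this only keeps the shape.

```typescript
// Sketch of budgeted context selection. Production systems would use
// embeddings and token counts; keyword overlap stands in here.
interface Snippet { source: string; text: string }

function packContext(query: string, snippets: Snippet[], budget: number): Snippet[] {
  const words = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const scored = snippets
    .map((s) => ({
      s,
      score: s.text.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length,
    }))
    .filter((x) => x.score > 0)         // drop irrelevant snippets entirely
    .sort((a, b) => b.score - a.score); // best matches first

  const picked: Snippet[] = [];
  let used = 0;
  for (const { s } of scored) {
    if (used + s.text.length > budget) continue; // skip what does not fit
    picked.push(s);
    used += s.text.length;
  }
  return picked;
}

const ctx = packContext(
  "how do we rotate api keys",
  [
    { source: "security.md", text: "Rotate API keys every 90 days via the admin panel." },
    { source: "style.md", text: "Use two-space indentation." },
  ],
  200,
);
```

The accurate-and-relevant goal above is exactly this filter-then-budget step, applied before anything reaches the model.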

Skills Engineering

To make AI follow your preferred style and existing practices, the best method is to let it learn from clear examples. Similar to the constitution file in SDD, create an AGENTS.md that documents key project rules, style guides, and coding conventions. Modern LLMs and IDEs already index the codebase effectively. To further improve accuracy, provide one well-defined example, such as the structure of an oRPC endpoint and its interactions with subsystems. The AI can then extend new features consistently with existing patterns.

Vercel's Agent Skills takes this further by packaging domain expertise into installable skill sets. Their react-best-practices skill, for example, encapsulates 10+ years of React optimization knowledge (40+ rules across 8 categories) that agents can reference when reviewing code or suggesting fixes.
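To illustrate what "one well-defined example" means in practice, here is a generic typed-endpoint sketch. This is not oRPC's actual API; it only shows the kind of reference pattern (validate input, touch one subsystem, return a typed result) that an agent can copy when generating the next endpoint.

```typescript
// A "reference example" the agent pattern-matches against. Generic
// sketch, not oRPC's real API; the names are hypothetical.
interface Ctx { userId: string }

type Handler<I, O> = (input: I, ctx: Ctx) => O;

interface CreateNoteInput { title: string }
interface Note { id: string; title: string; ownerId: string }

// Convention demonstrated by example: validate, act, return typed data.
const createNote: Handler<CreateNoteInput, Note> = (input, ctx) => {
  if (!input.title.trim()) throw new Error("title is required");
  return {
    id: `note_${Date.now()}`,
    title: input.title.trim(),
    ownerId: ctx.userId,
  };
};

const note = createNote({ title: "  Roadmap  " }, { userId: "u1" });
```

One fully worked endpoint like this teaches the shape of every future one more reliably than a page of prose rules.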

Prototype Engineering

Adopt a Prototype-first approach using specialized generation tools to isolate scope and enable early preview and testing. For UI-related features, start in tools like v0.dev, Lovable, or Pencil, iterate on required business logic until stable, then integrate. This avoids polluting the main codebase and allows verification of assumptions in a preview-friendly environment without full hot-reload cycles. For a more connected workflow, Paper provides an HTML/CSS canvas where designers and agents collaborate on the same surface — syncing design tokens, styles, and components between the canvas and codebase, eliminating the traditional design-to-code translation gap. Google's Stitch takes a different angle by generating full-stack prototypes directly from design mockups and natural language, producing deployable code that can be iterated on immediately. Integration of refined generated code is now straightforward. I have found this extremely effective, for example, when building and validating a seating plan system or a full document management system with all relevant views and business cases before merging.

Schema Engineering

Favor declarative over imperative approaches to UI generation. When AI produces components through structured schemas rather than freeform code, the output becomes predictable, serializable, and cross-platform by default. Tools like json-render constrain AI output to a predefined component catalog with guarded actions, rendering progressively as JSON streams from the model. Google's A2UI protocol takes this further as an open standard: agents emit declarative component descriptions that clients render using their own native widgets across Angular, Flutter, React, or mobile — no executable code crosses trust boundaries. CopilotKit's AG-UI approaches this from the agent side, defining a streaming event protocol that lets any AI agent push UI updates, tool calls, and state changes to any frontend in real time. Microsoft's Adaptive Cards takes the platform-neutral route: a single JSON schema renders natively across Teams, Outlook, Windows, and web — no custom rendering code required. Tambo bridges this into existing React applications by letting developers register their own components so agents render real UI with proper styling and logic, not generic markup.

The foundation underneath is type-safe schema definition: Zod for runtime validation, openapi-typescript for generating typed clients from API contracts, and tools like Drizzle or Prisma for database schema inference. When component structure, props, state transitions, and data bindings are all expressed as schemas, the entire frontend becomes a serializable artifact that both humans and AI can reason about, diff, and validate. State handling shifts from scattered imperative hooks to declared bindings within the component tree. This is where agentic UI generation becomes truly reliable: the AI never invents components or behaviors outside the catalog, and every interaction follows a contract the developer controls.
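The catalog-constrained idea can be sketched in a few lines, assuming illustrative names rather than json-render's or A2UI's actual schemas: the model emits a JSON tree, and the renderer accepts only component types registered in the catalog.

```typescript
// Catalog-constrained UI sketch: the renderer rejects any component
// the developer did not register. Names are illustrative, not a real
// json-render or A2UI schema.
interface UINode {
  type: string;
  props: Record<string, unknown>;
  children?: UINode[];
}

const catalog: Record<string, (props: Record<string, unknown>) => string> = {
  Card: (p) => `[card title=${p.title}]`,
  Button: (p) => `[button label=${p.label}]`,
};

function render(node: UINode): string {
  const component = catalog[node.type];
  if (!component) throw new Error(`unknown component: ${node.type}`); // AI cannot invent widgets
  const inner = (node.children ?? []).map(render).join("");
  return component(node.props) + inner;
}

// A valid tree renders; an off-catalog type throws.
const ui = render({
  type: "Card",
  props: { title: "Invoice" },
  children: [{ type: "Button", props: { label: "Pay" } }],
});
```

Because the tree is plain data, it can also be diffed, validated against a Zod schema, and streamed progressively, which is what makes the approach serializable by default.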

Workflow Engineering

Use orchestration tools to enable automation, context sharing, sub-agent assignment, and autonomous decision-making. These typically follow a Perception-Reasoning-Action (PRA) cycle and are evolving toward multi-agent systems.

For code-first approaches, developers can leverage frameworks like AI SDK, agent-browser, LangGraph.js, Mastra, Motia, or AutoGen to build custom agent workflows with full control. For no-code or low-code solutions, platforms like n8n, Dify, and Lindy enable rapid workflow automation accessible to broader teams. Enterprise-grade options include CrewAI for multi-agent orchestration and IBM watsonx Agents for domain-specific deployments. Infrastructure tools like Klavis provide MCP server management and tooling to connect agents with external services at scale, while OpenRouter offers a unified API for accessing multiple LLM providers with automatic fallbacks and cost optimization.
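The PRA cycle these frameworks share can be sketched with stubs standing in for real sensors, an LLM, and tool calls; the example below uses a toy inbox-archiving policy as the reasoning step.

```typescript
// Minimal Perception-Reasoning-Action cycle with stubbed stages.
interface World { inbox: string[]; archived: string[] }

type Action = { kind: "archive"; item: string } | { kind: "idle" };

const perceive = (w: World): string[] => w.inbox;       // Perception: observe state
const reason = (obs: string[]): Action =>               // Reasoning: stub policy
  obs.length > 0 ? { kind: "archive", item: obs[0] } : { kind: "idle" };
const act = (w: World, a: Action): World =>             // Action: apply a tool call
  a.kind === "archive"
    ? {
        inbox: w.inbox.filter((i) => i !== a.item),
        archived: [...w.archived, a.item],
      }
    : w;

function runCycle(world: World, steps: number): World {
  let w = world;
  for (let i = 0; i < steps; i++) {
    w = act(w, reason(perceive(w))); // one full PRA turn per step
  }
  return w;
}

const done = runCycle({ inbox: ["a", "b"], archived: [] }, 3);
```

Multi-agent systems layer orchestration on top, but each agent still runs some version of this loop.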

Agents Engineering

Beyond workflow orchestration lies the emerging practice of building personal AI infrastructure: systems that know you, remember your context, and operate continuously on your behalf. This moves beyond merely using AI assistants toward engineering persistent, personalized AI systems.

Projects like Personal AI Infrastructure (PAI) provide frameworks for building goal-oriented AI systems with persistent memory, custom skills, and continuous learning. PAI emphasizes user-centricity over tooling, where the system captures signals from every interaction to improve over time. Similarly, OpenClaw enables AI agents to execute real tasks through messaging platforms like WhatsApp and Telegram, running 24/7 on local hardware or in the cloud.

Infrastructure-wise, this space is evolving rapidly and not yet mature. However, developers can already start building by leveraging their own hardware (PC or Mac mini) for local execution, implementing memory systems for context persistence, addressing security and permission handling, and exploring sandboxing for safe agent execution.

Most importantly, build around actual use cases and seek repeatable patterns. The goal is not to build infrastructure for its own sake, but to solve real problems: automating email triage, managing schedules, processing recurring reports, or coordinating complex workflows. Start with a specific need, validate it works, then generalize.
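A memory layer for such a system can start very small. The sketch below keeps records in process memory and recalls by keyword match; a real build would persist to disk and use embeddings, but the capture/recall shape is the part worth validating first.

```typescript
// Personal-agent memory sketch: every interaction leaves a record,
// recall is by keyword. Persistence and embeddings omitted deliberately.
interface Memory { timestamp: number; text: string; tags: string[] }

class MemoryStore {
  private items: Memory[] = [];

  capture(text: string, tags: string[] = []): void {
    this.items.push({ timestamp: Date.now(), text, tags });
  }

  recall(keyword: string): Memory[] {
    const k = keyword.toLowerCase();
    return this.items.filter(
      (m) => m.text.toLowerCase().includes(k) || m.tags.includes(k),
    );
  }
}

const memory = new MemoryStore();
memory.capture("User prefers weekly report summaries on Monday", ["email"]);
memory.capture("Flight to Berlin booked for March", ["travel"]);
const hits = memory.recall("report");
```

Starting with one real use case (say, email triage) tells you quickly whether keyword recall suffices or the embedding upgrade is actually needed.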

The Future

The future of agentic engineering lies in flexibility and balance: leveraging AI to accelerate progress while preserving human judgment, accountability, and empathy. As tools continue to evolve, staying adaptable and scenario-driven will be key to achieving the best outcomes.