Agentic AI: From LLMs to Autonomous Agents with MCP and Docker

This blog explains how AI models are now able to use tools and automate tasks that seemed impossible a couple of years ago.


Introduction

LLMs are impressive but limited. They predict text. They don't run scanners. They can't query databases. They can't create tickets or call APIs. They're powerful pattern-matching engines trapped in a sandbox.

Agentic AI breaks out of that sandbox by wrapping LLMs in a decision loop where they can plan, choose tools, execute actions, and adapt based on results. Instead of "ChatGPT explains how to scan a network," you get "an agent that autonomously runs reconnaissance, triages results, and drafts reports."

But there's a critical problem: LLMs can't directly use tools. This is where MCP (Model Context Protocol) comes in: a standardized interface that safely wires LLMs to the tools and systems they need. Combined with Docker and the MCP Catalog, you can spin up production-grade agentic systems in minutes.

This guide explains why direct LLM-to-tool integration fails and how MCP solves it, walks through hands-on demos, and shows real-world security patterns.


Part 1: LLMs Are Amazing But Isolated

What LLMs Can Do

LLMs excel at:

  • Natural language understanding - Parse requests, infer intent, understand context
  • Pattern completion - Generate code, write configs, draft reports, complete sequences
  • Reasoning over descriptions - Analyze architectural diagrams, vulnerability assessments, compliance policies (all described in text)
  • Multi-step planning - Break complex tasks into sequences, identify dependencies

In practice:

  • "Explain SQL injection and how to test for it" → LLM provides a detailed tutorial
  • "Write a Python script to find open S3 buckets" → LLM generates code with explanation
  • "Summarize security implications of this design doc" → LLM produces analysis with risk matrix

These are useful, but they're all text-in, text-out. The LLM is a consultant; it advises but doesn't execute.

What LLMs Cannot Do (By Design)

Out of the box, LLMs have no ability to:

  • Execute commands - Can't run nmap, curl, shell scripts, system tools
  • Access external data - Can't query databases, APIs, or file systems in real-time
  • Maintain state - Each request is independent; no memory of previous tool outputs or session context
  • Interact with services - Can't authenticate to Jira, GitHub, cloud providers, or internal systems
  • Guarantee accuracy - Can hallucinate plausible-sounding but false information

ℹ️ Note: This isn't a bug; it's a security feature. You don't want an LLM with direct shell access or cloud credentials baked in.

The Disconnect

Imagine a user says: "Scan 10.0.0.0/24 for open ports and tell me which hosts are running outdated Apache."

An LLM responds: "You should run `nmap -p- -sV 10.0.0.0/24`. Then parse the output looking for 'Apache/2.2' or older versions."

But the LLM can't actually run the scan. It can't see the results. It can't correlate findings with threat intel. It can't be an autonomous agent; it's just a consultant giving instructions.


Part 2: Why LLMs Can't Directly Use Tools

There are three fundamental problems if you naively wire tools directly to an LLM:

Problem 1: No Standard Interface

Every tool is different:

  • Nmap outputs XML, text, or JSON; parsing is fragile and inconsistent
  • Jira's API uses REST with OAuth and a complex query syntax
  • PostgreSQL requires SQL commands and connection pooling management
  • AWS CLI has hundreds of commands with conflicting flag semantics
  • Custom scripts have no standard calling convention or parameter format

If you hardcode each tool into the LLM's prompt, you get:

  • Brittle, ad-hoc JSON templates for each tool (scales poorly)
  • The LLM makes mistakes in formatting (missing quotes, wrong nesting, syntax errors)
  • Adding a new tool requires rewriting prompts and retraining LLM intuitions

This doesn't scale. You're forced to maintain brittle glue code for every tool.
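
To see why this hurts, here is a minimal sketch of the naive approach: the tool formats live in the prompt, and your code has to parse whatever the model returns. Everything here (the prompt, the tool names, the whitelist) is illustrative, not a real integration:

import json

TOOL_PROMPT = (
    "You can use these tools. Reply ONLY with JSON.\n"
    '- nmap: {"tool": "nmap", "args": {"target": "<ip>", "ports": "<range>"}}\n'
    '- jira_create_ticket: {"tool": "jira_create_ticket", "args": {"summary": "..."}}\n'
)

def parse_tool_call(llm_reply: str) -> dict:
    """Parse the model's reply into a tool call; this is where things break."""
    try:
        action = json.loads(llm_reply)  # fails on missing quotes, stray prose, markdown fences
    except json.JSONDecodeError as err:
        raise RuntimeError(f"Unparseable tool call: {err}")
    if action.get("tool") not in ("nmap", "jira_create_ticket"):
        raise ValueError(f"Unknown tool: {action.get('tool')}")
    return action

# Every new tool means editing TOOL_PROMPT, the whitelist above,
# and whatever dispatch code actually executes the call.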

Problem 2: No Dynamic Discovery

How does the LLM know what tools exist right now?

You could hardcode a list in the system prompt:

"You have access to these tools: nmap, jira_create_ticket, query_db, get_slack_messages"

But this breaks when:

  • You add a new tool (requires prompt engineering and model redeployment)
  • You use different tools in different environments (dev vs. staging vs. prod)
  • You want to grant different agents different permissions (one can scan, another can only read logs)
  • Tools are added/removed at runtime

The LLM has no way to discover capabilities dynamically or adapt to what's available. It's locked into a static prompt.

Problem 3: No Security Boundary

Letting the LLM directly invoke tools is dangerous:

  • Credential leakage - API keys, passwords, tokens end up in model context or logs
  • No access control - LLM could call destructive tools (delete files, drop databases, terminate instances)
  • No audit trail - Hard to track what the model did and why (compliance nightmare)
  • No rate limiting - Model could spam API calls, overwhelming services
  • Injection attacks - User input flows directly into tool invocation; easy to manipulate the model into calling unintended operations

⚠️ You need a controlled security boundary where you enforce:

  • Authentication and authorization
  • Logging and audit trails for compliance
  • Rate limits and resource caps
  • Input validation and sanitization

Part 3: The MCP Solution - A Standardized Protocol

Model Context Protocol (MCP) standardizes how LLM-based applications ("clients") discover and invoke capabilities from tool backends ("servers"). Think of it as a standardized plugin protocol for AI agents.

The Architecture

┌──────────────────────────┐
│   LLM / Agent Runtime    │
│   (Claude, your app)     │
└────────────┬─────────────┘
             │ MCP Client
             │ (Discovers tools, invokes them)
             │
    ┌────────┴────────────────────┐
    │                             │
┌───▼──────────────┐   ┌──────────▼────────┐
│  MCP Server #1   │   │  MCP Server #2    │
│  (Filesystem,    │   │  (GitHub,         │
│   Git, etc.)     │   │   Jira, etc.)     │
└──────────────────┘   └───────────────────┘

Key Concepts

MCP Server - Exposes specific tools (e.g., list_files, read_file, create_issue)

  • Runs as a separate process or container (isolated from LLM)
  • Implements your actual logic (APIs, CLIs, SDKs, databases)
  • Enforces auth, rate limits, and audit logging before execution

MCP Client - The host application that wants to use tools

  • Connects to one or more MCP servers via TCP/stdio
  • Queries "what tools are available?" via tools/list
  • Invokes tools via tools/call with structured parameters
  • Receives structured results back (JSON, not raw text)

Protocol - Standard operations:

  • tools/list - "Tell me what you can do" → returns tool names, descriptions, input/output schemas
  • tools/call - "Run this tool with these arguments" → returns result or error

Why MCP Solves the Three Problems

1. Standard Interface

All tools speak the same MCP language:

  • Tool descriptions follow a schema (name, parameters, return type)
  • All invocations use tools/call with JSON
  • All results are structured data (JSON, not raw text or variadic output)

So the LLM doesn't need special logic for each tool; it just learns: "when I need to do X, look for a tool with purpose X, and call it with these params."
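
Concretely, a tool description is just structured data. Here's an illustrative definition for a hypothetical scan_host tool, using the name/description/inputSchema fields that MCP tool listings follow:

# Illustrative tools/list entry for a hypothetical scan_host tool.
# Field names (name, description, inputSchema) follow the MCP tool schema.
scan_host_definition = {
    "name": "scan_host",
    "description": "Run an nmap service scan against a single host",
    "inputSchema": {
        "type": "object",
        "properties": {
            "target": {"type": "string", "description": "IPv4 address to scan"},
            "ports": {"type": "string", "description": "Port range, e.g. 1-1000"},
        },
        "required": ["target"],
    },
}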

2. Dynamic Discovery

At startup (or anytime), the MCP client queries each connected server:

Client: "What tools do you have?"
Server: [
  { name: "scan_host", params: ["target", "ports"], ... },
  { name: "parse_results", params: ["scan_data"], ... }
]

The LLM learns about tools at runtime. Different environments can wire different servers. No prompting needed. Add/remove tools on the fly.

3. Security Boundary

MCP servers are separate processes with their own:

  • Authentication - Server validates credentials before accepting calls
  • Authorization - Server checks RBAC policies (user/agent can call certain tools only)
  • Audit logging - Every tools/call is logged for compliance
  • Input validation - Server sanitizes arguments before execution
  • Rate limiting - Server throttles requests per agent/user
  • Isolation - Server runs in a container with restricted filesystem/network

The LLM never sees credentials or infra details; it just says "call tool X with arg Y," and the MCP server handles the rest safely.


Part 4: How the Loop Works - LLM + MCP in Action

Here's a concrete example that shows the entire agentic loop:

Scenario

User: "Find all open SSH ports in 10.0.0.0/24 and tell me which ones allow root login."

Behind the Scenes

  1. LLM receives the request and sees it needs scanning + credential testing
    MCP client has discovered tools: scan_network, check_ssh_auth, summarize_findings
  2. LLM plans the workflow:
    ○ Call scan_network(target="10.0.0.0/24", ports="22")
    ○ For each responsive host, call check_ssh_auth(host, user="root")
    ○ Aggregate results into summarize_findings(...)
  3. First tool call: LLM decides → MCP client routes it to the appropriate server:
    tools/call {
      tool: "scan_network",
      arguments: { target: "10.0.0.0/24", ports: "22" }
    }
  4. MCP server executes:
    ○ Validates authentication (agent has permission to scan)
    ○ Runs actual nmap inside a container
    ○ Parses results into structured JSON
    ○ Returns: { responsive_hosts: ["10.0.0.5", "10.0.0.17", ...], banners: {...} }
    ○ Logs: "Agent X called scan_network with args Y at timestamp Z"
  5. LLM observes results: Context now includes scan data. LLM continues with credential checks
  6. Repeat for each host, then call summarize_findings(...)
  7. Final output: LLM produces a human-readable report with findings and recommendations

Key insight: At no point does the LLM see credentials, execute shell commands, or access the network directly. Everything is mediated through MCP servers you control. This is the security magic of MCP.
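
To make the loop concrete, here is a minimal sketch of the control flow. The llm and mcp_client objects are placeholders for whatever runtime you use; the point is the shape: discover tools, let the model decide, execute via MCP, feed results back.

# Minimal agentic-loop sketch; `llm` and `mcp_client` are hypothetical objects
# with the small interfaces shown in the comments.
def agent_loop(llm, mcp_client, user_request: str, max_steps: int = 10) -> str:
    tools = mcp_client.list_tools()               # tools/list: discover capabilities
    context = [{"role": "user", "content": user_request}]

    for _ in range(max_steps):
        decision = llm.next_step(context, tools)  # model plans: answer, or call a tool?
        if decision.kind == "final_answer":
            return decision.text                  # human-readable report
        result = mcp_client.call_tool(            # tools/call: the MCP server validates,
            decision.tool_name,                   # executes, and logs; the LLM never
            decision.arguments,                   # touches credentials or the network
        )
        context.append({"role": "tool", "content": result})

    return "Stopped: step budget exhausted before a final answer."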


Part 5: MCP + Docker - Practical Deployment

MCP servers can run anywhere, but Docker is ideal because it:

  • Isolates dependencies - Each server has its own Python, Node, CLI tools
  • Enforces resource limits - Cap CPU, memory, disk per server
  • Simplifies distribution - One image works on Linux/macOS/Windows/cloud
  • Enables Docker MCP Catalog - Prebuilt, signed, audited server images

What's the Docker MCP Catalog?

Docker, Anthropic, and others maintain a curated registry of MCP servers under the mcp/ namespace on Docker Hub.

Examples:

  • mcp/filesystem - Secure file operations (list, read, write within allowed paths)
  • mcp/git - Inspect Git repositories (branches, commits, diffs)
  • mcp/github - Interact with GitHub (list issues, create PRs, read repos)
  • mcp/postgres - Query PostgreSQL databases (with prepared statements)
  • mcp/elasticsearch - Search and analyze logs

Each image includes:

  • Tool definitions (schemas, descriptions, examples)
  • Audit logs / compliance guarantees
  • Version history and security updates

Part 6: Hands-On Demo - Setting Up an MCP Server with Docker

Let's walk through a concrete, minimal example that gets you running MCP quickly.

Step 1: Run a Prebuilt MCP Server from the Catalog

Start with the filesystem server as a safe, instructive example:

docker run --rm \
  -e MCP_ALLOWED_PATHS=/workspace \
  -p 3000:3000 \
  mcp/filesystem:latest

What this does:

  • Pulls the official mcp/filesystem image from Docker Hub
  • Grants the server access only to /workspace (sandboxed)
  • Exposes MCP server on localhost:3000
  • --rm cleans up the container when it exits

Inside the container, the MCP server starts and is ready to accept MCP client connections. It can expose tools like:

  • list_directory(path) - List files under /workspace
  • read_file(path) - Read a file's contents
  • write_file(path, contents) - Create/update a file
  • search_files(pattern) - Search for files matching a pattern

Step 2: Configure an MCP Client

On the "LLM side," you need an application that speaks MCP. The easiest is Claude Desktop:

Configuration file: ~/.claude/claude_desktop_config.json

Add entry for the MCP server:

{
  "mcpServers": {
    "filesystem": {
      "transport": "tcp",
      "host": "localhost",
      "port": 3000
    }
  }
}

Restart Claude Desktop, and it will:

  • Connect to localhost:3000
  • Query tools/list and learn about list_directory, read_file, etc.
  • Make those tools available to Claude in the chat UI

Step 3: Try a Task

Once Claude Desktop (or your app) is connected:

You: "Audit the /workspace/config directory and tell me if there are any hardcoded credentials in the files."

Claude (LLM):

  1. Calls list_directory("/workspace/config")
  2. Sees files like database.conf, api.cfg, .env
  3. For each file, calls read_file(file_path)
  4. Scans content for patterns like password=, API_KEY=, secret:
  5. Generates a report with findings and recommendations

Result: You get an automated security audit with a natural-language report, all without the LLM touching your system directly.


Part 7: Building Your Own MCP Server

After testing prebuilt servers, you can build custom ones. Here's the conceptual structure:

Server Code (Python Example)

# mcp_server.py (conceptual sketch; adapt to your MCP SDK's actual API)
import ipaddress
import subprocess

from mcp.server import Server

server = Server("nmap-scanner")

@server.tool()
def scan_host(target: str, ports: str = "1-1000") -> str:
    """Run an nmap version scan against a single target."""
    # Validate inputs (prevent command injection)
    try:
        ipaddress.ip_address(target)
    except ValueError:
        return f"Error: invalid target {target}"
    if not all(part.isdigit() for part in ports.replace("-", ",").split(",")):
        return f"Error: invalid port specification {ports}"

    # Execute nmap inside the container
    result = subprocess.run(
        ["nmap", "-p", ports, "-sV", target],
        capture_output=True,
        text=True
    )

    # Return the scan output (parse into structured JSON if your client expects it)
    return result.stdout

# Start MCP server (listens on TCP 3000)
server.run()

Dockerfile

FROM python:3.12-slim

WORKDIR /app

# Install nmap and other tools
RUN apt-get update && apt-get install -y nmap && rm -rf /var/lib/apt/lists/*

# Copy server code
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY mcp_server.py .

# Expose MCP port
EXPOSE 3000

CMD ["python", "mcp_server.py"]

Build and Run

# Build
docker build -t my-nmap-mcp-server .

# Run with resource limits
docker run --rm \
  -p 3000:3000 \
  --memory="512m" \
  --cpus="1" \
  my-nmap-mcp-server

# Now connect an MCP client to localhost:3000
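
With the container running, you can register it in the client the same way as in Part 6; the entry name nmap-scanner is arbitrary:

{
  "mcpServers": {
    "nmap-scanner": {
      "transport": "tcp",
      "host": "localhost",
      "port": 3000
    }
  }
}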

Part 8: Composing Multiple MCP Servers

Real-world agents often need multiple tools from different domains. You can compose them easily:

Multi-Server Setup

# Terminal 1: Filesystem server
docker run -p 3000:3000 mcp/filesystem:latest

# Terminal 2: GitHub server
docker run -p 3001:3001 mcp/github:latest

# Terminal 3: PostgreSQL server
docker run -p 3002:3002 \
  -e DATABASE_URL=postgresql://user:password@host/mydb \
  mcp/postgres:latest

Configure Client to Connect to All

Claude Desktop config:

{
  "mcpServers": {
    "filesystem": {
      "transport": "tcp",
      "host": "localhost",
      "port": 3000
    },
    "github": {
      "transport": "tcp",
      "host": "localhost",
      "port": 3001
    },
    "database": {
      "transport": "tcp",
      "host": "localhost",
      "port": 3002
    }
  }
}

Now Claude can orchestrate across all tools:

  • Query the database for vulnerability scan results
  • Read the GitHub repo to get the codebase
  • Write a detailed security report to the filesystem

All orchestrated by the LLM, safely isolated via MCP.


Part 9: Security & Guardrails

Agentic AI is powerful but risky without constraints. Here's how MCP helps:

1. Least Privilege Tool Design

Don't expose raw bash or exec. Instead, expose specific, bounded operations:

Dangerous: ❌

@server.tool()
def run_command(cmd: str):
    return subprocess.run(cmd, shell=True, capture_output=True)

Safe: ✅ 

@server.tool()
def scan_host(target: str, ports: str):
    # Whitelist inputs, run only nmap
    validate_ip(target)
    validate_ports(ports)
    return subprocess.run(["nmap", "-p", ports, target], ...)

2. Audit Logging

Every tools/call should be logged:

@server.middleware()
def audit_log(tool_name, arguments, result, error=None):
    log.info({
        "timestamp": now(),
        "agent_id": context.agent_id,
        "tool": tool_name,
        "args": arguments,
        "status": "failure" if error else "success"
    })

3. Rate Limiting

Prevent tool abuse:

@rate_limit(calls=10, period=60)  # 10 calls per minute
@server.tool()
def expensive_scan(target):
    ...
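
The rate_limit decorator above is pseudocode. A minimal sketch of one, in-memory and per-process only (a shared store such as Redis is needed once you run multiple replicas), could look like this:

import time
from collections import deque
from functools import wraps

def rate_limit(calls: int, period: float):
    """Allow at most `calls` invocations per `period` seconds (per process)."""
    timestamps: deque = deque()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Drop timestamps that have aged out of the window
            while timestamps and now - timestamps[0] > period:
                timestamps.popleft()
            if len(timestamps) >= calls:
                return "Error: rate limit exceeded, try again later"
            timestamps.append(now)
            return func(*args, **kwargs)
        return wrapper
    return decorator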

4. Container Isolation

Run MCP servers with resource limits:

docker run --rm \
  --memory="256m" \
  --cpus="0.5" \
  --read-only \
  --cap-drop=ALL \
  my-mcp-server

5. Input Validation

Sanitize all arguments:

@server.tool()
def query_db(sql: str):
    # Prevent SQL injection
    if not is_safe_sql(sql):
        return "Error: query rejected for safety"
    return db.execute(sql)
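
is_safe_sql is left undefined above. A deliberately conservative sketch might allow only single SELECT statements and reject chaining and write keywords; parameterized queries remain the stronger defense:

import re

# Reject statement chaining, comments, and any write/DDL keywords
_FORBIDDEN = re.compile(
    r";|--|/\*|\b(insert|update|delete|drop|alter|grant|truncate)\b",
    re.IGNORECASE,
)

def is_safe_sql(sql: str) -> bool:
    """Allow only single SELECT statements with no chaining or write keywords."""
    stripped = sql.strip()
    return stripped.lower().startswith("select") and not _FORBIDDEN.search(stripped)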

Part 10: Real-World Patterns for Security Teams

Pattern 1: Vulnerability Triage Agent

MCP servers:

  • vulnerability_scanner - Run Nessus, OpenVAS, or Trivy
  • asset_database - Query CMDB for host metadata
  • threat_intelligence - Look up CVE severity and exploitability
  • ticketing - Create Jira tickets for findings

Workflow:

  1. Agent runs scan → gets list of vulns
  2. Enriches with asset metadata (owner, environment, criticality)
  3. Checks threat intel (is CVE actively exploited?)
  4. Prioritizes and creates tickets with context
  5. Sends Slack notification with summary
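
As a rough sketch, the tool-call sequence for one scan cycle might look like the following, where call stands in for the MCP client's tools/call and every tool and field name is hypothetical:

# Illustrative triage cycle; `call(tool_name, arguments)` is a placeholder
# for the MCP client, and all tool/field names are hypothetical.
def triage_cycle(call, scan_target: str):
    findings = call("vulnerability_scanner", {"target": scan_target})
    tickets = []
    for vuln in findings["vulnerabilities"]:
        asset = call("asset_database", {"host": vuln["host"]})
        intel = call("threat_intelligence", {"cve": vuln["cve"]})
        if intel["actively_exploited"] or asset["criticality"] == "high":
            tickets.append(call("ticketing", {
                "summary": f'{vuln["cve"]} on {vuln["host"]} ({asset["owner"]})',
                "priority": "P1" if intel["actively_exploited"] else "P2",
            }))
    return tickets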

Pattern 2: Log Analysis Agent

MCP servers:

  • siem_query - Query Splunk / ELK for logs
  • alert_api - Fetch alerts from your SOAR
  • ioc_lookup - Check if IPs/domains are known malicious
  • case_management - Create/update incident cases

Workflow:

  1. Agent polls for new alerts
  2. Queries SIEM for related logs
  3. Correlates events to identify attack patterns
  4. Checks IoC databases
  5. Drafts incident summary and escalation recommendation

Pattern 3: Compliance Checker Agent

MCP servers:

  • source_code_repo - Clone/inspect Git repos
  • config_scanner - Parse cloud configs (Terraform, CloudFormation)
  • policy_engine - Check against compliance policies
  • report_generator - Produce audit reports

Workflow:

  1. Agent scans repo for hardcoded secrets
  2. Checks IAM policies for least-privilege violations
  3. Verifies encryption at rest/transit
  4. Generates compliance report with remediation steps

Conclusion

LLMs alone are consultants; agentic AI with MCP makes them autonomous workers.

The journey:

  • LLMs: "Here's how to solve this"
  • Agentic AI: "I'll solve this for you"
  • MCP: The safe, standardized bridge between LLMs and your tools
  • Docker: Makes MCP servers portable, scalable, and isolated

For security teams, this opens new possibilities:

  • Automate repetitive analysis (triage, enrichment, reporting)
  • Reduce time-to-insight by 10-100×
  • Standardize investigation playbooks via agents
  • Scale security operations without hiring proportionally

The key is control: MCP keeps the LLM in a sandbox, your tools safe, and the boundary clear.

Start small (test with filesystem or GitHub servers), understand the pattern, then wire up real security tools. In a few Docker commands and config changes, you'll have an agent doing your team's routine work.

The future of cybersecurity isn't more humans staring at dashboards; it's agents doing the legwork, humans making decisions.


References & Further Reading:

Ready to start? Download Claude Desktop, run your first MCP server, and begin automating security workflows. The future of cybersecurity is autonomous agents; make it happen.
