Agentic AI: From LLMs to Autonomous Agents with MCP and Docker

This blog explains how AI models are now able to use tools and automate tasks that seemed impossible a couple of years ago.


Introduction

LLMs are impressive but limited. They predict text. They don't run scanners. They can't query databases. They can't create tickets or call APIs. They're powerful pattern-matching engines trapped in a sandbox.

Agentic AI breaks out of that sandbox by wrapping LLMs in a decision loop where they can plan, choose tools, execute actions, and adapt based on results. Instead of "ChatGPT explains how to scan a network," you get "an agent that autonomously runs reconnaissance, triages results, and drafts reports."

But there's a critical problem: LLMs can't directly use tools. This is where MCP (Model Context Protocol) comes in: a standardized interface that safely wires LLMs to the tools and systems they need. Combined with Docker and the MCP Catalog, you can spin up production-grade agentic systems in minutes.

This guide explains why direct LLM-to-tool integration fails and how MCP solves it, walks through hands-on demos, and shows real-world security patterns.


Part 1: LLMs Are Amazing But Isolated

What LLMs Can Do

LLMs excel at:

  • Natural language understanding - Parse requests, infer intent, understand context
  • Pattern completion - Generate code, write configs, draft reports, complete sequences
  • Reasoning over descriptions - Analyze architectural diagrams, vulnerability assessments, compliance policies (all described in text)
  • Multi-step planning - Break complex tasks into sequences, identify dependencies

In practice:

  • "Explain SQL injection and how to test for it" → LLM provides a detailed tutorial
  • "Write a Python script to find open S3 buckets" → LLM generates code with explanation
  • "Summarize security implications of this design doc" → LLM produces analysis with risk matrix

These are useful, but they're all text-in, text-out. The LLM is a consultant; it advises but doesn't execute.

What LLMs Cannot Do (By Design)

Out of the box, LLMs have no ability to:

  • Execute commands - Can't run nmap, curl, shell scripts, system tools
  • Access external data - Can't query databases, APIs, or file systems in real-time
  • Maintain state - Each request is independent; no memory of previous tool outputs or session context
  • Interact with services - Can't authenticate to Jira, GitHub, cloud providers, or internal systems
  • Guarantee accuracy - Can hallucinate plausible-sounding but false information

ℹ️ Note: This isn't a bug; it's a security feature. You don't want an LLM with direct shell access or cloud credentials baked in.

The Disconnect

Imagine a user says: "Scan 10.0.0.0/24 for open ports and tell me which hosts are running outdated Apache."

An LLM responds: "You should run `nmap -p- -sV 10.0.0.0/24`. Then parse the output looking for 'Apache/2.2' or older versions."

But the LLM can't actually run the scan. It can't see the results. It can't correlate findings with threat intel. It can't be an autonomous agent; it's just a consultant giving instructions.


Part 2: Why LLMs Can't Directly Use Tools

There are three fundamental problems if you naively wire tools directly to an LLM:

Problem 1: No Standard Interface

Every tool is different:

  • Nmap outputs XML, text, or JSON; parsing is fragile and inconsistent
  • Jira's API uses REST with OAuth and a complex query syntax
  • PostgreSQL requires SQL commands and connection pooling management
  • AWS CLI has hundreds of commands with conflicting flag semantics
  • Custom scripts have no standard calling convention or parameter format

If you hardcode each tool into the LLM's prompt, you get:

  • Brittle, ad-hoc JSON templates for each tool (scales poorly)
  • The LLM makes mistakes in formatting (missing quotes, wrong nesting, syntax errors)
  • Adding a new tool requires rewriting prompts and retraining LLM intuitions

This doesn't scale. You're forced to maintain brittle glue code for every tool.
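
To see why this hurts, here is a minimal sketch of the naive approach: the tool formats live in the prompt, and your code has to parse whatever the model returns. Everything here (the prompt, the tool names, the whitelist) is illustrative, not a real integration:

import json

TOOL_PROMPT = (
    "You can use these tools. Reply ONLY with JSON.\n"
    '- nmap: {"tool": "nmap", "args": {"target": "<ip>", "ports": "<range>"}}\n'
    '- jira_create_ticket: {"tool": "jira_create_ticket", "args": {"summary": "..."}}\n'
)

def parse_tool_call(llm_reply: str) -> dict:
    """Parse the model's reply into a tool call; this is where things break."""
    try:
        action = json.loads(llm_reply)  # fails on missing quotes, stray prose, markdown fences
    except json.JSONDecodeError as err:
        raise RuntimeError(f"Unparseable tool call: {err}")
    if action.get("tool") not in ("nmap", "jira_create_ticket"):
        raise ValueError(f"Unknown tool: {action.get('tool')}")
    return action

# Every new tool means editing TOOL_PROMPT, the whitelist above,
# and whatever dispatch code actually executes the call.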

Problem 2: No Dynamic Discovery

How does the LLM know what tools exist right now?

You could hardcode a list in the system prompt:

"You have access to these tools: nmap, jira_create_ticket, query_db, get_slack_messages"

But this breaks when:

  • You add a new tool (requires prompt engineering and model redeployment)
  • You use different tools in different environments (dev vs. staging vs. prod)
  • You want to grant different agents different permissions (one can scan, another can only read logs)
  • Tools are added/removed at runtime

The LLM has no way to discover capabilities dynamically or adapt to what's available. It's locked into a static prompt.

Problem 3: No Security Boundary

Letting the LLM directly invoke tools is dangerous:

  • Credential leakage - API keys, passwords, tokens end up in model context or logs
  • No access control - LLM could call destructive tools (delete files, drop databases, terminate instances)
  • No audit trail - Hard to track what the model did and why (compliance nightmare)
  • No rate limiting - Model could spam API calls, overwhelming services
  • Injection attacks - User input flows directly into tool invocation; easy to manipulate the model into calling unintended operations

⚠️ You need a controlled security boundary where you enforce:

  • Authentication and authorization
  • Logging and audit trails for compliance
  • Rate limits and resource caps
  • Input validation and sanitization

Part 3: The MCP Solution - A Standardized Protocol

Model Context Protocol (MCP) standardizes how LLM-based applications ("clients") discover and invoke capabilities from tool backends ("servers"). Think of it as a standardized plugin protocol for AI agents.

The Architecture

┌──────────────────────────┐
│   LLM / Agent Runtime    │
│   (Claude, your app)     │
└────────────┬─────────────┘
             │ MCP Client
             │ (Discovers tools, invokes them)
             │
    ┌────────┴────────────────────┐
    │                             │
┌───▼──────────────┐   ┌──────────▼────────┐
│  MCP Server #1   │   │  MCP Server #2    │
│  (Filesystem,    │   │  (GitHub,         │
│   Git, etc.)     │   │   Jira, etc.)     │
└──────────────────┘   └───────────────────┘

Key Concepts

MCP Server - Exposes specific tools (e.g., list_files, read_file, create_issue)

  • Runs as a separate process or container (isolated from LLM)
  • Implements your actual logic (APIs, CLIs, SDKs, databases)
  • Enforces auth, rate limits, and audit logging before execution

MCP Client - The host application that wants to use tools

  • Connects to one or more MCP servers via TCP/stdio
  • Queries "what tools are available?" via tools/list
  • Invokes tools via tools/call with structured parameters
  • Receives structured results back (JSON, not raw text)

Protocol - Standard operations:

  • tools/list - "Tell me what you can do" → returns tool names, descriptions, input/output schemas
  • tools/call - "Run this tool with these arguments" → returns result or error

Why MCP Solves the Three Problems

1. Standard Interface

All tools speak the same MCP language:

  • Tool descriptions follow a schema (name, parameters, return type)
  • All invocations use tools/call with JSON
  • All results are structured data (JSON, not raw text or variadic output)

So the LLM doesn't need special logic for each tool; it just learns: "when I need to do X, look for a tool with purpose X, and call it with these params."
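
Concretely, a tool description is just structured data. Here's an illustrative definition for a hypothetical scan_host tool, using the name/description/inputSchema fields that MCP tool listings follow:

# Illustrative tools/list entry for a hypothetical scan_host tool.
# Field names (name, description, inputSchema) follow the MCP tool schema.
scan_host_definition = {
    "name": "scan_host",
    "description": "Run an nmap service scan against a single host",
    "inputSchema": {
        "type": "object",
        "properties": {
            "target": {"type": "string", "description": "IPv4 address to scan"},
            "ports": {"type": "string", "description": "Port range, e.g. 1-1000"},
        },
        "required": ["target"],
    },
}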

2. Dynamic Discovery

At startup (or anytime), the MCP client queries each connected server:

Client: "What tools do you have?"
Server: [
  { name: "scan_host", params: ["target", "ports"], ... },
  { name: "parse_results", params: ["scan_data"], ... }
]

The LLM learns about tools at runtime. Different environments can wire different servers. No prompting needed. Add/remove tools on the fly.

3. Security Boundary

MCP servers are separate processes with their own:

  • Authentication - Server validates credentials before accepting calls
  • Authorization - Server checks RBAC policies (user/agent can call certain tools only)
  • Audit logging - Every tools/call is logged for compliance
  • Input validation - Server sanitizes arguments before execution
  • Rate limiting - Server throttles requests per agent/user
  • Isolation - Server runs in a container with restricted filesystem/network

The LLM never sees credentials or infra details; it just says "call tool X with arg Y," and the MCP server handles the rest safely.


Part 4: How the Loop Works - LLM + MCP in Action

Here's a concrete example that shows the entire agentic loop:

Scenario

User: "Find all open SSH ports in 10.0.0.0/24 and tell me which ones allow root login."

Behind the Scenes

  1. LLM receives the request and sees it needs scanning + credential testing
    MCP client has discovered tools: scan_network, check_ssh_auth, summarize_findings
  2. LLM plans the workflow:
    ○ Call scan_network(target="10.0.0.0/24", ports="22")
    ○ For each responsive host, call check_ssh_auth(host, user="root")
    ○ Aggregate results into summarize_findings(...)
  3. First tool call: LLM decides → MCP client routes it to the appropriate server:
    tools/call {
      tool: "scan_network",
      arguments: { target: "10.0.0.0/24", ports: "22" }
    }
  4. MCP server executes:
    ○ Validates authentication (agent has permission to scan)
    ○ Runs actual nmap inside a container
    ○ Parses results into structured JSON
    ○ Returns: { responsive_hosts: ["10.0.0.5", "10.0.0.17", ...], banners: {...} }
    ○ Logs: "Agent X called scan_network with args Y at timestamp Z"
  5. LLM observes results: Context now includes scan data. LLM continues with credential checks
  6. Repeat for each host, then call summarize_findings(...)
  7. Final output: LLM produces a human-readable report with findings and recommendations

Key insight: At no point does the LLM see credentials, execute shell commands, or access the network directly. Everything is mediated through MCP servers you control. This is the security magic of MCP.
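
To make the loop concrete, here is a minimal sketch of the control flow. The llm and mcp_client objects are placeholders for whatever runtime you use; the point is the shape: discover tools, let the model decide, execute via MCP, feed results back.

# Minimal agentic-loop sketch; `llm` and `mcp_client` are hypothetical objects
# with the small interfaces shown in the comments.
def agent_loop(llm, mcp_client, user_request: str, max_steps: int = 10) -> str:
    tools = mcp_client.list_tools()               # tools/list: discover capabilities
    context = [{"role": "user", "content": user_request}]

    for _ in range(max_steps):
        decision = llm.next_step(context, tools)  # model plans: answer, or call a tool?
        if decision.kind == "final_answer":
            return decision.text                  # human-readable report
        result = mcp_client.call_tool(            # tools/call: the MCP server validates,
            decision.tool_name,                   # executes, and logs; the LLM never
            decision.arguments,                   # touches credentials or the network
        )
        context.append({"role": "tool", "content": result})

    return "Stopped: step budget exhausted before a final answer."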


Part 5: MCP + Docker - Practical Deployment

MCP servers can run anywhere, but Docker is ideal because it:

  • Isolates dependencies - Each server has its own Python, Node, CLI tools
  • Enforces resource limits - Cap CPU, memory, disk per server
  • Simplifies distribution - One image works on Linux/macOS/Windows/cloud
  • Enables Docker MCP Catalog - Prebuilt, signed, audited server images

What's the Docker MCP Catalog?

Docker, Anthropic, and others maintain a curated registry of MCP servers under the mcp/ namespace on Docker Hub.

Examples:

  • mcp/filesystem - Secure file operations (list, read, write within allowed paths)
  • mcp/git - Inspect Git repositories (branches, commits, diffs)
  • mcp/github - Interact with GitHub (list issues, create PRs, read repos)
  • mcp/postgres - Query PostgreSQL databases (with prepared statements)
  • mcp/elasticsearch - Search and analyze logs

Each image includes:

  • Tool definitions (schemas, descriptions, examples)
  • Audit logs / compliance guarantees
  • Version history and security updates

Part 6: Hands-On Demo - Setting Up an MCP Server with Docker

Let's walk through a concrete, minimal example that gets you running MCP quickly.

Step 1: Run a Prebuilt MCP Server from the Catalog

Start with the filesystem server as a safe, instructive example:

docker run --rm \
  -e MCP_ALLOWED_PATHS=/workspace \
  -p 3000:3000 \
  mcp/filesystem:latest

What this does:

  • Pulls the official mcp/filesystem image from Docker Hub
  • Grants the server access only to /workspace (sandboxed)
  • Exposes MCP server on localhost:3000
  • --rm cleans up the container when it exits

Inside the container, the MCP server starts and is ready to accept MCP client connections. It can expose tools like:

  • list_directory(path) - List files under /workspace
  • read_file(path) - Read a file's contents
  • write_file(path, contents) - Create/update a file
  • search_files(pattern) - Search for files matching a pattern

Step 2: Configure an MCP Client

On the "LLM side," you need an application that speaks MCP. The easiest is Claude Desktop:

Configuration file: ~/.claude/claude_desktop_config.json

Add entry for the MCP server:

{
  "mcpServers": {
    "filesystem": {
      "transport": "tcp",
      "host": "localhost",
      "port": 3000
    }
  }
}

Restart Claude Desktop, and it will:

  • Connect to localhost:3000
  • Query tools/list and learn about list_directory, read_file, etc.
  • Make those tools available to Claude in the chat UI

Step 3: Try a Task

Once Claude Desktop (or your app) is connected:

You: "Audit the /workspace/config directory and tell me if there are any hardcoded credentials in the files."

Claude (LLM):

  1. Calls list_directory("/workspace/config")
  2. Sees files like database.conf, api.cfg, .env
  3. For each file, calls read_file(file_path)
  4. Scans content for patterns like password=, API_KEY=, secret:
  5. Generates a report with findings and recommendations

Result: You get an automated security audit with a natural-language report, all without the LLM touching your system directly.


Part 7: Building Your Own MCP Server

After testing prebuilt servers, you can build custom ones. Here's the conceptual structure:

Server Code (Python Example)

# mcp_server.py (conceptual sketch; adapt to your MCP SDK's actual API)
import ipaddress
import subprocess

from mcp.server import Server

server = Server("nmap-scanner")

@server.tool()
def scan_host(target: str, ports: str = "1-1000") -> str:
    """Run an nmap version scan against a single target."""
    # Validate inputs (prevent command injection)
    try:
        ipaddress.ip_address(target)
    except ValueError:
        return f"Error: invalid target {target}"
    if not all(part.isdigit() for part in ports.replace("-", ",").split(",")):
        return f"Error: invalid port specification {ports}"

    # Execute nmap inside the container
    result = subprocess.run(
        ["nmap", "-p", ports, "-sV", target],
        capture_output=True,
        text=True
    )

    # Return the scan output (parse into structured JSON if your client expects it)
    return result.stdout

# Start MCP server (listens on TCP 3000)
server.run()

Dockerfile

FROM python:3.12-slim

WORKDIR /app

# Install nmap and other tools
RUN apt-get update && apt-get install -y nmap && rm -rf /var/lib/apt/lists/*

# Copy server code
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY mcp_server.py .

# Expose MCP port
EXPOSE 3000

CMD ["python", "mcp_server.py"]

Build and Run

# Build
docker build -t my-nmap-mcp-server .

# Run with resource limits
docker run --rm \
  -p 3000:3000 \
  --memory="512m" \
  --cpus="1" \
  my-nmap-mcp-server

# Now connect an MCP client to localhost:3000
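
With the container running, you can register it in the client the same way as in Part 6; the entry name nmap-scanner is arbitrary:

{
  "mcpServers": {
    "nmap-scanner": {
      "transport": "tcp",
      "host": "localhost",
      "port": 3000
    }
  }
}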

Part 8: Composing Multiple MCP Servers

Real-world agents often need multiple tools from different domains. You can compose them easily:

Multi-Server Setup

# Terminal 1: Filesystem server
docker run -p 3000:3000 mcp/filesystem:latest

# Terminal 2: GitHub server
docker run -p 3001:3001 mcp/github:latest

# Terminal 3: PostgreSQL server
docker run -p 3002:3002 \
  -e DATABASE_URL=postgresql://user:password@host/mydb \
  mcp/postgres:latest

Configure Client to Connect to All

Claude Desktop config:

{
  "mcpServers": {
    "filesystem": {
      "transport": "tcp",
      "host": "localhost",
      "port": 3000
    },
    "github": {
      "transport": "tcp",
      "host": "localhost",
      "port": 3001
    },
    "database": {
      "transport": "tcp",
      "host": "localhost",
      "port": 3002
    }
  }
}

Now Claude can orchestrate across all tools:

  • Query the database for vulnerability scan results
  • Read the GitHub repo to get the codebase
  • Write a detailed security report to the filesystem

All orchestrated by the LLM, safely isolated via MCP.


Part 9: Security & Guardrails

Agentic AI is powerful but risky without constraints. Here's how MCP helps:

1. Least Privilege Tool Design

Don't expose raw bash or exec. Instead, expose specific, bounded operations:

Dangerous: ❌

@server.tool()
def run_command(cmd: str):
    return subprocess.run(cmd, shell=True, capture_output=True)

Safe: ✅ 

@server.tool()
def scan_host(target: str, ports: str):
    # Whitelist inputs, run only nmap
    validate_ip(target)
    validate_ports(ports)
    return subprocess.run(["nmap", "-p", ports, target], ...)

2. Audit Logging

Every tools/call should be logged:

@server.middleware()
def audit_log(tool_name, arguments, result, error=None):
    log.info({
        "timestamp": now(),
        "agent_id": context.agent_id,
        "tool": tool_name,
        "args": arguments,
        "status": "failure" if error else "success"
    })

3. Rate Limiting

Prevent tool abuse:

@rate_limit(calls=10, period=60)  # 10 calls per minute
@server.tool()
def expensive_scan(target):
    ...
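
The rate_limit decorator above is pseudocode. A minimal sketch of one, in-memory and per-process only (a shared store such as Redis is needed once you run multiple replicas), could look like this:

import time
from collections import deque
from functools import wraps

def rate_limit(calls: int, period: float):
    """Allow at most `calls` invocations per `period` seconds (per process)."""
    timestamps: deque = deque()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Drop timestamps that have aged out of the window
            while timestamps and now - timestamps[0] > period:
                timestamps.popleft()
            if len(timestamps) >= calls:
                return "Error: rate limit exceeded, try again later"
            timestamps.append(now)
            return func(*args, **kwargs)
        return wrapper
    return decorator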

4. Container Isolation

Run MCP servers with resource limits:

docker run --rm \
  --memory="256m" \
  --cpus="0.5" \
  --read-only \
  --cap-drop=ALL \
  my-mcp-server

5. Input Validation

Sanitize all arguments:

@server.tool()
def query_db(sql: str):
    # Prevent SQL injection
    if not is_safe_sql(sql):
        return "Error: query rejected for safety"
    return db.execute(sql)
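
is_safe_sql is left undefined above. A deliberately conservative sketch might allow only single SELECT statements and reject chaining and write keywords; parameterized queries remain the stronger defense:

import re

# Reject statement chaining, comments, and any write/DDL keywords
_FORBIDDEN = re.compile(
    r";|--|/\*|\b(insert|update|delete|drop|alter|grant|truncate)\b",
    re.IGNORECASE,
)

def is_safe_sql(sql: str) -> bool:
    """Allow only single SELECT statements with no chaining or write keywords."""
    stripped = sql.strip()
    return stripped.lower().startswith("select") and not _FORBIDDEN.search(stripped)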

Part 10: Real-World Patterns for Security Teams

Pattern 1: Vulnerability Triage Agent

MCP servers:

  • vulnerability_scanner - Run Nessus, OpenVAS, or Trivy
  • asset_database - Query CMDB for host metadata
  • threat_intelligence - Look up CVE severity and exploitability
  • ticketing - Create Jira tickets for findings

Workflow:

  1. Agent runs scan → gets list of vulns
  2. Enriches with asset metadata (owner, environment, criticality)
  3. Checks threat intel (is CVE actively exploited?)
  4. Prioritizes and creates tickets with context
  5. Sends Slack notification with summary
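
As a rough sketch, the tool-call sequence for one scan cycle might look like the following, where call stands in for the MCP client's tools/call and every tool and field name is hypothetical:

# Illustrative triage cycle; `call(tool_name, arguments)` is a placeholder
# for the MCP client, and all tool/field names are hypothetical.
def triage_cycle(call, scan_target: str):
    findings = call("vulnerability_scanner", {"target": scan_target})
    tickets = []
    for vuln in findings["vulnerabilities"]:
        asset = call("asset_database", {"host": vuln["host"]})
        intel = call("threat_intelligence", {"cve": vuln["cve"]})
        if intel["actively_exploited"] or asset["criticality"] == "high":
            tickets.append(call("ticketing", {
                "summary": f'{vuln["cve"]} on {vuln["host"]} ({asset["owner"]})',
                "priority": "P1" if intel["actively_exploited"] else "P2",
            }))
    return tickets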

Pattern 2: Log Analysis Agent

MCP servers:

  • siem_query - Query Splunk / ELK for logs
  • alert_api - Fetch alerts from your SOAR
  • ioc_lookup - Check if IPs/domains are known malicious
  • case_management - Create/update incident cases

Workflow:

  1. Agent polls for new alerts
  2. Queries SIEM for related logs
  3. Correlates events to identify attack patterns
  4. Checks IoC databases
  5. Drafts incident summary and escalation recommendation

Pattern 3: Compliance Checker Agent

MCP servers:

  • source_code_repo - Clone/inspect Git repos
  • config_scanner - Parse cloud configs (Terraform, CloudFormation)
  • policy_engine - Check against compliance policies
  • report_generator - Produce audit reports

Workflow:

  1. Agent scans repo for hardcoded secrets
  2. Checks IAM policies for least-privilege violations
  3. Verifies encryption at rest/transit
  4. Generates compliance report with remediation steps

Conclusion

LLMs alone are consultants; agentic AI with MCP makes them autonomous workers.

The journey:

  • LLMs: "Here's how to solve this"
  • Agentic AI: "I'll solve this for you"
  • MCP: The safe, standardized bridge between LLMs and your tools
  • Docker: Makes MCP servers portable, scalable, and isolated

For security teams, this opens new possibilities:

  • Automate repetitive analysis (triage, enrichment, reporting)
  • Reduce time-to-insight by 10-100×
  • Standardize investigation playbooks via agents
  • Scale security operations without hiring proportionally

The key is control: MCP keeps the LLM in a sandbox, your tools safe, and the boundary clear.

Start small (test with filesystem or GitHub servers), understand the pattern, then wire up real security tools. In a few Docker commands and config changes, you'll have an agent doing your team's routine work.

The future of cybersecurity isn't more humans staring at dashboards; it's agents doing the legwork, humans making decisions.


References & Further Reading:

Ready to start? Download Claude Desktop, run your first MCP server, and begin automating security workflows. The future of cybersecurity is autonomous agents; make it happen.
