Agentic AI: From LLMs to Autonomous Agents with MCP and Docker
This blog explains how AI models are now able to use tools and automate tasks that seemed impossible just a couple of years ago.
Introduction
LLMs are impressive but limited. They predict text. They don't run scanners. They can't query databases. They can't create tickets or call APIs. They're powerful pattern-matching engines trapped in a sandbox.
Agentic AI breaks out of that sandbox by wrapping LLMs in a decision loop where they can plan, choose tools, execute actions, and adapt based on results. Instead of "ChatGPT explains how to scan a network," you get "an agent that autonomously runs reconnaissance, triages results, and drafts reports."
But there's a critical problem: LLMs can't directly use tools. This is where MCP (Model Context Protocol) comes in: a standardized interface that safely wires LLMs to the tools and systems they need. Combined with Docker and the MCP Catalog, you can spin up production-grade agentic systems in minutes.
This comprehensive guide explains why direct LLM-to-tool integration fails, how MCP solves it, walks through hands-on demos, and shows you real-world security patterns.
Part 1: LLMs Are Amazing But Isolated
What LLMs Can Do
LLMs excel at:
- Natural language understanding - Parse requests, infer intent, understand context
- Pattern completion - Generate code, write configs, draft reports, complete sequences
- Reasoning over descriptions - Analyze architectural diagrams, vulnerability assessments, compliance policies (all described in text)
- Multi-step planning - Break complex tasks into sequences, identify dependencies
In practice:
"Explain SQL injection and how to test for it"→ LLM provides a detailed tutorial"Write a Python script to find open S3 buckets"→ LLM generates code with explanation"Summarize security implications of this design doc"→ LLM produces analysis with risk matrix
These are useful, but they're all text-in, text-out. The LLM is a consultant; it advises but doesn't execute.
What LLMs Cannot Do (By Design)
Out of the box, LLMs have no ability to:
- Execute commands - Can't run nmap, curl, shell scripts, or other system tools
- Access external data - Can't query databases, APIs, or file systems in real time
- Maintain state - Each request is independent; no memory of previous tool outputs or session context
- Interact with services - Can't authenticate to Jira, GitHub, cloud providers, or internal systems
- Guarantee accuracy - They hallucinate plausible-sounding but false information
ℹ️ Note: This isn't a bug; it's a security feature. You don't want an LLM with direct shell access or cloud credentials baked in.
The Disconnect
Imagine a user says: "Scan 10.0.0.0/24 for open ports and tell me which hosts are running outdated Apache."
An LLM responds: "You should run `nmap -p- -sV 10.0.0.0/24`. Then parse the output looking for 'Apache/2.2' or older versions."
But the LLM can't actually run the scan. It can't see the results. It can't correlate findings with threat intel. It can't be an autonomous agent; it's just a consultant giving instructions.
Part 2: Why LLMs Can't Directly Use Tools
There are three fundamental problems if you naively wire tools directly to an LLM:
Problem 1: No Standard Interface
Every tool is different:
- Nmap outputs XML, text, or JSON; parsing is fragile and inconsistent
- Jira API uses REST with OAuth and complex query syntax requiring authentication
- PostgreSQL requires SQL commands and connection pooling management
- AWS CLI has hundreds of commands with conflicting flag semantics
- Custom scripts have no standard calling convention or parameter format
If you hardcode each tool into the LLM's prompt, you get:
- Brittle, ad-hoc JSON templates for each tool (scales poorly)
- The LLM makes mistakes in formatting (missing quotes, wrong nesting, syntax errors)
- Adding a new tool requires rewriting prompts and retraining LLM intuitions
This doesn't scale. You're forced to maintain brittle glue code for every tool.
Problem 2: No Dynamic Discovery
How does the LLM know what tools exist right now?
You could hardcode a list in the system prompt:
"You have access to these tools: nmap, jira_create_ticket, query_db, get_slack_messages"
But this breaks when:
- You add a new tool (requires prompt engineering and model redeployment)
- You use different tools in different environments (dev vs. staging vs. prod)
- You want to grant different agents different permissions (one can scan, another can only read logs)
- Tools are added/removed at runtime
The LLM has no way to discover capabilities dynamically or adapt to what's available. It's locked into a static prompt.
Problem 3: No Security Boundary
Letting the LLM directly invoke tools is dangerous:
- Credential leakage - API keys, passwords, tokens end up in model context or logs
- No access control - LLM could call destructive tools (delete files, drop databases, terminate instances)
- No audit trail - Hard to track what the model did and why (compliance nightmare)
- No rate limiting - Model could spam API calls, overwhelming services
- Injection attacks - User input flows directly into tool invocation; easy to manipulate the model into calling unintended operations
⚠️ You need a controlled security boundary where you enforce:
- Authentication and authorization
- Logging and audit trails for compliance
- Rate limits and resource caps
- Input validation and sanitization
Part 3: The MCP Solution - A Standardized Protocol
Model Context Protocol (MCP) standardizes how LLM-based applications ("clients") discover and invoke capabilities from tool backends ("servers"). Think of it as a standardized plugin protocol for AI agents.
The Architecture
┌──────────────────────────┐
│ LLM / Agent Runtime │
│ (Claude, your app) │
└────────────┬─────────────┘
│ MCP Client
│ (Discovers tools, invokes them)
│
┌────────┴────────────────────┐
│ │
┌───▼──────────────┐ ┌──────────▼────────┐
│ MCP Server #1 │ │ MCP Server #2 │
│ (Filesystem, │ │ (GitHub, │
│ Git, etc.) │ │ Jira, etc.) │
└──────────────────┘ └───────────────────┘
Key Concepts
MCP Server - Exposes specific tools (e.g., list_files, read_file, create_issue)
- Runs as a separate process or container (isolated from LLM)
- Implements your actual logic (APIs, CLIs, SDKs, databases)
- Enforces auth, rate limits, and audit logging before execution
MCP Client - The host application that wants to use tools
- Connects to one or more MCP servers via TCP/stdio
- Queries "what tools are available?" via
tools/list - Invokes tools via
tools/callwith structured parameters - Receives structured results back (JSON, not raw text)
Protocol - Standard operations:
tools/list- "Tell me what you can do" → returns tool names, descriptions, input/output schemastools/call- "Run this tool with these arguments" → returns result or error
Why MCP Solves the Three Problems
1. Standard Interface
All tools speak the same MCP language:
- Tool descriptions follow a schema (name, parameters, return type)
- All invocations use tools/call with JSON
- All results are structured data (JSON, not raw text or variadic output)
So the LLM doesn't need special logic for each tool; it just learns: "when I need to do X, look for a tool with purpose X, and call it with these params."
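Concretely, a tool description is just structured metadata with a JSON Schema for its inputs. A minimal sketch (scan_host is a hypothetical tool; name, description, and inputSchema are the standard MCP tool fields):

# What a server would advertise for one tool via tools/list.
scan_host_tool = {
    "name": "scan_host",  # hypothetical tool
    "description": "Run a port scan against a single host",
    "inputSchema": {  # arguments described with standard JSON Schema
        "type": "object",
        "properties": {
            "target": {"type": "string", "description": "IP address to scan"},
            "ports": {"type": "string", "description": "Port range, e.g. 1-1000"},
        },
        "required": ["target"],
    },
}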
2. Dynamic Discovery
At startup (or anytime), the MCP client queries each connected server:
Client: "What tools do you have?"
Server: [
{ name: "scan_host", params: ["target", "ports"], ... },
{ name: "parse_results", params: ["scan_data"], ... }
]
The LLM learns about tools at runtime. Different environments can wire different servers. No prompting needed. Add/remove tools on the fly.
3. Security Boundary
MCP servers are separate processes with their own:
- Authentication - Server validates credentials before accepting calls
- Authorization - Server checks RBAC policies (user/agent can call certain tools only)
- Audit logging - Every tools/call is logged for compliance
- Input validation - Server sanitizes arguments before execution
- Rate limiting - Server throttles requests per agent/user
- Isolation - Server runs in a container with restricted filesystem/network
The LLM never sees credentials or infra details; it just says "call tool X with arg Y," and the MCP server handles the rest safely.
Part 4: How the Loop Works - LLM + MCP in Action
Here's a concrete example that shows the entire agentic loop:
Scenario
User: "Find all open SSH ports in 10.0.0.0/24 and tell me which ones allow root login."
Behind the Scenes
- LLM receives the request and sees it needs scanning + credential testing
- MCP client has discovered tools: scan_network, check_ssh_auth, summarize_findings
- LLM plans the workflow:
○ Call scan_network(target="10.0.0.0/24", ports="22")
○ For each responsive host, call check_ssh_auth(host, user="root")
○ Aggregate results into summarize_findings(...)
- First tool call: the LLM decides → the MCP client routes it to the appropriate server:
tools/call { tool: "scan_network", arguments: { target: "10.0.0.0/24", ports: "22" } }
- MCP server executes:
○ Validates authentication (agent has permission to scan)
○ Runs actual nmap inside a container
○ Parses results into structured JSON
○ Returns: { responsive_hosts: ["10.0.0.5", "10.0.0.17", ...], banners: {...} }
○ Logs: "Agent X called scan_network with args Y at timestamp Z"
- LLM observes results: context now includes scan data, and the LLM continues with credential checks
- Repeat for each host, then call summarize_findings(...)
- Final output: LLM produces a human-readable report with findings and recommendations
Key insight: At no point does the LLM see credentials, execute shell commands, or access the network directly. Everything is mediated through MCP servers you control. This is the security magic of MCP.
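Stripped to its essentials, the loop looks like this. A minimal sketch, assuming a hypothetical llm wrapper and mcp_client library (no specific SDK implied):

# Minimal agentic loop sketch. `llm` and `mcp_client` are hypothetical
# placeholders for your model wrapper and MCP client library.
def agent_loop(user_request: str, llm, mcp_client) -> str:
    tools = mcp_client.list_tools()  # dynamic discovery (tools/list)
    context = [{"role": "user", "content": user_request}]

    while True:
        # Ask the model what to do next, given the tools it can see.
        decision = llm.next_step(context, tools)

        if decision.type == "final_answer":  # model is done reasoning
            return decision.text

        # Model chose a tool: route the call through MCP (tools/call).
        result = mcp_client.call_tool(decision.tool_name, decision.arguments)

        # Feed the structured result back so the model can adapt its plan.
        context.append({"role": "tool", "content": result})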
Part 5: MCP + Docker - Practical Deployment
MCP servers can run anywhere, but Docker is ideal because it:
- Isolates dependencies - Each server has its own Python, Node, CLI tools
- Enforces resource limits - Cap CPU, memory, disk per server
- Simplifies distribution - One image works on Linux/macOS/Windows/cloud
- Enables Docker MCP Catalog - Prebuilt, signed, audited server images
What's the Docker MCP Catalog?
Docker, Anthropic, and others maintain a curated registry of MCP servers under the mcp/ namespace on Docker Hub.
Examples:
- mcp/filesystem - Secure file operations (list, read, write within allowed paths)
- mcp/git - Inspect Git repositories (branches, commits, diffs)
- mcp/github - Interact with GitHub (list issues, create PRs, read repos)
- mcp/postgres - Query PostgreSQL databases (with prepared statements)
- mcp/elasticsearch - Search and analyze logs
Each image includes:
- Tool definitions (schemas, descriptions, examples)
- Audit logs / compliance guarantees
- Version history and security updates
Part 6: Hands-On Demo - Setting Up an MCP Server with Docker
Let's walk through a concrete, minimal example that gets you running MCP quickly.
Step 1: Run a Prebuilt MCP Server from the Catalog
Start with the filesystem server as a safe, instructive example:
docker run --rm \
-e MCP_ALLOWED_PATHS=/workspace \
-p 3000:3000 \
mcp/filesystem:latest
What this does:
- Pulls the official mcp/filesystem image from Docker Hub
- Grants the server access only to /workspace (sandboxed)
- Exposes the MCP server on localhost:3000
- --rm cleans up the container when it exits
Inside the container, the MCP server starts and is ready to accept MCP client connections. It can expose tools like:
- list_directory(path) - List files under /workspace
- read_file(path) - Read a file's contents
- write_file(path, contents) - Create/update a file
- search_files(pattern) - Search for files matching a pattern
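If you'd rather smoke-test the server from code than from a chat UI, the official Python SDK (the mcp package on PyPI) handles discovery and invocation. A sketch, assuming the SDK's stdio transport (it launches the container itself with -i, rather than connecting to the TCP port above; the allowed path passed as a container argument is also an assumption; adapt to your server's invocation):

# Programmatic smoke test using the official Python SDK (pip install mcp).
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(
    command="docker",
    args=["run", "-i", "--rm", "mcp/filesystem", "/workspace"],
)

async def main() -> None:
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()  # tools/list
            print("Discovered tools:", [t.name for t in tools.tools])
            result = await session.call_tool(  # tools/call
                "list_directory", arguments={"path": "/workspace"}
            )
            print(result.content)

asyncio.run(main())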
Step 2: Configure an MCP Client
On the "LLM side," you need an application that speaks MCP. The easiest is Claude Desktop:
Configuration file: claude_desktop_config.json (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; on Windows: %APPDATA%\Claude\claude_desktop_config.json)
Add entry for the MCP server:
{
"mcpServers": {
"filesystem": {
"transport": "tcp",
"host": "localhost",
"port": 3000
}
}
}
Restart Claude Desktop, and it will:
- Connect to localhost:3000
- Query tools/list and learn about list_directory, read_file, etc.
- Make those tools available to Claude in the chat UI
Step 3: Try a Task
Once Claude Desktop (or your app) is connected:
You: "Audit the /workspace/config directory and tell me if there are any hardcoded credentials in the files."
Claude (LLM):
- Calls list_directory("/workspace/config")
- Sees files like database.conf, api.cfg, .env
- For each file, calls read_file(file_path)
- Scans content for patterns like password=, API_KEY=, secret:
- Generates a report with findings and recommendations
Result: You get an automated security audit with a natural-language report, all without the LLM touching your system directly.
Part 7: Building Your Own MCP Server
After testing prebuilt servers, you can build custom ones. Here's the conceptual structure:
Server Code (Python Example)
# mcp_server.py
from mcp.server import Server
import subprocess

server = Server("nmap-scanner")

@server.tool()
def scan_host(target: str, ports: str = "1-1000") -> str:
    """Run an nmap scan on a target."""
    # Validate inputs (prevent injection)
    if not is_valid_ip(target):
        return f"Error: invalid target {target}"

    # Execute nmap inside the container
    result = subprocess.run(
        ["nmap", "-p", ports, "-sV", target],
        capture_output=True,
        text=True,
    )

    # Parse and return a structured result
    return parse_nmap_output(result.stdout)

# Start the MCP server (listens on TCP 3000)
server.run()
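The code above leaves is_valid_ip and parse_nmap_output undefined. Minimal illustrative versions might look like this (deliberately strict and naive; real parsing would use nmap's XML output instead of grepping text):

# Naive helper sketches for the server above (illustrative only).
import ipaddress

def is_valid_ip(target: str) -> bool:
    """Accept only a single literal IP address (no hostnames, no CIDR)."""
    try:
        ipaddress.ip_address(target)
        return True
    except ValueError:
        return False

def parse_nmap_output(stdout: str) -> str:
    """Keep only the open-port lines from nmap's plain-text output."""
    open_ports = [
        line for line in stdout.splitlines()
        if "/tcp" in line and "open" in line
    ]
    return "\n".join(open_ports) if open_ports else "No open ports found."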
Dockerfile
FROM python:3.12-slim
WORKDIR /app
# Install nmap and other tools
RUN apt-get update && apt-get install -y nmap && rm -rf /var/lib/apt/lists/*
# Copy server code
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY mcp_server.py .
# Expose MCP port
EXPOSE 3000
CMD ["python", "mcp_server.py"]
Build and Run
# Build
docker build -t my-nmap-mcp-server .
# Run with resource limits
docker run --rm \
-p 3000:3000 \
--memory="512m" \
--cpus="1" \
my-nmap-mcp-server
# Now connect an MCP client to localhost:3000
Part 8: Composing Multiple MCP Servers
Real-world agents often need multiple tools from different domains. You can compose them easily:
Multi-Server Setup
# Terminal 1: Filesystem server
docker run -p 3000:3000 mcp/filesystem:latest
# Terminal 2: GitHub server
docker run -p 3001:3001 mcp/github:latest
# Terminal 3: PostgreSQL server (note: -e must come before the image name)
docker run -p 3002:3002 \
  -e DATABASE_URL=postgresql://user:password@host/mydb \
  mcp/postgres:latest
Configure Client to Connect to All
Claude Desktop config:
{
"mcpServers": {
"filesystem": {
"transport": "tcp",
"host": "localhost",
"port": 3000
},
"github": {
"transport": "tcp",
"host": "localhost",
"port": 3001
},
"database": {
"transport": "tcp",
"host": "localhost",
"port": 3002
}
}
}
Now Claude can orchestrate across all tools:
- Query the database for vulnerability scan results
- Read the GitHub repo to get the codebase
- Write a detailed security report to the filesystem
All orchestrated by the LLM, safely isolated via MCP.
Part 9: Security & Guardrails
Agentic AI is powerful but risky without constraints. Here's how MCP helps:
1. Least Privilege Tool Design
Don't expose raw bash or exec. Instead, expose specific, bounded operations:
Dangerous: ❌
@server.tool()
def run_command(cmd: str):
    return subprocess.run(cmd, shell=True, capture_output=True)
Safe: ✅
@server.tool()
def scan_host(target: str, ports: str):
    # Whitelist inputs, run only nmap
    validate_ip(target)
    validate_ports(ports)
    return subprocess.run(["nmap", "-p", ports, target], ...)
2. Audit Logging
Every tools/call should be logged:
@server.middleware()
def audit_log(tool_name, arguments, result):
    log.info({
        "timestamp": now(),
        "agent_id": context.agent_id,
        "tool": tool_name,
        "args": arguments,
        # assumes the result object carries an error flag
        "status": "success" if not result.is_error else "failure",
    })
3. Rate Limiting
Prevent tool abuse:
@server.tool()
@rate_limit(calls=10, period=60)  # 10 calls per minute
def expensive_scan(target):
    ...
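The @rate_limit decorator above is pseudocode; a minimal sliding-window version could look like this (single-process and not thread-safe; production setups would want per-agent buckets and shared state):

# Minimal sliding-window rate limiter (illustrative only).
import time
from collections import deque
from functools import wraps

def rate_limit(calls: int, period: float):
    def decorator(func):
        timestamps: deque = deque()  # times of recent calls

        @wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Drop call records older than the window.
            while timestamps and now - timestamps[0] > period:
                timestamps.popleft()
            if len(timestamps) >= calls:
                return "Error: rate limit exceeded, try again later"
            timestamps.append(now)
            return func(*args, **kwargs)

        return wrapper
    return decorator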
4. Container Isolation
Run MCP servers with resource limits:
docker run --rm \
--memory="256m" \
--cpus="0.5" \
--read-only \
--cap-drop=ALL \
my-mcp-server
5. Input Validation
Sanitize all arguments:
@server.tool()
def query_db(sql: str):
    # Prevent SQL injection
    if not is_safe_sql(sql):
        return "Error: query rejected for safety"
    return db.execute(sql)
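is_safe_sql is likewise left undefined; a deliberately strict sketch might allow only a single read-only SELECT statement. In practice, prefer parameterized queries or an allowlist of prepared statements over string inspection:

# Strict, naive SQL gate (illustrative; not a substitute for prepared statements).
import re

def is_safe_sql(sql: str) -> bool:
    statement = sql.strip().rstrip(";")
    if ";" in statement:  # reject multi-statement input
        return False
    if re.search(r"(?i)\b(insert|update|delete|drop|alter|grant)\b", statement):
        return False      # reject write/DDL keywords
    return bool(re.match(r"(?i)^select\b", statement))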
Part 10: Real-World Patterns for Security Teams
Pattern 1: Vulnerability Triage Agent
MCP servers:
- vulnerability_scanner - Run Nessus, OpenVAS, or Trivy
- asset_database - Query CMDB for host metadata
- threat_intelligence - Look up CVE severity and exploitability
- ticketing - Create Jira tickets for findings
Workflow:
- Agent runs scan → gets list of vulns
- Enriches with asset metadata (owner, environment, criticality)
- Checks threat intel (is CVE actively exploited?)
- Prioritizes and creates tickets with context
- Sends Slack notification with summary
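A sketch of how an agent runtime might drive this pattern end to end (the tool names match the servers listed above, but the mcp client object and all field names are hypothetical):

# Hypothetical triage loop: enrich -> prioritize -> act.
def triage(mcp) -> None:
    findings = mcp.call_tool("vulnerability_scanner", {"profile": "full"})

    for vuln in findings["vulns"]:
        asset = mcp.call_tool("asset_database", {"host": vuln["host"]})
        intel = mcp.call_tool("threat_intelligence", {"cve": vuln["cve"]})

        # Simple prioritization: actively exploited + production = urgent.
        urgent = intel["actively_exploited"] and asset["environment"] == "prod"

        mcp.call_tool("ticketing", {
            "title": f'{vuln["cve"]} on {vuln["host"]}',
            "priority": "P1" if urgent else "P3",
            "body": f'Owner: {asset["owner"]}, severity: {intel["severity"]}',
        })
        # (a Slack notification step would follow in the full workflow)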
Pattern 2: Log Analysis Agent
MCP servers:
- siem_query - Query Splunk / ELK for logs
- alert_api - Fetch alerts from your SOAR
- ioc_lookup - Check if IPs/domains are known malicious
- case_management - Create/update incident cases
Workflow:
- Agent polls for new alerts
- Queries SIEM for related logs
- Correlates events to identify attack patterns
- Checks IoC databases
- Drafts incident summary and escalation recommendation
Pattern 3: Compliance Checker Agent
MCP servers:
- source_code_repo - Clone/inspect Git repos
- config_scanner - Parse cloud configs (Terraform, CloudFormation)
- policy_engine - Check against compliance policies
- report_generator - Produce audit reports
Workflow:
- Agent scans repo for hardcoded secrets
- Checks IAM policies for least-privilege violations
- Verifies encryption at rest/transit
- Generates compliance report with remediation steps
Conclusion
LLMs alone are consultants; agentic AI with MCP makes them autonomous workers.
The journey:
- LLMs: "Here's how to solve this"
- Agentic AI: "I can help you solve this"
- MCP: The safe, standardized bridge between LLMs and your tools
- Docker: Makes MCP servers portable, scalable, and isolated
For security teams, this opens new possibilities:
- Automate repetitive analysis (triage, enrichment, reporting)
- Reduce time-to-insight by 10-100x
- Standardize investigation playbooks via agents
- Scale security operations without hiring proportionally
The key is control: MCP keeps the LLM in a sandbox, your tools safe, and the boundary clear.
Start small (test with filesystem or GitHub servers), understand the pattern, then wire up real security tools. In a few Docker commands and config changes, you'll have an agent doing your team's routine work.
The future of cybersecurity isn't more humans staring at dashboards; it's agents doing the legwork while humans make the decisions.
References & Further Reading:
- MCP Official Docs: https://modelcontextprotocol.io
- Docker MCP Catalog: https://hub.docker.com/r/mcp (search the mcp/ namespace)
- Claude Desktop Setup: https://claude.ai/download
- Network Chuck's MCP Tutorial: https://youtu.be/GuTcle5edjk
- Building MCP Servers: https://github.com/modelcontextprotocol/servers (reference implementations)
Ready to start? Download Claude Desktop, run your first MCP server, and begin automating security workflows. The future of cybersecurity is autonomous agents-make it happen.