
SRE Agent

A multi-agent system for Site Reliability Engineering (SRE) operations with an interactive chat interface. It lets engineers query infrastructure, Kubernetes clusters, and monitoring systems in natural language.

📋 Summary

This project is a multi-agent AI system that provides a conversational interface for SRE operations. It consists of:

  • SRE Agent: A main orchestrating agent that intelligently routes queries to specialized agents
  • Kubectl Agent: Handles Kubernetes-related queries through the kubectl-ai MCP server (pods, deployments, services, logs, etc.)
  • Prometheus Agent: Manages metrics and monitoring queries (HTTP request metrics, error rates, performance data)
  • Web Interface: A modern React-based chat UI for interacting with the agents

The system is built with LangGraph for agent orchestration and supports multiple LLM providers. It provides real-time insight into infrastructure health and application performance, and helps correlate issues across different systems.
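The routing idea described above can be pictured with a small sketch. This is not the project's implementation (the real SRE Agent may well let the LLM choose which agents to call); the agent names, keyword lists, and the `routeQuery` function are all hypothetical and exist only to illustrate mapping a query to one or more specialized agents.

```typescript
// Hypothetical agent identifiers, chosen for this illustration only.
type AgentName = "kubectl" | "prometheus";

// Illustrative keyword lists; these are assumptions, not taken from the project source.
const AGENT_KEYWORDS: Record<AgentName, string[]> = {
  kubectl: ["pod", "deployment", "service", "namespace", "logs", "version"],
  prometheus: ["metric", "error rate", "request rate", "success", "failure"],
};

// Return every agent whose keywords appear in the query (case-insensitive).
// An empty array means no specialized agent matched.
function routeQuery(query: string): AgentName[] {
  const text = query.toLowerCase();
  return (Object.keys(AGENT_KEYWORDS) as AgentName[]).filter((agent) =>
    AGENT_KEYWORDS[agent].some((keyword) => text.includes(keyword)),
  );
}

// A query that mentions both pods and error rates matches both agents.
const matchedAgents = routeQuery("Is the high error rate related to pod failures?");
```

A keyword match is the simplest possible router; it is shown here only because it makes the "one query, several agents" behaviour easy to see.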

Key Features

  • 🤖 Intelligent Query Routing: Automatically determines which specialized agents to invoke based on user queries
  • 🔍 Multi-Agent Coordination: Correlates data from multiple sources (Kubernetes, Prometheus) for unified insights
  • 📊 Rich Visualizations: Displays metrics, pod status, and error rates in formatted tables and summaries
  • 🌐 Web UI: Modern, responsive chat interface for seamless interaction
  • 🔄 Graceful Degradation: Falls back to mock data when external services are unavailable
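The graceful-degradation behaviour listed above can be sketched as a small wrapper. This is a minimal illustration under assumptions: `withMockFallback` and the `Sourced` type are hypothetical names, and the project's own fallback logic may be structured differently.

```typescript
// Result tagged with its origin so callers can tell live data from mock data.
type Sourced<T> = { data: T; source: "live" | "mock" };

// Try the live fetcher first; if it throws or rejects, return the mock value instead.
async function withMockFallback<T>(
  fetchLive: () => Promise<T>,
  mock: T,
): Promise<Sourced<T>> {
  try {
    return { data: await fetchLive(), source: "live" };
  } catch {
    // Any failure (network error, timeout, bad response) degrades to mock data.
    return { data: mock, source: "mock" };
  }
}
```

Tagging the result with its source is a design choice of this sketch: it lets a UI label mock data clearly so that it is not mistaken for live cluster state.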

🚀 Getting Started

Prerequisites

  • Node.js 20 or higher
  • npm 11.2.1 or higher
  • Access to a Kubernetes cluster (optional, for kubectl agent)
  • Prometheus endpoint (optional, for metrics queries)

Initialization Steps

1. Clone the Repository

git clone <repository-url>
cd SRE-Agent

2. Install Dependencies

Install all project dependencies (this will install dependencies for both the root workspace and all apps):

npm install

3. Environment Configuration

Create a .env file in the project root with the following configuration:

# Google Gemini API Key (required)
GOOGLE_API_KEY=your-gemini-api-key-here

# Kubectl-AI MCP Server Configuration (optional)
# Note: Default port in start-kubectl-ai-mcp.sh is 8180
KUBECTL_AI_MCP_ENDPOINT=http://localhost:8180
KUBECTL_AI_API_KEY=your_api_key_here

# Prometheus Configuration (optional)
# Use localhost:9090 when using port forwarding
PROMETHEUS_ENDPOINT=http://localhost:9090
PROMETHEUS_API_KEY=your_prometheus_api_key

# Default LLM Model (optional)
DEFAULT_MODEL=google/gemini-2.0-flash-lite

# Proxy Configuration (if behind corporate firewall)
# HTTP_PROXY=http://your-proxy-server:port
# HTTPS_PROXY=http://your-proxy-server:port
# NO_PROXY=localhost,127.0.0.1,.local,.internal

Note:

  • Refer to the .env.example file for the full configuration. Copy its contents to .env and update the values for your environment
  • See KUBECTL_AI_SETUP.md for kubectl-ai MCP server setup and Prometheus port forwarding instructions
  • See PROXY_SETUP.md for detailed proxy configuration if you're behind a corporate firewall
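To show how the variables above fit together, here is a minimal sketch of a loader that treats GOOGLE_API_KEY as required and the rest as optional. The `AgentConfig` shape and `loadConfig` function are hypothetical, and the fallback values simply reuse the example values from the .env listing; whether the project applies the same defaults is an assumption.

```typescript
// Illustrative config shape; the field names are assumptions for this sketch.
interface AgentConfig {
  googleApiKey: string;
  kubectlAiMcpEndpoint: string;
  prometheusEndpoint: string;
  defaultModel: string;
}

// Environment-style key/value pairs (the same shape as process.env in Node.js).
type Env = Record<string, string | undefined>;

// Build a config object, failing fast when the required key is missing.
function loadConfig(env: Env): AgentConfig {
  const googleApiKey = env.GOOGLE_API_KEY;
  if (!googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required");
  }
  return {
    googleApiKey,
    // Assumed fallbacks: the example values from the .env listing.
    kubectlAiMcpEndpoint: env.KUBECTL_AI_MCP_ENDPOINT ?? "http://localhost:8180",
    prometheusEndpoint: env.PROMETHEUS_ENDPOINT ?? "http://localhost:9090",
    defaultModel: env.DEFAULT_MODEL ?? "google/gemini-2.0-flash-lite",
  };
}
```

In a Node.js process this could be called as `loadConfig(process.env)` once the .env file has been loaded.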

4. Start the Development Servers

You have two options to run the application:

Option A: Run both Agent and Web together (Recommended)

npm run dev

This command starts both:

  • Agent Server: Runs on port 2024 (default)
  • Web UI: Runs on port 5173 (default Vite port)

Option B: Run with simplified agent setup

npm run dev:simple

This uses a simplified agent configuration without full LangGraph Studio integration.

5. Access the Application

Once the servers are running:

  • Web UI: Open your browser and navigate to http://localhost:5173
  • Agent API: Available at http://localhost:2024

Starting Components Individually

If you need to start components separately:

Start Agent Server Only

cd apps/agents
npm run dev

Or with simplified mode:

cd apps/agents
npm run dev:simple

Start Web UI Only

cd apps/web
npm run dev

The web UI will be available at http://localhost:5173.

Using the Application

Before starting the agent, ensure:

  1. kubectl-ai MCP Server is running (if using Kubernetes features):

    ./scripts/start-kubectl-ai-mcp.sh
  2. Prometheus port forwarding is active (if using metrics features):

    ./scripts/accessP8sViaPortFwd.sh
    # Or run in background: nohup ./scripts/accessP8sViaPortFwd.sh &

See KUBECTL_AI_SETUP.md for detailed setup instructions.

Then:

  1. Access the Web UI: Open http://localhost:5173 in your browser
  2. Configure Connection: Enter the following:
    • Deployment URL: http://localhost:2024 (for local development)
    • Assistant/Graph ID: sre_agent
    • LangSmith API Key: (Optional, only required for deployed servers)
  3. Start Chatting: Begin asking questions about your infrastructure!

Example Queries

Kubernetes Queries:

  • "What's the status of the user-service deployment?"
  • "Show me pods with high CPU usage"
  • "List all pods in the default namespace"
  • "What's the version of metrics-demo microservice?"

Prometheus Queries:

  • "What's the current HTTP request rate for the API?"
  • "Show me the error rate for metrics-demo in the last hour"
  • "What's the success/failure rate for my application?"
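Questions like these ultimately become PromQL sent to Prometheus. The sketch below builds a URL for the Prometheus instant-query endpoint (`/api/v1/query`). The helper names are hypothetical, and the metric name `http_requests_total` and the `job` and `status` labels are assumptions; the metrics actually exposed by metrics-demo may be named differently.

```typescript
// Build a URL for the Prometheus instant-query endpoint (GET /api/v1/query).
function buildInstantQueryUrl(endpoint: string, promql: string): string {
  const base = endpoint.replace(/\/+$/, ""); // drop any trailing slashes
  return `${base}/api/v1/query?query=${encodeURIComponent(promql)}`;
}

// Hypothetical PromQL: share of 5xx responses over the last hour for one job.
// Metric and label names are assumptions, not taken from the project.
function errorRatioQuery(job: string): string {
  const errors = `sum(rate(http_requests_total{job="${job}",status=~"5.."}[1h]))`;
  const total = `sum(rate(http_requests_total{job="${job}"}[1h]))`;
  return `${errors} / ${total}`;
}

// Example: query the port-forwarded Prometheus from the .env configuration.
const exampleUrl = buildInstantQueryUrl(
  "http://localhost:9090",
  errorRatioQuery("metrics-demo"),
);
```

The resulting URL can be requested with any HTTP client; the PromQL expression is percent-encoded because it contains braces, quotes, and spaces.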

Multi-Agent Queries:

  • "Check ms version, pod health, success/failure metrics for metrics-demo ms in default namespace"
  • "Is the high error rate related to pod failures?"

📸 Sample Run Outputs

Example Query and Response

Input Query - Example #1:

check ms version, pod health, success/failure metrics for metrics-demo ms in default namespace

Response output #1: [screenshot: Sample Output]

Input Query - Example #2:

perform same check again

Response output #2: [screenshot: Sample Output 2]

The agent provides a comprehensive analysis including:

  1. 📊 Summary Section
  2. 📋 Detailed Analysis
  3. 🎯 Recommendations
  4. 🔍 Suggested Next Steps

See SampleOutput.md for the complete formatted output example.

🏗️ Project Structure

SRE-Agent/
├── apps/
│   ├── agents/          # LangGraph agents (SRE Agent, Research Agent)
│   │   └── src/
│   │       ├── sre-agent/    # Main SRE agent with kubectl & prometheus tools
│   │       └── research-agent/ # Research agent for document retrieval
│   └── web/             # React-based web UI
│       └── src/
│           ├── components/   # UI components
│           ├── providers/   # LangGraph client providers
│           └── hooks/        # React hooks
├── langgraph.json       # LangGraph configuration
├── package.json         # Root workspace configuration
└── .env                 # Environment variables (create this)

📚 Additional Documentation

  • Agent Documentation: See apps/agents/src/sre-agent/README.md for detailed SRE agent architecture
  • Web UI Documentation: See apps/web/README.md for web interface details
  • kubectl-ai Setup: See KUBECTL_AI_SETUP.md for kubectl-ai MCP server setup and configuration
  • Proxy Setup: See PROXY_SETUP.md for corporate firewall configuration
  • Sample Output: See SampleOutput.md for a complete example response

🔧 Development

Building the Project

npm run build

Linting

npm run lint
npm run lint:fix  # Auto-fix linting issues

Formatting

npm run format

🐛 Troubleshooting

Common Issues

  1. Agent Server Won't Start

    • Verify that the .env file exists and contains the required API keys
    • Check that port 2024 is not already in use
    • Review the logs in apps/agents/logs/sre-agent.log
  2. Web UI Can't Connect to Agent

    • Ensure the agent server is running on port 2024
    • Check that the Deployment URL in the web UI is http://localhost:2024
    • Verify the CORS configuration if you access the UI from a different origin
  3. MCP Server Connection Failed

    • Verify that the kubectl-ai MCP server is running
    • Check KUBECTL_AI_MCP_ENDPOINT in .env
    • The agent uses mock data if the MCP server is unavailable
  4. Prometheus Connection Failed

    • Verify that the Prometheus server is accessible
    • Check PROMETHEUS_ENDPOINT in .env
    • If the server requires authentication, check PROMETHEUS_API_KEY in .env

📝 License

Private project - All rights reserved
