☁️ Cloud Engine
The Cloud Engine is Vespera's infrastructure operations brain. It combines AI-generated recommendations, a RAG knowledge base of cloud best practices, compliance guardrails, and direct Terraform CLI validation into a single Discord workflow.
Overview
graph LR
A[User: /cloud-advise] --> B[CloudAIAdvisor]
B --> C[RAG Knowledge Base\n186 best practices]
B --> D[Guardrails\nBudget · Security · Compliance]
B --> E{Model Router}
E -->|Fast| F[Groq Llama 3.3]
E -->|Complex| G[Gemini Pro 1.5]
F & G --> H[JSON Recommendation]
H --> I[Discord Embed]
AI Model Routing
| Task | Model | Reason |
|---|---|---|
| General recommendations | Groq Llama 3.3 | Ultra-fast (~1–2s), sufficient reasoning |
| Complex doc analysis | Gemini Pro 1.5 | 1M+ token context window for log/state files |
| Balanced quality | Groq Mixtral | Trade-off between speed and depth |
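The routing table above can be sketched as a small function. This is a minimal illustration, not Vespera's actual router: the `Model` enum, `route_model`, the `prefer_quality` flag, and the 30,000-token cutoff are all hypothetical, and the enum values are the labels used on this page rather than real API model IDs.

```python
from enum import Enum


class Model(Enum):
    # Labels mirror the routing table; they are not real API model identifiers.
    GROQ_LLAMA = "Groq Llama 3.3"
    GROQ_MIXTRAL = "Groq Mixtral"
    GEMINI_PRO = "Gemini Pro 1.5"


# Assumed cutoff: above this estimated size, prefer the large-context model.
LARGE_CONTEXT_TOKENS = 30_000


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return len(text) // 4


def route_model(prompt: str, attachments: str = "", prefer_quality: bool = False) -> Model:
    """Pick a model following the routing table (hypothetical logic)."""
    if estimate_tokens(prompt + attachments) > LARGE_CONTEXT_TOKENS:
        return Model.GEMINI_PRO    # log/state files need the large context window
    if prefer_quality:
        return Model.GROQ_MIXTRAL  # trade-off between speed and depth
    return Model.GROQ_LLAMA        # fast default for general recommendations
```

Checking input size first means a large state file is never sent to a model that cannot hold it, regardless of the quality preference.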
Key Features
RAG Knowledge Base
- 186 best practices ingested from Markdown across GCP, AWS, and Azure
- Hybrid search: keyword matching + metadata category filtering
- Categories: Security (48), Performance (47), Cost (46), Reliability (45)
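One way to read "keyword matching + metadata category filtering" is: filter on metadata first, then rank the survivors by keyword overlap. The sketch below assumes that reading. The `Practice` record, `hybrid_search`, and the overlap-count scoring are illustrative names and choices, not the project's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Practice:
    title: str
    body: str
    category: str  # "Security", "Performance", "Cost", or "Reliability"
    provider: str  # "GCP", "AWS", or "Azure"


def hybrid_search(practices, query, category=None, provider=None, top_k=3):
    """Filter on metadata, then rank by number of query terms found in the text."""
    # Ignore very short words so "a", "to", "of" do not inflate the score.
    terms = {t for t in query.lower().split() if len(t) > 2}
    scored = []
    for p in practices:
        if category and p.category != category:
            continue
        if provider and p.provider != provider:
            continue
        words = set(f"{p.title} {p.body}".lower().split())
        score = len(terms & words)
        if score:
            scored.append((score, p))
    # Stable sort: ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]
```

A call such as `hybrid_search(docs, "storage encryption", category="Security")` would only ever score entries tagged Security, which keeps the keyword pass cheap.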
Guardrails System
Guardrails run before any AI generation and block or warn on risky requests:
| Guardrail | Action |
|---|---|
| GPU workload + Low budget | ⛔ BLOCKED |
| Healthcare data + Standard security | ⚠️ WARNING |
| HIPAA compliance requested without audit logging | ⛔ BLOCKED |
| Overly complex architecture for beginners | ⚠️ WARNING |
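The guardrail table maps naturally onto a list of rules evaluated before any model call. The sketch below is an assumption about how that could look: the `Request` fields, the `RULES` list, and the "more than 5 components" test for an overly complex architecture are hypothetical stand-ins for whatever signals the real system uses.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BLOCKED = "blocked"
    WARNING = "warning"


@dataclass
class Request:
    use_case: str
    budget: str = "Medium"            # Low | Medium | High
    security_level: str = "Standard"  # Standard | Enhanced | Compliance
    needs_gpu: bool = False
    handles_health_data: bool = False
    hipaa: bool = False
    audit_logging: bool = False
    beginner: bool = False
    component_count: int = 0          # hypothetical complexity signal


# (predicate, action, message); one entry per row of the guardrail table.
RULES = [
    (lambda r: r.needs_gpu and r.budget == "Low",
     Action.BLOCKED, "GPU workload with Low budget"),
    (lambda r: r.handles_health_data and r.security_level == "Standard",
     Action.WARNING, "Healthcare data with Standard security"),
    (lambda r: r.hipaa and not r.audit_logging,
     Action.BLOCKED, "HIPAA compliance without audit logging"),
    (lambda r: r.beginner and r.component_count > 5,  # threshold is an assumption
     Action.WARNING, "Architecture may be too complex for a beginner"),
]


def check_guardrails(req: Request):
    """Return (blocked, findings); generation should be skipped when blocked."""
    findings = [(action, msg) for pred, action, msg in RULES if pred(req)]
    blocked = any(action is Action.BLOCKED for action, _ in findings)
    return blocked, findings
```

Keeping the rules as data makes it easy to add a row without touching the evaluation loop.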
Chain-of-Thought Reasoning
For every recommendation, the model is forced through a structured internal [Reasoning] block before generating output:
- Requirement Analysis
- Constraints Identification
- Best Practice Matching
- Trade-off Analysis (Cost / Simplicity / Security / Scalability)
This prevents the "commit-then-justify" bias where an LLM picks an answer first and rationalizes it second.
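A prompt builder along these lines could enforce the reasoning-first order. The wording of the prompt and the `build_prompt` helper are hypothetical; only the four step names and the [Reasoning] label come from the description above.

```python
REASONING_STEPS = [
    "Requirement Analysis",
    "Constraints Identification",
    "Best Practice Matching",
    "Trade-off Analysis (Cost / Simplicity / Security / Scalability)",
]


def build_prompt(use_case: str, practices: list[str]) -> str:
    """Assemble a prompt that asks for the [Reasoning] block before the answer."""
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(REASONING_STEPS, 1))
    context = "\n".join(f"- {p}" for p in practices) or "- (none retrieved)"
    return (
        "You are a cloud infrastructure advisor.\n\n"
        f"Use case: {use_case}\n\n"
        f"Relevant best practices:\n{context}\n\n"
        "First write a [Reasoning] block that works through these steps in order:\n"
        f"{steps}\n\n"
        "Only after the [Reasoning] block, output the final recommendation as JSON."
    )
```

Placing the instruction to reason ahead of the output instruction is what pushes the model to commit to an answer only after the trade-offs are written down.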
Terraform Validation Pipeline
graph LR
A[Session Created] --> B[terraform fmt]
B --> C[terraform init]
C --> D[terraform validate]
D --> E[terraform plan]
E --> F[terraform show -json]
F --> G[Discord Report]
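The pipeline stages could be driven from Python roughly as follows. The stage order comes from the diagram; the specific flags (`-check`, `-input=false`, `-out`) are standard Terraform CLI options, but whether Vespera passes them is an assumption, as are the `validate_session` name, the `tfplan` file name, and the injectable `run` callback.

```python
import subprocess
from typing import Callable, Sequence

PLAN_FILE = "tfplan"  # hypothetical plan file name

# Stage order follows the diagram; flag choices are assumptions.
STAGES: list[tuple[str, list[str]]] = [
    ("fmt", ["terraform", "fmt", "-check"]),
    ("init", ["terraform", "init", "-input=false"]),
    ("validate", ["terraform", "validate"]),
    ("plan", ["terraform", "plan", "-input=false", f"-out={PLAN_FILE}"]),
    ("show", ["terraform", "show", "-json", PLAN_FILE]),
]


def _run(cmd: Sequence[str], cwd: str) -> tuple[int, str]:
    """Default runner: execute the command, return (exit code, combined output)."""
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def validate_session(
    workdir: str,
    run: Callable[[Sequence[str], str], tuple[int, str]] = _run,
) -> list[dict]:
    """Run each stage in order and stop at the first failure."""
    report = []
    for name, cmd in STAGES:
        code, output = run(cmd, workdir)
        report.append({"stage": name, "ok": code == 0, "output": output})
        if code != 0:
            break  # later stages depend on earlier ones succeeding
    return report
```

Passing the runner as a parameter lets the sequence be exercised without Terraform installed; the returned list can then be rendered into the Discord report.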
Discord Commands
/cloud-advise
Get AI-powered cloud infrastructure recommendations.
Parameters:
| Parameter | Description / Options |
|---|---|
| use_case (required) | What are you building? |
| provider | GCP, AWS, Azure, or Any (compare all) |
| budget | Low (<$100/mo), Medium, or High (>$1000/mo) |
| security_level | Standard, Enhanced, or Compliance (HIPAA/GDPR/PCI) |
| ai_model | Groq Llama 3.3, Groq Mixtral, Gemini Pro, or Gemini Flash |
/cloud-validate
Runs the full Terraform validation pipeline against a session's generated configuration.
/cloud-deploy
Generates Terraform code from an advisory session and creates a deployment session ID.
Technical Highlight — Lazy Glossary Injection
~90% Token Reduction
Rather than injecting the entire Cloud or D&D glossary into every prompt, the system scans the user's input for known keywords first. Only terms that appear in the input are injected into the prompt.
- Input: "I need a Kubernetes cluster for my microservices"
- Injected: only the Kubernetes and microservices glossary entries
- Result: token consumption reduced ~90%, with higher accuracy (no irrelevant noise in context)
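The keyword scan described above can be sketched in a few lines. The `GLOSSARY` contents and both function names are hypothetical, and this version only matches single-word terms; multi-word glossary entries would need phrase matching.

```python
import re

# Hypothetical glossary; the real entries are not shown in this document.
GLOSSARY = {
    "kubernetes": "Container orchestration platform for running workloads at scale.",
    "microservices": "Architecture style built from small, independently deployed services.",
    "vpc": "Logically isolated virtual network inside a cloud provider.",
    "terraform": "Infrastructure-as-code tool driven by declarative configuration.",
}


def select_glossary(user_input: str, glossary: dict[str, str]) -> dict[str, str]:
    """Return only the entries whose term appears as a whole word in the input."""
    words = set(re.findall(r"[a-z0-9&]+", user_input.lower()))
    return {term: text for term, text in glossary.items() if term in words}


def build_glossary_block(user_input: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Format the selected entries for inclusion in the prompt."""
    selected = select_glossary(user_input, glossary)
    return "\n".join(f"{term}: {text}" for term, text in selected.items())
```

For the example input, only the `kubernetes` and `microservices` entries are selected; the `vpc` and `terraform` definitions never reach the prompt.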
Performance Metrics
| Model | Avg Response | Notes |
|---|---|---|
| Groq Llama 3.3 | 1–2s | Recommended default |
| Groq Mixtral | 2–3s | Higher quality |
| Gemini Pro | 3–5s | Large context tasks |
| Gemini Flash | 1–3s | Fast Google alternative |

| Operation | Time |
|---|---|
| RAG hybrid search | < 50ms |
| Terraform fmt | < 1s |
| Terraform init (cached) | 1–3s |
| Terraform plan | 3–10s |