Thanos Consulting & Enterprise Support Services
Procedure is a software engineering consultancy based in Mumbai and San Francisco that provides Thanos consulting, implementation, and commercial support for teams scaling Prometheus with long-term storage, high availability, and multi-cluster observability.
Prefer to write first? Contact us
Key Capabilities
Everything you need to build production-grade solutions
Thanos Implementation & Setup
We deploy Thanos alongside your existing Prometheus - Sidecar or Receiver mode, Store Gateway, Compactor, Query, and Query Frontend. Object storage integration with S3, GCS, MinIO, or Azure Blob. Production-ready in weeks.
Multi-cluster Observability Setup
Federate metrics from multiple Prometheus instances into a single Thanos Query endpoint. We configure cross-cluster discovery, deduplication, tenant isolation, and unified Grafana dashboards for global infrastructure visibility.
Migration & Storage Optimization
Moving from expensive metric storage or a single overloaded Prometheus? We handle data migration to object storage, configure retention and downsampling policies, and tune Compactor settings for cost-efficient long-term storage.
Thanos Commercial Support & SLA
Production incidents don't wait for business hours. We provide enterprise support - incident response within SLA, managed upgrades, Compactor health monitoring, capacity planning, and security patching for your Thanos deployment.
Who We Work With
Teams Outgrowing Prometheus Storage
Engineering teams hitting Prometheus retention limits who need long-term metric storage without re-architecting their entire monitoring stack.
Multi-cluster Kubernetes Operations
Organizations running multiple Kubernetes clusters that need a unified PromQL endpoint across all their Prometheus instances.
Teams Needing HA Monitoring
Companies where monitoring downtime means missed incidents, and that need high availability and replica deduplication for their Prometheus setup.
With engineering leadership across India and a presence in San Francisco, we support teams running Thanos at global scale.
Why Engineering Teams Deploy Thanos
Prometheus wasn't built for months of data
Prometheus stores metrics locally with a default 15-day retention. Need six months of data for capacity planning or compliance? You're either burning disk or losing history. Thanos offloads metrics to cheap object storage like S3 or GCS.
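The Prometheus storage documentation sizes local disk as retention time in seconds × ingested samples per second × bytes per sample, and notes that Prometheus averages roughly 1 to 2 bytes per sample. The Python sketch below applies that rule of thumb. The 100,000 samples/s ingestion rate and 2 bytes per sample are assumptions for illustration, not measurements, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def needed_disk_bytes(retention_days: float, samples_per_second: float,
                      bytes_per_sample: float = 2.0) -> float:
    """Rule of thumb from the Prometheus storage docs:
    retention seconds * ingested samples per second * bytes per sample."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


def to_gib(n_bytes: float) -> float:
    """Convert bytes to GiB for display."""
    return n_bytes / 2**30


if __name__ == "__main__":
    rate = 100_000  # assumed ingestion rate in samples per second (hypothetical)
    for days in (15, 180):  # default local retention vs. six months
        print(f"{days:>3} days: ~{to_gib(needed_disk_bytes(days, rate)):,.0f} GiB")
```

Under these assumptions, 180 days needs twelve times the disk of the default 15 days, and every Prometheus replica in an HA pair stores its own copy.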
Multi-cluster visibility is a real problem
Ten Kubernetes clusters means ten separate Prometheus instances with no shared view. Thanos Query federates them into one PromQL endpoint, so your team gets a single pane of glass without building custom glue code or scripts.
Prometheus downtime means monitoring gaps
A single Prometheus server is a single point of failure. While it is restarting or down, every scrape it would have taken is missed. Running Prometheus as an HA pair, each replica with a Thanos Sidecar, and letting Thanos Query deduplicate the replicas gives you high availability without re-architecting your entire stack.
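To show the idea behind replica deduplication, here is a simplified Python sketch: series that are identical except for a replica label collapse into one, and a gap in one replica is filled from the other. The `replica` label name and the `dedup` function are hypothetical. In Thanos, the labels that identify replicas are passed to Thanos Query via `--query.replica-label`, and its real merge logic is more sophisticated than the first-replica-wins rule used here.

```python
from collections import defaultdict

# A series is identified by its label set: a frozenset of (name, value) pairs.
Series = dict[frozenset, list[tuple[int, float]]]


def dedup(series: Series, replica_label: str = "replica") -> Series:
    """Merge series that differ only by the replica label (simplified sketch)."""
    merged: dict[frozenset, dict[int, float]] = defaultdict(dict)
    # Visit replicas in a stable order so the same replica wins on shared timestamps.
    for labels in sorted(series, key=lambda ls: sorted(ls)):
        # Drop the replica label so both replicas map to the same series identity.
        key = frozenset((k, v) for k, v in labels if k != replica_label)
        samples = merged[key]
        for ts, value in series[labels]:
            # The first replica's sample is kept; later replicas only fill gaps.
            samples.setdefault(ts, value)
    return {key: sorted(samples.items()) for key, samples in merged.items()}


if __name__ == "__main__":
    a = frozenset({("__name__", "up"), ("cluster", "eu"), ("replica", "A")})
    b = frozenset({("__name__", "up"), ("cluster", "eu"), ("replica", "B")})
    # Replica A restarted and missed the scrape at t=30s; replica B did not.
    print(dedup({a: [(0, 1.0), (15_000, 1.0)],
                 b: [(0, 1.0), (15_000, 1.0), (30_000, 1.0)]}))
```

The merged series has no replica label and no gap at t=30s, which is the behavior an HA pair is meant to provide.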
Object storage is far cheaper per GB than SSDs
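Storing terabytes of metrics on local SSDs is expensive and doesn't scale. Thanos uses S3, GCS, or Azure Blob as a storage backend, and the Compactor downsamples older data to 5m and 1h resolutions so long-range queries stay fast. Downsampling by itself does not shrink storage, because the downsampled blocks are kept alongside the raw ones. The savings come from cheaper object storage and from per-resolution retention, for example keeping raw data for weeks and 1h data for years.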
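To illustrate what downsampling produces, here is a simplified Python sketch that groups raw samples into 5-minute windows and keeps count, sum, min, and max for each window. The function and field names are hypothetical. Thanos's actual Compactor also tracks a counter aggregate so `rate()` stays correct, and it writes the results as separate downsampled blocks.

```python
from dataclasses import dataclass

FIVE_MINUTES_MS = 5 * 60 * 1000


@dataclass
class Aggregate:
    count: int
    total: float
    minimum: float
    maximum: float


def downsample(samples: list[tuple[int, float]],
               window_ms: int = FIVE_MINUTES_MS) -> dict[int, Aggregate]:
    """Group raw (timestamp_ms, value) samples into fixed windows and aggregate each one."""
    windows: dict[int, Aggregate] = {}
    for ts, value in sorted(samples):
        start = ts - ts % window_ms  # align the sample to its window start
        agg = windows.get(start)
        if agg is None:
            windows[start] = Aggregate(1, value, value, value)
        else:
            agg.count += 1
            agg.total += value
            agg.minimum = min(agg.minimum, value)
            agg.maximum = max(agg.maximum, value)
    return windows


if __name__ == "__main__":
    # Ten minutes of a hypothetical 15s-scraped gauge: 40 raw samples become 2 windows.
    raw = [(i * 15_000, float(i)) for i in range(40)]
    for start, agg in downsample(raw).items():
        print(start, agg, "avg =", agg.total / agg.count)
```

A long-range query can read one aggregate per window instead of every raw sample, which is where the speedup comes from.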
Should You Use Thanos? An Honest Assessment
Thanos is a good fit when
- You're running multiple Prometheus instances and need a unified query layer
- You need metric retention beyond 15-30 days (compliance, capacity planning, trend analysis)
- Your Prometheus storage costs are climbing and you want to offload to object storage
- You need high availability for monitoring - single Prometheus is a SPOF
- You're already on Kubernetes and using the Prometheus Operator
- You want to keep the Prometheus query language (PromQL) across everything
You might not need Thanos when
- You have a single small Prometheus instance with modest retention needs
- Your team is happy with 15-30 days of local retention and doesn't query historical data
- You're already using Grafana Mimir or Cortex for long-term storage
- You want a fully managed solution with zero operational overhead - consider Amazon Managed Service for Prometheus or Grafana Cloud
- Your monitoring stack isn't Prometheus-based (Thanos is built around the Prometheus data model and PromQL)
The common architecture (what production teams actually run)
Prometheus with a Thanos Sidecar in each cluster, the Sidecar uploading TSDB blocks to S3/GCS, Thanos Store Gateway serving historical data back out of the bucket, Compactor handling compaction, downsampling, and retention, and Thanos Query sitting in front of everything for a unified PromQL endpoint. Grafana dashboards point at Thanos Query instead of individual Prometheus instances.
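In this layout, each store component (Sidecars, Store Gateway) advertises the time range it can serve, and Thanos Query only contacts stores whose range overlaps the query. The Python sketch below models just that filtering step. The store names, the `Store` type, and the retention windows are hypothetical, and the real Querier also matches on external labels and merges the returned series.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    name: str
    min_time_ms: int  # oldest data this store can serve
    max_time_ms: int  # newest data this store can serve


def stores_for_range(stores: list[Store], start_ms: int, end_ms: int) -> list[str]:
    """Return the stores whose advertised time range overlaps [start_ms, end_ms]."""
    return [s.name for s in stores
            if s.min_time_ms <= end_ms and s.max_time_ms >= start_ms]


if __name__ == "__main__":
    HOUR = 3_600_000
    DAY = 24 * HOUR
    NOW = 200 * DAY  # arbitrary "current" timestamp for the example
    stores = [
        Store("sidecar-eu", NOW - 15 * DAY, NOW),        # local Prometheus retention
        Store("sidecar-us", NOW - 15 * DAY, NOW),
        Store("store-gateway", 0, NOW - 2 * HOUR),       # uploaded blocks in the bucket
    ]
    print(stores_for_range(stores, NOW - HOUR, NOW))               # recent: Sidecars only
    print(stores_for_range(stores, NOW - 90 * DAY, NOW - 60 * DAY))  # old: Store Gateway only
```

Recent dashboards hit only the Sidecars, while capacity-planning queries over past months are served from object storage through the Store Gateway.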
Our Process
A predictable process built for high-quality delivery
Assessment
We review your current Prometheus setup - how many instances, retention policies, storage costs, query patterns, and pain points. You get a written recommendation on whether Thanos is the right fit, what deployment mode works best (Sidecar vs Receiver), and a rough cost model.
Architecture Design
Thanos component topology, object storage backend selection, retention and downsampling policies, multi-cluster discovery setup, Query Frontend caching strategy, and Grafana integration plan. Documented so your team can review and challenge it.
Implementation
Deploy Thanos components alongside your existing Prometheus instances. Configure object storage, set up Compactor, connect Query and Query Frontend, build unified Grafana dashboards. We work in your infrastructure, with your team, using your CI/CD pipelines.
Knowledge Transfer
Runbooks for each Thanos component, PromQL training for querying across clusters, Compactor troubleshooting guides, capacity planning docs. The goal: your team operates independently after we leave.
Ongoing Support (optional)
We stay on for production support - Compactor health monitoring, storage cost optimization, Thanos version upgrades, capacity planning, and incident response. Engagement model based on your needs.
Thanos Components We Deploy & Support
| Component | What It Does |
|---|---|
| Thanos Sidecar | Ships metrics from Prometheus to object storage, exposes StoreAPI for real-time queries |
| Thanos Receiver | Alternative to Sidecar - accepts remote write from Prometheus, supports multi-tenancy |
| Thanos Store Gateway | Serves historical metrics from object storage (S3, GCS, Azure Blob, MinIO) |
| Thanos Query | Federates queries across Sidecars, Store Gateways, and other Queriers - single PromQL endpoint |
| Thanos Query Frontend | Caching and query splitting layer in front of Query for faster long-range queries |
| Thanos Compactor | Compacts blocks, applies per-resolution retention, and downsamples historical data (5m, 1h) for fast long-range queries |
| Thanos Ruler | Evaluates recording and alerting rules against Thanos Query for global alerts |
| Object Storage | S3, GCS, Azure Blob, MinIO - the actual long-term storage backend |
| Grafana | Unified dashboards pointing at Thanos Query for cross-cluster visualization |
| Prometheus Operator | kube-prometheus-stack with Thanos Sidecar integration via Helm |
Use Cases
Real-world applications we help teams build and scale
Advisory Consulting
Architecture reviews, Thanos assessments, and strategic guidance for long-term storage and multi-cluster observability decisions
Hands-On Implementation
Thanos deployment, migration, and configuration work alongside your engineering team
Ongoing Production Support
Continuous optimization, incident response, upgrades, and scaling as your Thanos deployment grows
Why Choose Procedure for Thanos Consulting Services
Testimonials
Trusted by Engineering Leaders
“What started with one engineer nearly three years ago has grown into a team of five, each fully owning their deliverables. They've taken on critical core roles across teams. We're extremely pleased with the commitment and engagement they bring.”

“We've worked with Procedure across our portfolio, and the experience has been exceptional. They consistently deliver on every promise and adapt quickly to shifting project needs. We wholeheartedly recommend them for anyone seeking a reliable development partner.”

“Procedure has been our partner from inception through rapid growth. Their engineers are exceptionally talented and have proven essential to building out our engineering capacity. The leadership have been thought partners on key engineering decisions. Couldn't recommend them more highly!”

Why Quality Matters
Poor engineering costs you
Storage Cost Spiral
Storing months of metrics on local SSDs is expensive and doesn't scale
Monitoring Blind Spots
Separate Prometheus instances per cluster with no unified view
Single Point of Failure
Prometheus downtime means monitoring gaps and missed incidents
Knowledge Silos
Complex multi-cluster setups that only one person understands
Prometheus Metrics Growing Faster Than Your Retention Budget?
We'll audit your current setup and tell you whether Thanos is the right move - or if something else makes more sense.
Schedule a Call
No sales pitch. Just an honest conversation.
Ready to Discuss Your Thanos Consulting Services Project?
Talk directly with engineers, not sales. We'll assess your monitoring stack and give honest next steps - even if that means you don't need Thanos yet.
Frequently Asked Questions
Thanos is a CNCF Incubating project that adds long-term storage, high availability, and global querying to Prometheus. It works alongside your existing Prometheus instances - you don't replace Prometheus, you extend it. Thanos ships metrics to cheap object storage (S3, GCS, Azure Blob), deduplicates data from HA Prometheus pairs, and gives you a single PromQL endpoint to query metrics across all your clusters.