
Thanos Consulting & Enterprise Support Services

Procedure is a software engineering consultancy based in Mumbai and San Francisco that provides Thanos consulting, implementation, and commercial support for teams scaling Prometheus with long-term storage, high availability, and multi-cluster observability.

5 days: Time to first deployment
3+ years: Average partnership
98%: Client retention

Prefer to write first? Contact us

Free assessment
30-minute call
Talk with engineers, not sales

Trusted by engineering teams at

Aster logo
ESPN logo
KredX logo
MCLabs logo
Pine Labs logo
Setu logo
Tenmeya logo
Timely logo
Treebo logo
Turtlemint logo
Workshop Ventures logo
Last9 logo
Monaire logo

Key Capabilities

Everything you need to build production-grade solutions

Thanos Implementation & Setup

We deploy Thanos alongside your existing Prometheus - Sidecar or Receiver mode, Store Gateway, Compactor, Query, and Query Frontend. Object storage integration with S3, GCS, MinIO, or Azure Blob. Production-ready in weeks.

Multi-cluster Observability Setup

Federate metrics from multiple Prometheus instances into a single Thanos Query endpoint. We configure cross-cluster discovery, deduplication, tenant isolation, and unified Grafana dashboards for global infrastructure visibility.

Migration & Storage Optimization

Moving from expensive metric storage or a single overloaded Prometheus? We handle data migration to object storage, configure retention and downsampling policies, and tune Compactor settings for cost-efficient long-term storage.

Thanos Commercial Support & SLA

Production incidents don't wait for business hours. We provide enterprise support - incident response within SLA, managed upgrades, Compactor health monitoring, capacity planning, and security patching for your Thanos deployment.

Who We Work With

Teams Outgrowing Prometheus Storage

Engineering teams hitting Prometheus retention limits who need long-term metric storage without re-architecting their entire monitoring stack.

Multi-cluster Kubernetes Operations

Organizations running multiple Kubernetes clusters that need a unified PromQL endpoint across all their Prometheus instances.

Teams Needing HA Monitoring

Companies where monitoring downtime means missed incidents - needing high availability and replica deduplication for their Prometheus setup.

With engineering leadership across India and a presence in San Francisco, we support teams running Thanos at global scale.

Why Engineering Teams Deploy Thanos

Prometheus wasn't built for months of data

Prometheus stores metrics locally with a default 15-day retention. Need six months of data for capacity planning or compliance? You're either burning disk or losing history. Thanos offloads metrics to cheap object storage like S3 or GCS.
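To make the "burning disk" trade-off concrete, here is a rough sizing sketch in Python. It uses the rule of thumb from the Prometheus storage documentation (retention time x ingested samples per second x bytes per sample, at roughly 1-2 bytes per sample). The function name and the 100,000 samples/second ingestion rate are illustrative assumptions, not measurements from any real deployment.

```python
GIB = 1024 ** 3


def prometheus_disk_bytes(
    retention_days: float,
    samples_per_second: float,
    bytes_per_sample: float = 2.0,  # assumption: upper end of the 1-2 byte rule of thumb
) -> float:
    """Rough local disk need: retention seconds x ingestion rate x bytes per sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical ingestion rate; substitute your own measured value.
    for days in (15, 180):
        size = prometheus_disk_bytes(days, samples_per_second=100_000)
        print(f"{days:>3} days of retention: about {size / GIB:,.0f} GiB of local disk")
```

Because the estimate is linear in retention, going from 15 days to six months multiplies local disk by twelve, which is the growth that moving older blocks to object storage is meant to absorb.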

Multi-cluster visibility is a real problem

Ten Kubernetes clusters means ten separate Prometheus instances with no shared view. Thanos Query federates them into one PromQL endpoint, so your team gets a single pane of glass without building custom glue code or scripts.

Prometheus downtime means monitoring gaps

A single Prometheus server is a single point of failure. When it restarts or crashes, scrapes are missed and you get gaps in your metrics. Running replica Prometheus instances with Thanos Sidecars, and letting Thanos Query deduplicate the replicas at query time, gives you high availability without re-architecting your entire stack.
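The idea behind replica deduplication can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the algorithm Thanos uses: it assumes both replicas scrape at identical timestamps, whereas real replicas drift and Thanos has to handle that. The function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


def merge_replicas(replicas: List[List[Sample]]) -> List[Sample]:
    """Merge samples from HA replicas, keeping one value per timestamp.

    When several replicas have a sample at the same timestamp, the replica
    listed first wins. Gaps in one replica are filled from the others.
    """
    merged: Dict[int, float] = {}
    for series in replicas:
        for ts, value in series:
            merged.setdefault(ts, value)
    return sorted(merged.items())


# Replica A restarted and missed the scrape at t=30; replica B kept running.
replica_a = [(0, 1.0), (15, 2.0), (45, 4.0)]
replica_b = [(0, 1.0), (15, 2.0), (30, 3.0), (45, 4.0)]
merged = merge_replicas([replica_a, replica_b])
```

The merged series has no gap at t=30 even though one replica was down, which is the property that makes a replicated setup tolerate a single Prometheus restart.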

Object storage is 10-50x cheaper than SSDs

Storing terabytes of metrics on local SSDs is expensive and doesn't scale. Thanos uses S3, GCS, or Azure Blob as a storage backend - with automatic downsampling to keep long-range queries fast while reducing storage cost significantly.

Should You Use Thanos? An Honest Assessment

Thanos is a good fit when

  • You're running multiple Prometheus instances and need a unified query layer
  • You need metric retention beyond 15-30 days (compliance, capacity planning, trend analysis)
  • Your Prometheus storage costs are climbing and you want to offload to object storage
  • You need high availability for monitoring - a single Prometheus is a single point of failure
  • You're already on Kubernetes and using the Prometheus Operator
  • You want to keep the Prometheus query language (PromQL) across everything

You might not need Thanos when

  • You have a single small Prometheus instance with modest retention needs
  • Your team is happy with 15-30 days of local retention and doesn't query historical data
  • You're already using Grafana Mimir or Cortex for long-term storage
  • You want a fully managed solution with zero operational overhead - consider Amazon Managed Prometheus or Grafana Cloud
  • Your monitoring stack isn't Prometheus-based (Thanos only works with Prometheus)

The common architecture (what production teams actually run)

Prometheus with a Thanos Sidecar in each cluster uploading blocks to S3/GCS, Thanos Store Gateway serving that historical data back from object storage, Compactor handling downsampling and retention, and Thanos Query sitting in front of everything as a unified PromQL endpoint. Grafana dashboards point at Thanos Query instead of individual Prometheus instances.
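Because Thanos Query exposes a Prometheus-compatible HTTP API, anything that can talk to Prometheus can talk to the unified endpoint. The Python sketch below builds an instant-query URL with the `dedup` parameter enabled. The hostname and port are hypothetical placeholders, and the helper names are ours, not part of any Thanos client library.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str, dedup: bool = True) -> str:
    """Build an instant-query URL for a Thanos Query endpoint."""
    params = urllib.parse.urlencode(
        {"query": promql, "dedup": "true" if dedup else "false"}
    )
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run an instant query and return the decoded JSON response."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Hypothetical endpoint; replace with the address of your own Thanos Query.
example_url = build_query_url("http://thanos-query.example:9090/", "up")
```

Pointing a Grafana data source at the same base URL follows the same pattern: the dashboards issue ordinary PromQL and Thanos Query fans the request out to the Sidecars and Store Gateways behind it.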

Our Process

A predictable process built for high-quality delivery

01

Assessment

We review your current Prometheus setup - how many instances, retention policies, storage costs, query patterns, and pain points. You get a written recommendation on whether Thanos is the right fit, what deployment mode works best (Sidecar vs Receiver), and a rough cost model.

02

Architecture Design

Thanos component topology, object storage backend selection, retention and downsampling policies, multi-cluster discovery setup, Query Frontend caching strategy, and Grafana integration plan. Documented so your team can review and challenge it.

03

Implementation

Deploy Thanos components alongside your existing Prometheus instances. Configure object storage, set up Compactor, connect Query and Query Frontend, build unified Grafana dashboards. We work in your infrastructure, with your team, using your CI/CD pipelines.

04

Knowledge Transfer

Runbooks for each Thanos component, PromQL training for querying across clusters, Compactor troubleshooting guides, capacity planning docs. The goal: your team operates independently after we leave.

05

Ongoing Support (optional)

We stay on for production support - Compactor health monitoring, storage cost optimization, Thanos version upgrades, capacity planning, and incident response. Engagement model based on your needs.

Thanos Components We Deploy & Support

Component | What It Does
Thanos Sidecar | Ships metrics from Prometheus to object storage, exposes StoreAPI for real-time queries
Thanos Receiver | Alternative to Sidecar - accepts remote write from Prometheus, supports multi-tenancy
Thanos Store Gateway | Serves historical metrics from object storage (S3, GCS, Azure Blob, MinIO)
Thanos Query | Federates queries across Sidecars, Store Gateways, and other Queriers - single PromQL endpoint
Thanos Query Frontend | Caching and query splitting layer in front of Query for faster long-range queries
Thanos Compactor | Downsamples historical data (5m, 1h) and compacts blocks to reduce storage cost
Thanos Ruler | Evaluates recording and alerting rules against Thanos Query for global alerts
Object Storage | S3, GCS, Azure Blob, MinIO - the actual long-term storage backend
Grafana | Unified dashboards pointing at Thanos Query for cross-cluster visualization
Prometheus Operator | kube-prometheus-stack with Thanos Sidecar integration via Helm
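The 5m downsampling mentioned for the Compactor can be pictured as bucketing raw samples into fixed windows and keeping a handful of aggregates per window. The Python sketch below is a conceptual illustration only: the choice of count, sum, min, and max is our assumption about which aggregates are useful, and the code does not reproduce the Compactor's block format or its handling of counters.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


def downsample(samples: List[Sample], window_seconds: int = 300) -> List[Dict[str, float]]:
    """Group samples into fixed windows and keep simple aggregates per window."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        window_start = ts - ts % window_seconds  # align to the window boundary
        buckets[window_start].append(value)
    return [
        {
            "start": start,
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
        }
        for start, values in sorted(buckets.items())
    ]


raw = [(0, 1.0), (60, 3.0), (299, 2.0), (300, 5.0)]
windows = downsample(raw)  # two windows: [0, 300) and [300, 600)
```

A query spanning months then reads one aggregate row per window instead of every raw sample, which is why downsampled data keeps long-range dashboards responsive.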

Use Cases

Real-world applications we help teams build and scale

01

Advisory Consulting

Architecture reviews, Thanos assessments, and strategic guidance for long-term storage and multi-cluster observability decisions

02

Hands-On Implementation

Thanos deployment, migration, and configuration work alongside your engineering team

03

Ongoing Production Support

Continuous optimization, incident response, upgrades, and scaling as your Thanos deployment grows

Why Choose Procedure for Thanos Consulting Services

Outcomes from recent engagements

Reduced: Storage costs through object storage and downsampling
Unified: Cross-cluster visibility with a single PromQL endpoint
Improved: Monitoring reliability with high availability

Companies choose Procedure because:

Production-grade Thanos operations experience across multiple deployments
Deep Prometheus and Kubernetes infrastructure expertise
Honest assessments - we'll tell you if Thanos isn't the right fit
Knowledge transfer built into every engagement
Experience across SaaS, fintech, and enterprise infrastructure

Testimonials

Trusted by Engineering Leaders

What started with one engineer nearly three years ago has grown into a team of five, each fully owning their deliverables. They've taken on critical core roles across teams. We're extremely pleased with the commitment and engagement they bring.
Shrivatsa Swadi
Director of Engineering · Setu

Why Quality Matters

Poor engineering costs you

Storage Cost Spiral

Storing months of metrics on local SSDs is expensive and doesn't scale

Monitoring Blind Spots

Separate Prometheus instances per cluster with no unified view

Single Point of Failure

Prometheus downtime means monitoring gaps and missed incidents

Knowledge Silos

Complex multi-cluster setups that only one person understands

Premium development is an investment in

Cost-efficient long-term storage
Global cross-cluster visibility
High availability monitoring
Team-wide operational confidence

Prometheus Metrics Growing Faster Than Your Retention Budget?

We'll audit your current setup and tell you whether Thanos is the right move - or if something else makes more sense.

Schedule a Call

No sales pitch. Just an honest conversation.

Ready to Discuss Your Thanos Consulting Services Project?

Talk directly with engineers, not sales. We'll assess your monitoring stack and give honest next steps - even if that means you don't need Thanos yet.


Frequently Asked Questions

Thanos is a CNCF Incubating project that adds long-term storage, high availability, and global querying to Prometheus. It works alongside your existing Prometheus instances - you don't replace Prometheus, you extend it. Thanos ships metrics to cheap object storage (S3, GCS, Azure Blob), deduplicates data from HA Prometheus pairs, and gives you a single PromQL endpoint to query metrics across all your clusters.