In 2023, our cloud bill hit a number that finally got the CFO's attention. Not because it was unexpected — engineering had been warning about the trajectory for two quarters — but because it had crossed a threshold that made it visible in board-level reporting. The conversation that followed was uncomfortable, but it led to the most productive cost reduction effort I've been part of: a six-month FinOps program that cut our AWS spend by 34% without a single feature being delayed.
This is what I learned from that program, updated with what's changed in 2026. I'll be specific about the tactics that worked, the ones that sounded good but didn't, and the organizational dynamics that determine whether FinOps succeeds or stalls at the planning stage.
1. What FinOps Actually Is (and Isn't)
FinOps is the practice of bringing financial accountability to the variable spend model of cloud infrastructure. The FinOps Foundation — the nonprofit that governs the practice — defines it as "an operational framework and cultural practice which maximizes the business value of cloud, enables timely data-driven decision making, and creates financial accountability through collaboration between engineering, finance, and business teams."
Note what that definition doesn't say: it doesn't say "cut spending." FinOps is not a cost-cutting mandate. It's a visibility and accountability practice. Sometimes FinOps leads you to spend more — in a specific area where underinvestment is causing outages or slow response times that cost more than the compute would. More often, it leads to significant reductions. But the mechanism is always the same: make the cost visible, attribute it to the people making the decisions, and give them the information they need to make better choices.
The distinction matters because "we need to cut cloud costs" and "we need FinOps" require different organizational approaches. Cost-cutting is a project. FinOps is a cultural change. The former has an end date; the latter becomes permanent operating practice.
2. The Three-Stage FinOps Maturity Model
The FinOps Foundation's maturity model (Crawl, Walk, Run) is a useful map of the journey. I've seen organizations try to skip stages and fail; the progression is meaningful.
Crawl: Visibility
At the Crawl stage, the goal is basic visibility. You're answering: what are we spending, on what, and who owns it? This requires tagging infrastructure consistently — every resource tagged with environment, team, service, and cost center. It requires a cost reporting mechanism that non-finance people can access (AWS Cost Explorer, a simple Grafana dashboard, whatever works). And it requires someone responsible for looking at the data regularly.
Most organizations think they're past Crawl when they're not. The tell: can you tell me within five minutes how much a specific team or service spent last month? If not, you're still Crawling. In my experience, 60–70% of organizations claiming to practice FinOps are at the Crawl stage with Walk aspirations.
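The five-minute test above depends on how complete your tagging is. Here is a minimal sketch of a coverage check, assuming resources have already been exported as dictionaries with an `id` and a `tags` mapping. The export shape, the tag key spellings, and the function names are illustrative assumptions, not an AWS API.

```python
# Tag keys mirror the four named in the Crawl section; the spellings are assumptions.
REQUIRED_TAGS = ("environment", "team", "service", "cost_center")


def missing_tags(resource: dict) -> list:
    """Return the required tag keys that are absent or empty on one resource."""
    tags = resource.get("tags") or {}
    return [key for key in REQUIRED_TAGS if not str(tags.get(key) or "").strip()]


def compliance_report(resources: list) -> dict:
    """Summarize tagging coverage across an inventory export."""
    offenders = {}
    for resource in resources:
        gaps = missing_tags(resource)
        if gaps:
            offenders[resource["id"]] = gaps
    total = len(resources)
    compliant = total - len(offenders)
    return {
        "total": total,
        "compliant": compliant,
        "coverage_pct": round(100 * compliant / total, 1) if total else 0.0,
        "offenders": offenders,
    }


# Hypothetical inventory used only to exercise the functions.
inventory = [
    {"id": "i-0a1", "tags": {"environment": "prod", "team": "payments",
                             "service": "api", "cost_center": "cc-101"}},
    {"id": "vol-0b2", "tags": {"environment": "dev", "team": "search"}},
    {"id": "i-0c3", "tags": {}},
]
report = compliance_report(inventory)
print(report["coverage_pct"], report["offenders"])
```

Tracking `coverage_pct` week over week is one simple way to tell whether you are actually leaving Crawl.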
Walk: Optimization
At Walk, you have reliable visibility and you're using it to make decisions. This is where Reserved Instance purchases, right-sizing work, and Savings Plans start happening systematically rather than ad hoc. You have a review cadence — weekly or bi-weekly cost reviews with engineering leads. Anomalies are caught and investigated. Teams receive regular showback reports showing their spending.
Walk is where most cost reduction happens. The low-hanging fruit — idle resources, oversized instances, forgotten dev environments — gets harvested here. In our program, we achieved roughly 20% of our total savings in the Crawl-to-Walk transition, just by finding and terminating resources nobody was using.
Run: Continuous Optimization
At Run, cost optimization is embedded in the engineering workflow. New infrastructure decisions include cost modeling. CI/CD pipelines check for cost policy violations. Engineers have real-time cost visibility in their developer portal. Machine learning models predict cost anomalies before they appear on the bill. Committed use discounts are managed dynamically based on usage forecasts.
Very few organizations sustain Run-level FinOps. It requires continuous investment and organizational commitment. But the compounding returns are real: a Run-maturity organization typically spends 35–50% less on cloud than a comparable organization at Crawl, and their engineering team makes better architecture decisions because cost is a visible signal in their decision-making process.
3. The Main Sources of Cloud Waste
Before you optimize, you need to know where the waste is. Based on industry benchmarks and my own experience, cloud waste breaks down roughly as follows:
Idle and Underutilized Resources (30–40% of waste)
EC2 instances, RDS databases, and load balancers that are running but doing nothing — or doing so little that they could be replaced with something much smaller. This includes development and staging environments that run 24/7 when they're only used during business hours, and QA environments provisioned for a project that ended six months ago.
The metric to track: average CPU utilization across your fleet. Industry surveys consistently show average EC2 CPU utilization below 20% in organizations without FinOps programs. If your average CPU utilization is below 15%, you have significant right-sizing and scheduling opportunities.
Overprovisioning (25–30% of waste)
Engineers provision resources based on peak requirements (or imagined peak requirements) and never revisit. An m5.4xlarge instance (16 vCPU, 64 GiB RAM) handling a workload that peaks at 4 vCPU and 8GB RAM 99% of the time is running on four times the vCPU it needs; an m5.xlarge (4 vCPU, 16 GiB) would cover it. At on-demand list prices in us-east-1 (about $0.768/hour versus $0.192/hour; check current pricing), that's roughly $420/month wasted on a single instance. Multiply by hundreds of instances.
Zombie Resources (10–15% of waste)
Orphaned snapshots, unused Elastic IPs, unattached EBS volumes, forgotten S3 buckets that were never deleted after a project ended. These don't sound significant individually, but they accumulate into substantial monthly spend at scale. I've seen organizations with hundreds of EBS volumes unattached to any instance — paying $0.10/GB/month on storage that nobody needed anymore.
Suboptimal Purchasing (15–20% of waste)
Running on-demand instances for stable, predictable workloads instead of Reserved Instances or Savings Plans. This is arguably the highest-ROI FinOps optimization because it requires no engineering changes, just a procurement decision. A 1-year EC2 Reserved Instance saves around 36% over on-demand (the exact figure varies by instance family and payment option). A 3-year Compute Savings Plan saves up to 66%.
Data Transfer Costs (5–10% of waste, but growing)
Egress costs — paying to move data out of your cloud provider — are one of the most overlooked expense categories. They're also one of the most architectural: reducing egress often requires changing where data is stored or processed, not just flipping a config flag. I'll cover this separately because it's increasingly significant as data volumes grow.
4. EC2/VM Right-Sizing Strategies
Right-sizing is the process of matching instance type to actual workload requirements. It's one of the most reliable cost reduction tactics, but it requires data collection, analysis, and careful coordination with teams to avoid performance regressions.
The workflow I use:
- Collect 2–4 weeks of CloudWatch metrics for all EC2 instances: CPU utilization (avg, p95, p99), memory utilization (requires CloudWatch agent), network I/O, and disk I/O. Do not use averages alone — p99 CPU matters because your instance needs to handle peak load, not just average load.
- Flag candidates: instances where p99 CPU is below 40% and p99 memory is below 60% are likely candidates for right-sizing. Instances where average CPU is below 5% are candidates for termination or scheduling.
- Check for burst capacity: T-series instances use burst credits. An instance that looks idle in 5-minute CloudWatch metrics might be bursting regularly at 1-minute granularity. Check burst credit balance trends before right-sizing T-series instances.
- Propose and validate with the owning team: never right-size without the team's knowledge. Send a proposed change, give them a week to review, and ask them to monitor after the change. The owning team knows about bursty workload patterns that won't show in 2-week CloudWatch data.
- Change in non-production first: validate the right-sized instance in staging before applying to production. A 30-minute load test at production-like traffic is worth the effort.
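The flagging rules in the workflow above can be sketched as a small classifier. This is an illustration under stated assumptions: the metrics are already aggregated per instance, the `InstanceMetrics` shape is hypothetical, and the low-credit cutoff of 10 is a placeholder rather than an AWS recommendation. The percentage thresholds are the ones from the list.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceMetrics:
    instance_id: str
    instance_type: str
    avg_cpu: float   # percent over the 2-4 week window
    p99_cpu: float   # percent
    p99_mem: float   # percent (needs the CloudWatch agent)
    min_cpu_credit_balance: Optional[float] = None  # T-series only


def is_burstable(instance_type: str) -> bool:
    """True for T-series families such as t2, t3, t3a, t4g (but not trn1)."""
    return re.match(r"t\d", instance_type) is not None


def classify(m: InstanceMetrics, low_credit_threshold: float = 10.0) -> str:
    if m.avg_cpu < 5:
        label = "terminate-or-schedule"
    elif m.p99_cpu < 40 and m.p99_mem < 60:
        label = "right-size"
    else:
        return "keep"
    # Burstable instances can look idle while spending credits, so route
    # them to a manual credit-balance review before acting on the label.
    if is_burstable(m.instance_type):
        credits = m.min_cpu_credit_balance
        if credits is None or credits < low_credit_threshold:
            return "check-burst-credits"
    return label


# Hypothetical fleet sample.
fleet = [
    InstanceMetrics("i-1", "m5.2xlarge", 12, 31, 44),
    InstanceMetrics("i-2", "c5.large", 2, 9, 20),
    InstanceMetrics("i-3", "t3.medium", 3, 18, 35, min_cpu_credit_balance=2.0),
    InstanceMetrics("i-4", "r5.xlarge", 35, 88, 71),
]
results = {m.instance_id: classify(m) for m in fleet}
print(results)
```

The output is a candidate list for the owning teams to review, not a change plan. The propose-and-validate steps still apply.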
5. Reserved Instances vs. Savings Plans vs. Spot: The Decision Framework
| Dimension | Reserved Instances | Savings Plans | Spot Instances |
|---|---|---|---|
| Discount vs on-demand | Up to 72% | Up to 66% (Compute), up to 72% (EC2 Instance) | Up to 90% |
| Commitment | 1 or 3 years, specific instance type | 1 or 3 years, $/hour spend commitment | None (interruptible anytime) |
| Flexibility | Low (locked to instance family/region) | High (Compute Savings Plans apply across instance family, size, and region) | High (choose from available capacity) |
| Availability guarantee | Yes | Yes | No (can be interrupted) |
| Best for | Steady-state workloads, specific instance needs | Steady-state workloads, instance flexibility needed | Batch, fault-tolerant, stateless workloads |
| Risk | Stranded commitment if workload changes | Moderate stranded risk, easier to adapt | Workload interruption risk |
| Marketplace | Standard RIs can be resold on the RI Marketplace | Cannot resell | N/A |
My recommended framework for purchasing decisions:
- Start with Compute Savings Plans for your baseline steady-state EC2 spend. The flexibility to change instance families and sizes without losing the discount is worth the slightly lower maximum discount versus RIs.
- Layer Standard Reserved Instances on top for workloads you're confident will run on specific instance types for 3+ years (RDS is the classic example — databases rarely change instance families).
- Use Spot aggressively for batch processing, ML training, CI/CD runners, and any stateless, fault-tolerant compute. If your workload can't tolerate 2-minute interruption notices, it's not a Spot candidate. If it can, Spot savings are substantial.
One mistake I see repeatedly: organizations buy 3-year RIs for compute workloads that are actively being Kubernetes-ified. By the time the RI is half used, the workload moved to EKS and the RI is stranded. Match your purchasing horizon to your architectural stability horizon.
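The stranded-commitment risk has simple arithmetic behind it. A commitment with discount d is paid for the whole term whether or not the workload still exists, so it only beats on-demand if the workload runs for more than (1 - d) of the term. A minimal sketch, assuming a flat discount, 730 hours per month, and an illustrative 40% discount that is not taken from any price list:

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours


def breakeven_months(discount: float, term_months: int) -> float:
    """Months the workload must run before the commitment beats on-demand."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return (1 - discount) * term_months


def commitment_saving(on_demand_hourly: float, discount: float,
                      term_months: int, months_used: float) -> float:
    """Positive result: the commitment saved money. Negative: it was stranded."""
    committed = on_demand_hourly * (1 - discount) * HOURS_PER_MONTH * term_months
    on_demand = on_demand_hourly * HOURS_PER_MONTH * months_used
    return on_demand - committed


# Hypothetical case: 3-year commitment at a 40% discount on a $1.00/hour
# workload that moves to another platform after 18 months.
needed = breakeven_months(0.40, 36)
outcome = commitment_saving(1.00, 0.40, 36, 18)
print(f"break-even at {needed:.1f} months, outcome {outcome:,.0f} USD")
```

In this example the workload needed to live about 21.6 months to break even, so leaving at month 18 costs more than staying on-demand would have.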
6. Kubernetes Cost Optimization
Kubernetes cost optimization deserves its own section because it has unique mechanics that don't apply to plain EC2 workloads.
Resource Requests and Limits
In Kubernetes, the cost of a cluster is determined by the instances running in its node pools, and how many instances you need is determined by pod resource requests, not actual utilization. A pod that requests 4 vCPU but uses 0.2 vCPU occupies 4 vCPU of scheduling capacity, and that capacity can't be given to another pod. Most Kubernetes clusters I've seen have pod resource requests set significantly higher than actual utilization. The fix is mechanical but requires data: use Vertical Pod Autoscaler (VPA) in recommendation mode to get suggested request/limit values, then systematically update deployment configurations.
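To see how much scheduling capacity is tied up by inflated requests, you can compare requested CPU with observed usage. The sketch below is a back-of-the-envelope illustration, not a replacement for VPA. The input shape, the use of p95 usage, and the 20% headroom factor are all assumptions.

```python
def request_slack(pods: list, headroom: float = 1.2) -> list:
    """Estimate reclaimable vCPU per pod from requested vs. observed usage."""
    rows = []
    for pod in pods:
        suggested = round(pod["p95_cpu_used"] * headroom, 2)
        reclaimable = round(max(pod["cpu_requested"] - suggested, 0.0), 2)
        rows.append({
            "name": pod["name"],
            "requested": pod["cpu_requested"],
            "suggested": suggested,
            "reclaimable": reclaimable,
        })
    return rows


# Hypothetical pods; the first mirrors the 4 vCPU requested / 0.2 used example.
pods = [
    {"name": "checkout-api", "cpu_requested": 4.0, "p95_cpu_used": 0.2},
    {"name": "search-indexer", "cpu_requested": 2.0, "p95_cpu_used": 1.8},
]
rows = request_slack(pods)
total_reclaimable = round(sum(r["reclaimable"] for r in rows), 2)
print(rows, total_reclaimable)
```

Summing `reclaimable` across a namespace gives a rough upper bound on the node capacity that better requests could free up.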
Cluster Autoscaler vs. Karpenter
The Kubernetes Cluster Autoscaler (CA) adds and removes nodes based on pod scheduling pressure. It works, but it's reactive: it adds nodes when pods are pending and removes them after a delay. Karpenter (AWS-native, increasingly supported elsewhere) is a more aggressive autoscaler that provisions exactly the right instance type for the pending workloads rather than using a homogeneous node pool. Karpenter typically reduces cluster cost by 15–30% compared to CA-managed clusters by right-sizing nodes to workload needs and using Spot instances more aggressively.
Namespace-Level Cost Attribution and Budgets
Kubernetes namespaces map well to teams or services and are the natural unit for cost attribution. Label your namespaces with team and cost center, use Kubecost or OpenCost to attribute pod cost to namespaces, and implement ResourceQuotas to set hard limits on namespace resource consumption. ResourceQuotas prevent any single team from overprovisioning — and they force teams to make explicit capacity decisions rather than leaving requests at "maximum, just in case."
Environment Scheduling
Development and staging Kubernetes clusters don't need to run 24/7. Scaling workloads to zero outside business hours (say, 8pm to 7am weekdays, and all weekend) with KEDA's cron scaler or a custom scheduler, and letting the node autoscaler remove the emptied nodes, can cut dev/staging cluster costs by 50–60%. This requires stateless workloads: you can't scale to zero if your dev environment maintains long-lived state, but most dev/staging environments should be stateless anyway.
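The 50–60% figure follows from counting hours. A short worked calculation, assuming the schedule from the paragraph above (on from 7am to 8pm on weekdays) and an illustrative assumption that nodes make up 90% of the cluster's cost, with the rest being control plane and storage that keep billing:

```python
HOURS_PER_WEEK = 24 * 7


def off_hours_fraction(start_hour: int = 7, stop_hour: int = 20,
                       workdays: int = 5) -> float:
    """Fraction of the week the node pools are scaled to zero."""
    on_hours = (stop_hour - start_hour) * workdays
    return 1 - on_hours / HOURS_PER_WEEK


def estimated_saving(node_share_of_cost: float, off_fraction: float) -> float:
    """Only the node portion of the bill stops when pools scale to zero."""
    return node_share_of_cost * off_fraction


off = off_hours_fraction()
saving = estimated_saving(0.9, off)  # 0.9 is an assumed node share
print(f"off {off:.1%} of the week, estimated saving {saving:.1%}")
```

The nodes are off for 103 of 168 hours, and the fixed costs that remain are why the realized saving lands below that raw fraction.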
7. Storage Optimization
Storage is the quiet cost category. Compute gets attention because it's billed by the hour and shows up dramatically in anomaly reports. Storage accumulates slowly and invisibly.
S3 Intelligent-Tiering
S3 Intelligent-Tiering automatically moves objects between access tiers based on access patterns. Objects not accessed for 30 days move to Infrequent Access; objects not accessed for 90 days move to Archive Instant Access. You pay a small monitoring fee per object, but for buckets with mixed access patterns, Intelligent-Tiering typically saves 25–40% on storage costs. Enable it on buckets with more than 128KB average object size and unpredictable access patterns. For buckets with predictable patterns (logs that are never read after 7 days, for example), use explicit lifecycle policies — they're cheaper than Intelligent-Tiering's monitoring overhead.
EBS Volume Hygiene
Unattached EBS volumes are pure waste — you're paying for storage that nothing is using. Build a weekly cleanup job that identifies volumes in "available" state (not attached to any instance) for more than 7 days, snapshots them for safety, and sends the owning team a notification. After 30 days with no response, delete them. This sounds aggressive; in practice, fewer than 5% of "orphaned" volumes turn out to be needed when you actually investigate.
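Also: review EBS volume types. gp2 volumes that were created before gp3 was available can be migrated to gp3 at about 20% lower cost per GB. gp3 includes a baseline of 3,000 IOPS and 125 MB/s regardless of size, which matches or beats gp2 for most volumes. Large gp2 volumes can exceed those baselines (gp2 scales at 3 IOPS per GB), so provision extra gp3 IOPS or throughput for them before treating the change as a pure win. The migration is online (no downtime); how long it takes depends on volume size.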
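The decision logic of the weekly cleanup job described above can be sketched as a pure function. One assumption matters here: the EC2 API does not directly report when a volume became unattached, so `available_since` and `notified_on` are hypothetical fields your own job would have to record (for example, from the first run that saw the volume unattached).

```python
from datetime import date
from typing import Optional


def cleanup_action(state: str, available_since: date,
                   notified_on: Optional[date], today: date) -> str:
    """Decide what the weekly job should do with one EBS volume."""
    if state != "available":
        return "skip"                 # attached volumes are out of scope
    if notified_on is None:
        if (today - available_since).days > 7:
            return "snapshot-and-notify"
        return "wait"                 # unattached for 7 days or less
    if (today - notified_on).days >= 30:
        return "delete"               # owner had 30 days to respond
    return "await-owner"


today = date(2026, 3, 2)
actions = [
    cleanup_action("in-use", date(2026, 1, 1), None, today),
    cleanup_action("available", date(2026, 2, 27), None, today),
    cleanup_action("available", date(2026, 2, 10), None, today),
    cleanup_action("available", date(2026, 1, 5), date(2026, 2, 20), today),
    cleanup_action("available", date(2026, 1, 5), date(2026, 1, 20), today),
]
print(actions)
```

Keeping the decision separate from the AWS calls makes the policy easy to test and to show to teams before anything is deleted.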
RDS Storage Optimization
RDS instances provision storage in advance, and Amazon doesn't allow you to shrink allocated storage in place, only grow it. This means databases provisioned at "we might need this much" sizes end up paying for storage they're not using indefinitely. The fix for new databases is to provision conservatively and use RDS's storage autoscaling to grow as needed. For existing oversized instances, restoring a snapshot doesn't help, because a restore can't allocate less storage than the source had. The usual recourse is to create a new, correctly sized instance and migrate the data into it with a logical dump and restore, replication, or AWS DMS.
8. The Data Transfer Cost Trap
Egress costs catch organizations by surprise in two patterns:
Cross-region and cross-AZ data transfer: AWS charges $0.01/GB in each direction for data transferred between Availability Zones in the same region, so $0.01–$0.02/GB depending on whether you count one side or both. This sounds trivial until you're running a high-throughput microservices architecture where every inter-service call traverses AZ boundaries. At 1TB/day of inter-service traffic (about 30TB/month), that's roughly $300–$600/month; at 10TB/day it's $3,000–$6,000/month in hidden transfer costs. The fix: ensure that services that communicate heavily are co-located in the same AZ (Kubernetes pod affinity rules, or service mesh locality routing), and use PrivateLink for high-volume AWS service access instead of going through NAT gateways.
Internet egress: Data transferred from AWS to the internet is $0.09/GB for the first 10TB/month (after a free tier). For a media platform or any service that transfers large volumes of data to end users, this is a significant line item. CDN caching (CloudFront, Cloudflare) is the primary mitigation: moving frequently-accessed content to the CDN edge dramatically reduces origin egress. S3 Transfer Acceleration is sometimes suggested for large-file delivery, but it adds a per-GB charge to speed up long-distance transfers rather than reducing egress, so evaluate it as a performance feature, not a cost saving.
Callout: The NAT Gateway Surprise
NAT Gateways are one of the most common sources of unexpected cloud cost. Every byte of traffic that flows through a NAT Gateway incurs a $0.045/GB processing charge, on top of the hourly gateway cost ($0.045/hour ≈ $32/month). For Kubernetes clusters where all pod egress goes through NAT Gateways, this can easily reach thousands of dollars monthly. Audit your NAT Gateway data processing costs. Routes to AWS services should use VPC Endpoints instead of NAT Gateways: gateway endpoints for S3 and DynamoDB are free, while interface endpoints for services such as SQS carry an hourly fee plus a per-GB charge that is much lower than NAT processing.
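A quick way to size the NAT Gateway problem is to turn the two rates above into a monthly figure. This sketch takes the $0.045/GB and $0.045/hour rates quoted in the callout as defaults. The traffic volumes, the three-gateway layout, and the 730-hour month are illustrative assumptions.

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours


def nat_monthly_cost(gb_processed: float, gateways: int = 1,
                     per_gb: float = 0.045, per_hour: float = 0.045) -> float:
    """Data processing charge plus the hourly charge for each gateway."""
    return gb_processed * per_gb + gateways * per_hour * HOURS_PER_MONTH


def gateway_endpoint_saving(gb_moved_off_nat: float,
                            per_gb: float = 0.045) -> float:
    """Processing charge avoided by routing S3/DynamoDB traffic via gateway endpoints."""
    return gb_moved_off_nat * per_gb


# Hypothetical cluster: 50 TB/month through three NAT Gateways (one per AZ),
# of which 30 TB is S3 and DynamoDB traffic.
before = nat_monthly_cost(50_000, gateways=3)
saving = gateway_endpoint_saving(30_000)
print(f"before: ${before:,.2f}/month, avoidable: ${saving:,.2f}/month")
```

In this example the hourly gateway charge is under $100/month while data processing is over $2,000, which is why the audit should start with traffic volume rather than gateway count.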
9. FinOps Tool Comparison
| Tool | Best For | Strengths | Limitations |
|---|---|---|---|
| AWS Cost Explorer | AWS-native visibility | Free, detailed AWS coverage, RI/SP recommendations | AWS-only, limited custom attribution |
| Apptio Cloudability (IBM) | Multi-cloud large enterprise, ITFM integration | Strong RI management, finance-friendly reporting, deep Finance/IT integration, compliance reporting | Enterprise pricing, complex setup with high implementation overhead, engineering-unfriendly UI |
| CAST AI | Kubernetes-heavy orgs | Automated right-sizing, Spot management, K8s-native | AWS/GCP/Azure only, autonomous mode requires trust |
| Kubecost / OpenCost | K8s cost attribution | Namespace/workload granularity, open source option | K8s-only, needs integration for complete picture |
| Infracost | Shift-left cost in CI/CD | Cost diff on IaC PRs, developer-friendly | Estimation only (not actual spend), Terraform focus |
For most organizations getting started, I recommend beginning with AWS Cost Explorer (free, native) + Kubecost (open source, if you run Kubernetes) + Infracost in your CI/CD pipeline (shifts cost awareness left to where infrastructure decisions are made). This costs essentially nothing and provides enough visibility to execute the Crawl-to-Walk optimization work. Add a commercial tool when your FinOps practice is mature enough to extract value from its advanced features.
10. Getting Engineering Teams to Actually Care
This is the hardest part of FinOps. The technical work — finding idle resources, purchasing RIs, optimizing storage — is mechanical. The organizational work of making engineers care about cloud cost is where FinOps programs fail.
Showback vs. Chargeback
Showback: teams receive reports showing what they spend. The cost stays in a central IT budget; the team just sees the number. Chargeback: the cost is actually allocated to the team's budget. They own it.
Chargeback is more effective at changing behavior but requires mature cost attribution (if you can't accurately attribute cost to teams, chargeback creates anger, not accountability). Start with showback. Build trust in the accuracy of the data. Then, once teams accept the numbers, begin moving toward chargeback. The moment a team lead has to explain a large cloud bill in their quarterly business review, FinOps moves up their priority list.
Making Cost a First-Class Engineering Metric
Cost should appear in the same dashboard as latency, error rate, and throughput. If engineers see their service's p99 latency every day but only see cloud cost in a quarterly email from finance, they're not going to optimize for cost. I've had success adding a "cost per request" metric to service dashboards. For each service, calculate (monthly cloud cost attributed to the service) / (monthly request count). Changes to this metric are visible alongside performance metrics.
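The cost-per-request metric is a single division, but it helps to normalize it so the number is readable on a dashboard. A minimal sketch, with hypothetical service figures and a per-million scaling that is my own choice rather than anything the formula requires:

```python
def cost_per_request(monthly_cost_usd: float, monthly_requests: int) -> float:
    """(monthly cloud cost attributed to the service) / (monthly request count)."""
    if monthly_requests <= 0:
        raise ValueError("monthly_requests must be positive")
    return monthly_cost_usd / monthly_requests


def cost_per_million(monthly_cost_usd: float, monthly_requests: int) -> float:
    """Same metric scaled to USD per million requests."""
    return cost_per_request(monthly_cost_usd, monthly_requests) * 1_000_000


# Hypothetical service: $12,400 attributed cost, 310 million requests.
this_month = cost_per_million(12_400, 310_000_000)
last_month = cost_per_million(11_000, 314_000_000)
change_pct = (this_month - last_month) / last_month * 100
print(f"${this_month:.2f} per million requests ({change_pct:+.1f}% vs last month)")
```

Plotting the month-over-month change next to latency makes a cost regression visible in the same place engineers already look.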
FinOps Champions
Identify one engineer per major team who is interested in cost optimization and empower them as a FinOps champion. They attend the bi-weekly FinOps review, bring findings back to their team, and advocate for cost considerations in architecture decisions. This is a 5–10% time commitment, not a full-time role. Engineers who enjoy this work find it satisfying — they're solving puzzles with clear metrics. Recognize their contributions publicly.
Callout: The "Who Owns This?" Problem
The most common conversation in FinOps reviews: "Here's $8,000/month of untagged resources — who owns them?" This is a tagging problem, which is ultimately a culture problem. The only durable solution is making untagged resources a blocker: either prevent untagged resources from being created (via AWS Config rules or Terraform validation policies) or enforce a cost center allocation for untagged resources that defaults to the team most likely to have created them. Optionally, put the cost of untagged resources in the FinOps team's budget — they'll fix the tagging problem quickly.
11. A Real 34% Reduction: What We Actually Did
Here's the breakdown of how we achieved a 34% reduction over six months, with approximate contribution from each initiative:
Phase 1 (Months 1–2): Visibility and Quick Wins (saves: ~8%)
Audited tagging compliance and fixed gaps (1 week). Terminated 47 EC2 instances in dev/staging environments that had been running for 90+ days with no traffic. Deleted 200+ unattached EBS volumes (after snapshotting). Turned off 12 RDS databases in non-production environments outside business hours using Lambda + CloudWatch Events. Total saving: approximately $18K/month. Time investment: two engineers, half-time, for 6 weeks.
Phase 2 (Months 2–4): Right-Sizing and Purchasing (saves: ~18%)
Right-sized 120 EC2 instances based on CloudWatch data. Migrated all gp2 EBS volumes to gp3. Purchased Compute Savings Plans to cover 70% of steady-state EC2 spend (had been 100% on-demand). Migrated CI/CD runners to Spot Instances with 3x capacity for the same cost. Total saving: approximately $40K/month against pre-program baseline. Time investment: three engineers, significant coordination with 12 product teams.
Phase 3 (Months 4–6): Kubernetes and Architecture (saves: ~8%)
Deployed Karpenter to replace Cluster Autoscaler on EKS clusters. Implemented namespace ResourceQuotas. Fixed NAT Gateway routing (VPC Endpoints for S3 and DynamoDB). Scaled development EKS node pools to zero outside business hours. Total saving: approximately $18K/month. Time investment: two senior engineers, four months (this work requires architectural understanding, not just configuration changes).
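As a cross-check, the three phases above can be added up. The percentages and dollar figures come straight from the phase descriptions; the baseline is only implied by them (total saving divided by total percentage) and is approximate because the inputs are rounded.

```python
# (phase, saving as % of baseline, approximate saving in USD per month)
phases = [
    ("Phase 1: visibility and quick wins", 8, 18_000),
    ("Phase 2: right-sizing and purchasing", 18, 40_000),
    ("Phase 3: Kubernetes and architecture", 8, 18_000),
]

total_pct = sum(pct for _, pct, _ in phases)
total_usd = sum(usd for _, _, usd in phases)
implied_baseline = total_usd / (total_pct / 100)

for name, pct, usd in phases:
    share = usd / total_usd
    print(f"{name}: {pct}% of baseline, {share:.0%} of total savings")
print(f"total: {total_pct}% (~${total_usd:,}/month), "
      f"implied baseline ~${implied_baseline:,.0f}/month")
```

More than half of the saving came from Phase 2.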
12. FinOps Team Structure
The FinOps Foundation defines three levels of FinOps practitioner role:
FinOps Practitioner: An engineer or finance analyst who understands cloud billing, can run cost analysis, and implements optimization recommendations. This is the "doing" role. Every organization should have at least one.
FinOps Lead: Owns the FinOps program, manages the review cadence, coordinates with engineering leads and finance, prioritizes the roadmap. Requires both technical and organizational skills. At 200+ engineers, this should be a full-time role.
FinOps Steering Group: Engineering leadership, finance, and business stakeholders who review overall cloud spend against business outcomes quarterly and make resource allocation decisions. This isn't a function you hire for; it's a meeting you establish.
For organizations under 100 engineers: one FinOps-focused engineer or SRE (50% time) + a monthly cost review with the VP Engineering + quarterly steering review with Finance. You don't need a dedicated FinOps team; you need FinOps practices embedded in your SRE or platform engineering function.
13. 2026 Trends: AI Cost Prediction and GreenOps
AI-Powered Cost Anomaly Detection and Forecasting
AWS Cost Anomaly Detection (using machine learning models trained on your specific billing patterns) has been available since 2020 but is maturing significantly. In 2026, the more interesting development is LLM-assisted cost analysis: tools that let you query your cost data in natural language ("why did our spend increase 30% last week?" "which team is responsible for this EBS cost spike?") and get analysis that previously required a FinOps practitioner to manually construct.
CAST AI and similar tools are also moving toward more autonomous optimization — not just recommending right-sizing but executing it automatically within defined guardrails. The "Autopilot" mode in CAST AI, for example, continuously adjusts pod resource requests based on live metrics without human intervention. Engineering teams that previously resisted right-sizing recommendations because of implementation burden are more receptive to automated optimization that happens invisibly.
GreenOps: Carbon Alongside Cost
Sustainability reporting requirements (particularly for EU companies under CSRD) are making carbon footprint tracking a FinOps concern, not just a CSR one. AWS Customer Carbon Footprint Tool and similar GCP/Azure equivalents provide emissions data by service and region. The overlap with cost optimization is significant: idle resources waste money and emit carbon. Spot instance use reduces carbon footprint by improving utilization of existing hardware capacity. In 2026, GreenOps is moving from "nice to have" to regulatory necessity for many organizations, and the tooling is maturing to match.
FinOps for AI/ML Workloads
GPU compute is the new EC2: expensive, easy to over-provision, and increasingly necessary. A single p4d.24xlarge instance (8x A100 GPUs) costs roughly $32/hour on demand, which is about $1,500 per instance over a 48-hour weekend. LLM fine-tuning jobs left running accidentally on a multi-instance cluster can generate five-figure bills in a weekend. FinOps programs are adding GPU compute as a specific focus area: mandatory job duration limits, automated termination of idle training jobs, Spot instances for training workloads (with checkpoint-based fault tolerance), and budget alerts that fire at 50% of expected cost rather than waiting for a bill.
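The guardrails listed above reduce to a few comparisons. A minimal sketch, assuming the roughly $32/hour rate quoted in the paragraph and a hypothetical eight-instance training cluster; the function names and the 24-hour limit are illustrative, not features of any specific tool.

```python
def runaway_cost(hourly_rate: float, instances: int, hours: float) -> float:
    """Cost of a job that keeps running unattended."""
    return hourly_rate * instances * hours


def budget_alert(spend_to_date: float, expected_cost: float,
                 threshold: float = 0.5) -> bool:
    """Fire once spend reaches the threshold share of the expected job cost."""
    return spend_to_date >= threshold * expected_cost


def exceeds_duration_limit(hours_running: float, limit_hours: float) -> bool:
    """Mandatory job duration limit: candidates for automated termination."""
    return hours_running > limit_hours


# Hypothetical: 8 instances left running from Friday 6pm to Monday 6am (60 hours).
weekend_bill = runaway_cost(32.0, instances=8, hours=60)
print(f"unattended weekend: ${weekend_bill:,.0f}")
print(budget_alert(2_600, expected_cost=5_000))      # at 52% of expected cost
print(exceeds_duration_limit(60, limit_hours=24))
```

Firing at 50% of expected cost leaves time to stop a job while the overrun is still small.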
14. Key Takeaways
- FinOps is a practice, not a project. A one-time cost reduction effort degrades. Sustainable savings require ongoing visibility, regular reviews, and cost accountability embedded in engineering culture.
- Start with tagging and visibility. You cannot optimize what you cannot see. Invest two weeks in tagging compliance before any optimization work. Every dollar of optimization that follows depends on accurate cost attribution.
- The biggest ROI comes from purchasing changes, not architecture changes. Compute Savings Plans and right-sizing are faster to implement and less risky than architectural changes. Do them first. Architectural optimization (NAT Gateway routing, K8s scheduling) comes after.
- Chargeback changes behavior; showback builds trust. Don't jump straight to chargeback. Spend two or three quarters on showback, verify your attribution is accurate, and then move to chargeback when teams believe the numbers.
- Karpenter + Spot Instances is a significant K8s cost lever. If you run Kubernetes and you're not using Karpenter with a mixed on-demand/Spot strategy, you're likely leaving 20–35% of your cluster cost on the table.
- Add Infracost to your CI/CD pipeline today. It's free, it's ten minutes to set up, and it shifts cost awareness to the exact moment when infrastructure decisions are made (the PR). This is the highest-leverage, lowest-effort FinOps practice available.
- GreenOps and AI cost management are the new FinOps frontiers. If your engineering organization is running significant AI/ML workloads or has sustainability reporting obligations, plan to extend your FinOps program to cover GPU cost management and carbon attribution in 2026.