We recently had the opportunity to attend a session hosted by Apptio where the authors of ‘Cloud FinOps’ (JR Storment and Mike Fuller) spoke about their experiences and approach to practising financial operations in the cloud. Although ‘FinOps’ is a relatively new term in the industry, it is relevant to many organisations grappling with how to respond proactively to cloud services being consumed at scale across many teams.

Anyone who is starting on the cloud journey, or dealing with the challenges of distributed and escalating cloud usage, should read the book. This article summarises the key points we found important to consider as the FinOps competency gathers pace in the industry.

FinOps: What, Why and Organisation

The simplest definition of FinOps to take away is:

“FinOps = Teams working together + Just-in-time processes + Real-time reporting”

Collaboration is clearly central to FinOps success, and it is the first of a series of principles. The key principles are: teams need to collaborate, decisions are driven by business value, everyone owns their usage, reports are accessible and timely, a central team drives FinOps, and organisations take advantage of the variable cost model of the cloud.

At its core, FinOps is founded on cost accounting, or unit economics. Getting to unit economics relies on “tagging, cost allocation, cost optimisation and FinOps Operations”. Associating a spend of X with a revenue of Y is one of the key tenets. Cloud spend is having a major impact on organisational balance sheets. Procurement no longer has control, as the power to spend has shifted to engineers. FinOps helps organisations with this struggle by balancing operational and functional control with new, high-speed decision making.

Instead of relying on central approvals, such as through procurement, FinOps builds spend visibility for the appropriate distributed teams to create accountability. The central FinOps team's role is to tie engineering to finance: it sets the rules of cost allocation and disseminates reports and coaching. Entire organisations need to shift from central cost control to shared responsibility. The FinOps team is generally housed under the COO and matrixed through to the Directors of Engineering and Finance.

The book also notes the need for a common language between Finance and Operations. This includes foundational terms as well as specific finance terms for some of the engineering team. The formula driving cloud bill costs is defined as: Spend = Usage x Rate. Most successful companies decentralise using less (reducing usage or avoiding costs) and centralise paying less (lowering rates through Reserved Instance (RI) and Committed Use Discount (CUD) commitments). Distributed teams are given usage optimisation recommendations so that they can make and enact infrastructure change decisions.
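To make the formula concrete, here is a minimal Python sketch. The `LineItem` structure, the service names and the hourly figures are hypothetical illustrations, not real provider prices. It shows that the decentralised lever (using less) and the centralised lever (paying less per unit) multiply together.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LineItem:
    # Hypothetical billing line: usage in hours and the rate charged per hour.
    service: str
    usage_hours: float
    hourly_rate: float


def spend(items: list[LineItem]) -> float:
    """Spend = Usage x Rate, summed over every line item."""
    return sum(item.usage_hours * item.hourly_rate for item in items)


def use_less(items: list[LineItem], reduction: float) -> list[LineItem]:
    """Decentralised lever: teams cut usage by a fraction (0.1 = 10%)."""
    return [replace(i, usage_hours=i.usage_hours * (1 - reduction)) for i in items]


def pay_less(items: list[LineItem], discount: float) -> list[LineItem]:
    """Centralised lever: a lower rate, e.g. from an RI or CUD commitment."""
    return [replace(i, hourly_rate=i.hourly_rate * (1 - discount)) for i in items]


bill = [LineItem("web", 720, 0.10), LineItem("batch", 300, 0.40)]
print(f"baseline: {spend(bill):.2f}")
print(f"use less: {spend(use_less(bill, 0.10)):.2f}")
print(f"both:     {spend(pay_less(use_less(bill, 0.10), 0.30)):.2f}")
```

In this example a 10% usage cut combined with a 30% rate discount reduces spend by 37%, because 0.9 x 0.7 = 0.63.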

FinOps Lifecycle

The FinOps lifecycle is split into 3 parts: Inform, Optimise and Operate. Ideally you would start at Inform to provide a base, but the lifecycle is really more circular. Engage all your cross-functional teams early, particularly finance and engineering. Provide teams with granular, real-time visibility into their spending, making sure costs are fully loaded and allocated. Start small though, and remember: Crawl, Walk, Run.

Inform

Considerations for informing and evaluating include: Unit Economics (tying cloud spend to actual business outcomes), Culture (improving communication and breaking down silos to improve cloud usage), Speed of Delivery (controlled through the trade-off between cost and quality) and Value to the Business (management review of cloud spend against the value it delivers). The book also lists a great set of questions for starting the dialogue with the business.

To be successful, it is important to focus on organisation and culture. Key tasks include aligning executives, educating engineers, building the right skills in teams and bridging the divide between finance and IT. Each team should have a scorecard that shows where it can improve and how it compares with other teams.

Activity-based costing is really the target when accounting for costs, as it enables distributed teams to influence their spending. Cost allocation is the vital link between business value and cloud spend. A central team does the allocation, but accountability is distributed to the teams driving the cost. Chargeback or showback will create that accountability. Although definitive tagging is ideal, it may not be possible for shared resources such as S3, Kubernetes, CloudWatch, shared clusters and some RI support/amortisation costs. Shared costs can be split using a proportional, even-split or fixed approach.
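The three ways of splitting shared costs can be sketched in a few lines of Python. This is a minimal illustration that rests on some assumptions. "Proportional" is taken to mean in proportion to each team's directly allocated spend, and "fixed" means percentages agreed in advance. The team names and amounts are hypothetical.

```python
def split_proportional(shared_cost: float, direct_spend: dict[str, float]) -> dict[str, float]:
    """Split in proportion to each team's directly allocated spend."""
    total = sum(direct_spend.values())
    if total <= 0:
        raise ValueError("direct spend must be positive to split proportionally")
    return {team: shared_cost * amount / total for team, amount in direct_spend.items()}


def split_even(shared_cost: float, teams: list[str]) -> dict[str, float]:
    """Give every team the same share, regardless of how much it spends."""
    if not teams:
        raise ValueError("need at least one team")
    share = shared_cost / len(teams)
    return {team: share for team in teams}


def split_fixed(shared_cost: float, percentages: dict[str, float]) -> dict[str, float]:
    """Split by percentages agreed in advance; they must add up to 100."""
    if abs(sum(percentages.values()) - 100) > 1e-9:
        raise ValueError("fixed percentages must sum to 100")
    return {team: shared_cost * pct / 100 for team, pct in percentages.items()}


direct = {"checkout": 6_000.0, "search": 3_000.0, "reporting": 1_000.0}
shared_support = 1_200.0  # e.g. a shared cluster or support charge
print(split_proportional(shared_support, direct))
print(split_even(shared_support, list(direct)))
print(split_fixed(shared_support, {"checkout": 50, "search": 30, "reporting": 20}))
```

Whichever method is chosen, the shares add back up to the original shared cost, so nothing is lost in allocation.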

Definitive allocation can be done through tag-based and hierarchy-based methods. Ideally a combination of both is used, but the most basic is hierarchy-based (account) allocation. Note that accounts are mutually exclusive whilst tags are not. Tags (or labels) are usually defined as key-value pairs, much like a NoSQL key-value store. The book defines 3 methods for adding cost allocation to usage and billing data:

  • Accounts/projects/subscriptions: provided by vendor and represented in bill

  • Resource-level tag: applied by engineer or provider directly to resource

  • Post-bill constructs: supplementary billing data added using third-party analytics tools or self-managed processes

To start the crawl, keep it simple. Start with tags for cost centre/business unit, workload/service name, product, resource owner, an intuitive name, and environment. Make some tags mandatory and the rest optional.

Sometimes it pays to have the account hierarchy cover the most important one or two tags, such as cost centre and environment, and then use tags for the remaining important ones. It is best to have the tagging standard in place and communicated to engineers before commencing. Getting it right is really important, as tag rework is not easy; if you have already started your cloud journey, this rework may need to be factored in. Tag hygiene is also important: build a culture where engineers pay attention to tag quality, such as unvalidated values and case sensitivity. Untagged resources can also be allocated to a default cost centre and then treated appropriately.
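A tagging standard like the one described above can be checked automatically. The sketch below is hypothetical, and the tag keys, the allowed environment values and the `unallocated` default cost centre are assumptions to adapt to your own standard. It normalises case, then reports missing mandatory tags, unknown keys and invalid values. Untagged resources fall back to a default cost centre.

```python
# Hypothetical tagging standard: adapt keys and values to your own.
MANDATORY_TAGS = {"cost-centre", "environment", "owner"}
OPTIONAL_TAGS = {"product", "service", "name"}
ALLOWED_ENVIRONMENTS = {"prod", "test", "dev"}
DEFAULT_COST_CENTRE = "unallocated"


def normalise(tags: dict[str, str]) -> dict[str, str]:
    """Trim and lower-case keys and values so 'Prod' and 'prod' are treated alike."""
    return {k.strip().lower(): v.strip().lower() for k, v in tags.items()}


def check_tags(tags: dict[str, str]) -> list[str]:
    """Return a list of hygiene problems for one resource's tags (empty if clean)."""
    tags = normalise(tags)
    keys = set(tags)
    problems = [f"missing mandatory tag: {k}" for k in sorted(MANDATORY_TAGS - keys)]
    problems += [f"unknown tag: {k}" for k in sorted(keys - MANDATORY_TAGS - OPTIONAL_TAGS)]
    env = tags.get("environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid environment value: {env}")
    return problems


def cost_centre_for(tags: dict[str, str]) -> str:
    """Untagged resources fall back to a default cost centre for later treatment."""
    return normalise(tags).get("cost-centre", DEFAULT_COST_CENTRE)


print(check_tags({"Cost-Centre": "CC1", "Environment": "Prod", "Owner": "ben"}))
print(check_tags({"environment": "staging"}))
print(cost_centre_for({}))
```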

Optimise

The optimise phase is about spending efficiently by addressing both usage and rate. The book introduces Objectives and Key Results (OKRs) as a framework for focusing on results. The 3 key focus areas for OKRs are:

  • Credibility: regular spend updates (daily, weekly, monthly) at the right granularity for each stakeholder

  • Sustainability: application and business-facing staff maintain a tag repository that records what each tag means

  • Control: drive accountability for usage control to application/product teams

Engineers and finance need to work together on these OKRs. OKRs are generally set over a 3-month horizon, but keep watching the trends. It is important to detect anomalies early so they can be addressed quickly and billing surprises avoided.
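Early anomaly detection does not have to be sophisticated to be useful. The minimal sketch below uses a simple trailing-window statistical check, chosen here for illustration rather than taken from the book. It flags any day whose spend is more than `threshold` standard deviations above the mean of the previous `window` days. The 5% floor on the spread is an assumption that stops a perfectly flat history from flagging every small increase.

```python
from statistics import mean, stdev


def find_anomalies(daily_spend: list[float], window: int = 7, threshold: float = 3.0) -> list[int]:
    """Return the indexes of days whose spend is unusually high versus the trailing window."""
    if window < 2:
        raise ValueError("window must cover at least 2 days to estimate spread")
    anomalies = []
    for day in range(window, len(daily_spend)):
        history = daily_spend[day - window:day]
        baseline = mean(history)
        # Floor the spread at 5% of the baseline so a flat history is not over-sensitive.
        spread = max(stdev(history), 0.05 * baseline)
        if daily_spend[day] > baseline + threshold * spread:
            anomalies.append(day)
    return anomalies


spend_by_day = [100.0] * 7 + [102.0, 100.0, 250.0, 100.0]
print(find_anomalies(spend_by_day))
```

Flagged days can then be raised with the team that owns the spend, in line with distributed accountability.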

The following are the areas of usage optimisation the book considers:

  • Removing/moving resources: removing forgotten or orphaned resources, or moving data to more efficient, optimised storage.

  • Rightsizing: based on CPU, memory, network and disk usage data. Ideally present multiple options, if only to start the discussion. Look beyond average usage to daily, weekly, monthly or even seasonal patterns, and simulate performance before rightsizing. Block storage costs can be reduced by treating orphaned volumes (take a snapshot before deleting them), removing volumes that show zero IOPS, reducing provisioned IOPS on over-catered volumes, and using elastic volumes.

  • Redesign: use scaling heuristics to take advantage of cloud elasticity, and manage resources around the operational work cycle.

  • RI Usage: instead of trying to clean up usage before leveraging RIs, start small with 20-25% RI coverage and then manage and increase it in tandem with cleanup.

  • Serverless: re-architecting existing workloads is often not worth it, but for greenfield development it may well be.
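The advice to look beyond average usage can be illustrated with a small rightsizing sketch. Several parts of it are assumptions: the vCPU size options, the 20% headroom and the use of a nearest-rank 95th percentile. A real recommendation would also consider memory, network and disk, and would be simulated before any change.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend_vcpus(cpu_utilisation: list[float], current_vcpus: int,
                    sizes: tuple[int, ...] = (2, 4, 8, 16),
                    pct: float = 95, headroom: float = 0.2) -> int:
    """Smallest hypothetical size covering the pct-th percentile of vCPUs used, plus headroom."""
    # cpu_utilisation holds fractions (0.0-1.0) of the current instance's capacity.
    needed = percentile(cpu_utilisation, pct) * current_vcpus * (1 + headroom)
    for size in sorted(sizes):
        if size >= needed:
            return size
    return max(sizes)  # demand exceeds every option: return the largest size


# Mostly idle, with a regular busy period (10% of samples at 50% CPU).
samples = [0.1] * 90 + [0.5] * 10
print(recommend_vcpus(samples, current_vcpus=8, pct=50))  # median-based sizing
print(recommend_vcpus(samples, current_vcpus=8))          # 95th-percentile sizing
```

Sizing on the median (or the average, 14% here) would cut the instance to 2 vCPUs. The 95th percentile keeps it at 8 because of the regular busy period, which is exactly the pattern an average hides.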

The central FinOps team creates best practices and recommendations, and the decentralised teams then address their usage. The usage optimisation workflow should be tracked in a tool such as Jira. Not every optimisation suggestion will be actioned. Automated changes can also be pushed out to teams, giving application owners the ability to approve an automated rightsizing change. Although cloud providers don’t offer a function to track savings and doing so is difficult, it can be very helpful. Again, work collaboratively to prioritise the actions that yield the greatest savings.

There are a number of ways to reduce the rate you are charged for the same cloud resources. Different levels of service performance, availability and durability attract different rates. The central FinOps team is best placed to advise on the service offering that best fits the workload profile. Cloud providers offer different types of reserved or committed resources, including convertible variants. These should be fully understood before choosing an approach, and the book covers them well.

It is important to have an RI or CUD strategy, and the book defines 5 steps for creating your first RI strategy. Building a centralised reservation function is important, as RIs cannot be managed by individual teams. When weighing rightsizing against reserving, remember that rightsizing takes time. When starting, set a small reservation coverage goal (perhaps 25-30%). Don’t delay buying RIs: do both in parallel.

Operate

The optimise phase sets the goals. The operate phase is where decisions are turned into actions and processes, such as turning resources on or off. The book again recommends crawl, walk, run: develop processes in small increments. Establish the processes first and then apply automation.

There are 4 facets to a process:

  • Onboarding: adding items to a new process.

  • Responsibility: each process should have an owner & clearly defined team expectations.

  • Visibility: before and after an action is taken. Reports should use common language and be clear and easy to understand.

  • Action: the process or action that should be followed, for example clear processes for how rightsizing recommendations are generated. Not every optimisation will lead to action, so the decisions themselves are important.

Measuring teams' overall savings and opening up the conversation will help the process. The book talks of Metric-Driven Cost Optimisation (MDCO), which spans all 3 phases. For catching optimisation opportunities, a shorter MDCO cadence is better than a longer one (monthly rather than quarterly). Its core principles are:

  • Automated measurement: computers perform the analysis (well suited to large volumes of cloud billing data).

  • Targets: metrics need targets to drive the outcome.

  • Achievable goals: need a proper understanding of data and realistic outcomes.

  • Data driven: the data drives the actions you take.

Two key metrics are reservation coverage and reservation utilisation.

  • Reservation Coverage % is how much usage is being charged at reserved rates versus on-demand rates (i.e. RI-covered hours / total hours). For a truer measure, you can exclude non-coverable hours from the denominator. These are the smoothed-off peaks that fall under the break-even threshold. 80% is a good target, but start smaller than this. Converting the hours into a potential savings or cost figure is even better, since not every hour costs the same.

  • Reservation Utilisation % is how much of the reservation is being used versus sitting unused. It can be measured for individual reservations or aggregated across a number of reservations.
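Both metrics are simple ratios, sketched below in Python. The function names and sample hour counts are hypothetical. `non_coverable_hours` stands for the short peaks below the break-even threshold that the text suggests excluding from the coverage denominator.

```python
def reservation_coverage(reserved_hours: float, total_hours: float,
                         non_coverable_hours: float = 0.0) -> float:
    """Percentage of coverable usage hours charged at reserved rates."""
    coverable = total_hours - non_coverable_hours
    if coverable <= 0:
        raise ValueError("no coverable hours")
    return 100 * reserved_hours / coverable


def reservation_utilisation(reservations: list[tuple[float, float]]) -> float:
    """Aggregate utilisation across (used_hours, purchased_hours) pairs, as a percentage."""
    used = sum(u for u, _ in reservations)
    purchased = sum(p for _, p in reservations)
    if purchased <= 0:
        raise ValueError("no purchased reservation hours")
    return 100 * used / purchased


print(reservation_coverage(600, 1_000))                           # raw coverage
print(reservation_coverage(600, 1_000, non_coverable_hours=200))  # excluding peaks
print(reservation_utilisation([(720, 720), (360, 720)]))          # two reservations
```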

An important consideration in FinOps is automation for measurement and goal tracking. There are many automation tooling options to consider: cloud native, self-built, third-party self-hosted or third-party SaaS. Both we and the book recommend picking a product built specifically for this purpose. The main areas to automate include usage reduction, tagging governance and scheduled resource start/stop. Tips for automation include: “Use it in an inform mode first, Build confidence in the automation, Do plenty of testing, Don’t build it yourself and Measure the performance”.
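The "inform mode first" tip can be illustrated with a hypothetical scheduled-stop check. This is not a recommendation to build your own tooling, which the book advises against. The `schedule` tag, the office-hours window and the `stop` callback are all assumptions. In inform mode it only reports what it would stop, and it acts only when explicitly switched out of that mode.

```python
from datetime import datetime
from typing import Callable, Optional

OFFICE_HOURS = range(8, 18)  # hypothetical window: 08:00-17:59, Monday to Friday


def within_office_hours(now: datetime) -> bool:
    return now.weekday() < 5 and now.hour in OFFICE_HOURS


def scheduled_stops(resources: list[dict], now: datetime, inform_only: bool = True,
                    stop: Optional[Callable[[str], None]] = None) -> list[str]:
    """Return IDs of running office-hours resources that should be stopped now."""
    to_stop = [
        r["id"] for r in resources
        if r.get("schedule") == "office-hours"
        and r.get("state") == "running"
        and not within_office_hours(now)
    ]
    if not inform_only:
        if stop is None:
            raise ValueError("a stop callback is required outside inform mode")
        for resource_id in to_stop:
            stop(resource_id)  # hypothetical hook into your provider's API or tool
    return to_stop


fleet = [
    {"id": "dev-1", "schedule": "office-hours", "state": "running"},
    {"id": "prod-1", "state": "running"},
]
print(scheduled_stops(fleet, datetime(2024, 1, 6, 12)))  # a Saturday: report only
```

Running in inform mode for a while and comparing its reports against what teams expect is one way to build confidence before allowing it to act.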

Conclusion

The book closes by talking about Managing to Unit Economics, or FinOps Nirvana. The target is to get to unit economic spend management (what is our cost per x, where x = customer, file render, user, airline seat etc.). The North Star Metric (NSM) measures the business value of your cloud spend. There may be multiple unit metrics in an organisation for specific product lines or services. Decisions should be based not on the cost of cloud but on the benefits that cloud spend generates for the organisation.
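Unit economics reduces to a simple ratio: cloud spend divided by the number of business units delivered. The sketch below uses hypothetical monthly figures with "customer" as the unit. It shows how total spend can rise while the cost per unit falls, which is why the value generated matters more than the size of the bill.

```python
def unit_cost(cloud_spend: float, units: float) -> float:
    """Cost per unit of business value, e.g. per customer or per file render."""
    if units <= 0:
        raise ValueError("units must be positive")
    return cloud_spend / units


# Hypothetical figures: (cloud spend, active customers) per month.
monthly = {"Jan": (50_000.0, 10_000), "Feb": (60_000.0, 15_000)}
for month, (spend_total, customers) in monthly.items():
    print(f"{month}: spend {spend_total:,.0f}, cost per customer {unit_cost(spend_total, customers):.2f}")
```

Here spend rises 20% between the two months, yet the cost per customer falls from 5.00 to 4.00.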

In summary, the book is a great guide to establishing FinOps, and for anyone starting or already on a cloud journey who needs a north star to navigate the tsunami of usage.

Ben // AUTHOR

Ben is a passionate leader with over 25 years of experience in leveraging the latest technology to bring value-based outcomes and transformation to clients.
