How to Implement FinOps Successfully - Episode 4
This is the fourth (and second-last) part of our FinOps series - Optimize. Here is a list of posts in the series:
- Tidal ❤️ FinOps - Episode 1
- How to Implement FinOps Successfully - Episode 2
- How to Implement FinOps Successfully - Episode 3
Note: I am ex-AWS, so you will notice a lot more focus on AWS tools and services as examples here; however, all cloud providers have similar services and tools.
The Optimize Phase
According to Gartner, “spending on public cloud services is forecast to grow 23.1% in 2021”, and by the end of 2022 businesses are expected to be spending almost $400 billion on public cloud.
These figures are astounding and they provide a very positive outlook for the public cloud migration market. However, it is still very common for businesses to believe that simply moving to the cloud will automatically save them money compared to on-premises models. This common misconception often results in bill-shock and unexpected charges.
McKinsey reported in 2020 that cloud projects are consistently over budget, and it’s estimated that 30% of this expenditure is wasted.
Cost management is still among the top five biggest cloud challenges, not far behind security.
“So, how can businesses take advantage of the cloud whilst ensuring maximum return on investment?”
In this blog I will guide you through the Optimize phase of the FinOps lifecycle and how you can achieve your financial goals.
💡Remember: It is not just about saving money, but more importantly, making money! Aligning your business and project goals will ensure FinOps success.
Cloud Cost Optimization
Let me start by touching on a very important target end-state of some migrations - the “lift and shift”. Many cloud vendors and service providers still have a “lift and shift” mindset, which in some scenarios makes sense, but more often it yields inefficient use of the cloud with costly results.
According to Gartner, most organizations that do not focus on efficient use of the cloud and are very infrastructure-centric, are likely to overspend.
So as a first note, I’d like you to reconsider the “lift and shift” paradigm and have an open mind to a more “transformative” approach when thinking about Cloud Migrations.
As opposed to a lift and shift migration, where the objective is moving workloads to the cloud “as-is”, the “transformative” approach offers balanced cloud adoption in line with the organization’s unique circumstances and goals, underpinned by a business value methodology - data-driven cloud migrations accelerate your journey.
Having a “transformative” mindset helps organizations become more agile, and also helps teams make ROI-based decisions and architect for lower costs while improving productivity and/or revenue.
Once teams are enabled to architect for cost containment, optimization becomes Business-as-Usual.
Second, as covered in Episode 3, come cost visibility and cost allocation strategies: very important mechanisms once you have moved to the cloud. Start your post-migration journey with an initial analysis of your current cloud state, regardless of whether you lifted and shifted or migrated in a more transformative way.
Now that you, your organization, and your teams understand the trade-offs between a lift and shift and a transformative migration, and are empowered with clear visibility of your spend and allocation strategy, it’s time to optimize your cloud footprint. But before I jump into the technical pillars of optimization, we need to set goals. In order to measure success, you need a good understanding of your organizational goals, as these set the stage for the Optimize phase.
Many organizations and FinOps practitioners often begin their journey by focusing on fast ways to cut cloud costs. This makes sense and is a worthy goal, but it should definitely NOT be your only goal. Keep in mind that the aim of FinOps is to help businesses become more innovative and the economic benefits of the cloud go far beyond cost savings. You should also consider other objectives and how to measure them, such as:
- Organizational Productivity, e.g. reduce application-environment provisioning time by X hours
- Operational Resilience, e.g. increase uptime by X%
- Business Agility, e.g. expedite deployment of tools and technology (X% faster)
- Security Posture, e.g. improve your security posture by X%
No single cloud journey is the same, and different businesses (or business units) have different needs and goals. Setting targets and goals will help you monitor how you are doing against expectations. For example, some businesses are more concerned about the speed at which services are deployed, others about cost and budgets, and others might have already moved to the cloud and are now spending more than expected. There might even be several goals within the organization, such as across different business units and teams. So it is really important to understand and set your goals and KPIs properly, so you can identify where things aren’t working as efficiently as expected.
What are your goals?
- Reduce idea to service delivery time?
- Spend the right amount to deliver bigger benefits?
- What benefits? Increased revenue, faster project delivery, cost efficiency?
Technical Pillars of Optimization
Reducing your cloud spend can be tough and it can take time, but now that you understand your goals and KPIs, I will guide you through some aspects of the Cost Optimization technical pillars to help you get started.
Unlike traditional IT, you do not need to provision resources months in advance and you don’t need to keep resources that are not needed. Entering this phase means focusing on near-real-time decision-making, anomaly detection, and spending efficiency.
Let’s go through six technical Pillars that can help organizations accelerate their Cloud Cost Optimization journey:
- Right Sizing

You shouldn’t right-size for cost savings alone: make sure you are selecting the cheapest instance size and family that still meets your performance needs, typically based on CPU, RAM, storage, licensing, and network requirements. Remember that it’s crucial to look at peak (max) values, not only averages, which can be misleading.
For a pre-migration analysis, you can use discovery tools to determine metrics such as Peak CPU and RAM usage. This will allow you to choose the right instance sizes from day 1 which takes much less effort than right-sizing after deployment.
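The selection logic can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical three-type catalog with made-up prices (real catalogs are far larger and prices vary by region); note that it sizes against peak usage plus headroom, not averages:

```python
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    vcpus: int
    ram_gib: int
    hourly_usd: float

# Hypothetical catalog with illustrative prices; real prices vary by region.
CATALOG = [
    InstanceType("m5.large", 2, 8, 0.096),
    InstanceType("m5.xlarge", 4, 16, 0.192),
    InstanceType("m5.2xlarge", 8, 32, 0.384),
]

def right_size(peak_vcpus, peak_ram_gib, headroom=1.2):
    """Cheapest type whose capacity covers *peak* usage plus some headroom."""
    fits = [t for t in CATALOG
            if t.vcpus >= peak_vcpus * headroom and t.ram_gib >= peak_ram_gib * headroom]
    return min(fits, key=lambda t: t.hourly_usd, default=None)

# Peak (not average!) usage of 3 vCPUs and 10 GiB RAM
print(right_size(3, 10).name)  # m5.xlarge
```

Feeding discovery-tool metrics into a rule like this is what lets you pick the right size from day 1.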
Every recommendation can be an opportunity for discussion to ensure services are not impacted.
Decommission unused Resources
Do you think only of compute when talking about rightsizing? Many people do, but bear in mind that rightsizing goes beyond compute. Make sure you look at storage and other resources, such as those listed below, as there are significant potential savings in optimizing these too:
- Unattached Elastic IPs
- Unattached EBS volumes
- Idle or unattached load balancers
- Incomplete S3 multipart uploads
- Idle EC2, RDS, or Redshift instances
- Old EBS snapshots
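A periodic sweep for these resources is easy to automate. The sketch below works off an exported inventory; the inventory format, field names, and thresholds are assumptions for illustration, not a real cloud provider API:

```python
# Scan an exported resource inventory for likely waste.
def find_waste(inventory):
    """Return IDs of resources that look unattached or idle."""
    flagged = []
    for r in inventory:
        # Unattached network/storage resources still accrue charges.
        if r["type"] in ("elastic_ip", "ebs_volume", "load_balancer") and not r.get("attached_to"):
            flagged.append(r["id"])
        # Instances with near-zero average CPU are decommissioning candidates.
        elif r["type"] in ("ec2", "rds", "redshift") and r.get("avg_cpu_pct", 100) < 2:
            flagged.append(r["id"])
    return flagged

inventory = [
    {"id": "eip-1", "type": "elastic_ip", "attached_to": None},
    {"id": "vol-1", "type": "ebs_volume", "attached_to": "i-123"},
    {"id": "i-456", "type": "ec2", "avg_cpu_pct": 0.4},
]
print(find_waste(inventory))  # ['eip-1', 'i-456']
```

Each flagged resource then becomes an opportunity for discussion with the owning team before decommissioning.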
Neglecting to rightsize your resources in the cloud leads to the loss of one of the cloud’s most important value propositions: Elasticity.
Rightsizing Tools & Guides examples:
Tidal Tools - Discover and analyze your applications.
EC2 Rightsizing Report - If you are already in AWS
Tidal Calculator - If you want an optimized footprint before you migrate
Trusted Advisor - Identify idle resources
Azure Monitor - Helps maximize the availability and performance of your applications and services
- Increase Elasticity
In the cloud, there is less need to forecast customer demand far into the future. You can provision resources in a matter of minutes, or even automatically, rather than taking weeks or months like in traditional IT.
Here are some examples of how to leverage the elasticity of the cloud to lower your costs:
Many non-production workloads can be turned off outside of working hours.
For example, you can create a policy that turns resources off at night, then turn them back on manually in the morning.
During my time at AWS, we often spoke about one particular organization that tied resource power state to their building access passes. When employees swiped out, the resources would automatically turn off, and when they swiped in, they came back on!
Maximize savings by automatically turning off resources outside working hours
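The savings from scheduling alone are worth quantifying. As a rough illustration (the schedule is an assumption, the arithmetic is general), consider non-production resources that run 12 hours on weekdays instead of 24x7:

```python
# Weekly hours for an "office hours only" schedule vs. always-on.
HOURS_PER_WEEK = 24 * 7          # 168 hours
ON_HOURS = 12 * 5                # 12 h/day, weekdays only = 60 hours

savings_pct = 100 * (1 - ON_HOURS / HOURS_PER_WEEK)
print(f"Scheduling saves about {savings_pct:.0f}% of the always-on cost")  # about 64%
```

In other words, a simple schedule can cut roughly two-thirds of a non-production workload’s compute bill before any rightsizing happens.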
- Reduce production spend via Auto Scaling
Auto Scaling helps improve elasticity as well. With horizontally scaling architecture, Auto Scaling helps you meet demand increases by adding more resources, and similarly removing resources when they are not needed.
This can be based on rules, such as CPU utilization crossing a user-specified threshold, or on other metrics like requests per second.
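The core of such a rule can be sketched as follows. This is a simplified target-tracking-style calculation (real Auto Scaling also handles cooldowns, warm-up, and multiple metrics); the numbers are illustrative:

```python
import math

def desired_capacity(current, metric, target, min_size=1, max_size=10):
    """Scale capacity in proportion to how far the metric is from its target."""
    desired = math.ceil(current * metric / target)
    # Clamp to the group's configured bounds.
    return max(min_size, min(max_size, desired))

# 4 instances at 80% CPU with a 50% target -> scale out to 7
print(desired_capacity(4, metric=80, target=50))  # 7
```

The same formula scales back in when the metric drops below target, which is where the cost savings come from.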
Increasing Elasticity Tools & Guides:
Tidal Saver - Reduce non-prod spend via scheduling
AWS Auto-Scaling - Guide
- Spot Market for Resources
Spot Instances are unused capacity offered at steep discounts that cloud providers can reclaim at short notice (ideal for workloads that can be interrupted).
Spot Instances are suitable for big data, CI/CD, web services, HPC, or containerized workloads, and are ideal for fault-tolerant, flexible, loosely coupled, and stateless workloads.
In practice, you will likely want to combine several of the following instance purchase options to optimize your workload deployment.
| Purchase option | Pricing | Best suited for |
| --- | --- | --- |
| On-Demand | No commitment, priced per second | Spiky or unpredictable workloads |
| Reserved | 1 or 3 year commitment, discounted rate | Steady usage |
| Spot | Spare EC2 capacity, up to 90% discount off the On-Demand price | Fault-tolerant, flexible, and stateless workloads |
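To make the trade-off concrete, here is a toy comparison of the three purchase options above for 1,000 compute-hours. The base rate and discount levels are assumptions for illustration, not published prices:

```python
# Illustrative rates; real prices depend on region, instance type, and market.
on_demand_rate = 0.10                   # $/hour
reserved_rate = on_demand_rate * 0.60   # ~40% discount for a 1-year commitment
spot_rate = on_demand_rate * 0.10       # up to ~90% off On-Demand

hours = 1000
for name, rate in [("on-demand", on_demand_rate),
                   ("reserved", reserved_rate),
                   ("spot", spot_rate)]:
    print(f"{name}: ${rate * hours:.2f}")
```

In practice you blend these: Reserved for the steady baseline, Spot for interruptible burst, On-Demand for the unpredictable remainder.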
Spot Instances Tools & Guides:
- Reserved instances (RIs) and Savings Plans (SPs)
“RIs” (or “committed use discounts”) are a long-term commitment between the customer and the cloud provider in exchange for a significant discount. They are often used for steady (always-on) workloads, but they can also be used for workloads that are not always on and still achieve savings.
“SPs”, on the other hand, are a more flexible pricing model that also offers lower prices compared to On-Demand. You are still committing to a specific amount of usage (measured in $/hour) for a period of time, but the commitment is expressed in monetary terms rather than instance attributes.
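A quick break-even sketch tells you whether a commitment pays off for a workload that isn’t always on. The rates below are assumed examples, not published prices:

```python
# How many hours per month must an instance run before an RI beats On-Demand?
on_demand = 0.10        # $/hour, assumed
ri_effective = 0.062    # $/hour effective RI rate, billed for every hour of the term

HOURS_IN_MONTH = 730
ri_monthly = ri_effective * HOURS_IN_MONTH    # paid regardless of actual usage
break_even_hours = ri_monthly / on_demand     # On-Demand hours costing the same
print(round(break_even_hours))  # ~453 hours, i.e. ~62% utilization
```

With these rates, anything running more than about 62% of the month is cheaper on the commitment, which is why even some not-always-on workloads still come out ahead.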
RIs & SPs Tools & Guides:
3rd Party Tools
- Optimize Storage

Amazon Elastic Block Store (EBS) and Azure Managed Disks provide persistent block storage volumes for use with instances/VMs. Block storage costs vary based on performance, size, and time provisioned; optimizing each of these to match your requirements can yield material savings.
In AWS, the default EBS type is General Purpose SSD (gp2), a high-performance SSD providing up to 16,000 IOPS per volume. There are also cheaper options, like Throughput Optimized HDD (st1), which saves about 50% compared to gp2, and Cold HDD (sc1), which saves about 75% compared to gp2.
For EBS, you pay for the amount you provision, so where possible pick a volume size that fits your storage and throughput needs to reduce costs.
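The volume-type decision is easy to put numbers on. The $/GB-month rates below are placeholders chosen to mirror the approximate discounts mentioned above, not published prices:

```python
# Assumed $/GB-month rates for illustration; real EBS prices vary by region.
rates = {"gp2": 0.10, "st1": 0.05, "sc1": 0.025}

size_gib = 500  # a hypothetical provisioned volume
for vol_type, rate in rates.items():
    monthly = size_gib * rate
    saving = 100 * (1 - rate / rates["gp2"])
    print(f"{vol_type}: ${monthly:.2f}/month ({saving:.0f}% vs gp2)")
```

For sequential, throughput-bound workloads (logs, backups, big data scans), moving off gp2 like this is often free money, as long as you verify the IOPS profile first.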
Remember that, by default, Amazon EBS root device volumes are automatically deleted when the instance terminates. However, any additional EBS volumes attached at launch, or attached later to a running instance, persist by default even after the instance terminates.
EFS and S3
Services like EFS and S3 scale dynamically, so it isn’t necessary to pick a “right size”, since you pay only for what you store. You should, however, select the appropriate storage class that matches your access, replication, and business requirements.
AWS’s object storage service (S3) and Azure Blob Storage also offer different storage tiers, each typically cheaper than the previous one:
- Hot (S3 standard) or Azure Blob Storage = Active data
- Warm (S3 infrequent access) or Azure Cool Blob Storage = Infrequently accessed data
- Cold (Glacier) or Azure Archive Blob Storage = Archive data
💡 Remember: pick the right storage class for your business needs.
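“Right for your business needs” includes access patterns, not just storage price: infrequent-access tiers add per-GB retrieval fees, so hot data can cost more there. A sketch with assumed prices (not published rates) makes the point:

```python
# Assumed rates for illustration only.
standard_gb_month = 0.023     # $/GB-month, "hot" tier
ia_gb_month = 0.0125          # $/GB-month, infrequent access tier
ia_retrieval_per_gb = 0.01    # $/GB retrieved from the IA tier

def monthly_cost(gb_stored, gb_retrieved, cls):
    if cls == "standard":
        return gb_stored * standard_gb_month
    return gb_stored * ia_gb_month + gb_retrieved * ia_retrieval_per_gb

# 1 TiB rarely read: IA wins.
print(monthly_cost(1024, 0, "ia") < monthly_cost(1024, 0, "standard"))       # True
# Same 1 TiB read in full twice a month: IA is now MORE expensive.
print(monthly_cost(1024, 2048, "ia") < monthly_cost(1024, 2048, "standard")) # False
```

This is exactly the analysis that access-pattern tooling automates for you.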
EFS (or Azure Files) is a file storage service for EC2 instances.
EFS offers two storage class families: Standard and One Zone. Standard storage classes store data within and across multiple Availability Zones (AZs). One Zone storage classes store data redundantly within a single AZ, at a 47% lower price compared to file systems using Standard storage classes, for workloads that don’t require multi-AZ resilience.
Storage Optimization Tools & Guides:
S3 Analytics - Analyze storage access patterns to decide when to transition objects to S3 Infrequent Access; S3 Lifecycle rules can then automate the transition and clean up incomplete multipart uploads
S3 Intelligent Tiering - Automatic Cost Optimization for Amazon S3
- Design for Cost efficiency
Historically, organizations designed far into the future for availability, performance, and security. With the cloud, this changes completely: by leveraging its true benefits, you can be much more precise in your design, with no need to over-provision any more.
Remember that optimal cloud architectures need to be designed with cost in mind and once teams are enabled to architect for cost, optimization becomes business as usual with less need for resource/time investment.
Examples of designing for Cost Efficiency:
- Serverless & Lambda
- Automation & CloudFormation to save time and reduce errors
- Cheaper Region
- Static Web hosting (S3)
- Open source platforms & databases (reduce licensing costs)
- Application Load Balancers
- Using Consolidated Billing to leverage RI Volume Discounts
Understand where the low-hanging fruit is in your business.
The above pillars will help your business understand where the low-hanging fruit is. If quick wins are part of your goals, you will find that purchasing RIs and SPs is a faster path to cost reduction than rightsizing.
The Cloud is ever-evolving, so organizations need to ensure they are evolving with it. Reviewing processes, procedures and KPIs for Cost Optimization should be a continuous process.
As I always mention in my blog posts, cost optimization is about much more than saving money; it’s a value-driven strategic move to create more business value and drive revenue.
To ensure an optimal return on investment, cloud cost optimization needs to be practiced both as part of an initial cloud migration and as an ongoing effort. I see a great opportunity for a cultural shift here: team collaboration is the key!
The Cloud can deliver great value over traditional IT deployments but without optimization the real value of the cloud will be left on the table. The cloud provides great visibility and understanding to an Organization’s IT spending, but that information must be used to develop and continuously refine the strategy, controls, and operating model. When implemented and operated in an optimized manner, the Cloud is without a doubt, one of the most crucial and promising technology investments an organization can make.
💡 Set your goals - Goals allow people to understand where things are and where they should be.
💡 It’s not always about reducing costs - it’s about spending efficiently.
💡 A “transformative” mindset helps organizations become more agile, while also helping teams make ROI-based decisions and architect for lower costs. The need for business agility and automation is increasing drastically, to keep up with market demand and the shortage of skilled personnel. Leveraging Tidal’s practices and approach allows teams to scale by working faster and smarter and to ride this wave of growth. But that’s a topic for another post.
💡 Cost Optimization is a continuous process and team effort.
Up next: The Operate phase - Continuous improvement & Operations