Search

How Much Money Can Azure Save You?

A review of 24 scenarios that demonstrate the cost savings you can gain when you augment your storage for your Data Warehouse using Azure Data Lake Gen2 storage with Azure Synapse Analytics (formerly SQL Data Warehouse).


Author: Juan A. Vega, Simatree's Director of Technology and Architecture

Publish Date: April 30th, 2020


In 2019, Microsoft introduced their latest version of their Cloud Data Warehouse and renamed it from Azure SQL Data Warehouse to Azure Synapse Analytics. In a previous article, I went into detail on how Microsoft has broken down barriers and made it much easier to utilize Azure Data Lake storage within your Data Warehouse. Meaning, you no longer need to move data from the Data Lake to the Data Warehouse and vice-versa. Instead, Azure Synapse Analytics makes it very easy to query your relational data (stored in Azure Synapse Analytics) and non-relational data (stored in the Azure Data Lake) inside your Data Warehouse like in a single SQL script. You can even create a join query which joins these two data sets using ANSI standard SQL. The article also discussed how you can use Parquet file format in Azure Data Lake which increases query performance up to 13x compared to other file formats.


So how much money can I save by using Azure Data Lake to augment my storage costs with Azure Synapse Analytics? To figure this out, I dove into 24 different scenarios.

Below are my summarized points on this analysis:


Bottom line is the savings can be huge. As you scroll down this article you’ll notice cost savings between 17% up to 70% savings. A lot depends on your company size and storage requirements for your Data Warehouse. We experimented with these three storage requirements:


  1. Small Business with 10 TB storage requirements

  2. Medium Business with 100 TB storage requirements

  3. Large Business with 1000 TB storage requirements


The larger the size the more savings you can gain. The mix of storage costs between Azure Data Lake and Azure Synapse Analytics can make a huge difference. We experimented with four storage mixes between Azure Data Lake vs Azure Synapse Analytics:


  • 0 / 100 mix (meaning 100% of the storage came from Azure Synapse Analytics (or the more expensive storage found in your Data Warehouse)

  • 50 / 50 mix (meaning 50% came from Azure Data Lake Gen2 and 50% came from Azure Synapse Analytics)

  • 70 / 30 mix (meaning 70% came from Azure Data Lake Gen2 and 30% came from Azure Synapse Analytics)

  • 90 / 10 mix (meaning 90% came from Azure Data Lake Gen2 and 10% came from Azure Synapse Analytics)


If you’re willing to sign up for 1-year commitment, or 3-year commitment, can also bring huge savings as compared to monthly pay-as-you-go. Specifically, for the Azure Synapse Analytics we experimented with the following:


  • 1-year commitment you can gain up to 37% discount overall

  • 3-year commitment you can gain up to 65% discount overall


For Azure Data Lake Gen2 you can only start saving once you get to 100 Terabytes. So, 1-year and 3-year commitments only impact Large Business with storage requirements greater than 100 Terabytes. Your Disaster Recovery and Business Continuity (DRBC) requirements can also make an impact on cost savings. In all our scenarios, we used Zone Redundant storage (ZRS) which in most cases should be sufficient.

Before we get into each scenario, there’s some major assumptions that need to be disclosed:


  • We always select Azure Data Lake Gen2 which is significantly less expensive than Azure Data Lake Gen1. Gen2 has a lot more features, and Microsoft recommends and is pushing users to Gen2.

  • For the Azure Data Lake Gen2, we used Zone Redundant Storage which means if the node fails, it will fail over to another data center in the same region. Note, you also have the choice to use Geo Redundant Storage which means if the node fails it will fail over to another data center in another region that is typically 100s of miles away. This would cost more. You also have the choice to have Local Redundant Storage which means if the node fails it will fail over to another node within the same data center. This would cost less (but if the data center goes down, you’re out of business). Thus, Zone Redundant Storage is a good comprise for your disaster recovery requirement.

  • For Azure Data Lake Gen2, we always select the file structure to use Hierarchical Namespace. This is important to have if you ever plan to use HDInsight or Azure Databricks for analysis. Furthermore, it allows you to organize your storage much like your typical file system.

  • We used the default backup strategy for Azure Synapse Analytics which is 42 retention point backups over the past 7 days.

  • For Azure Data Lake Gen2, we always select the “cool” tier. Selecting the “Hot” tier is basically your premium SSD storage which is 84% more expensive.


A significant cost for Azure Synapse Analytics is the DWU cost (Data Warehouse Units) which is your computing cost. It’s very important to note that DWU is calculated independent of storage costs. Below is the breakdown of DWU we used in our scenarios:


  1. Small Business default DWU = 200

  2. Medium Business default DWU = 1,000

  3. Large Business default DWU = 5,000


Note, these DWU costs can be raised or lowered or even paused in real time to save money. This can be done automatically using a job schedular which would run a PowerShell script. The savings could be significant. However, when you do so, all running queries will be aborted. So, it’s a great feature to save money, but it would need to be well coordinated with your production jobs.

You may be wondering, why would you experiment with a 10 / 90 mix? Isn’t that a little extreme? Well in this day in age, with so much IOT data, such as geospatial data, device log data, telemetry data, click data, log data, sensor data, etc.…. it’s very easy to see how all this collected data can be exponentially huge. You would never want to store this in your expensive data warehouse storage.


Furthermore, within your data warehouse, over time, you will accumulate a ton of data. Some of this data will be “hot” data like the most current data sets (past week, past month, past quarter, past year). But a lot of the older historical data won’t be queried that often. These older data sets are perfect for Azure Data Lake storage.


The simple goal would be to keep the most “in demand” data sets stored in the more expensive data warehouse storage. And the less often used data sets stored in the Data Lake. But the key goal, is even though the less often use data sets are found in the Data Lake, it can still be accessed the same way as in the Data Warehouse. It will just be a little slower to query.


With Azure Synapse Analytics, this is now possible. You’ll no longer need a Data Engineer to move data to the data warehouse to meet some “out-of-the-blue” ad-hoc requirement. If designed properly, your business clients should be able to query this historical data while signed into Azure Synapse Analytics and get the results they need from this historical data set. So, you save on storage costs as well as data engineering costs.


Thus, a 30 / 70 mix or 10 / 90 mix for your entire Data Lake / Data Warehouse solution is not impractical. A lot, of course, depends on your business. But the idea here is to strive to a 10 / 90 mix since the cost savings can be huge. How you do that is the challenge of your IT leadership and development teams. Fortunately, with the latest features of Azure Synapse Analytics, it has become more possible than ever.


In a spreadsheet I ran 24 different scenarios to see how much you can save. Note, all the prices were taken as of 3/22/2020 directly from the Azure Pricing Calculator:

https://azure.microsoft.com/en-us/pricing/calculator/

Below is a walkthrough of all the scenarios:

SMALL BUSINESS SCENARIOS – 1yr commitment with 10 Terabyte Data Warehouse Storage:


1-year commitment for Small Business with requirements for total storage of 10 Terabytes. Within this main scenario, we examine four different storage mixes:


0 / 100 mix meaning all the storage comes from the more expensive Azure Synapse Analytics DW storage


50 / 50 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 5 TB or 50% storage from Azure Data Lake Gen2

  • 5 TB or 50% storage from Azure Synapse Analytics


70 / 30 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 7 TB or 70% storage from Azure Data Lake Gen2

  • 3 TB or 30% storage from Azure Synapse Analytics


90 / 10 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 9 TB or 90% storage from Azure Data Lake Gen2

  • 1 TB or 10% storage from Azure Synapse Analytics

Below are the results of the four scenarios with the cost savings highlighted in yellow.

SMALL BUSINESS SCENARIOS – 3yr commitment with 10 Terabyte Data Warehouse Storage:


3-year commitment for Small Business with requirements for total storage of 10 Terabytes. Within this main scenario, we examine four different storage mixes:


0 / 100 mix meaning all the storage comes from the more expensive Azure Synapse Analytics DW storage


50 / 50 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 5 TB or 50% storage from Azure Data Lake Gen2

  • 5 TB or 50% storage from Azure Synapse Analytics


70 / 30 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 7 TB or 70% storage from Azure Data Lake Gen2

  • 3 TB or 30% storage from Azure Synapse Analytics


90 / 10 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 9 TB or 90% storage from Azure Data Lake Gen2

  • 1 TB or 10% storage from Azure Synapse Analytics


Below are the results of the four scenarios with the cost savings highlighted in yellow.

MEDIUM BUSINESS SCENARIOS – 1yr commitment with 100 Terabyte Data Warehouse Storage:


1-year commitment for Medium Business with requirements for total storage of 100 Terabytes. Within this main scenario, we examine four different storage mixes:


0 / 100 mix meaning all the storage comes from the more expensive Azure Synapse Analytics DW storage


50 / 50 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 50 TB or 50% storage from Azure Data Lake Gen2

  • 50 TB or 50% storage from Azure Synapse Analytics


70 / 30 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 70 TB or 70% storage from Azure Data Lake Gen2

  • 30 TB or 30% storage from Azure Synapse Analytics


90 / 10 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 90 TB or 90% storage from Azure Data Lake Gen2

  • 10 TB or 10% storage from Azure Synapse Analytics

Below are the results of the four scenarios with the cost savings highlighted in yellow.

MEDIUM BUSINESS SCENARIOS – 3yr commitment with 100 Terabyte Data Warehouse Storage:


3-year commitment for Medium Business with requirements for total storage of 100 Terabytes. Within this main scenario, we examine four different storage mixes:


0 / 100 mix meaning all the storage comes from the more expensive Azure Synapse Analytics DW storage


50 / 50 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 50 TB or 50% storage from Azure Data Lake Gen2

  • 50 TB or 50% storage from Azure Synapse Analytics


70 / 30 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 70 TB or 70% storage from Azure Data Lake Gen2

  • 30 TB or 30% storage from Azure Synapse Analytics


90 / 10 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 90 TB or 90% storage from Azure Data Lake Gen2

  • 10 TB or 10% storage from Azure Synapse Analytics


Below are the results of the four scenarios with the cost savings highlighted in yellow.

LARGE BUSINESS SCENARIOS – 1yr commitment with 1000 Terabyte Data Warehouse Storage:


1-year commitment for Large Business with requirements for total storage of 1000 Terabytes. Within this main scenario, we examine four different storage mixes:


0 / 100 mix meaning all the storage comes from the more expensive Azure Synapse Analytics DW storage


50 / 50 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 500 TB or 50% storage from Azure Data Lake Gen2

  • 500 TB or 50% storage from Azure Synapse Analytics


70 / 30 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 700 TB or 70% storage from Azure Data Lake Gen2

  • 300 TB or 30% storage from Azure Synapse Analytics


90 / 10 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 900 TB or 90% storage from Azure Data Lake Gen2

  • 100 TB or 10% storage from Azure Synapse Analytics

Below are the results of the four scenarios with the cost savings highlighted in yellow.


LARGE BUSINESS SCENARIOS – 3yr commitment with 1000 Terabyte Data Warehouse Storage:


3-year commitment for Large Business with requirements for total storage of 1000 Terabytes. Within this main scenario, we examine four different storage mixes:


0 / 100 mix meaning all the storage comes from the more expensive Azure Synapse Analytics DW storage


50 / 50 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 500 TB or 50% storage from Azure Data Lake Gen2

  • 500 TB or 50% storage from Azure Synapse Analytics


70 / 30 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 700 TB or 70% storage from Azure Data Lake Gen2

  • 300 TB or 30% storage from Azure Synapse Analytics


90 / 10 mix between Azure Data Lake Gen2 and Azure Synapse Analytics

  • 900 TB or 90% storage from Azure Data Lake Gen2

  • 100 TB or 10% storage from Azure Synapse Analytics

Below are the results of the four scenarios with the cost savings highlighted in yellow.

In summary, you can see that augmenting your Data Warehouse storage by using the less expensive Azure Data Lake storage can substantially reduce your costs. Furthermore, making a one-year or three-year commitment also has significant savings. The scenarios listed were very specific, but keep in mind you can grow into these storage mixes. For example, you may start at a 20 / 80 mix during the development phase with the idea to grow into the 70 / 30 mix over time. A lot also depends on the kind of data and volume of data that you will ingest over time as well as your retention policy. There are many factors that go into your Data Warehouse storage requirements. At the very least, I hope this article will give you some ideas on cost savings to building your modern Data Warehouse.

Reach out

Email:

info@simatree1.com

Telephone:

703-314-4371

linkedin (1).png
instagram.png
twitter.png

Stay in the loop

What would you be interested in hearing more about?

© 2019 by Simatree.  All rights reserved.