Introduction

The aim of this blog is to compare the costs of two cloud architectures running a small-to-medium data analytics workload. I’ll be comparing an Infrastructure as a Service (IaaS) architecture with an “equivalent” Platform as a Service (PaaS) architecture. The quotation marks are intended as a disclaimer: comparing the two architectures on a like-for-like basis is a non-trivial exercise given the myriad factors that determine the respective costs. As a result, I’ve made some assumptions to support this exercise, which I’ll outline shortly.

Without taking costs into consideration, the benefits of PaaS over IaaS are well documented. They include:

  • Rapid Time to Market – developers can focus on building applications without worrying about configuring or maintaining (upgrades/patching) the underlying infrastructure.
  • Reduced Barriers to Entry – with PaaS, developers can easily experiment with new Azure technologies even with limited resources. PaaS is also an excellent option for rapid prototyping.

General Assumptions

To compare the IaaS architecture against the PaaS architecture on a somewhat like-for-like basis, I’ll need to make some assumptions to establish a baseline.

  1. The architectures will support a DW/Reporting workload that runs ETL processes up to 200 times every day.
  2. The average size of a source/destination table is less than 1GB, so Synapse Analytics is not considered.
  3. I’ll select the cheapest option at the standard tier.

Architecture Diagrams

Costs

IaaS Cost Considerations

The chosen IaaS architecture makes use of SQL Server on Azure VMs; follow the link for in-depth guidance. What follows is a high-level overview of some options to consider when implementing this architecture.

  • There are 3 workloads to choose from:
    • Web
    • Standard
    • Enterprise
  • There are 3 options for paying:
    • Free License SQL Server Edition
    • Pay per usage (pay as you go)
    • Bring your own license (BYOL)

Following Microsoft best practices for SQL Server on Azure VM, the VM I’ll use will have the following spec.

IaaS Assumptions

I’ll be using a Standard workload & pay per usage (pay as you go) payment method for the VM. The pay as you go model provides the most flexibility.

  • You pay for compute capacity by the second.
  • No long-term commitment or upfront payments.
  • You can increase/decrease compute capacity on demand.
  • Start/Stop the VM and only pay for what you use.
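To make the per-second billing concrete, here is a minimal sketch of how a pay-as-you-go bill scales with running time. The hourly rate is a placeholder and not a real Azure price, and the function name and the 22-working-day schedule are assumptions made purely for illustration.

```python
def payg_cost(seconds_running: int, hourly_rate: float) -> float:
    """Cost of a VM billed per second at `hourly_rate` (currency units per hour)."""
    return seconds_running * hourly_rate / 3600


# Hypothetical rate used only to show the proportions; substitute the real
# price of your chosen VM and SQL Server licence.
PLACEHOLDER_RATE = 1.00

# A VM left running for a 30-day month.
always_on = payg_cost(30 * 24 * 3600, PLACEHOLDER_RATE)

# The same VM started and stopped around a 10-hour day, 22 working days a month.
working_hours_only = payg_cost(22 * 10 * 3600, PLACEHOLDER_RATE)

print(f"always on: {always_on:.2f}, working hours only: {working_hours_only:.2f}")
```

Because the bill is proportional to running time, any schedule that stops the VM outside working hours cuts the compute cost by the same proportion.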

I’ll also assume no Azure Hybrid Benefit. According to the Microsoft docs:

“Azure Hybrid Benefit is a licensing benefit that lets you bring your on-premises Windows Server and SQL Server licenses with active Software Assurance or Linux subscriptions to Azure and helps to significantly reduce the costs of running your workloads in the cloud.”

With Azure Hybrid Benefit, Microsoft state you can “Save up to 85% over standard pay-as-you-go rate…”

Note that you can also achieve significant savings over pay-as-you-go (PAYG) using the long-term options offered through Azure Reservations.

Quick word on the Microsoft Cloud Solution Provider (CSP) Programme

Unless your organisation has an Enterprise Agreement with Microsoft, you can either buy subscriptions directly from Microsoft with your credit card or via a CSP at the same RRP. Benefits of the latter include:

  • You will receive a monthly bill for what you have used and consumed. There is no need to use your credit card.
  • You don’t have to make any commitment to your expected usage or consumption.
  • We provide access to a Power BI dashboard to allow you to view your consumption.
  • As your CSP, we are your point of contact. We can provide advice and guidance and can escalate issues to Microsoft, if required.
  • You have the same access and control of your subscriptions.
  • You can switch Microsoft CSP Licensing at any time; there is no tie-in.
  • As your CSP, we can also provide a comprehensive Cloud Operation service for your Azure, Office 365 and Power BI environment.

PaaS Cost Considerations

Azure Data Factory Assumptions (Ingestion)

Daily Calculation

  • 200 pipeline runs per day – each pipeline has 5 activities.
  • That equates to 1000 activities per day.
  • Supposing each pipeline runs for 2 mins, that’s 400 minutes, or roughly 7 hours, of processing per day on the Azure Integration Runtime (IR).

Monthly calculation

  • 6000 pipeline runs per month – each pipeline has 5 activities.
  • That equates to 30000 activities per month.
  • Supposing each pipeline runs for 2 mins, that’s 200 hours of processing per month on Azure IR.
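The daily and monthly figures above can be reproduced with a few lines of Python. This is a minimal sketch: the 30-day month is implied by the 6000 monthly runs, and the constant and function names are my own, chosen for illustration.

```python
# Workload volumes taken from the assumptions above.
RUNS_PER_DAY = 200
ACTIVITIES_PER_RUN = 5
MINUTES_PER_RUN = 2
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month


def adf_volumes(days: int) -> dict:
    """Return pipeline runs, activity runs and processing hours over `days`."""
    runs = RUNS_PER_DAY * days
    return {
        "pipeline_runs": runs,
        "activity_runs": runs * ACTIVITIES_PER_RUN,
        "processing_hours": runs * MINUTES_PER_RUN / 60,
    }


daily = adf_volumes(1)
monthly = adf_volumes(DAYS_PER_MONTH)

print(daily)
print(monthly)
```

The daily processing time comes out at 400 minutes (about 6.7 hours), which the list above rounds to 7 hours; the monthly figure is exactly 200 hours.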

Execution

  • Data movement activities: $0.25 per DIU / hour – assuming the copy activity makes up 75% of the time it takes for a pipeline to complete, that’s 150 hours * 4 DIUs = 600 DIU-hours.
  • Pipeline activities: $0.005 / hour – 200 hours
  • External activities: $0.00025 / hour – N/A
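As a rough sketch, the execution charges implied by these rates can be worked out as follows. It covers only the rates listed above and the 200 monthly hours from the earlier calculation, not any other Data Factory charges; the constant and function names are assumptions made for illustration.

```python
# Rates quoted in the Execution list above (USD).
DATA_MOVEMENT_RATE = 0.25        # per DIU-hour
PIPELINE_ACTIVITY_RATE = 0.005   # per hour

MONTHLY_HOURS = 200   # total pipeline processing time per month
COPY_SHARE = 0.75     # share of run time spent in the copy activity
DIUS = 4              # DIU setting used for the copy activity


def adf_execution_cost(hours: float, copy_share: float, dius: int) -> dict:
    """Monthly execution cost split into data movement and pipeline activities."""
    copy_hours = hours * copy_share   # 150 hours with the values above
    diu_hours = copy_hours * dius     # 600 DIU-hours with the values above
    data_movement = diu_hours * DATA_MOVEMENT_RATE
    pipeline_activities = hours * PIPELINE_ACTIVITY_RATE
    return {
        "diu_hours": diu_hours,
        "data_movement": data_movement,
        "pipeline_activities": pipeline_activities,
        "total": data_movement + pipeline_activities,
    }


cost = adf_execution_cost(MONTHLY_HOURS, COPY_SHARE, DIUS)
for name, value in cost.items():
    print(f"{name}: {value:.2f}")
```

With these inputs, data movement dominates: 600 DIU-hours at $0.25 comes to $150, against $1 for pipeline activities. Halving the DIU setting would halve the data movement charge only if the copy took no longer to complete.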

The default DIU setting on all copy tasks is 4; the minimum you can configure is 2 and the maximum is 256.

All activities are prorated by the minute and rounded up.

As you can see above, I’ve gone with the default DIU setting.
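The per-minute proration and the DIU limits mentioned above can be sketched like this. It is an illustration under the stated assumptions rather than the billing engine’s actual logic, and the function names are hypothetical.

```python
import math

MIN_DIUS = 2      # minimum configurable DIU setting
MAX_DIUS = 256    # maximum configurable DIU setting
DEFAULT_DIUS = 4  # default DIU setting for copy tasks


def billed_minutes(duration_seconds: float) -> int:
    """Round an activity's duration up to the next whole minute."""
    return math.ceil(duration_seconds / 60)


def billed_diu_hours(duration_seconds: float, dius: int = DEFAULT_DIUS) -> float:
    """DIU-hours charged for a single copy activity run."""
    if not MIN_DIUS <= dius <= MAX_DIUS:
        raise ValueError(f"DIUs must be between {MIN_DIUS} and {MAX_DIUS}")
    return billed_minutes(duration_seconds) * dius / 60


# A copy that finishes in 95 seconds is charged as 2 full minutes.
print(billed_minutes(95))
print(billed_diu_hours(95))
```

For a workload made up of many short runs, the rounding matters: a run that overshoots a minute boundary by a second is charged for the whole extra minute.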

Azure SQL DB Assumptions (Storage)

The Azure SQL Database will be an S1 Standard tier single database, giving us 20 DTUs and 250GB of storage.

Azure Analysis Services Assumptions (Analysis)

For Azure Analysis Services, I’ve chosen the Standard Tier – “This tier is for mission-critical production applications that require elastic user-concurrency, and have rapidly growing data models. It supports advanced data refresh for near real-time data model updates and supports all tabular modeling features.”

I’ll assume it runs for 12 hours a day, which equates to 360 hours a month.
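A minimal sketch of the running-hours assumption, assuming a 30-day month and that the server is paused (and not billed for compute) outside its 12 running hours. The function names are my own, for illustration only.

```python
HOURS_PER_DAY = 12   # running hours per day, from the assumption above
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month


def monthly_running_hours(hours_per_day: float, days: int = DAYS_PER_MONTH) -> float:
    """Billable hours per month for a server that runs `hours_per_day`."""
    return hours_per_day * days


def running_fraction(hours_per_day: float) -> float:
    """Share of an always-on bill that is incurred when running part of the day."""
    return hours_per_day / 24


hours = monthly_running_hours(HOURS_PER_DAY)
print(hours, running_fraction(HOURS_PER_DAY))
```

Multiplying the monthly hours by the hourly rate of the chosen instance gives the monthly charge. Under this assumption it is half of what an always-on server would cost.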

Cost Comparison

Quick word on Power BI (Visualization)

The cost considerations for the Power BI Service are beyond the scope of this blog; you can find useful information on Microsoft docs. I would argue that neither the IaaS nor the PaaS architecture would have a cost advantage over the other when connecting to the service, all things being equal. As a result, I’ve not included Power BI in the comparison below.

Conclusion

To decide whether to go IaaS or PaaS for a data analytics workload, you must consider a myriad of factors. Chief among these is your current solution architecture. If you’re already running ETL processes on-prem using SQL Server, the question then becomes “how quickly do I want to realise the benefits of moving workloads to the cloud?” Going down the PaaS route does require a higher upfront investment/effort in terms of re-architecting the solution and retraining developers, and as a result the benefits will be realised in the long term.

By contrast, an IaaS ‘Lift and Shift’ approach requires less upfront investment/effort but will cost more in the long term, as shown in the cost comparison. There is also an added opportunity cost incurred when maintaining infrastructure, which results in increased time to market and increased barriers to entry, to name a few.

Where feasible, I recommend a PaaS first approach in order to realise the full benefits of the cloud. Telefónica Tech is happy to discuss your journey to the cloud. Through our data architecture services we can help you design and build your ideal data architecture – applying best practices, and getting the most from your people, technology and budget. Get in touch.