This is part of a group of posts that aims to demystify the cloud for non-cloud experts and provide a framework in which to think about the services the cloud can provide. In essence, to make the cloud simple.

It follows the premise that all the cloud providers offer essentially the same types of core services, with varying strengths and benefits.  As we progress to a situation where most organisations use multiple clouds for different purposes, it is useful to understand the higher-level concepts and how they relate to each other.  Only then can you go down to the next level and understand what makes one cloud different from another, and why you might use one cloud provider for one use case and another for a different one.

To recap, in the first post of the series, we talked about the high level service categories which can be broken down into: 


- Core Services: Compute, Networking, Storage, Security
- Data & Analytics: Data Integration, Databases, ML/Artificial Intelligence, Analytics
- Enterprise: Hybrid Connectivity, Integration, Workflow, Search, Management, Migration
- Development: Application Discovery, Application Migration, Data Migration, Developer Tools, DevOps Pipelines
- Mobile & IoT: Mobile, IoT
- Global Infrastructure (Regions, Availability Zones)

In this post, let's go down to the next level and look at the service types that make up the Data & Analytics category: Data Processing, Databases, Machine Learning/Artificial Intelligence (ML/AI) and Analytics.  We will cover the remaining categories in subsequent posts.

Data Processing

Data Processing can be complex as there are many different ways to move and process data in the cloud ecosystem, depending on your specific requirements.  At a very high level, you can process data either in real time or in batches.  All the sub-services enable one of those two scenarios.  The core service types are:


| Service Type | Description | AWS | Azure | GCP |
| --- | --- | --- | --- | --- |
| Stream Data Processing | Managed streaming data | Amazon Kinesis | Stream Analytics | Cloud Dataflow |
| Messaging | Asynchronous messaging | Amazon Simple Queue Service (SQS) | Service Bus | Cloud Pub/Sub |
| Hadoop | Managed Hadoop clusters | Amazon Elastic MapReduce | HDInsight | Cloud Dataproc |
| Spark | Managed Spark clusters | Amazon Elastic MapReduce | Azure Databricks | Cloud Dataproc |
| Job Orchestration | Managed orchestration service | AWS Data Pipeline | Azure Data Factory | Cloud Dataflow |
| ETL | Extract, Transform & Load | AWS Glue | Azure Data Factory | Cloud Dataprep |
| Data Catalog | Metadata management service | AWS Glue Data Catalog | Data Catalog | Data Catalog |
| Compute Intensive Processing | Managed large batch processes | AWS Batch | Azure Batch | N/A |


Stream data processing services allow you to accept data in a piecemeal fashion and process it on the fly.  They provide features to read and replay records, transform data and perform streaming analytics on each piece.  Similar, but used for a different purpose, are the messaging services.  These allow one application to asynchronously send messages to a queue, where they are held until other applications pick them up in either a FIFO (ordered) or standard (unordered) fashion.
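To make the messaging pattern concrete, here is a minimal sketch using AWS SQS via the boto3 Python SDK; the queue URL, region and message content are placeholders, and Service Bus and Cloud Pub/Sub expose equivalent send/receive/acknowledge operations through their own SDKs.

```python
# A minimal sketch of asynchronous messaging using AWS SQS via boto3.
# The queue URL below is a placeholder; Service Bus and Pub/Sub offer
# equivalent publish/receive/acknowledge operations through their own SDKs.
import json
import boto3

sqs = boto3.client("sqs", region_name="ap-southeast-2")
queue_url = "https://sqs.ap-southeast-2.amazonaws.com/123456789012/orders"  # placeholder

# Producer: one application drops a message onto the queue and moves on.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"order_id": 42, "status": "NEW"}),
)

# Consumer: another application picks messages up later, then deletes them
# once processed so they are not re-delivered.
response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=5)
for message in response.get("Messages", []):
    print("processing", message["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```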

Batch processing is typically done by a combination of underlying service types.  A key element of batch processing is the use of a job orchestration tool, which allows you to tie together discrete processing steps into a cohesive flow.  The good thing is that those discrete steps can be written in a variety of underlying languages/tools, which allows for high flexibility.  This often goes hand in hand with an ETL tool, which provides the core functions to process your data.
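As a rough illustration of the orchestration idea (not any particular provider's service), the toy pipeline below chains discrete extract, transform and load steps in plain Python; managed services such as AWS Data Pipeline or Azure Data Factory add scheduling, dependencies, retries and monitoring on top of this basic pattern.

```python
# A toy illustration of the orchestration idea: discrete, independent steps
# tied together into one cohesive flow. Managed orchestration services add
# scheduling, retries, dependencies and monitoring on top of this pattern.
def extract():
    # In practice this might pull files from object storage or call an API.
    return [{"customer": "acme", "amount": "120.50"}, {"customer": "globex", "amount": "88.00"}]

def transform(rows):
    # Each step can be written in whatever language/tool suits it best.
    return [{**row, "amount": float(row["amount"])} for row in rows]

def load(rows):
    # Here we just print; a real step would write to a warehouse or database.
    for row in rows:
        print("loading", row)

def run_pipeline():
    load(transform(extract()))

if __name__ == "__main__":
    run_pipeline()
```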

A key service used for more complex data processing is Hadoop.  It was all the rage a few years ago, but is slowly losing its grip on the big data space as the more generic services from the cloud providers also cater for the classic 3 V's - Velocity, Volume and Variety.  Each cloud provider still offers an instantiation of Hadoop, which gives you access to a wide variety of open source tools.  A key offshoot from the Hadoop ecosystem is Spark, an in-memory processing framework that has gained a lot of traction.  Spark has built its own ecosystem that allows you to perform ETL, machine learning, streaming and graph processing.
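For a flavour of what Spark code looks like, here is a minimal PySpark sketch that aggregates a hypothetical sales file; it assumes access to a Spark runtime (EMR, Databricks, Dataproc or a local install), and the bucket paths are illustrative only.

```python
# A minimal PySpark sketch (assumes a Spark runtime such as EMR, Databricks,
# Dataproc or a local install, and a hypothetical sales.csv file).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales-summary").getOrCreate()

# Read raw data, aggregate it in memory across the cluster, and write the result.
sales = spark.read.csv("s3://my-bucket/raw/sales.csv", header=True, inferSchema=True)
summary = sales.groupBy("region").agg(F.sum("amount").alias("total_amount"))
summary.write.mode("overwrite").parquet("s3://my-bucket/curated/sales_by_region/")
```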

Lastly, a newer type of service that is readily available in the cloud is the ability to run compute-intensive processing tasks in a cost-effective manner, e.g. mapping the universe.  It provides the ability to schedule and run tasks on a cloud provider's spare capacity.  It is therefore a lot cheaper than running on on-demand instances, but your processes need to be able to gracefully restart if the underlying node gets taken away.
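The "gracefully restart" requirement usually comes down to checkpointing. The sketch below is a simple, hypothetical illustration of the pattern: record progress as you go so a replacement node can resume where the reclaimed one left off (a real job would typically write the checkpoint to object storage or a database rather than local disk).

```python
# A sketch of graceful restart for spare-capacity batch jobs: persist a
# checkpoint as work completes, so a replacement node can resume from it.
# The paths and the work_on() function are hypothetical.
import json
import os

CHECKPOINT_FILE = "checkpoint.json"

def load_checkpoint():
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["last_completed"]
    return -1

def save_checkpoint(index):
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_completed": index}, f)

def work_on(item):
    print("processing item", item)

def run(items):
    start = load_checkpoint() + 1  # resume after the last completed item
    for i in range(start, len(items)):
        work_on(items[i])
        save_checkpoint(i)  # if the node disappears, a new one resumes from here

if __name__ == "__main__":
    run(list(range(100)))
```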

Databases

Every organisation needs a database to manage its data and make information easier to consume.  Managing databases has typically been a high-maintenance activity - sizing, indexing, backup, recovery and so on.  All cloud providers are working hard to handle many of these aspects on your behalf, so you just need to concentrate on the data structures and the data itself.  You could still install your own DB on a base IaaS machine, but you probably aren't maximising the benefits of the cloud that way.  The core service types are:


| Service Type | Description | AWS | Azure | GCP |
| --- | --- | --- | --- | --- |
| RDBMS | Managed relational DB | Amazon Relational Database Service (RDS) | SQL Database | Cloud SQL |
| High Availability Managed RDBMS | High-performance managed relational DB | Amazon Aurora | SQL Database Managed Instance | Cloud Spanner |
| NoSQL DB | Managed NoSQL DB | Amazon DynamoDB | Cosmos DB | Cloud Bigtable |
| Data Warehouse | Horizontally scalable DB | Amazon Redshift | SQL Data Warehouse | BigQuery |
| Caching | In-memory data store | Amazon ElastiCache | Azure Cache for Redis | Cloud Memorystore |

Relational Database Management Systems (RDBMS) have long been part of the core backbone of enterprise applications, and this is likely to remain the case.  The key with this service, though, is the ease of instantiation and the management of core operational functions.  An instance is typically restricted to one region but can be replicated to other regions for read performance and disaster recovery.  They all support core database flavours such as PostgreSQL or MySQL.
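One practical consequence is that a managed RDBMS still looks like an ordinary PostgreSQL or MySQL endpoint to your application. The sketch below uses the standard psycopg2 driver against a placeholder hostname such as an RDS, Azure Database for PostgreSQL or Cloud SQL instance; the credentials, table and query are illustrative only.

```python
# A managed RDBMS is still just a PostgreSQL/MySQL endpoint to your code.
# The hostname, credentials, table and query below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="mydb.example-endpoint.ap-southeast-2.rds.amazonaws.com",  # placeholder endpoint
    dbname="appdb",
    user="app_user",
    password="change-me",
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT customer_id, total FROM orders WHERE total > %s", (100,))
    for customer_id, total in cur.fetchall():
        print(customer_id, total)

conn.close()
```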

However, the cloud providers have gone one step further and provided more proprietary versions that really maximise the benefit of the cloud - the high availability managed RDBMS.  These provide a global footprint and a higher level of management abstraction.  Of course, expect to pay more for this extra functionality.

NoSQL databases came about with the internet era to provide extremely fast lookups that can support millions, and now billions, of users.  They do have constraints - they don't fully support relationships and transactional consistency - but these limitations are slowly being overcome.  NoSQL databases have been designed to favour scalability, performance and availability over the consistency of relational databases.
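As a small illustration of the key/value access pattern that makes these lookups fast, here is a hypothetical sketch using DynamoDB via boto3; the table name and attributes are made up, and Cosmos DB and Cloud Bigtable offer the same style of access through their own SDKs.

```python
# A small sketch of the NoSQL key/value lookup pattern using DynamoDB via boto3.
# The table name and attributes are hypothetical; Cosmos DB and Bigtable support
# the same style of access through their own SDKs.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="ap-southeast-2")
users = dynamodb.Table("users")  # assumes a table keyed on user_id

# Writes and reads are single-item operations addressed by key - this is what
# keeps lookups fast at very large scale.
users.put_item(Item={"user_id": "u-123", "name": "Alice", "plan": "premium"})
item = users.get_item(Key={"user_id": "u-123"}).get("Item")
print(item)
```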

Finally, most organisations have a data warehouse: a specially designed, highly scalable database built to handle large volumes of data.  They are more closely aligned to an RDBMS than to NoSQL.  While each cloud provider has its own flavour, there are competitive managed cloud-based data warehouses available from external providers, e.g. Snowflake.

One consequence of having a variety of database options is that it is no longer a one-size-fits-all world.  It is now easy to use a traditional RDBMS for one part of an application and an in-memory datastore or NoSQL DB for another part.
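For example, a common mix is the cache-aside pattern: the relational database remains the source of truth while an in-memory store serves hot reads. The sketch below is a hypothetical illustration using the redis-py client (as you might against ElastiCache, Azure Cache for Redis or Memorystore); query_database() and the hostname are placeholders.

```python
# A sketch of mixing stores: a relational database remains the source of truth,
# while an in-memory cache serves hot reads (cache-aside pattern).
# query_database() and the hostname are hypothetical placeholders.
import json
import redis

cache = redis.Redis(host="my-cache.example.internal", port=6379)

def query_database(product_id):
    # Placeholder for a real RDBMS query.
    return {"product_id": product_id, "name": "Widget", "price": 9.95}

def get_product(product_id):
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached:
        return json.loads(cached)                 # cache hit: served from memory
    product = query_database(product_id)          # cache miss: fall back to the RDBMS
    cache.setex(key, 300, json.dumps(product))    # keep it warm for 5 minutes
    return product

print(get_product(42))
```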

Machine Learning/Artificial Intelligence

Given the rise and exposure of AI at the moment, it's no surprise that each cloud provider is focusing heavily on building out its ML and AI toolsets.  The list below covers just the better-known services.  More and more specialised toolsets have been, and are being, released to cover different types of AI, e.g. video analysis vs. text analysis vs. speech analysis.


| Service Type | Description | AWS | Azure | GCP |
| --- | --- | --- | --- | --- |
| ML Studio | Managed platform for ML | Amazon SageMaker | Machine Learning Studio | Cloud Datalab |
| AI Studio | Managed platform for AI | TensorFlow | Cognitive Services | AI Platform |
| Conversational Interface | Managed conversational bot service | Amazon Lex | Azure Bot Service | Dialogflow Enterprise Edition |

Providing a short description for these service types will not do them justice.  The general theme is that managed platforms are being provided to help with the end-to-end lifecycle: from ingestion of data, through model development and training, to testing and even deployment to a production environment.  While the providers are constantly building out the frameworks and toolsets, it is all geared towards increasing developer productivity.
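To ground what "model development and training" means at its simplest, here is a deliberately tiny scikit-learn sketch of the train/evaluate step; the managed platforms wrap this kind of workflow with data ingestion, experiment tracking, scalable training infrastructure and deployment.

```python
# A deliberately tiny sketch of the train/evaluate step in the ML lifecycle,
# using scikit-learn and its bundled iris dataset. Managed ML platforms wrap
# this kind of workflow with ingestion, tracking, training infra and deployment.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```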

Analytics

Last but not least are the business intelligence tools that allow users to slice, dice and interpret the data.  The core services are:


| Service Type | Description | AWS | Azure | GCP |
| --- | --- | --- | --- | --- |
| Analytics | Business Intelligence tool | Amazon QuickSight | Power BI | BigQuery |
| Object Analytics | Object storage SQL query tool | Amazon Athena | | |

Business Intelligence tools allow you to define standard reports for consumption as well as provide an easy mechanism for self-service analysis.  Given that an endpoint is all that is needed for analytics tools to access the data, traditional BI toolsets such as Tableau, Cognos, etc. are still major players in this space.

One area that is a great step forward is the emergence of tools that can query the data sitting in the underlying data lake/object storage using standard SQL.  This means you don't necessarily need to load all the data into a database for it to be easily accessed.  In most databases you also have the ability to create a view on top of an external file, so you can query it without actually loading it into the database.
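As an example of querying object storage in place, the sketch below submits a SQL query through Amazon Athena using boto3; the database, table and bucket names are hypothetical, and the other providers offer comparable "query in place" options over their own storage.

```python
# A sketch of querying files sitting in object storage with plain SQL, using
# Amazon Athena via boto3. Bucket, database and table names are hypothetical.
import time
import boto3

athena = boto3.client("athena", region_name="ap-southeast-2")

execution = athena.start_query_execution(
    QueryString="SELECT region, SUM(amount) AS total FROM sales GROUP BY region",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```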

Summary

In this post, we have dug a little deeper and explained the different types of data and analytics services you will find across most cloud providers.  

One of the most exciting aspects of the new cloud environment is the ease of mixing and matching databases and data processing tools depending on what is the best fit.  In the old world, you were typically stuck with your standard ETL tool, database and reporting tool.  Now, since there is no real cost in standing a new tool up and you pay only for usage, you can use the best tool for the job.  While this provides flexibility, it also introduces complexity, as you now need to create standards around when to use what.

As mentioned previously, while it is useful to understand conceptually what types of services a cloud provider has, the true magic happens when you start integrating them into patterns for DevOps, microservices, and data & analytics.  We will cover these patterns in later posts.

For access to the full reference list, download our free resource here.


If you want more information regarding any of our services to help reduce the complexity of the cloud, please contact us at contact@cloudmill.com.au.
Tony // AUTHOR

Tony is a cloud, data and analytics professional with over 24 years' experience and deep expertise in cloud technologies (holding expert certifications in AWS, Azure and GCP).
