This is part of a group of posts that aims to demystify the cloud for non-cloud experts and provide a framework in which to think about the services cloud can provide. In essence, to make cloud simple.
It follows the premise that all the cloud providers have essentially the same types of core services, with various strengths and benefits. As we progress to a situation where most organisations use multiple clouds for different purposes, it is useful to understand the higher concepts and how they relate to each other. Only then can you go down to the next level and understand what makes one cloud different to another, and why you might use one cloud provider for one use case and another cloud for a different one.
To recap, in the first post of the series, we talked about the high level service categories which can be broken down into:
Core Services | Data & Analytics | Enterprise |
Compute Networking Storage Security | Data Integration Databases ML/Artificial Intelligence Analytics | Hybrid Connectivity Integration Workflow Search Management |
Migration | Development | Mobile & IoT |
Application Discovery Application Migration Data Migration | Developer Tools DevOps Pipelines | Mobile IoT |
Global Infrastructure (Regions, Availability Zones) |
In this post, let’s go down to the next level and understand the service types that fit into the Data & Analytics category: Data Processing, Databases, Machine Learning/Artificial Intelligence (ML/AI) and Analytics. We will cover the remaining categories in following posts.
Data Processing
Data Processing can be complex as there are a lot of different ways you can move and process data in the cloud ecosystem depending on your specific requirements. At a very high level you can process data in either real-time or batches. All the sub-services help enable either one of those two scenarios. The core service types are:
Service Type | Description | AWS | Azure | GCP |
Stream Data Processing | Managed streaming data | |||
Messaging | Asynchronous messaging | |||
Hadoop | Managed Hadoop clusters | |||
Spark | Managed Spark clusters | |||
Job Orchestration | Managed orchestration service | |||
ETL | Extract, Transform & Load | |||
Data Catalog | Metadata management service | |||
Compute Intensive Processing | Managed large batch processes | N/A |
Stream data processing services allow you to accept data in a piecemeal fashion and process it on the fly. They have special features to read and replay records, transform data and perform stream analytics on each piece. Similar, but used for a different purpose, are the messaging services. These allow one application to asynchronously send messages to a queue, where they are held until other applications pick them up in either a FIFO or sporadic fashion.
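To make the messaging idea concrete, here is a minimal sketch using only Python's standard library. It is not any cloud provider's API: the function name `run_producer_consumer` and the sample message names are made up for illustration, and an in-process queue stands in for a managed messaging service.

```python
import queue
import threading


def run_producer_consumer(messages):
    """Send messages through a FIFO queue and return them in the order received."""
    q = queue.Queue()      # stand-in for a managed message queue
    received = []
    sentinel = object()    # marker that tells the consumer to stop

    def consumer():
        while True:
            item = q.get()  # blocks until a message is available
            if item is sentinel:
                break
            received.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()

    # The producer puts messages on the queue and moves on;
    # it does not wait for the consumer to process each one.
    for message in messages:
        q.put(message)
    q.put(sentinel)

    worker.join()
    return received


result = run_producer_consumer(["order-1", "order-2", "order-3"])
```

With a single consumer, the messages come back in the order they were sent, which is the FIFO behaviour described above.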
Batch processing is typically done by a combination of underlying service types. A key element of batch processing is the use of a job orchestration tool, which allows you to tie together discrete processing steps into a cohesive flow. The good thing is that those discrete steps can be written in a variety of underlying languages/tools, which allows for high flexibility. This often goes hand in hand with the ETL tool, which provides core functions to process your data.
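The orchestration idea can be sketched in a few lines of Python. This is an illustration only, assuming Python 3.9+ for the standard library `graphlib` module; the step names (`extract`, `transform`, `load`) and the `run_pipeline` helper are hypothetical and do not correspond to any particular cloud service.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies):
    """Run each step after the steps it depends on; return the order and shared context."""
    # static_order() yields step names so that dependencies always come first.
    order = list(TopologicalSorter(dependencies).static_order())
    context = {}
    for name in order:
        steps[name](context)
    return order, context


def extract(ctx):
    ctx["rows"] = [" alice ", "BOB"]


def transform(ctx):
    ctx["clean"] = [row.strip().lower() for row in ctx["rows"]]


def load(ctx):
    ctx["loaded"] = len(ctx["clean"])


steps = {"extract": extract, "transform": transform, "load": load}
# Each key maps a step to the set of steps that must finish before it.
dependencies = {"transform": {"extract"}, "load": {"transform"}}

order, context = run_pipeline(steps, dependencies)
```

A managed orchestration service adds scheduling, retries and monitoring on top of this, but the core job is the same: run discrete steps in dependency order.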
A key service used for more complex data processing is Hadoop. It was all the rage a few years ago, but is slowly losing its grip on the big data space as the more generic services from the cloud providers also allow you to cater for the classic 3 Vs: Velocity, Volume and Variety. Each cloud provider still has an instantiation of Hadoop, however, which gives you access to a wide variety of open source tools. A key offshoot from the Hadoop ecosystem is Spark, an in-memory processing framework that has gained a lot of traction. Spark has built its own ecosystem that allows you to perform ETL, Machine Learning, Streaming and Graph processing.
Lastly, a new type of service which is easily available in the cloud is the ability to run compute intensive processing tasks in a cost effective manner, e.g. mapping the universe. This provides the ability to schedule and run tasks on a cloud provider’s spare capacity. It is therefore a lot cheaper than running on on-demand instances, but your processes will need to be able to gracefully restart if the underlying node gets taken away.
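One common way to make a process restartable is to checkpoint its progress. The sketch below is a simplified illustration in Python, not a provider feature: the function `process_with_checkpoint`, the `fail_at` parameter (used only to simulate a node being taken away) and the local JSON checkpoint file are all assumptions made for the example. In practice the checkpoint would live in durable storage outside the node.

```python
import json
import os
import tempfile


def process_with_checkpoint(items, checkpoint_path, fail_at=None):
    """Sum the items, saving progress after each one so a restart can resume."""
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # Resume from the last saved position instead of starting again.
        with open(checkpoint_path) as f:
            state = json.load(f)

    for i in range(state["next_index"], len(items)):
        if fail_at is not None and i == fail_at:
            # Simulated loss of the underlying spare-capacity node.
            raise RuntimeError("node reclaimed")
        state["total"] += items[i]
        state["next_index"] = i + 1
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)

    return state["total"]


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
items = [1, 2, 3, 4, 5]

try:
    process_with_checkpoint(items, checkpoint, fail_at=3)  # interrupted part-way
except RuntimeError:
    pass

total = process_with_checkpoint(items, checkpoint)  # restart picks up where it left off
```

The second call does not redo the first three items; it reads the checkpoint and only processes what remains.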
Databases
Every organisation needs a database to manage its data and make information easier to consume. Managing databases has typically been a high maintenance activity involving sizing, indexing, backup, recovery etc. All cloud providers are working hard to manage many of these aspects on your behalf so you just need to concentrate on the data structures and data. You could still install your own DB on a base IaaS machine but you probably aren’t maximising the benefits of the cloud that way. The core service types are:
Service Type | Description | AWS | Azure | GCP |
RDBMS | Managed relational DB | |||
High Availability Managed RDBMS | High Performance managed relational DB | |||
NoSQL DB | Managed NoSQL DB | |||
Data Warehouse | Horizontally scalable DB | |||
Caching | In-memory data store |
Relational Database Management Systems (RDBMS) have long been part of the core backbone of an enterprise application. This is likely to remain the case. The key with this service though is the ease of instantiation and management of core operations functions. A managed RDBMS is typically restricted to one region but can be replicated to other regions for read performance and disaster recovery. The cloud providers all support core database flavours such as PostgreSQL or MySQL.
However, the cloud providers have gone one step further and provided a more proprietary version that really maximises the benefit of the cloud, a.k.a. the high availability managed RDBMS. These allow the provision of a global footprint and a higher level of management abstraction. Of course, expect to pay more for this extra functionality.
NoSQL databases came about with the internet era to provide extremely fast lookups to support millions and now billions of users. They do have constraints as they don’t fully support relationships and transaction consistency, but these limitations are slowly being overcome. NoSQL databases have been designed to favour scalability, performance, and availability over the consistency of relational databases.
Finally, most organisations have a data warehouse: a specially designed database that is built to be very scalable and can handle large volumes of data. Data warehouses are more closely aligned to an RDBMS than to NoSQL. While each cloud provider has its own flavour, there are other competitive managed cloud based data warehouses available from external providers, e.g. Snowflake.
One of the benefits of having a variety of database options is that it is no longer a one size fits all world. It is now easy to use a traditional RDBMS for one part of the application and an in-memory datastore or NoSQL DB for another part.
Machine Learning/Artificial Intelligence
Given the rise and exposure of AI at the moment, it's no surprise that each cloud provider is focusing heavily on building out its ML and AI toolsets. The services listed below are just the more well known ones. More and more specialised toolsets have been and are being released to cover different types of AI, e.g. video analysis vs. text analysis vs. speech analysis.
Service Type | Description | AWS | Azure | GCP |
ML Studio | Managed platform for ML | |||
AI Studio | Managed platform for AI | |||
Conversational Interface | Bot Framework |
Providing a short description for these service types will not do them justice. The general theme is that managed platforms are being provided to help with the end to end lifecycle, from ingestion of data, through model development and training, to testing and even migration to a production environment. While the providers are constantly building out the frameworks and toolsets, it is all geared towards increasing the productivity of the developer.
Analytics
Last but not least are the business intelligence tools that allow users to slice/dice and interpret the data. The core services are:
Service Type | Description | AWS | Azure | GCP |
Analytics | Business Intelligence tool | |||
Object Analytics | Object Storage SQL Query tool |
Business Intelligence tools allow you to define standard reports for consumption as well as provide an easy mechanism to do some self service analysis. Given an endpoint is all that is needed for analytics tools to access the data, the traditional BI toolsets such as Tableau, Cognos, etc. are still major players in this space.
One area that is a great step forward is the emergence of tools that can just query the data in the underlying data lake/object storage using standard SQL. This means you don’t necessarily need to load all the data into a database for it to be easily accessed. You also have the ability in most databases to just create a view on top of an external file so you can query without actually loading it into the database.
Summary
In this post, we have dug a little deeper and explained the different types of data and analytics services you will find across most cloud providers.
One of the most exciting aspects of the new cloud environment is the ease of mixing and matching databases and data processing tools depending on what is the best fit. In the old world, you were typically stuck with your standard ETL tool, database and reporting tool. Now, since there is no real cost in standing a new tool up, and you are just paying for usage, you can use the best tool for the job. While this provides flexibility, it also introduces complexity as you now need to create standards around when to use what.
As mentioned previously, while it is useful to understand conceptually what types of services a cloud provider has, the true magic happens when you start integrating them into patterns for devops, microservices, and data & analytics. We will cover these patterns in later posts.
For access to the full reference list, download our free resource here.