Storage and Databases

Summary

Amazon Elastic Block Store (EBS)

Block-level storage stores files as a series of blocks on disk. When a file is modified, only the blocks that changed are updated, not the whole file.
Amazon EBS is well suited for:
– Databases
– Enterprise Software
– File systems
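To make the block-level idea concrete, here is a minimal Python sketch (a toy model, not AWS code) that splits two versions of a file into fixed-size blocks and reports which blocks differ. Only those blocks would need to be written. The 4-byte block size and the function names are hypothetical, chosen only for illustration.

```python
BLOCK_SIZE = 4  # bytes per block; deliberately tiny (real volumes use far larger blocks)


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Split a file's contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old: bytes, new: bytes, size: int = BLOCK_SIZE) -> list:
    """Return the indexes of blocks that differ between two versions of a file."""
    old_b, new_b = split_blocks(old, size), split_blocks(new, size)
    n = max(len(old_b), len(new_b))
    # A block counts as changed if it was added, removed, or has different bytes.
    return [i for i in range(n)
            if i >= len(old_b) or i >= len(new_b) or old_b[i] != new_b[i]]


if __name__ == "__main__":
    old = b"AAAABBBBCCCCDDDD"
    new = b"AAAABBBBXXXXDDDD"        # only the third block was edited
    print(changed_blocks(old, new))  # prints: [2]
```

Rewriting one block out of four is the saving the paragraph above describes; with large files and small edits the difference grows accordingly.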

When an EC2 instance is launched, it can be provided (depending on the instance type) with local storage called instance store volumes. These volumes are physically attached to the host that the instance runs on. The catch is that when the EC2 instance is stopped or terminated, all data written to an instance store volume is deleted: an EC2 instance is a virtual machine, and the next time it is started it will likely run on a different host. Instance store volumes are therefore useful when data can be lost without impact, such as temporary files, scratch data, and data that can be easily re-created.

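Amazon Elastic Block Store (EBS) provides storage volumes that are not tied to the host the EC2 instance runs on, so data on them persists when the instance is stopped. EBS volumes come in different sizes, types, and configurations. EBS lets you create incremental backups of your data called snapshots: the first snapshot copies all the data, and each later snapshot saves only the blocks that changed since the previous one. If data becomes corrupted, it can be restored from a snapshot.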
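The incremental-snapshot idea can be sketched as a chain in which each snapshot stores only the blocks that differ from what earlier snapshots already captured, and a restore replays the chain from oldest to newest. This is a toy Python model under simplifying assumptions (a volume is a dict of block index to bytes, and blocks are never removed); it is not the EBS API, and the `Snapshot`, `take_snapshot`, and `restore` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """Toy model of an incremental snapshot (not the EBS API)."""
    changed: dict                          # block index -> bytes saved by THIS snapshot only
    parent: Optional[Snapshot] = None      # previous snapshot in the chain


def restore(snap: Optional[Snapshot]) -> dict:
    """Rebuild the full volume by replaying the chain from oldest to newest."""
    chain = []
    while snap is not None:
        chain.append(snap)
        snap = snap.parent
    volume = {}
    for s in reversed(chain):
        volume.update(s.changed)           # newer blocks overwrite older ones
    return volume


def take_snapshot(volume: dict, previous: Optional[Snapshot]) -> Snapshot:
    """Save only the blocks that differ from what the previous snapshots captured."""
    baseline = restore(previous)
    changed = {i: b for i, b in volume.items() if baseline.get(i) != b}
    return Snapshot(changed, previous)


if __name__ == "__main__":
    volume = {0: b"boot", 1: b"data", 2: b"logs"}
    first = take_snapshot(volume, None)    # first snapshot: every block is copied
    volume[2] = b"LOGS"                    # modify one block
    second = take_snapshot(volume, first)  # incremental: only block 2 is saved
    print(len(first.changed), len(second.changed))  # prints: 3 1
    print(restore(second) == volume)                # prints: True
```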

In general, Amazon EBS handles large files efficiently: a change produces a delta update of the affected blocks instead of rewriting the whole file.


Amazon Simple Storage Service (S3)

Amazon S3 lets you store and retrieve an unlimited amount of data at any scale. Amazon S3 is:
– serverless: unlike Amazon EBS, it is not attached to EC2 instances
– web-enabled: objects are accessed over the web through URLs and APIs
– regionally distributed, which gives it high durability
– cost-effective: you pay only for the storage you use

Amazon S3:
– stores data as objects (each file is an object)
– stores objects in buckets (a bucket is a container for objects, roughly like a top-level directory)
– supports objects up to a maximum size of 5 TB
– can keep previous versions of objects when versioning is enabled on a bucket, which protects them from accidental deletion or overwrites
– lets you create multiple buckets and store objects across different storage classes (tiers)
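Versioning is turned on per bucket; with boto3 this is the S3 client's `put_bucket_versioning(Bucket=..., VersioningConfiguration={"Status": "Enabled"})` call. The Python sketch below does not talk to S3. It is a hypothetical in-memory model (`VersionedBucket` is an illustrative name) showing why versioning protects against accidental deletion: a delete without a version ID only adds a "delete marker" on top, and older versions remain retrievable.

```python
from __future__ import annotations

import itertools


class VersionedBucket:
    """Toy in-memory model of a bucket with versioning enabled (not the S3 API)."""

    def __init__(self) -> None:
        # key -> list of (version_id, data); data is None for a delete marker
        self._versions: dict = {}
        self._ids = itertools.count(1)

    def put(self, key: str, data: bytes) -> str:
        """Store a new version of the object and return its version ID."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key: str) -> None:
        """Delete without a version ID: add a delete marker, keep older versions."""
        self._versions.setdefault(key, []).append((f"v{next(self._ids)}", None))

    def get(self, key: str, version_id: str | None = None) -> bytes | None:
        """Return the latest version, or a specific one if version_id is given."""
        for vid, data in reversed(self._versions.get(key, [])):
            if version_id is None or vid == version_id:
                return data
        return None


if __name__ == "__main__":
    bucket = VersionedBucket()
    v1 = bucket.put("report.txt", b"draft")
    bucket.put("report.txt", b"final")
    bucket.delete("report.txt")           # accidental delete
    print(bucket.get("report.txt"))       # prints: None (latest version is a delete marker)
    print(bucket.get("report.txt", v1))   # prints: b'draft' (older version is still retained)
```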

Storage classes (tiers) in S3 include:

– Amazon S3 Standard: for frequently accessed data; designed for 99.999999999% (11 nines) durability. Data is stored in at least three facilities (Availability Zones), so it survives even the loss of an entire facility
– Amazon S3 Standard-Infrequent Access (S3 Standard-IA): for data that is not accessed often but requires rapid access when needed; ideal for backups, disaster recovery files, or any object that requires long-term storage
– Amazon S3 Glacier: for archiving data. Data can be uploaded directly to Amazon S3 Glacier, or moved there with S3 lifecycle management, which automatically transitions objects between tiers based on rules you define
– Amazon S3 Glacier Deep Archive: the lowest-cost storage class, for long-term archives where retrieving objects within 12 hours is acceptable
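A lifecycle policy is a set of rules attached to a bucket. The sketch below builds a rule in the dictionary shape accepted by boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)` and adds a simplified helper that predicts which storage class an object of a given age would be in. The rule ID, the `backups/` prefix, the day thresholds, and the `storage_class_for_age` helper are illustrative assumptions; S3 itself evaluates the rules, the helper only mimics the outcome.

```python
import json

# Hypothetical rule: ID, prefix and day thresholds are illustrative, not recommendations.
LIFECYCLE_CONFIG = {
    "Rules": [
        {
            "ID": "archive-old-backups",
            "Filter": {"Prefix": "backups/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_for_age(age_days: int, rule: dict) -> str:
    """Simplified model: the last transition whose threshold the object has reached."""
    current = "STANDARD"  # objects start in S3 Standard in this example
    for t in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= t["Days"]:
            current = t["StorageClass"]
    return current


if __name__ == "__main__":
    rule = LIFECYCLE_CONFIG["Rules"][0]
    for age in (0, 45, 120, 400):
        print(age, storage_class_for_age(age, rule))
    # With boto3 installed and credentials configured, the same dict could be applied:
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="my-example-bucket", LifecycleConfiguration=LIFECYCLE_CONFIG)
    print(json.dumps(LIFECYCLE_CONFIG, indent=2))
```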

Amazon S3 on Outposts delivers object storage to your on-premises AWS Outposts environment. It is designed to store data durably and redundantly across multiple devices and servers on your Outposts, and works well for workloads with local data residency requirements that must also meet demanding performance needs, since it keeps data close to on-premises applications.


Amazon Elastic File System (EFS)

Amazon EFS is a managed file system that multiple instances can access at the same time. It scales up and down automatically as files are added and removed, without any action from the user.
The main differences between EBS and EFS:
Amazon EBS:
– volumes are attached to EC2 instances
– is an Availability Zone-level resource, so a volume must be in the same Availability Zone (AZ) as the EC2 instance that uses it
– volumes do not scale automatically
Amazon EFS:
– multiple instances can read from and write to it at the same time
– is a Linux file system
– is a regional resource
– scales automatically
– can be accessed by on-premises servers using AWS Direct Connect


Amazon Relational Database Service (Amazon RDS)

Amazon RDS is a service that enables you to run relational databases in the AWS Cloud.
Amazon RDS comes with features such as:
– Automated patching
– Backups
– Redundancy
– Failover
– Disaster recovery
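Several of these managed features map onto parameters of the boto3 RDS `create_db_instance` call. The sketch below only builds the keyword arguments and does not contact AWS; the `rds_instance_params` helper, the identifier, instance class, storage size, username, and retention period are illustrative assumptions rather than recommended settings.

```python
def rds_instance_params(identifier: str, engine: str = "postgres") -> dict:
    """Hypothetical helper: keyword arguments for boto3's RDS create_db_instance."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": "db.t3.micro",   # illustrative, small instance class
        "AllocatedStorage": 20,             # storage size in GiB
        "MasterUsername": "dbadmin",
        "ManageMasterUserPassword": True,   # RDS keeps the password in AWS Secrets Manager
        "AutoMinorVersionUpgrade": True,    # automated patching of minor engine versions
        "BackupRetentionPeriod": 7,         # automated backups kept for 7 days
        "MultiAZ": True,                    # standby in another AZ for redundancy and failover
    }


if __name__ == "__main__":
    params = rds_instance_params("demo-db")
    print(params)
    # With boto3 installed and credentials configured:
    # boto3.client("rds").create_db_instance(**params)
```

Because the helper returns a plain dict, the settings can be reviewed or tested before any resource is created.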

Amazon RDS is available on six database engines, which optimize for memory, performance, or input/output (I/O). Supported database engines include:
– Amazon Aurora
– PostgreSQL
– MySQL
– MariaDB
– Oracle Database
– Microsoft SQL Server

Amazon Aurora is an enterprise-class relational database. It is compatible with MySQL and PostgreSQL and is up to five times faster than standard MySQL databases and up to three times faster than standard PostgreSQL databases. Amazon Aurora is cost-effective, at around one-tenth the cost of commercial databases. Aurora replicates data across facilities, keeping six copies of the data across three Availability Zones at any given time.


Amazon DynamoDB

Amazon DynamoDB is a key-value, non-relational (NoSQL) database. It is serverless: you do not need to provision or manage servers such as EC2 instances. DynamoDB delivers single-digit millisecond response times and is highly scalable.
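In DynamoDB's low-level API, each attribute of an item is sent together with a type descriptor, for example `{"S": ...}` for strings, `{"N": ...}` for numbers (passed as strings), and `{"BOOL": ...}` for booleans. The simplified Python sketch below converts a flat dict into that shape; the `to_attribute_values` helper and the `Users` table name are hypothetical, and only a few types are handled (boto3 also ships its own serializer for the full type set).

```python
def to_attribute_values(item: dict) -> dict:
    """Convert a flat Python dict to DynamoDB's low-level attribute-value format.

    Simplified sketch: handles only str, bool, int and float values.
    """
    out = {}
    for name, value in item.items():
        if isinstance(value, bool):             # check bool first: bool is a subclass of int
            out[name] = {"BOOL": value}
        elif isinstance(value, (int, float)):
            out[name] = {"N": str(value)}       # numbers are transmitted as strings
        elif isinstance(value, str):
            out[name] = {"S": value}
        else:
            raise TypeError(f"unsupported type for {name!r}: {type(value).__name__}")
    return out


if __name__ == "__main__":
    item = {"UserId": "u-123", "Name": "Ana", "Age": 31, "Active": True}
    print(to_attribute_values(item))
    # With boto3 installed and credentials configured, this is the Item shape for:
    # boto3.client("dynamodb").put_item(TableName="Users", Item=to_attribute_values(item))
```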


Amazon Redshift

Amazon Redshift is a data warehousing service that can be used for big data analytics. Redshift is massively scalable.


AWS Database Migration Service (AWS DMS)

AWS DMS enables you to migrate relational databases, nonrelational databases, and other types of data stores. With AWS DMS:
– the source database remains fully operational during the migration
– downtime is minimized for applications that rely on that database
– the source and target databases don’t have to be of the same type

Homogeneous migrations (source and target databases are of the same type), for example: Microsoft SQL Server -> Amazon RDS for SQL Server
Heterogeneous migrations (source and target databases are of different types): first convert the database schema using the AWS Schema Conversion Tool (AWS SCT), then use AWS DMS to migrate the data from the source to the target database

Some use cases for AWS DMS:
– Development and test database migrations
– Combining several databases into a single database (database consolidation)
– Continuous replication
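A DMS migration runs as a replication task. The sketch below builds keyword arguments for boto3's DMS `create_replication_task` call: `MigrationType` set to `"full-load-and-cdc"` copies the existing data and then keeps replicating ongoing changes (the continuous-replication case above), while the `TableMappings` JSON selects which tables to migrate. The `dms_task_params` helper, the task naming scheme, and the placeholder ARNs are hypothetical; real endpoints and a replication instance must exist beforehand.

```python
import json


def dms_task_params(schema: str, continuous: bool = True) -> dict:
    """Hypothetical helper: keyword arguments for boto3's DMS create_replication_task."""
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-schema",
                # "%" is a wildcard: select every table in the given schema
                "object-locator": {"schema-name": schema, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": f"migrate-{schema}",
        # Placeholder ARNs: replace with the ARNs of resources you created
        "SourceEndpointArn": "arn:aws:dms:REGION:ACCOUNT:endpoint:SOURCE-PLACEHOLDER",
        "TargetEndpointArn": "arn:aws:dms:REGION:ACCOUNT:endpoint:TARGET-PLACEHOLDER",
        "ReplicationInstanceArn": "arn:aws:dms:REGION:ACCOUNT:rep:INSTANCE-PLACEHOLDER",
        # full load only, or full load followed by ongoing change data capture (CDC)
        "MigrationType": "full-load-and-cdc" if continuous else "full-load",
        "TableMappings": json.dumps(table_mappings),
    }


if __name__ == "__main__":
    params = dms_task_params("sales")
    print(params["MigrationType"])
    # With boto3 installed, credentials configured and real ARNs filled in:
    # boto3.client("dms").create_replication_task(**params)
```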


Additional database services

There is no one-size-fits-all database, so AWS offers purpose-built databases for different needs:

– Amazon DocumentDB (with MongoDB compatibility): document database, great for content management, catalogues, and user profiles
– Amazon Neptune: graph database, engineered for social networking and recommendation engines; also great for fraud detection
– Amazon Quantum Ledger Database (Amazon QLDB): ledger database for reviewing a complete history of all the changes that have been made to your application data
– Amazon Managed Blockchain: a service for creating and managing blockchain networks with open-source frameworks. Blockchain is a distributed ledger system that lets multiple parties run transactions and share data without a central authority
– Amazon ElastiCache: adds caching layers on top of your databases to improve read times for common requests; it supports two types of data stores: Redis and Memcached
– DynamoDB Accelerator (DAX): an in-memory cache for DynamoDB that can improve response times from single-digit milliseconds to microseconds
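A caching layer such as ElastiCache is commonly used with the cache-aside pattern: check the cache first, read the database only on a miss, then store the result in the cache for later requests. The Python sketch below uses a plain dict in place of a Redis or Memcached client so it runs without AWS; the `CacheAside` class, the `load_from_db` callback, and the TTL value are hypothetical illustrations of the pattern, not the ElastiCache API.

```python
import time


class CacheAside:
    """Generic cache-aside pattern with a per-entry time-to-live (TTL)."""

    def __init__(self, load_from_db, ttl_seconds: float = 60.0):
        self._load = load_from_db   # function that reads a value from the database
        self._ttl = ttl_seconds
        self._cache = {}            # key -> (value, expiry time)

    def get(self, key):
        now = time.monotonic()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                          # cache hit: no database read
        value = self._load(key)                      # cache miss: read from the database
        self._cache[key] = (value, now + self._ttl)  # populate the cache for next time
        return value


if __name__ == "__main__":
    database = {"user:1": "Ana", "user:2": "Ben"}   # stand-in for a real database
    reads = []

    def load_from_db(key):
        reads.append(key)           # record each database read
        return database[key]

    cache = CacheAside(load_from_db)
    cache.get("user:1")             # miss: read from the database
    cache.get("user:1")             # hit: served from the cache
    print(len(reads))               # prints: 1
```

The TTL bounds how long a cached value can go stale after the underlying data changes; choosing it is a trade-off between read speed and freshness.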
