
Part 3: Creating a Highly Available Architecture

  • Writer: Amit Dhanik
  • Jul 11, 2021
  • 9 min read

Updated: Sep 22, 2022

If you have not read Part 2, it is recommended that you read it first; you can find it here.

Here we will discuss the rest of the services in our architecture - S3, CloudFront, and SNS. Hope you enjoy reading!!



Last time we discussed DynamoDB and its use cases. In this post, we will start with Amazon's storage services. Let's get started.



Amazon Storage Services


AWS provides the following basic storage services -

  • Amazon S3 - Object storage.

  • Amazon EFS - Elastic File System, for file storage.

  • Amazon S3 Glacier - Archiving and backups.

  • Amazon EBS - Block storage (disks).

  • AWS Storage Gateway - Data transfer; provides on-premises access to virtually unlimited cloud storage by connecting your on-premises environment with storage in AWS infrastructure.


In our architecture diagram, we are using Amazon S3, so we will discuss only S3 here. Amazon provides some additional storage services, listed above, which are covered in more depth here.


Amazon S3


Amazon S3 is an object storage service that allows you to upload files, which are stored in buckets across multiple devices and facilities. You pay only for what you use, and you can use S3 to store any type of data and retrieve any amount of data from anywhere, just by choosing the bucket and the file you want to download. Files in S3 are treated as objects, each identified by a unique key. An object consists of the following:


  1. Key - To retrieve an object from the bucket, we make use of a key. Key is the name assigned to our object.

  2. Value - Value is the content you are storing in your file.

  3. Metadata - Data about the data you are storing.

  4. Version ID - Important for Versioning and uniquely identifying an object.

  5. Access control - Grant access to your bucket using bucket policies and access control lists.



Tip - S3 has a universal namespace, i.e. the name of your S3 bucket must be globally unique, as S3 creates a web address based on the name you select for your bucket. When you upload a file to S3, you get an HTTP 200 code if the upload was successful.
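To make this concrete, here is a minimal sketch of uploading and retrieving an object with boto3 (the AWS SDK for Python). The bucket name and key are hypothetical, and the bucket is assumed to already exist.

```python
import boto3

s3 = boto3.client("s3")  # picks up credentials from your environment or IAM role

# Upload an object. The bucket name is hypothetical - yours must be globally unique.
s3.put_object(
    Bucket="my-example-bucket-12345",
    Key="reports/summary.txt",      # the key: the name assigned to our object
    Body=b"hello from S3",          # the value: the content being stored
    Metadata={"author": "amit"},    # metadata: data about the data
)

# Retrieve the same object by bucket + key.
response = s3.get_object(Bucket="my-example-bucket-12345", Key="reports/summary.txt")
print(response["Body"].read())                          # b'hello from S3'
print(response["ResponseMetadata"]["HTTPStatusCode"])   # 200 on success
```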

S3 Storage Classes


We have six different storage classes, all of them highly available; each stores data across at least three Availability Zones, except for S3 One Zone-IA.

  1. S3 Standard (99.99% availability, min storage duration: none)

  2. S3 Standard-IA (99.9% availability, min storage duration: 30 days)

  3. S3 Intelligent-Tiering (99.9% availability, min storage duration: 30 days)

  4. S3 One Zone-IA (99.5% availability, min storage duration: 30 days)

We have separate storage classes for archiving objects.

  1. S3 Glacier (99.99% availability, min storage duration: 90 days)

  2. S3 Glacier Deep Archive (99.99% availability, min storage duration: 180 days)

The minimum number of days here is significant. For example, if you delete an object from Glacier before the 90-day minimum period, you are still charged for the whole 90 days. As Amazon states - archives deleted before the 90-day period incur a pro-rated charge equal to the storage charge for the remaining days.
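As a quick illustration, here is a hedged sketch of choosing a storage class when uploading an object with boto3; the bucket and key names are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Upload an infrequently accessed file straight into Standard-IA.
# Remember the 30-day minimum: deleting it earlier still bills the full 30 days.
s3.put_object(
    Bucket="my-example-bucket-12345",   # hypothetical bucket
    Key="archive/old-report.csv",
    Body=b"...",
    StorageClass="STANDARD_IA",         # others: "INTELLIGENT_TIERING",
                                        # "ONEZONE_IA", "GLACIER", "DEEP_ARCHIVE"
)
```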


Amazon S3 pricing


S3 charges us for the following -

  • Storage - Since we have different storage classes in S3, the pricing also differs according to the storage class. We pay for the size of the objects and the duration for which they have been stored in S3 buckets.

  • Requests and data retrieval - S3 charges us for the requests we make against objects. These include GET, PUT, POST, SELECT, and various other requests.

  • Data transfer - S3 charges us for data transfer as well, e.g. data transferred out of S3 to the internet or to another AWS Region.

  • S3 Replication - If you have to share files between two S3 buckets hosted in different AWS Regions, Cross-Region Replication is required. It allows automatic, asynchronous copying of your objects between S3 buckets in different Regions. S3 charges you the storage charges for the primary copy, replication PUT requests, and inter-region Data Transfer OUT from S3 to each destination Region. (Taken from Amazon)

  • S3 Object Lambda - You are charged when you use S3 Object Lambda to modify the data returned by your S3 GET requests.


If you want to know the detailed pricing, you can check it here - AWS S3 Pricing.



Some important properties of S3 buckets


Versioning


Versioning is a great backup tool for your S3 objects. Why would you enable versioning on your S3 bucket? Suppose you have very important files stored in S3 which are at risk of being accidentally deleted or maliciously overwritten. You cannot afford to lose these files!! For such situations, you enable versioning on your bucket (it is disabled by default). When versioning is enabled, S3 creates a new version of each file you upload, retaining the old versions as well. When you or an intern accidentally deletes an object, it is not actually removed; instead, S3 places a delete marker on the object, so you can keep track of your deleted objects without losing them. Cool!! Now you won't lose any of your important files again. You can also enable MFA Delete on your bucket so that people who don't have the required permissions are not able to delete objects. Versioning keeps your files safe and you happy.


Note - Versioning can be suspended. When you suspend versioning, the old versions remain safe and can't be accidentally overwritten, but no new versions are created after the suspension.
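Here is a minimal sketch of turning versioning on (and later suspending it) with boto3; the bucket name is hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning - buckets are unversioned by default.
s3.put_bucket_versioning(
    Bucket="my-example-bucket-12345",   # hypothetical bucket
    VersioningConfiguration={"Status": "Enabled"},
)

# Every upload now creates a new version, and a delete only adds a delete marker.
# To pause it later, set the status to "Suspended" - old versions are kept.
s3.put_bucket_versioning(
    Bucket="my-example-bucket-12345",
    VersioningConfiguration={"Status": "Suspended"},
)
```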


Drawbacks


Well, every good thing comes at a cost. What could possibly go wrong when you enable versioning on an S3 bucket? When you work frequently on your bucket and upload a large number of files daily, S3 retains all the versions of your files, along with the deleted ones. This means you use more space, and your costs increase as well. You cannot disable versioning once it has been enabled, only suspend it; to completely turn it off, you would have to delete the bucket and create a new one (a bad idea). This leads to heavy fees!! What can we do? The answer lies in Lifecycle Management.


Lifecycle Policies


As we face the problem of our bucket growing due to large files (along with their versions) piling up, we need an efficient way to handle this. With Lifecycle Management, we can automate moving objects between the different storage tiers. This lets us transition files to cheaper storage classes, and eventually expire them, which brings down our bucket costs. It can be used in conjunction with versioning, so the best practice is to use versioning along with lifecycle policies, as in the sketch below.
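As a hedged sketch, here is one way to express such a policy with boto3; the bucket name and the exact day counts are illustrative assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Move old (noncurrent) versions to Glacier after 30 days and delete them
# for good after 365 days, while current versions stay where they are.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket-12345",       # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},   # apply to the whole bucket
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
                ],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
            }
        ]
    },
)
```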


Some additional important properties one should know are as follows -



S3 is a global service. You specify a Region in which you want to create your S3 bucket, and then upload files to the bucket using the S3 API or the console. Here, in our architecture, we see that our instances in the private subnet are communicating with S3.


How does one access Amazon S3, which is not present inside our VPC, without going over the internet?


As discussed above, we have VPC Endpoints, which enable private connections between a VPC and supported AWS services. VPC endpoint services are powered by AWS PrivateLink, which enables us to access services privately using private IP addresses. Traffic between our VPC and the other services does not leave the Amazon network. VPC Endpoints do not require a Virtual Private Gateway, an Internet Gateway, NAT devices, a VPN connection, or Direct Connect.


For connecting to S3, we have Gateway Endpoints; Gateway Endpoints support both DynamoDB and Amazon S3. We can see the functioning of endpoints in the architecture diagram below as well.



Credits: Google Images
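For illustration, here is a hedged sketch of creating a Gateway Endpoint for S3 with boto3; the VPC ID, route table ID, and Region are placeholder assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # Region is an assumption

# Create a Gateway Endpoint so instances in the private subnet can reach S3
# without an Internet Gateway or NAT device. All IDs below are placeholders.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],  # a route to S3 is added here
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```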


Amazon CloudFront


CloudFront is another global service provided by Amazon. It is a Content Delivery Network (CDN) used to cache content at edge locations. A CDN is a system of distributed servers that delivers webpages and other web content to a user based on the geographic location of the user, the origin of the webpage, and the content delivery server. CloudFront is used all over the world to speed up the delivery of content.


Let's take a simple example. Suppose I have my site youthindiaspeak.com hosted in India, and I have users from all around the world using the website. Without a CDN, a customer from Brazil would have to send requests to the other side of the world, resulting in slow requests and responses due to high latency. This eventually leads to a poor customer experience, which is disastrous for a company. Hence, CloudFront is used by companies all around the world to serve content faster to their users.

How is CloudFront able to do this? Let's see.


Edge Locations


CloudFront caches content at nearby edge locations, which are separate from AWS Regions/AZs. Our first user makes a request, which is redirected to the edge location. CloudFront checks whether it has a copy of the file present. If the file is present, it returns the response immediately. But if the file is not present at the edge location, CloudFront downloads it from the origin server and caches it for the TTL period. Now, if another user requests the same file, they can download it with lower latency and a higher download speed, and they get a better customer experience. This also means we have reduced the number of requests being made to the origin server, as more objects are served from the edge locations. Hence, we no longer have a lag in our request and response times, and latency is reduced. A person viewing from Brazil will now have the response delivered from the nearest edge location, instead of it traversing half the world. This also increases our cache hit ratio (requests served from CloudFront / all requests), so CloudFront serves as a great tool. You can read here how Netflix makes use of this service.
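As a hedged sketch of how that TTL can be influenced from the origin side: CloudFront honours a Cache-Control header on the origin object (within the distribution's min/max TTL settings) before re-checking the origin. The bucket, key, and local file below are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# The Cache-Control max-age header acts as a TTL hint that CloudFront honours
# (within the distribution's min/max TTL settings) before re-checking the origin.
s3.put_object(
    Bucket="my-example-bucket-12345",    # hypothetical origin bucket
    Key="static/logo.png",
    Body=open("logo.png", "rb").read(),  # a local file, for illustration
    ContentType="image/png",
    CacheControl="max-age=86400",        # serve from the edge for up to 24 hours
)
```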


Let us discuss the technical terms associated with CloudFront.


Edge Locations


Edge locations are AWS data centers spread across the world to deliver services with as low latency as possible. Edge locations help return fast responses to users and are used by services such as CloudFront and Route 53.


Origin


This is the location where all the files that the CDN will distribute are stored. The origin can be an S3 bucket, an EC2 instance, an ELB, or a custom origin whose domain is resolved via Route 53.


Distribution


This is the name given to the CDN, which consists of a collection of edge locations. Here is a pictorial representation of how CloudFront works.

Credits: AWS

Here, in our architecture, we use Amazon S3 as the source from which CloudFront requests files before placing them in its edge locations. CloudFront serves content directly to our users, and this is what a well-planned architecture looks like in AWS.


Signed URLs


CloudFront also gives us the ability to restrict access to our content. This can be done with the help of signed URLs. When you make use of signed URLs, only certain members have access to the content. For example, you are able to view movies on Netflix only when you pay for your subscription. When we create signed URLs or signed cookies, we have to attach a policy (JSON) which can include the following -


  • URL Expiration - CloudFront checks the expiration date and time in a signed URL at the time of the HTTP request. (taken from AWS documentation)

  • IP ranges

  • Trusted signers(which AWS Accounts can create signed URLs)


If you are serving content from an S3 bucket, it is generally advised to use S3 signed URLs, as they provide great flexibility. (Yes, we do have signed URLs for S3 buckets as well!) You can read more about signed URLs and the difference between S3 signed URLs and CloudFront signed URLs.
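As a quick, hedged sketch, here is how an S3 presigned (signed) URL can be generated with boto3; the bucket and key are hypothetical, and the URL expires after the given number of seconds.

```python
import boto3

s3 = boto3.client("s3")

# Generate a presigned URL granting temporary GET access to a single object.
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={
        "Bucket": "my-example-bucket-12345",   # hypothetical bucket
        "Key": "videos/episode-1.mp4",         # hypothetical key
    },
    ExpiresIn=3600,  # the URL stops working after one hour
)
print(url)  # share this link; after expiry, requests are rejected
```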


Some important points to keep in mind while using signed URLs -


  • Use signed URLs/cookies when you want to secure content so that only the people you authorize are able to access it.

  • A signed URL is for an individual file, while a signed cookie is for multiple files.

  • If you are serving content from EC2 (the origin is EC2), then you should use CloudFront signed URLs.

  • If you are serving content from an S3 bucket, then it is better to use S3 signed URLs instead of CloudFront signed URLs. In our architecture, we can see that we are serving content from an S3 bucket to our users via CloudFront.

I hope you now understand how CloudFront works and when we need to use it. Though there is much more to it, what we discussed is enough for the basics. So now, let's move on to our last service, SNS.


Amazon SNS


SNS stands for Simple Notification Service. Understanding how SNS works is very simple, as it behaves as a publisher/subscriber system. A real-life example of a publisher/subscriber system is the notification you receive when you subscribe to a YouTube channel: every time the publisher publishes a new video, a notification is sent informing the subscriber that a new video is available. You can now imagine the number of daily real-life applications where notifications are sent to you as soon as an event happens. Since messages are published to a large number of subscribers, this is generally referred to as a fan-out approach (you can publish to Lambda, SQS, email, etc.).


We also have the SQS messaging service in AWS, but SNS is a push-based messaging service, whereas SQS is pull-based. The diagram below shows how SNS can be used with subscribers as well as integrated with other applications.


Credits: AWS
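To make the publisher/subscriber flow concrete, here is a minimal boto3 sketch; the topic name and email address are hypothetical.

```python
import boto3

sns = boto3.client("sns")

# Create a topic - the "channel" that publishers post to.
topic = sns.create_topic(Name="new-video-alerts")   # hypothetical topic name
topic_arn = topic["TopicArn"]

# Subscribe an endpoint; every subscriber gets a copy of each message (fan-out).
# The protocol could also be "sqs", "lambda", "sms", and so on.
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="email",
    Endpoint="viewer@example.com",  # hypothetical; must confirm the subscription
)

# Publish once - SNS pushes the message to all confirmed subscribers.
sns.publish(
    TopicArn=topic_arn,
    Subject="New video published!",
    Message="A new video has just been uploaded to the channel.",
)
```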

You can read more about common SNS scenarios here.


That's it, guys!! This was a long article and took quite some time to write. This was the final part. I hope I was able to give you all good information on the basics of the commonly used AWS services. I will be publishing how we can use all these services in real life, with their practical uses. Till then, stay tuned. Thanks for reading!!


Feel free to provide your feedback in the comment section. Leave a like if you enjoyed reading.


Connect with me on LinkedIn - Amit Dhanik.


Credits - This post was successful because of the following people - ACloudguru, AWS resources, pythoholic, and many others.




