#4 System Design: Content Delivery Networks

Backend developer and a busy mom who loves technology and sharing knowledge, which leaves her fulfilled and happy
What are Content Delivery Networks (CDNs) all about?
Latency increases with geographic distance
What this means is that if you share a photo album URL with a friend over a call, and the photos are hosted on a server that is geographically close to you, your friend who lives in another part of the world may have to wait a little longer for the photos to load, since they have to travel from a faraway place to reach him or her.
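To put rough numbers on this, here is a small back-of-the-envelope sketch. It assumes signals travel through optical fiber at about 200,000 km/s (roughly two thirds of the speed of light in a vacuum), and the distances are made up for illustration. Real latency is higher because of routing, queuing and processing, so treat the result as a lower bound only.

```python
# Best-case round-trip time (RTT) from distance alone.
# Assumption: about 200,000 km/s signal speed in optical fiber.
FIBER_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on a round trip in milliseconds, ignoring routing and processing."""
    one_way_seconds = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_seconds * 1000


# Hypothetical distances, for illustration only.
for label, km in [
    ("same city", 50),
    ("same continent", 2_000),
    ("other side of the world", 15_000),
]:
    print(f"{label}: at least {min_round_trip_ms(km):.1f} ms per round trip")
```

A web page usually needs many round trips (connection setup, then one request per asset), so the gap between a nearby and a faraway server adds up quickly.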
Wait, you just described a problem! What does it have to do with CDNs?


What does it solve?
When fetching static assets from a server, CDNs (Content Distribution/Delivery Networks) are a modern and popular solution for minimizing request latency.
An ideal CDN consists of a global network of servers that ensures that no matter how far a user is from your server (also known as an origin server), they will always be near a CDN server.
Users can then get cached copies of these resources through the CDN instead of having to download static assets (images, videos, HTML/CSS/JavaScript) from the origin server.
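The idea of "always being near a CDN server" can be sketched as picking the closest server from a list. The server names and coordinates below are hypothetical, and real CDNs typically steer users with DNS or anycast routing rather than a distance lookup like this one, so take it as an illustration of the goal, not of how any provider does it.

```python
import math

# Hypothetical edge servers: name -> (latitude, longitude) in degrees.
EDGE_LOCATIONS = {
    "frankfurt": (50.11, 8.68),
    "singapore": (1.35, 103.82),
    "virginia": (38.95, -77.45),
}

EARTH_RADIUS_KM = 6371


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_edge(user_location):
    """Name of the edge server closest to the user."""
    return min(
        EDGE_LOCATIONS,
        key=lambda name: haversine_km(user_location, EDGE_LOCATIONS[name]),
    )


print(nearest_edge((48.85, 2.35)))   # a user in Paris
print(nearest_edge((-6.20, 106.80)))  # a user in Jakarta
```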
AWS CloudFront (CF):
The origin server here could be an S3 bucket. S3 is a place where static assets like images and videos are stored (well, S3 can hold a lot of other things as well, but for this purpose let's keep it simple and stick to images and videos).

The steps are broken down for more clarity.
CF Viewer Request: Your computer's web browser makes a request for the photo, and the request arrives at a CloudFront location near you. A function can be invoked at this point, before CloudFront checks whether the image exists in its cache. In the AWS world they call it the "Edge Cache". If the image is present, it is sent back to the user: a "Cache Hit". Otherwise, move on to the next step.
CF Origin Request: Ask the origin server for the picture; in this case let's assume it's stored in an S3 bucket. The origin server gets the request only if the asset is not present in the cache: a "Cache Miss" situation.
CF Origin Response: The origin sends back the photo, packed as a response, and it is stored in the edge cache.
CF Viewer Response: If you need to do some resizing or any other customization, it can be done using Lambda functions, and this step runs regardless of whether the object is already in the edge cache or not. Finally, your friend can now see the picture of you :)
You look stunning !!!!
AWS gives the Lambda functions that do this job a special name, "Edge Functions", since they run at "Edge Locations", aka "closer to the user".
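The four steps can be sketched as a tiny in-memory simulation. This is not CloudFront code and does not use any AWS API: the `EdgeLocation` class, the dictionary standing in for an S3 bucket, and the file path are all made up here just to show the order in which the steps happen on a cache miss and on a cache hit.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeLocation:
    """Toy model of one edge location sitting in front of an origin."""

    origin: dict                                # stands in for an S3 bucket: path -> bytes
    cache: dict = field(default_factory=dict)   # the "edge cache"
    log: list = field(default_factory=list)     # records which steps ran

    def handle(self, path: str) -> bytes:
        self.log.append("viewer request")       # step 1: request reaches the edge
        if path in self.cache:
            self.log.append("cache hit")
            body = self.cache[path]
        else:
            self.log.append("cache miss")
            self.log.append("origin request")   # step 2: only on a miss
            body = self.origin[path]
            self.log.append("origin response")  # step 3: origin answers
            self.cache[path] = body             # keep a copy for the next user
        self.log.append("viewer response")      # step 4: runs on hit and miss
        return body


edge = EdgeLocation(origin={"/album/beach.jpg": b"fake image bytes"})

edge.handle("/album/beach.jpg")
print("first request: ", edge.log)

edge.log.clear()
edge.handle("/album/beach.jpg")
print("second request:", edge.log)
```

The second request never touches the origin, which is the whole point of having the edge cache.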
If you wish to read more, here is the link for it
Side Kick: Because static assets can be quite large (think of an HD wallpaper image), requesting that file from a local CDN server saves a significant amount of network bandwidth.
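Here is a rough way to estimate how much traffic the origin server is spared. The request count, asset size and cache hit ratio are invented numbers, and the model assumes every cache miss transfers the full asset from the origin exactly once.

```python
def origin_traffic_gb(requests: int, asset_mb: float, cache_hit_ratio: float) -> float:
    """Data the origin has to send, in GB, when only cache misses reach it."""
    misses = requests * (1 - cache_hit_ratio)
    return misses * asset_mb / 1000


# Hypothetical workload: one million downloads of a 5 MB wallpaper.
without_cdn = origin_traffic_gb(1_000_000, 5, cache_hit_ratio=0.0)
with_cdn = origin_traffic_gb(1_000_000, 5, cache_hit_ratio=0.95)

print(f"origin traffic without a CDN: {without_cdn:.0f} GB")
print(f"origin traffic with a CDN:    {with_cdn:.0f} GB")
```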
OK, what if something gets updated on the origin server and the user needs to see the change without having to ask for it specially?
Tadaaa! Let me introduce you to the two different types of Content Delivery Networks.
As said before, a CDN is a globally distributed network of servers that cache static assets for your origin server.
Every CDN server has its own local cache, and these caches should be kept in sync with one another.
There are two basic methods for populating a CDN cache, resulting in the distinction between Push and Pull CDNs.
In a Push CDN, the origin server is responsible for pushing new or updated files to the CDN, which subsequently propagates them to all of the CDN server caches.
In a Pull CDN, when a user requests a static asset from the CDN server and the server doesn't have it, the CDN server fetches the asset from the origin server, populates its cache with it, and then sends the asset to the user.
Pull CDN

If the CDN doesn’t have the static asset in its cache, then it forwards the request to the origin server and then caches the new object.
If the CDN already has the object in its cache, it serves the cached copy directly without contacting the origin server.

Push CDN

The origin server sends the asset to the CDN, which stores it in its cache. The CDN never makes any requests to the origin server.
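The difference between the two can be sketched with two small classes. The class names, the `requests_served` counter and the file path are invented for this illustration, and the caches here never expire, so the sketch also shows how a Pull CDN can end up holding a stale copy.

```python
class Origin:
    """Toy origin server that counts how often it is asked for a file."""

    def __init__(self, files):
        self.files = dict(files)
        self.requests_served = 0

    def fetch(self, path):
        self.requests_served += 1
        return self.files[path]


class PullCDN:
    """Fetches from the origin on a cache miss, then serves from its cache."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, path):
        if path not in self.cache:                    # cache miss
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]


class PushCDN:
    """Only serves what was pushed to it; it never contacts the origin."""

    def __init__(self):
        self.cache = {}

    def push(self, path, body):
        self.cache[path] = body

    def get(self, path):
        return self.cache[path]                       # KeyError if never pushed


origin = Origin({"/logo.png": b"version 1"})
pull = PullCDN(origin)
pull.get("/logo.png")
pull.get("/logo.png")
print("origin requests after two pulls:", origin.requests_served)

origin.files["/logo.png"] = b"version 2"              # asset updated on the origin
print("pull CDN still serves:", pull.get("/logo.png"))

push = PushCDN()
push.push("/logo.png", origin.files["/logo.png"])     # the origin pushes the update
print("push CDN serves:", push.get("/logo.png"))
```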
Which one is more popular?
With push-based CDNs, the origin server has to do the work of "pushing" every update to the cache, and it needs to be really sure that each change has propagated; otherwise the client is likely to see stale data.
Pull CDNs, on the other hand, require less care because the CDN will automatically fetch assets from the origin server if they are not already in the cache. The disadvantage of Pull CDNs is that once they have your asset cached, they will not know that you have modified it on the origin, so they keep serving the old copy instead of downloading the new one.
As a result, after assets are updated on the origin server, a Pull CDN's cache will be stale for a period of time. But there is some good news coming later, i.e. TTL (Time to Live).
Another disadvantage is that the first request for an asset through a Pull CDN is slow, since it must go all the way to the origin server.
Even with its disadvantages, Pull CDNs are still a lot more popular than Push CDNs, because they are much easier to maintain. There are also several ways to reduce the time that a static asset is stale for.
Time to Live !!!
Pull CDNs usually attach a timestamp to an asset when it is cached, and typically keep it for up to 24 hours by default (the default TTL). If a user requests an asset that has expired in the CDN cache, the CDN re-fetches it from the origin server and gets the updated asset if there is one.
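A TTL check can be sketched like this. The `TTLPullCache` class and its parameters are made up for illustration, and the clock is passed in as a function so the example can jump forward in time without actually waiting.

```python
import time

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # 24 hours, the default mentioned above


class TTLPullCache:
    """Pull-style cache that re-fetches an asset once its TTL has passed."""

    def __init__(self, fetch_from_origin, ttl_seconds=DEFAULT_TTL_SECONDS, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> bytes
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}                           # path -> (body, cached_at)

    def get(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0]                         # still fresh: serve from cache
        body = self.fetch_from_origin(path)         # missing or expired: go to origin
        self.entries[path] = (body, now)
        return body


# Demo with a fake clock and a dictionary standing in for the origin.
origin_files = {"/logo.png": b"version 1"}
fake_now = [0.0]
cache = TTLPullCache(origin_files.__getitem__, ttl_seconds=60, clock=lambda: fake_now[0])

print(cache.get("/logo.png"))           # fetched from the origin
origin_files["/logo.png"] = b"version 2"
fake_now[0] = 30.0
print(cache.get("/logo.png"))           # within the TTL, so the old copy is served
fake_now[0] = 61.0
print(cache.get("/logo.png"))           # TTL expired, so the new copy is fetched
```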
Pull CDNs also usually support Cache-Control response headers, which offer more flexibility with regard to caching policy, so that cached assets can be re-fetched every five minutes, whenever there’s a new release version, etc. Another solution is “cache busting”, where each version of an asset gets a unique URL, for example by adding a hash or version string to the file name, so that a new version is never mistaken for a previously cached one.
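Both ideas fit in a few lines. The helper names below are invented for this sketch: the first reads the `max-age` directive out of a Cache-Control header value, and the second builds a cache-busting file name from a hash of the file's content. Build tools usually do the renaming for you, so this only shows the principle.

```python
import hashlib
import re


def max_age_seconds(cache_control: str):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def busted_name(filename: str, content: bytes) -> str:
    """Insert a short content hash into the file name, e.g. app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, extension = filename.rpartition(".")
    if not dot:                                   # no extension to preserve
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{extension}"


print(max_age_seconds("public, max-age=300"))     # five minutes, in seconds
print(max_age_seconds("no-store"))                # no max-age directive present

# Different content gives a different name, so the CDN treats it as a new asset.
print(busted_name("app.js", b"console.log('release 1')"))
print(busted_name("app.js", b"console.log('release 2')"))
```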

When not to use CDNs?
CDNs are generally a good service to add to your system for reducing request latency (note, this is only for static files and not most API requests).
However, there are some situations where you may not want to use a CDN. If your service’s target users are all in one specific region (and you know that is not going to change), then a CDN brings little benefit, as you can just host your origin servers in that region instead.
CDNs are also not a good idea if the assets being served are dynamic and sensitive. You don’t want to serve stale data for sensitive situations, such as when working with financial/government services.
What does it look like end to end?

That's the end :) Happy Learning.




