CDN
The storage needs of a typical Content Delivery Network (CDN) can vary widely depending on the size and purpose of the CDN and on the type and volume of content it delivers. However, in many usage scenarios MooseFS is an excellent fit for a CDN's backend and backup storage.
In the most general terms, CDNs can be divided into two types: the Distributed Full Replica Model, a CDN with multiple geographical locations where each location stores a complete or near-complete copy of the content, and the Hierarchical Caching Model, where one central location stores the complete content and the other locations temporarily store only frequently requested (hot) content. We will show examples of how MooseFS can help in each of these scenarios.
Distributed Full Replica CDN
This model is typical for services that require time-sensitive content to be accessed from many locations simultaneously (such as synchronized video streaming across time zones) or that serve content with global popularity (blockbuster movies, operating system or mainstream software updates, massively popular games, etc.). In this case, MooseFS offers the following features:
- Erasure Coding (EC) format for stored files - this format saves a lot of disk space while providing a high Redundancy Level, which means high tolerance for hardware failures
- Unique solution for synchronized backups between storage locations (the snapshot mechanism together with the mfspatch tool) that allows you to make consistent copies of stored data in several physical, geographical locations
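To make the space claim concrete, the short Python sketch below compares the raw disk space and the number of tolerated simultaneous part losses for plain copies and for two EC layouts. It assumes that every copy or part sits on a different disk and ignores metadata and filesystem overhead; the `Layout` class and the layout names ("EC 8+2", "EC 4+2") are illustrative only, not MooseFS configuration syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """A storage layout: plain copies (data_parts=1) or erasure coding k+m."""

    name: str
    data_parts: int       # k: parts holding data (1 for plain copies)
    redundant_parts: int  # m: parity parts, or extra copies for replication

    def raw_per_logical(self) -> float:
        """Raw bytes stored on disk per byte of user data."""
        return (self.data_parts + self.redundant_parts) / self.data_parts

    def tolerated_failures(self) -> int:
        """Parts that can be lost at once without losing data (one part per disk)."""
        return self.redundant_parts


if __name__ == "__main__":
    user_data_tb = 100
    layouts = [
        Layout("3 copies", 1, 2),
        Layout("EC 8+2", 8, 2),
        Layout("EC 4+2", 4, 2),
    ]
    for layout in layouts:
        raw = user_data_tb * layout.raw_per_logical()
        print(f"{layout.name:8}: {raw:5.0f} TB raw, "
              f"survives {layout.tolerated_failures()} lost parts")
```

With the same tolerance of two lost parts, the EC 8+2 layout needs 125 TB of raw space for 100 TB of data, compared with 300 TB for three full copies.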
A CDN of this type based on MooseFS could consist of:
- one "base" instance, on which new content is first stored
- a number of "copy" instances, to which the new content is copied from the "base" instance with automated scripts
Thanks to the MooseFS snapshot mechanism and the dedicated mfspatch tools, the copy process is easy to automate and the copying is efficient: when new content is added as an update to already existing content, rather than as completely new data in a separate directory structure, no unnecessary data is copied.
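The idea behind copying only the differences between two consistent snapshots can be sketched as follows. This is a hypothetical illustration, not the interface or algorithm of mfspatch: it compares two file manifests (each path mapped to a content fingerprint and a size, both assumed to be available), works at file granularity for simplicity, and lists what has to be copied to or deleted from a "copy" instance. `plan_patch`, `bytes_to_send` and `Manifest` are invented names.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest of a snapshot: path -> (content fingerprint, size in bytes).
Manifest = Dict[str, Tuple[str, int]]


def plan_patch(previous: Manifest, current: Manifest) -> Tuple[List[str], List[str]]:
    """Return (paths to copy, paths to delete) to turn `previous` into `current`."""
    # New files are missing from `previous`; changed files have a different entry.
    to_copy = sorted(p for p, entry in current.items() if previous.get(p) != entry)
    # Files that disappeared from the newer snapshot must be removed on the copy.
    to_delete = sorted(p for p in previous if p not in current)
    return to_copy, to_delete


def bytes_to_send(current: Manifest, to_copy: List[str]) -> int:
    """Total size of the files that actually have to be transferred."""
    return sum(current[p][1] for p in to_copy)


if __name__ == "__main__":
    # Snapshot already present on a "copy" instance.
    previous = {
        "games/x/data.pak": ("v1", 30_000_000_000),
        "games/x/level1.pak": ("v1", 5_000_000_000),
        "games/x/readme.txt": ("v1", 2_000),
    }
    # Newer snapshot on the "base" instance: one file updated, one added, one removed.
    current = {
        "games/x/data.pak": ("v1", 30_000_000_000),
        "games/x/level1.pak": ("v2", 5_100_000_000),
        "games/x/level2.pak": ("v1", 4_000_000_000),
    }
    to_copy, to_delete = plan_patch(previous, current)
    print("copy:", to_copy)
    print("delete:", to_delete)
    print("bytes to send:", bytes_to_send(current, to_copy))
    print("full copy would be:", sum(size for _, size in current.values()))
```

In this example only the updated and the added files (9.1 GB) would be sent, instead of the full 39.1 GB.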
The configuration of all MooseFS instances would be identical or very similar, with all of them storing data in EC format. The exact format (whether to use 4 or 8 data parts, and how many parity parts) should be chosen based on the size of each instance. A small CDN with few nodes in each instance may want to use the EC4 format to limit the number of physical servers needed, while a large one can easily afford the EC8 format, which is slightly more space-efficient. On the other hand, the bigger the instance, the higher the probability of more than one, or even more than two, concurrent hardware failures (a broken HDD, a broken power supply unit, etc.), so a higher Redundancy Level (more parity parts) is recommended for bigger instances.
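The trade-off between instance size and the number of parity parts can be estimated with a simple binomial model. The sketch assumes that disks fail independently, that `p_down` (a made-up value) is the chance that a given disk is unavailable at any moment, and that each part of a chunk must be placed on a different chunkserver; the function names are hypothetical. `p_more_than` returns the probability that more disks are down at once than the EC format has parity parts, the situation in which some chunks may become unreadable.

```python
from math import comb


def min_chunkservers(data_parts: int, parity_parts: int) -> int:
    """Assumption: every part of a chunk is stored on a different chunkserver."""
    return data_parts + parity_parts


def p_more_than(parity_parts: int, disks: int, p_down: float) -> float:
    """Probability that more than `parity_parts` of `disks` are down at once.

    Binomial model: independent failures, each disk down with probability p_down.
    The upper tail is summed directly to keep precision for tiny probabilities.
    """
    return sum(
        comb(disks, i) * p_down**i * (1 - p_down) ** (disks - i)
        for i in range(parity_parts + 1, disks + 1)
    )


if __name__ == "__main__":
    P_DOWN = 0.002  # hypothetical per-disk probability of being failed or rebuilding
    for disks in (40, 400):
        for parity in (1, 2, 3):
            print(f"{disks:3} disks, {parity} parity: "
                  f"P(more than {parity} down) = {p_more_than(parity, disks, P_DOWN):.1e}")
    for k, m in ((4, 2), (8, 2)):
        print(f"EC{k}+{m} needs at least {min_chunkservers(k, m)} chunkservers")
```

Running it shows that, for the same per-disk probability, the chance of exceeding a given number of parity parts grows quickly with the number of disks, and that each extra parity part lowers it.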
Hierarchical Caching CDN
Some CDNs also serve more dynamic content, and there is no point in making a full copy of the generated content in every location of the CDN's network. Instead, one main location stores all the content, and satellite locations store a cache of the most frequently accessed content. This architecture is typical for user-generated content, such as private and small business websites or images and short videos posted by internet users, and for content that may be valid in only one or several, but not all, of the geographical locations served by the CDN. A typical example of the latter is a news or ads service.
A CDN of this type based on MooseFS could consist of:
- one "base" instance, on which the full content is stored, with a storage policy tuned for optimal space usage
- a number of "cache" instances, each storing only the most popular content for its geographical area, with a storage policy tuned for fast access by users
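Deciding which content is popular enough to keep is the job of the CDN software running on top of a "cache" instance, not of MooseFS. As a rough, hypothetical illustration, a read-through cache with least-recently-used (LRU) eviction uses recency as a proxy for popularity: a miss is fetched from the "base" instance and stored locally, and the least recently requested objects are dropped when the configured size is exceeded. `EdgeCache` and `fetch_from_base` are invented names; real CDN software typically uses more elaborate policies.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Read-through LRU cache of objects, sized in bytes (hypothetical sketch)."""

    def __init__(self, capacity_bytes: int, fetch_from_base: Callable[[str], bytes]):
        self.capacity = capacity_bytes
        self.fetch_from_base = fetch_from_base
        self.items: "OrderedDict[str, bytes]" = OrderedDict()  # oldest first
        self.used = 0
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        data = self.fetch_from_base(key)  # miss: read from the "base" instance
        if len(data) <= self.capacity:  # objects larger than the cache are not stored
            self.items[key] = data
            self.used += len(data)
            while self.used > self.capacity:
                _, evicted = self.items.popitem(last=False)  # drop least recently used
                self.used -= len(evicted)
        return data


if __name__ == "__main__":
    base = {"a": b"x" * 4, "b": b"y" * 4, "c": b"z" * 4}  # stand-in for the base instance
    cache = EdgeCache(capacity_bytes=8, fetch_from_base=base.__getitem__)
    for key in ("a", "b", "a", "c"):
        cache.get(key)
    print("cached:", list(cache.items), "hits:", cache.hits, "misses:", cache.misses)
```

In the example, requesting "c" evicts "b", because "a" was requested more recently.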
Here, the base instance and the cache instances are more independent of each other, as there is no need to synchronize the entire content exactly. Instead, the instances serve as the backend for specialized CDN software, and each must be configured to best serve its purpose:
- the "base" instance serves as the source of all content, and its data is rarely changed or deleted; here, a storage class with fast conversion of stored data to EC format should be used
- the "cache" instances see a lot of traffic and frequent additions and deletions of data; here, a storage class with slower conversion to EC format should be used, so that very short-lived data is never converted, while more popular files, which tend to stay in the cache longer, are eventually converted and occupy less space.
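The effect of the conversion delay can be explored with a toy model. The sketch assumes a hypothetical storage class that keeps new files as two full copies and converts them to EC 8+2 once they are older than `delay_h`; the space factors, the workload and all names are invented for illustration and do not reflect MooseFS defaults or storage class syntax. For each delay it reports how many conversions are performed and the raw space consumed over time (GB-hours).

```python
from dataclasses import dataclass
from typing import List, Tuple

COPIES_FACTOR = 2.0  # hypothetical: new files kept as 2 full copies
EC_FACTOR = 10 / 8   # hypothetical: converted files stored as EC 8+2


@dataclass
class CachedFile:
    size_gb: float
    lifetime_h: float  # time from being written to being evicted from the cache


def simulate(files: List[CachedFile], delay_h: float) -> Tuple[int, float]:
    """Return (conversions performed, raw GB-hours consumed) for a conversion delay."""
    conversions = 0
    gb_hours = 0.0
    for f in files:
        if f.lifetime_h > delay_h:
            # Lives long enough to be converted: copies first, then EC.
            conversions += 1
            gb_hours += f.size_gb * (
                COPIES_FACTOR * delay_h + EC_FACTOR * (f.lifetime_h - delay_h)
            )
        else:
            # Evicted before conversion: stays as full copies for its whole life.
            gb_hours += f.size_gb * COPIES_FACTOR * f.lifetime_h
    return conversions, gb_hours


if __name__ == "__main__":
    # Invented workload: many briefly cached files and a few popular, long-lived ones.
    workload = [CachedFile(1.0, 2.0) for _ in range(90)] + [
        CachedFile(1.0, 200.0) for _ in range(10)
    ]
    for delay in (0.0, 24.0, 1000.0):
        conversions, gb_hours = simulate(workload, delay)
        print(f"delay {delay:6.0f} h: {conversions:3} conversions, "
              f"{gb_hours:7.0f} GB-hours raw")
```

With this workload, converting immediately uses the least space but performs 100 conversions, 90 of them on files evicted after two hours; a 24-hour delay performs only the 10 conversions of the popular files, at the cost of somewhat more space, while never converting uses the most space.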