Backup

Everybody needs to back up their data: small companies, big companies, individuals and organisations. As the saying goes: you either already back up your data, or you will in the near future. MooseFS, as a storage solution, is perfect for backup purposes, because:

  • it is very easily scalable - your MooseFS instance can grow together with your backups
  • it is very safe - stored data is continuously scrubbed (the integrity and readability of stored files' chunks is tested in a continuous loop), to avoid silent deterioration of data that is otherwise rarely, or never, accessed
  • it offers various storage policies via the use of storage classes, so you can tier your backup: store more copies of really important files and save space on the less important ones
  • it offers the Erasure Coding format, which allows you to store data very efficiently, space-wise
  • it offers the possibility to store copies of files on specific machines, to help distribute them between storage nodes (like in different racks, different server rooms or buildings), to make them more resistant to area failure scenarios (fire or flooding in a server room, electricity problems etc.)

Some examples of backup setups:

Backup in two different physical locations (two rooms or two buildings):

Depending on whether you are using the Pro or CE version of MooseFS, your instance needs to have at least one master server in each location (Pro version) or a master server in a selected location and a metalogger in the other location (CE version). In both cases you need several chunk servers in each location (at least two, we recommend at least a couple more). Via the use of labels you indicate which server is in which location (e.g. you label all the servers from the first location with label A and all the servers from the second location with label B). Then you use storage classes with specific labels to make sure that a desired number of copies is always kept in each location.
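
For illustration, a chunk server's location label is set in its mfschunkserver.cfg; the sketch below assumes the default config location and the A/B labels used in this example:

    # /etc/mfs/mfschunkserver.cfg on every chunk server in the first location
    LABELS = A

    # /etc/mfs/mfschunkserver.cfg on every chunk server in the second location
    LABELS = B

After changing the label, reload or restart the chunk server process so the master server learns about the new label.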

To keep 1 copy of data in each location, use the following class:

mfsscadmin create -K A,B -m s my_simple_backup

This class will always store exactly one copy of each chunk on a chunkserver with label A and one on a chunkserver with label B, in strict mode. This means that if all servers with one of the labels (either A or B) are inaccessible or completely full (no space left on their disks), new data cannot be written and old data will not be replicated. You need to set up monitoring of your backup instance, so you are always informed when this situation (lack of available servers or disk space) occurs.
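
Once the class exists, assign it to the directory that holds your backups; the mount point and directory below are only placeholders for your own layout:

    mfssetsclass -r my_simple_backup /mnt/mfs/backup
    mfsgetsclass /mnt/mfs/backup

New files created inside this directory inherit its storage class, so the recursive flag is only needed to convert data that was already written.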

One copy means that if one location is down, data can still be read from the other location, but if the other location suffers at least one hardware failure too, then some of the data will become inaccessible or lost (depending on whether the failure is permanent or not). So it is safer to keep two copies in each location, like this:

mfsscadmin create -K A,A,B,B -m s my_safer_backup

We recommend using this class, rather than the simple one.
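
If you started with the simple class, the class of data that is already written can be changed in place; the path below is again only a placeholder:

    mfssetsclass -r my_safer_backup /mnt/mfs/backup

MooseFS will then replicate the missing copies in the background until every chunk matches the new class.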

When one of the locations becomes fully inaccessible (due to some massive failure or disaster), to be able to access data in read-only mode from the other one, you need to:

  • in the case of the Pro version: make sure you have configured the deputy mechanism by setting the variable DEPUTY_ELECTION_TIME in mfsmaster.cfg to a value greater than zero; when this is configured, your other master will automatically become a deputy and provide you with read-only access to your data
  • in the case of the CE version: use the metalogger to set up a new master; you can prepare your metalogger machine to become your new master by installing the master binary there and adding all the necessary configs; in case of a longer or permanent failure of your main master you can start the mfsmaster process using the metadata stored by the metalogger; don't forget to change the DNS entry for your mfsmaster domain (or whichever domain you use); see the sketch after this list
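
A rough sketch of both variants is shown below; the data path and file names are typical defaults and may differ in your version, so treat it as an outline rather than an exact procedure:

    # Pro version: in mfsmaster.cfg on your master servers
    DEPUTY_ELECTION_TIME = 60          # any value greater than zero enables the deputy mechanism

    # CE version: promoting the metalogger host after the main master is lost
    # (master binaries and configs are assumed to be installed there already)
    cd /var/lib/mfs
    cp metadata_ml.back metadata.mfs.back                                   # metalogger file names may differ between versions
    for f in changelog_ml.*.mfs; do cp "$f" "changelog.${f#changelog_ml.}"; done
    mfsmaster -a                                                            # -a replays the changelogs into the metadata
    # finally, update the DNS entry for mfsmaster to point at this machine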

Backup with different security levels

Not all data stored in backup is of the same importance and not all backups need to be stored permanently. You can use various mechanisms in your backup:

  1. Use storage classes with different Redundancy Levels (RL), e.g. store super important data with RL 3, quite important data with RL 2, less important data with RL 1

    mfsscadmin create -K 4* -A@8+3 -o f my_super_important_data
    mfsscadmin create -K 3* -A@8+2 -o f my_important_data
    mfsscadmin create -K 2* -A@8+1 -o f my_normal_data
  2. If you don't have enough chunkservers to utilise the EC format, use the ARCHIVE state to lower the Redundancy Level of older backup files, especially when they are being replaced by newer files and eventually deleted anyway

    mfsscadmin create -K 3* -A 2* -o amc -d 7d my_backup

This class will store your data in 3 copies, but after 7 days (-d 7d) have passed since the last access or modification (-o amc), it will be stored in only 2 copies.

  3. You can always write your own script or utilise an external tool to simply change the storage class of some files after some time has passed or a certain number of newer files has been added to the storage, as shown below.
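
As an example of such a script, a periodic job could downgrade the class of backup files that have not been modified for a given time; the mount point, age threshold and class names below are only placeholders referring to the classes defined above:

    #!/bin/sh
    # Downgrade backups untouched for more than 30 days to a cheaper storage class.
    find /mnt/mfs/backup -type f -mtime +30 -exec mfssetsclass my_normal_data {} +

Run from cron, this keeps recent backups on the stronger class while older ones gradually move to the cheaper one.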

Backup with remote copy

Sometimes a backup is so important that you want to store it in several remote geographical locations: locations too far away from each other, or with connections not stable enough, to be set up as one MooseFS instance. You can achieve this goal with MooseFS too. Simply set up an independent MooseFS instance in each of the locations and use the mfspatch tool to synchronize data between them. This tool creates optimal patches, so that the minimum amount of data is sent between two locations. If a transfer is interrupted, you can re-apply the patch and the tool will skip the part that was already applied. All the previous advice still applies: you can set up various storage classes depending on data importance in each of your remote sites, you can use labels, etc.