Tips and Tricks with Storage Classes

Multilocations enable extensive fine-tuning of storage definitions. The following patterns are worth keeping in mind.

Labels Are Scoped per Location

Chunkserver labels are evaluated per location. Servers in different locations can share the same label letters; the storage class definition controls which copies go where by combining the location and label expression.

Example: Two locations, each with chunkservers on Very Reliable disks (labeled R) and Less Reliable disks (labeled L). To keep 3 copies per location – 1 on a reliable disk and 2 on less reliable disks – no location-specific label names are needed:

mfsscadmin create -l location1 -K R,2L -l location2 -K R,2L my_storage_class

The R label in the first expression applies only to chunkservers labeled R that belong to location1; the R in the second expression applies only to location2's servers. The system evaluates each location's expression independently.
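The per-location scoping can be pictured with a small toy model. This is only an illustrative sketch, not how MooseFS selects chunkservers internally: the server names, the `SERVERS` inventory, and the `parse_expression` and `place` helpers are all hypothetical, and the parser handles only single-letter labels with an optional count, such as `R,2L`.

```python
from collections import Counter

# Hypothetical inventory: (server name, location, label).
SERVERS = [
    ("cs1", "location1", "R"),
    ("cs2", "location1", "L"),
    ("cs3", "location1", "L"),
    ("cs4", "location2", "R"),
    ("cs5", "location2", "L"),
    ("cs6", "location2", "L"),
]


def parse_expression(expr):
    """Turn 'R,2L' into a Counter such as {'R': 1, 'L': 2} (toy parser)."""
    wanted = Counter()
    for term in expr.split(","):
        term = term.strip()
        # A leading number is the copy count; a bare label means one copy.
        count = int(term[:-1]) if len(term) > 1 else 1
        wanted[term[-1]] += count
    return wanted


def place(location, expr, servers=SERVERS):
    """Pick servers for one location; servers elsewhere are never considered."""
    wanted = parse_expression(expr)
    chosen = []
    for name, loc, label in servers:
        if loc == location and wanted[label] > 0:
            wanted[label] -= 1
            chosen.append(name)
    return chosen


# The same expression is evaluated separately for each location.
print(place("location1", "R,2L"))
print(place("location2", "R,2L"))
```

Although both calls use the identical expression, the candidate set is filtered by location first, so the label letters never collide across locations.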

Zero Expressions Give Precise Control Over What Goes Where

Each data state (CREATE, KEEP, ARCHIVE, TRASH) must be recorded somewhere – but not necessarily in the same locations, or in all locations.

Example: A cluster with 3 locations – two research facilities (facility1 and facility2) and a secure backup (backup) with a slower connection. Researchers in both facilities analyze the same raw data and produce results. Analysis generates temporary files that are discarded after each run. Results must be backed up.

Four storage classes cover this cleanly:

mfsscadmin create -m a -d 72 -l facility1 -K 2 -A @4+1 -l facility2 -K 2 -A @4+1 raw_data
mfsscadmin create -l facility1 -K 1 temporary1
mfsscadmin create -l facility2 -K 1 temporary2
mfsscadmin create -m f -l facility1 -K 2 -A @4+1 -l facility2 -K 2 -A @4+1 -l backup -C 0 -K 1 -A @4+3 results

What each class does:

  • raw_data – 2 copies in each facility for local read performance. After 72 hours of inactivity, data converts to EC 4+1 to save space.
  • temporary1 / temporary2 – 1 local copy per facility. Temporary files are recomputed from scratch on failure anyway, so a single copy is an acceptable risk and avoids wasting space and bandwidth.
  • results – 2 copies in each facility initially (fast local writes, no CREATE in backup due to the slow link). As soon as chunks are created, 1 copy is replicated to backup. In fast mode (-m f), all copies convert to EC format immediately: EC 4+1 in the facilities (results are read-only, so this is efficient) and EC 4+3 in backup for higher durability.
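To compare the space cost of these classes, the raw-space multiplier of each state can be worked out by hand. The sketch below does that arithmetic under simplifying assumptions: a full copy costs 1x the data size, EC N+M costs (N+M)/N, and metadata or chunk padding is ignored. The `CLASSES` table and helper names are illustrative only and simply mirror the definitions above.

```python
def copies(n):
    """Raw-space multiplier for n full copies."""
    return float(n)


def ec(data_parts, parity_parts):
    """Raw-space multiplier for erasure coding, e.g. 4+1 -> 1.25."""
    return (data_parts + parity_parts) / data_parts


# Per-location cost of each state, taken from the class definitions above.
CLASSES = {
    "raw_data": {
        "KEEP": {"facility1": copies(2), "facility2": copies(2)},
        "ARCHIVE": {"facility1": ec(4, 1), "facility2": ec(4, 1)},
    },
    "results": {
        "KEEP": {"facility1": copies(2), "facility2": copies(2), "backup": copies(1)},
        "ARCHIVE": {"facility1": ec(4, 1), "facility2": ec(4, 1), "backup": ec(4, 3)},
    },
}


def overhead(class_name, state):
    """Total raw space used across all locations, as a multiple of data size."""
    return sum(CLASSES[class_name][state].values())


for name in CLASSES:
    for state in ("KEEP", "ARCHIVE"):
        print(f"{name:9s} {state:8s} {overhead(name, state):.2f}x")
```

Under these assumptions, raw_data drops from 4x to 2.5x once archived, and results drops from 5x to 4.25x while gaining a backup copy that tolerates the loss of three parts.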

Note: The temporary1 and temporary2 classes have an implicit zero expression for the backup location – backup was never defined for these classes, so the system records nothing there. In the results class, however, backup is explicitly defined with -C 0 to avoid creating chunks there during the initial write, while still keeping 1 copy via the KEEP state.

Use explicit zero expressions (-C 0, -K 0, etc.) when you need to suppress a state in a location that would otherwise inherit a non-zero definition. Rely on implicit zeros (simply omitting a location from the class) when you never want any data in that location.
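The difference between implicit and explicit zeros can be sketched as a lookup. This is a simplified model, not the master's actual logic, and it rests on one loudly stated assumption: a state that is not given for a defined location falls back to that location's KEEP expression. The dictionaries and the `effective` helper are hypothetical names used only for illustration.

```python
# Class definitions as {location: {state: expression}}, mirroring the examples above.
results = {
    "facility1": {"KEEP": "2", "ARCHIVE": "@4+1"},
    "facility2": {"KEEP": "2", "ARCHIVE": "@4+1"},
    "backup": {"CREATE": "0", "KEEP": "1", "ARCHIVE": "@4+3"},
}
temporary1 = {"facility1": {"KEEP": "1"}}

# The same results class if -C 0 had been left out for backup.
results_without_c0 = {
    **results,
    "backup": {"KEEP": "1", "ARCHIVE": "@4+3"},
}


def effective(sclass, location, state):
    """Return the expression that applies to a state in a location."""
    definition = sclass.get(location)
    if definition is None:
        # Implicit zero: the location was never mentioned in the class.
        return "0"
    if state in definition:
        # Explicit definition, which may itself be an explicit zero.
        return definition[state]
    # ASSUMPTION: an unspecified state inherits the KEEP expression.
    return definition.get("KEEP", "0")


print(effective(temporary1, "backup", "KEEP"))            # implicit zero
print(effective(results, "backup", "CREATE"))             # explicit zero
print(effective(results_without_c0, "backup", "CREATE"))  # inherited from KEEP
```

In this model, dropping -C 0 from the backup definition would make CREATE inherit the KEEP value, so initial writes would also target the slow link, which is the situation the explicit zero is meant to prevent.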