Data Redundancy with Erasure Coding
Erasure Coding (EC) is a method of data protection and redundancy used in MooseFS as an alternative to traditional copy-based replication. It is designed to provide the same or greater levels of fault tolerance while using significantly less storage space.
Why Erasure Coding?
In systems with large datasets, keeping multiple full copies of each file can be costly in terms of storage. Erasure Coding addresses this by dividing data into smaller parts and generating additional parity parts. Even if some parts are lost or unavailable, the original data can still be reconstructed using the remaining parts.
This approach offers a more storage-efficient way to protect data, especially in environments where high redundancy is required but minimizing storage overhead is also a priority.
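The storage saving can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical and are not part of any MooseFS tool. It assumes that a chunk stored with N data parts and K parity parts can be rebuilt from any N surviving parts, as is the case for Reed-Solomon-style codes. It compares the raw storage consumed per unit of user data under full-copy replication and under two EC schemes (4 data + 2 parity parts, and 8 data + 3 parity parts).

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw storage used per unit of user data when keeping full copies."""
    return Fraction(copies)


def ec_overhead(data_parts: int, parity_parts: int) -> Fraction:
    """Raw storage used per unit of user data with an N+K erasure code."""
    return Fraction(data_parts + parity_parts, data_parts)


def replication_tolerance(copies: int) -> int:
    """Number of copies that can be lost while one copy still survives."""
    return copies - 1


def ec_tolerance(parity_parts: int) -> int:
    """Parts that can be lost, assuming any N of the N+K parts suffice."""
    return parity_parts


if __name__ == "__main__":
    rows = [
        ("3 copies", replication_overhead(3), replication_tolerance(3)),
        ("EC 4+2", ec_overhead(4, 2), ec_tolerance(2)),
        ("EC 8+3", ec_overhead(8, 3), ec_tolerance(3)),
    ]
    for name, overhead, losses in rows:
        print(f"{name:<9} {float(overhead):.3f}x raw storage, "
              f"tolerates {losses} lost copies/parts")
```

Under these assumptions, three full copies consume 3x the size of the data and tolerate two losses, EC 4+2 tolerates the same two losses at 1.5x, and EC 8+3 tolerates three losses at 1.375x.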
How It Works (At a High Level)
When a file is stored using Erasure Coding in MooseFS, each chunk is split into a defined number of data parts and parity parts. These parts are then distributed across multiple Chunkservers. As long as enough parts are available, MooseFS can fully reconstruct the original chunk—even if some parts are lost due to hardware failures.
For example:
- An EC 4+2 scheme means 4 data parts and 2 parity parts.
- An EC 8+3 scheme means 8 data parts and 3 parity parts.
The number of parity parts determines how many parts of a chunk can be lost without losing data: EC 4+2 tolerates the loss of any 2 parts, and EC 8+3 tolerates the loss of any 3.
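To illustrate the split-and-reconstruct idea, here is a deliberately simplified sketch that uses a single XOR parity part (a 4+1 layout). It is not how MooseFS encodes chunks: schemes with several parity parts, such as 4+2 or 8+3, need a more general code than plain XOR. The function names `encode` and `reconstruct` are hypothetical. The point is only to show that a lost part can be recomputed from the survivors.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunk: bytes, data_parts: int) -> List[bytes]:
    """Split a chunk into equal-size data parts and append one XOR parity part."""
    part_len = -(-len(chunk) // data_parts)  # ceiling division
    padded = chunk.ljust(part_len * data_parts, b"\0")  # zero-pad the tail
    parts = [padded[i * part_len:(i + 1) * part_len] for i in range(data_parts)]
    parity = reduce(xor_bytes, parts)
    return parts + [parity]


def reconstruct(parts: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the chunk when at most one part (data or parity) is missing."""
    missing = [i for i, part in enumerate(parts) if part is None]
    if len(missing) > 1:
        raise ValueError("a single parity part can recover only one lost part")
    restored = list(parts)
    if missing:
        survivors = [part for part in parts if part is not None]
        # XOR of all surviving parts equals the missing part.
        restored[missing[0]] = reduce(xor_bytes, survivors)
    data = b"".join(restored[:-1])  # drop the parity part
    return data[:original_len]      # drop the zero padding


if __name__ == "__main__":
    chunk = b"MooseFS chunk payload"
    stored = encode(chunk, 4)   # 4 data parts + 1 parity part
    stored[2] = None            # simulate one lost part
    assert reconstruct(stored, len(chunk)) == chunk
```

In a real deployment each part would live on a different Chunkserver, so the `None` entry corresponds to a server or disk that is unavailable.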
Use Cases
Erasure Coding is ideal for:
- Archival or infrequently accessed data
- Large-scale deployments where storage efficiency is critical
- Environments where additional network traffic and higher latency are acceptable trade-offs for reduced storage cost
For frequently accessed or latency-sensitive data, traditional replication may still be the better choice.
Integration with Storage Classes
In MooseFS, Erasure Coding is configured through Storage Classes. You can define which files should use EC, how many parts should be created, and where those parts should be stored. MooseFS handles the encoding, distribution, and recovery automatically in the background.
Summary
Erasure Coding provides a highly efficient and flexible alternative to replication, reducing the amount of raw storage needed for redundancy. In MooseFS, it is fully integrated into the storage architecture and can be applied selectively to suit different data types and access patterns.
For technical details on configuring and managing Erasure Coding in MooseFS, refer to the Data Redundancy chapter.