Benchmarking Ceph erasure code plugins
The erasure code implementation in Ceph relies on the jerasure library. It is packaged into a plugin that is dynamically loaded by erasure coded pools.
The ceph_erasure_code_benchmark tool is implemented to help benchmark the competing erasure code plugin implementations and to find the best parameters for a given plugin. It shows the jerasure technique cauchy_good with a packet size of 3072 to be the most efficient on an Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz when compiled with gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5). The test assumes each object is spread over six OSDs, with two extra OSDs used for parity ( K=6 and M=2 ).
- Encoding: 4.2GB/s
- Decoding: no processing necessary (because the code is systematic)
- Recovering the loss of one OSD: 10GB/s
- Recovering the loss of two OSDs: 3.6GB/s
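As a back-of-the-envelope check of what K=6, M=2 means for storage overhead, the arithmetic can be sketched as follows (illustrative only; it mirrors the layout described above, not Ceph code):

```python
# Storage overhead of a K=6, M=2 erasure coded pool (illustrative sketch).
K = 6  # data chunks, each stored on its own OSD
M = 2  # parity (coding) chunks, stored on two additional OSDs

object_size_mb = 6.0
chunk_size_mb = object_size_mb / K          # each OSD stores one chunk
raw_stored_mb = chunk_size_mb * (K + M)     # data + parity on disk
overhead = raw_stored_mb / object_size_mb   # vs. 3.0 for 3-way replication

print(chunk_size_mb)   # 1.0
print(raw_stored_mb)   # 8.0
print(overhead)        # ~1.33; up to M=2 lost OSDs can be recovered
```

The pool therefore survives the loss of any two OSDs while storing only ~1.33x the object size, compared to 3x for three-way replication.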
The processing is done on the primary OSDs and is therefore distributed over the Ceph cluster. Encoding and decoding are an order of magnitude faster than the typical storage hardware throughput.
./autogen.sh ; ./configure ; make
which compiles the ceph_erasure_code_benchmark tool.
The results of the erasure code bench script ( which relies on ceph_erasure_code_benchmark ) were produced on an Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz, compiled with gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5).
CEPH_ERASURE_CODE_BENCHMARK=src/ceph_erasure_code_benchmark \
PLUGIN_DIRECTORY=src/.libs \
    qa/workunits/erasure-code/bench.sh
They can be interpreted as follows:
seconds         KB      plugin  k m work.  iter.   size    eras.
0.612510        1048576 example 2 1 encode 1024    1048576 0
0.317254        1048576 example 2 1 decode 1024    1048576 1
The first line shows that the example plugin encoded 1048576KB (1GB) in 0.612510 seconds, which is ~1.6GB/s. The measure was obtained by iterating 1024 times over a 1048576-byte (1MB) buffer. The second line shows that the example plugin decoded 1048576KB (1GB) with 1 erased chunk (last column) in 0.317254 seconds, which is ~3.1GB/s. The measure was obtained by iterating 1024 times over a 1048576-byte (1MB) buffer that was encoded once.
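The throughput figures above follow directly from the table columns; a minimal sketch of the arithmetic (the function name is ours, not part of bench.sh):

```python
# Derive throughput (GB/s) from one line of bench.sh output:
# total KB processed divided by elapsed seconds, then KB -> GB.
def throughput_gb_per_s(seconds, kb):
    return kb / seconds / (1024 * 1024)  # 1GB = 1048576KB in these units

# e.g. the encode line gives roughly 1.6, the decode line roughly 3.2:
print(throughput_gb_per_s(0.612510, 1048576))
print(throughput_gb_per_s(0.317254, 1048576))
```

Since each run processes exactly 1048576KB (1GB), the rate is simply the reciprocal of the elapsed seconds.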
When using the Jerasure Ceph plugin and the Reed Solomon technique to sustain the loss of two OSDs (i.e. K=6 and M=2 ) the results are:
seconds         KB      plugin   k m work.  iter.   size    eras.
0.103921        1048576 jerasure 6 2 decode 1024    1048576 1
0.277644        1048576 jerasure 6 2 decode 1024    1048576 2
0.238322        1048576 jerasure 6 2 encode 1024    1048576 0
The first line shows that if 1 OSD is lost ( erased ), it can be recovered at a rate of 10GB/s ( 1/0.103921 ). If 2 OSDs are lost, recovering both of them can be done at a rate of 3.6GB/s ( 1/0.277644 ). Encoding can be done at a rate of 4.2GB/s ( 1/0.238322 ).
The corresponding jerasure technique is cauchy_good with a packet size of 3072:
--parameter erasure-code-packetsize=3072 --parameter erasure-code-technique=cauchy_good
A single call was profiled, with the number of iterations reduced from 1024 to 10 because valgrind makes the run significantly slower:
valgrind --tool=callgrind src/ceph_erasure_code_benchmark \
    --plugin jerasure \
    --workload encode \
    --iterations 10 \
    --size 1048576 \
    --parameter erasure-code-k=6 \
    --parameter erasure-code-m=2 \
    --parameter erasure-code-directory=.libs \
    --parameter erasure-code-technique=cauchy_good \
    --parameter erasure-code-packetsize=3072
It shows that 97% of the time is spent in table lookups.
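The dominance of table lookups is typical of Galois Field arithmetic: multiplications in GF(2^8) are commonly implemented with precomputed log/antilog tables rather than bitwise polynomial arithmetic. A generic illustration of the idea (not jerasure's actual code; the polynomial 0x11d is a common choice for GF(2^8), assumed here):

```python
# Table-lookup based GF(2^8) multiplication, the kind of operation that
# dominates Reed-Solomon encoding profiles. Generic sketch, not jerasure.

POLY = 0x11d  # a common primitive polynomial for GF(2^8)

def gf_mult_slow(a, b):
    """Bitwise (carry-less) multiplication modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

# Precompute log/antilog tables from the generator element 2.
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mult_slow(x, 2)
for i in range(255, 512):
    EXP[i] = EXP[i - 255]  # doubled so LOG[a] + LOG[b] needs no modulo

def gf_mult_table(a, b):
    """Two table lookups and an add replace the bitwise loop."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

Encoding a buffer performs such a multiplication per byte per coding chunk, so nearly all the CPU time ends up in these table accesses.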