Performance comparison Classic vs. Crimson OSD with Seastore

Jose Juan Palacios Perez (IBM)

Seastore as the native object storage in Crimson

We are very excited to present the performance comparison between the Classic OSD and Seastore, the native object storage engine in Crimson OSD.

We show that Seastore performs better than the Classic OSD for 4K random reads, and matches its performance for the sequential 64K read and write workloads. Only the 4K random write workload shows slightly lower performance for Seastore, for which we are actively working on optimisations in an effort led by Samuel Just.

These results remain consistent with those from build 6aab5c07ae (May 2025, internal technical report, available in the Crimson Slack channel); there is no regression in the performance of Seastore.

In a nutshell, here are the key points about the comparison:

  • We used the same Ceph dev build from the main branch (hash 785976e3179) for all test runs.

  • Single OSD, 32 RBD volumes each of size 2 GB, four FIO jobs per volume.

  • All configurations for Crimson used the balanced OSD allocation strategy on a single NUMA socket (to stay consistent with previous tests).

  • We used the traditional workloads (random read 4k, random write 4k, sequential read 64k and sequential write 64k); an example FIO invocation for these is sketched after this list.

  • We used the dual-reactor configuration for Seastore (using the maximum of 56 CPU cores in NUMA socket 0), and similarly for the Classic OSD. For details, please see our previous blog entry.
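The exact FIO job files are not reproduced in this post; the following is a minimal sketch of the kind of invocation assumed for the 4k random-read case, using the librbd ioengine against one of the test volumes (the runtime, ramp time and client name are assumptions, and the queue depth was swept from 1 to 64 across runs):

    # Sketch only: 4k random read against one RBD volume via the librbd ioengine.
    # numjobs=4 matches the four FIO jobs per volume; iodepth is swept per run.
    fio --name=rand4kread --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=fio_test_0 --rw=randread --bs=4k --iodepth=16 --numjobs=4 \
        --direct=1 --time_based --runtime=300 --ramp_time=30 --group_reporting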

Seastore vs Classic OSD - max CPU cores configuration

In this section we show the performance comparison between the Classic OSD and Crimson/Seastore for the dual-reactor configuration. This consists of using the maximum of 56 CPU cores in NUMA socket 0 of the system used (Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20GHz). This is the same CPU configuration used in the previously published report of May 29, 2025. The tests were performed with the same hardware and software configuration; see the Appendix below for details.
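The exact options used to place the OSD on socket 0 are covered in the previous blog entry; purely as an illustration (the commands and values below are assumptions, not the exact ones used for these runs), the idea can be expressed as:

    # Illustration only (assumed values, not the exact configuration used):
    # Classic OSD: pin the OSD process and its memory to NUMA node 0
    numactl --cpunodebind=0 --membind=0 <classic OSD command line>
    # Crimson/Seastore: restrict the Seastar reactors to the socket-0 CPUs
    bin/ceph config set osd crimson_seastar_cpu_cores 0-55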

Randread 4k

Figure: Seastore vs Classic (dual reactor config) - randread - IOPS vs Latency

Seastore clearly shows a significant performance improvement over Classic for this workload, with a maximum throughput of 400K IOPS vs 130K IOPS, respectively.

Figures: OSD CPU utilisation for randread - Classic (classic_randread_osd_cpu) and Seastore (seastore_randread_osd_cpu).
  • Note: we do not show the memory utilisation in the charts since it remains mostly constant and does not add much to the analysis.

In the following tables we show the detailed measurements for each workload. All the columns with lower-case names are reported by the FIO benchmark, whereas the two capitalised columns, the OSD CPU and OSD memory utilisation, were measured using the top command.
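As an illustration of how those two columns are obtained, the OSD process can be sampled with top in batch mode along the following lines (a sketch of the approach rather than the exact script; the sampling interval and the fields kept are assumptions):

    # Sketch: sample the OSD process every 5 s with top in batch mode and keep
    # the %CPU and resident-memory fields (a %CPU above 100 means more than one
    # core is busy).
    OSD_PID=$(pgrep -o -f 'crimson-osd|ceph-osd')
    top -b -d 5 -p "${OSD_PID}" | awk -v pid="${OSD_PID}" '$1 == pid {print $9, $6}'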

Classic OSD - randread 4k - detailed stats

| iodepth | iops | total_ios | clat_ms | clat_stdev | usr_cpu | sys_cpu | OSD_cpu | OSD_mem |
|--------:|-----:|----------:|--------:|-----------:|--------:|--------:|--------:|--------:|
| 1 | 108365.87 | 32509868.00 | 0.29 | 0.11 | 1.41 | 1.26 | 353.20 | 53.66 |
| 2 | 124779.61 | 37434134.00 | 0.51 | 0.31 | 1.55 | 1.40 | 426.76 | 66.35 |
| 4 | 130072.33 | 39022090.00 | 0.98 | 0.86 | 1.59 | 1.44 | 440.62 | 66.60 |
| 8 | 130657.61 | 39197807.00 | 1.96 | 2.59 | 1.56 | 1.43 | 442.40 | 66.60 |
| 16 | 131217.62 | 39366729.00 | 3.90 | 8.30 | 1.40 | 1.36 | 437.58 | 66.60 |
| 24 | 132017.46 | 39607879.00 | 5.81 | 18.94 | 1.30 | 1.30 | 429.83 | 66.60 |
| 32 | 131567.99 | 39477765.00 | 6.81 | 27.08 | 1.37 | 1.42 | 418.12 | 66.60 |
| 40 | 134032.49 | 40220335.00 | 6.22 | 26.81 | 1.78 | 1.87 | 416.35 | 66.60 |
| 52 | 135246.36 | 40576206.00 | 5.34 | 24.46 | 2.57 | 2.71 | 414.86 | 66.60 |
| 64 | 136637.79 | 40993659.00 | 4.85 | 23.90 | 3.32 | 3.44 | 408.23 | 66.60 |

Seastore OSD - randread 4k - detailed stats

| iodepth | iops | total_ios | clat_ms | clat_stdev | usr_cpu | sys_cpu | OSD_cpu | OSD_mem |
|--------:|-----:|----------:|--------:|-----------:|--------:|--------:|--------:|--------:|
| 1 | 126108.05 | 37832541.00 | 0.25 | 0.07 | 1.75 | 1.62 | 4259.31 | 3479.04 |
| 2 | 186270.38 | 55881299.00 | 0.34 | 0.13 | 2.50 | 2.34 | 4792.02 | 3628.80 |
| 4 | 272327.60 | 81698825.00 | 0.47 | 0.20 | 3.56 | 3.18 | 5148.66 | 3628.80 |
| 8 | 313378.57 | 94014512.00 | 0.81 | 0.47 | 3.56 | 2.97 | 5258.81 | 3628.80 |
| 16 | 396901.06 | 119071509.00 | 1.29 | 0.89 | 4.36 | 3.18 | 5365.52 | 3628.80 |
| 24 | 362284.30 | 108688913.00 | 2.12 | 4.81 | 3.67 | 2.75 | 5362.52 | 3639.44 |
| 32 | 387618.69 | 116296461.00 | 2.64 | 14.40 | 4.36 | 3.23 | 5368.53 | 3645.60 |
| 40 | 346395.63 | 103929773.00 | 3.69 | 18.48 | 3.65 | 2.73 | 5354.85 | 3645.60 |
| 52 | 371220.47 | 111386558.00 | 4.48 | 33.79 | 4.24 | 3.11 | 5349.68 | 3645.60 |
| 64 | 328832.78 | 98673511.00 | 6.23 | 50.75 | 3.51 | 2.69 | 5328.60 | 3645.60 |

Randwrite 4k

Figure: Seastore vs Classic - randwrite - IOPS vs Latency

There is a clear opportunity to improve the performance of Seastore for this workload. This is a very active area of work, led by Samuel Just, involving a number of optimisations for Seastore.

Figures: OSD CPU utilisation for randwrite - Classic (classic_randwrite_osd_cpu) and Seastore (seastore_randwrite_osd_cpu).

Classic OSD - randwrite 4k - detailed stats

| iodepth | iops | total_ios | clat_ms | clat_stdev | usr_cpu | sys_cpu | OSD_cpu | OSD_mem |
|--------:|-----:|----------:|--------:|-----------:|--------:|--------:|--------:|--------:|
| 1 | 65903.70 | 19771176.00 | 0.48 | 1.00 | 1.36 | 1.01 | 781.41 | 75.73 |
| 2 | 86286.07 | 25885995.00 | 0.73 | 0.52 | 2.03 | 1.50 | 993.70 | 57.47 |
| 4 | 96025.22 | 28807854.00 | 1.33 | 0.70 | 2.51 | 1.91 | 1094.53 | 74.25 |
| 8 | 103263.42 | 30979542.00 | 2.47 | 1.03 | 3.34 | 2.57 | 1158.53 | 76.47 |
| 16 | 103583.23 | 31075901.00 | 4.94 | 5.42 | 2.23 | 1.30 | 1158.31 | 76.22 |
| 24 | 103991.81 | 31199311.00 | 7.38 | 10.90 | 1.99 | 1.09 | 1143.36 | 75.73 |
| 32 | 104675.98 | 31405724.00 | 9.78 | 17.08 | 1.86 | 0.98 | 1145.61 | 75.73 |
| 40 | 104459.77 | 31342317.00 | 12.25 | 23.81 | 1.81 | 0.89 | 1144.96 | 76.47 |
| 52 | 104192.40 | 31264494.00 | 15.97 | 34.90 | 1.72 | 0.82 | 1142.14 | 75.97 |
| 64 | 104833.30 | 31463618.00 | 19.53 | 47.70 | 1.64 | 0.79 | 1137.10 | 76.22 |

Seastore OSD - randwrite 4k - detailed stats

| iodepth | iops | total_ios | clat_ms | clat_stdev | usr_cpu | sys_cpu | OSD_cpu | OSD_mem |
|--------:|-----:|----------:|--------:|-----------:|--------:|--------:|--------:|--------:|
| 1 | 31367.49 | 9410655.00 | 1.01 | 2.37 | 0.62 | 0.49 | 3648.14 | 3941.28 |
| 2 | 42493.17 | 12748376.00 | 1.50 | 3.97 | 0.75 | 0.59 | 4160.04 | 5390.00 |
| 4 | 58204.73 | 17461826.00 | 2.19 | 3.76 | 0.95 | 0.73 | 4680.48 | 6027.28 |
| 8 | 65861.89 | 19759489.00 | 3.88 | 5.34 | 1.06 | 0.79 | 4748.00 | 6194.16 |
| 16 | 69801.80 | 20942843.00 | 7.33 | 29.56 | 1.11 | 0.81 | 4928.67 | 6199.20 |
| 24 | 68240.35 | 20477292.00 | 11.25 | 46.16 | 1.11 | 0.79 | 4815.88 | 6199.20 |
| 32 | 71570.20 | 21477716.00 | 14.31 | 60.14 | 1.16 | 0.82 | 390.35 | 6199.20 |
| 40 | 10710.19 | 3363.00 | 354.63 | 691.85 | 0.08 | 0.07 | 4281.69 | 6199.20 |
| 52 | 16412.43 | 5810.00 | 160.41 | 334.38 | 0.25 | 0.17 | 4969.02 | 6199.20 |
| 64 | 7648.09 | 2608.00 | 394.05 | 527.43 | 0.08 | 0.16 | 4545.87 | 6199.20 |

Seqread 64k

Figure: Seastore vs Classic - seqread - IOPS vs Latency

Notice that the performance of Seastore is less than 10% lower than Classic for this workload; for practical purposes they can be considered equivalent.

Figures: OSD CPU utilisation for seqread - Classic (classic_seqread_osd_cpu) and Seastore (seastore_seqread_osd_cpu).

Classic OSD - seqread 64k - detailed stats

| iodepth | bw | total_ios | clat_ms | clat_stdev | usr_cpu | sys_cpu | OSD_cpu | OSD_mem |
|--------:|---:|----------:|--------:|-----------:|--------:|--------:|--------:|--------:|
| 1 | 2259.54 | 10591650.00 | 0.90 | 0.45 | 0.49 | 0.44 | 244.69 | 78.03 |
| 2 | 2252.25 | 10557518.00 | 1.82 | 1.16 | 0.47 | 0.43 | 264.26 | 80.91 |
| 4 | 2226.41 | 10436444.00 | 3.68 | 2.35 | 0.46 | 0.43 | 260.60 | 79.92 |
| 8 | 2229.90 | 10453186.00 | 7.34 | 4.71 | 0.45 | 0.43 | 258.87 | 79.67 |
| 16 | 1867.48 | 8754162.00 | 16.28 | 26.08 | 0.40 | 0.40 | 226.63 | 78.19 |
| 24 | 1771.64 | 8304783.00 | 16.63 | 30.38 | 0.53 | 0.59 | 224.78 | 78.19 |
| 32 | 1686.78 | 7906337.00 | 15.32 | 24.54 | 0.72 | 0.84 | 226.87 | 78.44 |
| 40 | 1620.78 | 7597046.00 | 15.36 | 30.58 | 0.84 | 1.03 | 215.50 | 79.18 |
| 52 | 1537.75 | 7207264.00 | 15.81 | 40.05 | 0.96 | 1.27 | 213.59 | 79.18 |
| 64 | 1457.19 | 6829824.00 | 14.76 | 46.98 | 1.21 | 1.59 | 215.77 | 77.95 |

Seastore OSD - seqread 64k - detailed stats

| iodepth | bw | total_ios | clat_ms | clat_stdev | usr_cpu | sys_cpu | OSD_cpu | OSD_mem |
|--------:|---:|----------:|--------:|-----------:|--------:|--------:|--------:|--------:|
| 1 | 1473.66 | 6907887.00 | 1.38 | 0.78 | 0.42 | 0.38 | 2849.74 | 5953.20 |
| 2 | 1690.21 | 7923018.00 | 2.42 | 1.61 | 0.44 | 0.41 | 2957.45 | 6199.20 |
| 4 | 1780.62 | 8346807.00 | 4.60 | 3.25 | 0.45 | 0.43 | 3092.81 | 6199.20 |
| 8 | 1868.70 | 8759887.00 | 8.76 | 7.66 | 0.46 | 0.43 | 2992.76 | 6199.20 |
| 16 | 1957.45 | 9176207.00 | 16.74 | 15.61 | 0.48 | 0.44 | 3235.70 | 6199.20 |
| 24 | 2037.46 | 9552208.00 | 24.12 | 26.46 | 0.47 | 0.46 | 3335.78 | 6199.20 |
| 32 | 2130.58 | 9988723.00 | 30.76 | 35.76 | 0.49 | 0.48 | 3437.99 | 6199.20 |
| 40 | 2181.19 | 10227339.00 | 37.55 | 49.96 | 0.49 | 0.48 | 3284.56 | 6199.20 |
| 52 | 2293.93 | 10757006.00 | 46.42 | 59.17 | 0.52 | 0.50 | 3502.08 | 6199.20 |
| 64 | 2241.97 | 10514348.00 | 42.65 | 61.72 | 0.63 | 0.65 | 3315.77 | 6199.20 |

Seqwrite 64k

Figure: Seastore vs Classic - seqwrite - IOPS vs Latency

As with the previous workload, the performance of Seastore is within 10% of Classic and hence can be considered equivalent.

Figures: OSD CPU utilisation for seqwrite - Classic (classic_seqwrite_osd_cpu) and Seastore (seastore_seqwrite_osd_cpu).

Classic OSD - seqwrite 64k - detailed stats

| iodepth | bw | total_ios | clat_ms | clat_stdev | usr_cpu | sys_cpu | OSD_cpu | OSD_mem |
|--------:|---:|----------:|--------:|-----------:|--------:|--------:|--------:|--------:|
| 1 | 2413.60 | 11313833.00 | 0.83 | 0.36 | 1.90 | 0.65 | 597.74 | 40.63 |
| 2 | 2629.75 | 12326994.00 | 1.54 | 0.55 | 2.02 | 0.75 | 684.49 | 48.59 |
| 4 | 2679.76 | 12561442.00 | 3.04 | 1.28 | 1.95 | 0.69 | 698.03 | 55.99 |
| 8 | 2688.72 | 12603496.00 | 6.08 | 3.87 | 1.99 | 0.51 | 716.50 | 60.68 |
| 16 | 2701.16 | 12661982.00 | 12.11 | 15.45 | 2.33 | 0.43 | 697.26 | 61.17 |
| 24 | 2703.43 | 12673303.00 | 18.17 | 34.59 | 2.48 | 0.41 | 682.33 | 61.91 |
| 32 | 2596.58 | 12173046.00 | 24.76 | 57.19 | 2.54 | 0.41 | 679.37 | 61.91 |
| 40 | 2672.83 | 12533910.00 | 23.34 | 55.86 | 3.29 | 0.49 | 674.92 | 61.91 |
| 52 | 2607.49 | 12226048.00 | 20.86 | 47.36 | 4.62 | 0.69 | 650.55 | 62.16 |
| 64 | 2655.01 | 12446451.00 | 19.10 | 42.89 | 6.06 | 0.88 | 639.84 | 61.67 |

Seastore OSD - seqwrite 64k - detailed stats

| iodepth | bw | total_ios | clat_ms | clat_stdev | usr_cpu | sys_cpu | OSD_cpu | OSD_mem |
|--------:|---:|----------:|--------:|-----------:|--------:|--------:|--------:|--------:|
| 1 | 1895.08 | 8883327.00 | 1.07 | 1.12 | 1.21 | 0.43 | 3177.55 | 6001.82 |
| 2 | 2220.36 | 10408096.00 | 1.83 | 1.97 | 1.29 | 0.48 | 3184.46 | 6249.60 |
| 4 | 2244.20 | 10519929.00 | 3.64 | 3.36 | 1.27 | 0.48 | 3345.72 | 6249.60 |
| 8 | 2301.54 | 10788847.00 | 7.11 | 6.38 | 1.53 | 0.49 | 3213.67 | 6249.60 |
| 16 | 2381.86 | 11165433.00 | 13.74 | 12.08 | 2.16 | 0.49 | 3493.73 | 6249.60 |
| 24 | 2479.99 | 11627449.00 | 19.80 | 20.33 | 2.43 | 0.50 | 3431.81 | 6249.60 |
| 32 | 1557.10 | 7299365.00 | 42.07 | 36.08 | 2.05 | 0.50 | 3732.99 | 6249.60 |
| 40 | 1521.37 | 7131945.00 | 53.83 | 49.44 | 2.03 | 0.48 | 3723.61 | 6249.60 |
| 52 | 1563.37 | 7328734.00 | 68.10 | 64.74 | 2.13 | 0.49 | 3810.17 | 6249.60 |
| 64 | 1571.50 | 7367144.00 | 70.18 | 71.46 | 2.52 | 0.58 | 3860.27 | 6249.60 |

Conclusions

In this blog entry we have compared the performance of the Seastore OSD against the Classic OSD. The results show that Seastore performs better for 4k random reads and matches the Classic OSD for the sequential 64k read and write workloads. Only the 4k random write workload shows lower performance for Seastore, for which we are actively working on optimisations. We will continue to monitor the performance of the Seastore OSD and report any significant changes in future blog entries. We would like to thank Samuel Just for his insights on the performance of Seastore, as discussed in the Crimson community meeting calls.

Appendix: configuration details

The tests were executed on a single node cluster, o05, in the Sepia Lab.

The following is the summary of hardware and software configuration:

  • CPU: 2 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20GHz (56 cores each)

  • Memory: 384 GB

  • Storage: 8 x 93.1 TB NVMe drives

  • OS: CentOS 9.0

  • Kernel: 5.14.0-511.el9.x86_64

  • Ceph: squid dev build from the main branch, hash 785976e3179 (Fri Aug 29 2025)

  • podman version 5.2.2

  • FIO: 3.28 (using the librbd engine for the workloads, and the AIO engine for preconditioning the NVMe drives); an example preconditioning command is sketched below.
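As an example of the preconditioning step, a sequential fill of a raw NVMe device with FIO would look roughly like the following (a sketch only; the device name, queue depth and number of passes are placeholder assumptions):

    # Sketch: sequentially fill the raw device twice with 64k writes so later
    # measurements see steady-state media (destroys any data on the device).
    fio --name=precondition --filename=/dev/nvme0n1 --ioengine=libaio \
        --direct=1 --rw=write --bs=64k --iodepth=32 --numjobs=1 --loops=2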

We built Ceph in developer mode with the following options:

    WITH_CRIMSON=true ./install-deps.sh
    ./do_cmake.sh -DWITH_CRIMSON=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CXX_FLAGS="-fno-omit-frame-pointer" -DWITH_TESTS=OFF && ninja -C build -j 20 -l 20 -k 20 && ninja -C build install

All the tests for this report were executed on a single node, using vstart.sh for cluster creation (an illustrative invocation is sketched below). In terms of storage, the single configuration tested involved 32 RBD volumes, each 2 GB in size.
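As an illustration (the exact vstart flags used are not reproduced here, and the ones below are assumptions), a single-OSD Crimson/Seastore cluster can be created with something like:

    # Sketch: single-node, single-OSD vstart cluster using crimson-osd with the
    # Seastore backend (flag selection is an assumption).
    MDS=0 MON=1 OSD=1 MGR=1 ../src/vstart.sh -n --without-dashboard --crimson --seastore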

The RBD pool was created without replication (size 1). In the snippet below, we show the options used for the RBD pool and volumes.

Click to see the RBD configuration details.

    # If a Crimson OSD is running, dump its metrics for later reference
    if pgrep crimson; then
        bin/ceph daemon -c /ceph/build/ceph.conf osd.0 dump_metrics > /tmp/new_cluster_dump.json
    fi

    # basic setup
    bin/ceph osd pool create rbd 128
    bin/ceph osd pool application enable rbd rbd
    bin/ceph osd pool set rbd size 1 --yes-i-really-mean-it

    [ -z "$NUM_RBD_IMAGES" ] && NUM_RBD_IMAGES=1
    [ -z "$RBD_SIZE" ] && RBD_SIZE=2GB
    for (( i=0; i<$NUM_RBD_IMAGES; i++ )); do
      bin/rbd create --size ${RBD_SIZE} rbd/fio_test_${i}
      bin/rbd du fio_test_${i}
      echo "Prefilling rbd/fio_test_${i}"
      bin/rbd bench -p rbd --image fio_test_${i} --io-size 64K --io-threads 1\
        --io-total ${RBD_SIZE} --io-pattern seq --io-type write  && bin/rbd du fio_test_${i}
    done

    bin/ceph status
    bin/ceph osd dump | grep 'replicated size'

    # Show the pool's utilization statistics:
    bin/rados df
    # Turn off auto scaler for existing and new pools - stops PGs being split/merged
    bin/ceph osd pool set noautoscale
    # Turn off balancer to avoid moving PGs
    bin/ceph balancer off
    # Turn off deep scrub
    bin/ceph osd set nodeep-scrub
    # Turn off scrub
    bin/ceph osd set noscrub
    # Turn off RBD coalescing
    bin/ceph config set client rbd_io_scheduler none