New in Mimic: centralized configuration management


One of the key new features in Ceph Mimic is the ability to manage the cluster configuration (what traditionally resides in ceph.conf) in a central fashion.  Starting in Mimic, we also store configuration information in the monitors' internal database, and seamlessly manage the distribution of that config info to all daemons and clients in the system.

Historically, operators wanting to make a configuration change would need to edit the ceph.conf files manually, distribute them to the right nodes, and ensure that the right daemons had been restarted.  Most large-scale users relied on external tools like ansible, puppet, or salt to do this, but the solution always varied, and there was always a disconnect between what the config management service thought the configuration should be and what configuration the running daemon was actually using (has its local ceph.conf been updated?  has the daemon been restarted?  has the operator injected a configuration change via the command line?).

This new feature is designed to bridge this gap, providing a robust view into what the configuration should be (and whether the running configuration matches), and avoiding the need for external tools to manage ceph.conf configuration files.  Most importantly, it provides a simplified configuration experience out of the box.

Note that the new capability is designed to interoperate with the traditional way of managing configurations via ceph.conf, so somebody upgrading to Mimic doesn't have to make any changes at all if they don't want to.  However, we expect that migrating to the new mode of operation will pay off.

The basics

The monitors jointly manage a configuration database.  The database has the same semantic structure as a ceph.conf file:

  • There are option names (e.g., osd scrub load threshold) and values.
  • A setting can be associated with a "global" group, a type group that applies to all entities of a given type (e.g., "osd" or "mds"), or a specific daemon (e.g., "osd.123").

The ceph config dump command will output the equivalent of the cluster-wide ceph.conf in table format.

When a daemon or client starts up, it will look for a ceph.conf file like it always does.  In most cases a small ceph.conf is still necessary in order to identify who the monitors are.  For example, a typical minimal ceph.conf file might be:

mon host =,,

or better yet

mon host =

where ceph-mons is a DNS entry with multiple A records, one for each monitor.  This allows the number and identities of monitors to change over time without modifying any configuration files at all. More importantly, the configuration file on each host is usually static over the lifetime of the cluster, simplifying deployment and management.

You can put any other settings you like in ceph.conf as well.  The overall priority order that Ceph uses to set options is:

  1. Compiled-in default values
  2. Cluster configuration database (the new thing!)
  3. Local ceph.conf file
  4. Runtime override (via "ceph daemon ... config set ..." or "ceph tell ... injectargs ...")
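The priority order above can be modeled as a stack of layers in which the highest-priority layer that sets an option wins. The following is only a minimal sketch of that idea, not Ceph's implementation; the `resolve` function, the layer names, and the default values shown are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

# Sources from lowest to highest priority, mirroring the list above.
PRIORITY = ["default", "mon", "file", "override"]


def resolve(option: str, layers: Dict[str, Dict[str, str]]) -> Optional[Tuple[str, str]]:
    """Return (value, source) from the highest-priority layer that sets option."""
    for source in reversed(PRIORITY):
        value = layers.get(source, {}).get(option)
        if value is not None:
            return value, source
    return None


# Placeholder values for illustration only (not Ceph's real defaults).
layers = {
    "default": {"debug_ms": "0/0", "debug_mgr": "1/5"},
    "mon": {"debug_mgr": "20/20"},
    "file": {"debug_ms": "1/1"},
    "override": {"debug_mgr": "10/10"},
}
print(resolve("debug_ms", layers))   # ('1/1', 'file')
print(resolve("debug_mgr", layers))  # ('10/10', 'override')
```

Returning the source alongside the value is what makes it possible to report where a running value came from.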

Command line interface

Typing ceph config -h will summarize the set of commands available:

$ ceph config -h
[...]
config assimilate-conf      Assimilate options from a conf, and return a new, minimal conf file
config dump                 Show all configuration option(s)
config get{}                Show configuration option(s) for an entity
config help                 Describe a configuration option
config log {}               Show recent history of config changes
config reset                Revert configuration to previous state
config rm                   Clear a configuration option for one or more entities
config set                  Set a configuration option for one or more entities
config show{}               Show running configuration
config show-with-defaults   Show running configuration (including compiled-in defaults)

A good place to start is simply dumping the cluster configuration:

$ ceph config dump
WHO    MASK LEVEL    OPTION                    VALUE RO
global      advanced mon_pg_warn_min_per_osd   3
global      advanced osd_pool_default_min_size 1
global      advanced osd_pool_default_size     1
mon         advanced mon_allow_pool_delete     true

We can set an option like so:

$ ceph config set osd debug_ms 1
$ ceph config dump
WHO    MASK LEVEL    OPTION                    VALUE RO
global      advanced mon_pg_warn_min_per_osd   3
global      advanced osd_pool_default_min_size 1
global      advanced osd_pool_default_size     1
mon         advanced mon_allow_pool_delete     true
...
osd         advanced debug_ms                  1
...

Note that this is all that is necessary to make the change: any daemons or clients in the system that this option applies to will be notified of the configuration change immediately. No restarting of daemons, no use of the awkward ceph tell ... injectargs ... command, or anything else.

In the above dump output, the MASK field is a secondary restriction on which daemons or clients the option applies to, and can match either a CRUSH location (e.g., "rack:foo") or an OSD class (e.g., "ssd" vs "hdd"). For example, we could set a higher debug level that only applies to OSDs that are backed by SSDs (as reported by the ceph osd crush tree command):

$ ceph config set osd/class:ssd debug_ms 2
$ ceph config dump
WHO MASK      LEVEL    OPTION   VALUE RO
...
osd           advanced debug_ms 1
osd class:ssd advanced debug_ms 2
...

Instead of dumping the entire config database, you can also inspect the config for an individual daemon in the system. For example,

$ ceph config set osd.0 debug_osd 10
$ ceph config get osd.0
WHO    MASK      LEVEL    OPTION                  VALUE RO
osd    class:ssd advanced debug_ms                2/2
osd.0            advanced debug_osd               10/10
global           advanced mon_pg_warn_min_per_osd 3

This output tells you which options and values apply to the daemon, as well as where each option is coming from (is it set globally, for this daemon specifically, etc.).
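To make the WHO and MASK matching concrete, here is a rough sketch of how the entries that apply to one daemon could be selected. It assumes that a more specific WHO beats a less specific one and that, at the same WHO level, a masked entry beats an unmasked one; that ordering is inferred from the output above rather than taken from Ceph's code. The `Entry` type and helper names are hypothetical, and mask matching is reduced to a flat key/value comparison rather than a real CRUSH hierarchy lookup.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    who: str             # "global", a type such as "osd", or a daemon such as "osd.0"
    mask: Optional[str]  # e.g. "class:ssd", or None for no restriction
    option: str
    value: str


def applies(entry: Entry, daemon: str, attrs: Dict[str, str]) -> bool:
    """True if the entry's WHO and MASK both match the daemon."""
    daemon_type = daemon.split(".", 1)[0]
    if entry.who not in ("global", daemon_type, daemon):
        return False
    if entry.mask is None:
        return True
    key, _, wanted = entry.mask.partition(":")
    return attrs.get(key) == wanted


def specificity(entry: Entry, daemon: str) -> Tuple[int, bool]:
    # Assumed ordering: daemon > type > global; masked beats unmasked on ties.
    level = 2 if entry.who == daemon else (0 if entry.who == "global" else 1)
    return (level, entry.mask is not None)


def effective(entries: List[Entry], daemon: str, attrs: Dict[str, str]) -> Dict[str, Entry]:
    """Pick, per option, the most specific entry that applies to the daemon."""
    best: Dict[str, Entry] = {}
    for entry in entries:
        if not applies(entry, daemon, attrs):
            continue
        current = best.get(entry.option)
        if current is None or specificity(entry, daemon) > specificity(current, daemon):
            best[entry.option] = entry
    return best


entries = [
    Entry("global", None, "mon_pg_warn_min_per_osd", "3"),
    Entry("osd", None, "debug_ms", "1"),
    Entry("osd", "class:ssd", "debug_ms", "2"),
    Entry("osd.0", None, "debug_osd", "10"),
]
for entry in effective(entries, "osd.0", {"class": "ssd"}).values():
    print(entry.who, entry.mask or "", entry.option, entry.value)
```

For an SSD-backed osd.0 this selects the same three rows as the ceph config get osd.0 output above; an HDD-backed OSD would fall back to the unmasked debug_ms entry.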

Naturally, a config entry can also be cleared:

$ ceph config rm osd/class:ssd debug_ms
$ ceph config get osd.0
WHO    MASK LEVEL    OPTION                  VALUE RO
osd         advanced debug_ms                1/1
global      advanced mon_pg_warn_min_per_osd 3

Enforced configuration schema

One advantage of the new approach is that configuration values are validated at the time they are set. The configuration schema (what options exist and what values are legal) is compiled into the system and globally known. So, if you try to set something that doesn't make sense, you'll get an informative error message without affecting the existing configuration. For example,

$ ceph config set osd.10 debug_osd very_high
Error EINVAL: error parsing value: value must take the form N or N/M, where N and M are integers
$ ceph config set osd.10 bluestore_compression_mode 1
Error EINVAL: error parsing value: '1' is not one of the permitted values: none, passive, aggressive, force
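The validate-before-store behavior can be illustrated with a small sketch. The two-entry `SCHEMA` table and the `validate` function below are hypothetical stand-ins for the schema compiled into Ceph; the error strings simply mirror the messages shown above.

```python
import re

# Hypothetical two-option schema; the real one is compiled into Ceph.
SCHEMA = {
    "debug_osd": {"kind": "debug_level"},
    "bluestore_compression_mode": {
        "kind": "enum",
        "values": ["none", "passive", "aggressive", "force"],
    },
}


def validate(option: str, value: str) -> str:
    """Return the normalized value, or raise ValueError without storing anything."""
    spec = SCHEMA.get(option)
    if spec is None:
        raise ValueError(f"unrecognized option: {option}")
    if spec["kind"] == "debug_level":
        match = re.fullmatch(r"(\d+)(?:/(\d+))?", value)
        if match is None:
            raise ValueError(
                "value must take the form N or N/M, where N and M are integers"
            )
        n = match.group(1)
        m = match.group(2) or n  # a bare N is normalized to N/N
        return f"{n}/{m}"
    if value not in spec["values"]:
        permitted = ", ".join(spec["values"])
        raise ValueError(f"'{value}' is not one of the permitted values: {permitted}")
    return value


print(validate("debug_osd", "10"))  # 10/10
for option, value in [("debug_osd", "very_high"), ("bluestore_compression_mode", "1")]:
    try:
        validate(option, value)
    except ValueError as err:
        print("Error EINVAL: error parsing value:", err)
```

Because validation happens before anything is written, a rejected value leaves the stored configuration untouched.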

The schema for a particular option can be queried with a help command:

$ ceph config help bluestore_compression_mode
bluestore_compression_mode - Default policy for using compression when pool does not specify
  (std::string, advanced)
  Default: none
  Possible values: none passive aggressive force
  Can update at runtime: true

'none' means never use compression. 'passive' means use compression when clients hint that data is compressible. 'aggressive' means use compression unless clients hint that data is not compressible. This option is used when the per-pool property for the compression mode is not present.

One thing you'll notice is the word "advanced" on the second line. All options are divided into three categories: basic, advanced, and dev. The dev options are meant for development or testing, or are generally not intended to ever be modified by a user. The advanced options are, unsurprisingly, only meant for advanced users. There are relatively few basic options because, well, in general we aim not to require much in the way of configuration in order to make Ceph work.

Some numeric options include a minimum and maximum value, and will accept suffixes like K or M for large values:

$ ceph config set mon mon_data_size_warn 100G
$ ceph config get mon.a
WHO MASK LEVEL    OPTION             VALUE        RO
mon      advanced mon_data_size_warn 107374182400

Note that whether 'K' means 1000 or 1024 depends on the configuration option in question: some are based on SI units (base-10) and some on IEC units (base-2, like KiB and GiB).
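As a quick check of the arithmetic, the sketch below expands a suffixed value under either convention. The `parse_size` helper is hypothetical and is not how Ceph parses values; it only shows that 100G with a base of 1024 yields the 107374182400 reported above.

```python
import re

SUFFIXES = "KMGTPE"  # each step multiplies by another factor of `base`


def parse_size(text: str, base: int = 1024) -> int:
    """Expand a value like '100G' using base 1024 (IEC) or base 1000 (SI)."""
    match = re.fullmatch(r"(\d+)([KMGTPE]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    number, suffix = match.groups()
    exponent = SUFFIXES.index(suffix) + 1 if suffix else 0
    return int(number) * base ** exponent


print(parse_size("100G"))             # 107374182400
print(parse_size("100G", base=1000))  # 100000000000
```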

Running configuration

Because configuration can come from many places (defaults, cluster config, local ceph.conf, operator override) there is a show command that returns the active configuration options as reported by any daemon in the system. For example,

$ ceph daemon mgr.x config set debug_mgr 10   # manual override of config option
$ ceph config set mgr.x ms_type simple        # set an option normally
$ ceph config show mgr.x
NAME      VALUE       SOURCE   OVERRIDES  IGNORES
debug_mgr 10/10       override mon[20/20]
debug_mon 20/20       mon
debug_ms  1/1         file
ms_type   async+posix default             mon

The NAME and VALUE columns tell you which options and values are currently in effect. SOURCE tells you where the value came from: "override" from our ceph daemon command above, "mon" from the cluster configuration database, and "file" from a local ceph.conf file. In the case of an override source, the OVERRIDES column tells you what the value would have been (and from where); in this case debug_mgr would have been set to 20/20 by the mon if we hadn't issued that ceph daemon ... command.

The IGNORES column indicates where there is an option that has been set to a new value but the daemon is still using an old value. This is true for lots of options that can only take effect when the daemon is restarted, such as ms_type (which controls which message passing implementation to use). You can also see that this is a read-only value from the RO column in config get command results:

$ ceph config get mgr.x
WHO MASK LEVEL    OPTION    VALUE  RO
mgr      advanced debug_mgr 20/20  *
mgr      advanced ms_type   simple *

You'll also note that the help result for ms_type tells us the same thing:

$ ceph config help ms_type
...
Default: async+posix
Can update at runtime: false
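One way to picture the IGNORES column is as the bookkeeping a running daemon does when it receives a new value for an option that cannot change at runtime. The sketch below is a simplified model under that assumption; `Row`, `push_update`, and the flag table are hypothetical names, and the starting value for debug_mgr is a placeholder.

```python
from typing import Dict, NamedTuple


class Row(NamedTuple):
    name: str
    value: str
    source: str
    ignores: str = ""


# Hypothetical schema flags; "ceph config help" reports the real ones.
CAN_UPDATE_AT_RUNTIME = {"debug_mgr": True, "ms_type": False}


def push_update(running: Dict[str, Row], name: str, value: str, source: str) -> None:
    """Deliver a new value from `source` to an already-running daemon."""
    if CAN_UPDATE_AT_RUNTIME[name]:
        running[name] = Row(name, value, source)
    else:
        # Keep the value the daemon started with; remember which source was ignored.
        running[name] = running[name]._replace(ignores=source)


running = {
    "debug_mgr": Row("debug_mgr", "1/5", "default"),  # placeholder starting value
    "ms_type": Row("ms_type", "async+posix", "default"),
}
push_update(running, "debug_mgr", "20/20", "mon")
push_update(running, "ms_type", "simple", "mon")
for row in running.values():
    print(row.name, row.value, row.source, row.ignores)
```

In this model ms_type keeps reporting async+posix from the default source with "mon" recorded as ignored, which matches the ms_type row in the ceph config show output earlier.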

Configuration change history

One of the key advantages of using an external configuration management framework is that those tools usually store the declarative system configuration in a source control tool like Git. This provides a history of changes to the system so that if something goes wrong changes can be undone.

Ceph's new configuration management provides a simple version of that capability. Every configuration change in the system is recorded and easily viewable:

$ ceph config log
--- 15 --- 2018-06-13 15:02:46.176060 ---
- mgr.x/ms_type = simple
+ mgr.x/ms_type = async
--- 14 --- 2018-06-13 14:52:51.877714 ---
+ mgr.x/ms_type = simple
--- 13 --- 2018-06-13 14:45:33.988326 ---
+ mon/mon_data_size_warn = 107374182400
...

The output should look familiar to anyone who has read diff output: "+" lines indicate a new configuration entry and "-" lines indicate a removed or replaced entry (and its prior value).

The configuration of the system can be reverted to a previous state based on the numeric identifier preceding each change record. For example, to undo our changes to ms_type,

$ ceph config reset 13
$ ceph config log
--- 16 --- 2018-06-13 15:05:10.960659 --- reset to 13 ---
- mgr.x/ms_type = async
--- 15 --- 2018-06-13 15:02:46.176060 ---
- mgr.x/ms_type = simple
+ mgr.x/ms_type = async
--- 14 --- 2018-06-13 14:52:51.877714 ---
+ mgr.x/ms_type = simple
--- 13 --- 2018-06-13 14:45:33.988326 ---
+ mon/mon_data_size_warn = 107374182400
...

(The net effect of resetting to 13 is that the ms_type entry is removed, even though it had two intermediate values since then.) Since the reset command is a configuration change like any other you can also undo it with another reset command.
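The log-and-reset behavior can be modeled as a versioned key/value store in which a reset is itself recorded as a new change. This is only a toy model: how the monitors actually store history is not described here, and the `ConfigDB` class with its per-version snapshots is an assumption made for illustration.

```python
from typing import Dict, List, Tuple


class ConfigDB:
    """Toy versioned config store; every change, including a reset, is logged."""

    def __init__(self) -> None:
        self.version = 0
        self.current: Dict[str, str] = {}
        self.snapshots: Dict[int, Dict[str, str]] = {0: {}}
        self.log: List[Tuple[int, List[str]]] = []

    def _commit(self, new: Dict[str, str]) -> int:
        # Record a diff-style entry: "-" for old values, "+" for new ones.
        diff: List[str] = []
        for key in sorted(set(self.current) | set(new)):
            old, cur = self.current.get(key), new.get(key)
            if old == cur:
                continue
            if old is not None:
                diff.append(f"- {key} = {old}")
            if cur is not None:
                diff.append(f"+ {key} = {cur}")
        self.version += 1
        self.current = dict(new)
        self.snapshots[self.version] = dict(new)
        self.log.append((self.version, diff))
        return self.version

    def set(self, key: str, value: str) -> int:
        return self._commit({**self.current, key: value})

    def rm(self, key: str) -> int:
        return self._commit({k: v for k, v in self.current.items() if k != key})

    def reset(self, version: int) -> int:
        # Reverting is just another commit whose target is an earlier snapshot.
        return self._commit(self.snapshots[version])


db = ConfigDB()
v1 = db.set("mon/mon_data_size_warn", "107374182400")
db.set("mgr.x/ms_type", "simple")
v3 = db.set("mgr.x/ms_type", "async")
db.reset(v1)
for version, diff in reversed(db.log):
    print(f"--- {version} ---")
    print("\n".join(diff))
```

Because the reset lands in the log as its own version, calling `db.reset(v3)` afterwards restores the ms_type entry, in the same way a second reset command undoes the first.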

Migrating from old configuration files

Any existing cluster is likely to have various settings in the ceph.conf files stored on each node of the system. We also provide a command to easily import these files into the configuration database.

One challenge is that not all options are suitable to be stored in the central config database. The mon_host option is a good example: it's used to bootstrap a connection to the cluster before fetching any additional configuration options. For this reason, the import command takes the existing config file as input and generates a (hopefully shorter) config file as output that contains any options that could not be assimilated. For example,

$ cat ceph.conf
[global]
mon host =
[osd.1]
debug_osd = 0/0
[mds.a]
mds invalid option = this option does not exist

$ ceph config assimilate-conf -i ceph.conf -o
[global]
mon_host =

[mds.a] mds_invalid_option = this option does not exist

$ ceph config get osd.1
WHO   MASK LEVEL    OPTION    VALUE RO
osd.1      advanced debug_osd 0/0

In this simple example, only the debug_osd option for osd.1 was imported; mon_host was left behind (it's needed for bootstrapping) and mds_invalid_option was left behind (it was not a recognized option).
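The split between assimilated and left-behind options can be sketched with Python's standard configparser. This is not the logic of ceph config assimilate-conf itself: `KNOWN_OPTIONS`, `BOOTSTRAP_ONLY`, and the `assimilate` function are hypothetical, and the mon host value is taken from the ceph-mons DNS name mentioned earlier.

```python
import configparser
from typing import Dict, Tuple

# Hypothetical stand-ins for the compiled-in schema.
KNOWN_OPTIONS = {"mon_host", "debug_osd"}
BOOTSTRAP_ONLY = {"mon_host"}  # needed before the cluster can be contacted

CONF = """\
[global]
mon host = ceph-mons
[osd.1]
debug_osd = 0/0
[mds.a]
mds invalid option = this option does not exist
"""


def assimilate(conf_text: str, db: Dict[Tuple[str, str], str]) -> Dict[str, Dict[str, str]]:
    """Move recognized options into db; return what must stay in the file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    leftover: Dict[str, Dict[str, str]] = {}
    for section in parser.sections():
        for raw_key, value in parser.items(section):
            key = raw_key.replace(" ", "_")  # "mon host" and "mon_host" are equivalent
            if key in KNOWN_OPTIONS and key not in BOOTSTRAP_ONLY:
                db[(section, key)] = value
            else:
                leftover.setdefault(section, {})[key] = value
    return leftover


db: Dict[Tuple[str, str], str] = {}
leftover = assimilate(CONF, db)
print(db)        # {('osd.1', 'debug_osd'): '0/0'}
print(leftover)
```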

For a cluster making a transition to a cluster-managed config, the basic process would be to run an assimilate command like the above on each host to incorporate settings into the cluster's configuration database, leaving behind only the bootstrap-related options on each host. For example,

$ cd /etc/ceph
$ ceph config assimilate-conf -i ceph.conf -o
$ cat   # make sure it looks okay!
$ mv ceph.conf

This will work in the majority of cases. However, be warned that assimilating a configuration file will overwrite any settings mentioned in the input, which means that if two hosts have config files setting the same option to different values, the end result will depend on the order in which the files are assimilated.
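Since the last file assimilated wins, it can be worth checking for disagreements between hosts before importing anything. The sketch below is one hypothetical way to do that on settings that have already been parsed; the `find_conflicts` helper, the host names, and the data layout are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

Settings = Dict[Tuple[str, str], str]  # (section, option) -> value


def find_conflicts(host_confs: Dict[str, Settings]) -> Dict[Tuple[str, str], Dict[str, List[str]]]:
    """Return the options that different hosts set to different values."""
    seen: Dict[Tuple[str, str], Dict[str, List[str]]] = {}
    for host, settings in host_confs.items():
        for key, value in settings.items():
            seen.setdefault(key, {}).setdefault(value, []).append(host)
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical per-host settings, as if read from each host's ceph.conf.
host_confs = {
    "host-a": {("osd", "debug_osd"): "0/0", ("osd", "debug_ms"): "1/1"},
    "host-b": {("osd", "debug_osd"): "10/10", ("osd", "debug_ms"): "1/1"},
}
for (section, option), values in find_conflicts(host_confs).items():
    print(f"[{section}] {option}: {values}")
```

Any option reported here would end up with whichever value was assimilated last, so it is safer to resolve it by hand first.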

Next steps

Looking forward, the key next step is to surface all of these configuration options in the new management dashboard. There is an in-flight pull request that adds this functionality, and it should be available in the upcoming Nautilus release.