Cache trimming is now throttled. Dropping the MDS cache via the “ceph tell mds.<foo> cache drop” command or large reductions in the cache size will no longer cause service unavailability.
Cap recall has been significantly improved: the MDS no longer attempts to recall too many caps at once, which previously led to instability. An MDS with a large cache (64GB+) should be more stable.
MDS now provides a config option “mds_max_caps_per_client” (default: 1M) to limit the number of caps a client session may hold. Long-running client sessions with a large number of caps have been a source of instability in the MDS when all of these caps need to be processed during certain session events. Avoid increasing this value unless it is necessary.
The “mds_recall_state_timeout” config parameter has been removed. Late client recall warnings are now generated based on the number of caps the MDS has recalled which have not been released. The new config parameters “mds_recall_warning_threshold” (default: 32K) and “mds_recall_warning_decay_rate” (default: 60s) set the threshold for this warning.
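As an illustration of how such a warning can behave, here is a minimal Python sketch. It assumes the decay rate acts as a half-life on a per-session count of recalled-but-unreleased caps, and that the warning fires while that decayed count exceeds the threshold. The class and method names are hypothetical and this is not the MDS implementation.

```python
class RecallWarningCounter:
    """Hypothetical model of a decaying count of recalled, unreleased caps."""

    def __init__(self, threshold=32 * 1024, half_life_s=60.0):
        # Defaults mirror the documented option defaults (32K, 60s).
        self.threshold = threshold
        self.half_life_s = half_life_s
        self.value = 0.0
        self.last_t = 0.0

    def _decay(self, now):
        # Assumption: the count halves every half_life_s seconds.
        elapsed = now - self.last_t
        if elapsed > 0:
            self.value *= 0.5 ** (elapsed / self.half_life_s)
            self.last_t = now

    def recalled(self, n_caps, now):
        # The MDS asked the client to give back n_caps caps.
        self._decay(now)
        self.value += n_caps

    def released(self, n_caps, now):
        # The client released n_caps caps.
        self._decay(now)
        self.value = max(0.0, self.value - n_caps)

    def should_warn(self, now):
        # Warn while the decayed outstanding count is above the threshold.
        self._decay(now)
        return self.value > self.threshold


counter = RecallWarningCounter()
counter.recalled(40000, now=0.0)
print(counter.should_warn(now=0.0))   # above 32K outstanding
print(counter.should_warn(now=60.0))  # one half-life later
```

Under these assumptions, a client that never releases caps still stops triggering the warning once recall activity subsides, because the outstanding count decays over time.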
The “cache drop” admin socket command has been removed. The “ceph tell mds.X cache drop” command remains.
A health warning is now generated if the average OSD heartbeat ping time exceeds a configurable threshold for any of the computed intervals. The OSD computes 1-minute, 5-minute and 15-minute intervals with average, minimum and maximum values. The new configuration option “mon_warn_on_slow_ping_ratio” specifies a percentage of “osd_heartbeat_grace” to determine the threshold. A value of zero disables the warning. The new configuration option “mon_warn_on_slow_ping_time”, specified in milliseconds, overrides the computed value and causes a warning when OSD heartbeat pings take longer than the specified amount. A new admin command “ceph daemon mgr.# dump_osd_network [threshold]” lists all connections whose average ping time over any of the three intervals is longer than the specified threshold or the value determined by the config options. A new admin command “ceph daemon osd.# dump_osd_network [threshold]” does the same but only includes heartbeats initiated by the specified OSD.
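The threshold arithmetic can be sketched in Python as follows. The function names are hypothetical, and the sketch makes three assumptions: the ratio is expressed as a fraction (0.05 meaning 5%), “osd_heartbeat_grace” is in seconds, and a non-zero “mon_warn_on_slow_ping_time” takes precedence over the ratio. The input values in the example are illustrative only.

```python
def slow_ping_threshold_ms(heartbeat_grace_s, slow_ping_ratio, slow_ping_time_ms=0.0):
    """Return the warning threshold in milliseconds, or None if disabled."""
    if slow_ping_time_ms > 0:
        # An explicit time in milliseconds overrides the computed value.
        return float(slow_ping_time_ms)
    if slow_ping_ratio <= 0:
        # A ratio of zero disables the warning.
        return None
    # Fraction of the heartbeat grace period, converted from s to ms.
    return heartbeat_grace_s * 1000.0 * slow_ping_ratio


def slow_intervals(avg_ping_ms, threshold_ms):
    """List the intervals whose average ping time exceeds the threshold."""
    if threshold_ms is None:
        return []
    return [name for name, avg in avg_ping_ms.items() if avg > threshold_ms]


# Illustrative values: 20 s grace and a 5% ratio.
threshold = slow_ping_threshold_ms(heartbeat_grace_s=20, slow_ping_ratio=0.05)
averages = {"1min": 1450.0, "5min": 820.0, "15min": 310.0}
print(threshold, slow_intervals(averages, threshold))
```

With these inputs only the 1-minute average is above the computed threshold, which under the rule described above is enough to raise the warning.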
The default value of the “osd_deep_scrub_large_omap_object_key_threshold” parameter has been lowered so that objects with a large number of omap keys are detected more easily.
radosgw-admin introduces two subcommands for managing expire-stale objects that might be left behind after a bucket reshard in earlier versions of RGW. One subcommand lists such objects and the other deletes them. Read the troubleshooting section of the dynamic resharding docs for details.