v14.2.8 Nautilus released

Mar 3, 2020 TheAnalyst

This is the eighth update to the Ceph Nautilus release series. This release fixes issues across multiple subsystems. We recommend that all users upgrade to this release.

Notable Changes

The default value of bluestore_min_alloc_size_ssd has been changed to 4KiB to improve performance across all workloads.
The following OSD memory config options related to BlueStore cache autotuning can now be changed at runtime:
- osd_memory_base (default: 768 MB)
- osd_memory_cache_min (default: 128 MB)
- osd_memory_expected_fragmentation (default: 0.15)
- osd_memory_target (default: 4 GB)
The above options can be set with:
ceph config set osd <option> <value>
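As an illustration, the sketch below raises osd_memory_target to 6 GiB. It assumes the option is given as a plain byte count; gib_to_bytes is a hypothetical helper, not part of Ceph, and the 6 GiB figure is only an example value. The ceph command is printed rather than executed so that it can be reviewed first.

```shell
# Hypothetical helper (not part of Ceph): convert whole GiB to bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Example value only: 6 GiB per OSD.
target_bytes=$(gib_to_bytes 6)

# Print the command for review; remove "echo" to apply it to all OSDs.
echo ceph config set osd osd_memory_target "$target_bytes"
```

Because the setting is applied through the config database, it should not require restarting the OSDs in this release.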
The Manager now accepts profile rbd and profile rbd-read-only user caps. These caps can be used to provide users access to Manager-based RBD functionality such as rbd perf image iostat and rbd perf image iotop.
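A minimal sketch of how such a user might be created follows. The client name perfmon and the pool name rbd are assumptions chosen for illustration, rbd_perf_caps_cmd is a hypothetical helper, and the exact mix of mon, osd and mgr caps should be checked against the cluster's access policy. The helper only prints the command.

```shell
# Hypothetical helper: print (not run) an auth command that grants
# read-only RBD caps, including the new Manager profile.
# Arguments: client name, pool name.
rbd_perf_caps_cmd() {
  printf "ceph auth get-or-create client.%s mon 'profile rbd' osd 'profile rbd-read-only pool=%s' mgr 'profile rbd-read-only'\n" "$1" "$2"
}

# Example: a monitoring user limited to the "rbd" pool (assumed names).
rbd_perf_caps_cmd perfmon rbd
```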
The configuration value osd_calc_pg_upmaps_max_stddev used for upmap balancing has been removed. Instead, use the mgr balancer config upmap_max_deviation, which is now an integer number of PGs of deviation from the target PGs per OSD. This can be set with a command of the form ceph config set mgr mgr/balancer/upmap_max_deviation 2. The default upmap_max_deviation is 5. There are situations where CRUSH rules will never allow a pool to have perfectly balanced PGs, for example if CRUSH requires one replica in each of three racks but one of the racks has fewer OSDs than the others. In such cases, the configuration value can be increased.
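To make the meaning of "PGs of deviation" concrete, the sketch below flags OSDs whose PG count differs from a target by more than a given deviation. It is a simplification and not the balancer's own algorithm: the target here is the plain integer mean, which assumes equally weighted OSDs, and over_deviation and the sample PG counts are hypothetical.

```shell
# Hypothetical helper: list OSDs whose PG count differs from the target
# by more than max_deviation.
# Arguments: max_deviation, then one PG count per OSD (osd.0, osd.1, ...).
over_deviation() {
  max_dev=$1
  shift
  total=0
  n=0
  for c in "$@"; do
    total=$((total + c))
    n=$((n + 1))
  done
  target=$((total / n))   # integer mean; assumes equally weighted OSDs
  i=0
  for c in "$@"; do
    diff=$((c - target))
    if [ "$diff" -lt 0 ]; then diff=$((0 - diff)); fi
    if [ "$diff" -gt "$max_dev" ]; then
      echo "osd.$i pgs=$c target=$target"
    fi
    i=$((i + 1))
  done
}

# With the default deviation of 5, two of these four OSDs are out of range.
over_deviation 5 100 104 93 110
```

If OSDs keep being reported like this because CRUSH constraints make a tighter balance impossible, that is the case in which raising upmap_max_deviation is appropriate.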
RGW: a mismatch between the bucket notification documentation and the actual message format was fixed. This means that any endpoints receiving bucket notifications will now receive the same notifications inside a JSON array named Records. Note that this does not affect pulling bucket notifications from a subscription in a pubsub zone, as these are already wrapped inside that array.
CephFS: forward scrub is now rejected on a file system with multiple active MDS daemons. Scrub is currently permitted only on a file system with a single rank. Reduce the ranks to one via ceph fs set <fs_name> max_mds 1.
Ceph now refuses to create a file system with a default EC data pool. For further explanation, see: https://docs.ceph.com/docs/nautilus/cephfs/createfs/#creating-pools
Ceph will now issue a health warning if a RADOS pool has a pg_num value that is not a power of two. This can be resolved by adjusting the pool to a nearby power of two:
ceph osd pool set <pool-name> pg_num <new-pg-num>
Alternatively, the warning can be silenced with:
ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
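Picking the nearby power of two can be sketched as below. nearest_pow2 is a hypothetical helper, the pool name mypool and the starting pg_num of 100 are assumed for illustration, and ties are resolved toward the smaller value. The ceph command is printed rather than executed.

```shell
# Hypothetical helper: nearest power of two to a positive integer
# (ties resolve to the smaller value).
nearest_pow2() {
  n=$1
  p=1
  while [ $((p * 2)) -le "$n" ]; do
    p=$((p * 2))
  done
  # p is now the largest power of two <= n; compare it with the next one up.
  if [ $((n - p)) -le $((p * 2 - n)) ]; then
    echo "$p"
  else
    echo $((p * 2))
  fi
}

# Example: a pool named "mypool" (assumed) that currently has pg_num 100.
echo ceph osd pool set mypool pg_num "$(nearest_pow2 100)"
```

Whether to round up or down in practice also depends on the expected data growth of the pool, which this helper does not consider.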

Changelog

bluestore: common/options: bluestore 4k min_alloc_size for SSD (pr#32998, Mark Nelson, Sage Weil)
bluestore: os/bluestore: Add config observer for osd memory specific options (pr#31852, Sridhar Seshasayee)
bluestore: os/bluestore/BlueStore.cc: set priorities for compression stats (pr#32845, Neha Ojha)
bluestore: os/bluestore: default bluestore_block_size 1T -> 100G (pr#32283, Sage Weil)
build/ops: cmake: remove seastar tests from “make check” (pr#32658, Kefu Chai)
build/ops: install-deps,rpm: enable devtoolset-8 on aarch64 also (issue#38892, pr#32651, Kefu Chai)
build/ops: rpm: add rpm-build to SUSE-specific make check deps (pr#32208, Nathan Cutler)
build/ops: switch to boost 1.72 (pr#32441, Willem Jan Withagen, Kefu Chai)
build/ops: tools/setup-virtualenv.sh: do not default to python2.7 (pr#30739, Nathan Cutler)
cephfs: cephfs-journal-tool: fix crash and usage (pr#32913, Xiubo Li)
cephfs: client: Add is_dir() check before changing directory (pr#32916, Varsha Rao)
cephfs: client: add procession of SEEK_HOLE and SEEK_DATA in lseek (pr#30764, Shen Hang)
cephfs: client: add warning when cap != in->auth_cap (pr#32065, Shen Hang)
cephfs: client: EINVAL may be returned when offset is 0 (pr#30762, wenpengLi)
cephfs: client: fix lazyio_synchronize() to update file size and libcephfs: Add Tests for LazyIO (pr#30769, Sidharth Anupkrishnan)
cephfs: client: _readdir_cache_cb() may use the readdir_cache already clear (issue#41148, pr#30763, huanwen ren)
cephfs: client: remove Inode.dir_contacts field and handle bad whence value to llseek gracefully (pr#30766, Jeff Layton)
cephfs,common: osdc/objecter: Fix last_sent in scientific format and add age to ops (pr#31081, Varsha Rao)
cephfs: disallow changing fuse_default_permissions option at runtime (pr#32915, Zhi Zhang)
cephfs: mds: add command that config individual client session (issue#40811, pr#32245, “Yan, Zheng”)
cephfs: mds: “apply configuration changes through MDSRank” and “recall caps from quiescent sessions” and “drive cap recall while dropping cache” (pr#30761, Patrick Donnelly, Jeff Layton)
cephfs: mds: fix assert(omap_num_objs <= MAX_OBJECTS) of OpenFileTable (pr#32756, “Yan, Zheng”)
cephfs: mds: fix revoking caps after stale->resume circle (pr#32909, “Yan, Zheng”)
cephfs: mds: free heap memory may grow too large for some workloads (pr#31802, Patrick Donnelly)
cephfs: MDSMonitor: warn if a new file system is being created with an EC default data pool (pr#32600, Patrick Donnelly)
cephfs: mds: no assert on frozen dir when scrub path (pr#32071, Zhi Zhang)
cephfs: mds: note client features when rejecting client (pr#32914, Patrick Donnelly)
cephfs: mds/OpenFileTable: match MAX_ITEMS_PER_OBJ to osd_deep_scrub_large_omap_object_key_threshold (pr#32921, Vikhyat Umrao, Varsha Rao)
cephfs: mds: properly evaluate unstable locks when evicting client (pr#32073, “Yan, Zheng”)
cephfs: mds: reject forward scrubs when cluster has multiple active MDS (more than one rank) (pr#32602, Patrick Donnelly, Milind Changire)
cephfs: mds: reject sessionless messages (issue#40784, pr#30843, “Yan, Zheng”, Xiao Guodong, Shen Hang)
cephfs: mds: remove unnecessary debug warning (pr#32077, Patrick Donnelly)
cephfs: mds returns -5(EIO) error when the deleted file does not exist (pr#30767, huanwen ren)
cephfs: mds: split the dir if the op makes it oversized, because some ops maybe in flight (pr#31302, simon gao)
cephfs: mds: tolerate no snaprealm encoded in on-disk root inode (pr#32079, “Yan, Zheng”)
cephfs: mgr: “mds metadata” to setup new DaemonState races with fsmap (pr#31905, Patrick Donnelly)
cephfs: mgr/volumes: allow setting uid, gid of subvolume and subvolume group during creation (issue#42923, pr#31741, Venky Shankar, Jos Collin)
cephfs: mgr/volumes: fetch trash and clone entries without blocking volume access (issue#44282, pr#33526, Venky Shankar)
cephfs: mgr/volumes: fs subvolume resize command (pr#31332, Jos Collin)
cephfs: mgr/volumes: misc fix and feature enhancements (issue#42646, issue#43645, pr#33122, Rishabh Dave, Joshua Schmid, Venky Shankar, Ramana Raja, Jos Collin)
cephfs: mgr/volumes: unregister job upon async threads exception (issue#44315, pr#33569, Venky Shankar)
cephfs: mon: print FSMap regardless of file system count (pr#32912, Patrick Donnelly)
cephfs: pybind/mgr/volumes: idle connection drop is not working (pr#33116, Patrick Donnelly)
cephfs: RuntimeError: Files in flight high water is unexpectedly low (0 / 6) (pr#33115, Patrick Donnelly)
ceph.in: check ceph-conf returncode (pr#31367, Dimitri Savineau)
ceph-monstore-tool: correct the key for storing mgr_command_descs (pr#33278, Kefu Chai)
ceph-volume: add db and wal support to raw mode (pr#32979, Sébastien Han)
ceph-volume: add methods to pass filters to pvs, vgs and lvs commands (pr#33217, Rishabh Dave)
ceph-volume: add raw (--bluestore) mode (pr#32733, Jan Fajerski, Sage Weil)
ceph-volume: add sizing arguments to prepare (pr#33231, Jan Fajerski)
ceph-volume: allow raw block devices everywhere (pr#32868, Jan Fajerski)
ceph-volume: assume msgrV1 for all branches containing mimic (pr#31616, Jan Fajerski)
ceph-volume: avoid calling zap_lv with a LV-less VG (pr#33297, Jan Fajerski)
ceph-volume: batch bluestore fix create_lvs call (pr#33232, Jan Fajerski)
ceph-volume: batch bluestore fix create_lvs call (pr#33301, Jan Fajerski)
ceph-volume/batch: fail on filtered devices when non-interactive (pr#33202, Jan Fajerski)
ceph-volume: Dereference symlink in lvm list (pr#32877, Benoît Knecht)
ceph-volume: don’t remove vg twice when zapping filestore (pr#33337, Jan Fajerski)
ceph-volume: finer grained availability notion in inventory (pr#33240, Jan Fajerski)
ceph-volume: fix has_bluestore_label() function (pr#33239, Guillaume Abrioux)
ceph-volume: fix is_ceph_device for lvm batch (pr#33253, Jan Fajerski, Dimitri Savineau)
ceph-volume: fix the integer overflow (pr#32873, dongdong tao)
ceph-volume: import mock.mock instead of unittest.mock (py2) (pr#32870, Jan Fajerski)
ceph-volume/lvm/activate.py: clarify error message: fsid refers to osd_fsid (pr#32864, Yaniv Kaul)
ceph-volume: lvm/deactivate: add unit tests, remove --all (pr#32863, Jan Fajerski)
ceph-volume: lvm deactivate command (pr#33209, Jan Fajerski)
ceph-volume: make get_devices fs location independent (pr#33200, Jan Fajerski)
ceph-volume: minor clean-up of “simple scan” subcommand help (pr#32556, Michael Fritch)
ceph-volume: pass journal_size as Size not string (pr#33334, Jan Fajerski)
ceph-volume: refactor listing.py + fixes (pr#33238, Jan Fajerski, Rishabh Dave, Guillaume Abrioux)
ceph-volume: reject disks smaller than 5GB in inventory (issue#40776, pr#31554, Jan Fajerski)
ceph-volume: skip osd creation when already done (pr#33242, Guillaume Abrioux)
ceph-volume/test: patch VolumeGroups (pr#32558, Jan Fajerski)
ceph-volume: use correct extents if using db-devices and >1 osds_per_device (pr#32874, Fabian Niepelt)
ceph-volume: use fsync for dd command (pr#31553, Rishabh Dave)
ceph-volume: use get_device_vgs in has_common_vg (pr#33254, Jan Fajerski)
ceph-volume: util: look for executable in $PATH (pr#32860, Shyukri Shyukriev)
ceph-volume/zfs: add the inventory command (pr#31295, Willem Jan Withagen)
common/admin_socket: Increase socket timeouts (pr#32063, Brad Hubbard)
common/bl: fix the dangling last_p issue (pr#33277, Radoslaw Zarzynski)
common/config: update values when they are removed via mon (pr#32846, Sage Weil)
common: FIPS: audit and switch some memset & bzero users (pr#32167, Radoslaw Zarzynski)
common: fix deadlocky inflight op visiting in OpTracker (pr#32858, Radoslaw Zarzynski)
common/options: remove unused ms_msgr2_{sign,encrypt} (pr#31850, Ilya Dryomov)
common/util: use ifstream to read from /proc files (pr#32901, Kefu Chai, songweibin)
core: auth/Crypto: fallback to /dev/urandom if getentropy() fails (pr#31301, Kefu Chai)
core: mon: keep v1 address type when explicitly set (pr#32028, Ricardo Dias)
core: mon/OSDMonitor: Fix pool set target_size_bytes (etc) with unit suffix (pr#31740, Prashant D)
core: osd/OSDMap: health alert for non-power-of-two pg_num (pr#30689, Sage Weil)
crush/CrushWrapper: behave with empty weight vector (pr#32905, Kefu Chai)
doc/cephfs/client-auth: description and example are inconsistent (pr#32781, Ilya Dryomov)
doc/cephfs: improve add/remove MDS section (issue#39620, pr#31116, Patrick Donnelly)
doc/ceph-fuse: mention -k option in ceph-fuse man page (pr#30765, Rishabh Dave)
doc/ceph-volume: initial docs for zfs/inventory and zfs/api (pr#32746, Willem Jan Withagen)
doc: remove invalid option mon_pg_warn_max_per_osd (pr#31300, zhang daolong)
doc/_templates/page.html: redirect to etherpad (pr#32248, Neha Ojha)
doc: wrong datatype describing crush_rule (pr#32254, Kefu Chai)
global: disable THP for Ceph daemons (pr#31646, Patrick Donnelly, Mark Nelson)
kv: fix shutdown vs async compaction (pr#32715, Sage Weil)
librbd: diff iterate with fast-diff now correctly includes parent (pr#32469, Jason Dillaman)
librbd: fix rbd_open_by_id, rbd_open_by_id_read_only (pr#32837, yangjun)
librbd: remove pool objects when removing a namespace (pr#32839, Jason Dillaman)
librbd: skip stale child with non-existent pool for list descendants (pr#32841, songweibin)
librbd: support compression allocation hints to the OSD (pr#32842, Jason Dillaman)
mgr: add ‘rbd’ profiles to support ‘rbd_support’ module commands (pr#32086, Jason Dillaman)
mgr/alerts: simple health alerts (pr#30820, Sage Weil)
mgr: Balancer fixes (pr#31956, Neha Ojha, Kefu Chai, David Zafman)
mgr/DaemonServer: fix ‘osd ok-to-stop’ for EC pools (pr#32844, Sage Weil)
mgr/dashboard: add debug mode, and accept expected exception when SSL handshaking (pr#31190, Kefu Chai, Ernesto Puerta, Joshua Schmid)
mgr/dashboard: block mirroring page results in internal server error (pr#32133, Jason Dillaman)
mgr/dashboard: check embedded Grafana dashboard references (issue#40008, pr#31808, Kiefer Chang)
mgr/dashboard: check if user has config-opt permissions (pr#32827, Alfonso Martínez)
mgr/dashboard: Cross sign button not working for some modals (pr#32012, Ricardo Marques)
mgr/dashboard: Dashboard can’t handle self-signed cert on Grafana API (pr#31792, Volker Theile)
mgr/dashboard: disable ‘Add Capability’ button in rgw user edit (pr#32930, Alfonso Martínez)
mgr/dashboard: fix restored RBD image naming issue (pr#31810, Kiefer Chang)
mgr/dashboard: grafana charts match time picker selection (pr#31999, Alfonso Martínez)
mgr/dashboard,grafana: remove shortcut menu (pr#31980, Ernesto Puerta)
mgr/dashboard: Handle always-on Ceph Manager modules correctly (pr#31782, Volker Theile)
mgr/dashboard: Hardening accessing the metadata (pr#32128, Volker Theile)
mgr/dashboard: iSCSI targets not available if any gateway is down (and more…) (pr#32304, Ricardo Marques)
mgr/dashboard: KeyError on dashboard reload (pr#32233, Patrick Seidensal)
mgr/dashboard: key-value-table doesn’t render booleans (pr#31789, Patrick Seidensal)
mgr/dashboard: Remove compression mode unset in pool from (pr#31784, Stephan Müller)
mgr/dashboard: show “Rename” in header & button when renaming RBD (pr#31779, Alfonso Martínez)
mgr/dashboard: sort monitors by open sessions correctly (pr#31791, Alfonso Martínez)
mgr/dashboard: Standby Dashboards don’t handle all requests properly (pr#32299, Volker Theile)
mgr/dashboard: Trim IQN on iSCSI target form (pr#31942, Ricardo Marques)
mgr/dashboard: Unable to set boolean values to false when default is true (pr#31941, Ricardo Marques)
mgr/dashboard: Using wrong identifiers in RGW user/bucket datatables (pr#32888, Volker Theile)
mgr/devicehealth: ensure we don’t store empty objects (pr#31735, Sage Weil)
mgr/devicehealth: fix telemetry stops sending device reports after 48 hours (pr#33346, Yaarit Hatuka, Sage Weil)
mgr: drop reference to msg on return (pr#33498, Patrick Donnelly)
mgr/MgrClient: fix open condition (pr#32769, Sage Weil)
mgr/pg_autoscaler: calculate pool_pg_target using pool size (pr#33170, Dan van der Ster)
mgr/pg_autoscaler: default to pg_num[_min] = 16 (pr#32069, Sage Weil)
mgr/pg_autoscaler: default to pg_num[_min] = 32 (pr#32931, Neha Ojha)
mgr/pg_autoscaler: implement shutdown method (pr#32068, Patrick Donnelly)
mgr/pg_autoscaler: only generate target_* health warnings if targets set (pr#32067, Sage Weil)
mgr/prometheus: assign a value to osd_dev_node when obj_store is not filestore or bluestore (pr#31556, jiahuizeng)
mgr/prometheus: report per-pool pg states (pr#33157, Aleksei Zakharov)
mgr/telemetry: anonymizing smartctl report itself (pr#33082, Yaarit Hatuka)
mgr/telemetry: check get_metadata return val (pr#33095, Yaarit Hatuka)
mgr/telemetry: split entity_name only once (handle ids with dots) (pr#33168, Dan Mick)
mgr/zabbix: Adds possibility to send data to multiple zabbix servers (pr#30009, slivik, Jakub Sliva)
mon/ConfigMonitor: fix handling of NO_MON_UPDATE settings (pr#32856, Sage Weil)
mon/ConfigMonitor: only propose if leader (pr#33155, Sage Weil)
mon: Don’t put session during feature change (pr#33152, Brad Hubbard)
mon: elector: return after triggering a new election (pr#33007, Greg Farnum)
monitoring: wait before firing osd full alert (pr#32070, Patrick Seidensal)
mon/MgrMonitor.cc: add always_on_modules to the output of “ceph mgr module ls” (pr#32997, Neha Ojha)
mon/MgrMonitor.cc: warn about missing mgr in a cluster with osds (pr#33142, Neha Ojha)
mon/OSDMonitor: Don’t update mon cache settings if rocksdb is not used (pr#32520, Sridhar Seshasayee, Sage Weil)
mon/OSDMonitor: fix format error ceph osd stat --format json (pr#32062, Zheng Yin)
mon/PGMap.h: disable network stats in dump_osd_stats (pr#32466, Neha Ojha, David Zafman)
mon: remove the restriction of address type in init_with_hosts (pr#31844, Hao Xiong)
mon/Session: only index osd ids >= 0 (pr#32908, Sage Weil)
mount.ceph: give a hint message when no mds is up or cluster is laggy (pr#32910, Xiubo Li)
mount.ceph: remove arbitrary limit on size of name= option (pr#32807, Jeff Layton)
msg: async/net_handler.cc: Fix compilation (pr#31736, Carlos Valiente)
osd: add osd_fast_shutdown option (default true) (pr#32743, Sage Weil)
osd: Allow 64-char hostname to be added as the “host” in CRUSH (pr#33147, Michal Skalski)
osd: Diagnostic logging for upmap cleaning (pr#32716, David Zafman)
osd/OSD: enhance osd numa affinity compatibility (pr#32843, luo rixin, Dai zhiwei)
osd/PeeringState.cc: don’t let num_objects become negative (pr#32857, Neha Ojha)
osd/PeeringState.cc: skip peer_purged when discovering all missing (pr#32847, Neha Ojha)
osd/PeeringState: do not exclude up from acting_recovery_backfill (pr#32064, Nathan Cutler, xie xingguo)
osd/PrimaryLogPG: skip obcs that don’t exist during backfill scan_range (pr#31028, Sage Weil)
osd: set affinity for *all* threads (pr#31359, Sage Weil)
osd: set collection pool opts on collection create, pg load (pr#32123, Sage Weil)
osd: Use physical ratio for nearfull (doesn’t include backfill reserve) (pr#32773, David Zafman)
pybind/mgr: Cancel output color control (pr#31697, Zheng Yin)
rbd: creating thick-provision image progress percent info exceeds 100% (pr#32840, Xiangdong Mu)
rbd: librbd: don’t call refresh from mirror::GetInfoRequest state machine (pr#32900, Mykola Golub)
rbd-mirror: clone v2 mirroring improvements (pr#31518, Mykola Golub)
rbd-mirror: fix ‘rbd mirror status’ asok command output (pr#32447, Mykola Golub)
rbd-mirror: make logrotate work (pr#32593, Mykola Golub)
rgw: add bucket permission verify when copy obj (pr#31089, NancySu05)
rgw: Adding ‘iam’ namespace for Role and User Policy related REST APIs (pr#32437, Pritha Srivastava)
rgw: adding mfa code validation when bucket versioning status is changed (pr#32759, Pritha Srivastava)
rgw: add num_shards to radosgw-admin bucket stats (pr#31182, Paul Emmerich)
rgw: allow reshard log entries for non-existent buckets to be cancelled (pr#32056, J. Eric Ivancich)
rgw: auto-clean reshard queue entries for non-existent buckets (pr#32055, J. Eric Ivancich)
rgw: build_linked_oids_for_bucket and build_buckets_instance_index should return negative value if it fails (pr#32820, zhangshaowen)
rgw: crypt: permit RGW-AUTO/default with SSE-S3 headers (pr#31862, Matt Benjamin)
rgw: data sync markers include timestamp from datalog entry (pr#32819, Casey Bodley)
rgw_file: avoid string::front() on empty path (pr#33008, Matt Benjamin)
rgw: fix a bug that bucket instance obj can’t be removed after resharding completed (pr#32822, zhang Shaowen)
rgw: fix an endless loop error when to show usage (pr#31684, lvshuhua)
rgw: fix bugs in listobjectsv1 (pr#32239, Albin Antony)
rgw: fix compile errors with boost 1.70 (pr#31289, Casey Bodley)
rgw: fix data consistency error caused by rgw sent timeout (pr#32821, 李纲彬82225)
rgw: fix list versions starts with version_id=null (pr#30743, Tianshan Qu)
rgw: fix one part of the bulk delete(RGWDeleteMultiObj_ObjStore_S3) fails but no error messages (pr#33151, Snow Si)
rgw: fix opslog operation field as per Amazon s3 (issue#20978, pr#32834, Jiaying Ren)
rgw: fix refcount tags to match and update object’s idtag (pr#30741, J. Eric Ivancich)
rgw: fix rgw crash when token is not base64 encode (pr#32050, yuliyang)
rgw: gc remove tag after all sub io finish (issue#40903, pr#30733, Tianshan Qu)
rgw: Incorrectly calling ceph::buffer::list::decode_base64 in bucket policy (pr#32832, GaryHyg)
rgw: maybe coredump when reload operator happened (pr#33149, Richard Bai(白学余))
rgw: move forward marker even in case of many rgw.none indexes (pr#32824, Ilsoo Byun)
rgw multisite: fixes for concurrent version creation (pr#32057, Or Friedmann, Casey Bodley)
rgw: prevent bucket reshard scheduling if bucket is resharding (pr#31298, J. Eric Ivancich)
rgw/pubsub: fix records/event json format to match documentation (pr#32221, Yuval Lifshitz)
rgw: radosgw-admin: sync status displays id of shard furthest behind (pr#32818, Casey Bodley)
rgw: return error if lock log shard fails (pr#32825, zhangshaowen)
rgw/rgw_rest_conn.h: fix build with clang (pr#32489, Bernd Zeimetz)
rgw: Select the std::bitset to resolve ambiguity (pr#32504, Willem Jan Withagen)
rgw: support radosgw-admin zone/zonegroup placement get command (pr#32835, jiahuizeng)
rgw: the http response code of delete bucket should not be 204-no-content (pr#32833, Chang Liu)
rgw: update s3-test download code for s3-test tasks (pr#32229, Ali Maredia)
rgw: update the hash source for multipart entries during resharding (pr#33183, dongdong tao)
rgw: url encode common prefixes for List Objects response (pr#32058, Abhishek Lekshmanan)
rgw: when resharding store progress json (pr#31683, Mark Kogan, Mark Nelson)
selinux: Allow ceph to read udev db (pr#32259, Boris Ranto)