12.2.7 Luminous released

Jul 17, 2018 TheAnalyst

This is the seventh bugfix release of the Luminous v12.2.x long term stable release series. This release contains several fixes for regressions in the v12.2.6 and v12.2.5 releases. We recommend that all users upgrade to v12.2.7.

note:	The v12.2.6 release has serious known regressions. Although v12.2.6 was never formally announced, if you installed it anyway, please see the upgrade procedure below.
note:	The v12.2.5 release has a potential data corruption issue with erasure coded pools. If you ran v12.2.5 with erasure coding, please see below.

Upgrading from v12.2.6

v12.2.6 included an incomplete backport of an optimization for BlueStore OSDs that avoids maintaining both the per-object checksum and the internal BlueStore checksum. Due to the accidental omission of a critical follow-on patch, v12.2.6 corrupts (fails to update) the stored per-object checksum value for some objects. This can result in an EIO error when trying to read those objects.

1. If your cluster uses FileStore only, no special action is required. This problem only affects clusters with BlueStore.
2. If your cluster has only BlueStore OSDs (no FileStore), then you should enable the following OSD option:
   osd skip data digest = true
   This stops setting the full-object digests, and starts ignoring them, whenever the primary for a PG is BlueStore.
3. If you have a mix of BlueStore and FileStore OSDs, then you should enable the following OSD option:
   osd distrust data digest = true
   This stops setting the full-object digests, and starts ignoring them, in all cases. This weakens the data integrity checks for FileStore (although those checks were always only opportunistic).
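
The choice between the two options can be sketched as a small shell helper. This is only an illustration, not part of the release: the function name `recommend_digest_option` is hypothetical, and it assumes you already know how many OSDs use each backend (one way to find out may be `ceph osd count-metadata osd_objectstore`, which reports a count per object store type).

```shell
# Hypothetical helper: print the OSD option suggested by these release notes.
# $1 = number of BlueStore OSDs, $2 = number of FileStore OSDs.
recommend_digest_option() {
    bluestore=$1
    filestore=$2
    if [ "$bluestore" -eq 0 ]; then
        # FileStore only: no special action is required.
        echo "none"
    elif [ "$filestore" -eq 0 ]; then
        # BlueStore only.
        echo "osd skip data digest = true"
    else
        # Mix of BlueStore and FileStore.
        echo "osd distrust data digest = true"
    fi
}
```

For a cluster with 8 BlueStore and 4 FileStore OSDs, `recommend_digest_option 8 4` prints the `osd distrust data digest = true` line, which would then go into the OSD configuration (for example the `[osd]` section of ceph.conf).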

If your cluster includes BlueStore OSDs and was affected, deep scrubs will generate errors about mismatched CRCs for affected objects. Currently the repair operation cannot correct them: because none of the replicas match the expected checksum, it does not know how to proceed. These warnings are harmless in the sense that IO is not affected and the replicas are all still in sync. The number of affected objects is likely to drop (possibly to zero) on its own over time as those objects are modified. We expect to include a scrub improvement in v12.2.8 to clean up any remaining objects.

Additionally, see the notes below, which apply to both v12.2.5 and v12.2.6.

Upgrading from v12.2.5 or v12.2.6

If you used v12.2.5 or v12.2.6 in combination with erasure coded pools, there is a small risk of corruption under certain workloads. Specifically, when:

1. An erasure coded pool is in use.
2. The pool is busy with successful writes.
3. The pool is also busy with updates that return an error to the librados user. RGW garbage collection is the most common example of this (it sends delete operations on objects that don’t always exist).
4. Some OSDs are reasonably busy. One known example of such load is FileStore splitting, although in principle any load on the cluster could also trigger the behavior.
5. One or more OSDs restart.

This combination can trigger an OSD crash and possibly leave PGs in a state where they fail to peer.

Notably, upgrading a cluster involves OSD restarts and as such may increase the risk of encountering this bug. For this reason, for clusters with erasure coded pools, we recommend the following upgrade procedure to minimize risk:

1. Install the v12.2.7 packages.
2. Temporarily quiesce IO to the cluster:
   ceph osd pause
3. Restart all OSDs and wait for all PGs to become active.
4. Resume IO:
   ceph osd unpause
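
The pause, restart, wait, and unpause sequence can be sketched as a shell script. This is a rough sketch under stated assumptions, not an official procedure: the function names are hypothetical, the restart command is site-specific and passed in by the caller (on systemd-based hosts it is often `systemctl restart ceph-osd.target`, run on every OSD host), and the PG check assumes that `ceph pg dump pgs_brief` prints one row per PG with the PG id in the first column and the state in the second.

```shell
# Hypothetical helper: succeed only if every PG row read from stdin has a
# state that includes "active". Assumed row layout (pgs_brief style):
#   <pgid> <state> <up> <up_primary> <acting> <acting_primary>
all_pgs_active() {
    awk '
        $1 ~ /^[0-9]+\.[0-9a-f]+$/ {      # skip headers; keep rows starting with a PG id
            seen = 1
            if ($2 !~ /(^|[+])active([+]|$)/) bad = 1
        }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}

# Sketch of the procedure: pause IO, restart OSDs, wait for PGs, resume IO.
# $1 = site-specific command that restarts all OSDs (an assumption).
upgrade_osds_with_io_paused() {
    restart_cmd=$1
    ceph osd pause || return 1
    $restart_cmd || { echo "OSD restart failed; IO is still paused" >&2; return 1; }
    # Poll until every PG reports an active state.
    until ceph pg dump pgs_brief 2>/dev/null | all_pgs_active; do
        sleep "${POLL_SECONDS:-10}"
    done
    ceph osd unpause
}
```

If the restart command fails, the sketch deliberately leaves IO paused and reports it, so the operator can decide when to run `ceph osd unpause` by hand.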

This will cause an availability outage for the duration of the OSD restarts. If this is unacceptable, a riskier alternative is to disable RGW garbage collection (the primary known cause of these rados operations) for the duration of the upgrade:

1. Set ``rgw_enable_gc_threads = false`` in ceph.conf

2. Restart all radosgw daemons

3. Upgrade and restart all OSDs

4. Remove ``rgw_enable_gc_threads = false`` from ceph.conf

5. Restart all radosgw daemons
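
Steps 1 and 4 above edit ceph.conf, which could be scripted roughly as follows. This is a minimal sketch with hypothetical function names; it assumes the option is written with underscores on its own line, and that the radosgw section header (for example `client.rgw.gateway1`, a made-up name) appears in the file exactly as `[name]`. Restarting the radosgw daemons (steps 2 and 5) and upgrading the OSDs (step 3) are left to the operator.

```shell
# Hypothetical helpers; $1 is the path to a ceph.conf-style file.

# Step 1: add "rgw_enable_gc_threads = false" right after the section
# header named by $2 (given without the surrounding brackets).
disable_rgw_gc_threads() {
    awk -v header="[$2]" '
        { print }
        $0 == header { print "rgw_enable_gc_threads = false" }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Step 4: remove every "rgw_enable_gc_threads = ..." line again.
enable_rgw_gc_threads() {
    sed '/^[[:space:]]*rgw_enable_gc_threads[[:space:]]*=/d' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}
```

A run might look like `disable_rgw_gc_threads /etc/ceph/ceph.conf client.rgw.gateway1` before the upgrade and `enable_rgw_gc_threads /etc/ceph/ceph.conf` afterwards. It is worth keeping a backup copy of the file, since the helpers rewrite it in place.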

Upgrading from other versions

If your cluster did not run v12.2.5 or v12.2.6 then none of the above issues apply to you and you should upgrade normally.

v12.2.7 Changelog

mon/AuthMonitor: improve error message (issue#21765, pr#22963, Douglas Fuller)
osd/PG: do not blindly roll forward to log.head (issue#24597, pr#22976, Sage Weil)
osd/PrimaryLogPG: rebuild attrs from clients (issue#24768, pr#22962, Sage Weil)
osd: work around data digest problems in 12.2.6 (version 2) (issue#24922, pr#23055, Sage Weil)
rgw: objects in cache never refresh after rgw_cache_expiry_interval (issue#24346, pr#22369, Casey Bodley, Matt Benjamin)

Notable Changes in v12.2.6

note:	This is a broken release with serious known regressions. Do not install it. The release notes below track the fixes that were part of v12.2.6 (and hence v12.2.7).

Auth:
- In 12.2.4 and earlier releases, keyring caps were not checked for validity, so the caps string could be anything. As of 12.2.6, caps strings are validated and providing a keyring with an invalid caps string to, e.g., “ceph auth add” will result in an error.
- CVE 2018-1128: auth: cephx authorizer subject to replay attack (issue#24836, Sage Weil)
- CVE 2018-1129: auth: cephx signature check is weak (issue#24837, Sage Weil)
- CVE 2018-10861: mon: auth checks not correct for pool ops (issue#24838, Jason Dillaman)
The config-key interface can store arbitrary binary blobs but JSON can only express printable strings. If binary blobs are present, the ‘ceph config-key dump’ command will show them as something like <<< binary blob of length N >>>.

v12.2.6 Changelog

build/ops: build-integration-branch script (issue#24003, pr#21919, Nathan Cutler, Kefu Chai, Sage Weil)
cephfs-journal-tool: wait prezero ops before destroying journal (issue#20549, pr#21874, “Yan, Zheng”)
cephfs: MDSMonitor: cleanup and protect fsmap access (issue#23762, pr#21732, Patrick Donnelly)
cephfs: MDSMonitor: crash after assigning standby-replay daemon in multifs setup (issue#23762, issue#23658, pr#22603, “Yan, Zheng”)
cephfs: MDSMonitor: fix mds health printed in bad format (issue#23582, pr#21447, Patrick Donnelly)
cephfs: MDSMonitor: initialize new Filesystem epoch from pending (issue#23764, pr#21512, Patrick Donnelly)
ceph-fuse: missing dentries in readdir result (issue#23894, pr#22119, “Yan, Zheng”)
ceph-fuse: return proper exit code (issue#23665, pr#21495, Patrick Donnelly)
ceph-fuse: trim ceph-fuse -V output (issue#23248, pr#21600, Jos Collin)
ceph_test_rados_api_aio: fix race with full pool and osdmap (issue#23917, issue#23876, pr#21778, Sage Weil)
ceph-volume: error on commands that need ceph.conf to operate (issue#23941, pr#22746, Andrew Schoen)
ceph-volume: failed ceph-osd --mkfs command doesn’t halt the OSD creation process (issue#23874, pr#21746, Alfredo Deza)
client: add ceph_ll_sync_inode (issue#23291, pr#21109, Jeff Layton)
client: add client option descriptions (issue#22933, pr#21589, Patrick Donnelly)
client: anchor dentries for trimming to make cap traversal safe (issue#24137, pr#22201, Patrick Donnelly)
client: avoid freeing inode when it contains TX buffer head (issue#23837, pr#22168, Guan yunfei, “Yan, Zheng”, Jason Dillaman)
client: dirty caps may never get the chance to flush (issue#22546, pr#21278, dongdong tao)
client: fix issue of revoking non-auth caps (issue#24172, pr#22221, “Yan, Zheng”)
client: fix request send_to_auth was never really used (issue#23541, pr#21354, Zhi Zhang)
client: Fix the gid_count check (issue#23652, pr#21596, Jos Collin)
client: flush the mdlog in _fsync before waiting on unstable reqs (issue#23714, pr#21542, Jeff Layton)
client: hangs on umount if it had an MDS session evicted (issue#10915, pr#22018, Rishabh Dave)
client: avoid sending mds request while holding cap reference (issue#24369, pr#22354, “Yan, Zheng”)
cmake: fix the cepfs java binding build on Bionic (issue#23458, issue#24012, pr#21872, Kefu Chai, Shengjing Zhu)
cmake/modules/BuildRocksDB.cmake: enable compressions for rocksdb (issue#24025, pr#22215, Kefu Chai)
common: ARMv8 feature detection broken, leading to illegal instruction crashes (issue#23464, pr#22567, Adam Kupczyk)
common: fix BoundedKeyCounter const_pointer_iterator (issue#22139, pr#21083, Casey Bodley)
common: fix typo in rados bench write JSON output (issue#24199, pr#22391, Sandor Zeestraten)
common: partially revert 95fc248 to make get_process_name work (issue#24123, pr#22290, Mykola Golub)
core: Deleting a pool with active notify linger ops can result in seg fault (issue#23966, pr#22143, Kefu Chai, Jason Dillaman)
core: mon/MgrMonitor: change ‘unresponsive’ message to info level (issue#24222, pr#22331, Sage Weil)
core: Wip scrub omap (issue#24366, pr#22375, xie xingguo, David Zafman)
crush: fix device_class_clone for unpopulated/empty weight-sets (issue#23386, pr#22381, Sage Weil)
crush, osd: handle multiple parents properly when applying pg upmaps (issue#23921, pr#22115, xiexingguo)
doc: Fix -d description in ceph-fuse (issue#23214, pr#21616, Jos Collin)
doc:Update ceph-fuse doc (issue#23084, pr#21603, Jos Collin)
fuse: wire up fuse_ll_access (issue#23509, pr#21475, Jeff Layton)
kceph: umount on evicted client blocks forever (issue#24053, issue#24054, pr#22208, Yan, Zheng, “Yan, Zheng”)
librbd: commit IO as safe when complete if writeback cache is disabled (issue#23516, pr#22370, Jason Dillaman)
librbd: prevent watcher from unregistering with in-flight actions (issue#23955, pr#21938, Jason Dillaman)
lvm: when osd creation fails log the exception (issue#24456, pr#22641, Andrew Schoen)
mds: avoid calling rejoin_gather_finish() two times successively (issue#24047, pr#22171, “Yan, Zheng”)
mds: broadcast quota to relevant clients when quota is explicitly set (issue#24133, pr#22271, Zhi Zhang)
mds: crash when failover (issue#23518, pr#21900, “Yan, Zheng”)
mds: don’t discover inode/dirfrag when mds is in ‘starting’ state (issue#23812, pr#21990, “Yan, Zheng”)
mds: fix occasional dir rstat inconsistency between multi-MDSes (issue#23538, pr#21617, “Yan, Zheng”, Zhi Zhang)
mds: fix some memory leak (issue#24289, pr#22310, “Yan, Zheng”)
mds: fix unhealth heartbeat during rejoin (issue#23530, pr#21366, dongdong tao)
mds: handle imported session race (issue#24072, issue#24087, pr#21989, Patrick Donnelly)
mds: include nfiles/nsubdirs of directory inode in MClientCaps (issue#23855, pr#22118, “Yan, Zheng”)
mds: kick rdlock if waiting for dirfragtreelock (issue#23919, pr#21901, Patrick Donnelly)
mds: make rstat.rctime follow inodes’ ctime (issue#23380, pr#21448, “Yan, Zheng”)
mds: mark damaged if sessions’ preallocated inos don’t match inotable (issue#23452, pr#21372, “Yan, Zheng”)
mds: mark new root inode dirty (issue#23960, pr#21922, Patrick Donnelly)
mds: mds shutdown fixes and optimization (issue#23602, pr#21346, “Yan, Zheng”)
mds: misc load balancer fixes (issue#21745, pr#21412, “Yan, Zheng”, Jianyu Li)
mds: properly check auth subtree count in MDCache::shutdown_pass() (issue#23813, pr#21844, “Yan, Zheng”)
mds: properly dirty sessions opened by journal replay (issue#23625, pr#21441, “Yan, Zheng”)
mds: properly trim log segments after scrub repairs something (issue#23880, pr#21840, “Yan, Zheng”)
mds: set could_consume to false when no purge queue item actually exe… (issue#24073, pr#22176, Xuehan Xu)
mds: trim log during shutdown to clean metadata (issue#23923, pr#21899, Patrick Donnelly)
mds: underwater dentry check in CDir::_omap_fetched is racy (issue#23032, pr#21187, Yan, Zheng)
mg_read() call has wrong arguments (issue#23596, pr#21382, Nathan Cutler)
mgr/influx: Only split string on first occurence of dot (.) (issue#23996, pr#21965, Wido den Hollander)
mgr: Module ‘balancer’ has failed: could not find bucket -14 (issue#24167, pr#22308, Sage Weil)
mon: add ‘ceph osd pool get erasure allow_ec_overwrites’ command (issue#23487, pr#21378, Mykola Golub)
mon: enable level_compaction_dynamic_level_bytes for rocksdb (issue#24361, pr#22360, Kefu Chai)
mon: handle bad snapshot removal reqs gracefully (issue#18746, pr#21717, Paul Emmerich)
mon: High MON cpu usage when cluster is changing (issue#23713, pr#21968, Sage Weil, Xiaoxi CHEN)
mon/MDSMonitor: do not send redundant MDS health messages to cluster log (issue#24308, pr#22558, Sage Weil)
msg/async/AsyncConnection: Fix FPE in process_connection (issue#23618, pr#21376, Brad Hubbard)
os/bluestore: alter the allow_eio policy regarding kernel’s error list (issue#23333, pr#21405, Radoslaw Zarzynski)
os/bluestore/bluefs_types: make block_mask 64-bit (issue#23840, pr#21740, Sage Weil)
os/bluestore: fix exceeding the max IO queue depth in KernelDevice (issue#23246, pr#21407, Radoslaw Zarzynski)
os/bluestore: fix SharedBlobSet refcounting race (issue#24319, pr#22650, Sage Weil)
os/bluestore: simplify and fix SharedBlob::put() (issue#24211, pr#22351, Sage Weil)
osdc/Objecter: fix recursive locking in _finish_command (issue#23940, pr#21939, Sage Weil)
osdc/Objecter: prevent double-invocation of linger op callback (issue#23872, pr#21752, Jason Dillaman)
osd: do not crash on empty snapset (issue#23851, pr#21638, Mykola Golub, Igor Fedotov)
osd: Don’t evict even when preemption has restarted with smaller chunk (issue#22881, issue#23909, issue#23646, pr#22044, Sage Weil, fang yuxiang, Jianpeng Ma, kungf, xie xingguo, David Zafman)
osd/ECBackend: only check required shards when finishing recovery reads (issue#23195, pr#21911, Josh Durgin, Kefu Chai)
osd: increase default hard pg limit (issue#24243, pr#22592, Josh Durgin)
osd/OSDMap: check against cluster topology changing before applying pg upmaps (issue#23878, pr#21818, xiexingguo)
osd/PG: fix DeferRecovery vs AllReplicasRecovered race (issue#23860, pr#21964, Sage Weil)
osd/PG: fix uninit read in Incomplete::react(AdvMap&) (issue#23980, pr#21993, Sage Weil)
osd/PrimaryLogPG: avoid infinite loop when flush collides with write … (issue#23664, pr#21764, Sage Weil)
osd: publish osdmap to OSDService before starting wq threads (issue#21977, pr#21737, Sage Weil)
osd: Warn about objects with too many omap entries (issue#23784, pr#21518, Brad Hubbard)
qa: disable -Werror when compiling env_librados_test (issue#23786, pr#21655, Kefu Chai)
qa: fix blacklisted check for test_lifecycle (issue#23975, pr#21921, Patrick Donnelly)
qa: remove racy/buggy test_purge_queue_op_rate (issue#23829, pr#21841, Patrick Donnelly)
qa/suites/rbd/basic/msgr-failures: remove many.yaml (issue#23789, pr#22128, Sage Weil)
qa: wait longer for osd to flush pg stats (issue#24321, pr#22296, Kefu Chai)
qa/workunits/mon/test_mon_config_key.py fails on master (issue#23622, pr#21368, Sage Weil)
qa/workunits/rbd: adapt import_export test to handle multiple units (issue#24733, pr#22911, Jason Dillaman)
qa/workunits/rbd: potential race in mirror disconnect test (issue#23938, pr#21869, Mykola Golub)
radosgw-admin sync status improvements (issue#20473, pr#21908, lvshanchun, Casey Bodley)
rbd: improve ‘import-diff’ corrupt input error messages (issue#18844, issue#23038, pr#21316, PCzhangPC, songweibin, Jason Dillaman)
rbd-mirror: ensure remote demotion is replayed locally (issue#24009, pr#22142, Jason Dillaman)
rbd-nbd can deadlock in logging thread (issue#23143, pr#21705, Sage Weil)
rbd: python bindings fixes and improvements (issue#23609, pr#21725, Ricardo Dias)
rbd: [rbd-mirror] asok hook for image replayer not re-registered after bootstrap (issue#23888, pr#21726, Jason Dillaman)
rbd: [rbd-mirror] local tag predecessor mirror uuid is incorrectly replaced with remote (issue#23876, pr#21741, Jason Dillaman)
rbd: [rbd-mirror] potential deadlock when running asok ‘flush’ command (issue#24141, pr#22180, Mykola Golub)
rbd: [rbd-mirror] potential races during PoolReplayer shut-down (issue#24008, pr#22172, Jason Dillaman)
rgw: add buffering filter to compression for fetch_remote_obj (issue#23547, pr#21758, Casey Bodley)
rgw: add configurable AWS-compat invalid range get behavior (issue#24317, pr#22302, Matt Benjamin)
rgw: admin rest api shouldn’t return error when getting user’s stats if (issue#23821, pr#21661, Zhang Shaowen)
rgw: Allow swift acls to be deleted (issue#22897, pr#22465, Marcus Watts)
rgw: aws4 auth supports PutBucketRequestPayment (issue#23803, pr#21660, Casey Bodley)
rgw: beast frontend can listen on multiple endpoints (issue#22779, pr#21568, Casey Bodley)
rgw: Bucket lifecycles stick around after buckets are deleted (issue#19632, pr#22551, Wei Qiaomiao)
rgw: Do not modify email if argument is not set (issue#24142, pr#22352, Volker Theile)
rgw: do not reflect period if not current (issue#22844, pr#21735, Tianshan Qu)
rgw: es module: set compression type correctly (issue#22758, pr#21736, Abhishek Lekshmanan)
rgw_file: conditionally unlink handles when direct deleted (issue#23299, pr#21438, Matt Benjamin)
rgw: fix bi_list to reset is_truncated flag if it skips entires (issue#22721, pr#21669, Orit Wasserman)
rgw: fix ‘copy part’ without ‘x-amz-copy-source-range’ when compressi… (issue#23196, pr#22438, fang yuxiang)
rgw: fix error handling for GET with ?torrent (issue#23506, pr#21674, Casey Bodley)
rgw: fix use of libcurl with empty header values (issue#23663, pr#21738, Casey Bodley)
rgw:lc: RGWPutLC return ERR_MALFORMED_XML when missingtag in… (issue#21377, pr#19884, Shasha Lu)
rgw: making implicit_tenants backwards compatible (issue#24348, pr#22363, Marcus Watts)
rgw: Misnamed S3 operation (issue#24061, pr#21917, xiangxiang)
rgw: move all pool creation into rgw_init_ioctx (issue#23480, pr#21675, Casey Bodley)
rgw: radosgw-admin should not use metadata cache for readonly commands (issue#23468, pr#21437, Orit Wasserman)
rgw: raise log level on coroutine shutdown errors (issue#23974, pr#21792, Casey Bodley)
rgw: return EINVAL if max_keys can not convert correctly (issue#23586, pr#21435, yuliyang)
rgw: rgw_statfs should report the correct stats (issue#22202, pr#21724, Supriti Singh)
rgw: trim all spaces inside a metadata value (issue#23301, pr#22177, Orit Wasserman)
slow mon ops from osd_failure (issue#24322, pr#22568, Sage Weil)
table of contents doesn’t render for luminous/jewel docs (issue#23780, pr#21502, Alfredo Deza)
test/librados: increase pgp_num along with pg_num (issue#23763, pr#21556, Kefu Chai)
test/rgw: fix for bucket checkpoints (issue#24212, pr#22541, Casey Bodley)
tests: filestore journal replay does not guard omap operations (issue#22920, pr#21547, Sage Weil)
tools: ceph-disk: write log to /var/log/ceph not to /var/run/ceph (issue#24041, pr#21870, Kefu Chai)
tools: ceph-fuse: getgroups failure causes exception (issue#23446, pr#21687, Jeff Layton)