CVE-2024-53054 Information

Description

In the Linux kernel the following vulnerability has been resolved:

cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction

A hung_task problem shown below was found:

INFO: task kworker/0:0:8 blocked for more than 327 seconds. cho 0 > /proc/sys/kernel/hung_task_timeout_secs\ disables this message. Workqueue: events cgroup_bpf_release Call Trace: __schedule+0x5a2/0x2050 ? find_held_lock+0x33/0x100 ? wq_worker_sleeping+0x9e/0xe0 schedule+0x9f/0x180 schedule_preempt_disabled+0x25/0x50 __mutex_lock+0x512/0x740 ? cgroup_bpf_release+0x1e/0x4d0 ? cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? cgroup_bpf_release+0x1e/0x4d0 ? mutex_lock_nested+0x2b/0x40 ? __pfx_delay_tsc+0x10/0x10 mutex_lock_nested+0x2b/0x40 cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? trace_event_raw_event_workqueue_execute_start+0x64/0xd0 ? process_scheduled_works+0x161/0x8a0 process_scheduled_works+0x23a/0x8a0 worker_thread+0x231/0x5b0 ? __pfx_worker_thread+0x10/0x10 kthread+0x14d/0x1c0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x59/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30

This issue can be reproduced by the following pressuse test:

  1. A large number of cpuset cgroups are deleted.
  2. Set cpu on and off repeatly.
  3. Set watchdog_thresh repeatly. The scripts can be obtained at LINK mentioned above the signature.

The reason for this issue is cgroup_mutex and cpu_hotplug_lock are acquired in different tasks which may lead to deadlock. It can lead to a deadlock through the following steps:

  1. A large number of cpusets are deleted asynchronously which puts a large number of cgroup_bpf_release works into system_wq. The max_active of system_wq is WQ_DFL_ACTIVE(256). Consequently all active works are cgroup_bpf_release works and many cgroup_bpf_release works will be put into inactive queue. As illustrated in the diagram there are 256 (in the acvtive queue) + n (in the inactive queue) works.
  2. Setting watchdog_thresh will hold cpu_hotplug_lock.read and put smp_call_on_cpu work into system_wq. However step 1 has already filled system_wq ‘sscs.work’ is put into inactive queue. ‘sscs.work’ has to wait until the works that were put into the inacvtive queue earlier have executed (n cgroup_bpf_release) so it will be blocked for a while.
  3. Cpu offline requires cpu_hotplug_lock.write which is blocked by step 2.
  4. Cpusets that were deleted at step 1 put cgroup_release works into cgroup_destroy_wq. They are competing to get cgroup_mutex all the time. When cgroup_metux is acqured by work at css_killed_work_fn it will call cpuset_css_offline which needs to acqure cpu_hotplug_lock.read. However cpuset_css_offline will be blocked for step 3.
  5. At this moment there are 256 works in active queue that are cgroup_bpf_release they are attempting to acquire cgroup_mutex and as a result all of them are blocked. Consequently sscs.work can not be executed. Ultimately this situation leads to four processes being blocked forming a deadlock.

system_wq(step1) WatchDog(step2) cpu offline(step3) cgroup_destroy_wq(step4) … 2000+ cgroups deleted asyn 256 actives + n inactives __lockup_detector_reconfigure P(cpu_hotplug_lock.read) put sscs.work into system_wq 256 + n + 1(sscs.work) sscs.work wait to be executed warting sscs.work finish percpu_down_write P(cpu_hotplug_lock.write) …blocking… css_killed_work_fn P(cgroup_mutex) cpuset_css_offline P(cpu_hotplug_lock.read) …blocking… 256 cgroup_bpf_release mutex_lock(&cgroup_mutex); ..blocking…

To fix the problem place cgroup_bpf_release works on a dedicated workqueue which can break the loop and solve the problem. System wqs are for misc things which shouldn’t create a large number of concurrent work items. If something is going to generate >

truncated—

Reference

https://git.kernel.org/stable/c/71f14a9f5c7db72fdbc56e667d4ed42a1a760494 https://git.kernel.org/stable/c/0d86cd70fc6a7ba18becb52ad8334d5ad3eca530 https://git.kernel.org/stable/c/6dab3331523ba73db1345d19e6f586dcd5f6efb4 https://git.kernel.org/stable/c/117932eea99b729ee5d12783601a4f7f5fd58a23

Share on: