CVE-2024-26628 Information
Description
In the Linux kernel, the following vulnerability has been resolved:
drm/amdkfd: Fix lock dependency warning
======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
kworker/8:2/2676 is trying to acquire lock:
ffff9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550
but task is already holding lock:
ffff9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&svms->lock){+.+.}-{3:3}:
       __mutex_lock+0x97/0xd30
       kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
       kfd_ioctl+0x1b2/0x5d0 [amdgpu]
       __x64_sys_ioctl+0x86/0xc0
       do_syscall_64+0x39/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #1 (&mm->mmap_lock){++++}-{3:3}:
       down_read+0x42/0x160
       svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20
-> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
       __lock_acquire+0x1426/0x2200
       lock_acquire+0xc1/0x2b0
       __flush_work+0x80/0x550
       __cancel_work_timer+0x109/0x190
       svm_range_bo_release+0xdc/0x1c0 [amdgpu]
       svm_range_free+0x175/0x180 [amdgpu]
       svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20
other info that might help us debug this:
Chain exists of:
  (work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&svms->lock);
                               lock(&mm->mmap_lock);
                               lock(&svms->lock);
  lock((work_completion)(&svm_bo->eviction_work));
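The report describes a cycle among three lock classes: the eviction worker takes mmap_lock inside the eviction work item (#1), the ioctl path takes svms->lock while mmap_lock is held (#2), and svm_range_bo_release then waits for the eviction work while svms->lock is held (#0). The following is a deliberately simplified C model of that reasoning, assuming a plain adjacency matrix of "acquired while holding" edges. The names `lock_class`, `add_dependency` and `reachable` are hypothetical, and this is not how lockdep is implemented internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the three lock classes named in the report.
 * Illustration only: this is not how lockdep is implemented. */
enum lock_class { WORK_COMPLETION, MMAP_LOCK, SVMS_LOCK, NUM_CLASSES };

/* dep[a][b] is true once some task acquired class b while holding class a. */
static bool dep[NUM_CLASSES][NUM_CLASSES];

/* Depth-first search over recorded edges: is `to` reachable from `from`? */
static bool reachable(int from, int to, bool seen[NUM_CLASSES])
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NUM_CLASSES; next++)
        if (dep[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/* Record "acquired while holding held". Returns false, i.e. reports a
 * possible circular locking dependency, if `held` is already reachable
 * from `acquired`, because the new edge would close a cycle. */
static bool add_dependency(enum lock_class held, enum lock_class acquired)
{
    bool seen[NUM_CLASSES] = { false };

    assert(held < NUM_CLASSES && acquired < NUM_CLASSES);
    if (reachable(acquired, held, seen))
        return false;
    dep[held][acquired] = true;
    return true;
}
```

Recording #1 (`add_dependency(WORK_COMPLETION, MMAP_LOCK)`) and #2 (`add_dependency(MMAP_LOCK, SVMS_LOCK)`) succeeds; attempting #0 (`add_dependency(SVMS_LOCK, WORK_COMPLETION)`) returns false, because WORK_COMPLETION already reaches SVMS_LOCK through MMAP_LOCK.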
I believe this cannot really lead to a deadlock in practice, because svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO refcount is non-zero. That means svm_range_bo_release cannot be running concurrently. However, there is no good way to annotate this for lockdep.
To avoid the problem, take a BO reference in svm_range_schedule_evict_svm_bo instead of in the worker. That way, a BO cannot be freed while eviction work is pending, and the cancel_work_sync call in svm_range_bo_release can be eliminated.
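The ownership change can be modeled in userspace. This sketch assumes a hypothetical `struct model_bo` and uses a pthread as a stand-in for queued kernel work; it is not the amdkfd code. What it illustrates is that the scheduler takes the reference, the worker drops it, and the release path therefore never has to cancel or flush pending work (the `pthread_join` in `model_run_scenario` exists only so the demo can inspect the result). For brevity it takes the reference with a plain increment, which is only valid when the caller already holds a reference; the v2 note says the actual patch uses svm_bo_ref_unless_zero instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an svm_bo: a refcount plus flags that let the
 * demo observe the object's lifetime instead of really freeing memory. */
struct model_bo {
    atomic_int refcount;
    atomic_bool released;         /* set by the final put ("release") */
    atomic_bool alive_in_worker;  /* what the eviction worker observed */
    pthread_t worker;
};

/* Drop a reference; the last put runs the release path. The release path
 * does not cancel or flush eviction work: pending work holds its own ref. */
static void model_bo_put(struct model_bo *bo)
{
    int old = atomic_fetch_sub(&bo->refcount, 1);

    assert(old > 0);  /* underflow would be a refcounting bug */
    if (old == 1)
        atomic_store(&bo->released, true);
}

/* Eviction worker: runs with the reference taken at schedule time. */
static void *model_evict_worker(void *arg)
{
    struct model_bo *bo = arg;

    atomic_store(&bo->alive_in_worker, !atomic_load(&bo->released));
    /* ... eviction would happen here ... */
    model_bo_put(bo);  /* drop the reference taken by the scheduler */
    return NULL;
}

/* Scheduler: take the reference before the work is queued. Plain increment
 * for brevity; assumes the caller already holds a reference. */
static bool model_schedule_evict(struct model_bo *bo)
{
    int old = atomic_fetch_add(&bo->refcount, 1);

    assert(old > 0);
    if (pthread_create(&bo->worker, NULL, model_evict_worker, bo) != 0) {
        model_bo_put(bo);  /* undo: no work was queued */
        return false;
    }
    return true;
}

/* The owner schedules eviction, then immediately drops its own reference,
 * racing the worker. Returns true if the object stayed alive for the worker
 * and was released once all references were gone. */
static bool model_run_scenario(void)
{
    struct model_bo bo;

    atomic_init(&bo.refcount, 1);
    atomic_init(&bo.released, false);
    atomic_init(&bo.alive_in_worker, false);

    if (!model_schedule_evict(&bo))
        return false;
    model_bo_put(&bo);              /* owner's final put, no cancel needed */
    pthread_join(bo.worker, NULL);  /* demo only: wait before inspecting */
    return atomic_load(&bo.alive_in_worker) && atomic_load(&bo.released) &&
           atomic_load(&bo.refcount) == 0;
}
```

Whichever thread runs first, the worker always sees a live object, and whichever put comes last performs the release.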
v2: Used svm_bo_ref_unless_zero and explained why that's safe. Also removed redundant checks that are already done in amdkfd_fence_enable_signaling.
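Judging by its name, svm_bo_ref_unless_zero follows the same "take a reference only if the count is not already zero" idea as the kernel's kref_get_unless_zero(). A userspace C11 sketch of that idea follows; the helpers `ref_get_unless_zero` and `ref_put` are hypothetical and are not the amdkfd implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically increment *refcount unless it is zero. A zero count means the
 * object is already being released, so no new reference may be taken. */
static bool ref_get_unless_zero(atomic_int *refcount)
{
    int old = atomic_load(refcount);

    do {
        if (old == 0)
            return false;  /* release already started: caller must back off */
        /* On failure the CAS reloads `old`, so the zero check is repeated. */
    } while (!atomic_compare_exchange_weak(refcount, &old, old + 1));
    return true;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool ref_put(atomic_int *refcount)
{
    int old = atomic_fetch_sub(refcount, 1);

    assert(old > 0);  /* underflow would mean a refcounting bug */
    return old == 1;
}
```

Once the last reference has been dropped, `ref_get_unless_zero` fails instead of resurrecting the object, so a caller racing with the final put simply skips the work.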
References
https://git.kernel.org/stable/c/7a70663ba02bd4e19aea8d70c979eb3bd03d839d
https://git.kernel.org/stable/c/8b25d397162b0316ceda40afaa63ee0c4a97d28b
https://git.kernel.org/stable/c/28d2d623d2fbddcca5c24600474e92f16ebb3a05
https://git.kernel.org/stable/c/cb96e492d72d143d57db2d2bc143a1cee8741807
https://git.kernel.org/stable/c/47bf0f83fc86df1bf42b385a91aadb910137c5c9