CVE-2025-21892 Information
Description
In the Linux kernel, the following vulnerability has been resolved:
RDMA/mlx5: Fix the recovery flow of the UMR QP
This patch addresses an issue in the recovery flow of the UMR QP, ensuring that tasks do not get stuck, as highlighted by the call trace [1].
During recovery, before transitioning the QP to the RESET state, the software must wait for all outstanding WRs to complete.
Failing to do so can cause the firmware to skip sending some flushed CQEs with errors and simply discard them upon the RESET, as per the IB specification.
This race condition can result in lost CQEs and tasks becoming stuck.
To resolve this, the patch sends a final WR, which serves only as a barrier, before moving the QP state to RESET.
Once a CQE is received for that final WR, it guarantees that no outstanding WRs remain, making it safe to transition the QP to RESET and subsequently back to RTS, restoring proper functionality.
Note: For the barrier WR, we simply reuse the failed and ready WR. Since the QP is in an error state, it will only receive IB_WC_WR_FLUSH_ERR. However, as it serves only as a barrier, we don’t care about its status.
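The ordering argument above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the mlx5 driver code: `ToyQP`, `flush_one`, `recover_without_barrier`, `recover_with_barrier` and `make_qp` are hypothetical names invented for illustration, and the model assumes that flushed CQEs are delivered in posting order, so that a CQE for the barrier WR implies every earlier WR has already completed.

```python
from collections import deque

FLUSH_ERR = "IB_WC_WR_FLUSH_ERR"  # status name taken from the description


class ToyQP:
    """Toy model of a QP in the error state (hypothetical, not the mlx5 API)."""

    def __init__(self):
        self.state = "ERR"
        self.send_queue = deque()  # WRs posted but not yet completed
        self.completions = []      # (wr_id, status) CQEs delivered so far

    def post(self, wr_id):
        self.send_queue.append(wr_id)

    def flush_one(self):
        """Deliver a flush-error CQE for the oldest outstanding WR, if any."""
        if self.send_queue:
            self.completions.append((self.send_queue.popleft(), FLUSH_ERR))

    def reset(self):
        """Move to RESET; WRs still outstanding are dropped without a CQE."""
        lost = list(self.send_queue)
        self.send_queue.clear()
        self.state = "RESET"
        return lost

    def to_rts(self):
        self.state = "RTS"


def recover_without_barrier(qp):
    """Racy flow: reset immediately, losing CQEs for pending WRs."""
    lost = qp.reset()
    qp.to_rts()
    return lost


def recover_with_barrier(qp, barrier_id="barrier"):
    """Fixed flow: post a barrier WR and wait for its CQE before RESET."""
    qp.post(barrier_id)
    while not any(wr_id == barrier_id for wr_id, _ in qp.completions):
        qp.flush_one()  # stands in for waiting on the completion queue
    lost = qp.reset()   # nothing is outstanding any more
    qp.to_rts()
    return lost


def make_qp():
    """Three WRs posted, only the first one flushed when recovery starts."""
    qp = ToyQP()
    for wr_id in ("wr1", "wr2", "wr3"):
        qp.post(wr_id)
    qp.flush_one()
    return qp


if __name__ == "__main__":
    print("lost without barrier:", recover_without_barrier(make_qp()))
    print("lost with barrier:   ", recover_with_barrier(make_qp()))
```

In this model the first flow loses the CQEs for `wr2` and `wr3`, which corresponds to the waiters that would stay blocked, while the second flow loses none. The barrier's own status is `IB_WC_WR_FLUSH_ERR` and is ignored, matching the note above.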
[1]
INFO: task rdma_resource_l:1922 blocked for more than 120 seconds.
Tainted: G W 6.12.0-rc7+ #1626
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:rdma_resource_l state:D stack:0 pid:1922 tgid:1922 ppid:1369
flags:0x00004004
Call Trace:
Reference
https://git.kernel.org/stable/c/1d2b84d8d054313deed2b2fcafe1168bbcb9e99f
https://git.kernel.org/stable/c/3e3bf255992cc02404e9d209b127c1c9944239cf
https://git.kernel.org/stable/c/d97505baea64d93538b16baf14ce7b8c1fbad746