CVE-2022-49547 Information
Description
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix deadlock between concurrent dio writes when low on free data space
When reserving data space for a direct IO write we can end up deadlocking if we have multiple tasks attempting a write to the same file range, there are multiple extents covered by that file range, we are low on available space for data and the writes don’t expand the inode’s i_size.
The deadlock can happen like this:
- We have a file with an i_size of 1M, at offset 0 it has an extent with a size of 128K and at offset 128K it has another extent also with a size of 128K;
- Task A does a direct IO write against file range [0, 256K), and because the write is within the i_size boundary, it takes the inode’s lock (VFS level) in shared mode;
- Task A locks the file range [0, 256K) at btrfs_dio_iomap_begin(), and then gets the extent map for the extent covering the range [0, 128K). At btrfs_get_blocks_direct_write(), it creates an ordered extent for that file range ([0, 128K));
- Before returning from btrfs_dio_iomap_begin(), it unlocks the file range [0, 256K);
- Task A executes btrfs_dio_iomap_begin() again, this time for the file range [128K, 256K), and locks the file range [128K, 256K);
- Task B starts a direct IO write against file range [0, 256K) as well. It also locks the inode in shared mode, as it’s within the i_size limit, and then tries to lock file range [0, 256K). It is able to lock the subrange [0, 128K) but then blocks waiting for the range [128K, 256K), as it is currently locked by task A;
- Task A enters btrfs_get_blocks_direct_write() and tries to reserve data space. Because we are low on available free space, it triggers the async data reclaim task, and waits for it to reserve data space;
- The async reclaim task decides to wait for all existing ordered extents to complete (through btrfs_wait_ordered_roots()). It finds the ordered extent previously created by task A for the file range [0, 128K) and waits for it to complete;
- The ordered extent for the file range [0, 128K) can not complete because it blocks at btrfs_finish_ordered_io() when trying to lock the file range [0, 128K).
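The lock state reached by the sequence above can be replayed with a small toy model. This is only an illustrative sketch in Python, not btrfs code: the RangeLockTable class, its try_lock() and unlock() methods, the replay() function and the 128K locking granularity are all assumptions made for the example. The model takes a range piece by piece and stops at the first piece held by another owner, which mirrors how task B ends up holding [0, 128K) while waiting for [128K, 256K).

```python
K = 1024


class RangeLockTable:
    """Toy model of per-file byte-range locks (hypothetical, not the btrfs extent io tree)."""

    def __init__(self):
        # Each entry is (start, end, owner) for a half-open range [start, end).
        self.held = []

    def _conflict(self, start, end, owner):
        """Return the owner of a lock overlapping [start, end) held by someone else, or None."""
        for s, e, o in self.held:
            if o != owner and s < end and start < e:
                return o
        return None

    def try_lock(self, owner, start, end, block=128 * K):
        """Lock [start, end) block by block, stopping at the first conflict.

        Returns (locked_end, blocker). On full success locked_end == end and
        blocker is None. Otherwise [start, locked_end) stays locked by owner
        and blocker names the task holding the next block.
        """
        pos = start
        while pos < end:
            nxt = min(pos + block, end)
            blocker = self._conflict(pos, nxt, owner)
            if blocker is not None:
                return pos, blocker
            self.held.append((pos, nxt, owner))
            pos = nxt
        return end, None

    def unlock(self, owner, start, end):
        """Drop every lock of owner that lies fully inside [start, end)."""
        self.held = [(s, e, o) for (s, e, o) in self.held
                     if not (o == owner and s >= start and e <= end)]


def replay():
    locks = RangeLockTable()
    # Task A's first btrfs_dio_iomap_begin() call: lock, then unlock [0, 256K).
    locks.try_lock("task A", 0, 256 * K)
    locks.unlock("task A", 0, 256 * K)
    # Task A's second call: lock [128K, 256K) and keep it while reserving space.
    locks.try_lock("task A", 128 * K, 256 * K)
    # Task B: gets [0, 128K), then blocks on the part held by task A.
    task_b = locks.try_lock("task B", 0, 256 * K)
    # Ordered extent completion for [0, 128K): blocked by task B.
    ordered_extent = locks.try_lock("ordered extent completion", 0, 128 * K)
    return task_b, ordered_extent


if __name__ == "__main__":
    print(replay())
```

In this model, replay() reports that task B locked up to offset 128K and is blocked by task A, and that the ordered extent completion locked nothing and is blocked by task B.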
This results in a deadlock because:
- task B is holding the file range [0, 128K) locked, waiting for the range [128K, 256K) to be unlocked by task A;
- task A is holding the file range [128K, 256K) locked and it’s waiting for the async data reclaim task to satisfy its space reservation request;
- the async data reclaim task is waiting for ordered extent [0, 128K) to complete, but the ordered extent can not complete because the file range [0, 128K) is currently locked by task B, which is waiting on task A to unlock file range [128K, 256K) and task A waiting on the async data reclaim task.
This results in a deadlock between 4 tasks: task A, task B, the async data reclaim task and the task doing ordered extent completion (a work queue task).
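The circular dependency can be made explicit as a wait-for graph, where each blocked task points at the task it is waiting on. The sketch below is a generic Python illustration and not something the kernel does: the WAITS_FOR table and the find_cycle() helper are hypothetical names, and the edges are taken directly from the description above.

```python
def find_cycle(waits_for, start):
    """Follow the single "waits for" edge of each task, beginning at start.

    Returns the list of tasks forming a cycle if the walk revisits a task,
    or None if it reaches a task that is not waiting on anything.
    """
    path = []
    position = {}  # task -> index in path
    node = start
    while node in waits_for and node not in position:
        position[node] = len(path)
        path.append(node)
        node = waits_for[node]
    if node in position:
        return path[position[node]:]
    return None


# Edges as described in the text: each key is blocked on its value.
WAITS_FOR = {
    "task A": "async data reclaim",                     # waits for its space reservation
    "async data reclaim": "ordered extent completion",  # waits for ordered extent [0, 128K)
    "ordered extent completion": "task B",              # needs range [0, 128K), held by task B
    "task B": "task A",                                 # needs range [128K, 256K), held by task A
}

cycle = find_cycle(WAITS_FOR, "task A")

if __name__ == "__main__":
    print(" -> ".join(cycle + [cycle[0]]))
```

Every one of the four tasks appears in the cycle, so none of them can make progress unless one of the edges is removed.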
This type of deadlock can sporadically be triggered by the test case generic/300 from fstests, and results in a stack trace like the following:
[12084.033689] INFO: task kworker/u16:7:123749 blocked for more than 241 seconds.
[12084.034877] Not tainted 5.18.0-rc2-btrfs-next-115 #1
[12084.035562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.036548] task:kworker/u16:7 state:D stack: 0 pid:123749 ppid: 2 flags:0x00004000
[12084.036554] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
[12084.036599] Call Trace:
[12084.036601] <TASK>
[12084.036606] __schedule+0x3cb/0xed0
[12084.036616] schedule+0x4e/0xb0
[12084.036620] btrfs_start_ordered_extent+0x109/0x1c0 [btrfs]
[12084.036651] ? prepare_to_wait_exclusive+0xc0/0xc0
[12084.036659] btrfs_run_ordered_extent_work+0x1a/0x30 [btrfs]
[12084.036688] btrfs_work_helper+0xf8/0x400 [btrfs]
[12084.0367
[truncated]
Reference
https://git.kernel.org/stable/c/cfae6f765b3c40882ee90dae8fbf9325c8de9c35
https://git.kernel.org/stable/c/f5585f4f0ef5b17026bbd60fbff6fcc91b99d5bf