CVE-2025-38358 Information
Description
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix race between async reclaim worker and close_ctree()
Syzbot reported an assertion failure due to an attempt to add a delayed iput after we have set BTRFS_FS_STATE_NO_DELAYED_IPUT in the fs_info state:
WARNING: CPU: 0 PID: 65 at fs/btrfs/inode.c:3420 btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420
Modules linked in:
CPU: 0 UID: 0 PID: 65 Comm: kworker/u8:4 Not tainted 6.15.0-next-20250530-syzkaller 0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine BIOS Google 05/07/2025
Workqueue: btrfs-endio-write btrfs_work_helper
RIP: 0010:btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420
Code: 4e ad 5d (…)
RSP: 0018:ffffc9000213f780 EFLAGS: 00010293
RAX: ffffffff83c635b7 RBX: ffff888058920000 RCX: ffff88801c769e00
RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffff888058921b67 R09: 1ffff1100b12436c
R10: dffffc0000000000 R11: ffffed100b12436d R12: 0000000000000001
R13: dffffc0000000000 R14: ffff88807d748000 R15: 0000000000000100
FS: 0000000000000000(0000) GS:ffff888125c53000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002000000bd038 CR3: 000000006a142000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
This can happen due to a race with the async reclaim worker like this:
- The async metadata reclaim worker enters shrink_delalloc(), which calls btrfs_start_delalloc_roots() with an nr_pages argument that has a value less than LONG_MAX. That in turn enters start_delalloc_inodes(), which sets the local variable ‘full_flush’ to false because wbc->nr_to_write is less than LONG_MAX.
- There it finds inode X in a root’s delalloc list, grabs a reference for inode X (with igrab()), and triggers writeback for it with filemap_fdatawrite_wbc(), which creates an ordered extent for inode X.
- The unmount sequence starts from another task. We enter close_ctree() and flush the workqueue fs_info->endio_write_workers, which waits for the ordered extent for inode X to complete. When the last reference of the ordered extent is dropped with btrfs_put_ordered_extent(), we call btrfs_add_delayed_iput(), which does not add the inode to the list of delayed iputs because the inode has a refcount of 2; it decrements the refcount to 1 and returns.
- Shortly after, at close_ctree(), we call btrfs_run_delayed_iputs(), which runs all delayed iputs, and then we set BTRFS_FS_STATE_NO_DELAYED_IPUT in the fs_info state.
- The async reclaim worker, after calling filemap_fdatawrite_wbc(), now calls btrfs_add_delayed_iput() for inode X, and there we trigger an assertion failure since the fs_info state has the flag BTRFS_FS_STATE_NO_DELAYED_IPUT set.
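The interleaving above can be replayed deterministically in a small userspace model. This is only an illustrative sketch, not kernel code: the model_* names and structures are hypothetical, and it assumes (based on the refcount of 2 mentioned above) that the ordered extent holds its own inode reference and that the flag check sits on the path that queues a delayed iput.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are not the kernel's structures. */
struct model_fs {
    bool no_delayed_iput; /* stands in for BTRFS_FS_STATE_NO_DELAYED_IPUT */
    int queued_iputs;     /* length of the delayed iput list */
    int warnings;         /* times the assertion would have fired */
};

struct model_inode {
    int refcount;
};

/*
 * Models btrfs_add_delayed_iput() as described in the text: if this is
 * not the last reference, just drop it; otherwise queue a delayed iput.
 * Assumption: the flag is only checked when an iput has to be queued.
 */
static void model_add_delayed_iput(struct model_fs *fs, struct model_inode *inode)
{
    assert(inode->refcount > 0);
    if (inode->refcount > 1) {
        inode->refcount--;
        return;
    }
    if (fs->no_delayed_iput)
        fs->warnings++;
    fs->queued_iputs++;
}

/* Models btrfs_run_delayed_iputs(): each queued entry drops a final reference. */
static void model_run_delayed_iputs(struct model_fs *fs, struct model_inode *inode)
{
    while (fs->queued_iputs > 0) {
        fs->queued_iputs--;
        inode->refcount--;
    }
}

/* Replays the interleaving in order and returns the number of warnings. */
static int model_replay_race(void)
{
    struct model_fs fs = { .no_delayed_iput = false };
    struct model_inode x = { .refcount = 0 };

    /* Reclaim worker: igrab() on inode X, then writeback creates an
     * ordered extent, modeled here as a second reference. */
    x.refcount++;
    x.refcount++;

    /* Unmount task: the ordered extent completes and drops its
     * reference; refcount goes 2 -> 1 and nothing is queued. */
    model_add_delayed_iput(&fs, &x);

    /* Unmount task: run delayed iputs (none queued), then set the flag. */
    model_run_delayed_iputs(&fs, &x);
    fs.no_delayed_iput = true;

    /* Reclaim worker: drops its reference, which is now the last one,
     * so an iput must be queued while the flag is already set. */
    model_add_delayed_iput(&fs, &x);

    return fs.warnings;
}
```

In this model, calling model_replay_race() counts one warning, which corresponds to the assertion failure syzbot reported.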
Fix this by setting BTRFS_FS_STATE_NO_DELAYED_IPUT only after waiting for the async reclaim workers to finish, that is, after calling cancel_work_sync() for them at close_ctree(), and by running delayed iputs after waiting for the reclaim workers to finish and before setting the bit.
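The effect of the reordering can be sketched with a simplified userspace model that compares the two shutdown orders. All sm_* names and the struct are hypothetical stand-ins, not the kernel's code; the model assumes that cancelling the reclaim work waits for the worker to finish, so the worker's pending reference drop happens inside that call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the shutdown state; not kernel structures. */
struct shutdown_model {
    bool no_delayed_iput;  /* stands in for BTRFS_FS_STATE_NO_DELAYED_IPUT */
    bool worker_holds_ref; /* reclaim worker has not yet dropped its inode ref */
    int inode_refcount;
    int queued_iputs;
    int warnings;
};

/* Drop a reference; the last one is queued as a delayed iput. */
static void sm_add_delayed_iput(struct shutdown_model *m)
{
    assert(m->inode_refcount > 0);
    if (m->inode_refcount > 1) {
        m->inode_refcount--;
        return;
    }
    if (m->no_delayed_iput)
        m->warnings++; /* the assertion would fire here */
    m->queued_iputs++;
}

/* Each queued delayed iput drops a final reference. */
static void sm_run_delayed_iputs(struct shutdown_model *m)
{
    while (m->queued_iputs > 0) {
        m->queued_iputs--;
        m->inode_refcount--;
    }
}

/* Models cancel_work_sync() on the reclaim work: the worker runs to
 * completion, including its final reference drop. */
static void sm_cancel_reclaim_work(struct shutdown_model *m)
{
    if (m->worker_holds_ref) {
        m->worker_holds_ref = false;
        sm_add_delayed_iput(m);
    }
}

/* Runs the modeled shutdown in either order and returns the final state. */
static struct shutdown_model sm_close_ctree(bool fixed_order)
{
    /* State after the ordered extent completed: only the worker's ref is left. */
    struct shutdown_model m = { .worker_holds_ref = true, .inode_refcount = 1 };

    if (fixed_order) {
        sm_cancel_reclaim_work(&m); /* wait for reclaim workers first */
        sm_run_delayed_iputs(&m);   /* then run whatever they queued */
        m.no_delayed_iput = true;   /* only now forbid new delayed iputs */
    } else {
        sm_run_delayed_iputs(&m);
        m.no_delayed_iput = true;   /* flag set while the worker is still active */
        sm_cancel_reclaim_work(&m);
    }
    return m;
}
```

With fixed_order set to false the model records one warning and leaves one iput queued; with it set to true the worker's iput is queued and run before the flag is set, so no warning is recorded and the inode's last reference is released.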
This race was recently introduced by commit 19e60b2a95f5 ("btrfs: add extra warning if delayed iput is added when it’s not allowed"). Without the new validation at btrfs_add_delayed_iput()
---truncated---
Reference
https://git.kernel.org/stable/c/4693cda2c06039c875f2eef0123b22340c34bfa0
https://git.kernel.org/stable/c/a26bf338cdad3643a6e7c3d78a172baadba15c1a
Related CNNVD
CNNVD-202507-3185 (Published: 2025-07-25)