CVE-2025-22034 Information
Description
In the Linux kernel, the following vulnerability has been resolved:
mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
Patch series "mm: fixes for device-exclusive entries (hmm)", v2.
Discussing the PageTail() call in make_device_exclusive_range() with Willy, I recently discovered [1] that device-exclusive handling does not properly work with THP, making the hmm-tests selftests fail if THPs are enabled on the system.
Looking into more details, I found that hugetlb is not properly fenced, and I realized that something that had been bugging me for longer, namely how device-exclusive entries interact with mapcounts, completely breaks migration/swapout/split/hwpoison handling of these folios while they have device-exclusive PTEs.
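The fencing named in the patch title can be pictured with a small userspace model. This is a sketch only, not kernel code: the struct, the function name, the flag value and the choice of -EOPNOTSUPP as the error are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholder bit for illustration; NOT the kernel's real FOLL_SPLIT_PMD value. */
#define MODEL_FOLL_SPLIT_PMD 0x1u

/* Hypothetical stand-in for the single VMA property this check looks at. */
struct model_vma {
	bool is_hugetlb;
};

/*
 * Return 0 if the request may proceed, or -EOPNOTSUPP when a PMD split
 * is requested on a hugetlb-backed mapping (error code is an assumption).
 */
int model_check_vma_flags(const struct model_vma *vma, unsigned int gup_flags)
{
	if ((gup_flags & MODEL_FOLL_SPLIT_PMD) && vma->is_hugetlb)
		return -EOPNOTSUPP;
	return 0;
}
```

In this model, a split request on a hugetlb mapping is refused up front, while the same request on an ordinary mapping, or a request without the split flag, passes through.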
The program below can be used to allocate 1 GiB worth of pages and make them device-exclusive on a kernel with CONFIG_TEST_HMM.
Once they are device-exclusive, these folios cannot get swapped out (/proc/$pid/smaps_rollup will always indicate 1 GiB RSS no matter how much one forces memory reclaim), and when a memory block is onlined to ZONE_MOVABLE, trying to offline it will loop forever and complain about failed migration of a page that should be movable.
  echo offline > /sys/devices/system/memory/memory136/state
  echo online_movable > /sys/devices/system/memory/memory136/state
  ./hmm-swap &
  … wait until everything is device-exclusive
  echo offline > /sys/devices/system/memory/memory136/state
  [ 285.193431][T14882] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x7f20671f7 pfn:0x442b6a
  [ 285.196618][T14882] memcg:ffff888179298000
  [ 285.198085][T14882] anon flags: 0x5fff0000002091c(referenced|uptodate|dirty|active|owner_2|swapbacked|node=1|zone=3|lastcpupid=0x7ff)
  [ 285.201734][T14882] raw: …
  [ 285.204464][T14882] raw: …
  [ 285.207196][T14882] page dumped because: migration failure
  [ 285.209072][T14882] page_owner tracks the page as allocated
  [ 285.210915][T14882] page last allocated via order 0 migratetype Movable gfp_mask 0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO) id 14926 tgid 14926 (hmm-swap) ts 254506295376 free_ts 227402023774
  [ 285.216765][T14882] post_alloc_hook+0x197/0x1b0
  [ 285.218874][T14882] get_page_from_freelist+0x76e/0x3280
  [ 285.220864][T14882] __alloc_frozen_pages_noprof+0x38e/0x2740
  [ 285.223302][T14882] alloc_pages_mpol+0x1fc/0x540
  [ 285.225130][T14882] folio_alloc_mpol_noprof+0x36/0x340
  [ 285.227222][T14882] vma_alloc_folio_noprof+0xee/0x1a0
  [ 285.229074][T14882] __handle_mm_fault+0x2b38/0x56a0
  [ 285.230822][T14882] handle_mm_fault+0x368/0x9f0
  …
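The RSS observation above can be checked from userspace by reading the Rss line of /proc/$pid/smaps_rollup. The following is a minimal sketch; the helper names parse_rss_kb and read_rss_kb are hypothetical and not part of the original report.

```c
#include <stdio.h>

/*
 * Parse one line of smaps-style output. Returns the value in kB if the
 * line is an "Rss:" line, or -1 otherwise.
 */
long parse_rss_kb(const char *line)
{
	long kb;

	if (sscanf(line, "Rss: %ld kB", &kb) == 1)
		return kb;
	return -1;
}

/* Read the Rss value (in kB) for a pid; returns -1 if it cannot be read. */
long read_rss_kb(long pid)
{
	char path[64];
	char line[256];
	long rss = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%ld/smaps_rollup", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		rss = parse_rss_kb(line);
		if (rss >= 0)
			break;	/* first Rss line found */
	}
	fclose(f);
	return rss;
}
```

Calling read_rss_kb() on the test process before and after forcing reclaim would, per the description, keep returning roughly 1 GiB (1048576 kB) on an affected kernel.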
This series fixes all issues I found so far. There is no easy way to fix them without a bigger rework/cleanup. I have a bunch of cleanups on top (some previously sent, some the result of the discussion in v1) that I will send out separately once this has landed and I get to it.
I wish we could just use some special present PROT_NONE PTEs instead of these (non-present, non-none) fake-swap entries; but that just results in the same problem we keep having (lack of spare PTE bits), and staring at other similar fake-swap entries, that ship has sailed.
With this series, make_device_exclusive() doesn’t actually belong in mm/rmap.c anymore, but I’ll leave moving that for another day.
I only tested this series with the hmm-tests selftests due to lack of HW, so I’d appreciate some testing, especially of whether the interaction between two GPUs wanting a device-exclusive entry works as expected.
#define HMM_DMIRROR_EXCLUSIVE _IOWR('H', 0x05, struct hmm_dmirror_cmd)

struct hmm_dmirror_cmd {
	__u64 addr;
	__u64 ptr;
	__u64 npages;
	__u64 cpages;
	__u64 faults;
};

const size_t size = 1 * 1024 * 1024 * 1024ul;
const size_t chunk_size = 2 * 1024 * 1024ul;

int m
---truncated---
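The two size constants in the program above describe 1 GiB of memory and a 2 MiB chunk size. How the truncated program uses them is not shown, so the following arithmetic sketch rests on assumptions: that the region is handled in chunk_size pieces, and that the base page size is 4 KiB. The helper names are hypothetical.

```c
#include <stddef.h>

/* Number of chunks covering size bytes (assumes size is a multiple of chunk_size). */
size_t chunk_count(size_t size, size_t chunk_size)
{
	return size / chunk_size;
}

/* Number of base pages in one chunk (assumes chunk_size is a multiple of page_size). */
size_t pages_per_chunk(size_t chunk_size, size_t page_size)
{
	return chunk_size / page_size;
}
```

With the constants from the report, chunk_count(1 GiB, 2 MiB) is 512, and with an assumed 4 KiB page size each chunk holds 512 base pages, for 262144 pages in total.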
Reference
https://git.kernel.org/stable/c/2e877ff3492267def06dd50cb165dc9ab8838e7d
https://git.kernel.org/stable/c/48d28417c66cce2f3b0ba773fcb6695a56eff220
https://git.kernel.org/stable/c/8977752c8056a6a094a279004a49722da15bace3
https://git.kernel.org/stable/c/fd900832e8440046627b60697687ab5d04398008