diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-01-09 11:18:47 -0800 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-01-09 11:18:47 -0800 |
| commit | fb46e22a9e3863e08aef8815df9f17d0f4b9aede (patch) | |
| tree | 83e052911fa8d8d90bcf9de2796e17e19040613f /mm/swap_state.c | |
| parent | Merge tag 'slab-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vba... (diff) | |
| parent | mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER (diff) | |
| download | linux-fb46e22a9e3863e08aef8815df9f17d0f4b9aede.tar.gz linux-fb46e22a9e3863e08aef8815df9f17d0f4b9aede.zip | |
Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the series
'maple_tree: add mt_free_one() and mt_attr() helpers'
'Some cleanups of maple tree'
- In the series 'mm: use memmap_on_memory semantics for dax/kmem'
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few fixes)
in the patch series
'Add folio_zero_tail() and folio_fill_tail()'
'Make folio_start_writeback return void'
'Fix fault handler's handling of poisoned tail pages'
'Convert aops->error_remove_page to ->error_remove_folio'
'Finish two folio conversions'
'More swap folio conversions'
- Kefeng Wang has also contributed folio-related work in the series
'mm: cleanup and use more folio in page fault'
- Jim Cromie has improved the kmemleak reporting output in the series
'tweak kmemleak report format'.
- In the series 'stackdepot: allow evicting stack traces' Andrey
Konovalov to permits clients (in this case KASAN) to cause eviction
of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series 'mm:
page_alloc: fixes for high atomic reserve caluculations'.
- Dmitry Rokosov has added to the samples/ dorectory some sample code
for a userspace memcg event listener application. See the series
'samples: introduce cgroup events listeners'.
- Some mapletree maintanance work from Liam Howlett in the series
'maple_tree: iterator state changes'.
- Nhat Pham has improved zswap's approach to writeback in the series
'workload-specific and memory pressure-driven zswap writeback'.
- DAMON/DAMOS feature and maintenance work from SeongJae Park in the
series
'mm/damon: let users feed and tame/auto-tune DAMOS'
'selftests/damon: add Python-written DAMON functionality tests'
'mm/damon: misc updates for 6.8'
- Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
memcg: subtree stats flushing and thresholds'.
- In the series 'Multi-size THP for anonymous memory' Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series 'More buffer_head
cleanups'.
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
Add ksm advisor'. This is a governor which tunes KSM's scanning
aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory use
in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.
- Matthew Wilcox has performed some maintenance work on the writeback
code, both code and within filesystems. The series is 'Clean up the
writeback paths'.
- Andrey Konovalov has optimized KASAN's handling of alloc and free
stack traces for secondary-level allocators, in the series 'kasan:
save mempool stack traces'.
- Andrey also performed some KASAN maintenance work in the series
'kasan: assorted clean-ups'.
- David Hildenbrand has gone to town on the rmap code. Cleanups, more
pte batching, folio conversions and more. See the series 'mm/rmap:
interface overhaul'.
- Kinsey Ho has contributed some maintenance work on the MGLRU code
in the series 'mm/mglru: Kconfig cleanup'.
- Matthew Wilcox has contributed lruvec page accounting code cleanups
in the series 'Remove some lruvec page accounting functions'"
* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
mm, treewide: introduce NR_PAGE_ORDERS
selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
selftests/mm: skip test if application doesn't has root privileges
selftests/mm: conform test to TAP format output
selftests: mm: hugepage-mmap: conform to TAP format output
selftests/mm: gup_test: conform test to TAP format output
mm/selftests: hugepage-mremap: conform test to TAP format output
mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
mm/memcontrol: remove __mod_lruvec_page_state()
mm/khugepaged: use a folio more in collapse_file()
slub: use a folio in __kmalloc_large_node
slub: use folio APIs in free_large_kmalloc()
slub: use alloc_pages_node() in alloc_slab_page()
mm: remove inc/dec lruvec page state functions
mm: ratelimit stat flush from workingset shrinker
kasan: stop leaking stack trace handles
mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
mm/mglru: add dummy pmd_dirty()
...
Diffstat (limited to 'mm/swap_state.c')
| -rw-r--r-- | mm/swap_state.c | 121 |
1 files changed, 66 insertions, 55 deletions
diff --git a/mm/swap_state.c b/mm/swap_state.c index 85d9e5806a6a..e671266ad772 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -410,13 +410,12 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping, return folio; } -struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, - struct mempolicy *mpol, pgoff_t ilx, - bool *new_page_allocated) +struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, + struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated, + bool skip_if_exists) { struct swap_info_struct *si; struct folio *folio; - struct page *page; void *shadow = NULL; *new_page_allocated = false; @@ -433,10 +432,8 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, */ folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry)); - if (!IS_ERR(folio)) { - page = folio_file_page(folio, swp_offset(entry)); - goto got_page; - } + if (!IS_ERR(folio)) + goto got_folio; /* * Just skip read ahead for unused swap slot. @@ -450,7 +447,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, goto fail_put_swap; /* - * Get a new page to read into from swap. Allocate it now, + * Get a new folio to read into from swap. Allocate it now, * before marking swap_map SWAP_HAS_CACHE, when -EEXIST will * cause any racers to loop around until we add it to cache. */ @@ -471,17 +468,28 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, goto fail_put_swap; /* + * Protect against a recursive call to __read_swap_cache_async() + * on the same entry waiting forever here because SWAP_HAS_CACHE + * is set but the folio is not the swap cache yet. This can + * happen today if mem_cgroup_swapin_charge_folio() below + * triggers reclaim through zswap, which may call + * __read_swap_cache_async() in the writeback path. + */ + if (skip_if_exists) + goto fail_put_swap; + + /* * We might race against __delete_from_swap_cache(), and * stumble across a swap_map entry whose SWAP_HAS_CACHE * has not yet been cleared. Or race against another * __read_swap_cache_async(), which has set SWAP_HAS_CACHE - * in swap_map, but not yet added its page to swap cache. + * in swap_map, but not yet added its folio to swap cache. */ schedule_timeout_uninterruptible(1); } /* - * The swap entry is ours to swap in. Prepare the new page. + * The swap entry is ours to swap in. Prepare the new folio. */ __folio_set_locked(folio); @@ -502,10 +510,9 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* Caller will initiate read into locked folio */ folio_add_lru(folio); *new_page_allocated = true; - page = &folio->page; -got_page: +got_folio: put_swap_device(si); - return page; + return folio; fail_unlock: put_swap_folio(folio, entry); @@ -523,26 +530,26 @@ fail_put_swap: * the swap entry is no longer in use. * * get/put_swap_device() aren't needed to call this function, because - * __read_swap_cache_async() call them and swap_readpage() holds the + * __read_swap_cache_async() call them and swap_read_folio() holds the * swap cache folio lock. */ -struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, - struct vm_area_struct *vma, - unsigned long addr, struct swap_iocb **plug) +struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, + struct vm_area_struct *vma, unsigned long addr, + struct swap_iocb **plug) { bool page_allocated; struct mempolicy *mpol; pgoff_t ilx; - struct page *page; + struct folio *folio; mpol = get_vma_policy(vma, addr, 0, &ilx); - page = __read_swap_cache_async(entry, gfp_mask, mpol, ilx, - &page_allocated); + folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx, + &page_allocated, false); mpol_cond_put(mpol); if (page_allocated) - swap_readpage(page, false, plug); - return page; + swap_read_folio(folio, false, plug); + return folio; } static unsigned int __swapin_nr_pages(unsigned long prev_offset, @@ -613,7 +620,7 @@ static unsigned long swapin_nr_pages(unsigned long offset) * @mpol: NUMA memory allocation policy to be applied * @ilx: NUMA interleave index, for use only when MPOL_INTERLEAVE * - * Returns the struct page for entry and addr, after queueing swapin. + * Returns the struct folio for entry and addr, after queueing swapin. * * Primitive swap readahead code. We simply read an aligned block of * (1 << page_cluster) entries in the swap area. This method is chosen @@ -624,10 +631,10 @@ static unsigned long swapin_nr_pages(unsigned long offset) * are used for every page of the readahead: neighbouring pages on swap * are fairly likely to have been swapped out from the same node. */ -struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, +struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, struct mempolicy *mpol, pgoff_t ilx) { - struct page *page; + struct folio *folio; unsigned long entry_offset = swp_offset(entry); unsigned long offset = entry_offset; unsigned long start_offset, end_offset; @@ -652,30 +659,31 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, blk_start_plug(&plug); for (offset = start_offset; offset <= end_offset ; offset++) { /* Ok, do the async read-ahead now */ - page = __read_swap_cache_async( + folio = __read_swap_cache_async( swp_entry(swp_type(entry), offset), - gfp_mask, mpol, ilx, &page_allocated); - if (!page) + gfp_mask, mpol, ilx, &page_allocated, false); + if (!folio) continue; if (page_allocated) { - swap_readpage(page, false, &splug); + swap_read_folio(folio, false, &splug); if (offset != entry_offset) { - SetPageReadahead(page); + folio_set_readahead(folio); count_vm_event(SWAP_RA); } } - put_page(page); + folio_put(folio); } blk_finish_plug(&plug); swap_read_unplug(splug); lru_add_drain(); /* Push any new pages onto the LRU now */ skip: /* The page was likely read above, so no need for plugging here */ - page = __read_swap_cache_async(entry, gfp_mask, mpol, ilx, - &page_allocated); + folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx, + &page_allocated, false); if (unlikely(page_allocated)) - swap_readpage(page, false, NULL); - return page; + swap_read_folio(folio, false, NULL); + zswap_folio_swapin(folio); + return folio; } int init_swap_address_space(unsigned int type, unsigned long nr_pages) @@ -779,7 +787,7 @@ static void swap_ra_info(struct vm_fault *vmf, * @targ_ilx: NUMA interleave index, for use only when MPOL_INTERLEAVE * @vmf: fault information * - * Returns the struct page for entry and addr, after queueing swapin. + * Returns the struct folio for entry and addr, after queueing swapin. * * Primitive swap readahead code. We simply read in a few pages whose * virtual addresses are around the fault address in the same vma. @@ -787,13 +795,12 @@ static void swap_ra_info(struct vm_fault *vmf, * Caller must hold read mmap_lock if vmf->vma is not NULL. * */ -static struct page *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask, - struct mempolicy *mpol, pgoff_t targ_ilx, - struct vm_fault *vmf) +static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask, + struct mempolicy *mpol, pgoff_t targ_ilx, struct vm_fault *vmf) { struct blk_plug plug; struct swap_iocb *splug = NULL; - struct page *page; + struct folio *folio; pte_t *pte = NULL, pentry; unsigned long addr; swp_entry_t entry; @@ -826,18 +833,18 @@ static struct page *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask, continue; pte_unmap(pte); pte = NULL; - page = __read_swap_cache_async(entry, gfp_mask, mpol, ilx, - &page_allocated); - if (!page) + folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx, + &page_allocated, false); + if (!folio) continue; if (page_allocated) { - swap_readpage(page, false, &splug); + swap_read_folio(folio, false, &splug); if (i != ra_info.offset) { - SetPageReadahead(page); + folio_set_readahead(folio); count_vm_event(SWAP_RA); } } - put_page(page); + folio_put(folio); } if (pte) pte_unmap(pte); @@ -845,12 +852,13 @@ static struct page *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask, swap_read_unplug(splug); lru_add_drain(); skip: - /* The page was likely read above, so no need for plugging here */ - page = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx, - &page_allocated); + /* The folio was likely read above, so no need for plugging here */ + folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx, + &page_allocated, false); if (unlikely(page_allocated)) - swap_readpage(page, false, NULL); - return page; + swap_read_folio(folio, false, NULL); + zswap_folio_swapin(folio); + return folio; } /** @@ -870,14 +878,17 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask, { struct mempolicy *mpol; pgoff_t ilx; - struct page *page; + struct folio *folio; mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx); - page = swap_use_vma_readahead() ? + folio = swap_use_vma_readahead() ? swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf) : swap_cluster_readahead(entry, gfp_mask, mpol, ilx); mpol_cond_put(mpol); - return page; + + if (!folio) + return NULL; + return folio_file_page(folio, swp_offset(entry)); } #ifdef CONFIG_SYSFS |
