BACKPORT: FROMGIT: userfaultfd: update documentation to describe minor fault handling

Reword / reorganize things a little bit into "lists", so new features /
modes / ioctls can sort of just be appended.

Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used to
intercept and resolve minor faults.  Make it clear that COPY and ZEROPAGE
are used for MISSING faults, whereas CONTINUE is used for MINOR faults.

Link: https://lkml.kernel.org/r/20210301222728.176417-6-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit d08ba026886f0161e2bdd3dbd75c4da0fc62a284
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388137/

Conflicts: Documentation/admin-guide/mm/userfaultfd.rst
(Manual rebase by removing text related to write-protect feature)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Ib59504247d38034e4c86f692dd63f2b3706fe554
diff --git a/Documentation/vm/userfaultfd.txt b/Documentation/vm/userfaultfd.txt
index bb2f945..d0bad1c 100644
--- a/Documentation/vm/userfaultfd.txt
+++ b/Documentation/vm/userfaultfd.txt
@@ -56,36 +56,36 @@
 
 The uffdio_api.features bitmask returned by the UFFDIO_API ioctl
 defines what memory types are supported by the userfaultfd and what
-events, except page fault notifications, may be generated.
+events, except page fault notifications, may be generated:
 
-If the kernel supports registering userfaultfd ranges on hugetlbfs
-virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in
-uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be
-set if the kernel supports registering userfaultfd ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero
-MAP_SHARED, memfd_create, etc).
+- The UFFD_FEATURE_EVENT_* flags indicate that various other events
+  other than page faults are supported. These events are described in more
+  detail below in the Non-cooperative userfaultfd section.
 
-The userland application that wants to use userfaultfd with hugetlbfs
-or shared memory need to set the corresponding flag in
-uffdio_api.features to enable those features.
+- UFFD_FEATURE_MISSING_HUGETLBFS and UFFD_FEATURE_MISSING_SHMEM
+  indicate that the kernel supports UFFDIO_REGISTER_MODE_MISSING
+  registrations for hugetlbfs and shared memory (covering all shmem APIs,
+  i.e. tmpfs, IPCSHM, /dev/zero, MAP_SHARED, memfd_create,
+  etc) virtual memory areas, respectively.
 
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that uffdio_api.features has appropriate
-UFFD_FEATURE_EVENT_* bits set. These events are described in more
-detail below in "Non-cooperative userfaultfd" section.
+- UFFD_FEATURE_MINOR_HUGETLBFS indicates that the kernel supports
+  UFFDIO_REGISTER_MODE_MINOR registration for hugetlbfs virtual memory
+  areas.
 
-Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
-be invoked (if present in the returned uffdio_api.ioctls bitmask) to
-register a memory range in the userfaultfd by setting the
+The userland application should set the feature flags it intends to use
+when invoking the UFFDIO_API ioctl, to request that those features be
+enabled if supported.
+
+Once the userfaultfd API has been enabled the UFFDIO_REGISTER
+ioctl should be invoked (if present in the returned uffdio_api.ioctls
+bitmask) to register a memory range in the userfaultfd by setting the
 uffdio_register structure accordingly. The uffdio_register.mode
 bitmask will specify to the kernel which kind of faults to track for
-the range (UFFDIO_REGISTER_MODE_MISSING would track missing
-pages). The UFFDIO_REGISTER ioctl will return the
+the range. The UFFDIO_REGISTER ioctl will return the
 uffdio_register.ioctls bitmask of ioctls that are suitable to resolve
 userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs), or all types of intercepted faults.
 
 Userland can use the uffdio_register.ioctls to manage the virtual
 address space in the background (to add or potentially also remove
@@ -93,13 +93,60 @@
 could be triggering just before userland maps in the background the
 user-faulted page.
 
-The primary ioctl to resolve userfaults is UFFDIO_COPY. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults (unless uffdio_copy.mode &
-UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to
-UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an
-half copied page since it'll keep userfaulting until the copy has
-finished.
+Resolving Userfaults
+--------------------
+
+There are three basic ways to resolve userfaults:
+
+- UFFDIO_COPY atomically copies some existing page contents from
+  userspace.
+
+- UFFDIO_ZEROPAGE atomically zeros the new page.
+
+- UFFDIO_CONTINUE maps an existing, previously-populated page.
+
+These operations are atomic in the sense that they guarantee nothing can
+see a half-populated page, since readers will keep userfaulting until the
+operation has finished.
+
+By default, these wake up userfaults blocked on the range in question.
+They support a UFFDIO_*_MODE_DONTWAKE mode flag, which indicates
+that waking will be done separately at some later time.
+
+Which ioctl to choose depends on the kind of page fault, and what we'd
+like to do to resolve it:
+
+- For UFFDIO_REGISTER_MODE_MISSING faults, the fault needs to be
+  resolved by either providing a new page (UFFDIO_COPY), or mapping
+  the zero page (UFFDIO_ZEROPAGE). By default, the kernel would map
+  the zero page for a missing fault. With userfaultfd, userspace can
+  decide what content to provide before the faulting thread continues.
+
+- For UFFDIO_REGISTER_MODE_MINOR faults, there is an existing page (in
+  the page cache). Userspace has the option of modifying the page's
+  contents before resolving the fault. Once the contents are correct
+  (modified or not), userspace asks the kernel to map the page and let the
+  faulting thread continue with UFFDIO_CONTINUE.
+
+Notes:
+
+- You can tell which kind of fault occurred by examining
+  pagefault.flags within the uffd_msg, checking for the
+  UFFD_PAGEFAULT_FLAG_* flags.
+
+- None of the page-delivering ioctls default to the range that you
+  registered with.  You must fill in all fields for the appropriate
+  ioctl struct including the range.
+
+- You get the address of the access that triggered the missing page
+  event out of a struct uffd_msg that you read in the thread from the
+  uffd.  You can supply as many pages as you want with these IOCTLs.
+  Keep in mind that unless you used DONTWAKE then the first of any of
+  those IOCTLs wakes up the faulting thread.
+
+- Be sure to test for all errors including
+  (pollfd[0].revents & POLLERR).  This can happen, e.g. when ranges
+  supplied were incorrect.
 
 == QEMU/KVM ==