=================
KVM VCPU Requests
=================

Overview
========

KVM supports an internal API enabling threads to request a VCPU thread to
perform some activity.  For example, a thread may request a VCPU to flush
its TLB with a VCPU request.  The API consists of the following functions::

  /* Check if any requests are pending for VCPU @vcpu. */
  bool kvm_request_pending(struct kvm_vcpu *vcpu);

  /* Check if VCPU @vcpu has request @req pending. */
  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);

  /* Clear request @req for VCPU @vcpu. */
  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Check if VCPU @vcpu has request @req pending. When the request is
   * pending it will be cleared and a memory barrier, which pairs with
   * another in kvm_make_request(), will be issued.
   */
  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
   * with another in kvm_check_request(), prior to setting the request.
   */
  void kvm_make_request(int req, struct kvm_vcpu *vcpu);

  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);

Typically a requester wants the VCPU to perform the activity as soon
as possible after making the request.  This means most requests
(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
and kvm_make_all_cpus_request() has the kicking of all VCPUs built
into it.
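
For example, a minimal sketch of requesting a TLB flush of a single VCPU
and then kicking it, using only the calls named above::

  kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);  /* set the request bit */
  kvm_vcpu_kick(vcpu);                        /* make the VCPU notice it soon */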

VCPU Kicks
----------

The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
order to perform some KVM maintenance.  To do so, an IPI is sent, forcing
a guest mode exit.  However, a VCPU thread may not be in guest mode at the
time of the kick.  Therefore, depending on the mode and state of the VCPU
thread, there are two other actions a kick may take.  All three actions
are listed below:

1) Send an IPI.  This forces a guest mode exit.
2) Wake a sleeping VCPU.  Sleeping VCPUs are VCPU threads outside guest
   mode that wait on waitqueues.  Waking them removes the threads from
   the waitqueues, allowing the threads to run again.  This behavior
   may be suppressed, see KVM_REQUEST_NO_WAKEUP below.
3) Do nothing.  When the VCPU is not in guest mode and the VCPU thread is
   not sleeping, then there is nothing to do.

VCPU Mode
---------

VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
VCPU thread is running in guest mode or not, as well as some specific
states outside guest mode.  The architecture may use ``vcpu->mode`` to
ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
even to ensure IPI acknowledgements are waited upon (see "Waiting for
Acknowledgements").  The following modes are defined:

OUTSIDE_GUEST_MODE

  The VCPU thread is outside guest mode.

IN_GUEST_MODE

  The VCPU thread is in guest mode.

EXITING_GUEST_MODE

  The VCPU thread is transitioning from IN_GUEST_MODE to
  OUTSIDE_GUEST_MODE.

READING_SHADOW_PAGE_TABLES

  The VCPU thread is outside guest mode, but it wants the sender of
  certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
  thread is done reading the page tables.
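
For reference, these modes are values of an enum in
include/linux/kvm_host.h; a sketch of its shape follows (the header is the
authoritative definition)::

  enum {
      OUTSIDE_GUEST_MODE,
      IN_GUEST_MODE,
      EXITING_GUEST_MODE,
      READING_SHADOW_PAGE_TABLES,
  };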

VCPU Request Internals
======================

VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_, could
also be used, e.g. ::

  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);

However, VCPU request users should refrain from doing so, as it would
break the abstraction.  The first 8 bits are reserved for architecture
independent requests; all additional bits are available for architecture
dependent requests.
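
As a rough sketch of how the API wraps these bitops, two of the helpers
might look as follows (simplified; the real implementations in
include/linux/kvm_host.h are authoritative)::

  static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
  {
      /* Publish any associated state before setting the bit; pairs
       * with the barrier in kvm_check_request(). */
      smp_wmb();
      set_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
  }

  static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
  {
      if (!kvm_test_request(req, vcpu))
          return false;

      kvm_clear_request(req, vcpu);
      /* Pairs with the smp_wmb() in kvm_make_request(). */
      smp_mb__after_atomic();
      return true;
  }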

Architecture Independent Requests
---------------------------------

KVM_REQ_TLB_FLUSH

  KVM's common MMU notifier may need to flush all of a guest's TLB
  entries, calling kvm_flush_remote_tlbs() to do so.  Architectures that
  choose to use the common kvm_flush_remote_tlbs() implementation will
  need to handle this VCPU request (a handling sketch follows this list).

KVM_REQ_MMU_RELOAD

  When shadow page tables are used and memory slots are removed, it's
  necessary to inform each VCPU to completely refresh the tables.  This
  request is used for that.

KVM_REQ_PENDING_TIMER

  This request may be made from a timer handler run on the host on behalf
  of a VCPU.  It informs the VCPU thread to inject a timer interrupt.

KVM_REQ_UNHALT

  This request may be made from the KVM common function kvm_vcpu_block(),
  which is used to emulate an instruction that causes a CPU to halt until
  one of an architecture-specific set of events and/or interrupts is
  received (determined by checking kvm_arch_vcpu_runnable()).  When that
  event or interrupt arrives kvm_vcpu_block() makes the request.  This is
  in contrast to when kvm_vcpu_block() returns due to any other reason,
  such as a pending signal, which does not indicate the VCPU's halt
  emulation should stop, and therefore does not make the request.
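
As an illustration, a hypothetical architecture's request handling, run
before each guest entry, might consume the requests above as follows (the
arch_*() helper names are made up for the example)::

  if (kvm_request_pending(vcpu)) {
      if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
          arch_flush_guest_tlb(vcpu);
      if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
          arch_reload_shadow_page_tables(vcpu);
      if (kvm_check_request(KVM_REQ_PENDING_TIMER, vcpu))
          arch_inject_timer_interrupt(vcpu);
  }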

KVM_REQUEST_MASK
----------------

VCPU requests should be masked by KVM_REQUEST_MASK before using them with
bitops.  This is because only the lower 8 bits are used to represent the
request's number.  The upper bits are used as flags.  Currently only two
flags are defined.

VCPU Request Flags
------------------

KVM_REQUEST_NO_WAKEUP

  This flag is applied to requests that only need immediate attention
  from VCPUs running in guest mode.  That is, sleeping VCPUs do not need
  to be awakened for these requests.  Sleeping VCPUs will handle the
  requests when they are awakened later for some other reason.

KVM_REQUEST_WAIT

  When requests with this flag are made with kvm_make_all_cpus_request(),
  the caller will wait for each VCPU to acknowledge its IPI before
  proceeding.  This flag only applies to VCPUs that would receive IPIs.
  If, for example, the VCPU is sleeping, so no IPI is necessary, then
  the requesting thread does not wait.  This means that this flag may be
  safely combined with KVM_REQUEST_NO_WAKEUP.  See "Waiting for
  Acknowledgements" for more information about requests with
  KVM_REQUEST_WAIT.
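
Flags are typically OR'ed into a request's number when the request is
defined, following the layout described under KVM_REQUEST_MASK.  A
hypothetical architecture-dependent request wanting both behaviors might
be defined as follows (the name and bit number are illustrative only)::

  #define KVM_REQ_EXAMPLE_FLUSH  (8 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)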

VCPU Requests with Associated State
===================================

Requesters that want the receiving VCPU to handle new state need to ensure
the newly written state is observable to the receiving VCPU thread's CPU
by the time it observes the request.  This means a write memory barrier
must be inserted after writing the new state and before setting the VCPU
request bit.  Additionally, on the receiving VCPU thread's side, a
corresponding read barrier must be inserted after reading the request bit
and before proceeding to read the new state associated with it.  See
scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
[memory-barriers]_.

The pair of functions, kvm_check_request() and kvm_make_request(), provide
the memory barriers, allowing this requirement to be handled internally by
the API.
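
For example, a minimal sketch of the pattern; the request name, the
``new_cfg`` field, and the apply_new_cfg() handler are all hypothetical::

  /* Requesting thread: write the new state, then make the request. */
  vcpu->arch.new_cfg = cfg;
  kvm_make_request(KVM_REQ_EXAMPLE, vcpu);   /* write barrier before the bit */
  kvm_vcpu_kick(vcpu);

  /* Receiving VCPU thread, e.g. in its request handling loop. */
  if (kvm_check_request(KVM_REQ_EXAMPLE, vcpu))   /* read barrier after the bit */
      apply_new_cfg(vcpu, vcpu->arch.new_cfg);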

Ensuring Requests Are Seen
==========================

When making requests to VCPUs, we want to avoid the receiving VCPU
executing in guest mode for an arbitrarily long time without handling the
request.  We can be sure this won't happen as long as we ensure the VCPU
thread checks kvm_request_pending() before entering guest mode and that a
kick will send an IPI to force an exit from guest mode when necessary.
Extra care must be taken to cover the period after the VCPU thread's last
kvm_request_pending() check and before it has entered guest mode, as kick
IPIs will only trigger guest mode exits for VCPU threads that are in guest
mode or at least have already disabled interrupts in order to prepare to
enter guest mode.  This means that an optimized implementation (see "IPI
Reduction") must be certain when it's safe to not send the IPI.  One
solution, which all architectures except s390 apply, is to:

- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
  the last kvm_request_pending() check;
- enable interrupts atomically when entering the guest.

This solution also requires memory barriers to be placed carefully in both
the requesting thread and the receiving VCPU.  With the memory barriers we
can exclude the possibility of a VCPU thread observing
!kvm_request_pending() on its last check and then not receiving an IPI for
the next request made of it, even if the request is made immediately after
the check.  This is done by way of the Dekker memory barrier pattern
(scenario 10 of [lwn-mb]_).  As the Dekker pattern requires two variables,
this solution pairs ``vcpu->mode`` with ``vcpu->requests``.  Substituting
them into the pattern gives::

  CPU1                                    CPU2
  =================                       =================
  local_irq_disable();
  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
  smp_mb();                               smp_mb();
  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
                                              IN_GUEST_MODE) {
      ...abort guest entry...                 ...send IPI...
  }                                       }

As stated above, the IPI is only useful for VCPU threads in guest mode or
that have already disabled interrupts.  This is why this specific case of
the Dekker pattern has been extended to disable interrupts before setting
``vcpu->mode`` to IN_GUEST_MODE.  WRITE_ONCE() and READ_ONCE() are used to
pedantically implement the memory barrier pattern, guaranteeing the
compiler doesn't interfere with ``vcpu->mode``'s carefully planned
accesses.
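
Written as C, the requester column above is roughly what kvm_make_request()
followed by kvm_vcpu_kick() boils down to (a minimal sketch that elides the
sleeping-VCPU wakeup and the cross-CPU bookkeeping the real kick performs)::

  kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
  smp_mb();   /* order the request bit against the vcpu->mode read */
  if (READ_ONCE(vcpu->mode) == IN_GUEST_MODE)
      smp_send_reschedule(vcpu->cpu);   /* the IPI forces a guest mode exit */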

IPI Reduction
-------------

As only one IPI is needed to get a VCPU to check for any/all requests,
they may be coalesced.  This is easily done by having the first
IPI-sending kick also change the VCPU mode to something !IN_GUEST_MODE.
The transitional state, EXITING_GUEST_MODE, is used for this purpose.
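
The mode change is done with an atomic compare-and-exchange, so only the
first kick needs to send the IPI; a sketch modeled on the
kvm_vcpu_exiting_guest_mode() helper in include/linux/kvm_host.h::

  static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
  {
      /*
       * Move IN_GUEST_MODE to EXITING_GUEST_MODE and return the old mode;
       * later kicks observe !IN_GUEST_MODE and can skip sending an IPI.
       */
      return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
  }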

Waiting for Acknowledgements
----------------------------

Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
be sent, and the acknowledgements to be waited upon, even when the target
VCPU threads are in modes other than IN_GUEST_MODE.  For example, one case
is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
is set after disabling interrupts.  To support these cases, the
KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
checking that the VCPU is IN_GUEST_MODE to checking that it is not
OUTSIDE_GUEST_MODE.
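
The per-VCPU decision could be sketched as follows (a simplification of
the logic applied by kvm_make_all_cpus_request(), which also batches the
IPIs; the helper name is illustrative)::

  static bool example_request_needs_ipi(struct kvm_vcpu *vcpu, unsigned int req)
  {
      int mode = READ_ONCE(vcpu->mode);

      if (req & KVM_REQUEST_WAIT)
          return mode != OUTSIDE_GUEST_MODE;

      return mode == IN_GUEST_MODE;
  }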

Request-less VCPU Kicks
-----------------------

As the determination of whether or not to send an IPI depends on the
two-variable Dekker memory barrier pattern, it's clear that request-less
VCPU kicks are almost never correct.  Without the assurance that a
non-IPI-generating kick will still result in an action by the receiving
VCPU, as the final kvm_request_pending() check does for
request-accompanying kicks, the kick may not do anything useful at all.
If, for instance, a request-less kick was made to a VCPU that was just
about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then the
VCPU thread may continue its entry without actually having done whatever
it was the kick was meant to initiate.

One exception is x86's posted interrupt mechanism.  In this case, however,
even the request-less VCPU kick is coupled with the same
local_irq_disable() + smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of ``vcpu->requests``.  When sending a posted interrupt, PIR.ON is
set before reading ``vcpu->mode``; dually, in the VCPU thread,
vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
IN_GUEST_MODE.

Additional Considerations
=========================

Sleeping VCPUs
--------------

VCPU threads may need to consider requests before and/or after calling
functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
do or not, and, if they do, which requests need consideration, is
architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken.  One reason to do so is to provide
architectures a function where requests may be checked if necessary.
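
For instance, a hypothetical architecture's kvm_arch_vcpu_runnable() might
wake the VCPU when a timer request is pending (the arch helper below is
made up for the example)::

  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
  {
      return kvm_test_request(KVM_REQ_PENDING_TIMER, vcpu) ||
             arch_interrupt_pending(vcpu);   /* hypothetical helper */
  }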

Clearing Requests
-----------------

Generally it only makes sense for the receiving VCPU thread to clear a
request.  However, in some circumstances the requesting thread and the
receiving VCPU thread execute serially, such as when they are the same
thread, or when they are using some form of concurrency control to
temporarily execute synchronously.  In those cases it's known that the
request may be cleared immediately, rather than waiting for the receiving
VCPU thread to handle the request in VCPU RUN.  The only current examples
of this are kvm_vcpu_block() calls made by VCPUs to block themselves.  A
possible side-effect of that call is to make the KVM_REQ_UNHALT request,
which may then be cleared immediately when the VCPU returns from the call.
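
A minimal sketch of that pattern, as it might appear in a VCPU's own halt
emulation path (the arch helper is hypothetical)::

  kvm_vcpu_block(vcpu);
  if (kvm_check_request(KVM_REQ_UNHALT, vcpu))
      arch_leave_halted_state(vcpu);   /* hypothetical; resume running the guest */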

References
==========

.. [atomic-ops] Documentation/core-api/atomic_ops.rst
.. [memory-barriers] Documentation/memory-barriers.txt
.. [lwn-mb] https://lwn.net/Articles/573436/