Blame - Documentation/x86/intel_mpx.txt - LeafOS-Devices/android_kernel_samsung_universal7904

Qiaowei Ren	5776563	2014-11-14 07:18:32 -0800	[diff] [blame]	1	1. Intel(R) MPX Overview
				2	========================
				3
				4	Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
				5	introduced into Intel Architecture. Intel MPX provides hardware features
				6	that can be used in conjunction with compiler changes to check memory
				7	references whose intended compile-time bounds are violated at runtime
				8	by a buffer overflow or underflow.
				9
Dave Hansen	72e9b5f	2014-12-12 10:38:36 -0800	[diff] [blame]	10	You can tell if your CPU supports MPX by looking in /proc/cpuinfo:
				11
				12	cat /proc/cpuinfo | grep ' mpx '
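
The same check can be done from a program. The following is a minimal sketch, assuming the usual "flags : ..." line layout of /proc/cpuinfo; the helper names flags_line_has() and cpu_has_mpx() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole word in one line of text. */
int flags_line_has(const char *line, const char *flag)
{
    size_t n;
    const char *p = line;

    assert(line != NULL && flag != NULL);
    n = strlen(flag);
    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must be delimited on both sides to count as a flag. */
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

        if (starts && ends)
            return 1;
        p += n;
    }
    return 0;
}

/*
 * Scan /proc/cpuinfo. Returns 1 if a "flags" line lists mpx, 0 if none
 * does, and -1 if the file cannot be opened.
 */
int cpu_has_mpx(void)
{
    char line[8192];
    FILE *f = fopen("/proc/cpuinfo", "r");
    int found = 0;

    if (!f)
        return -1;
    while (!found && fgets(line, sizeof(line), f))
        if (strncmp(line, "flags", 5) == 0)
            found = flags_line_has(line, "mpx");
    fclose(f);
    return found;
}
```

A caller would treat a return value of 1 from cpu_has_mpx() as "the kernel reports MPX on this CPU" and anything else as "do not rely on MPX".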
				13
Qiaowei Ren	5776563	2014-11-14 07:18:32 -0800	[diff] [blame]	14	For more information, please refer to Intel(R) Architecture Instruction
				15	Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
				16	Extensions.
				17
Dave Hansen	72e9b5f	2014-12-12 10:38:36 -0800	[diff] [blame]	18	Note: As of December 2014, no hardware with MPX is available but it is
Qiaowei Ren	5776563	2014-11-14 07:18:32 -0800	[diff] [blame]	19	possible to use SDE (Intel(R) Software Development Emulator) instead, which
				20	can be downloaded from
				21	http://software.intel.com/en-us/articles/intel-software-development-emulator
				22
				23
				24	2. How to get the advantage of MPX
				25	==================================
				26
				27	For MPX to work, changes are required in the kernel, binutils and compiler.
				28	No source changes are required for applications, just a recompile.
				29
				30	There are a lot of moving parts that must all work right. The following
				31	is how we expect the compiler, application and kernel to work together.
				32
				33	1) Application developer compiles with -fmpx. The compiler will add the
				34	instrumentation as well as some setup code called early after the app
				35	starts. The new instruction prefixes are no-ops on old CPUs.
				36	2) That setup code allocates (virtual) space for the "bounds directory",
Dave Hansen	010e593	2014-12-12 10:38:35 -0800	[diff] [blame]	37	points the "bndcfgu" register to the directory (must also set the valid
				38	bit) and notifies the kernel (via the new prctl(PR_MPX_ENABLE_MANAGEMENT))
				39	that the app will be using MPX. The app must be careful not to access
				40	the bounds tables between the time when it populates "bndcfgu" and
				41	when it calls the prctl(). This might be hard to guarantee if the app
				42	is compiled with MPX. You can add "__attribute__((bnd_legacy))" to
				43	the function to disable MPX instrumentation to help guarantee this.
				44	Also be careful not to call out to any other code which might be
				45	MPX-instrumented.
Qiaowei Ren	5776563	2014-11-14 07:18:32 -0800	[diff] [blame]	46	3) The kernel detects that the CPU has MPX, allows the new prctl() to
				47	succeed, and notes the location of the bounds directory. Userspace is
				48	expected to keep the bounds directory at that location. We note it
				49	instead of reading it each time because the 'xsave' operation needed
				50	to access the bounds directory register is an expensive operation.
				51	4) If the application needs to spill bounds out of the 4 registers, it
				52	issues a bndstx instruction. Since the bounds directory is empty at
				53	this point, a bounds fault (#BR) is raised, the kernel allocates a
				54	bounds table (in the user address space) and makes the relevant entry
				55	in the bounds directory point to the new table.
				56	5) If the application violates the bounds specified in the bounds registers,
				57	a separate kind of #BR is raised which will deliver a signal with
				58	information about the violation in the 'struct siginfo'.
				59	6) Whenever memory is freed, we know that it can no longer contain valid
				60	pointers, and we attempt to free the associated space in the bounds
				61	tables. If an entire table becomes unused, we will attempt to free
				62	the table and remove the entry in the directory.
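
The first part of step 2, reserving virtual space for the bounds directory, can be sketched as follows. This is an illustration only, not the GCC runtime's code: it assumes the 2GB directory size quoted later in this document, and the function names are hypothetical. Pointing "bndcfgu" at the directory and calling the prctl() are not shown here.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* 2GB, the bounds directory size this document quotes for 64-bit. */
#define BD_SIZE_BYTES ((size_t)1 << 31)

/*
 * Reserve address space for a bounds directory. MAP_NORESERVE keeps the
 * mapping from being charged for swap up front; pages are only backed
 * by memory once they are touched. Returns NULL on failure.
 */
void *reserve_bounds_directory(void)
{
    void *bd = mmap(NULL, BD_SIZE_BYTES, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    return bd == MAP_FAILED ? NULL : bd;
}

/* Give the reservation back. Returns 0 on success, -1 on error. */
int release_bounds_directory(void *bd)
{
    return munmap(bd, BD_SIZE_BYTES);
}
```

Because nothing is touched, the reservation costs virtual space only, which is why the directory can be this large without consuming 2GB of physical memory.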
				63
				64	To summarize, there are essentially three things interacting here:
				65
				66	GCC with -fmpx:
				67	* enables annotation of code with MPX instructions and prefixes
				68	* inserts code early in the application to call in to the "gcc runtime"
				69	GCC MPX Runtime:
				70	* Checks for hardware MPX support in cpuid leaf
				71	* allocates virtual space for the bounds directory (malloc() essentially)
				72	* points the hardware BNDCFGU register at the directory
				73	* calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to
				74	start managing the bounds directories
				75	Kernel MPX Code:
				76	* Checks for hardware MPX support in cpuid leaf
				77	* Handles #BR exceptions and sends SIGSEGV to the app when it violates
				78	bounds, like during a buffer overflow.
				79	* When bounds are spilled into an unallocated bounds table, the kernel
				80	notices in the #BR exception, allocates the virtual space, then
				81	updates the bounds directory to point to the new table. It keeps
				82	special track of the memory with a VM_MPX flag.
				83	* Frees unused bounds tables at the time that the memory they described
				84	is unmapped.
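
Both the runtime and the kernel check the "cpuid leaf". A hedged sketch of that check is shown below. The bit positions are not stated in this document; they are taken from Intel's published CPUID and XCR0 layouts (CPUID leaf 7, subleaf 0, EBX bit 14 for MPX; XCR0 bits 3 and 4 for the bounds register and bounds config state). The helpers take the register values as arguments, so reading them with CPUID and XGETBV is left to the caller, and the helper names are hypothetical.

```c
#include <stdint.h>

#define CPUID7_EBX_MPX  (UINT32_C(1) << 14) /* CPUID.(EAX=7,ECX=0):EBX */
#define XCR0_BNDREGS    (UINT64_C(1) << 3)  /* BND0-BND3 state */
#define XCR0_BNDCSR     (UINT64_C(1) << 4)  /* BNDCFGU/BNDSTATUS state */

/* Return 1 if the CPUID leaf 7 EBX value advertises MPX. */
int cpuid_reports_mpx(uint32_t leaf7_ebx)
{
    return (leaf7_ebx & CPUID7_EBX_MPX) != 0;
}

/* Return 1 if the OS has enabled both MPX state components in XCR0. */
int xcr0_enables_mpx(uint64_t xcr0)
{
    const uint64_t both = XCR0_BNDREGS | XCR0_BNDCSR;

    return (xcr0 & both) == both;
}
```

MPX is only usable when both checks pass: the CPU must advertise the feature and the OS must have enabled saving and restoring its state.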
				85
				86
				87	3. How does MPX kernel code work
				88	================================
				89
				90	Handling #BR faults caused by MPX
				91	---------------------------------
				92
				93	When MPX is enabled, there are 2 new situations that can generate
				94	#BR faults.
				95	* new bounds tables (BT) need to be allocated to save bounds.
				96	* bounds violation caused by MPX instructions.
				97
				98	We hook the #BR handler to handle these two new situations.
				99
				100	On-demand kernel allocation of bounds tables
				101	--------------------------------------------
				102
				103	MPX only has 4 hardware registers for storing bounds information. If
				104	MPX-enabled code needs more than these 4 registers, it needs to spill
				105	them somewhere. It has two special instructions for this which allow
				106	the bounds to be moved between the bounds registers and some new "bounds
				107	tables".
				108
				109	With MPX, #BR exceptions have new causes. They are similar
				110	conceptually to a page fault and will be raised by the MPX hardware
				111	both for bounds violations and when the tables are not present. The
				112	kernel handles those #BR exceptions for not-present tables by carving
				113	the space out of the normal process's address space and then pointing
				114	the bounds directory over to it.
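
The not-present case can be modelled in a few lines. This is a toy userspace model, not the kernel's handler: it assumes that bit 0 of a bounds directory entry is the valid bit, it uses calloc() where the kernel would map memory into the user address space, and the table size and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BD_ENTRY_VALID UINT64_C(1) /* assumed: bit 0 marks a valid entry */
#define TOY_BT_BYTES   4096        /* toy size, not the real table size */

/*
 * Model of the "table not present" path: if the directory entry is not
 * valid, allocate a zeroed table and point the entry at it.
 * Returns 1 if a table was allocated, 0 if one was already present,
 * and -1 if the allocation failed.
 */
int toy_handle_missing_table(uint64_t *bd_entry)
{
    void *bt;

    assert(bd_entry != NULL);
    if (*bd_entry & BD_ENTRY_VALID)
        return 0;
    bt = calloc(1, TOY_BT_BYTES);
    if (!bt)
        return -1;
    /* calloc() memory is at least 8-byte aligned, so bit 0 is free. */
    *bd_entry = (uint64_t)(uintptr_t)bt | BD_ENTRY_VALID;
    return 1;
}

/* Recover the table pointer stored in a directory entry. */
void *toy_table_of(uint64_t bd_entry)
{
    return (void *)(uintptr_t)(bd_entry & ~BD_ENTRY_VALID);
}
```

The second call on the same entry returns 0, which mirrors the fact that only the first spill into a region takes the fault.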
				115
				116	The tables need to be accessed and controlled by userspace because
				117	the instructions for moving bounds in and out of them are extremely
				118	frequent. They potentially happen every time a register points to
				119	memory. Any direct kernel involvement (like a syscall) to access the
				120	tables would obviously destroy performance.
				121
				122	Why not do this in userspace? MPX does not strictly require anything in
				123	the kernel. It can theoretically be done completely from userspace. Here
				124	are a few ways this could be done. We don't think any of them are practical
				125	in the real world, but here they are.
				126
				127	Q: Can virtual space simply be reserved for the bounds tables so that we
				128	never have to allocate them?
				129	A: An MPX-enabled application may create a lot of bounds tables in its
				130	process address space to save bounds information. These tables can take
				131	up huge swaths of memory (as much as 80% of the memory on the system)
				132	even if we clean them up aggressively. In the worst-case scenario, the
				133	tables can be 4x the size of the data structure being tracked. IOW, a
				134	1-page structure can require 4 bounds-table pages. An X-GB virtual
				135	area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
				136	If we were to preallocate them for the 128TB of user virtual address
				137	space, we would need to reserve 512TB+2GB, which is larger than the
				138	entire virtual address space today. This means they cannot be reserved
				139	ahead of time. Also, a single process's pre-populated bounds directory
				140	consumes 2GB of virtual AND physical memory. IOW, it's completely
				141	infeasible to prepopulate bounds directories.
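
The arithmetic in this answer can be written out as a small sketch. The 4x factor and the 2GB directory come from the text above; the 48-bit (256TB) figure used for "the entire virtual address space" is an assumption about current x86-64 hardware, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)
#define TiB (UINT64_C(1) << 40)

/*
 * Worst-case virtual space needed to preallocate bounds tables for
 * `tracked_bytes` of address space: 4x for the tables, plus 2GB for
 * the bounds directory.
 */
uint64_t worst_case_bounds_reservation(uint64_t tracked_bytes)
{
    /* Guard against overflow of the 64-bit result. */
    assert(tracked_bytes <= (UINT64_MAX - 2 * GiB) / 4);
    return 4 * tracked_bytes + 2 * GiB;
}
```

For the 128TB user address space, worst_case_bounds_reservation(128 * TiB) gives 512TB + 2GB, which exceeds the 256TB addressable with 48 bits.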
				142
				143	Q: Can we preallocate bounds table space at the same time memory is
				144	allocated which might contain pointers that might eventually need
				145	bounds tables?
				146	A: This would work if we could hook the site of each and every memory
				147	allocation syscall. This can be done for small, constrained applications.
				148	But, it isn't practical at a larger scale since a given app has no
				149	way of controlling how all the parts of the app might allocate memory
				150	(think libraries). The kernel is really the only place to intercept
				151	these calls.
				152
				153	Q: Could a bounds fault be handed to userspace and the tables allocated
				154	there in a signal handler instead of in the kernel?
				155	A: mmap() is not on the list of async-signal-safe functions, and even
				156	if mmap() would work it still requires locking or nasty tricks to
				157	keep track of the allocation state there.
				158
				159	Having ruled out all of the userspace-only approaches for managing
				160	bounds tables that we could think of, we create them on demand in
				161	the kernel.
				162
				163	Decoding MPX instructions
				164	-------------------------
				165
				166	If a #BR is generated due to a bounds violation caused by MPX, we
				167	need to decode the MPX instruction to get the violation address and
				168	set this address in the extended struct siginfo.
				169
				170	The _sigfault field of struct siginfo is extended as follows:
				171
				172	/* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
				173	struct {
				174		void __user *_addr; /* faulting insn/memory ref. */
				175	#ifdef __ARCH_SI_TRAPNO
				176		int _trapno; /* TRAP # which caused the signal */
				177	#endif
				178		short _addr_lsb; /* LSB of the reported address */
				179		struct {
				180			void __user *_lower;
				181			void __user *_upper;
				182		} _addr_bnd;
				183	} _sigfault;
				184
				185	The '_addr' field refers to the violation address, and the new '_addr_bnd'
				186	field refers to the upper/lower bounds when a #BR is caused.
				187
				188	Glibc will also be updated to support this new siginfo, so users
				189	can get the violation address and bounds when bounds violations occur.
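
From userspace, the extended fields might be consumed as sketched below. This assumes a libc that exposes the si_lower and si_upper accessors and the SEGV_BNDERR si_code (value 3 in the kernel's siginfo header); neither name appears in this document, and the two helper functions are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>

#ifndef SEGV_BNDERR
#define SEGV_BNDERR 3 /* assumed value: "failed address bound checks" */
#endif

/* Return 1 if this siginfo describes an MPX bounds violation. */
int is_bounds_violation(const siginfo_t *si)
{
    assert(si != NULL);
    return si->si_signo == SIGSEGV && si->si_code == SEGV_BNDERR;
}

/*
 * Compare the violation address against the reported bounds.
 * Returns -1 if it is below the lower bound, 1 if it is above the
 * upper bound, and 0 if it lies within [lower, upper].
 */
int bounds_side(const siginfo_t *si)
{
    uintptr_t addr;

    assert(si != NULL);
    addr = (uintptr_t)si->si_addr;
    if (addr < (uintptr_t)si->si_lower)
        return -1;
    if (addr > (uintptr_t)si->si_upper)
        return 1;
    return 0;
}
```

A SIGSEGV handler installed with SA_SIGINFO would call is_bounds_violation() on its siginfo_t argument first, since si_lower and si_upper are only meaningful for that si_code.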
				190
				191	Cleanup unused bounds tables
				192	----------------------------
				193
				194	When a BNDSTX instruction attempts to save bounds to a bounds directory
				195	entry marked as invalid, a #BR is generated. This is an indication that
				196	no bounds table exists for this entry. In this case the fault handler
				197	will allocate a new bounds table on demand.
				198
				199	Since the kernel allocated those tables on-demand without userspace
				200	knowledge, it is also responsible for freeing them when the associated
				201	mappings go away.
				202
				203	Here, the solution for this issue is to hook do_munmap() to check
				204	whether the process is MPX-enabled. If yes, the bounds tables covering
				205	the virtual address region that is being unmapped will also be freed.
				206
				207	Adding new prctl commands
				208	-------------------------
				209
				210	Two new prctl commands are added to enable and disable MPX bounds table
				211	management in the kernel.
				212
				213	#define PR_MPX_ENABLE_MANAGEMENT 43
				214	#define PR_MPX_DISABLE_MANAGEMENT 44
				215
				216	The runtime library in userspace is responsible for allocating the bounds
				217	directory, so the kernel has to use the XSAVE instruction to get the base
				218	of the bounds directory from the BNDCFGU register.
				219
				220	But XSAVE is expected to be very expensive. As a performance
				221	optimization, the kernel gets the base of the bounds directory once,
				222	during PR_MPX_ENABLE_MANAGEMENT command execution, and saves it into
				223	struct mm_struct for future use.
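
A userspace caller might wrap the two commands as sketched below. The command values are the ones defined above; passing zero for the unused prctl() arguments and the wrapper names are assumptions made for this illustration. On a kernel or CPU without MPX support the call is expected to fail, and the wrappers report that as a negative errno value.

```c
#include <errno.h>
#include <sys/prctl.h>

#ifndef PR_MPX_ENABLE_MANAGEMENT
#define PR_MPX_ENABLE_MANAGEMENT 43
#endif
#ifndef PR_MPX_DISABLE_MANAGEMENT
#define PR_MPX_DISABLE_MANAGEMENT 44
#endif

/*
 * Ask the kernel to start managing bounds tables for this process.
 * Returns 0 on success or a negative errno value on failure.
 */
int mpx_enable_management(void)
{
    if (prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0) == 0)
        return 0;
    return -errno;
}

/* Ask the kernel to stop managing bounds tables; same return rule. */
int mpx_disable_management(void)
{
    if (prctl(PR_MPX_DISABLE_MANAGEMENT, 0, 0, 0, 0) == 0)
        return 0;
    return -errno;
}
```

The enable call only makes sense after BNDCFGU has been pointed at a valid bounds directory, since that is when the kernel reads and records the directory base.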
				224
				225
				226	4. Special rules
				227	================
				228
				229	1) If userspace is requesting help from the kernel to do the management
				230	of bounds tables, it may not create or modify entries in the bounds directory.
				231
				232	Certainly users can allocate bounds tables and forcibly point the bounds
				233	directory at them through the XSAVE instruction, and then set the valid
				234	bit of the bounds entry to make this entry valid. But, the kernel will
				235	decline to assist in managing these tables.
				236
				237	2) Userspace may not take multiple bounds directory entries and point
				238	them at the same bounds table.
				239
				240	This is allowed architecturally. For more information, see "Intel(R)
				241	Architecture Instruction Set Extensions Programming Reference" (9.3.4).
				242
				243	However, if users did this, the kernel might be fooled into unmapping an
				244	in-use bounds table since it does not recognize sharing.
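
A debugging aid for rule 2 could look like the sketch below, which scans a copy of the directory entries for two valid entries that point at the same table. It assumes bit 0 of an entry is the valid bit and treats the remaining bits as the table address; the function name is hypothetical, and the quadratic scan is only suitable for small test directories.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BD_ENTRY_VALID UINT64_C(1) /* assumed: bit 0 marks a valid entry */

/*
 * Return 1 if any two valid entries in bd[0..n) refer to the same
 * bounds table, 0 otherwise. Invalid entries are ignored.
 */
int directory_shares_tables(const uint64_t *bd, size_t n)
{
    size_t i, j;

    assert(bd != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!(bd[i] & BD_ENTRY_VALID))
            continue;
        for (j = i + 1; j < n; j++)
            if ((bd[j] & BD_ENTRY_VALID) && bd[i] == bd[j])
                return 1;
    }
    return 0;
}
```

Since both entries compared are valid, comparing the full entry values is equivalent to comparing the table addresses.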