[PATCH] uml: TLB operation batching
This adds VM op batching to skas0. Rather than context-switching to and from
the userspace stub for each address space change, we write a number of
operations to the stub data page and invoke a new stub, batch_syscall_stub,
which loops over them and executes them all in one go.
The operations are stored on the stub data page as [ system call number,
arg1, arg2, ... ] tuples. The set is terminated by a system call number of 0.
Single operations, such as page faults, are still handled the old way, since
that is slightly more efficient; a sketch of the batched layout follows.
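
As an illustration of that layout, filling the page before invoking the stub
might look like the C sketch below (hypothetical: stub_data, stub_pos,
stub_add_op, and STUB_DATA_WORDS are illustrative names and assumptions, not
the actual UML interfaces).

/* Hypothetical sketch of filling the stub data page.  Each operation is
 * one syscall number followed by its six arguments; a syscall number of
 * 0 terminates the set. */
#define STUB_DATA_WORDS (4096 / sizeof(long))	/* assumes a 4k data page */

static unsigned long stub_data[STUB_DATA_WORDS];
static int stub_pos;				/* next free word */

/* Append one [ nr, arg1 .. arg6 ] tuple, keeping a slot free for the
 * terminating 0.  Returns -1 when the page is full. */
static int stub_add_op(long nr, long a1, long a2, long a3, long a4,
		       long a5, long a6)
{
	long op[] = { nr, a1, a2, a3, a4, a5, a6 };
	int i;

	if (stub_pos + 7 + 1 > STUB_DATA_WORDS)
		return -1;
	for (i = 0; i < 7; i++)
		stub_data[stub_pos++] = op[i];
	return 0;
}

/* Mark the end of the set before entering batch_syscall_stub. */
static void stub_end_ops(void)
{
	stub_data[stub_pos] = 0;
}

Note that the x86_64 stub below pops exactly six argument words per
operation, so each tuple carries six argument slots even when the system
call uses fewer.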
For a kernel build, a minority (~1/4) of the operations are part of a set.
These sets averaged ~100 operations in length, so for that quarter of the
operations, one stub invocation now replaces roughly 100 context switches.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/arch/um/sys-x86_64/stub.S b/arch/um/sys-x86_64/stub.S
index 31c1492..957f2ef 100644
--- a/arch/um/sys-x86_64/stub.S
+++ b/arch/um/sys-x86_64/stub.S
@@ -13,3 +13,24 @@
or %rcx, %rbx
movq %rax, (%rbx)
int3
+
+	.globl batch_syscall_stub
+batch_syscall_stub:
+	movq $(UML_CONFIG_STUB_DATA >> 32), %rbx  /* high 32 bits of the */
+	salq $32, %rbx				  /* stub data address... */
+	movq $(UML_CONFIG_STUB_DATA & 0xffffffff), %rcx  /* ...merged with */
+	or %rcx, %rbx				  /* the low 32 bits */
+	movq %rbx, %rsp		/* pop the operations off the data page */
+again:	pop %rax		/* next system call number */
+	cmpq $0, %rax		/* zero terminates the set */
+	jz done
+	pop %rdi		/* pop the six syscall arguments */
+	pop %rsi
+	pop %rdx
+	pop %r10
+	pop %r8
+	pop %r9
+	syscall
+	mov %rax, (%rbx)	/* save the result at the start of the page */
+	jmp again
+done:	int3			/* trap back to the host when done */
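
For context, the host-side invocation of such a stub can be sketched with
plain ptrace as follows (a hedged illustration: run_batch_stub and stub_addr
are hypothetical, and the real UML code differs in detail; the pattern is to
set the child's IP to the stub, resume it, and wait for the int3 to deliver
SIGTRAP).

#include <signal.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>

/* Hypothetical sketch: run the already-filled batch in the traced child. */
static int run_batch_stub(pid_t pid, unsigned long stub_addr)
{
	struct user_regs_struct regs;
	int status;

	if (ptrace(PTRACE_GETREGS, pid, 0, &regs))
		return -1;
	regs.rip = stub_addr;		/* enter batch_syscall_stub */
	if (ptrace(PTRACE_SETREGS, pid, 0, &regs))
		return -1;
	if (ptrace(PTRACE_CONT, pid, 0, 0))
		return -1;
	/* The stub runs every queued syscall, then hits int3. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP ? 0 : -1;
}

One stop of the child now covers the whole set, which is where the
context-switch savings described above come from.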