KVM: vmx: speed up emulation of invalid guest state

About 25% of the time spent in emulation of invalid guest state
is wasted in checking whether emulation is required for the next
instruction.  However, this almost never changes except when a
segment register (or TR or LDTR) changes, or when there is a mode
transition (i.e. CR0 changes).

In fact, vmx_set_segment and vmx_set_cr0 already modify
vmx->emulation_required (except that the former for some reason
uses |= instead of just an assignment).  So there is no need to
call guest_state_valid in the emulation loop.

Emulation performance test results indicate 1650-2600 cycles
for common instructions, versus 2300-3200 before this patch on
a Sandy Bridge Xeon.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8748c2e..5c7bbde 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3672,7 +3672,7 @@
 	vmcs_write32(sf->ar_bytes, vmx_segment_access_rights(var));
 
 out:
-	vmx->emulation_required |= emulation_required(vcpu);
+	vmx->emulation_required = emulation_required(vcpu);
 }
 
 static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
@@ -5640,7 +5640,7 @@
 	cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
 	intr_window_requested = cpu_exec_ctrl & CPU_BASED_VIRTUAL_INTR_PENDING;
 
-	while (!guest_state_valid(vcpu) && count-- != 0) {
+	while (vmx->emulation_required && count-- != 0) {
 		if (intr_window_requested && vmx_interrupt_allowed(vcpu))
 			return handle_interrupt_window(&vmx->vcpu);
 
@@ -5674,7 +5674,6 @@
 			schedule();
 	}
 
-	vmx->emulation_required = emulation_required(vcpu);
 out:
 	return ret;
 }