PCI: iommu: iotlb flushing

This patch is for batching up the flushing of the IOTLB for the DMAR
implementation found in the Intel VT-d hardware.  It works by building a list
of to be flushed IOTLB entries and a bitmap list of which DMAR engine they are
from.

After either a high water mark (250 accessible via debugfs) or 10ms the list
of iova's will be reclaimed and the DMAR engines associated are IOTLB-flushed.

This approach recovers 15 to 20% of the performance lost when using the IOMMU
for my netperf udp stream benchmark with small packets.  It can be disabled
with a kernel boot parameter "intel_iommu=strict".

Its use does weaken the IOMMU protections a bit.

Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e30d8fe..f7492cd 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -847,6 +847,10 @@
 			than 32 bit addressing. The default is to look
 			for translation below 32 bit and if not available
 			then look in the higher range.
+		strict [Default Off]
+			With this option on every unmap_single operation will
+			result in a hardware IOTLB flush operation as opposed
+			to batching them for performance.
 
 	io_delay=	[X86-32,X86-64] I/O delay method
 		0x80