Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
GFS2: filesystem hang caused by incorrect lock order
GFS2: Don't try to deallocate unlinked inodes when mounted ro
GFS2: directly write blocks past i_size
GFS2: write_end error path fails to unlock transaction lock
diff --git a/Documentation/input/event-codes.txt b/Documentation/input/event-codes.txt
new file mode 100644
index 0000000..23fcb05
--- /dev/null
+++ b/Documentation/input/event-codes.txt
@@ -0,0 +1,262 @@
+The input protocol uses a map of types and codes to express input device values
+to userspace. This document describes the types and codes and how and when they
+may be used.
+
+A single hardware event generates multiple input events. Each input event
+contains the new value of a single data item. A special event type, EV_SYN, is
+used to separate input events into packets of input data changes occurring at
+the same moment in time. In the following, the term "event" refers to a single
+input event encompassing a type, code, and value.
+
+The input protocol is a stateful protocol. Events are emitted only when values
+of event codes have changed. However, the state is maintained within the Linux
+input subsystem; drivers do not need to maintain the state and may attempt to
+emit unchanged values without harm. Userspace may obtain the current state of
+event code values using the EVIOCG* ioctls defined in linux/input.h. The event
+reports supported by a device are also provided by sysfs in
+class/input/event*/device/capabilities/, and the properties of a device are
+provided in class/input/event*/device/properties.
+
+Types:
+==========
+Types are groupings of codes under a logical input construct. Each type has a
+set of applicable codes to be used in generating events. See the Codes section
+for details on valid codes for each type.
+
+* EV_SYN:
+ - Used as markers to separate events. Events may be separated in time or in
+ space, such as with the multitouch protocol.
+
+* EV_KEY:
+ - Used to describe state changes of keyboards, buttons, or other key-like
+ devices.
+
+* EV_REL:
+ - Used to describe relative axis value changes, e.g. moving the mouse 5 units
+ to the left.
+
+* EV_ABS:
+ - Used to describe absolute axis value changes, e.g. describing the
+ coordinates of a touch on a touchscreen.
+
+* EV_MSC:
+ - Used to describe miscellaneous input data that do not fit into other types.
+
+* EV_SW:
+ - Used to describe binary state input switches.
+
+* EV_LED:
+ - Used to turn LEDs on devices on and off.
+
+* EV_SND:
+ - Used to output sound to devices.
+
+* EV_REP:
+ - Used for autorepeating devices.
+
+* EV_FF:
+ - Used to send force feedback commands to an input device.
+
+* EV_PWR:
+ - A special type for power button and switch input.
+
+* EV_FF_STATUS:
+ - Used to receive force feedback device status.
+
+Codes:
+==========
+Codes define the precise type of event.
+
+EV_SYN:
+----------
+EV_SYN event values are undefined. Their usage is defined only by when they are
+sent in the evdev event stream.
+
+* SYN_REPORT:
+ - Used to synchronize and separate events into packets of input data changes
+ occurring at the same moment in time. For example, motion of a mouse may set
+ the REL_X and REL_Y values for one motion, then emit a SYN_REPORT. The next
+ motion will emit more REL_X and REL_Y values and send another SYN_REPORT.
+
+* SYN_CONFIG:
+ - TBD
+
+* SYN_MT_REPORT:
+ - Used to synchronize and separate touch events. See the
+ multi-touch-protocol.txt document for more information.
+
+* SYN_DROPPED:
+ - Used to indicate buffer overrun in the evdev client's event queue.
+ Client should ignore all events up to and including next SYN_REPORT
+ event and query the device (using EVIOCG* ioctls) to obtain its
+ current state.
+
+EV_KEY:
+----------
+EV_KEY events take the form KEY_<name> or BTN_<name>. For example, KEY_A is used
+to represent the 'A' key on a keyboard. When a key is depressed, an event with
+the key's code is emitted with value 1. When the key is released, an event is
+emitted with value 0. Some hardware send events when a key is repeated. These
+events have a value of 2. In general, KEY_<name> is used for keyboard keys, and
+BTN_<name> is used for other types of momentary switch events.
+
+A few EV_KEY codes have special meanings:
+
+* BTN_TOOL_<name>:
+ - These codes are used in conjunction with input trackpads, tablets, and
+ touchscreens. These devices may be used with fingers, pens, or other tools.
+ When an event occurs and a tool is used, the corresponding BTN_TOOL_<name>
+ code should be set to a value of 1. When the tool is no longer interacting
+ with the input device, the BTN_TOOL_<name> code should be reset to 0. All
+ trackpads, tablets, and touchscreens should use at least one BTN_TOOL_<name>
+ code when events are generated.
+
+* BTN_TOUCH:
+ BTN_TOUCH is used for touch contact. While an input tool is determined to be
+ within meaningful physical contact, the value of this property must be set
+ to 1. Meaningful physical contact may mean any contact, or it may mean
+ contact conditioned by an implementation defined property. For example, a
+ touchpad may set the value to 1 only when the touch pressure rises above a
+ certain value. BTN_TOUCH may be combined with BTN_TOOL_<name> codes. For
+ example, a pen tablet may set BTN_TOOL_PEN to 1 and BTN_TOUCH to 0 while the
+ pen is hovering over but not touching the tablet surface.
+
+Note: For appropriate function of the legacy mousedev emulation driver,
+BTN_TOUCH must be the first evdev code emitted in a synchronization frame.
+
+Note: Historically a touch device with BTN_TOOL_FINGER and BTN_TOUCH was
+interpreted as a touchpad by userspace, while a similar device without
+BTN_TOOL_FINGER was interpreted as a touchscreen. For backwards compatibility
+with current userspace it is recommended to follow this distinction. In the
+future, this distinction will be deprecated and the device properties ioctl
+EVIOCGPROP, defined in linux/input.h, will be used to convey the device type.
+
+* BTN_TOOL_FINGER, BTN_TOOL_DOUBLETAP, BTN_TOOL_TRIPLETAP, BTN_TOOL_QUADTAP:
+ - These codes denote one, two, three, and four finger interaction on a
+ trackpad or touchscreen. For example, if the user uses two fingers and moves
+ them on the touchpad in an effort to scroll content on screen,
+ BTN_TOOL_DOUBLETAP should be set to value 1 for the duration of the motion.
+ Note that all BTN_TOOL_<name> codes and the BTN_TOUCH code are orthogonal in
+ purpose. A trackpad event generated by finger touches should generate events
+ for one code from each group. At most only one of these BTN_TOOL_<name>
+ codes should have a value of 1 during any synchronization frame.
+
+Note: Historically some drivers emitted multiple of the finger count codes with
+a value of 1 in the same synchronization frame. This usage is deprecated.
+
+Note: In multitouch drivers, the input_mt_report_finger_count() function should
+be used to emit these codes. Please see multi-touch-protocol.txt for details.
+
+EV_REL:
+----------
+EV_REL events describe relative changes in a property. For example, a mouse may
+move to the left by a certain number of units, but its absolute position in
+space is unknown. If the absolute position is known, EV_ABS codes should be used
+instead of EV_REL codes.
+
+A few EV_REL codes have special meanings:
+
+* REL_WHEEL, REL_HWHEEL:
+ - These codes are used for vertical and horizontal scroll wheels,
+ respectively.
+
+EV_ABS:
+----------
+EV_ABS events describe absolute changes in a property. For example, a touchpad
+may emit coordinates for a touch location.
+
+A few EV_ABS codes have special meanings:
+
+* ABS_DISTANCE:
+ - Used to describe the distance of a tool from an interaction surface. This
+ event should only be emitted while the tool is hovering, meaning in close
+ proximity of the device and while the value of the BTN_TOUCH code is 0. If
+ the input device may be used freely in three dimensions, consider ABS_Z
+ instead.
+
+* ABS_MT_<name>:
+ - Used to describe multitouch input events. Please see
+ multi-touch-protocol.txt for details.
+
+EV_SW:
+----------
+EV_SW events describe stateful binary switches. For example, the SW_LID code is
+used to denote when a laptop lid is closed.
+
+Upon binding to a device or resuming from suspend, a driver must report
+the current switch state. This ensures that the device, kernel, and userspace
+state is in sync.
+
+Upon resume, if the switch state is the same as before suspend, then the input
+subsystem will filter out the duplicate switch state reports. The driver does
+not need to keep the state of the switch at any time.
+
+EV_MSC:
+----------
+EV_MSC events are used for input and output events that do not fall under other
+categories.
+
+EV_LED:
+----------
+EV_LED events are used for input and output to set and query the state of
+various LEDs on devices.
+
+EV_REP:
+----------
+EV_REP events are used for specifying autorepeating events.
+
+EV_SND:
+----------
+EV_SND events are used for sending sound commands to simple sound output
+devices.
+
+EV_FF:
+----------
+EV_FF events are used to initialize a force feedback capable device and to cause
+such device to feedback.
+
+EV_PWR:
+----------
+EV_PWR events are a special type of event used specifically for power
+mangement. Its usage is not well defined. To be addressed later.
+
+Guidelines:
+==========
+The guidelines below ensure proper single-touch and multi-finger functionality.
+For multi-touch functionality, see the multi-touch-protocol.txt document for
+more information.
+
+Mice:
+----------
+REL_{X,Y} must be reported when the mouse moves. BTN_LEFT must be used to report
+the primary button press. BTN_{MIDDLE,RIGHT,4,5,etc.} should be used to report
+further buttons of the device. REL_WHEEL and REL_HWHEEL should be used to report
+scroll wheel events where available.
+
+Touchscreens:
+----------
+ABS_{X,Y} must be reported with the location of the touch. BTN_TOUCH must be
+used to report when a touch is active on the screen.
+BTN_{MOUSE,LEFT,MIDDLE,RIGHT} must not be reported as the result of touch
+contact. BTN_TOOL_<name> events should be reported where possible.
+
+Trackpads:
+----------
+Legacy trackpads that only provide relative position information must report
+events like mice described above.
+
+Trackpads that provide absolute touch position must report ABS_{X,Y} for the
+location of the touch. BTN_TOUCH should be used to report when a touch is active
+on the trackpad. Where multi-finger support is available, BTN_TOOL_<name> should
+be used to report the number of touches active on the trackpad.
+
+Tablets:
+----------
+BTN_TOOL_<name> events must be reported when a stylus or other tool is active on
+the tablet. ABS_{X,Y} must be reported with the location of the tool. BTN_TOUCH
+should be used to report when the tool is in contact with the tablet.
+BTN_{STYLUS,STYLUS2} should be used to report buttons on the tool itself. Any
+button may be used for buttons on the tablet except BTN_{MOUSE,LEFT}.
+BTN_{0,1,2,etc} are good generic codes for unlabeled buttons. Do not use
+meaningful buttons, like BTN_FORWARD, unless the button is labeled for that
+purpose on the device.
diff --git a/Makefile b/Makefile
index 322e733..b967b96 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 39
-EXTRAVERSION = -rc3
+EXTRAVERSION = -rc4
NAME = Flesh-Eating Bats with Fangs
# *DOCUMENTATION*
diff --git a/arch/arm/mach-msm/board-qsd8x50.c b/arch/arm/mach-msm/board-qsd8x50.c
index 7f56861..6a96911 100644
--- a/arch/arm/mach-msm/board-qsd8x50.c
+++ b/arch/arm/mach-msm/board-qsd8x50.c
@@ -160,10 +160,7 @@
static void __init qsd8x50_init_mmc(void)
{
- if (machine_is_qsd8x50_ffa() || machine_is_qsd8x50a_ffa())
- vreg_mmc = vreg_get(NULL, "gp6");
- else
- vreg_mmc = vreg_get(NULL, "gp5");
+ vreg_mmc = vreg_get(NULL, "gp5");
if (IS_ERR(vreg_mmc)) {
pr_err("vreg get for vreg_mmc failed (%ld)\n",
diff --git a/arch/arm/mach-msm/timer.c b/arch/arm/mach-msm/timer.c
index 56f920c..38b95e9 100644
--- a/arch/arm/mach-msm/timer.c
+++ b/arch/arm/mach-msm/timer.c
@@ -269,7 +269,7 @@
/* Use existing clock_event for cpu 0 */
if (!smp_processor_id())
- return;
+ return 0;
writel(DGT_CLK_CTL_DIV_4, MSM_TMR_BASE + DGT_CLK_CTL);
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b6ff882..8f4d50b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -209,7 +209,7 @@
config ARCH_SUSPEND_POSSIBLE
def_bool y
depends on ADB_PMU || PPC_EFIKA || PPC_LITE5200 || PPC_83xx || \
- PPC_85xx || PPC_86xx || PPC_PSERIES || 44x || 40x
+ (PPC_85xx && !SMP) || PPC_86xx || PPC_PSERIES || 44x || 40x
config PPC_DCR_NATIVE
bool
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index be3cdf9..1833d1a 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -382,10 +382,12 @@
#define CPU_FTRS_E500_2 (CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | \
CPU_FTR_SPE_COMP | CPU_FTR_MAYBE_CAN_NAP | \
CPU_FTR_NODSISRALIGN | CPU_FTR_NOEXECUTE)
-#define CPU_FTRS_E500MC (CPU_FTR_MAYBE_CAN_DOZE | CPU_FTR_USE_TB | \
- CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_NODSISRALIGN | \
+#define CPU_FTRS_E500MC (CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN | \
CPU_FTR_L2CSR | CPU_FTR_LWSYNC | CPU_FTR_NOEXECUTE | \
CPU_FTR_DBELL)
+#define CPU_FTRS_E5500 (CPU_FTR_USE_TB | CPU_FTR_NODSISRALIGN | \
+ CPU_FTR_L2CSR | CPU_FTR_LWSYNC | CPU_FTR_NOEXECUTE | \
+ CPU_FTR_DBELL | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD)
#define CPU_FTRS_GENERIC_32 (CPU_FTR_COMMON | CPU_FTR_NODSISRALIGN)
/* 64-bit CPUs */
@@ -435,11 +437,15 @@
#define CPU_FTRS_COMPATIBLE (CPU_FTR_USE_TB | CPU_FTR_PPCAS_ARCH_V2)
#ifdef __powerpc64__
+#ifdef CONFIG_PPC_BOOK3E
+#define CPU_FTRS_POSSIBLE (CPU_FTRS_E5500)
+#else
#define CPU_FTRS_POSSIBLE \
(CPU_FTRS_POWER3 | CPU_FTRS_RS64 | CPU_FTRS_POWER4 | \
CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | CPU_FTRS_POWER6 | \
CPU_FTRS_POWER7 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
CPU_FTR_1T_SEGMENT | CPU_FTR_VSX)
+#endif
#else
enum {
CPU_FTRS_POSSIBLE =
@@ -473,16 +479,21 @@
#endif
#ifdef CONFIG_E500
CPU_FTRS_E500 | CPU_FTRS_E500_2 | CPU_FTRS_E500MC |
+ CPU_FTRS_E5500 |
#endif
0,
};
#endif /* __powerpc64__ */
#ifdef __powerpc64__
+#ifdef CONFIG_PPC_BOOK3E
+#define CPU_FTRS_ALWAYS (CPU_FTRS_E5500)
+#else
#define CPU_FTRS_ALWAYS \
(CPU_FTRS_POWER3 & CPU_FTRS_RS64 & CPU_FTRS_POWER4 & \
CPU_FTRS_PPC970 & CPU_FTRS_POWER5 & CPU_FTRS_POWER6 & \
CPU_FTRS_POWER7 & CPU_FTRS_CELL & CPU_FTRS_PA6T & CPU_FTRS_POSSIBLE)
+#endif
#else
enum {
CPU_FTRS_ALWAYS =
@@ -513,6 +524,7 @@
#endif
#ifdef CONFIG_E500
CPU_FTRS_E500 & CPU_FTRS_E500_2 & CPU_FTRS_E500MC &
+ CPU_FTRS_E5500 &
#endif
CPU_FTRS_POSSIBLE,
};
diff --git a/arch/powerpc/include/asm/pte-common.h b/arch/powerpc/include/asm/pte-common.h
index 811f04a..8d1569c 100644
--- a/arch/powerpc/include/asm/pte-common.h
+++ b/arch/powerpc/include/asm/pte-common.h
@@ -162,7 +162,7 @@
* on platforms where such control is possible.
*/
#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\
- defined(CONFIG_KPROBES)
+ defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
#define PAGE_KERNEL_TEXT PAGE_KERNEL_X
#else
#define PAGE_KERNEL_TEXT PAGE_KERNEL_ROX
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index c9b68d0..b9602ee 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -1973,7 +1973,7 @@
.pvr_mask = 0xffff0000,
.pvr_value = 0x80240000,
.cpu_name = "e5500",
- .cpu_features = CPU_FTRS_E500MC,
+ .cpu_features = CPU_FTRS_E5500,
.cpu_user_features = COMMON_USER_BOOKE,
.mmu_features = MMU_FTR_TYPE_FSL_E | MMU_FTR_BIG_PHYS |
MMU_FTR_USE_TLBILX,
diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index 3d3d416..5b5e1f0 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -163,7 +163,7 @@
}
/* wait for all the CPUs to hit real mode but timeout if they don't come in */
-#if defined(CONFIG_PPC_STD_MMU_64) && defined(CONFIG_SMP)
+#ifdef CONFIG_PPC_STD_MMU_64
static void crash_kexec_wait_realmode(int cpu)
{
unsigned int msecs;
@@ -188,9 +188,7 @@
}
mb();
}
-#else
-static inline void crash_kexec_wait_realmode(int cpu) {}
-#endif
+#endif /* CONFIG_PPC_STD_MMU_64 */
/*
* This function will be called by secondary cpus or by kexec cpu
@@ -235,7 +233,9 @@
crash_ipi_callback(regs);
}
-#else
+#else /* ! CONFIG_SMP */
+static inline void crash_kexec_wait_realmode(int cpu) {}
+
static void crash_kexec_prepare_cpus(int cpu)
{
/*
@@ -255,7 +255,7 @@
{
cpus_in_sr = CPU_MASK_NONE;
}
-#endif
+#endif /* CONFIG_SMP */
/*
* Register a function to be called on shutdown. Only use this if you
diff --git a/arch/powerpc/kernel/legacy_serial.c b/arch/powerpc/kernel/legacy_serial.c
index c834757..2b97b80 100644
--- a/arch/powerpc/kernel/legacy_serial.c
+++ b/arch/powerpc/kernel/legacy_serial.c
@@ -330,9 +330,11 @@
if (!parent)
continue;
if (of_match_node(legacy_serial_parents, parent) != NULL) {
- index = add_legacy_soc_port(np, np);
- if (index >= 0 && np == stdout)
- legacy_serial_console = index;
+ if (of_device_is_available(np)) {
+ index = add_legacy_soc_port(np, np);
+ if (index >= 0 && np == stdout)
+ legacy_serial_console = index;
+ }
}
of_node_put(parent);
}
diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index c4063b7..822f630 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -398,6 +398,25 @@
return 0;
}
+static u64 check_and_compute_delta(u64 prev, u64 val)
+{
+ u64 delta = (val - prev) & 0xfffffffful;
+
+ /*
+ * POWER7 can roll back counter values, if the new value is smaller
+ * than the previous value it will cause the delta and the counter to
+ * have bogus values unless we rolled a counter over. If a coutner is
+ * rolled back, it will be smaller, but within 256, which is the maximum
+ * number of events to rollback at once. If we dectect a rollback
+ * return 0. This can lead to a small lack of precision in the
+ * counters.
+ */
+ if (prev > val && (prev - val) < 256)
+ delta = 0;
+
+ return delta;
+}
+
static void power_pmu_read(struct perf_event *event)
{
s64 val, delta, prev;
@@ -416,10 +435,11 @@
prev = local64_read(&event->hw.prev_count);
barrier();
val = read_pmc(event->hw.idx);
+ delta = check_and_compute_delta(prev, val);
+ if (!delta)
+ return;
} while (local64_cmpxchg(&event->hw.prev_count, prev, val) != prev);
- /* The counters are only 32 bits wide */
- delta = (val - prev) & 0xfffffffful;
local64_add(delta, &event->count);
local64_sub(delta, &event->hw.period_left);
}
@@ -449,8 +469,9 @@
val = (event->hw.idx == 5) ? pmc5 : pmc6;
prev = local64_read(&event->hw.prev_count);
event->hw.idx = 0;
- delta = (val - prev) & 0xfffffffful;
- local64_add(delta, &event->count);
+ delta = check_and_compute_delta(prev, val);
+ if (delta)
+ local64_add(delta, &event->count);
}
}
@@ -458,14 +479,16 @@
unsigned long pmc5, unsigned long pmc6)
{
struct perf_event *event;
- u64 val;
+ u64 val, prev;
int i;
for (i = 0; i < cpuhw->n_limited; ++i) {
event = cpuhw->limited_counter[i];
event->hw.idx = cpuhw->limited_hwidx[i];
val = (event->hw.idx == 5) ? pmc5 : pmc6;
- local64_set(&event->hw.prev_count, val);
+ prev = local64_read(&event->hw.prev_count);
+ if (check_and_compute_delta(prev, val))
+ local64_set(&event->hw.prev_count, val);
perf_event_update_userpage(event);
}
}
@@ -1197,7 +1220,7 @@
/* we don't have to worry about interrupts here */
prev = local64_read(&event->hw.prev_count);
- delta = (val - prev) & 0xfffffffful;
+ delta = check_and_compute_delta(prev, val);
local64_add(delta, &event->count);
/*
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 375480c..f33acfd 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -229,6 +229,9 @@
u64 stolen = 0;
u64 dtb;
+ if (!dtl)
+ return 0;
+
if (i == vpa->dtl_idx)
return 0;
while (i < vpa->dtl_idx) {
diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index a830c5e..bc5f0dc 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -842,6 +842,7 @@
mpic_setup_this_cpu();
}
+#ifdef CONFIG_PPC64
#ifdef CONFIG_HOTPLUG_CPU
static int smp_core99_cpu_notify(struct notifier_block *self,
unsigned long action, void *hcpu)
@@ -879,7 +880,6 @@
static void __init smp_core99_bringup_done(void)
{
-#ifdef CONFIG_PPC64
extern void g5_phy_disable_cpu1(void);
/* Close i2c bus if it was used for tb sync */
@@ -894,14 +894,14 @@
set_cpu_present(1, false);
g5_phy_disable_cpu1();
}
-#endif /* CONFIG_PPC64 */
-
#ifdef CONFIG_HOTPLUG_CPU
register_cpu_notifier(&smp_core99_cpu_nb);
#endif
+
if (ppc_md.progress)
ppc_md.progress("smp_core99_bringup_done", 0x349);
}
+#endif /* CONFIG_PPC64 */
#ifdef CONFIG_HOTPLUG_CPU
@@ -975,7 +975,9 @@
struct smp_ops_t core99_smp_ops = {
.message_pass = smp_mpic_message_pass,
.probe = smp_core99_probe,
+#ifdef CONFIG_PPC64
.bringup_done = smp_core99_bringup_done,
+#endif
.kick_cpu = smp_core99_kick_cpu,
.setup_cpu = smp_core99_setup_cpu,
.give_timebase = smp_core99_give_timebase,
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 0007241..6c42cfd 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -287,14 +287,22 @@
int cpu, ret;
struct paca_struct *pp;
struct dtl_entry *dtl;
+ struct kmem_cache *dtl_cache;
if (!firmware_has_feature(FW_FEATURE_SPLPAR))
return 0;
+ dtl_cache = kmem_cache_create("dtl", DISPATCH_LOG_BYTES,
+ DISPATCH_LOG_BYTES, 0, NULL);
+ if (!dtl_cache) {
+ pr_warn("Failed to create dispatch trace log buffer cache\n");
+ pr_warn("Stolen time statistics will be unreliable\n");
+ return 0;
+ }
+
for_each_possible_cpu(cpu) {
pp = &paca[cpu];
- dtl = kmalloc_node(DISPATCH_LOG_BYTES, GFP_KERNEL,
- cpu_to_node(cpu));
+ dtl = kmem_cache_alloc(dtl_cache, GFP_KERNEL);
if (!dtl) {
pr_warn("Failed to allocate dispatch trace log for cpu %d\n",
cpu);
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index f8f7f28..68ca929 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -324,6 +324,11 @@
struct resource rsrc;
const int *bus_range;
+ if (!of_device_is_available(dev)) {
+ pr_warning("%s: disabled\n", dev->full_name);
+ return -ENODEV;
+ }
+
pr_debug("Adding PCI host bridge %s\n", dev->full_name);
/* Fetch host bridge registers address */
diff --git a/block/blk-core.c b/block/blk-core.c
index 78b7b0c..5fa3dd2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -204,7 +204,7 @@
q = container_of(work, struct request_queue, delay_work.work);
spin_lock_irq(q->queue_lock);
- __blk_run_queue(q, false);
+ __blk_run_queue(q);
spin_unlock_irq(q->queue_lock);
}
@@ -220,7 +220,8 @@
*/
void blk_delay_queue(struct request_queue *q, unsigned long msecs)
{
- schedule_delayed_work(&q->delay_work, msecs_to_jiffies(msecs));
+ queue_delayed_work(kblockd_workqueue, &q->delay_work,
+ msecs_to_jiffies(msecs));
}
EXPORT_SYMBOL(blk_delay_queue);
@@ -238,7 +239,7 @@
WARN_ON(!irqs_disabled());
queue_flag_clear(QUEUE_FLAG_STOPPED, q);
- __blk_run_queue(q, false);
+ __blk_run_queue(q);
}
EXPORT_SYMBOL(blk_start_queue);
@@ -296,9 +297,8 @@
* Description:
* See @blk_run_queue. This variant must be called with the queue lock
* held and interrupts disabled.
- *
*/
-void __blk_run_queue(struct request_queue *q, bool force_kblockd)
+void __blk_run_queue(struct request_queue *q)
{
if (unlikely(blk_queue_stopped(q)))
return;
@@ -307,7 +307,7 @@
* Only recurse once to avoid overrunning the stack, let the unplug
* handling reinvoke the handler shortly if we already got there.
*/
- if (!force_kblockd && !queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) {
+ if (!queue_flag_test_and_set(QUEUE_FLAG_REENTER, q)) {
q->request_fn(q);
queue_flag_clear(QUEUE_FLAG_REENTER, q);
} else
@@ -316,6 +316,20 @@
EXPORT_SYMBOL(__blk_run_queue);
/**
+ * blk_run_queue_async - run a single device queue in workqueue context
+ * @q: The queue to run
+ *
+ * Description:
+ * Tells kblockd to perform the equivalent of @blk_run_queue on behalf
+ * of us.
+ */
+void blk_run_queue_async(struct request_queue *q)
+{
+ if (likely(!blk_queue_stopped(q)))
+ queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
+}
+
+/**
* blk_run_queue - run a single device queue
* @q: The queue to run
*
@@ -328,7 +342,7 @@
unsigned long flags;
spin_lock_irqsave(q->queue_lock, flags);
- __blk_run_queue(q, false);
+ __blk_run_queue(q);
spin_unlock_irqrestore(q->queue_lock, flags);
}
EXPORT_SYMBOL(blk_run_queue);
@@ -977,7 +991,7 @@
blk_queue_end_tag(q, rq);
add_acct_request(q, rq, where);
- __blk_run_queue(q, false);
+ __blk_run_queue(q);
spin_unlock_irqrestore(q->queue_lock, flags);
}
EXPORT_SYMBOL(blk_insert_request);
@@ -1321,7 +1335,7 @@
} else {
spin_lock_irq(q->queue_lock);
add_acct_request(q, req, where);
- __blk_run_queue(q, false);
+ __blk_run_queue(q);
out_unlock:
spin_unlock_irq(q->queue_lock);
}
@@ -2638,6 +2652,7 @@
plug->magic = PLUG_MAGIC;
INIT_LIST_HEAD(&plug->list);
+ INIT_LIST_HEAD(&plug->cb_list);
plug->should_sort = 0;
/*
@@ -2670,12 +2685,41 @@
*/
static void queue_unplugged(struct request_queue *q, unsigned int depth,
bool from_schedule)
+ __releases(q->queue_lock)
{
trace_block_unplug(q, depth, !from_schedule);
- __blk_run_queue(q, from_schedule);
- if (q->unplugged_fn)
- q->unplugged_fn(q);
+ /*
+ * If we are punting this to kblockd, then we can safely drop
+ * the queue_lock before waking kblockd (which needs to take
+ * this lock).
+ */
+ if (from_schedule) {
+ spin_unlock(q->queue_lock);
+ blk_run_queue_async(q);
+ } else {
+ __blk_run_queue(q);
+ spin_unlock(q->queue_lock);
+ }
+
+}
+
+static void flush_plug_callbacks(struct blk_plug *plug)
+{
+ LIST_HEAD(callbacks);
+
+ if (list_empty(&plug->cb_list))
+ return;
+
+ list_splice_init(&plug->cb_list, &callbacks);
+
+ while (!list_empty(&callbacks)) {
+ struct blk_plug_cb *cb = list_first_entry(&callbacks,
+ struct blk_plug_cb,
+ list);
+ list_del(&cb->list);
+ cb->callback(cb);
+ }
}
void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
@@ -2688,6 +2732,7 @@
BUG_ON(plug->magic != PLUG_MAGIC);
+ flush_plug_callbacks(plug);
if (list_empty(&plug->list))
return;
@@ -2712,10 +2757,11 @@
BUG_ON(!(rq->cmd_flags & REQ_ON_PLUG));
BUG_ON(!rq->q);
if (rq->q != q) {
- if (q) {
+ /*
+ * This drops the queue lock
+ */
+ if (q)
queue_unplugged(q, depth, from_schedule);
- spin_unlock(q->queue_lock);
- }
q = rq->q;
depth = 0;
spin_lock(q->queue_lock);
@@ -2733,10 +2779,11 @@
depth++;
}
- if (q) {
+ /*
+ * This drops the queue lock
+ */
+ if (q)
queue_unplugged(q, depth, from_schedule);
- spin_unlock(q->queue_lock);
- }
local_irq_restore(flags);
}
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 7482b7f..81e3181 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -55,7 +55,7 @@
WARN_ON(irqs_disabled());
spin_lock_irq(q->queue_lock);
__elv_add_request(q, rq, where);
- __blk_run_queue(q, false);
+ __blk_run_queue(q);
/* the queue is stopped so it won't be plugged+unplugged */
if (rq->cmd_type == REQ_TYPE_PM_RESUME)
q->request_fn(q);
diff --git a/block/blk-flush.c b/block/blk-flush.c
index eba4a27..6c9b5e1 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -218,7 +218,7 @@
* request_fn may confuse the driver. Always use kblockd.
*/
if (queued)
- __blk_run_queue(q, true);
+ blk_run_queue_async(q);
}
/**
@@ -274,7 +274,7 @@
* the comment in flush_end_io().
*/
if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error))
- __blk_run_queue(q, true);
+ blk_run_queue_async(q);
}
/**
diff --git a/block/blk-settings.c b/block/blk-settings.c
index eb94904..1fa7692 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -790,22 +790,6 @@
}
EXPORT_SYMBOL_GPL(blk_queue_flush);
-/**
- * blk_queue_unplugged - register a callback for an unplug event
- * @q: the request queue for the device
- * @fn: the function to call
- *
- * Some stacked drivers may need to know when IO is dispatched on an
- * unplug event. By registrering a callback here, they will be notified
- * when someone flushes their on-stack queue plug. The function will be
- * called with the queue lock held.
- */
-void blk_queue_unplugged(struct request_queue *q, unplugged_fn *fn)
-{
- q->unplugged_fn = fn;
-}
-EXPORT_SYMBOL(blk_queue_unplugged);
-
static int __init blk_settings_init(void)
{
blk_max_low_pfn = max_low_pfn - 1;
diff --git a/block/blk.h b/block/blk.h
index 6126346..c9df8fc 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -22,6 +22,7 @@
void blk_delete_timer(struct request *);
void blk_add_timer(struct request *);
void __generic_unplug_device(struct request_queue *);
+void blk_run_queue_async(struct request_queue *q);
/*
* Internal atomic flags for request handling
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 3be881e..46b0a1d 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3368,7 +3368,7 @@
cfqd->busy_queues > 1) {
cfq_del_timer(cfqd, cfqq);
cfq_clear_cfqq_wait_request(cfqq);
- __blk_run_queue(cfqd->queue, false);
+ __blk_run_queue(cfqd->queue);
} else {
cfq_blkiocg_update_idle_time_stats(
&cfqq->cfqg->blkg);
@@ -3383,7 +3383,7 @@
* this new queue is RT and the current one is BE
*/
cfq_preempt_queue(cfqd, cfqq);
- __blk_run_queue(cfqd->queue, false);
+ __blk_run_queue(cfqd->queue);
}
}
@@ -3743,7 +3743,7 @@
struct request_queue *q = cfqd->queue;
spin_lock_irq(q->queue_lock);
- __blk_run_queue(cfqd->queue, false);
+ __blk_run_queue(cfqd->queue);
spin_unlock_irq(q->queue_lock);
}
diff --git a/block/elevator.c b/block/elevator.c
index 0cdb4e7..6f6abc0 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -642,7 +642,7 @@
*/
elv_drain_elevator(q);
while (q->rq.elvpriv) {
- __blk_run_queue(q, false);
+ __blk_run_queue(q);
spin_unlock_irq(q->queue_lock);
msleep(10);
spin_lock_irq(q->queue_lock);
@@ -695,7 +695,7 @@
* with anything. There's no point in delaying queue
* processing.
*/
- __blk_run_queue(q, false);
+ __blk_run_queue(q);
break;
case ELEVATOR_INSERT_SORT_MERGE:
diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index 7f42d3a..88d8e4c 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -39,13 +39,13 @@
};
struct evdev_client {
- int head;
- int tail;
+ unsigned int head;
+ unsigned int tail;
spinlock_t buffer_lock; /* protects access to buffer, head and tail */
struct fasync_struct *fasync;
struct evdev *evdev;
struct list_head node;
- int bufsize;
+ unsigned int bufsize;
struct input_event buffer[];
};
@@ -55,16 +55,25 @@
static void evdev_pass_event(struct evdev_client *client,
struct input_event *event)
{
- /*
- * Interrupts are disabled, just acquire the lock.
- * Make sure we don't leave with the client buffer
- * "empty" by having client->head == client->tail.
- */
+ /* Interrupts are disabled, just acquire the lock. */
spin_lock(&client->buffer_lock);
- do {
- client->buffer[client->head++] = *event;
- client->head &= client->bufsize - 1;
- } while (client->head == client->tail);
+
+ client->buffer[client->head++] = *event;
+ client->head &= client->bufsize - 1;
+
+ if (unlikely(client->head == client->tail)) {
+ /*
+ * This effectively "drops" all unconsumed events, leaving
+ * EV_SYN/SYN_DROPPED plus the newest event in the queue.
+ */
+ client->tail = (client->head - 2) & (client->bufsize - 1);
+
+ client->buffer[client->tail].time = event->time;
+ client->buffer[client->tail].type = EV_SYN;
+ client->buffer[client->tail].code = SYN_DROPPED;
+ client->buffer[client->tail].value = 0;
+ }
+
spin_unlock(&client->buffer_lock);
if (event->type == EV_SYN)
diff --git a/drivers/input/input.c b/drivers/input/input.c
index d6e8bd8..ebbceed 100644
--- a/drivers/input/input.c
+++ b/drivers/input/input.c
@@ -1746,6 +1746,42 @@
}
EXPORT_SYMBOL(input_set_capability);
+static unsigned int input_estimate_events_per_packet(struct input_dev *dev)
+{
+ int mt_slots;
+ int i;
+ unsigned int events;
+
+ if (dev->mtsize) {
+ mt_slots = dev->mtsize;
+ } else if (test_bit(ABS_MT_TRACKING_ID, dev->absbit)) {
+ mt_slots = dev->absinfo[ABS_MT_TRACKING_ID].maximum -
+ dev->absinfo[ABS_MT_TRACKING_ID].minimum + 1,
+ clamp(mt_slots, 2, 32);
+ } else if (test_bit(ABS_MT_POSITION_X, dev->absbit)) {
+ mt_slots = 2;
+ } else {
+ mt_slots = 0;
+ }
+
+ events = mt_slots + 1; /* count SYN_MT_REPORT and SYN_REPORT */
+
+ for (i = 0; i < ABS_CNT; i++) {
+ if (test_bit(i, dev->absbit)) {
+ if (input_is_mt_axis(i))
+ events += mt_slots;
+ else
+ events++;
+ }
+ }
+
+ for (i = 0; i < REL_CNT; i++)
+ if (test_bit(i, dev->relbit))
+ events++;
+
+ return events;
+}
+
#define INPUT_CLEANSE_BITMASK(dev, type, bits) \
do { \
if (!test_bit(EV_##type, dev->evbit)) \
@@ -1793,6 +1829,10 @@
/* Make sure that bitmasks not mentioned in dev->evbit are clean. */
input_cleanse_bitmasks(dev);
+ if (!dev->hint_events_per_packet)
+ dev->hint_events_per_packet =
+ input_estimate_events_per_packet(dev);
+
/*
* If delay and period are pre-set by the driver, then autorepeating
* is handled by the driver itself and we don't do it in input.c.
diff --git a/drivers/input/keyboard/twl4030_keypad.c b/drivers/input/keyboard/twl4030_keypad.c
index 09bef79..a26922c 100644
--- a/drivers/input/keyboard/twl4030_keypad.c
+++ b/drivers/input/keyboard/twl4030_keypad.c
@@ -332,18 +332,20 @@
static int __devinit twl4030_kp_probe(struct platform_device *pdev)
{
struct twl4030_keypad_data *pdata = pdev->dev.platform_data;
- const struct matrix_keymap_data *keymap_data = pdata->keymap_data;
+ const struct matrix_keymap_data *keymap_data;
struct twl4030_keypad *kp;
struct input_dev *input;
u8 reg;
int error;
- if (!pdata || !pdata->rows || !pdata->cols ||
+ if (!pdata || !pdata->rows || !pdata->cols || !pdata->keymap_data ||
pdata->rows > TWL4030_MAX_ROWS || pdata->cols > TWL4030_MAX_COLS) {
dev_err(&pdev->dev, "Invalid platform_data\n");
return -EINVAL;
}
+ keymap_data = pdata->keymap_data;
+
kp = kzalloc(sizeof(*kp), GFP_KERNEL);
input = input_allocate_device();
if (!kp || !input) {
diff --git a/drivers/input/misc/xen-kbdfront.c b/drivers/input/misc/xen-kbdfront.c
index 7077f9b..62bae99 100644
--- a/drivers/input/misc/xen-kbdfront.c
+++ b/drivers/input/misc/xen-kbdfront.c
@@ -303,7 +303,7 @@
enum xenbus_state backend_state)
{
struct xenkbd_info *info = dev_get_drvdata(&dev->dev);
- int val;
+ int ret, val;
switch (backend_state) {
case XenbusStateInitialising:
@@ -316,6 +316,17 @@
case XenbusStateInitWait:
InitWait:
+ ret = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+ "feature-abs-pointer", "%d", &val);
+ if (ret < 0)
+ val = 0;
+ if (val) {
+ ret = xenbus_printf(XBT_NIL, info->xbdev->nodename,
+ "request-abs-pointer", "1");
+ if (ret)
+ pr_warning("xenkbd: can't request abs-pointer");
+ }
+
xenbus_switch_state(dev, XenbusStateConnected);
break;
diff --git a/drivers/input/touchscreen/h3600_ts_input.c b/drivers/input/touchscreen/h3600_ts_input.c
index efa0688..45f93d0 100644
--- a/drivers/input/touchscreen/h3600_ts_input.c
+++ b/drivers/input/touchscreen/h3600_ts_input.c
@@ -399,31 +399,34 @@
IRQF_SHARED | IRQF_DISABLED, "h3600_action", &ts->dev)) {
printk(KERN_ERR "h3600ts.c: Could not allocate Action Button IRQ!\n");
err = -EBUSY;
- goto fail2;
+ goto fail1;
}
if (request_irq(IRQ_GPIO_BITSY_NPOWER_BUTTON, npower_button_handler,
IRQF_SHARED | IRQF_DISABLED, "h3600_suspend", &ts->dev)) {
printk(KERN_ERR "h3600ts.c: Could not allocate Power Button IRQ!\n");
err = -EBUSY;
- goto fail3;
+ goto fail2;
}
serio_set_drvdata(serio, ts);
err = serio_open(serio, drv);
if (err)
- return err;
+ goto fail3;
//h3600_flite_control(1, 25); /* default brightness */
- input_register_device(ts->dev);
+ err = input_register_device(ts->dev);
+ if (err)
+ goto fail4;
return 0;
-fail3: free_irq(IRQ_GPIO_BITSY_NPOWER_BUTTON, ts->dev);
+fail4: serio_close(serio);
+fail3: serio_set_drvdata(serio, NULL);
+ free_irq(IRQ_GPIO_BITSY_NPOWER_BUTTON, ts->dev);
fail2: free_irq(IRQ_GPIO_BITSY_ACTION_BUTTON, ts->dev);
-fail1: serio_set_drvdata(serio, NULL);
- input_free_device(input_dev);
+fail1: input_free_device(input_dev);
kfree(ts);
return err;
}
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 5ef136c..e5d8904 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -390,13 +390,6 @@
return md_raid5_congested(&rs->md, bits);
}
-static void raid_unplug(struct dm_target_callbacks *cb)
-{
- struct raid_set *rs = container_of(cb, struct raid_set, callbacks);
-
- md_raid5_kick_device(rs->md.private);
-}
-
/*
* Construct a RAID4/5/6 mapping:
* Args:
@@ -487,7 +480,6 @@
}
rs->callbacks.congested_fn = raid_is_congested;
- rs->callbacks.unplug_fn = raid_unplug;
dm_table_add_target_callbacks(ti->table, &rs->callbacks);
return 0;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index b12b377..6e853c6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -447,48 +447,59 @@
/* Support for plugging.
* This mirrors the plugging support in request_queue, but does not
- * require having a whole queue
+ * require having a whole queue or request structures.
+ * We allocate an md_plug_cb for each md device and each thread it gets
+ * plugged on. This links tot the private plug_handle structure in the
+ * personality data where we keep a count of the number of outstanding
+ * plugs so other code can see if a plug is active.
*/
-static void plugger_work(struct work_struct *work)
-{
- struct plug_handle *plug =
- container_of(work, struct plug_handle, unplug_work);
- plug->unplug_fn(plug);
-}
-static void plugger_timeout(unsigned long data)
-{
- struct plug_handle *plug = (void *)data;
- kblockd_schedule_work(NULL, &plug->unplug_work);
-}
-void plugger_init(struct plug_handle *plug,
- void (*unplug_fn)(struct plug_handle *))
-{
- plug->unplug_flag = 0;
- plug->unplug_fn = unplug_fn;
- init_timer(&plug->unplug_timer);
- plug->unplug_timer.function = plugger_timeout;
- plug->unplug_timer.data = (unsigned long)plug;
- INIT_WORK(&plug->unplug_work, plugger_work);
-}
-EXPORT_SYMBOL_GPL(plugger_init);
+struct md_plug_cb {
+ struct blk_plug_cb cb;
+ mddev_t *mddev;
+};
-void plugger_set_plug(struct plug_handle *plug)
+static void plugger_unplug(struct blk_plug_cb *cb)
{
- if (!test_and_set_bit(PLUGGED_FLAG, &plug->unplug_flag))
- mod_timer(&plug->unplug_timer, jiffies + msecs_to_jiffies(3)+1);
+ struct md_plug_cb *mdcb = container_of(cb, struct md_plug_cb, cb);
+ if (atomic_dec_and_test(&mdcb->mddev->plug_cnt))
+ md_wakeup_thread(mdcb->mddev->thread);
+ kfree(mdcb);
}
-EXPORT_SYMBOL_GPL(plugger_set_plug);
-int plugger_remove_plug(struct plug_handle *plug)
+/* Check that an unplug wakeup will come shortly.
+ * If not, wakeup the md thread immediately
+ */
+int mddev_check_plugged(mddev_t *mddev)
{
- if (test_and_clear_bit(PLUGGED_FLAG, &plug->unplug_flag)) {
- del_timer(&plug->unplug_timer);
- return 1;
- } else
+ struct blk_plug *plug = current->plug;
+ struct md_plug_cb *mdcb;
+
+ if (!plug)
return 0;
-}
-EXPORT_SYMBOL_GPL(plugger_remove_plug);
+ list_for_each_entry(mdcb, &plug->cb_list, cb.list) {
+ if (mdcb->cb.callback == plugger_unplug &&
+ mdcb->mddev == mddev) {
+ /* Already on the list, move to top */
+ if (mdcb != list_first_entry(&plug->cb_list,
+ struct md_plug_cb,
+ cb.list))
+ list_move(&mdcb->cb.list, &plug->cb_list);
+ return 1;
+ }
+ }
+ /* Not currently on the callback list */
+ mdcb = kmalloc(sizeof(*mdcb), GFP_ATOMIC);
+ if (!mdcb)
+ return 0;
+
+ mdcb->mddev = mddev;
+ mdcb->cb.callback = plugger_unplug;
+ atomic_inc(&mddev->plug_cnt);
+ list_add(&mdcb->cb.list, &plug->cb_list);
+ return 1;
+}
+EXPORT_SYMBOL_GPL(mddev_check_plugged);
static inline mddev_t *mddev_get(mddev_t *mddev)
{
@@ -538,6 +549,7 @@
atomic_set(&mddev->active, 1);
atomic_set(&mddev->openers, 0);
atomic_set(&mddev->active_io, 0);
+ atomic_set(&mddev->plug_cnt, 0);
spin_lock_init(&mddev->write_lock);
atomic_set(&mddev->flush_pending, 0);
init_waitqueue_head(&mddev->sb_wait);
@@ -4723,7 +4735,6 @@
mddev->bitmap_info.chunksize = 0;
mddev->bitmap_info.daemon_sleep = 0;
mddev->bitmap_info.max_write_behind = 0;
- mddev->plug = NULL;
}
static void __md_stop_writes(mddev_t *mddev)
@@ -6688,12 +6699,6 @@
}
EXPORT_SYMBOL_GPL(md_allow_write);
-void md_unplug(mddev_t *mddev)
-{
- if (mddev->plug)
- mddev->plug->unplug_fn(mddev->plug);
-}
-
#define SYNC_MARKS 10
#define SYNC_MARK_STEP (3*HZ)
void md_do_sync(mddev_t *mddev)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 52b4073..0b1fd3f 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -29,26 +29,6 @@
typedef struct mddev_s mddev_t;
typedef struct mdk_rdev_s mdk_rdev_t;
-/* generic plugging support - like that provided with request_queue,
- * but does not require a request_queue
- */
-struct plug_handle {
- void (*unplug_fn)(struct plug_handle *);
- struct timer_list unplug_timer;
- struct work_struct unplug_work;
- unsigned long unplug_flag;
-};
-#define PLUGGED_FLAG 1
-void plugger_init(struct plug_handle *plug,
- void (*unplug_fn)(struct plug_handle *));
-void plugger_set_plug(struct plug_handle *plug);
-int plugger_remove_plug(struct plug_handle *plug);
-static inline void plugger_flush(struct plug_handle *plug)
-{
- del_timer_sync(&plug->unplug_timer);
- cancel_work_sync(&plug->unplug_work);
-}
-
/*
* MD's 'extended' device
*/
@@ -199,6 +179,9 @@
int delta_disks, new_level, new_layout;
int new_chunk_sectors;
+ atomic_t plug_cnt; /* If device is expecting
+ * more bios soon.
+ */
struct mdk_thread_s *thread; /* management thread */
struct mdk_thread_s *sync_thread; /* doing resync or reconstruct */
sector_t curr_resync; /* last block scheduled */
@@ -336,7 +319,6 @@
struct list_head all_mddevs;
struct attribute_group *to_remove;
- struct plug_handle *plug; /* if used by personality */
struct bio_set *bio_set;
@@ -516,7 +498,6 @@
extern void md_integrity_add_rdev(mdk_rdev_t *rdev, mddev_t *mddev);
extern int strict_strtoul_scaled(const char *cp, unsigned long *res, int scale);
extern void restore_bitmap_write_access(struct file *file);
-extern void md_unplug(mddev_t *mddev);
extern void mddev_init(mddev_t *mddev);
extern int md_run(mddev_t *mddev);
@@ -530,4 +511,5 @@
mddev_t *mddev);
extern struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
mddev_t *mddev);
+extern int mddev_check_plugged(mddev_t *mddev);
#endif /* _MD_MD_H */
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c2a21ae5..2b7a7ff 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -565,12 +565,6 @@
spin_unlock_irq(&conf->device_lock);
}
-static void md_kick_device(mddev_t *mddev)
-{
- blk_flush_plug(current);
- md_wakeup_thread(mddev->thread);
-}
-
/* Barriers....
* Sometimes we need to suspend IO while we do something else,
* either some resync/recovery, or reconfigure the array.
@@ -600,7 +594,7 @@
/* Wait until no block IO is waiting */
wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting,
- conf->resync_lock, md_kick_device(conf->mddev));
+ conf->resync_lock, );
/* block any new IO from starting */
conf->barrier++;
@@ -608,7 +602,7 @@
/* Now wait for all pending IO to complete */
wait_event_lock_irq(conf->wait_barrier,
!conf->nr_pending && conf->barrier < RESYNC_DEPTH,
- conf->resync_lock, md_kick_device(conf->mddev));
+ conf->resync_lock, );
spin_unlock_irq(&conf->resync_lock);
}
@@ -630,7 +624,7 @@
conf->nr_waiting++;
wait_event_lock_irq(conf->wait_barrier, !conf->barrier,
conf->resync_lock,
- md_kick_device(conf->mddev));
+ );
conf->nr_waiting--;
}
conf->nr_pending++;
@@ -666,8 +660,7 @@
wait_event_lock_irq(conf->wait_barrier,
conf->nr_pending == conf->nr_queued+1,
conf->resync_lock,
- ({ flush_pending_writes(conf);
- md_kick_device(conf->mddev); }));
+ flush_pending_writes(conf));
spin_unlock_irq(&conf->resync_lock);
}
static void unfreeze_array(conf_t *conf)
@@ -729,6 +722,7 @@
const unsigned long do_sync = (bio->bi_rw & REQ_SYNC);
const unsigned long do_flush_fua = (bio->bi_rw & (REQ_FLUSH | REQ_FUA));
mdk_rdev_t *blocked_rdev;
+ int plugged;
/*
* Register the new request and wait if the reconstruction
@@ -820,6 +814,8 @@
* inc refcount on their rdev. Record them by setting
* bios[x] to bio
*/
+ plugged = mddev_check_plugged(mddev);
+
disks = conf->raid_disks;
retry_write:
blocked_rdev = NULL;
@@ -925,7 +921,7 @@
/* In case raid1d snuck in to freeze_array */
wake_up(&conf->wait_barrier);
- if (do_sync || !bitmap)
+ if (do_sync || !bitmap || !plugged)
md_wakeup_thread(mddev->thread);
return 0;
@@ -1516,13 +1512,16 @@
conf_t *conf = mddev->private;
struct list_head *head = &conf->retry_list;
mdk_rdev_t *rdev;
+ struct blk_plug plug;
md_check_recovery(mddev);
-
+
+ blk_start_plug(&plug);
for (;;) {
char b[BDEVNAME_SIZE];
- flush_pending_writes(conf);
+ if (atomic_read(&mddev->plug_cnt) == 0)
+ flush_pending_writes(conf);
spin_lock_irqsave(&conf->device_lock, flags);
if (list_empty(head)) {
@@ -1593,6 +1592,7 @@
}
cond_resched();
}
+ blk_finish_plug(&plug);
}
@@ -2039,7 +2039,6 @@
md_unregister_thread(mddev->thread);
mddev->thread = NULL;
- blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
if (conf->r1bio_pool)
mempool_destroy(conf->r1bio_pool);
kfree(conf->mirrors);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 2da83d5..8e94626 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -634,12 +634,6 @@
spin_unlock_irq(&conf->device_lock);
}
-static void md_kick_device(mddev_t *mddev)
-{
- blk_flush_plug(current);
- md_wakeup_thread(mddev->thread);
-}
-
/* Barriers....
* Sometimes we need to suspend IO while we do something else,
* either some resync/recovery, or reconfigure the array.
@@ -669,15 +663,15 @@
/* Wait until no block IO is waiting (unless 'force') */
wait_event_lock_irq(conf->wait_barrier, force || !conf->nr_waiting,
- conf->resync_lock, md_kick_device(conf->mddev));
+ conf->resync_lock, );
/* block any new IO from starting */
conf->barrier++;
- /* No wait for all pending IO to complete */
+ /* Now wait for all pending IO to complete */
wait_event_lock_irq(conf->wait_barrier,
!conf->nr_pending && conf->barrier < RESYNC_DEPTH,
- conf->resync_lock, md_kick_device(conf->mddev));
+ conf->resync_lock, );
spin_unlock_irq(&conf->resync_lock);
}
@@ -698,7 +692,7 @@
conf->nr_waiting++;
wait_event_lock_irq(conf->wait_barrier, !conf->barrier,
conf->resync_lock,
- md_kick_device(conf->mddev));
+ );
conf->nr_waiting--;
}
conf->nr_pending++;
@@ -734,8 +728,8 @@
wait_event_lock_irq(conf->wait_barrier,
conf->nr_pending == conf->nr_queued+1,
conf->resync_lock,
- ({ flush_pending_writes(conf);
- md_kick_device(conf->mddev); }));
+ flush_pending_writes(conf));
+
spin_unlock_irq(&conf->resync_lock);
}
@@ -762,6 +756,7 @@
const unsigned long do_fua = (bio->bi_rw & REQ_FUA);
unsigned long flags;
mdk_rdev_t *blocked_rdev;
+ int plugged;
if (unlikely(bio->bi_rw & REQ_FLUSH)) {
md_flush_request(mddev, bio);
@@ -870,6 +865,8 @@
* inc refcount on their rdev. Record them by setting
* bios[x] to bio
*/
+ plugged = mddev_check_plugged(mddev);
+
raid10_find_phys(conf, r10_bio);
retry_write:
blocked_rdev = NULL;
@@ -946,9 +943,8 @@
/* In case raid10d snuck in to freeze_array */
wake_up(&conf->wait_barrier);
- if (do_sync || !mddev->bitmap)
+ if (do_sync || !mddev->bitmap || !plugged)
md_wakeup_thread(mddev->thread);
-
return 0;
}
@@ -1640,9 +1636,11 @@
conf_t *conf = mddev->private;
struct list_head *head = &conf->retry_list;
mdk_rdev_t *rdev;
+ struct blk_plug plug;
md_check_recovery(mddev);
+ blk_start_plug(&plug);
for (;;) {
char b[BDEVNAME_SIZE];
@@ -1716,6 +1714,7 @@
}
cond_resched();
}
+ blk_finish_plug(&plug);
}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e867ee4..f301e6a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -27,12 +27,12 @@
*
* We group bitmap updates into batches. Each batch has a number.
* We may write out several batches at once, but that isn't very important.
- * conf->bm_write is the number of the last batch successfully written.
- * conf->bm_flush is the number of the last batch that was closed to
+ * conf->seq_write is the number of the last batch successfully written.
+ * conf->seq_flush is the number of the last batch that was closed to
* new additions.
* When we discover that we will need to write to any block in a stripe
* (in add_stripe_bio) we update the in-memory bitmap and record in sh->bm_seq
- * the number of the batch it will be in. This is bm_flush+1.
+ * the number of the batch it will be in. This is seq_flush+1.
* When we are ready to do a write, if that batch hasn't been written yet,
* we plug the array and queue the stripe for later.
* When an unplug happens, we increment bm_flush, thus closing the current
@@ -199,14 +199,12 @@
BUG_ON(!list_empty(&sh->lru));
BUG_ON(atomic_read(&conf->active_stripes)==0);
if (test_bit(STRIPE_HANDLE, &sh->state)) {
- if (test_bit(STRIPE_DELAYED, &sh->state)) {
+ if (test_bit(STRIPE_DELAYED, &sh->state))
list_add_tail(&sh->lru, &conf->delayed_list);
- plugger_set_plug(&conf->plug);
- } else if (test_bit(STRIPE_BIT_DELAY, &sh->state) &&
- sh->bm_seq - conf->seq_write > 0) {
+ else if (test_bit(STRIPE_BIT_DELAY, &sh->state) &&
+ sh->bm_seq - conf->seq_write > 0)
list_add_tail(&sh->lru, &conf->bitmap_list);
- plugger_set_plug(&conf->plug);
- } else {
+ else {
clear_bit(STRIPE_BIT_DELAY, &sh->state);
list_add_tail(&sh->lru, &conf->handle_list);
}
@@ -461,7 +459,7 @@
< (conf->max_nr_stripes *3/4)
|| !conf->inactive_blocked),
conf->device_lock,
- md_raid5_kick_device(conf));
+ );
conf->inactive_blocked = 0;
} else
init_stripe(sh, sector, previous);
@@ -1470,7 +1468,7 @@
wait_event_lock_irq(conf->wait_for_stripe,
!list_empty(&conf->inactive_list),
conf->device_lock,
- blk_flush_plug(current));
+ );
osh = get_free_stripe(conf);
spin_unlock_irq(&conf->device_lock);
atomic_set(&nsh->count, 1);
@@ -3623,8 +3621,7 @@
atomic_inc(&conf->preread_active_stripes);
list_add_tail(&sh->lru, &conf->hold_list);
}
- } else
- plugger_set_plug(&conf->plug);
+ }
}
static void activate_bit_delay(raid5_conf_t *conf)
@@ -3641,21 +3638,6 @@
}
}
-void md_raid5_kick_device(raid5_conf_t *conf)
-{
- blk_flush_plug(current);
- raid5_activate_delayed(conf);
- md_wakeup_thread(conf->mddev->thread);
-}
-EXPORT_SYMBOL_GPL(md_raid5_kick_device);
-
-static void raid5_unplug(struct plug_handle *plug)
-{
- raid5_conf_t *conf = container_of(plug, raid5_conf_t, plug);
-
- md_raid5_kick_device(conf);
-}
-
int md_raid5_congested(mddev_t *mddev, int bits)
{
raid5_conf_t *conf = mddev->private;
@@ -3945,6 +3927,7 @@
struct stripe_head *sh;
const int rw = bio_data_dir(bi);
int remaining;
+ int plugged;
if (unlikely(bi->bi_rw & REQ_FLUSH)) {
md_flush_request(mddev, bi);
@@ -3963,6 +3946,7 @@
bi->bi_next = NULL;
bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
+ plugged = mddev_check_plugged(mddev);
for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
DEFINE_WAIT(w);
int disks, data_disks;
@@ -4057,7 +4041,7 @@
* add failed due to overlap. Flush everything
* and wait a while
*/
- md_raid5_kick_device(conf);
+ md_wakeup_thread(mddev->thread);
release_stripe(sh);
schedule();
goto retry;
@@ -4077,6 +4061,9 @@
}
}
+ if (!plugged)
+ md_wakeup_thread(mddev->thread);
+
spin_lock_irq(&conf->device_lock);
remaining = raid5_dec_bi_phys_segments(bi);
spin_unlock_irq(&conf->device_lock);
@@ -4478,24 +4465,30 @@
struct stripe_head *sh;
raid5_conf_t *conf = mddev->private;
int handled;
+ struct blk_plug plug;
pr_debug("+++ raid5d active\n");
md_check_recovery(mddev);
+ blk_start_plug(&plug);
handled = 0;
spin_lock_irq(&conf->device_lock);
while (1) {
struct bio *bio;
- if (conf->seq_flush != conf->seq_write) {
- int seq = conf->seq_flush;
+ if (atomic_read(&mddev->plug_cnt) == 0 &&
+ !list_empty(&conf->bitmap_list)) {
+ /* Now is a good time to flush some bitmap updates */
+ conf->seq_flush++;
spin_unlock_irq(&conf->device_lock);
bitmap_unplug(mddev->bitmap);
spin_lock_irq(&conf->device_lock);
- conf->seq_write = seq;
+ conf->seq_write = conf->seq_flush;
activate_bit_delay(conf);
}
+ if (atomic_read(&mddev->plug_cnt) == 0)
+ raid5_activate_delayed(conf);
while ((bio = remove_bio_from_retry(conf))) {
int ok;
@@ -4525,6 +4518,7 @@
spin_unlock_irq(&conf->device_lock);
async_tx_issue_pending_all();
+ blk_finish_plug(&plug);
pr_debug("--- raid5d inactive\n");
}
@@ -5141,8 +5135,6 @@
mdname(mddev));
md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
- plugger_init(&conf->plug, raid5_unplug);
- mddev->plug = &conf->plug;
if (mddev->queue) {
int chunk_size;
/* read-ahead size must cover two whole stripes, which
@@ -5192,7 +5184,6 @@
mddev->thread = NULL;
if (mddev->queue)
mddev->queue->backing_dev_info.congested_fn = NULL;
- plugger_flush(&conf->plug); /* the unplug fn references 'conf'*/
free_conf(conf);
mddev->private = NULL;
mddev->to_remove = &raid5_attrs_group;
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 8d563a4..3ca77a2 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -400,8 +400,6 @@
* Cleared when a sync completes.
*/
- struct plug_handle plug;
-
/* per cpu variables */
struct raid5_percpu {
struct page *spare_page; /* Used when checking P/Q in raid6 */
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6d5c7ff..ab55c2f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -443,7 +443,7 @@
&sdev->request_queue->queue_flags);
if (flagset)
queue_flag_set(QUEUE_FLAG_REENTER, sdev->request_queue);
- __blk_run_queue(sdev->request_queue, false);
+ __blk_run_queue(sdev->request_queue);
if (flagset)
queue_flag_clear(QUEUE_FLAG_REENTER, sdev->request_queue);
spin_unlock(sdev->request_queue->queue_lock);
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index fdf3fa6..28c3350 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3829,7 +3829,7 @@
!test_bit(QUEUE_FLAG_REENTER, &rport->rqst_q->queue_flags);
if (flagset)
queue_flag_set(QUEUE_FLAG_REENTER, rport->rqst_q);
- __blk_run_queue(rport->rqst_q, false);
+ __blk_run_queue(rport->rqst_q);
if (flagset)
queue_flag_clear(QUEUE_FLAG_REENTER, rport->rqst_q);
spin_unlock_irqrestore(rport->rqst_q->queue_lock, flags);
diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index de34bfa..5d505aa 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -178,16 +178,17 @@
if (value) {
acl = posix_acl_from_xattr(value, size);
- if (acl == NULL) {
- value = NULL;
- size = 0;
+ if (acl) {
+ ret = posix_acl_valid(acl);
+ if (ret)
+ goto out;
} else if (IS_ERR(acl)) {
return PTR_ERR(acl);
}
}
ret = btrfs_set_acl(NULL, dentry->d_inode, acl, type);
-
+out:
posix_acl_release(acl);
return ret;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3458b57..2e61fe1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -740,8 +740,10 @@
*/
unsigned long reservation_progress;
- int full; /* indicates that we cannot allocate any more
+ int full:1; /* indicates that we cannot allocate any more
chunks for this space */
+ int chunk_alloc:1; /* set if we are allocating a chunk */
+
int force_alloc; /* set if we need to force a chunk alloc for
this space */
@@ -2576,6 +2578,11 @@
int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
struct inode *inode, u64 start, u64 end);
int btrfs_release_file(struct inode *inode, struct file *file);
+void btrfs_drop_pages(struct page **pages, size_t num_pages);
+int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
+ struct page **pages, size_t num_pages,
+ loff_t pos, size_t write_bytes,
+ struct extent_state **cached);
/* tree-defrag.c */
int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8f1d44b..68c84c8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3057,7 +3057,7 @@
btrfs_destroy_pinned_extent(root,
root->fs_info->pinned_extents);
- t->use_count = 0;
+ atomic_set(&t->use_count, 0);
list_del_init(&t->list);
memset(t, 0, sizeof(*t));
kmem_cache_free(btrfs_transaction_cachep, t);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f619c3c..31f33ba 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -33,6 +33,25 @@
#include "locking.h"
#include "free-space-cache.h"
+/* control flags for do_chunk_alloc's force field
+ * CHUNK_ALLOC_NO_FORCE means to only allocate a chunk
+ * if we really need one.
+ *
+ * CHUNK_ALLOC_FORCE means it must try to allocate one
+ *
+ * CHUNK_ALLOC_LIMITED means to only try and allocate one
+ * if we have very few chunks already allocated. This is
+ * used as part of the clustering code to help make sure
+ * we have a good pool of storage to cluster in, without
+ * filling the FS with empty chunks
+ *
+ */
+enum {
+ CHUNK_ALLOC_NO_FORCE = 0,
+ CHUNK_ALLOC_FORCE = 1,
+ CHUNK_ALLOC_LIMITED = 2,
+};
+
static int update_block_group(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
u64 bytenr, u64 num_bytes, int alloc);
@@ -3019,7 +3038,8 @@
found->bytes_readonly = 0;
found->bytes_may_use = 0;
found->full = 0;
- found->force_alloc = 0;
+ found->force_alloc = CHUNK_ALLOC_NO_FORCE;
+ found->chunk_alloc = 0;
*space_info = found;
list_add_rcu(&found->list, &info->space_info);
atomic_set(&found->caching_threads, 0);
@@ -3150,7 +3170,7 @@
if (!data_sinfo->full && alloc_chunk) {
u64 alloc_target;
- data_sinfo->force_alloc = 1;
+ data_sinfo->force_alloc = CHUNK_ALLOC_FORCE;
spin_unlock(&data_sinfo->lock);
alloc:
alloc_target = btrfs_get_alloc_profile(root, 1);
@@ -3160,7 +3180,8 @@
ret = do_chunk_alloc(trans, root->fs_info->extent_root,
bytes + 2 * 1024 * 1024,
- alloc_target, 0);
+ alloc_target,
+ CHUNK_ALLOC_NO_FORCE);
btrfs_end_transaction(trans, root);
if (ret < 0) {
if (ret != -ENOSPC)
@@ -3239,31 +3260,56 @@
rcu_read_lock();
list_for_each_entry_rcu(found, head, list) {
if (found->flags & BTRFS_BLOCK_GROUP_METADATA)
- found->force_alloc = 1;
+ found->force_alloc = CHUNK_ALLOC_FORCE;
}
rcu_read_unlock();
}
static int should_alloc_chunk(struct btrfs_root *root,
- struct btrfs_space_info *sinfo, u64 alloc_bytes)
+ struct btrfs_space_info *sinfo, u64 alloc_bytes,
+ int force)
{
u64 num_bytes = sinfo->total_bytes - sinfo->bytes_readonly;
+ u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved;
u64 thresh;
- if (sinfo->bytes_used + sinfo->bytes_reserved +
- alloc_bytes + 256 * 1024 * 1024 < num_bytes)
+ if (force == CHUNK_ALLOC_FORCE)
+ return 1;
+
+ /*
+ * in limited mode, we want to have some free space up to
+ * about 1% of the FS size.
+ */
+ if (force == CHUNK_ALLOC_LIMITED) {
+ thresh = btrfs_super_total_bytes(&root->fs_info->super_copy);
+ thresh = max_t(u64, 64 * 1024 * 1024,
+ div_factor_fine(thresh, 1));
+
+ if (num_bytes - num_allocated < thresh)
+ return 1;
+ }
+
+ /*
+ * we have two similar checks here, one based on percentage
+ * and once based on a hard number of 256MB. The idea
+ * is that if we have a good amount of free
+ * room, don't allocate a chunk. A good mount is
+ * less than 80% utilized of the chunks we have allocated,
+ * or more than 256MB free
+ */
+ if (num_allocated + alloc_bytes + 256 * 1024 * 1024 < num_bytes)
return 0;
- if (sinfo->bytes_used + sinfo->bytes_reserved +
- alloc_bytes < div_factor(num_bytes, 8))
+ if (num_allocated + alloc_bytes < div_factor(num_bytes, 8))
return 0;
thresh = btrfs_super_total_bytes(&root->fs_info->super_copy);
+
+ /* 256MB or 5% of the FS */
thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 5));
if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 3))
return 0;
-
return 1;
}
@@ -3273,10 +3319,9 @@
{
struct btrfs_space_info *space_info;
struct btrfs_fs_info *fs_info = extent_root->fs_info;
+ int wait_for_alloc = 0;
int ret = 0;
- mutex_lock(&fs_info->chunk_mutex);
-
flags = btrfs_reduce_alloc_profile(extent_root, flags);
space_info = __find_space_info(extent_root->fs_info, flags);
@@ -3287,21 +3332,40 @@
}
BUG_ON(!space_info);
+again:
spin_lock(&space_info->lock);
if (space_info->force_alloc)
- force = 1;
+ force = space_info->force_alloc;
if (space_info->full) {
spin_unlock(&space_info->lock);
- goto out;
+ return 0;
}
- if (!force && !should_alloc_chunk(extent_root, space_info,
- alloc_bytes)) {
+ if (!should_alloc_chunk(extent_root, space_info, alloc_bytes, force)) {
spin_unlock(&space_info->lock);
- goto out;
+ return 0;
+ } else if (space_info->chunk_alloc) {
+ wait_for_alloc = 1;
+ } else {
+ space_info->chunk_alloc = 1;
}
+
spin_unlock(&space_info->lock);
+ mutex_lock(&fs_info->chunk_mutex);
+
+ /*
+ * The chunk_mutex is held throughout the entirety of a chunk
+ * allocation, so once we've acquired the chunk_mutex we know that the
+ * other guy is done and we need to recheck and see if we should
+ * allocate.
+ */
+ if (wait_for_alloc) {
+ mutex_unlock(&fs_info->chunk_mutex);
+ wait_for_alloc = 0;
+ goto again;
+ }
+
/*
* If we have mixed data/metadata chunks we want to make sure we keep
* allocating mixed chunks instead of individual chunks.
@@ -3327,9 +3391,10 @@
space_info->full = 1;
else
ret = 1;
- space_info->force_alloc = 0;
+
+ space_info->force_alloc = CHUNK_ALLOC_NO_FORCE;
+ space_info->chunk_alloc = 0;
spin_unlock(&space_info->lock);
-out:
mutex_unlock(&extent_root->fs_info->chunk_mutex);
return ret;
}
@@ -5303,11 +5368,13 @@
if (allowed_chunk_alloc) {
ret = do_chunk_alloc(trans, root, num_bytes +
- 2 * 1024 * 1024, data, 1);
+ 2 * 1024 * 1024, data,
+ CHUNK_ALLOC_LIMITED);
allowed_chunk_alloc = 0;
done_chunk_alloc = 1;
- } else if (!done_chunk_alloc) {
- space_info->force_alloc = 1;
+ } else if (!done_chunk_alloc &&
+ space_info->force_alloc == CHUNK_ALLOC_NO_FORCE) {
+ space_info->force_alloc = CHUNK_ALLOC_LIMITED;
}
if (loop < LOOP_NO_EMPTY_SIZE) {
@@ -5393,7 +5460,8 @@
*/
if (empty_size || root->ref_cows)
ret = do_chunk_alloc(trans, root->fs_info->extent_root,
- num_bytes + 2 * 1024 * 1024, data, 0);
+ num_bytes + 2 * 1024 * 1024, data,
+ CHUNK_ALLOC_NO_FORCE);
WARN_ON(num_bytes < root->sectorsize);
ret = find_free_extent(trans, root, num_bytes, empty_size,
@@ -5405,7 +5473,7 @@
num_bytes = num_bytes & ~(root->sectorsize - 1);
num_bytes = max(num_bytes, min_alloc_size);
do_chunk_alloc(trans, root->fs_info->extent_root,
- num_bytes, data, 1);
+ num_bytes, data, CHUNK_ALLOC_FORCE);
goto again;
}
if (ret == -ENOSPC && btrfs_test_opt(root, ENOSPC_DEBUG)) {
@@ -8109,13 +8177,15 @@
alloc_flags = update_block_group_flags(root, cache->flags);
if (alloc_flags != cache->flags)
- do_chunk_alloc(trans, root, 2 * 1024 * 1024, alloc_flags, 1);
+ do_chunk_alloc(trans, root, 2 * 1024 * 1024, alloc_flags,
+ CHUNK_ALLOC_FORCE);
ret = set_block_group_ro(cache);
if (!ret)
goto out;
alloc_flags = get_alloc_profile(root, cache->space_info->flags);
- ret = do_chunk_alloc(trans, root, 2 * 1024 * 1024, alloc_flags, 1);
+ ret = do_chunk_alloc(trans, root, 2 * 1024 * 1024, alloc_flags,
+ CHUNK_ALLOC_FORCE);
if (ret < 0)
goto out;
ret = set_block_group_ro(cache);
@@ -8128,7 +8198,8 @@
struct btrfs_root *root, u64 type)
{
u64 alloc_flags = get_alloc_profile(root, type);
- return do_chunk_alloc(trans, root, 2 * 1024 * 1024, alloc_flags, 1);
+ return do_chunk_alloc(trans, root, 2 * 1024 * 1024, alloc_flags,
+ CHUNK_ALLOC_FORCE);
}
/*
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 20ddb28..3151386 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -690,6 +690,15 @@
}
}
+static void uncache_state(struct extent_state **cached_ptr)
+{
+ if (cached_ptr && (*cached_ptr)) {
+ struct extent_state *state = *cached_ptr;
+ *cached_ptr = NULL;
+ free_extent_state(state);
+ }
+}
+
/*
* set some bits on a range in the tree. This may require allocations or
* sleeping, so the gfp mask is used to indicate what is allowed.
@@ -940,10 +949,10 @@
}
int set_extent_uptodate(struct extent_io_tree *tree, u64 start, u64 end,
- gfp_t mask)
+ struct extent_state **cached_state, gfp_t mask)
{
- return set_extent_bit(tree, start, end, EXTENT_UPTODATE, 0, NULL,
- NULL, mask);
+ return set_extent_bit(tree, start, end, EXTENT_UPTODATE, 0,
+ NULL, cached_state, mask);
}
static int clear_extent_uptodate(struct extent_io_tree *tree, u64 start,
@@ -1012,8 +1021,7 @@
mask);
}
-int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end,
- gfp_t mask)
+int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end, gfp_t mask)
{
return clear_extent_bit(tree, start, end, EXTENT_LOCKED, 1, 0, NULL,
mask);
@@ -1735,6 +1743,9 @@
do {
struct page *page = bvec->bv_page;
+ struct extent_state *cached = NULL;
+ struct extent_state *state;
+
tree = &BTRFS_I(page->mapping->host)->io_tree;
start = ((u64)page->index << PAGE_CACHE_SHIFT) +
@@ -1749,9 +1760,20 @@
if (++bvec <= bvec_end)
prefetchw(&bvec->bv_page->flags);
+ spin_lock(&tree->lock);
+ state = find_first_extent_bit_state(tree, start, EXTENT_LOCKED);
+ if (state && state->start == start) {
+ /*
+ * take a reference on the state, unlock will drop
+ * the ref
+ */
+ cache_state(state, &cached);
+ }
+ spin_unlock(&tree->lock);
+
if (uptodate && tree->ops && tree->ops->readpage_end_io_hook) {
ret = tree->ops->readpage_end_io_hook(page, start, end,
- NULL);
+ state);
if (ret)
uptodate = 0;
}
@@ -1764,15 +1786,16 @@
test_bit(BIO_UPTODATE, &bio->bi_flags);
if (err)
uptodate = 0;
+ uncache_state(&cached);
continue;
}
}
if (uptodate) {
- set_extent_uptodate(tree, start, end,
+ set_extent_uptodate(tree, start, end, &cached,
GFP_ATOMIC);
}
- unlock_extent(tree, start, end, GFP_ATOMIC);
+ unlock_extent_cached(tree, start, end, &cached, GFP_ATOMIC);
if (whole_page) {
if (uptodate) {
@@ -1811,6 +1834,7 @@
do {
struct page *page = bvec->bv_page;
+ struct extent_state *cached = NULL;
tree = &BTRFS_I(page->mapping->host)->io_tree;
start = ((u64)page->index << PAGE_CACHE_SHIFT) +
@@ -1821,13 +1845,14 @@
prefetchw(&bvec->bv_page->flags);
if (uptodate) {
- set_extent_uptodate(tree, start, end, GFP_ATOMIC);
+ set_extent_uptodate(tree, start, end, &cached,
+ GFP_ATOMIC);
} else {
ClearPageUptodate(page);
SetPageError(page);
}
- unlock_extent(tree, start, end, GFP_ATOMIC);
+ unlock_extent_cached(tree, start, end, &cached, GFP_ATOMIC);
} while (bvec >= bio->bi_io_vec);
@@ -2016,14 +2041,17 @@
while (cur <= end) {
if (cur >= last_byte) {
char *userpage;
+ struct extent_state *cached = NULL;
+
iosize = PAGE_CACHE_SIZE - page_offset;
userpage = kmap_atomic(page, KM_USER0);
memset(userpage + page_offset, 0, iosize);
flush_dcache_page(page);
kunmap_atomic(userpage, KM_USER0);
set_extent_uptodate(tree, cur, cur + iosize - 1,
- GFP_NOFS);
- unlock_extent(tree, cur, cur + iosize - 1, GFP_NOFS);
+ &cached, GFP_NOFS);
+ unlock_extent_cached(tree, cur, cur + iosize - 1,
+ &cached, GFP_NOFS);
break;
}
em = get_extent(inode, page, page_offset, cur,
@@ -2063,14 +2091,17 @@
/* we've found a hole, just zero and go on */
if (block_start == EXTENT_MAP_HOLE) {
char *userpage;
+ struct extent_state *cached = NULL;
+
userpage = kmap_atomic(page, KM_USER0);
memset(userpage + page_offset, 0, iosize);
flush_dcache_page(page);
kunmap_atomic(userpage, KM_USER0);
set_extent_uptodate(tree, cur, cur + iosize - 1,
- GFP_NOFS);
- unlock_extent(tree, cur, cur + iosize - 1, GFP_NOFS);
+ &cached, GFP_NOFS);
+ unlock_extent_cached(tree, cur, cur + iosize - 1,
+ &cached, GFP_NOFS);
cur = cur + iosize;
page_offset += iosize;
continue;
@@ -2789,9 +2820,12 @@
iocount++;
block_start = block_start + iosize;
} else {
- set_extent_uptodate(tree, block_start, cur_end,
+ struct extent_state *cached = NULL;
+
+ set_extent_uptodate(tree, block_start, cur_end, &cached,
GFP_NOFS);
- unlock_extent(tree, block_start, cur_end, GFP_NOFS);
+ unlock_extent_cached(tree, block_start, cur_end,
+ &cached, GFP_NOFS);
block_start = cur_end + 1;
}
page_offset = block_start & (PAGE_CACHE_SIZE - 1);
@@ -3457,7 +3491,7 @@
num_pages = num_extent_pages(eb->start, eb->len);
set_extent_uptodate(tree, eb->start, eb->start + eb->len - 1,
- GFP_NOFS);
+ NULL, GFP_NOFS);
for (i = 0; i < num_pages; i++) {
page = extent_buffer_page(eb, i);
if ((i == 0 && (eb->start & (PAGE_CACHE_SIZE - 1))) ||
@@ -3885,6 +3919,12 @@
kunmap_atomic(dst_kaddr, KM_USER0);
}
+static inline bool areas_overlap(unsigned long src, unsigned long dst, unsigned long len)
+{
+ unsigned long distance = (src > dst) ? src - dst : dst - src;
+ return distance < len;
+}
+
static void copy_pages(struct page *dst_page, struct page *src_page,
unsigned long dst_off, unsigned long src_off,
unsigned long len)
@@ -3892,10 +3932,12 @@
char *dst_kaddr = kmap_atomic(dst_page, KM_USER0);
char *src_kaddr;
- if (dst_page != src_page)
+ if (dst_page != src_page) {
src_kaddr = kmap_atomic(src_page, KM_USER1);
- else
+ } else {
src_kaddr = dst_kaddr;
+ BUG_ON(areas_overlap(src_off, dst_off, len));
+ }
memcpy(dst_kaddr + dst_off, src_kaddr + src_off, len);
kunmap_atomic(dst_kaddr, KM_USER0);
@@ -3970,7 +4012,7 @@
"len %lu len %lu\n", dst_offset, len, dst->len);
BUG_ON(1);
}
- if (dst_offset < src_offset) {
+ if (!areas_overlap(src_offset, dst_offset, len)) {
memcpy_extent_buffer(dst, dst_offset, src_offset, len);
return;
}
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index f62c5442..af2d717 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -208,7 +208,7 @@
int bits, int exclusive_bits, u64 *failed_start,
struct extent_state **cached_state, gfp_t mask);
int set_extent_uptodate(struct extent_io_tree *tree, u64 start, u64 end,
- gfp_t mask);
+ struct extent_state **cached_state, gfp_t mask);
int set_extent_new(struct extent_io_tree *tree, u64 start, u64 end,
gfp_t mask);
int set_extent_dirty(struct extent_io_tree *tree, u64 start, u64 end,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e621ea5..75899a0 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -104,7 +104,7 @@
/*
* unlocks pages after btrfs_file_write is done with them
*/
-static noinline void btrfs_drop_pages(struct page **pages, size_t num_pages)
+void btrfs_drop_pages(struct page **pages, size_t num_pages)
{
size_t i;
for (i = 0; i < num_pages; i++) {
@@ -127,16 +127,13 @@
* this also makes the decision about creating an inline extent vs
* doing real data extents, marking pages dirty and delalloc as required.
*/
-static noinline int dirty_and_release_pages(struct btrfs_root *root,
- struct file *file,
- struct page **pages,
- size_t num_pages,
- loff_t pos,
- size_t write_bytes)
+int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
+ struct page **pages, size_t num_pages,
+ loff_t pos, size_t write_bytes,
+ struct extent_state **cached)
{
int err = 0;
int i;
- struct inode *inode = fdentry(file)->d_inode;
u64 num_bytes;
u64 start_pos;
u64 end_of_last_block;
@@ -149,7 +146,7 @@
end_of_last_block = start_pos + num_bytes - 1;
err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
- NULL);
+ cached);
if (err)
return err;
@@ -992,9 +989,9 @@
}
if (copied > 0) {
- ret = dirty_and_release_pages(root, file, pages,
- dirty_pages, pos,
- copied);
+ ret = btrfs_dirty_pages(root, inode, pages,
+ dirty_pages, pos, copied,
+ NULL);
if (ret) {
btrfs_delalloc_release_space(inode,
dirty_pages << PAGE_CACHE_SHIFT);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index f561c95..11d2e9c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -508,6 +508,7 @@
struct inode *inode;
struct rb_node *node;
struct list_head *pos, *n;
+ struct page **pages;
struct page *page;
struct extent_state *cached_state = NULL;
struct btrfs_free_cluster *cluster = NULL;
@@ -517,13 +518,13 @@
u64 start, end, len;
u64 bytes = 0;
u32 *crc, *checksums;
- pgoff_t index = 0, last_index = 0;
unsigned long first_page_offset;
- int num_checksums;
+ int index = 0, num_pages = 0;
int entries = 0;
int bitmaps = 0;
int ret = 0;
bool next_page = false;
+ bool out_of_space = false;
root = root->fs_info->tree_root;
@@ -551,24 +552,31 @@
return 0;
}
- last_index = (i_size_read(inode) - 1) >> PAGE_CACHE_SHIFT;
+ num_pages = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >>
+ PAGE_CACHE_SHIFT;
filemap_write_and_wait(inode->i_mapping);
btrfs_wait_ordered_range(inode, inode->i_size &
~(root->sectorsize - 1), (u64)-1);
/* We need a checksum per page. */
- num_checksums = i_size_read(inode) / PAGE_CACHE_SIZE;
- crc = checksums = kzalloc(sizeof(u32) * num_checksums, GFP_NOFS);
+ crc = checksums = kzalloc(sizeof(u32) * num_pages, GFP_NOFS);
if (!crc) {
iput(inode);
return 0;
}
+ pages = kzalloc(sizeof(struct page *) * num_pages, GFP_NOFS);
+ if (!pages) {
+ kfree(crc);
+ iput(inode);
+ return 0;
+ }
+
/* Since the first page has all of our checksums and our generation we
* need to calculate the offset into the page that we can start writing
* our entries.
*/
- first_page_offset = (sizeof(u32) * num_checksums) + sizeof(u64);
+ first_page_offset = (sizeof(u32) * num_pages) + sizeof(u64);
/* Get the cluster for this block_group if it exists */
if (!list_empty(&block_group->cluster_list))
@@ -590,20 +598,18 @@
* after find_get_page at this point. Just putting this here so people
* know and don't freak out.
*/
- while (index <= last_index) {
+ while (index < num_pages) {
page = grab_cache_page(inode->i_mapping, index);
if (!page) {
- pgoff_t i = 0;
+ int i;
- while (i < index) {
- page = find_get_page(inode->i_mapping, i);
- unlock_page(page);
- page_cache_release(page);
- page_cache_release(page);
- i++;
+ for (i = 0; i < num_pages; i++) {
+ unlock_page(pages[i]);
+ page_cache_release(pages[i]);
}
goto out_free;
}
+ pages[index] = page;
index++;
}
@@ -631,7 +637,12 @@
offset = start_offset;
}
- page = find_get_page(inode->i_mapping, index);
+ if (index >= num_pages) {
+ out_of_space = true;
+ break;
+ }
+
+ page = pages[index];
addr = kmap(page);
entry = addr + start_offset;
@@ -708,23 +719,6 @@
bytes += PAGE_CACHE_SIZE;
- ClearPageChecked(page);
- set_page_extent_mapped(page);
- SetPageUptodate(page);
- set_page_dirty(page);
-
- /*
- * We need to release our reference we got for grab_cache_page,
- * except for the first page which will hold our checksums, we
- * do that below.
- */
- if (index != 0) {
- unlock_page(page);
- page_cache_release(page);
- }
-
- page_cache_release(page);
-
index++;
} while (node || next_page);
@@ -734,7 +728,11 @@
struct btrfs_free_space *entry =
list_entry(pos, struct btrfs_free_space, list);
- page = find_get_page(inode->i_mapping, index);
+ if (index >= num_pages) {
+ out_of_space = true;
+ break;
+ }
+ page = pages[index];
addr = kmap(page);
memcpy(addr, entry->bitmap, PAGE_CACHE_SIZE);
@@ -745,64 +743,58 @@
crc++;
bytes += PAGE_CACHE_SIZE;
- ClearPageChecked(page);
- set_page_extent_mapped(page);
- SetPageUptodate(page);
- set_page_dirty(page);
- unlock_page(page);
- page_cache_release(page);
- page_cache_release(page);
list_del_init(&entry->list);
index++;
}
+ if (out_of_space) {
+ btrfs_drop_pages(pages, num_pages);
+ unlock_extent_cached(&BTRFS_I(inode)->io_tree, 0,
+ i_size_read(inode) - 1, &cached_state,
+ GFP_NOFS);
+ ret = 0;
+ goto out_free;
+ }
+
/* Zero out the rest of the pages just to make sure */
- while (index <= last_index) {
+ while (index < num_pages) {
void *addr;
- page = find_get_page(inode->i_mapping, index);
-
+ page = pages[index];
addr = kmap(page);
memset(addr, 0, PAGE_CACHE_SIZE);
kunmap(page);
- ClearPageChecked(page);
- set_page_extent_mapped(page);
- SetPageUptodate(page);
- set_page_dirty(page);
- unlock_page(page);
- page_cache_release(page);
- page_cache_release(page);
bytes += PAGE_CACHE_SIZE;
index++;
}
- btrfs_set_extent_delalloc(inode, 0, bytes - 1, &cached_state);
-
/* Write the checksums and trans id to the first page */
{
void *addr;
u64 *gen;
- page = find_get_page(inode->i_mapping, 0);
+ page = pages[0];
addr = kmap(page);
- memcpy(addr, checksums, sizeof(u32) * num_checksums);
- gen = addr + (sizeof(u32) * num_checksums);
+ memcpy(addr, checksums, sizeof(u32) * num_pages);
+ gen = addr + (sizeof(u32) * num_pages);
*gen = trans->transid;
kunmap(page);
- ClearPageChecked(page);
- set_page_extent_mapped(page);
- SetPageUptodate(page);
- set_page_dirty(page);
- unlock_page(page);
- page_cache_release(page);
- page_cache_release(page);
}
- BTRFS_I(inode)->generation = trans->transid;
+ ret = btrfs_dirty_pages(root, inode, pages, num_pages, 0,
+ bytes, &cached_state);
+ btrfs_drop_pages(pages, num_pages);
unlock_extent_cached(&BTRFS_I(inode)->io_tree, 0,
i_size_read(inode) - 1, &cached_state, GFP_NOFS);
+ if (ret) {
+ ret = 0;
+ goto out_free;
+ }
+
+ BTRFS_I(inode)->generation = trans->transid;
+
filemap_write_and_wait(inode->i_mapping);
key.objectid = BTRFS_FREE_SPACE_OBJECTID;
@@ -853,6 +845,7 @@
BTRFS_I(inode)->generation = 0;
}
kfree(checksums);
+ kfree(pages);
btrfs_update_inode(trans, root, inode);
iput(inode);
return ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5cc64ab..fcd66b6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1770,9 +1770,12 @@
add_pending_csums(trans, inode, ordered_extent->file_offset,
&ordered_extent->list);
- btrfs_ordered_update_i_size(inode, 0, ordered_extent);
- ret = btrfs_update_inode(trans, root, inode);
- BUG_ON(ret);
+ ret = btrfs_ordered_update_i_size(inode, 0, ordered_extent);
+ if (!ret) {
+ ret = btrfs_update_inode(trans, root, inode);
+ BUG_ON(ret);
+ }
+ ret = 0;
out:
if (nolock) {
if (trans)
@@ -2590,6 +2593,13 @@
struct btrfs_inode_item *item,
struct inode *inode)
{
+ if (!leaf->map_token)
+ map_private_extent_buffer(leaf, (unsigned long)item,
+ sizeof(struct btrfs_inode_item),
+ &leaf->map_token, &leaf->kaddr,
+ &leaf->map_start, &leaf->map_len,
+ KM_USER1);
+
btrfs_set_inode_uid(leaf, item, inode->i_uid);
btrfs_set_inode_gid(leaf, item, inode->i_gid);
btrfs_set_inode_size(leaf, item, BTRFS_I(inode)->disk_i_size);
@@ -2618,6 +2628,11 @@
btrfs_set_inode_rdev(leaf, item, inode->i_rdev);
btrfs_set_inode_flags(leaf, item, BTRFS_I(inode)->flags);
btrfs_set_inode_block_group(leaf, item, BTRFS_I(inode)->block_group);
+
+ if (leaf->map_token) {
+ unmap_extent_buffer(leaf, leaf->map_token, KM_USER1);
+ leaf->map_token = NULL;
+ }
}
/*
@@ -4207,10 +4222,8 @@
struct btrfs_key found_key;
struct btrfs_path *path;
int ret;
- u32 nritems;
struct extent_buffer *leaf;
int slot;
- int advance;
unsigned char d_type;
int over = 0;
u32 di_cur;
@@ -4253,27 +4266,19 @@
ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
if (ret < 0)
goto err;
- advance = 0;
while (1) {
leaf = path->nodes[0];
- nritems = btrfs_header_nritems(leaf);
slot = path->slots[0];
- if (advance || slot >= nritems) {
- if (slot >= nritems - 1) {
- ret = btrfs_next_leaf(root, path);
- if (ret)
- break;
- leaf = path->nodes[0];
- nritems = btrfs_header_nritems(leaf);
- slot = path->slots[0];
- } else {
- slot++;
- path->slots[0]++;
- }
+ if (slot >= btrfs_header_nritems(leaf)) {
+ ret = btrfs_next_leaf(root, path);
+ if (ret < 0)
+ goto err;
+ else if (ret > 0)
+ break;
+ continue;
}
- advance = 1;
item = btrfs_item_nr(leaf, slot);
btrfs_item_key_to_cpu(leaf, &found_key, slot);
@@ -4282,7 +4287,7 @@
if (btrfs_key_type(&found_key) != key_type)
break;
if (found_key.offset < filp->f_pos)
- continue;
+ goto next;
filp->f_pos = found_key.offset;
@@ -4335,6 +4340,8 @@
di_cur += di_len;
di = (struct btrfs_dir_item *)((char *)di + di_len);
}
+next:
+ path->slots[0]++;
}
/* Reached end of directory/root. Bump pos past the last item. */
@@ -4527,14 +4534,17 @@
BUG_ON(!path);
inode = new_inode(root->fs_info->sb);
- if (!inode)
+ if (!inode) {
+ btrfs_free_path(path);
return ERR_PTR(-ENOMEM);
+ }
if (dir) {
trace_btrfs_inode_request(dir);
ret = btrfs_set_inode_index(dir, index);
if (ret) {
+ btrfs_free_path(path);
iput(inode);
return ERR_PTR(ret);
}
@@ -4834,9 +4844,6 @@
if (inode->i_nlink == ~0U)
return -EMLINK;
- btrfs_inc_nlink(inode);
- inode->i_ctime = CURRENT_TIME;
-
err = btrfs_set_inode_index(dir, &index);
if (err)
goto fail;
@@ -4852,6 +4859,9 @@
goto fail;
}
+ btrfs_inc_nlink(inode);
+ inode->i_ctime = CURRENT_TIME;
+
btrfs_set_trans_block_group(trans, dir);
ihold(inode);
@@ -5221,7 +5231,7 @@
btrfs_mark_buffer_dirty(leaf);
}
set_extent_uptodate(io_tree, em->start,
- extent_map_end(em) - 1, GFP_NOFS);
+ extent_map_end(em) - 1, NULL, GFP_NOFS);
goto insert;
} else {
printk(KERN_ERR "btrfs unknown found_type %d\n", found_type);
@@ -5428,17 +5438,30 @@
}
static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
+ struct extent_map *em,
u64 start, u64 len)
{
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_trans_handle *trans;
- struct extent_map *em;
struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
struct btrfs_key ins;
u64 alloc_hint;
int ret;
+ bool insert = false;
- btrfs_drop_extent_cache(inode, start, start + len - 1, 0);
+ /*
+ * Ok if the extent map we looked up is a hole and is for the exact
+ * range we want, there is no reason to allocate a new one, however if
+ * it is not right then we need to free this one and drop the cache for
+ * our range.
+ */
+ if (em->block_start != EXTENT_MAP_HOLE || em->start != start ||
+ em->len != len) {
+ free_extent_map(em);
+ em = NULL;
+ insert = true;
+ btrfs_drop_extent_cache(inode, start, start + len - 1, 0);
+ }
trans = btrfs_join_transaction(root, 0);
if (IS_ERR(trans))
@@ -5454,10 +5477,12 @@
goto out;
}
- em = alloc_extent_map(GFP_NOFS);
if (!em) {
- em = ERR_PTR(-ENOMEM);
- goto out;
+ em = alloc_extent_map(GFP_NOFS);
+ if (!em) {
+ em = ERR_PTR(-ENOMEM);
+ goto out;
+ }
}
em->start = start;
@@ -5467,9 +5492,15 @@
em->block_start = ins.objectid;
em->block_len = ins.offset;
em->bdev = root->fs_info->fs_devices->latest_bdev;
+
+ /*
+ * We need to do this because if we're using the original em we searched
+ * for, we could have EXTENT_FLAG_VACANCY set, and we don't want that.
+ */
+ em->flags = 0;
set_bit(EXTENT_FLAG_PINNED, &em->flags);
- while (1) {
+ while (insert) {
write_lock(&em_tree->lock);
ret = add_extent_mapping(em_tree, em);
write_unlock(&em_tree->lock);
@@ -5687,8 +5718,7 @@
* it above
*/
len = bh_result->b_size;
- free_extent_map(em);
- em = btrfs_new_extent_direct(inode, start, len);
+ em = btrfs_new_extent_direct(inode, em, start, len);
if (IS_ERR(em))
return PTR_ERR(em);
len = min(len, em->len - (start - em->start));
@@ -5851,8 +5881,10 @@
}
add_pending_csums(trans, inode, ordered->file_offset, &ordered->list);
- btrfs_ordered_update_i_size(inode, 0, ordered);
- btrfs_update_inode(trans, root, inode);
+ ret = btrfs_ordered_update_i_size(inode, 0, ordered);
+ if (!ret)
+ btrfs_update_inode(trans, root, inode);
+ ret = 0;
out_unlock:
unlock_extent_cached(&BTRFS_I(inode)->io_tree, ordered->file_offset,
ordered->file_offset + ordered->len - 1,
@@ -5938,7 +5970,7 @@
static inline int __btrfs_submit_dio_bio(struct bio *bio, struct inode *inode,
int rw, u64 file_offset, int skip_sum,
- u32 *csums)
+ u32 *csums, int async_submit)
{
int write = rw & REQ_WRITE;
struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -5949,13 +5981,24 @@
if (ret)
goto err;
- if (write && !skip_sum) {
+ if (skip_sum)
+ goto map;
+
+ if (write && async_submit) {
ret = btrfs_wq_submit_bio(root->fs_info,
inode, rw, bio, 0, 0,
file_offset,
__btrfs_submit_bio_start_direct_io,
__btrfs_submit_bio_done);
goto err;
+ } else if (write) {
+ /*
+ * If we aren't doing async submit, calculate the csum of the
+ * bio now.
+ */
+ ret = btrfs_csum_one_bio(root, inode, bio, file_offset, 1);
+ if (ret)
+ goto err;
} else if (!skip_sum) {
ret = btrfs_lookup_bio_sums_dio(root, inode, bio,
file_offset, csums);
@@ -5963,7 +6006,8 @@
goto err;
}
- ret = btrfs_map_bio(root, rw, bio, 0, 1);
+map:
+ ret = btrfs_map_bio(root, rw, bio, 0, async_submit);
err:
bio_put(bio);
return ret;
@@ -5985,15 +6029,9 @@
int nr_pages = 0;
u32 *csums = dip->csums;
int ret = 0;
+ int async_submit = 0;
int write = rw & REQ_WRITE;
- bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev, start_sector, GFP_NOFS);
- if (!bio)
- return -ENOMEM;
- bio->bi_private = dip;
- bio->bi_end_io = btrfs_end_dio_bio;
- atomic_inc(&dip->pending_bios);
-
map_length = orig_bio->bi_size;
ret = btrfs_map_block(map_tree, READ, start_sector << 9,
&map_length, NULL, 0);
@@ -6002,6 +6040,19 @@
return -EIO;
}
+ if (map_length >= orig_bio->bi_size) {
+ bio = orig_bio;
+ goto submit;
+ }
+
+ async_submit = 1;
+ bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev, start_sector, GFP_NOFS);
+ if (!bio)
+ return -ENOMEM;
+ bio->bi_private = dip;
+ bio->bi_end_io = btrfs_end_dio_bio;
+ atomic_inc(&dip->pending_bios);
+
while (bvec <= (orig_bio->bi_io_vec + orig_bio->bi_vcnt - 1)) {
if (unlikely(map_length < submit_len + bvec->bv_len ||
bio_add_page(bio, bvec->bv_page, bvec->bv_len,
@@ -6015,7 +6066,7 @@
atomic_inc(&dip->pending_bios);
ret = __btrfs_submit_dio_bio(bio, inode, rw,
file_offset, skip_sum,
- csums);
+ csums, async_submit);
if (ret) {
bio_put(bio);
atomic_dec(&dip->pending_bios);
@@ -6052,8 +6103,9 @@
}
}
+submit:
ret = __btrfs_submit_dio_bio(bio, inode, rw, file_offset, skip_sum,
- csums);
+ csums, async_submit);
if (!ret)
return 0;
@@ -6148,6 +6200,7 @@
unsigned long nr_segs)
{
int seg;
+ int i;
size_t size;
unsigned long addr;
unsigned blocksize_mask = root->sectorsize - 1;
@@ -6162,8 +6215,22 @@
addr = (unsigned long)iov[seg].iov_base;
size = iov[seg].iov_len;
end += size;
- if ((addr & blocksize_mask) || (size & blocksize_mask))
+ if ((addr & blocksize_mask) || (size & blocksize_mask))
goto out;
+
+ /* If this is a write we don't need to check anymore */
+ if (rw & WRITE)
+ continue;
+
+ /*
+ * Check to make sure we don't have duplicate iov_base's in this
+ * iovec, if so return EINVAL, otherwise we'll get csum errors
+ * when reading back.
+ */
+ for (i = seg + 1; i < nr_segs; i++) {
+ if (iov[seg].iov_base == iov[i].iov_base)
+ goto out;
+ }
}
retval = 0;
out:
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cfc264f..ffb48d6c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2287,7 +2287,7 @@
struct btrfs_ioctl_space_info space;
struct btrfs_ioctl_space_info *dest;
struct btrfs_ioctl_space_info *dest_orig;
- struct btrfs_ioctl_space_info *user_dest;
+ struct btrfs_ioctl_space_info __user *user_dest;
struct btrfs_space_info *info;
u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
BTRFS_BLOCK_GROUP_SYSTEM,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 58e7de9..0ac712e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -159,7 +159,7 @@
Opt_compress_type, Opt_compress_force, Opt_compress_force_type,
Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
- Opt_enospc_debug, Opt_err,
+ Opt_enospc_debug, Opt_subvolrootid, Opt_err,
};
static match_table_t tokens = {
@@ -189,6 +189,7 @@
{Opt_clear_cache, "clear_cache"},
{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
{Opt_enospc_debug, "enospc_debug"},
+ {Opt_subvolrootid, "subvolrootid=%d"},
{Opt_err, NULL},
};
@@ -232,6 +233,7 @@
break;
case Opt_subvol:
case Opt_subvolid:
+ case Opt_subvolrootid:
case Opt_device:
/*
* These are parsed by btrfs_parse_early_options
@@ -388,7 +390,7 @@
*/
static int btrfs_parse_early_options(const char *options, fmode_t flags,
void *holder, char **subvol_name, u64 *subvol_objectid,
- struct btrfs_fs_devices **fs_devices)
+ u64 *subvol_rootid, struct btrfs_fs_devices **fs_devices)
{
substring_t args[MAX_OPT_ARGS];
char *opts, *orig, *p;
@@ -429,6 +431,18 @@
*subvol_objectid = intarg;
}
break;
+ case Opt_subvolrootid:
+ intarg = 0;
+ error = match_int(&args[0], &intarg);
+ if (!error) {
+ /* we want the original fs_tree */
+ if (!intarg)
+ *subvol_rootid =
+ BTRFS_FS_TREE_OBJECTID;
+ else
+ *subvol_rootid = intarg;
+ }
+ break;
case Opt_device:
error = btrfs_scan_one_device(match_strdup(&args[0]),
flags, holder, fs_devices);
@@ -736,6 +750,7 @@
fmode_t mode = FMODE_READ;
char *subvol_name = NULL;
u64 subvol_objectid = 0;
+ u64 subvol_rootid = 0;
int error = 0;
if (!(flags & MS_RDONLY))
@@ -743,7 +758,7 @@
error = btrfs_parse_early_options(data, mode, fs_type,
&subvol_name, &subvol_objectid,
- &fs_devices);
+ &subvol_rootid, &fs_devices);
if (error)
return ERR_PTR(error);
@@ -807,15 +822,17 @@
s->s_flags |= MS_ACTIVE;
}
- root = get_default_root(s, subvol_objectid);
- if (IS_ERR(root)) {
- error = PTR_ERR(root);
- deactivate_locked_super(s);
- goto error_free_subvol_name;
- }
/* if they gave us a subvolume name bind mount into that */
if (strcmp(subvol_name, ".")) {
struct dentry *new_root;
+
+ root = get_default_root(s, subvol_rootid);
+ if (IS_ERR(root)) {
+ error = PTR_ERR(root);
+ deactivate_locked_super(s);
+ goto error_free_subvol_name;
+ }
+
mutex_lock(&root->d_inode->i_mutex);
new_root = lookup_one_len(subvol_name, root,
strlen(subvol_name));
@@ -836,6 +853,13 @@
}
dput(root);
root = new_root;
+ } else {
+ root = get_default_root(s, subvol_objectid);
+ if (IS_ERR(root)) {
+ error = PTR_ERR(root);
+ deactivate_locked_super(s);
+ goto error_free_subvol_name;
+ }
}
kfree(subvol_name);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 5b158da..c571734 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -32,10 +32,8 @@
static noinline void put_transaction(struct btrfs_transaction *transaction)
{
- WARN_ON(transaction->use_count == 0);
- transaction->use_count--;
- if (transaction->use_count == 0) {
- list_del_init(&transaction->list);
+ WARN_ON(atomic_read(&transaction->use_count) == 0);
+ if (atomic_dec_and_test(&transaction->use_count)) {
memset(transaction, 0, sizeof(*transaction));
kmem_cache_free(btrfs_transaction_cachep, transaction);
}
@@ -60,14 +58,14 @@
if (!cur_trans)
return -ENOMEM;
root->fs_info->generation++;
- cur_trans->num_writers = 1;
+ atomic_set(&cur_trans->num_writers, 1);
cur_trans->num_joined = 0;
cur_trans->transid = root->fs_info->generation;
init_waitqueue_head(&cur_trans->writer_wait);
init_waitqueue_head(&cur_trans->commit_wait);
cur_trans->in_commit = 0;
cur_trans->blocked = 0;
- cur_trans->use_count = 1;
+ atomic_set(&cur_trans->use_count, 1);
cur_trans->commit_done = 0;
cur_trans->start_time = get_seconds();
@@ -88,7 +86,7 @@
root->fs_info->running_transaction = cur_trans;
spin_unlock(&root->fs_info->new_trans_lock);
} else {
- cur_trans->num_writers++;
+ atomic_inc(&cur_trans->num_writers);
cur_trans->num_joined++;
}
@@ -145,7 +143,7 @@
cur_trans = root->fs_info->running_transaction;
if (cur_trans && cur_trans->blocked) {
DEFINE_WAIT(wait);
- cur_trans->use_count++;
+ atomic_inc(&cur_trans->use_count);
while (1) {
prepare_to_wait(&root->fs_info->transaction_wait, &wait,
TASK_UNINTERRUPTIBLE);
@@ -181,6 +179,7 @@
{
struct btrfs_trans_handle *h;
struct btrfs_transaction *cur_trans;
+ int retries = 0;
int ret;
if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
@@ -204,7 +203,7 @@
}
cur_trans = root->fs_info->running_transaction;
- cur_trans->use_count++;
+ atomic_inc(&cur_trans->use_count);
if (type != TRANS_JOIN_NOLOCK)
mutex_unlock(&root->fs_info->trans_mutex);
@@ -224,10 +223,18 @@
if (num_items > 0) {
ret = btrfs_trans_reserve_metadata(h, root, num_items);
- if (ret == -EAGAIN) {
+ if (ret == -EAGAIN && !retries) {
+ retries++;
btrfs_commit_transaction(h, root);
goto again;
+ } else if (ret == -EAGAIN) {
+ /*
+ * We have already retried and got EAGAIN, so really we
+ * don't have space, so set ret to -ENOSPC.
+ */
+ ret = -ENOSPC;
}
+
if (ret < 0) {
btrfs_end_transaction(h, root);
return ERR_PTR(ret);
@@ -327,7 +334,7 @@
goto out_unlock; /* nothing committing|committed */
}
- cur_trans->use_count++;
+ atomic_inc(&cur_trans->use_count);
mutex_unlock(&root->fs_info->trans_mutex);
wait_for_commit(root, cur_trans);
@@ -457,18 +464,14 @@
wake_up_process(info->transaction_kthread);
}
- if (lock)
- mutex_lock(&info->trans_mutex);
WARN_ON(cur_trans != info->running_transaction);
- WARN_ON(cur_trans->num_writers < 1);
- cur_trans->num_writers--;
+ WARN_ON(atomic_read(&cur_trans->num_writers) < 1);
+ atomic_dec(&cur_trans->num_writers);
smp_mb();
if (waitqueue_active(&cur_trans->writer_wait))
wake_up(&cur_trans->writer_wait);
put_transaction(cur_trans);
- if (lock)
- mutex_unlock(&info->trans_mutex);
if (current->journal_info == trans)
current->journal_info = NULL;
@@ -1178,7 +1181,7 @@
/* take transaction reference */
mutex_lock(&root->fs_info->trans_mutex);
cur_trans = trans->transaction;
- cur_trans->use_count++;
+ atomic_inc(&cur_trans->use_count);
mutex_unlock(&root->fs_info->trans_mutex);
btrfs_end_transaction(trans, root);
@@ -1237,7 +1240,7 @@
mutex_lock(&root->fs_info->trans_mutex);
if (cur_trans->in_commit) {
- cur_trans->use_count++;
+ atomic_inc(&cur_trans->use_count);
mutex_unlock(&root->fs_info->trans_mutex);
btrfs_end_transaction(trans, root);
@@ -1259,7 +1262,7 @@
prev_trans = list_entry(cur_trans->list.prev,
struct btrfs_transaction, list);
if (!prev_trans->commit_done) {
- prev_trans->use_count++;
+ atomic_inc(&prev_trans->use_count);
mutex_unlock(&root->fs_info->trans_mutex);
wait_for_commit(root, prev_trans);
@@ -1300,14 +1303,14 @@
TASK_UNINTERRUPTIBLE);
smp_mb();
- if (cur_trans->num_writers > 1)
+ if (atomic_read(&cur_trans->num_writers) > 1)
schedule_timeout(MAX_SCHEDULE_TIMEOUT);
else if (should_grow)
schedule_timeout(1);
mutex_lock(&root->fs_info->trans_mutex);
finish_wait(&cur_trans->writer_wait, &wait);
- } while (cur_trans->num_writers > 1 ||
+ } while (atomic_read(&cur_trans->num_writers) > 1 ||
(should_grow && cur_trans->num_joined != joined));
ret = create_pending_snapshots(trans, root->fs_info);
@@ -1394,6 +1397,7 @@
wake_up(&cur_trans->commit_wait);
+ list_del_init(&cur_trans->list);
put_transaction(cur_trans);
put_transaction(cur_trans);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 229a594..e441acc 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -27,11 +27,11 @@
* total writers in this transaction, it must be zero before the
* transaction can end
*/
- unsigned long num_writers;
+ atomic_t num_writers;
unsigned long num_joined;
int in_commit;
- int use_count;
+ atomic_t use_count;
int commit_done;
int blocked;
struct list_head list;
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index a5303b8..cfd6605 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -180,11 +180,10 @@
struct btrfs_path *path;
struct extent_buffer *leaf;
struct btrfs_dir_item *di;
- int ret = 0, slot, advance;
+ int ret = 0, slot;
size_t total_size = 0, size_left = size;
unsigned long name_ptr;
size_t name_len;
- u32 nritems;
/*
* ok we want all objects associated with this id.
@@ -204,34 +203,24 @@
ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
if (ret < 0)
goto err;
- advance = 0;
+
while (1) {
leaf = path->nodes[0];
- nritems = btrfs_header_nritems(leaf);
slot = path->slots[0];
/* this is where we start walking through the path */
- if (advance || slot >= nritems) {
+ if (slot >= btrfs_header_nritems(leaf)) {
/*
* if we've reached the last slot in this leaf we need
* to go to the next leaf and reset everything
*/
- if (slot >= nritems-1) {
- ret = btrfs_next_leaf(root, path);
- if (ret)
- break;
- leaf = path->nodes[0];
- nritems = btrfs_header_nritems(leaf);
- slot = path->slots[0];
- } else {
- /*
- * just walking through the slots on this leaf
- */
- slot++;
- path->slots[0]++;
- }
+ ret = btrfs_next_leaf(root, path);
+ if (ret < 0)
+ goto err;
+ else if (ret > 0)
+ break;
+ continue;
}
- advance = 1;
btrfs_item_key_to_cpu(leaf, &found_key, slot);
@@ -250,7 +239,7 @@
/* we are just looking for how big our buffer needs to be */
if (!size)
- continue;
+ goto next;
if (!buffer || (name_len + 1) > size_left) {
ret = -ERANGE;
@@ -263,6 +252,8 @@
size_left -= name_len + 1;
buffer += name_len + 1;
+next:
+ path->slots[0]++;
}
ret = total_size;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index dd6628d..dfa5327 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3124,11 +3124,16 @@
/* for the /proc/ directory itself, after non-process stuff has been done */
int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
{
- unsigned int nr = filp->f_pos - FIRST_PROCESS_ENTRY;
- struct task_struct *reaper = get_proc_task(filp->f_path.dentry->d_inode);
+ unsigned int nr;
+ struct task_struct *reaper;
struct tgid_iter iter;
struct pid_namespace *ns;
+ if (filp->f_pos >= PID_MAX_LIMIT + TGID_OFFSET)
+ goto out_no_task;
+ nr = filp->f_pos - FIRST_PROCESS_ENTRY;
+
+ reaper = get_proc_task(filp->f_path.dentry->d_inode);
if (!reaper)
goto out_no_task;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ec0357d..cbbfd98 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -196,7 +196,6 @@
typedef int (make_request_fn) (struct request_queue *q, struct bio *bio);
typedef int (prep_rq_fn) (struct request_queue *, struct request *);
typedef void (unprep_rq_fn) (struct request_queue *, struct request *);
-typedef void (unplugged_fn) (struct request_queue *);
struct bio_vec;
struct bvec_merge_data {
@@ -284,7 +283,6 @@
rq_timed_out_fn *rq_timed_out_fn;
dma_drain_needed_fn *dma_drain_needed;
lld_busy_fn *lld_busy_fn;
- unplugged_fn *unplugged_fn;
/*
* Dispatch queue sorting
@@ -699,7 +697,7 @@
extern void blk_stop_queue(struct request_queue *q);
extern void blk_sync_queue(struct request_queue *q);
extern void __blk_stop_queue(struct request_queue *q);
-extern void __blk_run_queue(struct request_queue *q, bool force_kblockd);
+extern void __blk_run_queue(struct request_queue *q);
extern void blk_run_queue(struct request_queue *);
extern int blk_rq_map_user(struct request_queue *, struct request *,
struct rq_map_data *, void __user *, unsigned long,
@@ -843,7 +841,6 @@
extern void blk_queue_update_dma_alignment(struct request_queue *, int);
extern void blk_queue_softirq_done(struct request_queue *, softirq_done_fn *);
extern void blk_queue_rq_timed_out(struct request_queue *, rq_timed_out_fn *);
-extern void blk_queue_unplugged(struct request_queue *, unplugged_fn *);
extern void blk_queue_rq_timeout(struct request_queue *, unsigned int);
extern void blk_queue_flush(struct request_queue *q, unsigned int flush);
extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev);
@@ -860,8 +857,13 @@
struct blk_plug {
unsigned long magic;
struct list_head list;
+ struct list_head cb_list;
unsigned int should_sort;
};
+struct blk_plug_cb {
+ struct list_head list;
+ void (*callback)(struct blk_plug_cb *);
+};
extern void blk_start_plug(struct blk_plug *);
extern void blk_finish_plug(struct blk_plug *);
@@ -887,7 +889,7 @@
{
struct blk_plug *plug = tsk->plug;
- return plug && !list_empty(&plug->list);
+ return plug && (!list_empty(&plug->list) || !list_empty(&plug->cb_list));
}
/*
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index e276883..32a4423 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -197,7 +197,6 @@
struct dm_target_callbacks {
struct list_head list;
int (*congested_fn) (struct dm_target_callbacks *, int);
- void (*unplug_fn)(struct dm_target_callbacks *);
};
int dm_register_target(struct target_type *t);
diff --git a/include/linux/input.h b/include/linux/input.h
index f3a7794..771d6d8 100644
--- a/include/linux/input.h
+++ b/include/linux/input.h
@@ -167,6 +167,7 @@
#define SYN_REPORT 0
#define SYN_CONFIG 1
#define SYN_MT_REPORT 2
+#define SYN_DROPPED 3
/*
* Keys and buttons
@@ -553,8 +554,8 @@
#define KEY_DVD 0x185 /* Media Select DVD */
#define KEY_AUX 0x186
#define KEY_MP3 0x187
-#define KEY_AUDIO 0x188
-#define KEY_VIDEO 0x189
+#define KEY_AUDIO 0x188 /* AL Audio Browser */
+#define KEY_VIDEO 0x189 /* AL Movie Browser */
#define KEY_DIRECTORY 0x18a
#define KEY_LIST 0x18b
#define KEY_MEMO 0x18c /* Media Select Messages */
@@ -603,8 +604,9 @@
#define KEY_FRAMEFORWARD 0x1b5
#define KEY_CONTEXT_MENU 0x1b6 /* GenDesc - system context menu */
#define KEY_MEDIA_REPEAT 0x1b7 /* Consumer - transport control */
-#define KEY_10CHANNELSUP 0x1b8 /* 10 channels up (10+) */
-#define KEY_10CHANNELSDOWN 0x1b9 /* 10 channels down (10-) */
+#define KEY_10CHANNELSUP 0x1b8 /* 10 channels up (10+) */
+#define KEY_10CHANNELSDOWN 0x1b9 /* 10 channels down (10-) */
+#define KEY_IMAGES 0x1ba /* AL Image Browser */
#define KEY_DEL_EOL 0x1c0
#define KEY_DEL_EOS 0x1c1
diff --git a/include/linux/input/mt.h b/include/linux/input/mt.h
index b3ac06a..318bb82 100644
--- a/include/linux/input/mt.h
+++ b/include/linux/input/mt.h
@@ -48,6 +48,12 @@
input_event(dev, EV_ABS, ABS_MT_SLOT, slot);
}
+static inline bool input_is_mt_axis(int axis)
+{
+ return axis == ABS_MT_SLOT ||
+ (axis >= ABS_MT_FIRST && axis <= ABS_MT_LAST);
+}
+
void input_mt_report_slot_state(struct input_dev *dev,
unsigned int tool_type, bool active);
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 31afb7e..cdced84 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -117,7 +117,7 @@
*/
extern struct pid *find_get_pid(int nr);
extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
-int next_pidmap(struct pid_namespace *pid_ns, int last);
+int next_pidmap(struct pid_namespace *pid_ns, unsigned int last);
extern struct pid *alloc_pid(struct pid_namespace *ns);
extern void free_pid(struct pid *pid);
diff --git a/kernel/pid.c b/kernel/pid.c
index 02f2212..57a8346 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -217,11 +217,14 @@
return -1;
}
-int next_pidmap(struct pid_namespace *pid_ns, int last)
+int next_pidmap(struct pid_namespace *pid_ns, unsigned int last)
{
int offset;
struct pidmap *map, *end;
+ if (last >= PID_MAX_LIMIT)
+ return -1;
+
offset = (last + 1) & BITS_PER_PAGE_MASK;
map = &pid_ns->pidmap[(last + 1)/BITS_PER_PAGE];
end = &pid_ns->pidmap[PIDMAP_ENTRIES];