Commit Graph

29965 Commits

Author SHA1 Message Date
Jiri Olsa
489c75c3b3 ftrace, perf: Add add/del tracepoint perf registration actions
Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
perf event schedule in/out actions.

The add action is invoked for when the perf event is scheduled in,
while the del action is invoked when the event is scheduled out.

Link: http://lkml.kernel.org/r/1329317514-8131-4-git-send-email-jolsa@redhat.com

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21 11:08:25 -05:00
Jiri Olsa
ceec0b6fc7 ftrace, perf: Add open/close tracepoint perf registration actions
Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
register/unregister from open/close actions.

The register/unregister actions are invoked for the first/last
tracepoint user when opening/closing the event.

The open/close actions are invoked for each tracepoint user when
opening/closing the event.

Link: http://lkml.kernel.org/r/1329317514-8131-3-git-send-email-jolsa@redhat.com

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21 11:08:24 -05:00
Jiri Olsa
e248491ac2 ftrace: Add enable/disable ftrace_ops control interface
Adding a way to temporarily enable/disable ftrace_ops. The change
follows the same way as 'global' ftrace_ops are done.

Introducing 2 global ftrace_ops - control_ops and ftrace_control_list
which take over all ftrace_ops registered with FTRACE_OPS_FL_CONTROL
flag. In addition new per cpu flag called 'disabled' is also added to
ftrace_ops to provide the control information for each cpu.

When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is
set as disabled for all cpus.

The ftrace_control_list contains all the registered 'control' ftrace_ops.
The control_ops provides function which iterates ftrace_control_list
and does the check for 'disabled' flag on current cpu.

Adding 3 inline functions:
  ftrace_function_local_disable/ftrace_function_local_enable
  - enable/disable the ftrace_ops on current cpu
  ftrace_function_local_disabled
  - get disabled ftrace_ops::disabled value for current cpu

Link: http://lkml.kernel.org/r/1329317514-8131-2-git-send-email-jolsa@redhat.com

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21 11:08:23 -05:00
Joerg Willmann
88ba136d66 netfilter: ebtables: fix alignment problem in ppc
ebt_among extension of ebtables uses __alignof__(_xt_align) while the
corresponding kernel module uses __alignof__(ebt_replace) to determine
the alignment in EBT_ALIGN().

These are the results of these values on different platforms:

x86 x86_64 ppc
__alignof__(_xt_align) 4 8 8
__alignof__(ebt_replace) 4 8 4

ebtables fails to add rules which use the among extension.

I'm using kernel 2.6.33 and ebtables 2.0.10-4

According to Bart De Schuymer, userspace alignment was changed to
_xt_align to fix an alignment issue on a userspace32-kernel64 system
(he thinks it was for an ARM device). So userspace must be right.
The kernel alignment macro needs to change so it also uses _xt_align
instead of ebt_replace. The userspace changes date back from
June 29, 2009.

Signed-off-by: Joerg Willmann <joe@clnt.de>
Signed-off by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-21 13:29:06 +01:00
Stephen Boyd
2475143444 regulator: Remove ifdefs for debugfs code
If CONFIG_DEBUG_FS=y debugfs functions will never return an
ERR_PTR. Instead they'll return NULL. The intent is to remove
ifdefs in calling code.

Update the code to reflect this. We gain an extra dentry pointer
per struct regulator and struct regulator_dev but that should be
ok because most distros have debugfs compiled in anyway.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-02-21 09:56:51 +00:00
Linus Torvalds
8ebbfb4957 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Assorted fixes, sat in -next for a week or so...

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ocfs2: deal with wraparounds of i_nlink in ocfs2_rename()
  vfs: fix compat_sys_stat() handling of overflows in st_nlink
  quota: Fix deadlock with suspend and quotas
  vfs: Provide function to get superblock and wait for it to thaw
  vfs: fix panic in __d_lookup() with high dentry hashtable counts
  autofs4 - fix lockdep splat in autofs
  vfs: fix d_inode_lookup() dentry ref leak
2012-02-20 16:13:58 -08:00
Yongqiang Yang
0c2022eccb jbd2: allocate transaction from separate slab cache
There is normally only a handful of these active at any one time, but
putting them in a separate slab cache makes debugging memory
corruption problems easier.  Manish Katiyar also wanted this make it
easier to test memory failure scenarios in the jbd2 layer.

Cc: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-02-20 17:53:02 -05:00
Mark Brown
aca1e172a1 Merge branch 'topic/patch' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap into regmap-drivers 2012-02-20 21:21:33 +00:00
Mark Brown
3bf06a1ad9 Merge branch 'topic/devm' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap into regmap-drivers 2012-02-20 21:21:25 +00:00
Mark Brown
a6539c3294 regmap: Allow users to query the size of register values
Generic infrastructure based on top of regmap may want to operate on
blocks of data and therefore find it useful to find the size of the
register values. Provide an accessor operation for this.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-02-20 21:17:08 +00:00
H. Peter Anvin
ff88943a14 aio: Use __kernel_ulong_t to define aio_context_t
Rather than using "unsigned long" which is ABI-dependent, use
__kernel_ulong_t to define the externally visible type aio_context_t.

Note: the change in this form will cause unsigned long/unsigned int
differences on existing ABIs.  If that is unacceptable we may have to
define a new type.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
2012-02-20 12:48:48 -08:00
H. Peter Anvin
6684ba202b compat: Add helper functions to read/write struct timeval, timespec
Add helper functions to read and write struct timeval and struct
timespec from userspace.  We already had helper functions for reading
and writing struct compat_timespec; add a set of functions to do the
same with struct timeval, and add a second suite of functions which
can be sensitive to COMPAT_USE_64BIT_TIME and access either 32- or
64-bit time structures.

This also exports these helper functions to modules.

Rename the existing inlines for converting between struct
compat_timeval and native struct timespec so we can have a saner
naming convention for the exported functions.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-02-20 12:48:47 -08:00
H. J. Lu
45e8778129 compat: Introduce COMPAT_USE_64BIT_TIME
Allow a compatibility ABI to use a 64-bit time_t and 64-bit members in
struct timeval and struct timespec to avoid the Y2038 problem.

This will be used for the x32 ABI.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-02-20 12:48:47 -08:00
H. Peter Anvin
109a1f32d0 sysinfo: Use explicit types in <linux/sysinfo.h>
Change <linux/sysinfo.h> to use explicitly sized types.  Replace
long/unsigned long with __kernel_[u]long_t so that a non-legacy 32-bit
ABI running on a 64-bit kernel can export those as 64-bit types.

Originally-by: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-02-20 12:48:47 -08:00
H. Peter Anvin
d8e5ddef21 sysinfo: Move struct sysinfo to a separate header file
struct sysinfo is just about the only thing exported to userspace from
<linux/kernel.h>, so move it into a separate header file with a
residual #include in <linux/kernel.h>.

Originally-by: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-4pr1xnnksprt7t0h3w5fw4rv@git.kernel.org
2012-02-20 12:48:46 -08:00
Dmitry Kasatkin
59cca653a6 digsig: changed type of the timestamp
time_t was used in the signature and key packet headers,
which is typedef of long and is different on 32 and 64 bit architectures.
Signature and key format should be independent of architecture.
Similar to GPG, I have changed the type to uint32_t.

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-02-20 19:46:36 +11:00
Mark Brown
fa2c8f4017 Merge tag 'v3.3-rc4' into for-3.4 in order to resolve the conflict
resolved below within the FSI driver and allow the application of the
dmaeengine conversion that depends on this resolution.

Linux 3.3-rc4

Conflicts:
	sound/soc/sh/fsi.c
2012-02-19 18:35:12 -08:00
David S. Miller
32efe08d77 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c

Small minor conflict in bnx2x, wherein one commit changed how
statistics were stored in software, and another commit
fixed endianness bugs wrt. reading the values provided by
the chip in memory.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-19 16:03:15 -05:00
Dan Williams
81c757bc69 [SCSI] libsas: execute transport link resets with libata-eh via host workqueue
Link resets leave ata affiliations intact, so arrange for libsas to make
an effort to avoid dropping the device due to a slow-to-recover link.
Towards this end carry out reset in the host workqueue so that it can
check for ata devices and kick the reset request to libata.  Hard
resets, in contrast, bypass libata since they are meant for associating
an ata device with another initiator in the domain (tears down
affiliations).

Need to add a new transport_sas_phy_reset() since the current
sas_phy_reset() is a utility function to libsas lldds.  They are not
prepared for it to loop back into eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:13:51 -06:00
David Howells
cf420048b3 Delete the __FD_*() funcs for operating on fd_set from linux/time.h
Delete the __FD_*() functions for operating on fd_set structs from
linux/time.h as they're no longer used within the kernel with the preceding
patch and are not exported to userspace.

Whilst linux/time.h *does* export the FD_*() equivalents as wrappers around
__FD_*(), userspace provides its own definition of __FD_*().

Note that the definition of FD_ZERO() in linux/time.h may not be used with the
fd_sets associated with struct fdtable as the fd_set may have been allocated in
a truncated fashion.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216175006.23314.18984.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
2012-02-19 10:32:28 -08:00
David Howells
1fd36adcd9 Replace the fd_sets in struct fdtable with an array of unsigned longs
Replace the fd_sets in struct fdtable with an array of unsigned longs and then
use the standard non-atomic bit operations rather than the FD_* macros.

This:

 (1) Removes the abuses of struct fd_set:

     (a) Since we don't want to allocate a full fd_set the vast majority of the
     	 time, we actually, in effect, just allocate a just-big-enough array of
     	 unsigned longs and cast it to an fd_set type - so why bother with the
     	 fd_set at all?

     (b) Some places outside of the core fdtable handling code (such as
     	 SELinux) want to look inside the array of unsigned longs hidden inside
     	 the fd_set struct for more efficient iteration over the entire set.

 (2) Eliminates the use of FD_*() macros in the kernel completely.

 (3) Permits the __FD_*() macros to be deleted entirely where not exposed to
     userspace.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216174954.23314.48147.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
2012-02-19 10:30:57 -08:00
David Howells
1dce27c5aa Wrap accesses to the fd_sets in struct fdtable
Wrap accesses to the fd_sets in struct fdtable (for recording open files and
close-on-exec flags) so that we can move away from using fd_sets since we
abuse the fd_set structs by not allocating the full-sized structure under
normal circumstances and by non-core code looking at the internals of the
fd_sets.

The first abuse means that use of FD_ZERO() on these fd_sets is not permitted,
since that cannot be told about their abnormal lengths.

This introduces six wrapper functions for setting, clearing and testing
close-on-exec flags and fd-is-open flags:

	void __set_close_on_exec(int fd, struct fdtable *fdt);
	void __clear_close_on_exec(int fd, struct fdtable *fdt);
	bool close_on_exec(int fd, const struct fdtable *fdt);
	void __set_open_fd(int fd, struct fdtable *fdt);
	void __clear_open_fd(int fd, struct fdtable *fdt);
	bool fd_is_open(int fd, const struct fdtable *fdt);

Note that I've prepended '__' to the names of the set/clear functions because
they require the caller to hold a lock to use them.

Note also that I haven't added wrappers for looking behind the scenes at the
the array.  Possibly that should exist too.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
2012-02-19 10:30:52 -08:00
Paolo Bonzini
4fe74b1cb0 [SCSI] virtio-scsi: SCSI driver for QEMU based virtual machines
The virtio-scsi HBA is the basis of an alternative storage stack
for QEMU-based virtual machines (including KVM).  Compared to
virtio-blk it is more scalable, because it supports many LUNs
on a single PCI slot), more powerful (it more easily supports
passthrough of host devices to the guest) and more easily
extensible (new SCSI features implemented by QEMU should not
require updating the driver in the guest).

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 09:50:20 -06:00
Linus Torvalds
a18d3afefa Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: sha512 - use standard ror64()
2012-02-18 15:24:05 -08:00
Russell King
3323761677 MFD: ucb1x00-core: add wakeup support
Add genirq wakeup support for the ucb1x00 device.  This allows an
attached gpio_keys driver to wakeup the system.  Touchscreen is also
possible.

When there are no wakeup sources, ask the platform to assert the reset
signal to avoid any unexpected behaviour; this also puts the reset
signal at the right level when power is removed from the device.

Acked-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-18 23:15:43 +00:00
Russell King
a3364409c4 MFD: ucb1x00: convert to use genirq
Convert the ucb1x00 driver to use genirq's interrupt services, rather
than its own private implementation.  This allows a wider range of
drivers to use the GPIO interrupts (such as the gpio_keys driver)
without being aware of the UCB1x00's private IRQ system.

This prevents the UCB1x00 core driver from being built as a module,
so adjust the configuration to add that restriction.

Acked-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-18 23:15:43 +00:00
Russell King
cf4abfcc0d MFD: mcp-core: remove legacy driver suspend/resume methods
The legacy driver suspend/resume methods are no longer used, so get rid
of them.

Acked-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-18 23:15:42 +00:00
Russell King
5a09b7120a MFD: ucb1x00-core: convert to use dev_pm_ops
Convert the ucb1x00-core driver to use dev_pm_ops rather than the legacy
members in the mcp driver.

Acked-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-18 23:15:42 +00:00
Russell King
cae154767a MFD: ucb1x00-core: use mutexes instead of semaphores
Convert the ucb1x00 driver to use mutexes rather than the depreciated
semaphores for exclusive access to the ADC.

Acked-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-18 23:15:32 +00:00
Russell King
2f7510c607 MFD: ucb1x00-core: add handling for ucb1x00 reset
Provide a way to handle the software controlled ucb1x00 reset signal
from the ucb1x00-core driver without having to code platform specifics
into these drivers.

Acked-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-18 23:15:32 +00:00
Russell King
abe06082d0 MFD: mcp/ucb1x00: separate ucb1x00 driver data from the MCP data
Patch taken from 5dd7bf59e0 (ARM: sa11x0: Implement autoloading of codec
and codec pdata for mcp bus.) by Jochen Friedrich <jochen@scram.de>.

This adds just the codec data part of the patch.

Acked-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-02-18 23:15:30 +00:00
Alex Shi
8028dcea8a slub: per cpu partial statistics change
This patch split the cpu_partial_free into 2 parts: cpu_partial_node, PCP refilling
times from node partial; and same name cpu_partial_free, PCP refilling times in
slab_free slow path. A new statistic 'cpu_partial_drain' is added to get PCP
drain to node partial times. These info are useful when do PCP tunning.

The slabinfo.c code is unchanged, since cpu_partial_node is not on slow path.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-02-18 11:00:09 +02:00
Benny Halevy
d24433cdc9 nfsd41: implement NFS4_SHARE_WANT_NO_DELEG, NFS4_OPEN_DELEGATE_NONE_EXT, why_no_deleg
Respect client request for not getting a delegation in NFSv4.1
Appropriately return delegation "type" NFS4_OPEN_DELEGATE_NONE_EXT
and WND4_NOT_WANTED reason.

[nfsd41: add missing break when encoding op_why_no_deleg]
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-17 18:38:53 -05:00
Tom Tucker
cec56c8ff5 svcrdma: Cleanup sparse warnings in the svcrdma module
The svcrdma transport was un-marshalling requests in-place. This resulted
in sparse warnings due to __beXX data containing both NBO and HBO data.

The code has been restructured to do byte-swapping as the header is
parsed instead of when the header is validated immediately after receipt.

Also moved extern declarations for the workqueue and memory pools to the
private header file.

Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-02-17 18:38:50 -05:00
Mark Brown
9cde5fcd43 regmap: Add stub regmap calls as a build crutch for infrastructure users
Make life easier for subsystems which build infrastructure on top of the
regmap API by providing stub definitions for most of the API so that users
can at least build even if the regmap API is not enabled without having to
add ifdefs if they don't want to.

These stubs are not expected to be useful and should never actually get
called, they just exist so that we can link so there are WARN_ONCE()s in
the stubs.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-02-17 14:51:15 -08:00
Rafael J. Wysocki
c48825251c PM: Add comment describing relationships between PM callbacks to pm.h
The UNIVERSAL_DEV_PM_OPS() macro is slightly misleading, because it
may suggest that it's a good idea to point runtime PM callback
pointers to the same routines as system suspend/resume callbacks
.suspend() and .resume(), which is not the case.  For this reason,
add a comment to include/linux/pm.h, next to the definition of
UNIVERSAL_DEV_PM_OPS(), describing how device PM callbacks are
related to each other.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-02-17 23:36:30 +01:00
Rafael J. Wysocki
bc25cf5089 PM / Sleep: Drop suspend_stats_update()
Since suspend_stats_update() is only called from pm_suspend(),
move its code directly into that function and remove the static
inline definition from include/linux/suspend.h.  Clean_up
pm_suspend() in the process.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
2012-02-17 23:36:23 +01:00
Weston Andros Adamson
0a70219523 NFS: include filelayout DS rpc stats in mountstats
Include RPC statistics from all data servers in /proc/self/mountstats for pNFS
filelayout mounts.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-17 13:39:47 -05:00
Ingo Molnar
09bda4432a Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core 2012-02-17 12:55:07 +01:00
Ingo Molnar
7b2d81d48a uprobes/core: Clean up, refactor and improve the code
Make the uprobes code readable to me:

 - improve the Kconfig text so that a mere mortal gets some idea
   what CONFIG_UPROBES=y is really about

 - do trivial renames to standardize around the uprobes_*() namespace

 - clean up and simplify various code flow details

 - separate basic blocks of functionality

 - line break artifact and white space related removal

 - use standard local varible definition blocks

 - use vertical spacing to make things more readable

 - remove unnecessary volatile

 - restructure comment blocks to make them more uniform and
   more readable in general

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Link: http://lkml.kernel.org/n/tip-ewbwhb8o6navvllsauu7k07p@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-17 10:18:07 +01:00
Srikar Dronamraju
2b14449835 uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints
Add uprobes support to the core kernel, with x86 support.

This commit adds the kernel facilities, the actual uprobes
user-space ABI and perf probe support comes in later commits.

General design:

Uprobes are maintained in an rb-tree indexed by inode and offset
(the offset here is from the start of the mapping). For a unique
(inode, offset) tuple, there can be at most one uprobe in the
rb-tree.

Since the (inode, offset) tuple identifies a unique uprobe, more
than one user may be interested in the same uprobe. This provides
the ability to connect multiple 'consumers' to the same uprobe.

Each consumer defines a handler and a filter (optional). The
'handler' is run every time the uprobe is hit, if it matches the
'filter' criteria.

The first consumer of a uprobe causes the breakpoint to be
inserted at the specified address and subsequent consumers are
appended to this list.  On subsequent probes, the consumer gets
appended to the existing list of consumers. The breakpoint is
removed when the last consumer unregisters. For all other
unregisterations, the consumer is removed from the list of
consumers.

Given a inode, we get a list of the mms that have mapped the
inode. Do the actual registration if mm maps the page where a
probe needs to be inserted/removed.

We use a temporary list to walk through the vmas that map the
inode.

- The number of maps that map the inode, is not known before we
  walk the rmap and keeps changing.
- extending vm_area_struct wasn't recommended, it's a
  size-critical data structure.
- There can be more than one maps of the inode in the same mm.

We add callbacks to the mmap methods to keep an eye on text vmas
that are of interest to uprobes.  When a vma of interest is mapped,
we insert the breakpoint at the right address.

Uprobe works by replacing the instruction at the address defined
by (inode, offset) with the arch specific breakpoint
instruction. We save a copy of the original instruction at the
uprobed address.

This is needed for:

 a. executing the instruction out-of-line (xol).
 b. instruction analysis for any subsequent fixups.
 c. restoring the instruction back when the uprobe is unregistered.

We insert or delete a breakpoint instruction, and this
breakpoint instruction is assumed to be the smallest instruction
available on the platform. For fixed size instruction platforms
this is trivially true, for variable size instruction platforms
the breakpoint instruction is typically the smallest (often a
single byte).

Writing the instruction is done by COWing the page and changing
the instruction during the copy, this even though most platforms
allow atomic writes of the breakpoint instruction. This also
mirrors the behaviour of a ptrace() memory write to a PRIVATE
file map.

The core worker is derived from KSM's replace_page() logic.

In essence, similar to KSM:

 a. allocate a new page and copy over contents of the page that
    has the uprobed vaddr
 b. modify the copy and insert the breakpoint at the required
    address
 c. switch the original page with the copy containing the
    breakpoint
 d. flush page tables.

replace_page() is being replicated here because of some minor
changes in the type of pages and also because Hugh Dickins had
plans to improve replace_page() for KSM specific work.

Instruction analysis on x86 is based on instruction decoder and
determines if an instruction can be probed and determines the
necessary fixups after singlestep.  Instruction analysis is done
at probe insertion time so that we avoid having to repeat the
same analysis every time a probe is hit.

A lot of code here is due to the improvement/suggestions/inputs
from Peter Zijlstra.

Changelog:

(v10):
 - Add code to clear REX.B prefix as suggested by Denys Vlasenko
   and Masami Hiramatsu.

(v9):
 - Use insn_offset_modrm as suggested by Masami Hiramatsu.

(v7):

 Handle comments from Peter Zijlstra:

 - Dont take reference to inode. (expect inode to uprobe_register to be sane).
 - Use PTR_ERR to set the return value.
 - No need to take reference to inode.
 - use PTR_ERR to return error value.
 - register and uprobe_unregister share code.

(v5):

 - Modified del_consumer as per comments from Peter.
 - Drop reference to inode before dropping reference to uprobe.
 - Use i_size_read(inode) instead of inode->i_size.
 - Ensure uprobe->consumers is NULL, before __uprobe_unregister() is called.
 - Includes errno.h as recommended by Stephen Rothwell to fix a build issue
   on sparc defconfig
 - Remove restrictions while unregistering.
 - Earlier code leaked inode references under some conditions while
   registering/unregistering.
 - Continue the vma-rmap walk even if the intermediate vma doesnt
   meet the requirements.
 - Validate the vma found by find_vma before inserting/removing the
   breakpoint
 - Call del_consumer under mutex_lock.
 - Use hash locks.
 - Handle mremap.
 - Introduce find_least_offset_node() instead of close match logic in
   find_uprobe
 - Uprobes no more depends on MM_OWNER; No reference to task_structs
   while inserting/removing a probe.
 - Uses read_mapping_page instead of grab_cache_page so that the pages
   have valid content.
 - pass NULL to get_user_pages for the task parameter.
 - call SetPageUptodate on the new page allocated in write_opcode.
 - fix leaking a reference to the new page under certain conditions.
 - Include Instruction Decoder if Uprobes gets defined.
 - Remove const attributes for instruction prefix arrays.
 - Uses mm_context to know if the application is 32 bit.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Also-written-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux-mm <linux-mm@kvack.org>
Link: http://lkml.kernel.org/r/20120209092642.GE16600@linux.vnet.ibm.com
[ Made various small edits to the commit log ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-17 10:00:01 +01:00
Kim, Milo
fde297bb4d regulator: fix wrong header name in description
The 'mode' is defined in consumer.h.

* patch base version : linux-3.2.4

Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-02-16 23:08:32 -08:00
Yoshihiro Shimoda
150647fb2c net: sh_eth: change the condition of initialization
The SH7757 has 2 Fast Ethernet and 2 Gigabit Ethernet, and the first
Gigabit channel needs the initialization. So, this patch adds the
parameter of "needs_init", and if the sh_eth_plat_data is set it
to 1, the driver will initialize the channel.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-16 17:08:09 -05:00
Chuck Lever
dbb9c2a22d SUNRPC: Use KERN_DEFAULT for debugging printk's
Our dprintk() debugging facility doesn't specify any verbosity level
for it's printk() calls, but it should.

The default verbosity for printk's is KERN_DEFAULT.  You might argue
that these are debugging printk's and thus the verbosity should be
KERN_DEBUG.  That would mean that to see NFS and SUNRPC debugging
output an admin would also have to boost the syslog verbosity, which
would be insufferably noisy.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-16 14:55:30 -05:00
Andy Adamson
15a4520621 SUNRPC: add sending,pending queue and max slot to xprt stats
With static RPC slots, the xprt backlog queue stats were useful in showing
when the transport (TCP) was starved by lack of RPC slots. The new dynamic
RPC slot code, commit d9ba131d8f, always
provides an RPC slot and so only uses the xprt backlog queue when the
tcp_max_slot_table_entries value has been hit or when an allocation error
occurs. All requests are now placed on the xprt sending or pending queue which
need to be monitored for debugging.

The max_slot stat shows the maximum number of dynamic RPC slots reached which is
useful when debugging performance issues.

Add the new fields at the end of the mountstats xprt stanza so that mountstats
outputs the previous correct values and ignores the new fields. Bump
NFS_IOSTATS_VERS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-16 14:55:27 -05:00
Grant Likely
a18dc81bf5 irq_domain: constify irq_domain_ops
Make irq_domain_ops pointer a constant to make it safer for multiple
instances to share the same ops pointer and change the irq_domain code
so that it does not modify the ops.

v4: Fix mismatched type reference in powerpc code

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-16 06:11:24 -07:00
Grant Likely
16b2e6e2f3 irq_domain: Create common xlate functions that device drivers can use
Rather than having each interrupt controller driver creating its own barely
unique .xlate function for irq_domain, create a library of translators which
any driver can use directly.

v5: - Remove irq_domain_xlate_pci().  It was incorrect.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-16 06:11:23 -07:00
Grant Likely
6b783f7c5d irq_domain: Remove irq_domain_add_simple()
irq_domain_add_simple() was a stop-gap measure until complete irq_domain
support was complete.  This patch removes the irq_domain_add_simple()
interface.

This patch also drops the explicit irq_domain initialization performed
by the mach-versatile code because the versatile interrupt controller
already has irq_domain support built into it.  This was a bug that was
hanging around quietly for a while, but with the full irq_domain which
actually verifies that irq_domain ranges are available it would cause
the registration to fail and the system wouldn't boot.

v4: Fixed number of irqs in mx5 gpio code
v2: Updated to pass in host_data pointer on irq_domain allocation.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Cc: Russell King <linux@arm.linux.org.uk>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-16 06:11:23 -07:00
Grant Likely
75294957be irq_domain: Remove 'new' irq_domain in favour of the ppc one
This patch removes the simplistic implementation of irq_domains and enables
the powerpc infrastructure for all irq_domain users.  The powerpc
infrastructure includes support for complex mappings between Linux and
hardware irq numbers, and can manage allocation of irq_descs.

This patch also converts the few users of irq_domain_add()/irq_domain_del()
to call irq_domain_add_legacy() instead.

v3: Fix bug that set up too many irqs in translation range.
v2: Fix removal of irq_alloc_descs() call in gic driver

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>
Tested-by: Olof Johansson <olof@lixom.net>
2012-02-16 06:11:23 -07:00
Grant Likely
a850a75544 of/address: add empty static inlines for !CONFIG_OF
As the title says, this patch adds empty implementations for the address
translation functions so that they can be used when CONFIG_OF is disabled.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
2012-02-16 06:11:23 -07:00