89052e706600df609299bbf6e33297da705e97c8
48186 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
c3083eff1b |
libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer
[ Upstream commit 5c498950f730aa17c5f8a2cdcb903524e4002ed2 ] Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
f46a46a9de |
gpio: Fix build error of function redefinition
[ Upstream commit 68e03b85474a51ec1921b4d13204782594ef7223 ]
when do randbuilding, I got this error:
In file included from drivers/hwmon/pmbus/ucd9000.c:19:0:
./include/linux/gpio/driver.h:576:1: error: redefinition of gpiochip_add_pin_range
gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name,
^~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/hwmon/pmbus/ucd9000.c:18:0:
./include/linux/gpio.h:245:1: note: previous definition of gpiochip_add_pin_range was here
gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name,
^~~~~~~~~~~~~~~~~~~~~~
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes:
|
||
|
|
66f8c5ff8e |
inet: switch IP ID generator to siphash
commit df453700e8d81b1bdafdf684365ee2b9431fb702 upstream. According to Amit Klein and Benny Pinkas, IP ID generation is too weak and might be used by attackers. Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix()) having 64bit key and Jenkins hash is risky. It is time to switch to siphash and its 128bit keys. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Amit Klein <aksecurity@gmail.com> Reported-by: Benny Pinkas <benny@pinkas.net> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
71b951c85b |
siphash: implement HalfSipHash1-3 for hash tables
commit 1ae2324f732c9c4e2fa4ebd885fa1001b70d52e1 upstream. HalfSipHash, or hsiphash, is a shortened version of SipHash, which generates 32-bit outputs using a weaker 64-bit key. It has *much* lower security margins, and shouldn't be used for anything too sensitive, but it could be used as a hashtable key function replacement, if the output is never exposed, and if the security requirement is not too high. The goal is to make this something that performance-critical jhash users would be willing to use. On 64-bit machines, HalfSipHash1-3 is slower than SipHash1-3, so we alias SipHash1-3 to HalfSipHash1-3 on those systems. 64-bit x86_64: [ 0.509409] test_siphash: SipHash2-4 cycles: 4049181 [ 0.510650] test_siphash: SipHash1-3 cycles: 2512884 [ 0.512205] test_siphash: HalfSipHash1-3 cycles: 3429920 [ 0.512904] test_siphash: JenkinsHash cycles: 978267 So, we map hsiphash() -> SipHash1-3 32-bit x86: [ 0.509868] test_siphash: SipHash2-4 cycles: 14812892 [ 0.513601] test_siphash: SipHash1-3 cycles: 9510710 [ 0.515263] test_siphash: HalfSipHash1-3 cycles: 3856157 [ 0.515952] test_siphash: JenkinsHash cycles: 1148567 So, we map hsiphash() -> HalfSipHash1-3 hsiphash() is roughly 3 times slower than jhash(), but comes with a considerable security improvement. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.4 to avoid regression for WireGuard with only half the siphash API present] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
994fcca7f1 |
siphash: add cryptographically secure PRF
commit 2c956a60778cbb6a27e0c7a8a52a91378c90e1d1 upstream. SipHash is a 64-bit keyed hash function that is actually a cryptographically secure PRF, like HMAC. Except SipHash is super fast, and is meant to be used as a hashtable keyed lookup function, or as a general PRF for short input use cases, such as sequence numbers or RNG chaining. For the first usage: There are a variety of attacks known as "hashtable poisoning" in which an attacker forms some data such that the hash of that data will be the same, and then preceeds to fill up all entries of a hashbucket. This is a realistic and well-known denial-of-service vector. Currently hashtables use jhash, which is fast but not secure, and some kind of rotating key scheme (or none at all, which isn't good). SipHash is meant as a replacement for jhash in these cases. There are a modicum of places in the kernel that are vulnerable to hashtable poisoning attacks, either via userspace vectors or network vectors, and there's not a reliable mechanism inside the kernel at the moment to fix it. The first step toward fixing these issues is actually getting a secure primitive into the kernel for developers to use. Then we can, bit by bit, port things over to it as deemed appropriate. While SipHash is extremely fast for a cryptographically secure function, it is likely a bit slower than the insecure jhash, and so replacements will be evaluated on a case-by-case basis based on whether or not the difference in speed is negligible and whether or not the current jhash usage poses a real security risk. For the second usage: A few places in the kernel are using MD5 or SHA1 for creating secure sequence numbers, syn cookies, port numbers, or fast random numbers. SipHash is a faster and more fitting, and more secure replacement for MD5 in those situations. Replacing MD5 and SHA1 with SipHash for these uses is obvious and straight-forward, and so is submitted along with this patch series. There shouldn't be much of a debate over its efficacy. Dozens of languages are already using this internally for their hash tables and PRFs. Some of the BSDs already use this in their kernels. SipHash is a widely known high-speed solution to a widely known set of problems, and it's time we catch-up. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.4 as dependency of commits df453700e8d8 "inet: switch IP ID generator to siphash" and 3c79107631db "netfilter: ctnetlink: don't use conntrack/expect object addresses as id": - Adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
170051d60c |
include/linux/module.h: copy __init/__exit attrs to init/cleanup_module
[ Upstream commit a6e60d84989fa0e91db7f236eda40453b0e44afa ] The upcoming GCC 9 release extends the -Wmissing-attributes warnings (enabled by -Wall) to C and aliases: it warns when particular function attributes are missing in the aliases but not in their target. In particular, it triggers for all the init/cleanup_module aliases in the kernel (defined by the module_init/exit macros), ending up being very noisy. These aliases point to the __init/__exit functions of a module, which are defined as __cold (among other attributes). However, the aliases themselves do not have the __cold attribute. Since the compiler behaves differently when compiling a __cold function as well as when compiling paths leading to calls to __cold functions, the warning is trying to point out the possibly-forgotten attribute in the alias. In order to keep the warning enabled, we decided to silence this case. Ideally, we would mark the aliases directly as __init/__exit. However, there are currently around 132 modules in the kernel which are missing __init/__exit in their init/cleanup functions (either because they are missing, or for other reasons, e.g. the functions being called from somewhere else); and a section mismatch is a hard error. A conservative alternative was to mark the aliases as __cold only. However, since we would like to eventually enforce __init/__exit to be always marked, we chose to use the new __copy function attribute (introduced by GCC 9 as well to deal with this). With it, we copy the attributes used by the target functions into the aliases. This way, functions that were not marked as __init/__exit won't have their aliases marked either, and therefore there won't be a section mismatch. Note that the warning would go away marking either the extern declaration, the definition, or both. However, we only mark the definition of the alias, since we do not want callers (which only see the declaration) to be compiled as if the function was __cold (and therefore the paths leading to those calls would be assumed to be unlikely). Link: https://lore.kernel.org/lkml/20190123173707.GA16603@gmail.com/ Link: https://lore.kernel.org/lkml/20190206175627.GA20399@gmail.com/ Suggested-by: Martin Sebor <msebor@gcc.gnu.org> Acked-by: Jessica Yu <jeyu@kernel.org> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
edc966de87 |
Backport minimal compiler_attributes.h to support GCC 9
This adds support for __copy to v4.9.y so that we can use it in init/exit_module to avoid -Werror=missing-attributes errors on GCC 9. Link: https://lore.kernel.org/lkml/259986242.BvXPX32bHu@devpool35/ Cc: <stable@vger.kernel.org> Suggested-by: Rolf Eike Beer <eb@emlix.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
bf60393dbd |
compat_ioctl: pppoe: fix PPPOEIOCSFWD handling
[ Upstream commit 055d88242a6046a1ceac3167290f054c72571cd9 ]
Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in
linux-2.5.69 along with hundreds of other commands, but was always broken
sincen only the structure is compatible, but the command number is not,
due to the size being sizeof(size_t), or at first sizeof(sizeof((struct
sockaddr_pppox)), which is different on 64-bit architectures.
Guillaume Nault adds:
And the implementation was broken until 2016 (see 29e73269aa4d ("pppoe:
fix reference counting in PPPoE proxy")), and nobody ever noticed. I
should probably have removed this ioctl entirely instead of fixing it.
Clearly, it has never been used.
Fix it by adding a compat_ioctl handler for all pppoe variants that
translates the command number and then calls the regular ioctl function.
All other ioctl commands handled by pppoe are compatible between 32-bit
and 64-bit, and require compat_ptr() conversion.
This should apply to all stable kernels.
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
df8a9e5c9a |
uapi linux/coda_psdev.h: move upc_req definition from uapi to kernel side headers
[ Upstream commit f90fb3c7e2c13ae829db2274b88b845a75038b8a ] Only users of upc_req in kernel side fs/coda/psdev.c and fs/coda/upcall.c already include linux/coda_psdev.h. Suggested by Jan Harkes <jaharkes@cs.cmu.edu> in https://lore.kernel.org/lkml/20150531111913.GA23377@cs.cmu.edu/ Fixes these include/uapi/linux/coda_psdev.h compilation errors in userspace: linux/coda_psdev.h:12:19: error: field `uc_chain' has incomplete type struct list_head uc_chain; ^ linux/coda_psdev.h:13:2: error: unknown type name `caddr_t' caddr_t uc_data; ^ linux/coda_psdev.h:14:2: error: unknown type name `u_short' u_short uc_flags; ^ linux/coda_psdev.h:15:2: error: unknown type name `u_short' u_short uc_inSize; /* Size is at most 5000 bytes */ ^ linux/coda_psdev.h:16:2: error: unknown type name `u_short' u_short uc_outSize; ^ linux/coda_psdev.h:17:2: error: unknown type name `u_short' u_short uc_opcode; /* copied from data to save lookup */ ^ linux/coda_psdev.h:19:2: error: unknown type name `wait_queue_head_t' wait_queue_head_t uc_sleep; /* process' wait queue */ ^ Link: http://lkml.kernel.org/r/9f99f5ce6a0563d5266e6cf7aa9585aac2cae971.1558117389.git.jaharkes@cs.cmu.edu Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Colin Ian King <colin.king@canonical.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Howells <dhowells@redhat.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Zhouyang Jia <jiazhouyang09@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
6465186ac8 |
coda: fix build using bare-metal toolchain
[ Upstream commit b2a57e334086602be56b74958d9f29b955cd157f ] The kernel is self-contained project and can be built with bare-metal toolchain. But bare-metal toolchain doesn't define __linux__. Because of this u_quad_t type is not defined when using bare-metal toolchain and codafs build fails. This patch fixes it by defining u_quad_t type unconditionally. Link: http://lkml.kernel.org/r/3cbb40b0a57b6f9923a9d67b53473c0b691a3eaa.1558117389.git.jaharkes@cs.cmu.edu Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Colin Ian King <colin.king@canonical.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Howells <dhowells@redhat.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Zhouyang Jia <jiazhouyang09@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
9a8b327b7e |
ACPI: fix false-positive -Wuninitialized warning
[ Upstream commit dfd6f9ad36368b8dbd5f5a2b2f0a4705ae69a323 ]
clang gets confused by an uninitialized variable in what looks
to it like a never executed code path:
arch/x86/kernel/acpi/boot.c:618:13: error: variable 'polarity' is uninitialized when used here [-Werror,-Wuninitialized]
polarity = polarity ? ACPI_ACTIVE_LOW : ACPI_ACTIVE_HIGH;
^~~~~~~~
arch/x86/kernel/acpi/boot.c:606:32: note: initialize the variable 'polarity' to silence this warning
int rc, irq, trigger, polarity;
^
= 0
arch/x86/kernel/acpi/boot.c:617:12: error: variable 'trigger' is uninitialized when used here [-Werror,-Wuninitialized]
trigger = trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE;
^~~~~~~
arch/x86/kernel/acpi/boot.c:606:22: note: initialize the variable 'trigger' to silence this warning
int rc, irq, trigger, polarity;
^
= 0
This is unfortunately a design decision in clang and won't be fixed.
Changing the acpi_get_override_irq() macro to an inline function
reliably avoids the issue.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
da358f365d |
sched/fair: Don't free p->numa_faults with concurrent readers
commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream.
When going through execve(), zero out the NUMA fault statistics instead of
freeing them.
During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.
Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes:
|
||
|
|
204b14581f |
access: avoid the RCU grace period for the temporary subjective credentials
commit d7852fbd0f0423937fa287a598bfde188bb68c22 upstream. It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU work because it installs a temporary credential that gets allocated and freed for each system call. The allocation and freeing overhead is mostly benign, but because credentials can be accessed under the RCU read lock, the freeing involves a RCU grace period. Which is not a huge deal normally, but if you have a lot of access() calls, this causes a fair amount of seconday damage: instead of having a nice alloc/free patterns that hits in hot per-CPU slab caches, you have all those delayed free's, and on big machines with hundreds of cores, the RCU overhead can end up being enormous. But it turns out that all of this is entirely unnecessary. Exactly because access() only installs the credential as the thread-local subjective credential, the temporary cred pointer doesn't actually need to be RCU free'd at all. Once we're done using it, we can just free it synchronously and avoid all the RCU overhead. So add a 'non_rcu' flag to 'struct cred', which can be set by users that know they only use it in non-RCU context (there are other potential users for this). We can make it a union with the rcu freeing list head that we need for the RCU case, so this doesn't need any extra storage. Note that this also makes 'get_current_cred()' clear the new non_rcu flag, in case we have filesystems that take a long-term reference to the cred and then expect the RCU delayed freeing afterwards. It's not entirely clear that this is required, but it makes for clear semantics: the subjective cred remains non-RCU as long as you only access it synchronously using the thread-local accessors, but you _can_ use it as a generic cred if you want to. It is possible that we should just remove the whole RCU markings for ->cred entirely. Only ->real_cred is really supposed to be accessed through RCU, and the long-term cred copies that nfs uses might want to explicitly re-enable RCU freeing if required, rather than have get_current_cred() do it implicitly. But this is a "minimal semantic changes" change for the immediate problem. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jan Glauber <jglauber@marvell.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com> Cc: Greg KH <greg@kroah.com> Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
c5b430eeae |
elevator: fix truncation of icq_cache_name
commit 9bd2bbc01d17ddd567cc0f81f77fe1163e497462 upstream.
gcc 7.1 reports the following warning:
block/elevator.c: In function ‘elv_register’:
block/elevator.c:898:5: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
"%s_io_cq", e->elevator_name);
^~~~~~~~~~
block/elevator.c:897:3: note: ‘snprintf’ output between 7 and 22 bytes into a destination of size 21
snprintf(e->icq_cache_name, sizeof(e->icq_cache_name),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%s_io_cq", e->elevator_name);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The bug is that the name of the icq_cache is 6 characters longer than
the elevator name, but only ELV_NAME_MAX + 5 characters were reserved
for it --- so in the case of a maximum-length elevator name, the 'q'
character in "_io_cq" would be truncated by snprintf(). Fix it by
reserving ELV_NAME_MAX + 6 characters instead.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
edf2ce9a7a |
rcu: Force inlining of rcu_read_lock()
[ Upstream commit 6da9f775175e516fc7229ceaa9b54f8f56aa7924 ] When debugging options are turned on, the rcu_read_lock() function might not be inlined. This results in lockdep's print_lock() function printing "rcu_read_lock+0x0/0x70" instead of rcu_read_lock()'s caller. For example: [ 10.579995] ============================= [ 10.584033] WARNING: suspicious RCU usage [ 10.588074] 4.18.0.memcg_v2+ #1 Not tainted [ 10.593162] ----------------------------- [ 10.597203] include/linux/rcupdate.h:281 Illegal context switch in RCU read-side critical section! [ 10.606220] [ 10.606220] other info that might help us debug this: [ 10.606220] [ 10.614280] [ 10.614280] rcu_scheduler_active = 2, debug_locks = 1 [ 10.620853] 3 locks held by systemd/1: [ 10.624632] #0: (____ptrval____) (&type->i_mutex_dir_key#5){.+.+}, at: lookup_slow+0x42/0x70 [ 10.633232] #1: (____ptrval____) (rcu_read_lock){....}, at: rcu_read_lock+0x0/0x70 [ 10.640954] #2: (____ptrval____) (rcu_read_lock){....}, at: rcu_read_lock+0x0/0x70 These "rcu_read_lock+0x0/0x70" strings are not providing any useful information. This commit therefore forces inlining of the rcu_read_lock() function so that rcu_read_lock()'s caller is instead shown. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
31861f83bf |
VMCI: Fix integer overflow in VMCI handle arrays
commit 1c2eb5b2853c9f513690ba6b71072d8eb65da16a upstream. The VMCI handle array has an integer overflow in vmci_handle_arr_append_entry when it tries to expand the array. This can be triggered from a guest, since the doorbell link hypercall doesn't impose a limit on the number of doorbell handles that a VM can create in the hypervisor, and these handles are stored in a handle array. In this change, we introduce a mandatory max capacity for handle arrays/lists to avoid excessive memory usage. Signed-off-by: Vishnu Dasa <vdasa@vmware.com> Reviewed-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
2fbaa1af06 |
bug.h: work around GCC PR82365 in BUG()
[ Upstream commit 173a3efd3edb2ef6ef07471397c5f542a360e9c1 ] Looking at functions with large stack frames across all architectures led me discovering that BUG() suffers from the same problem as fortify_panic(), which I've added a workaround for already. In short, variables that go out of scope by calling a noreturn function or __builtin_unreachable() keep using stack space in functions afterwards. A workaround that was identified is to insert an empty assembler statement just before calling the function that doesn't return. I'm adding a macro "barrier_before_unreachable()" to document this, and insert calls to that in all instances of BUG() that currently suffer from this problem. The files that saw the largest change from this had these frame sizes before, and much less with my patch: fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=] drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=] In case of ARC and CRIS, it turns out that the BUG() implementation actually does return (or at least the compiler thinks it does), resulting in lots of warnings about uninitialized variable use and leaving noreturn functions, such as: block/cfq-iosched.c: In function 'cfq_async_queue_prio': block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type] include/linux/dmaengine.h: In function 'dma_maxpq': include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type] This makes them call __builtin_trap() instead, which should normally dump the stack and kill the current process, like some of the other architectures already do. I tried adding barrier_before_unreachable() to panic() and fortify_panic() as well, but that had very little effect, so I'm not submitting that patch. Vineet said: : For ARC, it is double win. : : 1. Fixes 3 -Wreturn-type warnings : : | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of : non-void function [-Wreturn-type] : : 2. bloat-o-meter reports code size improvements as gcc elides the : generated code for stack return. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365 Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc] Tested-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc] Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christopher Li <sparse@chrisli.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> [ removed cris changes - gregkh] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
2f8180ff33 |
swiotlb: Make linux/swiotlb.h standalone includible
[ Upstream commit 386744425e35e04984c6e741c7750fd6eef1a9df ] This header file uses the enum dma_data_direction and struct page types without explicitly including the corresponding header files. This makes it rely on the includer to have included the proper headers before. To fix this, include linux/dma-direction.h and forward-declare struct page. The swiotlb_free() function is also annotated __init, therefore requires linux/init.h to be included as well. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
8f6345a11c |
coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.
The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it. Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier. For example in Hugh's post from Jul 2017:
https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
"Not strictly relevant here, but a related note: I was very surprised
to discover, only quite recently, how handle_mm_fault() may be called
without down_read(mmap_sem) - when core dumping. That seems a
misguided optimization to me, which would also be nice to correct"
In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.
Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.
Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.
For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs. Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.
Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.
In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.
Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.
Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes:
|
||
|
|
91f1fc1ae4 |
cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
commit 18fa84a2db0e15b02baa5d94bdb5bd509175d2f6 upstream.
A PF_EXITING task can stay associated with an offline css. If such
task calls task_get_css(), it can get stuck indefinitely. This can be
triggered by BSD process accounting which writes to a file with
PF_EXITING set when racing against memcg disable as in the backtrace
at the end.
After this change, task_get_css() may return a css which was already
offline when the function was called. None of the existing users are
affected by this change.
INFO: rcu_sched self-detected stall on CPU
INFO: rcu_sched detected stalls on CPUs/tasks:
...
NMI backtrace for cpu 0
...
Call Trace:
<IRQ>
dump_stack+0x46/0x68
nmi_cpu_backtrace.cold.2+0x13/0x57
nmi_trigger_cpumask_backtrace+0xba/0xca
rcu_dump_cpu_stacks+0x9e/0xce
rcu_check_callbacks.cold.74+0x2af/0x433
update_process_times+0x28/0x60
tick_sched_timer+0x34/0x70
__hrtimer_run_queues+0xee/0x250
hrtimer_interrupt+0xf4/0x210
smp_apic_timer_interrupt+0x56/0x110
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:balance_dirty_pages_ratelimited+0x28f/0x3d0
...
btrfs_file_write_iter+0x31b/0x563
__vfs_write+0xfa/0x140
__kernel_write+0x4f/0x100
do_acct_process+0x495/0x580
acct_process+0xb9/0xdb
do_exit+0x748/0xa00
do_group_exit+0x3a/0xa0
get_signal+0x254/0x560
do_signal+0x23/0x5c0
exit_to_usermode_loop+0x5d/0xa0
prepare_exit_to_usermode+0x53/0x80
retint_user+0x8/0x8
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v4.2+
Fixes:
|
||
|
|
5767587907 |
pwm: Fix deadlock warning when removing PWM device
[ Upstream commit 347ab9480313737c0f1aaa08e8f2e1a791235535 ]
This patch fixes deadlock warning if removing PWM device
when CONFIG_PROVE_LOCKING is enabled.
This issue can be reproceduced by the following steps on
the R-Car H3 Salvator-X board if the backlight is disabled:
# cd /sys/class/pwm/pwmchip0
# echo 0 > export
# ls
device export npwm power pwm0 subsystem uevent unexport
# cd device/driver
# ls
bind e6e31000.pwm uevent unbind
# echo e6e31000.pwm > unbind
[ 87.659974] ======================================================
[ 87.666149] WARNING: possible circular locking dependency detected
[ 87.672327] 5.0.0 #7 Not tainted
[ 87.675549] ------------------------------------------------------
[ 87.681723] bash/2986 is trying to acquire lock:
[ 87.686337] 000000005ea0e178 (kn->count#58){++++}, at: kernfs_remove_by_name_ns+0x50/0xa0
[ 87.694528]
[ 87.694528] but task is already holding lock:
[ 87.700353] 000000006313b17c (pwm_lock){+.+.}, at: pwmchip_remove+0x28/0x13c
[ 87.707405]
[ 87.707405] which lock already depends on the new lock.
[ 87.707405]
[ 87.715574]
[ 87.715574] the existing dependency chain (in reverse order) is:
[ 87.723048]
[ 87.723048] -> #1 (pwm_lock){+.+.}:
[ 87.728017] __mutex_lock+0x70/0x7e4
[ 87.732108] mutex_lock_nested+0x1c/0x24
[ 87.736547] pwm_request_from_chip.part.6+0x34/0x74
[ 87.741940] pwm_request_from_chip+0x20/0x40
[ 87.746725] export_store+0x6c/0x1f4
[ 87.750820] dev_attr_store+0x18/0x28
[ 87.754998] sysfs_kf_write+0x54/0x64
[ 87.759175] kernfs_fop_write+0xe4/0x1e8
[ 87.763615] __vfs_write+0x40/0x184
[ 87.767619] vfs_write+0xa8/0x19c
[ 87.771448] ksys_write+0x58/0xbc
[ 87.775278] __arm64_sys_write+0x18/0x20
[ 87.779721] el0_svc_common+0xd0/0x124
[ 87.783986] el0_svc_compat_handler+0x1c/0x24
[ 87.788858] el0_svc_compat+0x8/0x18
[ 87.792947]
[ 87.792947] -> #0 (kn->count#58){++++}:
[ 87.798260] lock_acquire+0xc4/0x22c
[ 87.802353] __kernfs_remove+0x258/0x2c4
[ 87.806790] kernfs_remove_by_name_ns+0x50/0xa0
[ 87.811836] remove_files.isra.1+0x38/0x78
[ 87.816447] sysfs_remove_group+0x48/0x98
[ 87.820971] sysfs_remove_groups+0x34/0x4c
[ 87.825583] device_remove_attrs+0x6c/0x7c
[ 87.830197] device_del+0x11c/0x33c
[ 87.834201] device_unregister+0x14/0x2c
[ 87.838638] pwmchip_sysfs_unexport+0x40/0x4c
[ 87.843509] pwmchip_remove+0xf4/0x13c
[ 87.847773] rcar_pwm_remove+0x28/0x34
[ 87.852039] platform_drv_remove+0x24/0x64
[ 87.856651] device_release_driver_internal+0x18c/0x21c
[ 87.862391] device_release_driver+0x14/0x1c
[ 87.867175] unbind_store+0xe0/0x124
[ 87.871265] drv_attr_store+0x20/0x30
[ 87.875442] sysfs_kf_write+0x54/0x64
[ 87.879618] kernfs_fop_write+0xe4/0x1e8
[ 87.884055] __vfs_write+0x40/0x184
[ 87.888057] vfs_write+0xa8/0x19c
[ 87.891887] ksys_write+0x58/0xbc
[ 87.895716] __arm64_sys_write+0x18/0x20
[ 87.900154] el0_svc_common+0xd0/0x124
[ 87.904417] el0_svc_compat_handler+0x1c/0x24
[ 87.909289] el0_svc_compat+0x8/0x18
[ 87.913378]
[ 87.913378] other info that might help us debug this:
[ 87.913378]
[ 87.921374] Possible unsafe locking scenario:
[ 87.921374]
[ 87.927286] CPU0 CPU1
[ 87.931808] ---- ----
[ 87.936331] lock(pwm_lock);
[ 87.939293] lock(kn->count#58);
[ 87.945120] lock(pwm_lock);
[ 87.950599] lock(kn->count#58);
[ 87.953908]
[ 87.953908] *** DEADLOCK ***
[ 87.953908]
[ 87.959821] 4 locks held by bash/2986:
[ 87.963563] #0: 00000000ace7bc30 (sb_writers#6){.+.+}, at: vfs_write+0x188/0x19c
[ 87.971044] #1: 00000000287991b2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xb4/0x1e8
[ 87.978872] #2: 00000000f739d016 (&dev->mutex){....}, at: device_release_driver_internal+0x40/0x21c
[ 87.988001] #3: 000000006313b17c (pwm_lock){+.+.}, at: pwmchip_remove+0x28/0x13c
[ 87.995481]
[ 87.995481] stack backtrace:
[ 87.999836] CPU: 0 PID: 2986 Comm: bash Not tainted 5.0.0 #7
[ 88.005489] Hardware name: Renesas Salvator-X board based on r8a7795 ES1.x (DT)
[ 88.012791] Call trace:
[ 88.015235] dump_backtrace+0x0/0x190
[ 88.018891] show_stack+0x14/0x1c
[ 88.022204] dump_stack+0xb0/0xec
[ 88.025514] print_circular_bug.isra.32+0x1d0/0x2e0
[ 88.030385] __lock_acquire+0x1318/0x1864
[ 88.034388] lock_acquire+0xc4/0x22c
[ 88.037958] __kernfs_remove+0x258/0x2c4
[ 88.041874] kernfs_remove_by_name_ns+0x50/0xa0
[ 88.046398] remove_files.isra.1+0x38/0x78
[ 88.050487] sysfs_remove_group+0x48/0x98
[ 88.054490] sysfs_remove_groups+0x34/0x4c
[ 88.058580] device_remove_attrs+0x6c/0x7c
[ 88.062671] device_del+0x11c/0x33c
[ 88.066154] device_unregister+0x14/0x2c
[ 88.070070] pwmchip_sysfs_unexport+0x40/0x4c
[ 88.074421] pwmchip_remove+0xf4/0x13c
[ 88.078163] rcar_pwm_remove+0x28/0x34
[ 88.081906] platform_drv_remove+0x24/0x64
[ 88.085996] device_release_driver_internal+0x18c/0x21c
[ 88.091215] device_release_driver+0x14/0x1c
[ 88.095478] unbind_store+0xe0/0x124
[ 88.099048] drv_attr_store+0x20/0x30
[ 88.102704] sysfs_kf_write+0x54/0x64
[ 88.106359] kernfs_fop_write+0xe4/0x1e8
[ 88.110275] __vfs_write+0x40/0x184
[ 88.113757] vfs_write+0xa8/0x19c
[ 88.117065] ksys_write+0x58/0xbc
[ 88.120374] __arm64_sys_write+0x18/0x20
[ 88.124291] el0_svc_common+0xd0/0x124
[ 88.128034] el0_svc_compat_handler+0x1c/0x24
[ 88.132384] el0_svc_compat+0x8/0x18
The sysfs unexport in pwmchip_remove() is completely asymmetric
to what we do in pwmchip_add_with_polarity() and commit 0733424c9ba9
("pwm: Unexport children before chip removal") is a strong indication
that this was wrong to begin with. We should just move
pwmchip_sysfs_unexport() where it belongs, which is right after
pwmchip_sysfs_unexport_children(). In that case, we do not need
separate functions anymore either.
We also really want to remove sysfs irrespective of whether or not
the chip will be removed as a result of pwmchip_remove(). We can only
assume that the driver will be gone after that, so we shouldn't leave
any dangling sysfs files around.
This warning disappears if we move pwmchip_sysfs_unexport() to
the top of pwmchip_remove(), pwmchip_sysfs_unexport_children().
That way it is also outside of the pwm_lock section, which indeed
doesn't seem to be needed.
Moving the pwmchip_sysfs_export() call outside of that section also
seems fine and it'd be perfectly symmetric with pwmchip_remove() again.
So, this patch fixes them.
Signed-off-by: Phong Hoang <phong.hoang.wz@renesas.com>
[shimoda: revise the commit log and code]
Fixes:
|
||
|
|
4657ee0fe0 |
tcp: limit payload size of sacked skbs
commit 3b4929f65b0d8249f19a50245cd88ed1a2f78cff upstream.
Jonathan Looney reported that TCP can trigger the following crash
in tcp_shifted_skb() :
BUG_ON(tcp_skb_pcount(skb) < pcount);
This can happen if the remote peer has advertized the smallest
MSS that linux TCP accepts : 48
An skb can hold 17 fragments, and each fragment can hold 32KB
on x86, or 64KB on PowerPC.
This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs
can overflow.
Note that tcp_sendmsg() builds skbs with less than 64KB
of payload, so this problem needs SACK to be enabled.
SACK blocks allow TCP to coalesce multiple skbs in the retransmit
queue, thus filling the 17 fragments to maximal capacity.
CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs
Backport notes, provided by Joao Martins <joao.m.martins@oracle.com>
v4.15 or since commit 737ff314563 ("tcp: use sequence distance to
detect reordering") had switched from the packet-based FACK tracking and
switched to sequence-based.
v4.14 and older still have the old logic and hence on
tcp_skb_shift_data() needs to retain its original logic and have
@fack_count in sync. In other words, we keep the increment of pcount with
tcp_skb_pcount(skb) to later used that to update fack_count. To make it
more explicit we track the new skb that gets incremented to pcount in
@next_pcount, and we get to avoid the constant invocation of
tcp_skb_pcount(skb) all together.
Fixes:
|
||
|
|
3bf0c45961 |
fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
commit 10dce8af34226d90fa56746a934f8da5dcdba3df upstream. Commit |
||
|
|
f0d1e74c81 |
rcu: locking and unlocking need to always be at least barriers
commit 66be4e66a7f422128748e3c3ef6ee72b20a6197b upstream. Herbert Xu pointed out that commit |
||
|
|
2b13a9580e |
usb: gadget: fix request length error for isoc transfer
commit 982555fc26f9d8bcdbd5f9db0378fe0682eb4188 upstream. For isoc endpoint descriptor, the wMaxPacketSize is not real max packet size (see Table 9-13. Standard Endpoint Descriptor, USB 2.0 specifcation), it may contain the number of packet, so the real max packet should be ep->desc->wMaxPacketSize && 0x7ff. Cc: Felipe F. Tonello <eu@felipetonello.com> Cc: Felipe Balbi <felipe.balbi@linux.intel.com> Fixes: 16b114a6d797 ("usb: gadget: fix usb_ep_align_maybe endianness and new usb_ep_aligna") Signed-off-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
6ad730b831 |
userfaultfd: don't pin the user memory in userfaultfd_file_create()
commit d2005e3f41d4f9299e2df6a967c8beb5086967a9 upstream. userfaultfd_file_create() increments mm->mm_users; this means that the memory won't be unmapped/freed if mm owner exits/execs, and UFFDIO_COPY after that can populate the orphaned mm more. Change userfaultfd_file_create() and userfaultfd_ctx_put() to use mm->mm_count to pin mm_struct. This means that atomic_inc_not_zero(mm->mm_users) is needed when we are going to actually play with this memory. Except handle_userfault() path doesn't need this, the caller must already have a reference. The patch adds the new trivial helper, mmget_not_zero(), it can have more users. Link: http://lkml.kernel.org/r/20160516172254.GA8595@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
a33b6d4c8b |
net: create skb_gso_validate_mac_len()
commit 2b16f048729bf35e6c28a40cbfad07239f9dcd90 upstream. If you take a GSO skb, and split it into packets, will the MAC length (L2 + L3 + L4 headers + payload) of those packets be small enough to fit within a given length? Move skb_gso_mac_seglen() to skbuff.h with other related functions like skb_gso_network_seglen() so we can use it, and then create skb_gso_validate_mac_len to do the full calculation. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.4: There is no GSO_BY_FRAGS case to handle, so skb_gso_validate_mac_len() becomes a trivial comparison. Put it inline in <linux/skbuff.h>.] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
7a47d18731 |
memcg: make it work on sparse non-0-node systems
commit 3e8589963773a5c23e2f1fe4bcad0e9a90b7f471 upstream.
We have a single node system with node 0 disabled:
Scanning NUMA topology in Northbridge 24
Number of physical nodes 2
Skipping disabled node 0
Node 1 MemBase 0000000000000000 Limit 00000000fbff0000
NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]
This causes crashes in memcg when system boots:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
#PF error: [normal kernel read fault]
...
RIP: 0010:list_lru_add+0x94/0x170
...
Call Trace:
d_lru_add+0x44/0x50
dput.part.34+0xfc/0x110
__fput+0x108/0x230
task_work_run+0x9f/0xc0
exit_to_usermode_loop+0xf5/0x100
It is reproducible as far as 4.12. I did not try older kernels. You have
to have a new enough systemd, e.g. 241 (the reason is unknown -- was not
investigated). Cannot be reproduced with systemd 234.
The system crashes because the size of lru array is never updated in
memcg_update_all_list_lrus and the reads are past the zero-sized array,
causing dereferences of random memory.
The root cause are list_lru_memcg_aware checks in the list_lru code. The
test in list_lru_memcg_aware is broken: it assumes node 0 is always
present, but it is not true on some systems as can be seen above.
So fix this by avoiding checks on node 0. Remember the memcg-awareness by
a bool flag in struct list_lru.
Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz
Fixes:
|
||
|
|
ec70e2c130 |
include/linux/bitops.h: sanitize rotate primitives
commit ef4d6f6b275c498f8e5626c99dbeefdc5027f843 upstream.
The ror32 implementation (word >> shift) | (word << (32 - shift) has
undefined behaviour if shift is outside the [1, 31] range. Similarly
for the 64 bit variants. Most callers pass a compile-time constant
(naturally in that range), but there's an UBSAN report that these may
actually be called with a shift count of 0.
Instead of special-casing that, we can make them DTRT for all values of
shift while also avoiding UB. For some reason, this was already partly
done for rol32 (which was well-defined for [0, 31]). gcc 8 recognizes
these patterns as rotates, so for example
__u32 rol32(__u32 word, unsigned int shift)
{
return (word << (shift & 31)) | (word >> ((-shift) & 31));
}
compiles to
0000000000000020 <rol32>:
20: 89 f8 mov %edi,%eax
22: 89 f1 mov %esi,%ecx
24: d3 c0 rol %cl,%eax
26: c3 retq
Older compilers unfortunately do not do as well, but this only affects
the small minority of users that don't pass constants.
Due to integer promotions, ro[lr]8 were already well-defined for shifts
in [0, 8], and ro[lr]16 were mostly well-defined for shifts in [0, 16]
(only mostly - u16 gets promoted to _signed_ int, so if bit 15 is set,
word << 16 is undefined). For consistency, update those as well.
Link: http://lkml.kernel.org/r/20190410211906.2190-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Vadim Pasternak <vadimp@mellanox.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
5db3c5adf4 |
HID: core: move Usage Page concatenation to Main item
[ Upstream commit 58e75155009cc800005629955d3482f36a1e0eec ] As seen on some USB wireless keyboards manufactured by Primax, the HID parser was using some assumptions that are not always true. In this case it's s the fact that, inside the scope of a main item, an Usage Page will always precede an Usage. The spec is not pretty clear as 6.2.2.7 states "Any usage that follows is interpreted as a Usage ID and concatenated with the Usage Page". While 6.2.2.8 states "When the parser encounters a main item it concatenates the last declared Usage Page with a Usage to form a complete usage value." Being somewhat contradictory it was decided to match Window's implementation, which follows 6.2.2.8. In summary, the patch moves the Usage Page concatenation from the local item parsing function to the main item parsing function. Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Reviewed-by: Terry Junge <terry.junge@poly.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
a86d061794 |
iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion
[ Upstream commit df1d80aee963480c5c2938c64ec0ac3e4a0df2e0 ] For devices from the SigmaDelta family we need to keep CS low when doing a conversion, since the device will use the MISO line as a interrupt to indicate that the conversion is complete. This is why the driver locks the SPI bus and when the SPI bus is locked keeps as long as a conversion is going on. The current implementation gets one small detail wrong though. CS is only de-asserted after the SPI bus is unlocked. This means it is possible for a different SPI device on the same bus to send a message which would be wrongfully be addressed to the SigmaDelta device as well. Make sure that the last SPI transfer that is done while holding the SPI bus lock de-asserts the CS signal. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Alexandru Ardelean <Alexandru.Ardelean@analog.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
b059848e11 |
smpboot: Place the __percpu annotation correctly
[ Upstream commit d4645d30b50d1691c26ff0f8fa4e718b08f8d3bb ]
The test robot reported a wrong assignment of a per-CPU variable which
it detected by using sparse and sent a report. The assignment itself is
correct. The annotation for sparse was wrong and hence the report.
The first pointer is a "normal" pointer and points to the per-CPU memory
area. That means that the __percpu annotation has to be moved.
Move the __percpu annotation to pointer which points to the per-CPU
area. This change affects only the sparse tool (and is ignored by the
compiler).
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes:
|
||
|
|
bf8474c648 |
hugetlb: use same fault hash key for shared and private mappings
commit 1b426bac66e6cc83c9f2d92b96e4e72acf43419a upstream. hugetlb uses a fault mutex hash table to prevent page faults of the same pages concurrently. The key for shared and private mappings is different. Shared keys off address_space and file index. Private keys off mm and virtual address. Consider a private mappings of a populated hugetlbfs file. A fault will map the page from the file and if needed do a COW to map a writable page. Hugetlbfs hole punch uses the fault mutex to prevent mappings of file pages. It uses the address_space file index key. However, private mappings will use a different key and could race with this code to map the file page. This causes problems (BUG) for the page cache remove code as it expects the page to be unmapped. A sample stack is: page dumped because: VM_BUG_ON_PAGE(page_mapped(page)) kernel BUG at mm/filemap.c:169! ... RIP: 0010:unaccount_page_cache_page+0x1b8/0x200 ... Call Trace: __delete_from_page_cache+0x39/0x220 delete_from_page_cache+0x45/0x70 remove_inode_hugepages+0x13c/0x380 ? __add_to_page_cache_locked+0x162/0x380 hugetlbfs_fallocate+0x403/0x540 ? _cond_resched+0x15/0x30 ? __inode_security_revalidate+0x5d/0x70 ? selinux_file_permission+0x100/0x130 vfs_fallocate+0x13f/0x270 ksys_fallocate+0x3c/0x80 __x64_sys_fallocate+0x1a/0x20 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 There seems to be another potential COW issue/race with this approach of different private and shared keys as noted in commit |
||
|
|
bd020b3317 |
bio: fix improper use of smp_mb__before_atomic()
commit f381c6a4bd0ae0fde2d6340f1b9bb0f58d915de6 upstream.
This barrier only applies to the read-modify-write operations; in
particular, it does not apply to the atomic_set() primitive.
Replace the barrier with an smp_mb().
Fixes:
|
||
|
|
b8ab0c4eff |
of: fix clang -Wunsequenced for be32_to_cpu()
commit 440868661f36071886ed360d91de83bd67c73b4f upstream.
Now, make the loop explicit to avoid clang warning.
./include/linux/of.h:238:37: warning: multiple unsequenced modifications
to 'cell' [-Wunsequenced]
r = (r << 32) | be32_to_cpu(*(cell++));
^~
./include/linux/byteorder/generic.h:95:21: note: expanded from macro
'be32_to_cpu'
^
./include/uapi/linux/byteorder/little_endian.h:40:59: note: expanded
from macro '__be32_to_cpu'
^
./include/uapi/linux/swab.h:118:21: note: expanded from macro '__swab32'
___constant_swab32(x) : \
^
./include/uapi/linux/swab.h:18:12: note: expanded from macro
'___constant_swab32'
(((__u32)(x) & (__u32)0x000000ffUL) << 24) | \
^
Signed-off-by: Phong Tran <tranmanphong@gmail.com>
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/460
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: stable@vger.kernel.org
[robh: fix up whitespace]
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
bfce20eaf1 |
writeback: synchronize sync(2) against cgroup writeback membership switches
commit 7fc5854f8c6efae9e7624970ab49a1eac2faefb1 upstream. sync_inodes_sb() can race against cgwb (cgroup writeback) membership switches and fail to writeback some inodes. For example, if an inode switches to another wb while sync_inodes_sb() is in progress, the new wb might not be visible to bdi_split_work_to_wbs() at all or the inode might jump from a wb which hasn't issued writebacks yet to one which already has. This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb switch path against sync_inodes_sb() so that sync_inodes_sb() is guaranteed to see all the target wbs and inodes can't jump wbs to escape syncing. v2: Fixed misplaced rwsem init. Spotted by Jiufei. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jiufei Xue <xuejiufei@gmail.com> Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
592a36c59f |
mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
commit 6b4814a9451add06d457e198be418bf6a3e6a990 upstream. Mismatch between what is found in the Datasheets for DA9063 and DA9063L provided by Dialog Semiconductor, and the register names provided in the MFD registers file. The changes are for the OTP (one-time-programming) control registers. The two naming errors are OPT instead of OTP, and COUNT instead of CONT (i.e. control). Cc: Stable <stable@vger.kernel.org> Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
31a2c5f7a2 |
cpu/speculation: Add 'mitigations=' cmdline option
commit 98af8452945c55652de68536afdde3b520fec429 upstream. Keeping track of the number of mitigations for all the CPU speculation bugs has become overwhelming for many users. It's getting more and more complicated to decide which mitigations are needed for a given architecture. Complicating matters is the fact that each arch tends to have its own custom way to mitigate the same vulnerability. Most users fall into a few basic categories: a) they want all mitigations off; b) they want all reasonable mitigations on, with SMT enabled even if it's vulnerable; or c) they want all reasonable mitigations on, with SMT disabled if vulnerable. Define a set of curated, arch-independent options, each of which is an aggregation of existing options: - mitigations=off: Disable all mitigations. - mitigations=auto: [default] Enable all the default mitigations, but leave SMT enabled, even if it's vulnerable. - mitigations=auto,nosmt: Enable all the default mitigations, disabling SMT if needed by a mitigation. Currently, these options are placeholders which don't actually do anything. They will be fleshed out in upcoming patches. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86) Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Price <steven.price@arm.com> Cc: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com [bwh: Backported to 4.4: - Drop the auto,nosmt option which we can't support - Adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
3fb41b4e2d |
x86/speculation/mds: Add sysfs reporting for MDS
commit 8a4b06d391b0a42a373808979b5028f5c84d9c6a upstream. Add the sysfs reporting file for MDS. It exposes the vulnerability and mitigation state similar to the existing files for the other speculative hardware vulnerabilities. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jon Masters <jcm@redhat.com> Tested-by: Jon Masters <jcm@redhat.com> [bwh: Backported to 4.4: - Test x86_hyper instead of using hypervisor_is_type() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
ff99c966c6 |
x86/speculation: Add prctl() control for indirect branch speculation
commit 9137bb27e60e554dab694eafa4cca241fa3a694f upstream. Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of indirect branch speculation via STIBP and IBPB. Invocations: Check indirect branch speculation status with - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0); Enable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0); Disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0); Force disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0); See Documentation/userspace-api/spec_ctrl.rst. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de [bwh: Backported to 4.4: - Renumber the PFA flags - Drop changes in tools/include/uapi/linux/prctl.h - Adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
892d9881b4 |
x86/speculation: Rework SMT state change
commit a74cfffb03b73d41e08f84c2e5c87dec0ce3db9f upstream. arch_smt_update() is only called when the sysfs SMT control knob is changed. This means that when SMT is enabled in the sysfs control knob the system is considered to have SMT active even if all siblings are offline. To allow finegrained control of the speculation mitigations, the actual SMT state is more interesting than the fact that siblings could be enabled. Rework the code, so arch_smt_update() is invoked from each individual CPU hotplug function, and simplify the update function while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de [bwh: Backported to 4.4: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
1f562beba7 |
sched: Add sched_smt_active()
Add the sched_smt_active() function needed for some x86 speculation
mitigations. This was introduced upstream by commits 1b568f0aabf2
"sched/core: Optimize SCHED_SMT", ba2591a5993e "sched/smt: Update
sched_smt_present at runtime", c5511d03ec09 "sched/smt: Make
sched_smt_present track topology", and 321a874a7ef8 "sched/smt: Expose
sched_smt_present static key". The upstream implementation uses the
static_key_{disable,enable}_cpuslocked() functions, which aren't
practical to backport.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
4a215a1155 |
x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
commit dbfe2953f63c640463c630746cd5d9de8b2f63ae upstream.
Currently, IBPB is only issued in cases when switching into a non-dumpable
process, the rationale being to protect such 'important and security
sensitive' processess (such as GPG) from data leaking into a different
userspace process via spectre v2.
This is however completely insufficient to provide proper userspace-to-userpace
spectrev2 protection, as any process can poison branch buffers before being
scheduled out, and the newly scheduled process immediately becomes spectrev2
victim.
In order to minimize the performance impact (for usecases that do require
spectrev2 protection), issue the barrier only in cases when switching between
processess where the victim can't be ptraced by the potential attacker (as in
such cases, the attacker doesn't have to bother with branch buffers at all).
[ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
fine-grained ]
Fixes: 18bf3c3ea8 ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "SchauflerCasey" <casey.schaufler@intel.com>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
8d1385ea4c |
locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a new <linux/bits.h> file
commit 8bd9cb51daac89337295b6f037b0486911e1b408 upstream. In preparation for implementing the asm-generic atomic bitops in terms of atomic_long_*(), we need to prevent <asm/atomic.h> implementations from pulling in <linux/bitops.h>. A common reason for this include is for the BITS_PER_BYTE definition, so move this and some other BIT() and masking macros into a new header file, <linux/bits.h>. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: yamada.masahiro@socionext.com Link: https://lore.kernel.org/lkml/1529412794-17720-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
c2a357d9b4 |
bitops: avoid integer overflow in GENMASK(_ULL)
commit c32ee3d9abd284b4fcaacc250b101f93829c7bae upstream. GENMASK(_ULL) performs a left-shift of ~0UL(L), which technically results in an integer overflow. clang raises a warning if the overflow occurs in a preprocessor expression. Clear the low-order bits through a substraction instead of the left-shift to avoid the overflow. (akpm: no change in .text size in my testing) Link: http://lkml.kernel.org/r/20170803212020.24939-1-mka@chromium.org Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
5ec6421c21 |
locking/static_keys: Provide DECLARE and well as DEFINE macros
commit b8fb03785d4de097507d0cf45873525e0ac4d2b2 upstream.
We will need to provide declarations of static keys in header
files. Provide DECLARE_STATIC_KEY_{TRUE,FALSE} macros.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/816881cf85bd3cf13385d212882618f38a3b5d33.1472754711.git.tony.luck@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
1b2b6db776 |
USB: core: Fix bug caused by duplicate interface PM usage counter
commit c2b71462d294cf517a0bc6e4fd6424d7cee5596f upstream. The syzkaller fuzzer reported a bug in the USB hub driver which turned out to be caused by a negative runtime-PM usage counter. This allowed a hub to be runtime suspended at a time when the driver did not expect it. The symptom is a WARNING issued because the hub's status URB is submitted while it is already active: URB 0000000031fb463e submitted while active WARNING: CPU: 0 PID: 2917 at drivers/usb/core/urb.c:363 The negative runtime-PM usage count was caused by an unfortunate design decision made when runtime PM was first implemented for USB. At that time, USB class drivers were allowed to unbind from their interfaces without balancing the usage counter (i.e., leaving it with a positive count). The core code would take care of setting the counter back to 0 before allowing another driver to bind to the interface. Later on when runtime PM was implemented for the entire kernel, the opposite decision was made: Drivers were required to balance their runtime-PM get and put calls. In order to maintain backward compatibility, however, the USB subsystem adapted to the new implementation by keeping an independent usage counter for each interface and using it to automatically adjust the normal usage counter back to 0 whenever a driver was unbound. This approach involves duplicating information, but what is worse, it doesn't work properly in cases where a USB class driver delays decrementing the usage counter until after the driver's disconnect() routine has returned and the counter has been adjusted back to 0. Doing so would cause the usage counter to become negative. There's even a warning about this in the USB power management documentation! As it happens, this is exactly what the hub driver does. The kick_hub_wq() routine increments the runtime-PM usage counter, and the corresponding decrement is carried out by hub_event() in the context of the hub_wq work-queue thread. This work routine may sometimes run after the driver has been unbound from its interface, and when it does it causes the usage counter to go negative. It is not possible for hub_disconnect() to wait for a pending hub_event() call to finish, because hub_disconnect() is called with the device lock held and hub_event() acquires that lock. The only feasible fix is to reverse the original design decision: remove the duplicate interface-specific usage counter and require USB drivers to balance their runtime PM gets and puts. As far as I know, all existing drivers currently do this. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-and-tested-by: syzbot+7634edaea4d0b341c625@syzkaller.appspotmail.com CC: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
3dda8d29b5 |
x86/kprobes: Verify stack frame on kretprobe
commit 3ff9c075cc767b3060bdac12da72fc94dd7da1b8 upstream. Verify the stack frame pointer on kretprobe trampoline handler, If the stack frame pointer does not match, it skips the wrong entry and tries to find correct one. This can happen if user puts the kretprobe on the function which can be used in the path of ftrace user-function call. Such functions should not be probed, so this adds a warning message that reports which function should be blacklisted. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/155094059185.6137.15527904013362842072.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
9ca0f944a7 |
appletalk: Fix compile regression
[ Upstream commit 27da0d2ef998e222a876c0cec72aa7829a626266 ]
A bugfix just broke compilation of appletalk when CONFIG_SYSCTL
is disabled:
In file included from net/appletalk/ddp.c:65:
net/appletalk/ddp.c: In function 'atalk_init':
include/linux/atalk.h:164:34: error: expected expression before 'do'
#define atalk_register_sysctl() do { } while(0)
^~
net/appletalk/ddp.c:1934:7: note: in expansion of macro 'atalk_register_sysctl'
rc = atalk_register_sysctl();
This is easier to avoid by using conventional inline functions
as stubs rather than macros. The header already has inline
functions for other purposes, so I'm changing over all the
macros for consistency.
Fixes: 6377f787aeb9 ("appletalk: Fix use-after-free in atalk_proc_exit")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
c947b45f2e |
include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
[ Upstream commit a4046c06be50a4f01d435aa7fe57514818e6cc82 ] Use offsetof() to calculate offset of a field to take advantage of compiler built-in version when possible, and avoid UBSAN warning when compiling with Clang: UBSAN: Undefined behaviour in mm/swapfile.c:3010:38 member access within null pointer of type 'union swap_header' CPU: 6 PID: 1833 Comm: swapon Tainted: G S 4.19.23 #43 Call trace: dump_backtrace+0x0/0x194 show_stack+0x20/0x2c __dump_stack+0x20/0x28 dump_stack+0x70/0x94 ubsan_epilogue+0x14/0x44 ubsan_type_mismatch_common+0xf4/0xfc __ubsan_handle_type_mismatch_v1+0x34/0x54 __se_sys_swapon+0x654/0x1084 __arm64_sys_swapon+0x1c/0x24 el0_svc_common+0xa8/0x150 el0_svc_compat_handler+0x2c/0x38 el0_svc_compat+0x8/0x18 Link: http://lkml.kernel.org/r/20190312081902.223764-1-pihsun@chromium.org Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |