Commit e679654a70 ("bpf: Fix a rcu_sched stall issue with
bpf task/task_file iterator") tried to fix an RCU stall warning
caused by the bpf task_file iterator when running
"bpftool prog".
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 	7-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
	(t=21031 jiffies g=2534773 q=179750)
NMI backtrace for cpu 7
CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G W 5.8.0-00004-g68bfc7f8c1b4 #6
Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
Call Trace:
<IRQ>
dump_stack+0x57/0x70
nmi_cpu_backtrace.cold+0x14/0x53
? lapic_can_unplug_cpu.cold+0x39/0x39
nmi_trigger_cpumask_backtrace+0xb7/0xc7
rcu_dump_cpu_stacks+0xa2/0xd0
rcu_sched_clock_irq.cold+0x1ff/0x3d9
? tick_nohz_handler+0x100/0x100
update_process_times+0x5b/0x90
tick_sched_timer+0x5e/0xf0
__hrtimer_run_queues+0x12a/0x2a0
hrtimer_interrupt+0x10e/0x280
__sysvec_apic_timer_interrupt+0x51/0xe0
asm_call_on_stack+0xf/0x20
</IRQ>
sysvec_apic_timer_interrupt+0x6f/0x80
...
task_file_seq_next+0x52/0xa0
bpf_seq_read+0xb9/0x320
vfs_read+0x9d/0x180
ksys_read+0x5f/0xe0
do_syscall_64+0x38/0x60
entry_SYSCALL_64_after_hwframe+0x44/0xa9
That fix limited the number of bpf program runs to one million.
This fixed the problem in most cases. But we also found that under
heavy load, which can increase the wall-clock time of
bpf_seq_read(), the warning is still possible.
For example, calling bpf_delay() in the "while" loop of
bpf_seq_read() to introduce an artificial delay makes
the warning show up in my qemu run.
  static unsigned q;
  volatile unsigned *p = &q;
  volatile unsigned long long ll;
  static void bpf_delay(void)
  {
	int i, j;
	for (i = 0; i < 10000; i++)
		for (j = 0; j < 10000; j++)
			ll += *p;
  }
There are two ways to fix this issue. One is to reduce the above
one million threshold to, say, 100,000 and hope the RCU warning will
not show up any more. The other is to introduce a target feature
which enables bpf_seq_read() to call cond_resched().
This patch takes the second approach, as the first may cause
more -EAGAIN failures for read() syscalls. Note that not all bpf_iter
targets can permit cond_resched() in bpf_seq_read(): for some, e.g.
the netlink seq iterator, the RCU read lock critical section spans
seq_ops->next() -> seq_ops->show() -> seq_ops->next().
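The second approach can be pictured with a small userspace sketch. This is an illustration only: struct iter_target, its resched_ok field and iter_read() are hypothetical names, and sched_yield() merely stands in for the kernel's cond_resched().

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-target description; the field name is an assumption. */
struct iter_target {
	/* true only if no RCU read-side section spans next()/show() */
	bool resched_ok;
};

/*
 * Walk nr_objs objects, calling show() for each one.
 * Returns how many times the loop offered to give up the CPU.
 */
static size_t iter_read(const struct iter_target *tgt, size_t nr_objs,
			void (*show)(size_t idx))
{
	size_t i, yields = 0;

	assert(tgt != NULL);
	for (i = 0; i < nr_objs; i++) {
		if (show)
			show(i);
		if (tgt->resched_ok) {
			sched_yield();	/* userspace stand-in for cond_resched() */
			yields++;
		}
	}
	return yields;
}
```

A target that holds a read-side lock across iterations would leave resched_ok unset and the loop would never yield, which mirrors why the feature has to be opt-in per target.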
For the kernel code with the above hack, "bpftool p" roughly takes
38 seconds to finish on my VM with 184 bpf program runs.
Using the following command, I am able to collect the number of
context switches:
perf stat -e context-switches -- ./bpftool p >& log
Without this patch,
69 context-switches
With this patch,
75 context-switches
This patch adds 6 context switches, rescheduling roughly once every
6 seconds, to avoid long stretches without rescheduling which may cause
the above RCU warnings.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201028061054.1411116-1-yhs@fb.com
Coredump logic needs to report not only the registers of the dumping
thread, but (since 2.5.43) those of other threads getting killed.
Doing that might require extra state saved on the stack in asm glue at
kernel entry; signal delivery logic does that (we need to be able to
save sigcontext there, at the very least) and so does seccomp.
That covers all callers of do_coredump(). Secondary threads get hit with
SIGKILL and caught as soon as they reach exit_mm(), which normally happens
in signal delivery, so those are also fine most of the time. Unfortunately,
it is possible to end up with a secondary thread zapped when it has already
entered exit(2) (or, worse yet, is oopsing). In those cases we reach exit_mm()
when mm->core_state is already set, but the stack contents are not what
we would have in signal delivery.
At least on two architectures (alpha and m68k) it leads to infoleaks - we
end up with a chunk of kernel stack written into coredump, with the contents
consisting of normal C stack frames of the call chain leading to exit_mm()
instead of the expected copy of userland registers. In case of alpha we
leak 312 bytes of stack. Other architectures (including the regset-using
ones) might have similar problems - the normal user of regsets is ptrace
and the state of tracee at the time of such calls is special in the same
way signal delivery is.
Note that had the zapper gotten to the exiting thread slightly later,
it wouldn't have been included into coredump anyway - we skip the threads
that have already cleared their ->mm. So let's pretend that zapper always
loses the race. IOW, have exit_mm() only insert into the dumper list if
we'd gotten there from handling a fatal signal[*]
As a result, the callers of do_exit() that have *not* gone through get_signal()
are not seen by coredump logic as secondary threads. Which excludes voluntary
exit()/oopsen/traps/etc. The dumper thread itself is unaffected by that,
so seccomp is fine.
[*] originally I intended to add a new flag in tsk->flags, but ebiederman pointed
out that PF_SIGNALED is already doing just what we need.
Cc: stable@vger.kernel.org
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull tracing fix from Steven Rostedt:
"Fix synthetic event "strcat" overrun
New synthetic event code used strcat() and miscalculated the ending,
causing the concatenation to write beyond the allocated memory.
Instead of using strncat(), the code is switched over to seq_buf which
has all the mechanisms in place to protect against writing more than
what is allocated, and cleans up the code a bit"
* tag 'trace-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing, synthetic events: Replace buggy strcat() with seq_buf operations
If should_futex_fail() returns true in futex_wake_pi(), then the 'ret'
variable is set to -EFAULT and then immediately overwritten. So the failure
injection is non-functional.
Fix it by actually leaving the function and returning -EFAULT.
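The shape of the bug and of the fix can be sketched in plain C. The names below (struct fault_attr, should_fail(), do_atomic_op(), wake_buggy(), wake_fixed()) are hypothetical stand-ins chosen for illustration, not the kernel's functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fault-injection switch. */
struct fault_attr {
	bool enabled;
};

static bool should_fail(const struct fault_attr *attr)
{
	assert(attr != NULL);
	return attr->enabled;
}

/* Hypothetical operation that succeeds when no fault is injected. */
static int do_atomic_op(void)
{
	return 0;
}

/* Buggy shape: the injected error is overwritten by the next assignment. */
static int wake_buggy(const struct fault_attr *attr)
{
	int ret = 0;

	if (should_fail(attr))
		ret = -EFAULT;
	ret = do_atomic_op();
	return ret;
}

/* Fixed shape: leave the function as soon as the fault is injected. */
static int wake_fixed(const struct fault_attr *attr)
{
	if (should_fail(attr))
		return -EFAULT;
	return do_atomic_op();
}
```

With injection enabled, wake_buggy() still reports success while wake_fixed() returns -EFAULT, which is the behaviour the failure injection is meant to produce.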
The Fixes tag is kinda blurry because the initial commit which introduced
failure injection was already sloppy, but the below mentioned commit broke
it completely.
[ tglx: Massaged changelog ]
Fixes: 6b4f4bc9cb ("locking/futex: Allow low-level atomic operations to return -EAGAIN")
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200927000858.24219-1-mateusznosek0@gmail.com
Commit c6e5f9d7cf ("ANDROID: cpu-hotplug: Always use real time
scheduling when hotplugging a CPU") tried to speed up hotplug of
SCHED_NORMAL tasks by temporarily elevating them to SCHED_FIFO. But
while at it, it also prevented hotplug from SCHED_IDLE, SCHED_BATCH or
SCHED_DEADLINE for no apparent reason.
Since this is a userspace-visible change, and is unlikely to actually be
needed, change the patch logic to only optimize for SCHED_NORMAL tasks
and leave the others untouched.
Bug: 169238689
Fixes: c6e5f9d7cf ("ANDROID: cpu-hotplug: Always use real time
scheduling when hotplugging a CPU")
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I4d9e88b15fee56e7d234826e2eaea306a69328bb
When there are no audit rules registered, mandatory records (config,
etc.) are missing their accompanying records (syscall, proctitle, etc.).
This is because the audit context's dummy flag is set on syscall entry when
no rules are present, signalling that no other records are to be printed.
Clear the dummy bit if any record is generated, open coding this in
audit_log_start().
The proctitle context and dummy checks are pointless since the
proctitle record will not be printed if no syscall records are printed.
The fds array is reset to -1 after the first syscall to indicate it
isn't valid any more, but was never set to -1 when the context was
allocated to indicate it wasn't yet valid.
Check ctx->pwd in audit_log_name().
The audit_inode* functions can be called without going through
getname_flags() or getname_kernel() that sets audit_names and cwd, so
set the cwd in audit_alloc_name() if this has not already been done due to
audit_names being valid, and purge all other audit_getcwd() calls.
Revert the LSM dump_common_audit_data() LSM_AUDIT_DATA_* cases from the
ghak96 patch since they are no longer necessary due to cwd coverage in
audit_alloc_name().
Thanks to bauen1 <j2468h@googlemail.com> for reporting LSM situations in
which context->cwd is not valid, inadvertently fixed by the ghak96 patch.
Please see upstream github issue
https://github.com/linux-audit/audit-kernel/issues/120
This is also related to upstream github issue
https://github.com/linux-audit/audit-kernel/issues/96
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Fix a typo in a comment in freeze_processes().
Signed-off-by: Jackie Zamow <jackie.zamow@gmail.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There was a memory corruption bug happening while running the synthetic
event selftests:
kmemleak: Cannot insert 0xffff8c196fa2afe5 into the object search tree (overlaps existing)
CPU: 5 PID: 6866 Comm: ftracetest Tainted: G W 5.9.0-rc5-test+ #577
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
Call Trace:
dump_stack+0x8d/0xc0
create_object.cold+0x3b/0x60
slab_post_alloc_hook+0x57/0x510
? tracing_map_init+0x178/0x340
__kmalloc+0x1b1/0x390
tracing_map_init+0x178/0x340
event_hist_trigger_func+0x523/0xa40
trigger_process_regex+0xc5/0x110
event_trigger_write+0x71/0xd0
vfs_write+0xca/0x210
ksys_write+0x70/0xf0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fef0a63a487
Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007fff76f18398 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000039 RCX: 00007fef0a63a487
RDX: 0000000000000039 RSI: 000055eb3b26d690 RDI: 0000000000000001
RBP: 000055eb3b26d690 R08: 000000000000000a R09: 0000000000000038
R10: 000055eb3b2cdb80 R11: 0000000000000246 R12: 0000000000000039
R13: 00007fef0a70b500 R14: 0000000000000039 R15: 00007fef0a70b700
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xffff8c196fa2afe0 (size 8):
kmemleak: comm "ftracetest", pid 6866, jiffies 4295082531
kmemleak: min_count = 1
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:
__kmalloc+0x1b1/0x390
tracing_map_init+0x1be/0x340
event_hist_trigger_func+0x523/0xa40
trigger_process_regex+0xc5/0x110
event_trigger_write+0x71/0xd0
vfs_write+0xca/0x210
ksys_write+0x70/0xf0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The cause came down to a use of strcat() that was appending a string that had
been shortened, but the strcat() did not take that into account.
strcat() is extremely dangerous as it does not care how big the buffer is.
Replace it with seq_buf operations that prevent the buffer from being
overwritten if what is being written is bigger than the buffer.
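The idea of a buffer that refuses to be overrun can be sketched in userspace C. This is not the kernel's seq_buf implementation: struct bounded_buf, bbuf_init() and bbuf_puts() are hypothetical names that only illustrate the principle of tracking capacity and rejecting appends that do not fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded buffer, loosely modelled on the idea behind seq_buf. */
struct bounded_buf {
	char   *data;
	size_t  size;		/* total capacity, including the trailing NUL */
	size_t  len;		/* bytes used, excluding the trailing NUL */
	bool    overflow;	/* set once an append did not fit */
};

static void bbuf_init(struct bounded_buf *b, char *data, size_t size)
{
	assert(b && data && size > 0);
	b->data = data;
	b->size = size;
	b->len = 0;
	b->overflow = false;
	data[0] = '\0';
}

/* Append s only if it fits completely; otherwise record the overflow. */
static bool bbuf_puts(struct bounded_buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->overflow || n >= b->size - b->len) {
		b->overflow = true;
		return false;
	}
	memcpy(b->data + b->len, s, n + 1);	/* copies the NUL too */
	b->len += n;
	return true;
}
```

Unlike strcat(), the append helper knows the capacity, so a miscalculated length shows up as a recorded overflow instead of a write past the allocation.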
Fixes: 10819e2579 ("tracing: Handle synthetic event array field type checking correctly")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steps on the way to 5.10-rc1
Resolves conflicts in:
Documentation/admin-guide/sysctl/vm.rst
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic58f28718f28dae42948c935dfb0c62122fe86fc
Steps on the way to 5.10-rc1
Resolves conflicts in:
arch/arm64/kernel/vdso32/Makefile
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic9edce8127b3717469dc4fa96ca95166be419e9d
UBSAN reports:
Undefined behaviour in ./include/linux/time64.h:127:27
signed integer overflow:
17179869187 * 1000000000 cannot be represented in type 'long long int'
Call Trace:
timespec64_to_ns include/linux/time64.h:127 [inline]
set_cpu_itimer+0x65c/0x880 kernel/time/itimer.c:180
do_setitimer+0x8e/0x740 kernel/time/itimer.c:245
__x64_sys_setitimer+0x14c/0x2c0 kernel/time/itimer.c:336
do_syscall_64+0xa1/0x540 arch/x86/entry/common.c:295
Commit bd40a17576 ("y2038: itimer: change implementation to timespec64")
replaced the original conversion which handled time clamping correctly with
timespec64_to_ns() which has no overflow protection.
Fix it in timespec64_to_ns() as this is not necessarily limited to the
usage in itimers.
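A clamped conversion could look roughly like the userspace sketch below. The names struct ts64, SEC_MAX and ts64_to_ns_clamped() are hypothetical, and treating negative seconds as saturating to the maximum is an assumption of this sketch, not a statement about the kernel's behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
/* Largest whole-second count whose nanosecond value still fits in int64_t. */
#define SEC_MAX      (INT64_MAX / NSEC_PER_SEC)

/* Hypothetical stand-in for a 64-bit timespec. */
struct ts64 {
	int64_t tv_sec;
	long    tv_nsec;
};

static int64_t ts64_to_ns_clamped(const struct ts64 *ts)
{
	assert(ts != NULL);
	/*
	 * Saturate instead of overflowing. The unsigned comparison also
	 * catches negative tv_sec values (an assumption of this sketch).
	 */
	if ((uint64_t)ts->tv_sec >= (uint64_t)SEC_MAX)
		return INT64_MAX;
	return ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}
```

With the value from the UBSAN report, 17179869187 seconds, the check triggers and the result saturates rather than invoking signed overflow.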
[ tglx: Added comment and adjusted the fixes tag ]
Fixes: 361a3bf005 ("time64: Add time64.h header and define struct timespec64")
Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1598952616-6416-1-git-send-email-prime.zeng@hisilicon.com
Since sched_clock_read_begin() and sched_clock_read_retry() are called
by notrace function sched_clock(), they shouldn't be traceable either,
or else ftrace_graph_caller will run into a dead loop on the path
as below (arm for instance):
ftrace_graph_caller()
prepare_ftrace_return()
function_graph_enter()
ftrace_push_return_trace()
trace_clock_local()
sched_clock()
sched_clock_read_begin/retry()
Fixes: 1b86abc1c6 ("sched_clock: Expose struct clock_read_data")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200929082027.16787-1-quanyang.wang@windriver.com
Steps on the way to 5.10-rc1
Resolves conflicts in:
drivers/net/virtio_net.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I72bb00e45bb7b6154b56f31a2e9040c4e8fe899a
Steps on the way to 5.10-rc1
Resolves conflicts in:
fs/userfaultfd.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af
Steps on the way to 5.10-rc1
Resolves merge issues in:
drivers/net/virtio_net.c
net/xfrm/xfrm_state.c
net/xfrm/xfrm_user.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3132e7802f25cb775eb02d0b3a03068da39a6fe2
Resolves conflicts in:
kernel/dma/mapping.c
Steps on the way to 5.10-rc1
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I61292201a3ac4b92c39f330585692652cf985550
Steps on the way to 5.10-rc1
Fixes merge conflicts in:
drivers/gpu/drm/bridge/lontium-lt9611.c
drivers/gpu/drm/virtio/virtgpu_display.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I65d9d4c69ea7c79854d275462c9aca0a37d42654
Steps on the way to 5.10-rc1
Resolves conflicts in:
drivers/hwtracing/stm/ftrace.c
drivers/misc/Makefile
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8ac53000bf0c61973970f47b383904a2067bd353
tid_addr is not a "pointer to (pointer to int in userspace)"; it is in
fact a "pointer to (pointer to int in userspace) in userspace". So
sparse rightfully complains about passing a kernel pointer to
put_user().
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull timer fixes from Thomas Gleixner:
"A time namespace fix and a matching selftest. The futex absolute
timeouts which are based on CLOCK_MONOTONIC require time namespace
corrected. This was missed in the original time namespace support"
* tag 'timers-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/timens: Add a test for futex()
futex: Adjust absolute futex timeouts with per time namespace offset
Pull scheduler fixes from Thomas Gleixner:
"Two scheduler fixes:
- A trivial build fix for sched_feat() to compile correctly with
CONFIG_JUMP_LABEL=n
- Replace a zero length array with a flexible array"
* tag 'sched-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/features: Fix !CONFIG_JUMP_LABEL case
sched: Replace zero-length array with flexible-array
Pull SafeSetID updates from Micah Morton:
"The changes are mostly contained to within the SafeSetID LSM, with the
exception of a few 1-line changes to change some ns_capable() calls to
ns_capable_setid() -- causing a flag (CAP_OPT_INSETID) to be set that
is examined by SafeSetID code and nothing else in the kernel.
The changes to SafeSetID internally allow for setting up GID
transition security policies, as already existed for UIDs"
* tag 'safesetid-5.10' of git://github.com/micah-morton/linux:
LSM: SafeSetID: Fix warnings reported by test bot
LSM: SafeSetID: Add GID security policy handling
LSM: Signal to SafeSetID when setting group IDs
Pull random32 updates from Willy Tarreau:
"Make prandom_u32() less predictable.
This is the cleanup of the latest series of prandom_u32
experimentations consisting in using SipHash instead of Tausworthe to
produce the randoms used by the network stack.
The changes to the files were kept minimal, and the controversial
commit that used to take noise from the fast_pool (f227e3ec3b) was
reverted. Instead, a dedicated "net_rand_noise" per_cpu variable is
fed from various sources of activities (networking, scheduling) to
perturb the SipHash state using fast, non-trivially predictable data,
instead of keeping it fully deterministic. The goal is essentially to
make any occasional memory leakage or brute-force attempt useless.
The resulting code was verified to be very slightly faster on x86_64
than what it was with the controversial commit above, though this
remains barely above measurement noise. It was also tested on i386 and
arm, and build-tested only on arm64"
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
* tag '20201024-v4-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/prandom:
random32: add a selftest for the prandom32 code
random32: add noise from network and scheduling activity
random32: make prandom_u32() output unpredictable
Pull dma-mapping fixes from Christoph Hellwig:
- document the new dma_{alloc,free}_pages() API
- two fixups for the dma-mapping.h split
* tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: document dma_{alloc,free}_pages
dma-mapping: move more functions to dma-map-ops.h
ARM/sa1111: add a missing include of dma-map-ops.h
With the removal of the interrupt perturbations in previous random32
change (random32: make prandom_u32() output unpredictable), the PRNG
has become 100% deterministic again. While SipHash is expected to be
way more robust against brute force than the previous Tausworthe LFSR,
there's still the risk that whoever has even one temporary access to
the PRNG's internal state is able to predict all subsequent draws till
the next reseed (roughly every minute). This may happen through a side
channel attack or any data leak.
This patch restores the spirit of commit f227e3ec3b ("random32: update
the net random state on interrupt and activity") in that it will perturb
the internal PRNG's state using externally collected noise, except that
it will not pick that noise from the random pool's bits nor upon
interrupt, but will rather combine a few elements along the Tx path
that are collectively hard to predict, such as dev, skb and txq
pointers, packet length and jiffies values. These ones are combined
using a single round of SipHash into a single long variable that is
mixed with the net_rand_state upon each invocation.
The operation was inlined because it produces very small and efficient
code, typically 3 xor, 2 add and 2 rol. The performance was measured
to be the same as (or even very slightly better than) before the switch to
SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
(i40e), the connection rate dropped from 556k/s to 555k/s while the
SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output. An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.
It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack. Oops.
This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key. (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.) Speed is prioritized over security;
attacks are rare, while performance is always wanted.
Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.
Commit f227e3ec3b ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution. This patch replaces
it.
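For readers unfamiliar with the SipHash round function, here is a minimal userspace sketch of a generator built on it. The round itself follows the published SipHash design; everything else (struct sip_state, sip_rand_u32(), using a single round per output, and folding v1 + v3 into 32 bits) is an assumption made for illustration and should not be read as the code in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate left; r must be in 1..63. */
static inline uint64_t rol64(uint64_t x, unsigned int r)
{
	return (x << r) | (x >> (64 - r));
}

/* Hypothetical generator state: the four SipHash lanes. */
struct sip_state {
	uint64_t v0, v1, v2, v3;
};

/* One SipHash round: add, rotate, xor on four 64-bit lanes. */
static void sip_round(struct sip_state *s)
{
	s->v0 += s->v1; s->v1 = rol64(s->v1, 13); s->v1 ^= s->v0;
	s->v0 = rol64(s->v0, 32);
	s->v2 += s->v3; s->v3 = rol64(s->v3, 16); s->v3 ^= s->v2;
	s->v0 += s->v3; s->v3 = rol64(s->v3, 21); s->v3 ^= s->v0;
	s->v2 += s->v1; s->v1 = rol64(s->v1, 17); s->v1 ^= s->v2;
	s->v2 = rol64(s->v2, 32);
}

/* Advance the state and fold two lanes into a 32-bit output. */
static uint32_t sip_rand_u32(struct sip_state *s)
{
	assert(s != NULL);
	sip_round(s);
	return (uint32_t)(s->v1 + s->v3);
}
```

Seeding the four lanes from a strong random key is what keeps the output hard to predict; with a known seed, as in any deterministic generator, the sequence is fully reproducible.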
Reported-by: Amit Klein <aksecurity@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Fixes: f227e3ec3b ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin <lkml@sdf.org>
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
to prandom.h for later use; merged George's prandom_seed() proposal;
inlined siprand_u32(); replaced the net_rand_state[] array with 4
members to fix a build issue; cosmetic cleanups to make checkpatch
happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Steps on the way to 5.10-rc1
Resolves conflicts in:
include/linux/blk-crypto.h
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4012850c2e4b804d9e87e90b8e03a3b9ce21b5e7
Pull tracing ring-buffer fix from Steven Rostedt:
"The success return value of ring_buffer_resize() is stated to be
zero and checked that way.
But it was incorrectly returning the size allocated.
Also, a fix to a comment"
* tag 'trace-v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Update the description for ring_buffer_wait
ring-buffer: Return 0 on success from ring_buffer_resize()
Pull more power management updates from Rafael Wysocki:
"First of all, the adaptive voltage scaling (AVS) drivers go to new
platform-specific locations as planned (this part was reported to have
merge conflicts against the new arm-soc updates in linux-next).
In addition to that, there are some fixes (intel_idle, intel_pstate,
RAPL, acpi_cpufreq), the addition of on/off notifiers and idle state
accounting support to the generic power domains (genpd) code and some
janitorial changes all over.
Specifics:
- Move the AVS drivers to new platform-specific locations and get rid
of the drivers/power/avs directory (Ulf Hansson).
- Add on/off notifiers and idle state accounting support to the
generic power domains (genpd) framework (Ulf Hansson, Lina Iyer).
- Ulf will maintain the PM domain part of cpuidle-psci (Ulf Hansson).
- Make intel_idle disregard ACPI _CST if it cannot use the data
returned by that method (Mel Gorman).
- Modify intel_pstate to avoid leaving useless sysfs directory
structure behind if it cannot be registered (Chen Yu).
- Fix domain detection in the RAPL power capping driver and prevent
it from failing to enumerate the Psys RAPL domain (Zhang Rui).
- Allow acpi-cpufreq to use ACPI _PSD information with Family 19 and
later AMD chips (Wei Huang).
- Update the driver assumptions comment in intel_idle and fix a
kerneldoc comment in the runtime PM framework (Alexander Monakov,
Bean Huo).
- Avoid unnecessary resets of the cached frequency in the schedutil
cpufreq governor to reduce overhead (Wei Wang).
- Clean up the cpufreq core a bit (Viresh Kumar).
- Make assorted minor janitorial changes (Daniel Lezcano, Geert
Uytterhoeven, Hubert Jasudowicz, Tom Rix).
- Clean up and optimize the cpupower utility somewhat (Colin Ian
King, Martin Kaistra)"
* tag 'pm-5.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
PM: sleep: remove unreachable break
PM: AVS: Drop the avs directory and the corresponding Kconfig
PM: AVS: qcom-cpr: Move the driver to the qcom specific drivers
PM: runtime: Fix typo in pm_runtime_set_active() helper comment
PM: domains: Fix build error for genpd notifiers
powercap: Fix typo in Kconfig "Plance" -> "Plane"
cpufreq: schedutil: restore cached freq when next_f is not changed
acpi-cpufreq: Honor _PSD table setting on new AMD CPUs
PM: AVS: smartreflex Move driver to soc specific drivers
PM: AVS: rockchip-io: Move the driver to the rockchip specific drivers
PM: domains: enable domain idle state accounting
PM: domains: Add curly braces to delimit comment + statement block
PM: domains: Add support for PM domain on/off notifiers for genpd
powercap/intel_rapl: enumerate Psys RAPL domain together with package RAPL domain
powercap/intel_rapl: Fix domain detection
intel_idle: Ignore _CST if control cannot be taken from the platform
cpuidle: Remove pointless stub
intel_idle: mention assumption that WBINVD is not needed
MAINTAINERS: Add section for cpuidle-psci PM domain
cpufreq: intel_pstate: Delete intel_pstate sysfs if failed to register the driver
...
Pull networking fixes from Jakub Kicinski:
"Cross-tree/merge window issues:
- rtl8150: don't incorrectly assign random MAC addresses; fix late in
the 5.9 cycle started depending on a return code from a function
which changed with the 5.10 PR from the usb subsystem
Current release regressions:
- Revert "virtio-net: ethtool configurable RXCSUM", it was causing
crashes at probe when control vq was not negotiated/available
Previous release regressions:
- ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
bus, only first device would be probed correctly
- nexthop: Fix performance regression in nexthop deletion by
effectively switching from recently added synchronize_rcu() to
synchronize_rcu_expedited()
- netsec: ignore 'phy-mode' device property on ACPI systems; the
property is not populated correctly by the firmware, but firmware
configures the PHY so just keep boot settings
Previous releases - always broken:
- tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
bulk transfers getting "stuck"
- icmp: randomize the global rate limiter to prevent attackers from
getting useful signal
- r8169: fix operation under forced interrupt threading, make the
driver always use hard irqs, even on RT, given the handler is light
and only wants to schedule napi (and do so through a _irqoff()
variant, preferably)
- bpf: Enforce pointer id generation for all may-be-null register
type to avoid pointers erroneously getting marked as null-checked
- tipc: re-configure queue limit for broadcast link
- net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
tunnels
- fix various issues in chelsio inline tls driver
Misc:
- bpf: improve just-added bpf_redirect_neigh() helper api to support
supplying nexthop by the caller - in case BPF program has already
done a lookup we can avoid doing another one
- remove unnecessary break statements
- make MPTCP not select IPV6, but rather depend on it"
* tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
tcp: fix to update snd_wl1 in bulk receiver fast path
net: Properly typecast int values to set sk_max_pacing_rate
netfilter: nf_fwd_netdev: clear timestamp in forwarding path
ibmvnic: save changed mac address to adapter->mac_addr
selftests: mptcp: depends on built-in IPv6
Revert "virtio-net: ethtool configurable RXCSUM"
rtnetlink: fix data overflow in rtnl_calcit()
net: ethernet: mtk-star-emac: select REGMAP_MMIO
net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
mptcp: depends on IPV6 but not as a module
sfc: move initialisation of efx->filter_sem to efx_init_struct()
mpls: load mpls_gso after mpls_iptunnel
net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
net: dsa: bcm_sf2: make const array static, makes object smaller
mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
...
Export tick_nohz_get_sleep_length() so idle drivers may use this to
determine the available idle time before the next timer wakeup.
Bug: 169136276
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Change-Id: I0d18638d63c032862ae048bc2c3d49fa1bd90291