168 Commits

Author SHA1 Message Date
Anna-Maria Gleixner
1e34a8824b hrtimer: Prepare support for PREEMPT_RT
When PREEMPT_RT is enabled, the soft interrupt thread can be preempted.  If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling hrtimer_cancel() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the timer
    handler to complete. This can result in unbound priority inversion.

  - If the caller originates from the task which preempted the timer
    handler on the same CPU, then spin waiting for the timer handler to
    complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If hrtimer_cancel() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.

This addresses both the priority inversion and the life lock issues.

The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
hrtimer_cancel() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.

[ tglx: Refactored it for mainline ]

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185753.737767218@linutronix.de

Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:23 +00:00
Thomas Gleixner
a55c16ef9b hrtimer: Make enqueue mode check work on RT
hrtimer_start_range_ns() has a WARN_ONCE() which verifies that a timer
which is marker for softirq expiry is not queued in the hard interrupt base
and vice versa.

When PREEMPT_RT is enabled, timers which are not explicitely marked to
expire in hard interrupt context are deferrred to the soft interrupt. So
the regular check would trigger.

Change the check, so when PREEMPT_RT is enabled, it is verified that the
timers marked for hard interrupt expiry are not tried to be queued for soft
interrupt expiry or any of the unmarked and softirq marked is tried to be
expired in hard interrupt context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:23 +00:00
Thomas Gleixner
3543ff9fdc hrtimer: Provide hrtimer_sleeper_start_expires()
hrtimer_sleepers will gain a scheduling class dependent treatment on
PREEMPT_RT. Create a wrapper around hrtimer_start_expires() to make that
possible.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:23 +00:00
Sebastian Andrzej Siewior
284960c058 hrtimer: Consolidate hrtimer_init() + hrtimer_init_sleeper() calls
hrtimer_init_sleeper() calls require prior initialisation of the hrtimer
object which is embedded into the hrtimer_sleeper.

Combine the initialization and spare a function call. Fixup all call sites.

This is also a preparatory change for PREEMPT_RT to do hrtimer sleeper
specific initializations of the embedded hrtimer without modifying any of
the call sites.

No functional change.

[ anna-maria: Minor cleanups ]
[ tglx: Adopted to the removal of the task argument of
  	hrtimer_init_sleeper() and trivial polishing.
	Folded a fix from Stephen Rothwell for the vsoc code ]

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185752.887468908@linutronix.de

Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:23 +00:00
Thomas Gleixner
c7a584ac5e hrtimer: Remove task argument from hrtimer_init_sleeper()
All callers hand in 'current' and that's the only task pointer which
actually makes sense. Remove the task argument and set current in the
function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185752.791885290@linutronix.de

Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:22 +00:00
Rafael J. Wysocki
47bce51d8c time: hrtimer: Introduce hrtimer_next_event_without()
The next set of changes will need to compute the time to the next
hrtimer event over all hrtimers except for the scheduler tick one.

To that end introduce a new helper function,
hrtimer_next_event_without(), for computing the time until the next
hrtimer event over all timers except for one and modify the underlying
code in __hrtimer_next_event_base() to prepare it for being called by
that new function.

No intentional changes in functionality.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:22 +00:00
Anna-Maria Gleixner
2206c79f32 hrtimer: Implement support for softirq based hrtimers
hrtimer callbacks are always invoked in hard interrupt context. Several
users in tree require soft interrupt context for their callbacks and
achieve this by combining a hrtimer with a tasklet. The hrtimer schedules
the tasklet in hard interrupt context and the tasklet callback gets invoked
in softirq context later.

That's suboptimal and aside of that the real-time patch moves most of the
hrtimers into softirq context. So adding native support for hrtimers
expiring in softirq context is a valuable extension for both mainline and
the RT patch set.

Each valid hrtimer clock id has two associated hrtimer clock bases: one for
timers expiring in hardirq context and one for timers expiring in softirq
context.

Implement the functionality to associate a hrtimer with the hard or softirq
related clock bases and update the relevant functions to take them into
account when the next expiry time needs to be evaluated.

Add a check into the hard interrupt context handler functions to check
whether the first expiring softirq based timer has expired. If it's expired
the softirq is raised and the accounting of softirq based timers to
evaluate the next expiry time for programming the timer hardware is skipped
until the softirq processing has finished. At the end of the softirq
processing the regular processing is resumed.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-29-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:22 +00:00
Anna-Maria Gleixner
98845d120b hrtimer: Make hrtimer_reprogramm() unconditional
hrtimer_reprogram() needs to be available unconditionally for softirq based
hrtimers. Move the function and all required struct members out of the
CONFIG_HIGH_RES_TIMERS #ifdef.

There is no functional change because hrtimer_reprogram() is only invoked
when hrtimer_cpu_base.hres_active is true. Making it unconditional
increases the text size for the CONFIG_HIGH_RES_TIMERS=n case, but avoids
replication of that code for the upcoming softirq based hrtimers support.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-18-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:21 +00:00
Anna-Maria Gleixner
a2bcdacaaa hrtimer: Make hrtimer_cpu_base.next_timer handling unconditional
hrtimer_cpu_base.next_timer stores the pointer to the next expiring timer
in a CPU base.

This pointer cannot be dereferenced and is solely used to check whether a
hrtimer which is removed is the hrtimer which is the first to expire in the
CPU base. If this is the case, then the timer hardware needs to be
reprogrammed to avoid an extra interrupt for nothing.

Again, this is conditional functionality, but there is no compelling reason
to make this conditional. As a preparation, hrtimer_cpu_base.next_timer
needs to be available unconditonally.

Aside of that the upcoming support for softirq based hrtimers requires access
to this pointer unconditionally as well, so our motivation is not entirely
simplicity based.

Make the update of hrtimer_cpu_base.next_timer unconditional and remove the
marginal as it's just a store on an already dirtied cacheline.

No functional change.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-17-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:21 +00:00
Anna-Maria Gleixner
033a1cc518 hrtimer: Make the remote enqueue check unconditional
hrtimer_cpu_base.expires_next is used to cache the next event armed in the
timer hardware. The value is used to check whether an hrtimer can be
enqueued remotely. If the new hrtimer is expiring before expires_next, then
remote enqueue is not possible as the remote hrtimer hardware cannot be
accessed for reprogramming to an earlier expiry time.

The remote enqueue check is currently conditional on
CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active. There is no
compelling reason to make this conditional.

Move hrtimer_cpu_base.expires_next out of the CONFIG_HIGH_RES_TIMERS=y
guarded area and remove the conditionals in hrtimer_check_target().

The check is currently a NOOP for the CONFIG_HIGH_RES_TIMERS=n and the
!hrtimer_cpu_base.hres_active case because in these cases nothing updates
hrtimer_cpu_base.expires_next yet. This will be changed with later patches
which further reduce the #ifdef zoo in this code.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-16-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:21 +00:00
Anna-Maria Gleixner
118840fad0 hrtimer: Make the hrtimer_cpu_base::hres_active field unconditional, to simplify the code
The hrtimer_cpu_base::hres_active_member field depends on CONFIG_HIGH_RES_TIMERS=y
currently, and all related functions to this member are conditional as well.

To simplify the code make it unconditional and set it to zero during initialization.

(This will also help with the upcoming softirq based hrtimers code.)

The conditional code sections can be avoided by adding IS_ENABLED(HIGHRES)
conditionals into common functions, which ensures dead code elimination.

There is no functional change.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-14-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:21 +00:00
Anna-Maria Gleixner
7dab329ffb hrtimer: Make room in 'struct hrtimer_cpu_base'
The upcoming softirq based hrtimers support requires an additional field in
the hrtimer_cpu_base struct, which would grow the struct size beyond a
cache line.

The hrtimer_cpu_base::nr_retries and ::nr_hangs members are solely
used for diagnostic output and have no requirement to be 'unsigned int'.

Make them 'unsigned short' to create room for the new struct member.

No functional change.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-13-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:21 +00:00
Anna-Maria Gleixner
e0c117cf4c hrtimer: Fix kerneldoc syntax for 'struct hrtimer_cpu_base'
The '/**' sequence marks the start of a structure description. Add the
missing second asterisk. While at it adapt the ordering of the struct
members to the struct definition and document the purpose of
expires_next more precisely.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-4-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:20 +00:00
celtare21
f2079facaa Revert "BACKPORT: time: hrtimer: Introduce hrtimer_next_event_without()"
This reverts commit 6277dd586f.

Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:20 +00:00
Anna-Maria Gleixner
e40bfcab24 hrtimer: Store running timer in hrtimer_clock_base
The pointer to the currently running timer is stored in hrtimer_cpu_base
before the base lock is dropped and the callback is invoked.

This results in two levels of indirections and the upcoming support for
softirq based hrtimer requires splitting the "running" storage into soft
and hard IRQ context expiry.

Storing both in the cpu base would require conditionals in all code paths
accessing that information.

It's possible to have a per clock base sequence count and running pointer
without changing the semantics of the related mechanisms because the timer
base pointer cannot be changed while a timer is running the callback.

Unfortunately this makes cpu_clock base larger than 32 bytes on 32-bit
kernels. Instead of having huge gaps due to alignment, remove the alignment
and let the compiler pack CPU base for 32-bit kernels. The resulting cache access
patterns are fortunately not really different from the current
behaviour. On 64-bit kernels the 64-byte alignment stays and the behaviour is
unchanged. This was determined by analyzing the resulting layout and
looking at the number of cache lines involved for the frequently used
clocks.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-12-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:20 +00:00
Anna-Maria Gleixner
e78d0467d4 hrtimer: Fix hrtimer_start[_range_ns]() function descriptions
The hrtimer_start[_range_ns]() functions start a timer reliably on this CPU only
when HRTIMER_MODE_PINNED is set.

Furthermore the HRTIMER_MODE_PINNED mode is not considered when a hrtimer is initialized.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-6-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:20 +00:00
Anna-Maria Gleixner
1f3fa2e99f hrtimer: Clean up the 'int clock' parameter of schedule_hrtimeout_range_clock()
schedule_hrtimeout_range_clock() uses an 'int clock' parameter for the
clock ID, instead of the customary predefined "clockid_t" type.

In hrtimer coding style the canonical variable name for the clock ID is
'clock_id', therefore change the name of the parameter here as well
to make it all consistent.

While at it, clean up the description for the 'clock_id' and 'mode'
function parameters. The clock modes and the clock IDs are not
restricted as the comment suggests.

Fix the mode description as well for the callers of schedule_hrtimeout_range_clock().

No functional changes intended.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-5-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:20 +00:00
Thomas Gleixner
1bee707835 hrtimer: Optimize the hrtimer code by using static keys for migration_enable/nohz_active
The hrtimer_cpu_base::migration_enable and ::nohz_active fields
were originally introduced to avoid accessing global variables
for these decisions.

Still that results in a (cache hot) load and conditional branch,
which can be avoided by using static keys.

Implement it with static keys and optimize for the most critical
case of high performance networking which tends to disable the
timer migration functionality.

No change in functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1801142327490.2371@nanos
Link: https://lkml.kernel.org/r/20171221104205.7269-2-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:20 +00:00
Sebastian Andrzej Siewior
e899e64d3e hrtimer: Introduce HARD expiry mode
On PREEMPT_RT not all hrtimers can be expired in hard interrupt context
even if that is perfectly fine on a PREEMPT_RT=n kernel, e.g. because they
take regular spinlocks. Also for latency reasons PREEMPT_RT tries to defer
most hrtimers' expiry into soft interrupt context.

But there are hrtimers which must be expired in hard interrupt context even
when PREEMPT_RT is enabled:

  - hrtimers which must expiry in hard interrupt context, e.g. scheduler,
    perf, watchdog related hrtimers

  - latency critical hrtimers, e.g. nanosleep, ..., kvm lapic timer

Add a new mode flag HRTIMER_MODE_HARD which allows to mark these timers so
PREEMPT_RT will not move them into softirq expiry mode.

[ tglx: Split out of a larger combo patch. Added changelog ]

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185752.981398465@linutronix.de

Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:19 +00:00
Anna-Maria Gleixner
a5da363d25 hrtimer: Add clock bases and hrtimer mode for softirq context
Currently hrtimer callback functions are always executed in hard interrupt
context. Users of hrtimers, which need their timer function to be executed
in soft interrupt context, make use of tasklets to get the proper context.

Add additional hrtimer clock bases for timers which must expire in softirq
context, so the detour via the tasklet can be avoided. This is also
required for RT, where the majority of hrtimer is moved into softirq
hrtimer context.

The selection of the expiry mode happens via a mode bit. Introduce
HRTIMER_MODE_SOFT and the matching combinations with the ABS/REL/PINNED
bits and update the decoding of hrtimer_mode in tracepoints.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-27-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:19 +00:00
Anna-Maria Gleixner
425b05a84e hrtimer: Clean up 'enum hrtimer_mode'
It's not obvious that the HRTIMER_MODE variants are bit combinations,
because all modes are hard coded constants currently.

Change it so the bit meanings are clear; and use the symbols for creating
modes which combine bits.

While at it get rid of the ugly tail comments as well.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-8-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-04-11 07:23:19 +00:00
Srinivasarao P
89c9d6d8aa Merge android-4.14.162 (c2bd4f8) into msm-4.14
* refs/heads/tmp-c2bd4f8:
  Linux 4.14.162
  spi: fsl: use platform_get_irq() instead of of_irq_to_resource()
  gtp: avoid zero size hashtable
  gtp: fix an use-after-free in ipv4_pdp_find()
  gtp: fix wrong condition in gtp_genl_dump_pdp()
  tcp: do not send empty skb from tcp_write_xmit()
  tcp/dccp: fix possible race __inet_lookup_established()
  gtp: do not allow adding duplicate tid and ms_addr pdp context
  sit: do not confirm neighbor when do pmtu update
  vti: do not confirm neighbor when do pmtu update
  tunnel: do not confirm neighbor when do pmtu update
  net/dst: add new function skb_dst_update_pmtu_no_confirm
  gtp: do not confirm neighbor when do pmtu update
  ip6_gre: do not confirm neighbor when do pmtu update
  net: add bool confirm_neigh parameter for dst_ops.update_pmtu
  vhost/vsock: accept only packets with the right dst_cid
  udp: fix integer overflow while computing available space in sk_rcvbuf
  ptp: fix the race between the release of ptp_clock and cdev
  net/mlxfw: Fix out-of-memory error in mfa2 flash burning
  net: ena: fix napi handler misbehavior when the napi budget is zero
  pinctrl: baytrail: Really serialize all register accesses
  tty/serial: atmel: fix out of range clock divider handling
  spi: fsl: don't map irq during probe
  hrtimer: Annotate lockless access to timer->state
  net: icmp: fix data-race in cmp_global_allow()
  net: add a READ_ONCE() in skb_peek_tail()
  inetpeer: fix data-race in inet_putpeer / inet_putpeer
  netfilter: bridge: make sure to pull arp header in br_nf_forward_arp()
  6pack,mkiss: fix possible deadlock
  netfilter: ebtables: compat: reject all padding in matches/watchers
  filldir[64]: remove WARN_ON_ONCE() for bad directory entries
  Make filldir[64]() verify the directory entry filename is valid
  perf strbuf: Remove redundant va_end() in strbuf_addv()
  bonding: fix active-backup transition after link failure
  ALSA: hda - Downgrade error message for single-cmd fallback
  netfilter: nf_queue: enqueue skbs with NULL dst
  net, sysctl: Fix compiler warning when only cBPF is present
  x86/mce: Fix possibly incorrect severity calculation on AMD
  userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
  kernel: sysctl: make drop_caches write-only
  ocfs2: fix passing zero to 'PTR_ERR' warning
  s390/cpum_sf: Check for SDBT and SDB consistency
  libfdt: define INT32_MAX and UINT32_MAX in libfdt_env.h
  s390/zcrypt: handle new reply code FILTERED_BY_HYPERVISOR
  perf regs: Make perf_reg_name() return "unknown" instead of NULL
  perf script: Fix brstackinsn for AUXTRACE
  cdrom: respect device capabilities during opening action
  scripts/kallsyms: fix definitely-lost memory leak
  apparmor: fix unsigned len comparison with less than zero
  gpio: mpc8xxx: Don't overwrite default irq_set_type callback
  scsi: target: iscsi: Wait for all commands to finish before freeing a session
  scsi: iscsi: Don't send data to unbound connection
  scsi: NCR5380: Add disconnect_mask module parameter
  scsi: scsi_debug: num_tgts must be >= 0
  scsi: ufs: Fix error handing during hibern8 enter
  scsi: pm80xx: Fix for SATA device discovery
  HID: Improve Windows Precision Touchpad detection.
  libnvdimm/btt: fix variable 'rc' set but not used
  HID: logitech-hidpp: Silence intermittent get_battery_capacity errors
  bcache: at least try to shrink 1 node in bch_mca_scan()
  clk: pxa: fix one of the pxa RTC clocks
  scsi: atari_scsi: sun3_scsi: Set sg_tablesize to 1 instead of SG_NONE
  powerpc/security: Fix wrong message when RFI Flush is disable
  powerpc/pseries/cmm: Implement release() function for sysfs device
  scsi: ufs: fix potential bug which ends in system hang
  scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferences
  fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
  irqchip: ingenic: Error out if IRQ domain creation failed
  irqchip/irq-bcm7038-l1: Enable parent IRQ if necessary
  clk: qcom: Allow constant ratio freq tables for rcg
  f2fs: fix to update dir's i_pino during cross_rename
  scsi: lpfc: Fix duplicate unreg_rpi error in port offline flow
  scsi: tracing: Fix handling of TRANSFER LENGTH == 0 for READ(6) and WRITE(6)
  jbd2: Fix statistics for the number of logged blocks
  ext4: update direct I/O read lock pattern for IOCB_NOWAIT
  powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warning
  powerpc/security/book3s64: Report L1TF status in sysfs
  clocksource/drivers/asm9260: Add a check for of_clk_get
  dma-debug: add a schedule point in debug_dma_dump_mappings()
  powerpc/tools: Don't quote $objdump in scripts
  powerpc/pseries: Don't fail hash page table insert for bolted mapping
  powerpc/pseries: Mark accumulate_stolen_time() as notrace
  scsi: csiostor: Don't enable IRQs too early
  scsi: lpfc: Fix SLI3 hba in loop mode not discovering devices
  scsi: target: compare full CHAP_A Algorithm strings
  iommu/tegra-smmu: Fix page tables in > 4 GiB memory
  Input: atmel_mxt_ts - disable IRQ across suspend
  scsi: lpfc: Fix locking on mailbox command completion
  scsi: mpt3sas: Fix clear pending bit in ioctl status
  scsi: lpfc: Fix discovery failures when target device connectivity bounces
  ANDROID: serdev: Fix platform device support

Conflicts:
	drivers/scsi/ufs/ufshcd.c
	kernel/time/hrtimer.c

Discarded commit 'kernel: sysctl: make drop_caches write-only'
due to vts regression.

Change-Id: Ieabdc1178e170d30672e233f43139bb97af9bf80
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2020-04-18 17:49:12 +05:30
Greg Kroah-Hartman
c2bd4f8f0c Merge 4.14.162 into android-4.14
Changes in 4.14.162
	scsi: lpfc: Fix discovery failures when target device connectivity bounces
	scsi: mpt3sas: Fix clear pending bit in ioctl status
	scsi: lpfc: Fix locking on mailbox command completion
	Input: atmel_mxt_ts - disable IRQ across suspend
	iommu/tegra-smmu: Fix page tables in > 4 GiB memory
	scsi: target: compare full CHAP_A Algorithm strings
	scsi: lpfc: Fix SLI3 hba in loop mode not discovering devices
	scsi: csiostor: Don't enable IRQs too early
	powerpc/pseries: Mark accumulate_stolen_time() as notrace
	powerpc/pseries: Don't fail hash page table insert for bolted mapping
	powerpc/tools: Don't quote $objdump in scripts
	dma-debug: add a schedule point in debug_dma_dump_mappings()
	clocksource/drivers/asm9260: Add a check for of_clk_get
	powerpc/security/book3s64: Report L1TF status in sysfs
	powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warning
	ext4: update direct I/O read lock pattern for IOCB_NOWAIT
	jbd2: Fix statistics for the number of logged blocks
	scsi: tracing: Fix handling of TRANSFER LENGTH == 0 for READ(6) and WRITE(6)
	scsi: lpfc: Fix duplicate unreg_rpi error in port offline flow
	f2fs: fix to update dir's i_pino during cross_rename
	clk: qcom: Allow constant ratio freq tables for rcg
	irqchip/irq-bcm7038-l1: Enable parent IRQ if necessary
	irqchip: ingenic: Error out if IRQ domain creation failed
	fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
	scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferences
	scsi: ufs: fix potential bug which ends in system hang
	powerpc/pseries/cmm: Implement release() function for sysfs device
	powerpc/security: Fix wrong message when RFI Flush is disable
	scsi: atari_scsi: sun3_scsi: Set sg_tablesize to 1 instead of SG_NONE
	clk: pxa: fix one of the pxa RTC clocks
	bcache: at least try to shrink 1 node in bch_mca_scan()
	HID: logitech-hidpp: Silence intermittent get_battery_capacity errors
	libnvdimm/btt: fix variable 'rc' set but not used
	HID: Improve Windows Precision Touchpad detection.
	scsi: pm80xx: Fix for SATA device discovery
	scsi: ufs: Fix error handing during hibern8 enter
	scsi: scsi_debug: num_tgts must be >= 0
	scsi: NCR5380: Add disconnect_mask module parameter
	scsi: iscsi: Don't send data to unbound connection
	scsi: target: iscsi: Wait for all commands to finish before freeing a session
	gpio: mpc8xxx: Don't overwrite default irq_set_type callback
	apparmor: fix unsigned len comparison with less than zero
	scripts/kallsyms: fix definitely-lost memory leak
	cdrom: respect device capabilities during opening action
	perf script: Fix brstackinsn for AUXTRACE
	perf regs: Make perf_reg_name() return "unknown" instead of NULL
	s390/zcrypt: handle new reply code FILTERED_BY_HYPERVISOR
	libfdt: define INT32_MAX and UINT32_MAX in libfdt_env.h
	s390/cpum_sf: Check for SDBT and SDB consistency
	ocfs2: fix passing zero to 'PTR_ERR' warning
	kernel: sysctl: make drop_caches write-only
	userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
	x86/mce: Fix possibly incorrect severity calculation on AMD
	net, sysctl: Fix compiler warning when only cBPF is present
	netfilter: nf_queue: enqueue skbs with NULL dst
	ALSA: hda - Downgrade error message for single-cmd fallback
	bonding: fix active-backup transition after link failure
	perf strbuf: Remove redundant va_end() in strbuf_addv()
	Make filldir[64]() verify the directory entry filename is valid
	filldir[64]: remove WARN_ON_ONCE() for bad directory entries
	netfilter: ebtables: compat: reject all padding in matches/watchers
	6pack,mkiss: fix possible deadlock
	netfilter: bridge: make sure to pull arp header in br_nf_forward_arp()
	inetpeer: fix data-race in inet_putpeer / inet_putpeer
	net: add a READ_ONCE() in skb_peek_tail()
	net: icmp: fix data-race in cmp_global_allow()
	hrtimer: Annotate lockless access to timer->state
	spi: fsl: don't map irq during probe
	tty/serial: atmel: fix out of range clock divider handling
	pinctrl: baytrail: Really serialize all register accesses
	net: ena: fix napi handler misbehavior when the napi budget is zero
	net/mlxfw: Fix out-of-memory error in mfa2 flash burning
	ptp: fix the race between the release of ptp_clock and cdev
	udp: fix integer overflow while computing available space in sk_rcvbuf
	vhost/vsock: accept only packets with the right dst_cid
	net: add bool confirm_neigh parameter for dst_ops.update_pmtu
	ip6_gre: do not confirm neighbor when do pmtu update
	gtp: do not confirm neighbor when do pmtu update
	net/dst: add new function skb_dst_update_pmtu_no_confirm
	tunnel: do not confirm neighbor when do pmtu update
	vti: do not confirm neighbor when do pmtu update
	sit: do not confirm neighbor when do pmtu update
	gtp: do not allow adding duplicate tid and ms_addr pdp context
	tcp/dccp: fix possible race __inet_lookup_established()
	tcp: do not send empty skb from tcp_write_xmit()
	gtp: fix wrong condition in gtp_genl_dump_pdp()
	gtp: fix an use-after-free in ipv4_pdp_find()
	gtp: avoid zero size hashtable
	spi: fsl: use platform_get_irq() instead of of_irq_to_resource()
	Linux 4.14.162

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-01-04 19:09:37 +01:00
Eric Dumazet
a51afeedc6 hrtimer: Annotate lockless access to timer->state
commit 56144737e67329c9aaed15f942d46a6302e2e3d8 upstream.

syzbot reported various data-race caused by hrtimer_is_queued() reading
timer->state. A READ_ONCE() is required there to silence the warning.

Also add the corresponding WRITE_ONCE() when timer->state is set.

In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid
loading timer->state twice.

KCSAN reported these cases:

BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check

write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0:
 __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
 __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
 __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1:
 tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline]
 tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225
 tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044
 tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558
 tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717
 tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
 sk_backlog_rcv include/net/sock.h:945 [inline]
 __release_sock+0x135/0x1e0 net/core/sock.c:2435
 release_sock+0x61/0x160 net/core/sock.c:2951
 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
 sock_sendmsg_nosec net/socket.c:637 [inline]
 sock_sendmsg+0x9f/0xc0 net/socket.c:657

BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check

write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0:
 __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
 __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
 __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:373 [inline]
 irq_exit+0xbb/0xe0 kernel/softirq.c:413
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830

read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1:
 __tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265
 tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline]
 tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708
 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
 sk_backlog_rcv include/net/sock.h:945 [inline]
 __release_sock+0x135/0x1e0 net/core/sock.c:2435
 release_sock+0x61/0x160 net/core/sock.c:2951
 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
 sock_sendmsg_nosec net/socket.c:637 [inline]
 sock_sendmsg+0x9f/0xc0 net/socket.c:657
 __sys_sendto+0x21f/0x320 net/socket.c:1952
 __do_sys_sendto net/socket.c:1964 [inline]
 __se_sys_sendto net/socket.c:1960 [inline]
 __x64_sys_sendto+0x89/0xb0 net/socket.c:1960
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

[ tglx: Added comments ]

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191106174804.74723-1-edumazet@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04 14:00:08 +01:00
Prasad Sodagudi
c96a62baf3 Revert "cpuidle: Fix cpu frequent exits from low power mode"
This reverts 'commit ae04811d41 ("cpuidle: Fix cpu frequent
exits from low power mode")'. Fix is identified for LPM modes
regressions. Clock event device is not getting programmed in
hrtimer_cancel as hrtimer_next_event_without() sets
cpu_base.next_timer to NULL unconditionally.

Change-Id: Ic4604da1c7829a49d3a2a54b1426f3024b26f731
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2018-09-18 11:51:58 +05:30
Prasad Sodagudi
ae04811d41 cpuidle: Fix cpu frequent exits from low power mode
Following upstream changes are reverted due cpu low
power mode use cases regressions.

'commit 55e591cc18 ("BACKPORT: time: tick-sched: Reorganize idle tick management code")'
'commit f6d3093dfc ("BACKPORT: sched: idle: Do not stop the tick upfront in the idle loop")'
'commit 27e8616e42 ("UPSTREAM: sched: idle: Do not stop the tick before cpuidle_idle_call()")'
'commit 8b468535df ("UPSTREAM: jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC")'
'commit 3a25735bd7 ("UPSTREAM: cpuidle: Return nohz hint from cpuidle_select()")'
'commit f69cfc8ef9 ("BACKPORT: time: tick-sched: Split tick_nohz_stop_sched_tick()")'
'commit 6277dd586f ("BACKPORT: time: hrtimer: Introduce hrtimer_next_event_without()")'
'commit 8c71f69fb4 ("UPSTREAM: sched: idle: Select idle state before stopping the tick")'
'commit 30693a4f09 ("UPSTREAM: cpuidle: menu: Refine idle state selection for running tick")'
'commit e32966ced8 ("UPSTREAM: cpuidle: menu: Avoid selecting shallow states with stopped tick")'

Also fix the compilation errrors as tick_nohz_get_sleep_length()
function signature is changed.

Change-Id: I21488a5a91f1eac11d4e139fd44968d52563717c
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2018-09-14 23:21:12 -07:00
Isaac J. Manjarres
b2c8463039 Merge android-4.14-p.61 (b7e55e8) into msm-4.14
* remotes/origin/tmp-b7e55e8:
  Linux 4.14.61
  scsi: sg: fix minor memory leak in error path
  drm/vc4: Reset ->{x, y}_scaling[1] when dealing with uniplanar formats
  crypto: padlock-aes - Fix Nano workaround data corruption
  RDMA/uverbs: Expand primary and alt AV port checks
  iwlwifi: add more card IDs for 9000 series
  userfaultfd: remove uffd flags from vma->vm_flags if UFFD_EVENT_FORK fails
  audit: fix potential null dereference 'context->module.name'
  kvm: x86: vmx: fix vpid leak
  x86/entry/64: Remove %ebx handling from error_entry/exit
  x86/apic: Future-proof the TSC_DEADLINE quirk for SKX
  virtio_balloon: fix another race between migration and ballooning
  net: socket: fix potential spectre v1 gadget in socketcall
  can: ems_usb: Fix memory leak on ems_usb_disconnect()
  squashfs: more metadata hardenings
  squashfs: more metadata hardening
  net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager
  rxrpc: Fix user call ID check in rxrpc_service_prealloc_one
  net: stmmac: Fix WoL for PCI-based setups
  netlink: Fix spectre v1 gadget in netlink_create()
  net: dsa: Do not suspend/resume closed slave_dev
  ipv4: frags: handle possible skb truesize change
  inet: frag: enforce memory limits earlier
  bonding: avoid lockdep confusion in bond_get_stats()
  Linux 4.14.60
  tcp: add one more quick ack after after ECN events
  tcp: refactor tcp_ecn_check_ce to remove sk type cast
  tcp: do not aggressively quick ack after ECN events
  tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode
  tcp: do not force quickack when receiving out-of-order packets
  netlink: Don't shift with UB on nlk->ngroups
  netlink: Do not subscribe to non-existent groups
  xen-netfront: wait xenbus state change when load module manually
  tcp_bbr: fix bw probing to raise in-flight data for very small BDPs
  NET: stmmac: align DMA stuff to largest cache line length
  net: mdio-mux: bcm-iproc: fix wrong getter and setter pair
  net: lan78xx: fix rx handling before first packet is send
  net: fix amd-xgbe flow-control issue
  net: ena: Fix use of uninitialized DMA address bits field
  ipv4: remove BUG_ON() from fib_compute_spec_dst
  net: dsa: qca8k: Allow overwriting CPU port setting
  net: dsa: qca8k: Add QCA8334 binding documentation
  net: dsa: qca8k: Enable RXMAC when bringing up a port
  net: dsa: qca8k: Force CPU port to its highest bandwidth
  RDMA/uverbs: Protect from attempts to create flows on unsupported QP
  usb: gadget: udc: renesas_usb3: should remove debugfs
  ovl: Sync upper dirty data when syncing overlayfs
  PCI: xgene: Remove leftover pci_scan_child_bus() call
  PCI: pciehp: Assume NoCompl+ for Thunderbolt ports
  ext4: fix check to prevent initializing reserved inodes
  ext4: check for allocation block validity with block group locked
  ext4: fix inline data updates with checksums enabled
  squashfs: be more careful about metadata corruption
  random: mix rdrand with entropy sent in from userspace
  block: reset bi_iter.bi_done after splitting bio
  blkdev: __blkdev_direct_IO_simple: fix leak in error case
  block: bio_iov_iter_get_pages: fix size of last iovec
  drm/dp/mst: Fix off-by-one typo when dump payload table
  drm/atomic-helper: Drop plane->fb references only for drm_atomic_helper_shutdown()
  drm: Add DP PSR2 sink enable bit
  ASoC: topology: Add missing clock gating parameter when parsing hw_configs
  ASoC: topology: Fix bclk and fsync inversion in set_link_hw_format()
  media: si470x: fix __be16 annotations
  media: atomisp: compat32: fix __user annotations
  scsi: cxlflash: Avoid clobbering context control register value
  scsi: cxlflash: Synchronize reset and remove ops
  scsi: megaraid_sas: Increase timeout by 1 sec for non-RAID fastpath IOs
  scsi: scsi_dh: replace too broad "TP9" string with the exact models
  regulator: Don't return or expect -errno from of_map_mode()
  media: omap3isp: fix unbalanced dma_iommu_mapping
  crypto: authenc - don't leak pointers to authenc keys
  crypto: authencesn - don't leak pointers to authenc keys
  usb: hub: Don't wait for connect state at resume for powered-off ports
  microblaze: Fix simpleImage format generation
  soc: imx: gpcv2: Do not pass static memory as platform data
  serial: core: Make sure compiler barfs for 16-byte earlycon names
  staging: lustre: ldlm: free resource when ldlm_lock_create() fails.
  staging: lustre: llite: correct removexattr detection
  staging: vchiq_core: Fix missing semaphore release in error case
  audit: allow not equal op for audit by executable
  rsi: fix nommu_map_sg overflow kernel panic
  rsi: Fix 'invalid vdd' warning in mmc
  ipconfig: Correctly initialise ic_nameservers
  drm/gma500: fix psb_intel_lvds_mode_valid()'s return type
  igb: Fix queue selection on MAC filters on i210
  arm64: defconfig: Enable Rockchip io-domain driver
  nvme: lightnvm: add granby support
  memory: tegra: Apply interrupts mask per SoC
  memory: tegra: Do not handle spurious interrupts
  delayacct: Use raw_spinlocks
  stop_machine: Use raw spinlocks
  backlight: pwm_bl: Don't use GPIOF_* with gpiod_get_direction
  dt-bindings: net: meson-dwmac: new compatible name for AXG SoC
  net: hns3: Fixes the out of bounds access in hclge_map_tqp
  spi: meson-spicc: Fix error handling in meson_spicc_probe()
  dt-bindings: pinctrl: meson: add support for the Meson8m2 SoC
  mmc: pwrseq: Use kmalloc_array instead of stack VLA
  mmc: dw_mmc: update actual clock for mmc debugfs
  ALSA: hda/ca0132: fix build failure when a local macro is defined
  drm/atomic: Handling the case when setting old crtc for plane
  media: siano: get rid of __le32/__le16 cast warnings
  f2fs: avoid fsync() failure caused by EAGAIN in writepage()
  bpf: fix references to free_bpf_prog_info() in comments
  thermal: exynos: fix setting rising_threshold for Exynos5433
  staging: lustre: o2iblnd: Fix FastReg map/unmap for MLX5
  staging: lustre: o2iblnd: fix race at kiblnd_connect_peer
  scsi: qedf: Set the UNLOADING flag when removing a vport
  scsi: hisi_sas: config ATA de-reset as an constrained command for v3 hw
  scsi: megaraid: silence a static checker bug
  scsi: 3w-xxxx: fix a missing-check bug
  scsi: 3w-9xxx: fix a missing-check bug
  bnxt_en: Check unsupported speeds in bnxt_update_link() on PF only.
  perf: fix invalid bit in diagnostic entry
  s390/cpum_sf: Add data entry sizes to sampling trailer entry
  brcmfmac: Add support for bcm43364 wireless chipset
  mtd: rawnand: fsl_ifc: fix FSL NAND driver to read all ONFI parameter pages
  media: saa7164: Fix driver name in debug output
  media: media-device: fix ioctl function types
  ACPI / LPSS: Only call pwm_add_table() for Bay Trail PWM if PMIC HRV is 2
  libata: Fix command retry decision
  media: rcar_jpu: Add missing clk_disable_unprepare() on error in jpu_open()
  net: phy: phylink: Release link GPIO
  dma-iommu: Fix compilation when !CONFIG_IOMMU_DMA
  tty: Fix data race in tty_insert_flip_string_fixed_flag
  i40e: free the skb after clearing the bitlock
  nvmem: properly handle returned value nvmem_reg_read
  ARM: dts: sh73a0: Add missing interrupt-affinity to PMU node
  ARM: dts: emev2: Add missing interrupt-affinity to PMU node
  ARM: dts: stih407-pinctrl: Fix complain about IRQ_TYPE_NONE usage
  EDAC, altera: Fix ARM64 build warning
  HID: i2c-hid: check if device is there before really probing
  powerpc/embedded6xx/hlwd-pic: Prevent interrupts from being handled by Starlet
  drm/amdgpu: Remove VRAM from shared bo domains.
  drm/radeon: fix mode_valid's return type
  arm64: dts: renesas: salvator-common: use audio-graph-card for Sound
  HID: hid-plantronics: Re-resend Update to map button for PTT products
  arm64: cmpwait: Clear event register before arming exclusive monitor
  media: atomisp: ov2680: don't declare unused vars
  ALSA: usb-audio: Apply rate limit to warning messages in URB complete callback
  net: ethernet: ti: cpsw-phy-sel: check bus_find_device() ret value
  media: smiapp: fix timeout checking in smiapp_read_nvm
  ixgbevf: fix MAC address changes through ixgbevf_set_mac()
  md: fix NULL dereference of mddev->pers in remove_and_add_spares()
  md/raid1: add error handling of read error from FailFast device
  regulator: pfuze100: add .is_enable() for pfuze100_swb_regulator_ops
  ALSA: emu10k1: Rate-limit error messages about page errors
  rtc: tps65910: fix possible race condition
  rtc: vr41xx: fix possible race condition
  rtc: tps6586x: fix possible race condition
  Bluetooth: btusb: add ID for LiteOn 04ca:301a
  drm/nouveau/fifo/gk104-: poll for runlist update completion
  scsi: zfcp: assert that the ERP lock is held when tracing a recovery trigger
  scsi: ufs: fix exception event handling
  scsi: ufs: ufshcd: fix possible unclocked register access
  fscrypt: use unbound workqueue for decryption
  net: hns3: Fix the missing client list node initialization
  spi: Add missing pm_runtime_put_noidle() after failed get
  drivers/perf: arm-ccn: don't log to dmesg in event_init
  ima: based on policy verify firmware signatures (pre-allocated buffer)
  mwifiex: correct histogram data with appropriate index
  net: dsa: qca8k: Add support for QCA8334 switch
  PCI: pciehp: Request control of native hotplug only if supported
  bpf: powerpc64: pad function address loads with NOPs
  pinctrl: at91-pio4: add missing of_node_put
  powerpc/8xx: fix invalid register expression in head_8xx.S
  spi: sh-msiof: Fix setting SIRMDR1.SYNCAC to match SITMDR1.SYNCAC
  powerpc: Add __printf verification to prom_printf
  powerpc/powermac: Mark variable x as unused
  powerpc/powermac: Add missing prototype for note_bootable_part()
  powerpc/chrp/time: Make some functions static, add missing header include
  powerpc/32: Add a missing include header
  ath: Add regulatory mapping for Bahamas
  ath: Add regulatory mapping for Bermuda
  ath: Add regulatory mapping for Serbia
  ath: Add regulatory mapping for Tanzania
  ath: Add regulatory mapping for Uganda
  ath: Add regulatory mapping for APL2_FCCA
  ath: Add regulatory mapping for APL13_WORLD
  ath: Add regulatory mapping for ETSI8_WORLD
  ath: Add regulatory mapping for FCC3_ETSIC
  nvme-pci: Fix AER reset handling
  nvme-rdma: stop admin queue before freeing it
  PCI: Prevent sysfs disable of device while driver is attached
  PM / wakeup: Make s2idle_lock a RAW_SPINLOCK
  x86/microcode: Make the late update update_lock a raw lock for RT
  btrfs: qgroup: Finish rescan when hit the last leaf of extent tree
  btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups
  Btrfs: don't BUG_ON() in btrfs_truncate_inode_items()
  Btrfs: don't return ino to ino cache if inode item removal fails
  media: videobuf2-core: don't call memop 'finish' when queueing
  media: tw686x: Fix incorrect vb2_mem_ops GFP flags
  net: hns3: Fixes the init of the VALID BD info in the descriptor
  wlcore: sdio: check for valid platform device data before suspend
  mwifiex: handle race during mwifiex_usb_disconnect
  mfd: cros_ec: Fail early if we cannot identify the EC
  ASoC: dpcm: fix BE dai not hw_free and shutdown
  Bluetooth: btusb: Add a new Realtek 8723DE ID 2ff8:b011
  Bluetooth: hci_qca: Fix "Sleep inside atomic section" warning
  iwlwifi: pcie: fix race in Rx buffer allocator
  btrfs: balance dirty metadata pages in btrfs_finish_ordered_io
  PCI: Fix devm_pci_alloc_host_bridge() memory leak
  selftests: intel_pstate: return Kselftest Skip code for skipped tests
  selftests: memfd: return Kselftest Skip code for skipped tests
  selftests/intel_pstate: Improve test, minor fixes
  perf/x86/intel/uncore: Correct fixed counter index check for NHM
  perf/x86/intel/uncore: Correct fixed counter index check in generic code
  usbip: dynamically allocate idev by nports found in sysfs
  usbip: usbip_detach: Fix memory, udev context and udev leak
  block, bfq: remove wrong lock in bfq_requests_merged
  f2fs: fix race in between GC and atomic open
  f2fs: fix to detect failure of dquot_initialize
  f2fs: Fix deadlock in shutdown ioctl
  f2fs: fix to wait page writeback during revoking atomic write
  f2fs: fix to don't trigger writeback during recovery
  f2fs: fix error path of move_data_page
  disable loading f2fs module on PAGE_SIZE > 4KB
  pnfs: Don't release the sequence slot until we've processed layoutget on open
  netfilter: nf_tables: check msg_type before nft_trans_set(trans)
  lightnvm: pblk: warn in case of corrupted write buffer
  RDMA/mad: Convert BUG_ONs to error flows
  powerpc/64s: Fix compiler store ordering to SLB shadow area
  hvc_opal: don't set tb_ticks_per_usec in udbg_init_opal_common()
  powerpc/eeh: Fix use-after-release of EEH driver
  powerpc/64s: Add barrier_nospec
  powerpc/lib: Adjust .balign inside string functions for PPC32
  infiniband: fix a possible use-after-free bug
  e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
  ceph: fix alignment of rasize
  bpf, arm32: fix inconsistent naming about emit_a32_lsr_{r64,i64}
  printk: drop in_nmi check from printk_safe_flush_on_panic()
  watchdog: da9063: Fix updating timeout value
  irqchip/ls-scfg-msi: Map MSIs in the iommu
  netfilter: ipset: List timing out entries with "timeout 1" instead of zero
  netfilter: ipset: forbid family for hash:mac sets
  perf tools: Fix pmu events parsing rule
  rtc: ensure rtc_set_alarm fails when alarms are not supported
  mm/slub.c: add __printf verification to slab_err()
  mm: vmalloc: avoid racy handling of debugobjects in vunmap
  mm: /proc/pid/pagemap: hide swap entries from unprivileged users
  kernel/hung_task.c: show all hung tasks before panic
  vfio/type1: Fix task tracking for QEMU vCPU hotplug
  vfio/mdev: Check globally for duplicate devices
  vfio: platform: Fix reset module leak in error path
  nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo
  NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY
  ALSA: fm801: add error handling for snd_ctl_add
  ALSA: emu10k1: add error handling for snd_ctl_add
  skip LAYOUTRETURN if layout is invalid
  hv_netvsc: fix network namespace issues with VF support
  xen/netfront: raise max number of slots in xennet_get_responses()
  kcov: ensure irq code sees a valid area
  mlxsw: spectrum_switchdev: Fix port_vlan refcounting
  arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups
  tracing: Quiet gcc warning about maybe unused link variable
  tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
  kthread, tracing: Don't expose half-written comm when creating kthreads
  tracing: Fix possible double free in event_enable_trigger_func()
  tracing: Fix double free of event_trigger_data
  delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
  kvm, mm: account shadow page tables to kmemcg
  Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
  Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
  Input: elan_i2c - add ACPI ID for lenovo ideapad 330
  spi: spi-s3c64xx: Fix system resume support
  drivers/infiniband/ulp/srpt/ib_srpt.c: fix build with gcc-4.4.4
  IB/srpt: Fix an out-of-bounds stack access in srpt_zerolength_write()
  drivers/infiniband/core/verbs.c: fix build with gcc-4.4.4
  RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access
  i2c: core: decrease reference count of device node in i2c_unregister_device
  fork: unconditionally clear stack on fork
  Linux 4.14.59
  turn off -Wattribute-alias
  can: m_can.c: fix setup of CCCR register: clear CCCR NISO bit before checking can.ctrlmode
  can: peak_canfd: fix firmware < v3.3.0: limit allocation to 32-bit DMA addr only
  can: xilinx_can: fix RX overflow interrupt not being enabled
  can: xilinx_can: fix incorrect clear of non-processed interrupts
  can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
  can: xilinx_can: fix device dropping off bus on RX overrun
  can: xilinx_can: fix recovery from error states not being propagated
  can: xilinx_can: fix power management handling
  can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK
  driver core: Partially revert "driver core: correct device's shutdown order"
  usb: gadget: f_fs: Only return delayed status when len is 0
  usb: dwc2: Fix DMA alignment to start at allocated boundary
  usb: core: handle hub C_PORT_OVER_CURRENT condition
  usb: cdc_acm: Add quirk for Castles VEGA3000
  staging: speakup: fix wraparound in uaccess length check
  tcp: add tcp_ooo_try_coalesce() helper
  tcp: call tcp_drop() from tcp_data_queue_ofo()
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  tcp: free batches of packets in tcp_prune_ofo_queue()
  tcp: do not delay ACK in DCTCP upon CE status change
  tcp: do not cancel delay-AcK on DCTCP special ACK
  tcp: helpers to send special DCTCP ack
  tcp: fix dctcp delayed ACK schedule
  vxlan: fix default fdb entry netlink notify ordering during netdev create
  vxlan: make netlink notify in vxlan_fdb_destroy optional
  vxlan: add new fdb alloc and create helpers
  rtnetlink: add rtnl_link_state check in rtnl_configure_link
  sock: fix sg page frag coalescing in sk_alloc_sg
  net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv
  multicast: do not restore deleted record source filter mode to new one
  net/ipv6: Fix linklocal to global address with VRF
  net/mlx5e: Fix quota counting in aRFS expire flow
  net/mlx5e: Don't allow aRFS for encapsulated packets
  net/mlx5: Adjust clock overflow work period
  net: skb_segment() should not return NULL
  net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
  ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
  ip: hash fragments consistently
  bonding: set default miimon value for non-arp modes if not set
  drm/nouveau: Set DRIVER_ATOMIC cap earlier to fix debugfs
  drm/nouveau/drm/nouveau: Fix runtime PM leak in nv50_disp_atomic_commit()
  KVM: PPC: Check if IOMMU page is contained in the pinned physical page
  xen/PVH: Set up GS segment for stack canary
  MIPS: Fix off-by-one in pci_resource_to_user()
  MIPS: ath79: fix register address in ath79_ddr_wb_flush()
  Revert "cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting"
  ANDROID: verity: really fix android-verity Kconfig
  tcp: add tcp_ooo_try_coalesce() helper
  tcp: call tcp_drop() from tcp_data_queue_ofo()
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  tcp: free batches of packets in tcp_prune_ofo_queue()
  x86_64_cuttlefish_defconfig: Enable android-verity
  x86_64_cuttlefish_defconfig: enable verity cert
  ANDROID: android-verity: Fix broken parameter handling.
  ANDROID: android-verity: Make it work with newer kernels
  ANDROID: android-verity: Add API to verify signature with builtin keys.
  ANDROID: verity: fix android-verity Kconfig dependencies
  Linux 4.14.58
  xhci: Fix perceived dead host due to runtime suspend race with event handler
  powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)
  cxl_getfile(): fix double-iput() on alloc_file() failures
  alpha: fix osf_wait4() breakage
  net: usb: asix: replace mii_nway_restart in resume path
  ipv6: make DAD fail with enhanced DAD when nonce length differs
  net: systemport: Fix CRC forwarding check for SYSTEMPORT Lite
  net/mlx4_en: Don't reuse RX page when XDP is set
  hv_netvsc: Fix napi reschedule while receive completion is busy
  tg3: Add higher cpu clock for 5762.
  qmi_wwan: add support for Quectel EG91
  ptp: fix missing break in switch
  net: phy: fix flag masking in __set_phy_supported
  net/ipv4: Set oif in fib_compute_spec_dst
  skbuff: Unconditionally copy pfmemalloc in __skb_clone()
  net: Don't copy pfmemalloc flag in __copy_skb_header()
  net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort
  lib/rhashtable: consider param->min_size when setting initial table size
  ipv6: ila: select CONFIG_DST_CACHE
  ipv6: fix useless rol32 call on hash
  ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
  gen_stats: Fix netlink stats dumping in the presence of padding
  drm/nouveau: Avoid looping through fake MST connectors
  drm/nouveau: Use drm_connector_list_iter_* for iterating connectors
  drm/i915: Fix hotplug irq ack on i965/g4x
  stop_machine: Disable preemption when waking two stopper threads
  vfio/spapr: Use IOMMU pageshift rather than pagesize
  vfio/pci: Fix potential Spectre v1
  cpufreq: intel_pstate: Register when ACPI PCCH is present
  mm/huge_memory.c: fix data loss when splitting a file pmd
  mm: memcg: fix use after free in mem_cgroup_iter()
  ARC: mm: allow mprotect to make stack mappings executable
  ARC: configs: Remove CONFIG_INITRAMFS_SOURCE from defconfigs
  ARC: Fix CONFIG_SWAP
  ARCv2: [plat-hsdk]: Save accl reg pair by default
  ALSA: hda: add mute led support for HP ProBook 455 G5
  ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk
  ALSA: rawmidi: Change resized buffers atomically
  fat: fix memory allocation failure handling of match_strdup()
  x86/MCE: Remove min interval polling limitation
  x86/events/intel/ds: Fix bts_interrupt_threshold alignment
  x86/apm: Don't access __preempt_count with zeroed fs
  KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
  scsi: sd_zbc: Fix variable type and bogus comment
  ANDROID: uid_sys_stats: Replace tasklist lock with RCU in uid_cputime_show
  Linux 4.14.57
  string: drop __must_check from strscpy() and restore strscpy() usages in cgroup
  arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
  arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
  arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
  arm64: KVM: Add HYP per-cpu accessors
  arm64: ssbd: Add prctl interface for per-thread mitigation
  arm64: ssbd: Introduce thread flag to control userspace mitigation
  arm64: ssbd: Restore mitigation status on CPU resume
  arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
  arm64: ssbd: Add global mitigation state accessor
  arm64: Add 'ssbd' command-line option
  arm64: Add ARCH_WORKAROUND_2 probing
  arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
  arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
  arm/arm64: smccc: Add SMCCC-specific return codes
  KVM: arm64: Avoid storing the vcpu pointer on the stack
  KVM: arm/arm64: Do not use kern_hyp_va() with kvm_vgic_global_state
  arm64: alternatives: Add dynamic patching feature
  KVM: arm64: Stop save/restoring host tpidr_el1 on VHE
  arm64: alternatives: use tpidr_el2 on VHE hosts
  KVM: arm64: Change hyp_panic()s dependency on tpidr_el2
  KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu allocation
  KVM: arm64: Store vcpu on the stack during __guest_enter()
  net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
  rds: avoid unenecessary cong_update in loop transport
  bdi: Fix another oops in wb_workfn()
  netfilter: ipv6: nf_defrag: drop skb dst before queueing
  nsh: set mac len based on inner packet
  autofs: fix slab out of bounds read in getname_kernel()
  tls: Stricter error checking in zerocopy sendmsg path
  KEYS: DNS: fix parsing multiple options
  reiserfs: fix buffer overflow with long warning messages
  netfilter: ebtables: reject non-bridge targets
  PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
  block: do not use interruptible wait anywhere
  mtd: rawnand: denali_dt: set clk_x_rate to 200 MHz unconditionally
  crypto: af_alg - Initialize sg_num_bytes in error code path
  clocksource: Initialize cs->wd_list
  media: rc: oops in ir_timer_keyup after device unplug
  xhci: Fix USB3 NULL pointer dereference at logical disconnect.
  net: lan78xx: Fix race in tx pending skb size calculation
  rtlwifi: rtl8821ae: fix firmware is not ready to run
  rtlwifi: Fix kernel Oops "Fw download fail!!"
  net: cxgb3_main: fix potential Spectre v1
  VSOCK: fix loopback on big-endian systems
  vhost_net: validate sock before trying to put its fd
  tcp: prevent bogus FRTO undos with non-SACK flows
  tcp: fix Fast Open key endianness
  strparser: Remove early eaten to fix full tcp receive buffer stall
  stmmac: fix DMA channel hang in half-duplex mode
  r8152: napi hangup fix after disconnect
  qmi_wwan: add support for the Dell Wireless 5821e module
  qed: Limit msix vectors in kdump kernel to the minimum required count.
  qed: Fix use of incorrect size in memcpy call.
  qed: Fix setting of incorrect eswitch mode.
  qede: Adverstise software timestamp caps when PHC is not available.
  net/tcp: Fix socket lookups with SO_BINDTODEVICE
  net: sungem: fix rx checksum support
  net_sched: blackhole: tell upper qdisc about dropped packets
  net/packet: fix use-after-free
  net: mvneta: fix the Rx desc DMA address in the Rx path
  net/mlx5: Fix wrong size allocation for QoS ETC TC regitster
  net/mlx5: Fix required capability for manipulating MPFS
  net/mlx5: Fix incorrect raw command length parsing
  net/mlx5: Fix command interface race in polling mode
  net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager
  net/mlx5e: Don't attempt to dereference the ppriv struct if not being eswitch manager
  net/mlx5e: Avoid dealing with vport representors if not being e-switch manager
  net: macb: Fix ptp time adjustment for large negative delta
  net: fix use-after-free in GRO with ESP
  net: dccp: switch rx_tstamp_last_feedback to monotonic clock
  net: dccp: avoid crash in ccid3_hc_rx_send_feedback()
  ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing
  ipvlan: fix IFLA_MTU ignored on NEWLINK
  ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
  hv_netvsc: split sub-channel setup into async and sync
  atm: zatm: Fix potential Spectre v1
  atm: Preserve value of skb->truesize when accounting to vcc
  alx: take rtnl before calling __alx_open from resume
  crypto: crypto4xx - fix crypto4xx_build_pdr, crypto4xx_build_sdr leak
  crypto: crypto4xx - remove bad list_del
  PCI: exynos: Fix a potential init_clk_resources NULL pointer dereference
  bcm63xx_enet: do not write to random DMA channel on BCM6345
  bcm63xx_enet: correct clock usage
  ocfs2: ip_alloc_sem should be taken in ocfs2_get_block()
  ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent
  xprtrdma: Fix corner cases when handling device removal
  cpufreq / CPPC: Set platform specific transition_delay_us
  Btrfs: fix duplicate extents after fsync of file with prealloc extents
  x86/paravirt: Make native_save_fl() extern inline
  x86/asm: Add _ASM_ARG* constants for argument registers to <asm/asm.h>
  compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations
  ANDROID: Add hold functionality to schedtune CPU boost
  ANDROID: sched/rt: Add schedtune accounting to rt task enqueue/dequeue
  UPSTREAM: cpuidle: menu: Avoid selecting shallow states with stopped tick
  UPSTREAM: cpuidle: menu: Refine idle state selection for running tick
  UPSTREAM: sched: idle: Select idle state before stopping the tick
  BACKPORT: time: hrtimer: Introduce hrtimer_next_event_without()
  BACKPORT: time: tick-sched: Split tick_nohz_stop_sched_tick()
  UPSTREAM: cpuidle: Return nohz hint from cpuidle_select()
  UPSTREAM: jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
  UPSTREAM: sched: idle: Do not stop the tick before cpuidle_idle_call()
  BACKPORT: sched: idle: Do not stop the tick upfront in the idle loop
  BACKPORT: time: tick-sched: Reorganize idle tick management code
  ANDROID: sched/fair: fix a warning
  ANDROID: sched/walt: Fix compilation issue for x86_64
  ANDROID: mnt: Fix next_descendent
  ANDROID: sched/events: Introduce util_est trace events
  ANDROID: sched/fair: schedtune: update before schedutil
  FROMLIST: sched/fair: add support to tune PELT ramp/decay timings
  BACKPORT: sched/fair: Update util_est before updating schedutil
  BACKPORT: sched/fair: Update util_est only on util_avg updates
  BACKPORT: sched/fair: Use util_est in LB and WU paths
  BACKPORT: sched/fair: Add util_est on top of PELT
  ANDROID: sched/fair: Cleanup cpu_util{_wake}()
  ANDROID: sched: Update max cpu capacity in case of max frequency constraints
  ANDROID: arm: enable max frequency capping
  ANDROID: arm64: enable max frequency capping
  ANDROID: implement max frequency capping
  ANDROID: sched/fair: add arch scaling function for max frequency capping
  ANDROID: trace: Add WALT util signal to trace event sched_load_cfs_rq
  ANDROID: sched, trace: Remove trace event sched_load_avg_cpu
  ANDROID: Rename and move include/linux/sched_energy.h
  ANDROID: Adjust juno energy model
  ANDROID: Check equality of max cap state cap and cpu scale
  ANDROID: Move energy model init call into arch_topology driver
  ANDROID: Streamline sched_domain_energy_f functions
  ANDROID: Separate cpu_scale and energy model setup
  ANDROID: update_group_capacity for single cpu in cluster
  ANDROID: sched/fair: return idle CPU immediately for prefer_idle
  ANDROID: sched/fair: add idle state filter to prefer_idle case
  ANDROID: sched/fair: remove order from CPU selection
  ANDROID: sched/fair: unify spare capacity calculation
  ANDROID:sched/fair: prefer energy efficient CPUs for !prefer_idle tasks
  ANDROID: sched/fair: fix CPU selection for non latency sensitive tasks
  ANDROID: sched/fair: Also do misfit in overloaded groups
  ANDROID: sched/fair: Don't balance misfits if it would overload local group
  ANDROID: sched/fair: Attempt to improve throughput for asym cap systems
  FROMLIST: sched/fair: Don't move tasks to lower capacity cpus unless necessary
  FROMLIST: sched/core: Disable SD_PREFER_SIBLING on asymmetric cpu capacity domains
  FROMLIST: sched/core: Disable SD_ASYM_CPUCAPACITY for root_domains without asymmetry
  FROMLIST: sched/fair: Set rq->rd->overload when misfit
  FROMLIST: sched: Wrap rq->rd->overload accesses with READ/WRITE_ONCE
  FROMLIST: sched: Change root_domain->overload type to int
  FROMLIST: sched/fair: Change prefer_sibling type to bool
  FROMLIST: sched/fair: Consider misfit tasks when load-balancing
  FROMLIST: sched: Add sched_group per-cpu max capacity
  FROMLIST: sched/fair: Add group_misfit_task load-balance type
  FROMLIST: sched: Add static_key for asymmetric cpu capacity optimizations
  UPSTREAM: ANDROID: binder: change down_write to down_read
  UPSTREAM: ANDROID: binder: correct the cmd print for BINDER_WORK_RETURN_ERROR
  UPSTREAM: ANDROID: binder: remove 32-bit binder interface.
  UPSTREAM: android: binder: Use true and false for boolean values
  UPSTREAM: android: binder: Use octal permissions
  UPSTREAM: android: binder: Prefer __func__ to using hardcoded function name
  UPSTREAM: ANDROID: binder: make binder_alloc_new_buf_locked static and indent its arguments
  UPSTREAM: android: binder: Check for errors in binder_alloc_shrinker_init().

Conflicts:
	arch/arm64/Kconfig
	arch/arm64/include/asm/cpucaps.h
	arch/arm64/include/asm/cpufeature.h
	arch/arm64/include/asm/thread_info.h
	arch/arm64/kernel/cpu_errata.c
	arch/arm64/kernel/cpufeature.c
	arch/arm64/kernel/entry.S
	arch/arm64/kernel/ssbd.c
	drivers/base/arch_topology.c
	drivers/md/Kconfig
	drivers/scsi/ufs/ufshcd.c
	drivers/usb/gadget/function/f_fs.c
	include/trace/events/sched.h
	kernel/sched/cpufreq_schedutil.c
	kernel/sched/energy.c
	kernel/sched/fair.c
	kernel/sched/features.h
	kernel/sched/sched.h
	kernel/sched/topology.c
	kernel/sched/tune.c
	kernel/sched/walt.c
	kernel/sched/walt.h
	kernel/stop_machine.c
	kernel/time/tick-sched.c
	net/socket.c
	sound/core/rawmidi.c

Change-Id: Ia246711317930ecd55bb42565a04e6b4fdfc26d2
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2018-08-09 11:57:44 -07:00
Rafael J. Wysocki
6277dd586f BACKPORT: time: hrtimer: Introduce hrtimer_next_event_without()
The next set of changes will need to compute the time to the next
hrtimer event over all hrtimers except for the scheduler tick one.

To that end introduce a new helper function,
hrtimer_next_event_without(), for computing the time until the next
hrtimer event over all timers except for one and modify the underlying
code in __hrtimer_next_event_base() to prepare it for being called by
that new function.

No intentional code behavior changes.

Cherry-picked from a59855cd8c613ba4bb95147f6176360d95f75e60

 - Fixed conflict with tick_nohz_stopped_cpu not appearing in the
   chunk context. Trivial fix.

Change-Id: I0dae9e08e19559efae9697800738c5522ab5933f
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Paven Kondati <pkondeti@qti.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-07-19 19:09:14 +00:00
Viresh Kumar
a67f648836 hrtimer: create hrtimer_quiesce_cpu() to isolate CPU from hrtimers
To isolate CPUs (isolate from hrtimers) from sysfs using cpusets, we need some
support from the hrtimer core. i.e. A routine hrtimer_quiesce_cpu() which would
migrate away all the unpinned hrtimers, but shouldn't touch the pinned ones.

This patch creates this routine.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[forward port to 3.18]
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: d4d50a0ddc35e58ee95137ba4d14e74fea8b682f
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-4.9]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>

Change-Id: Icf93622bdeed9c6c28eb1aca1637bc5b9522a533
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2018-01-22 09:09:06 -08:00
Viresh Kumar
06d4745936 hrtimer: update timer->state with 'pinned' information
'Pinned' information would be required in migrate_hrtimers() now, as we can
migrate non-pinned timers away without a hotplug (i.e. with cpuset.quiesce). And
so we may need to identify pinned timers now, as we can't migrate them.

This patch reuses the timer->state variable for setting this flag as there were
enough number of free bits available in this variable. And there is no point
increasing size of this struct by adding another field.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[forward port to 3.18]
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: 62feaf1ed0b64c04868d143d8bdb92d60dc3189b
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[markivx: Port to 4.14, fix merge conflicts due to lack of timer stats code]
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>

Change-Id: Id2cdd74903598e31fab6999c4cbab9c2fd17fe23
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2018-01-22 09:09:03 -08:00
Deepa Dinamani
c0edd7c9ac nanosleep: Use get_timespec64() and put_timespec64()
Usage of these apis and their compat versions makes
the syscalls: clock_nanosleep and nanosleep and
their compat implementations simpler.

This is a preparatory patch to isolate data conversions to
struct timespec64 at userspace boundaries. This helps contain
the changes needed to transition to new y2038 safe types.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-30 04:14:14 -04:00
Thomas Gleixner
938e7cf2d5 posix-timers: Make nanosleep timespec argument const
No nanosleep implementation modifies the rqtp argument. Mark is const.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
2017-06-14 00:00:47 +02:00
Al Viro
fb923c4a3c posix-timers: Kill ->nsleep_restart()
No more users.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170607084241.28657-8-viro@ZenIV.linux.org.uk
2017-06-14 00:00:42 +02:00
Al Viro
ce41aaf47a hrtimers/posix-timers: Merge nanosleep timespec copyout logics into a new helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170607084241.28657-7-viro@ZenIV.linux.org.uk
2017-06-14 00:00:42 +02:00
Al Viro
192a82f900 hrtimer_nanosleep(): Pass rmtp in restart_block
Store the pointer to the timespec which gets updated with the remaining
time in the restart block and remove the function argument.

[ tglx: Added changelog ]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170607084241.28657-3-viro@ZenIV.linux.org.uk
2017-06-14 00:00:40 +02:00
Deepa Dinamani
ad19638463 time: Change k_clock nsleep() to use timespec64
struct timespec is not y2038 safe on 32 bit machines.  Replace uses of
struct timespec with struct timespec64 in the kernel.

The syscall interfaces themselves will be changed in a separate series.

Note that the restart_block parameter for nanosleep has also been left
unchanged and will be part of syscall series noted above.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: y2038@lists.linaro.org
Cc: john.stultz@linaro.org
Cc: arnd@arndb.de
Link: http://lkml.kernel.org/r/1490555058-4603-8-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 21:49:56 +02:00
Stephen Boyd
016da20148 hrtimer: Remove hrtimer_peek_ahead_timers() leftovers
This function was removed in commit c6eb3f70d4 (hrtimer: Get rid of
hrtimer softirq, 2015-04-14) but the prototype wasn't ever deleted.

Delete it now.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/20170317010814.2591-1-sboyd@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-17 18:24:58 +01:00
Ingo Molnar
283cb90305 sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>
In our quest to simplify <linux/sched.h>'s header dependencies, remove
the <linux/wait.h> inclusion from <linux/hrtimer.h> - which does
not appear to be necessary, as hrtimer.h does not use waitqueues.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-03 01:45:40 +01:00
Kees Cook
dfb4357da6 time: Remove CONFIG_TIMER_STATS
Currently CONFIG_TIMER_STATS exposes process information across namespaces:

kernel/time/timer_list.c print_timer():

        SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);

/proc/timer_list:

 #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570

Given that the tracer can give the same information, this patch entirely
removes CONFIG_TIMER_STATS.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Xing Gao <xgao01@email.wm.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jessica Frazelle <me@jessfraz.com>
Cc: kernel-hardening@lists.openwall.com
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10 11:15:08 +01:00
Thomas Gleixner
2456e85535 ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Thomas Gleixner
27590dc17b hrtimer: Convert to hotplug state machine
Split out the clockevents callbacks instead of piggybacking them on
hrtimers.

This gets rid of a POST_DEAD user. See commit:

  54e88fad22 ("sched: Make sure timers have migrated before killing the migration_thread")

We just move the callback state to the proper place in the state machine.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153337.485419196@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:41:37 +02:00
John Stultz
da8b44d5a9 timer: convert timer_slack_ns from unsigned long to u64
This patchset introduces a /proc/<pid>/timerslack_ns interface which
would allow controlling processes to be able to set the timerslack value
on other processes in order to save power by avoiding wakeups (Something
Android currently does via out-of-tree patches).

The first patch tries to fix the internal timer_slack_ns usage which was
defined as a long, which limits the slack range to ~4 seconds on 32bit
systems.  It converts it to a u64, which provides the same basically
unlimited slack (500 years) on both 32bit and 64bit machines.

The second patch introduces the /proc/<pid>/timerslack_ns interface
which allows the full 64bit slack range for a task to be read or set on
both 32bit and 64bit machines.

With these two patches, on a 32bit machine, after setting the slack on
bash to 10 seconds:

$ time sleep 1

real    0m10.747s
user    0m0.001s
sys     0m0.005s

The first patch is a little ugly, since I had to chase the slack delta
arguments through a number of functions converting them to u64s.  Let me
know if it makes sense to break that up more or not.

Other than that things are fairly straightforward.

This patch (of 2):

The timer_slack_ns value in the task struct is currently a unsigned
long.  This means that on 32bit applications, the maximum slack is just
over 4 seconds.  However, on 64bit machines, its much much larger (~500
years).

This disparity could make application development a little (as well as
the default_slack) to a u64.  This means both 32bit and 64bit systems
have the same effective internal slack range.

Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify
the interface as a unsigned long, so we preserve that limitation on
32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned
long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
actually larger then what can be stored by an unsigned long.

This patch also modifies hrtimer functions which specified the slack
delta as a unsigned long.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Thomas Gleixner
203cbf77de hrtimer: Handle remaining time proper for TIME_LOW_RES
If CONFIG_TIME_LOW_RES is enabled we add a jiffie to the relative timeout to
prevent short sleeps, but we do not account for that in interfaces which
retrieve the remaining time.

Helge observed that timerfd can return a remaining time larger than the
relative timeout. That's not expected and breaks userland test programs.

Store the information that the timer was armed relative and provide functions
to adjust the remaining time. To avoid bloating the hrtimer struct make state
a u8, which as a bonus results in better code on x86 at least.

Reported-and-tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: dhowells@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160114164159.273328486@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-17 11:13:55 +01:00
Thomas Gleixner
683be13a28 timer: Minimize nohz off overhead
If nohz is disabled on the kernel command line the [hr]timer code
still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
avoid the overhead.

Before:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu

After:
  48.73%  hog       [.] main
  15.36%  [kernel]  [k] _raw_spin_lock_irqsave
   9.77%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.61%  [kernel]  [k] lock_timer_base.isra.38
   6.42%  [kernel]  [k] mod_timer
   3.90%  [kernel]  [k] detach_if_pending
   3.76%  [kernel]  [k] del_timer
   2.41%  [kernel]  [k] internal_add_timer
   1.39%  [kernel]  [k] __internal_add_timer
   0.76%  [kernel]  [k] timerfn

We probably should have a cached value for nohz full in the per cpu
bases as well to avoid the cpumask check. The base cache line is hot
already, the cpumask not necessarily.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19 15:18:28 +02:00
Thomas Gleixner
bc7a34b8b9 timer: Reduce timer migration overhead if disabled
Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.

We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.

The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.

With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.

Before:
  47.76%  hog       [.] main
  14.84%  [kernel]  [k] _raw_spin_lock_irqsave
   9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.71%  [kernel]  [k] mod_timer
   6.24%  [kernel]  [k] lock_timer_base.isra.38
   3.76%  [kernel]  [k] detach_if_pending
   3.71%  [kernel]  [k] del_timer
   2.50%  [kernel]  [k] internal_add_timer
   1.51%  [kernel]  [k] get_nohz_timer_target
   1.28%  [kernel]  [k] __internal_add_timer
   0.78%  [kernel]  [k] timerfn
   0.48%  [kernel]  [k] wake_up_nohz_cpu

After:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu


Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19 15:18:28 +02:00
Peter Zijlstra
887d9dc989 hrtimer: Allow hrtimer::function() to free the timer
Currently an hrtimer callback function cannot free its own timer
because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK
after it. Freeing the timer would result in a clear use-after-free.

Solve this by using a scheme similar to regular timers; track the
current running timer in hrtimer_clock_base::running.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: wanpeng.li@linux.intel.com
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.471563047@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19 00:09:56 +02:00
Oleg Nesterov
c04dca02bc hrtimer: Remove HRTIMER_STATE_MIGRATE
I do not understand HRTIMER_STATE_MIGRATE. Unless I am totally
confused it looks buggy and simply unneeded.

migrate_hrtimer_list() sets it to keep hrtimer_active() == T, but this
is not enough: this can fool, say, hrtimer_is_queued() in
dequeue_signal().

Can't migrate_hrtimer_list() simply use HRTIMER_STATE_ENQUEUED?
This fixes the race and we can kill STATE_MIGRATE.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: wanpeng.li@linux.intel.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.072387650@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19 00:09:56 +02:00
Borislav Petkov
d711b8b30c hrtimers: Make sure hrtimer_resolution is unsigned int
... in the !CONFIG_HIGH_RES_TIMERS case too. And thus fix warnings like
this one:

net/sched/sch_api.c: In function ‘psched_show’:
net/sched/sch_api.c:1891:6: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
      (u32)NSEC_PER_SEC / hrtimer_resolution);

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1433583000-32090-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
2015-06-08 15:46:06 +02:00
Thomas Gleixner
61699e1307 hrtimer: Remove hrtimer_start() return value
No user was ever interested whether the timer was active or not when
it was started. All abusers of the return value are gone, so get rid
of it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203503.483556394@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:52 +02:00
Thomas Gleixner
02a171af1a hrtimer: Make hrtimer_start() a inline wrapper
No point for an extra export just to set the extra argument of
hrtimer_start_range_ns() to 0.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.808544539@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00