79 Commits

Author SHA1 Message Date
Blagovest Kolenichev
16b6ed19fc Merge android-4.9.87 (a290494) into msm-4.9
* refs/heads/tmp-a290494:
  Linux 4.9.87
  btrfs: preserve i_mode if __btrfs_set_acl() fails
  bpf, ppc64: fix out of bounds access in tail call
  bpf: add schedule points in percpu arrays management
  bpf, arm64: fix out of bounds access in tail call
  bpf, x64: implement retpoline for tail call
  bpf: fix mlock precharge on arraymaps
  bpf: fix wrong exposure of map_flags into fdinfo for lpm
  mpls, nospec: Sanitize array index in mpls_label_ok()
  net: mpls: Pull common label check into helper
  sctp: verify size of a new chunk in _sctp_make_chunk()
  s390/qeth: fix IPA command submission race
  s390/qeth: fix IP address lookup for L3 devices
  s390/qeth: fix double-free on IP add/remove race
  s390/qeth: fix IP removal on offline cards
  s390/qeth: fix overestimated count of buffer elements
  s390/qeth: fix SETIP command handling
  s390/qeth: fix underestimated count of buffer elements
  sctp: fix dst refcnt leak in sctp_v6_get_dst()
  tcp_bbr: better deal with suboptimal GSO
  rxrpc: Fix send in rxrpc_send_data_packet()
  tcp: Honor the eor bit in tcp_mtu_probe
  net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT
  mlxsw: spectrum_switchdev: Check success of FDB add operation
  sctp: fix dst refcnt leak in sctp_v4_get_dst
  udplite: fix partial checksum initialization
  ppp: prevent unregistered channels from connecting to PPP units
  netlink: ensure to loop over all netns in genlmsg_multicast_allns()
  net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68
  net: fix race on decreasing number of TX queues
  ipv6 sit: work around bogus gcc-8 -Wrestrict warning
  hdlc_ppp: carrier detect ok, don't turn off negotiation
  fib_semantics: Don't match route with mismatching tclassid
  bridge: check brport attr show in brport_show
  x86/apic/vector: Handle legacy irq data correctly
  netlink: put module reference if dump start fails
  md: only allow remove_and_add_spares when no sync_thread running.
  x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  x86/mm: Give each mm TLB flush generation a unique ID
  ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux
  ARM: dts: LogicPD SOM-LV: Fix I2C1 pinmux
  dm io: fix duplicate bio completion due to missing ref count
  PCI/ASPM: Deal with missing root ports in link state handling
  KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
  KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
  KVM: mmu: Fix overlap between public and private memslots
  ARM: kvm: fix building with gcc-8
  ARM: mvebu: Fix broken PL310_ERRATA_753970 selects
  nospec: Allow index argument to have const-qualified type
  media: m88ds3103: don't call a non-initalized function
  x86/platform/intel-mid: Handle Intel Edison reboot correctly
  x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend
  dax: fix vma_is_fsdax() helper
  cpufreq: s3c24xx: Fix broken s3c_cpufreq_init()
  parisc: Fix ordering of cache and TLB flushes
  timers: Forward timer base before migrating timers
  ALSA: hda - Fix pincfg at resume on Lenovo T470 dock
  ALSA: hda: Add a power_save blacklist
  ALSA: usb-audio: Add a quirck for B&W PX headphones
  tpm-dev-common: Reject too short writes
  tpm_tis_spi: Use DMA-safe memory for SPI transfers
  tpm: constify transmit data pointers
  tpm_tis: fix potential buffer overruns caused by bit glitches on the bus
  tpm_i2c_nuvoton: fix potential buffer overruns caused by bit glitches on the bus
  tpm_i2c_infineon: fix potential buffer overruns caused by bit glitches on the bus
  tpm: st33zp24: fix potential buffer overruns caused by bit glitches on the bus
  FROMLIST: ARM: amba: Don't read past the end of sysfs "driver_override" buffer
  UPSTREAM: ANDROID: binder: remove WARN() for redundant txn error

Conflicts:
	kernel/time/timer.c

Change-Id: I302546c52a480e9a4c661accf021766c499739b9
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2018-04-17 10:39:47 -07:00
Lingutla Chandrasekhar
13e75c74cd timers: Forward timer base before migrating timers
commit c52232a49e203a65a6e1a670cd5262f59e9364a0 upstream.

On CPU hotunplug the enqueued timers of the unplugged CPU are migrated to a
live CPU. This happens from the control thread which initiated the unplug.

If the CPU on which the control thread runs came out from a longer idle
period then the base clock of that CPU might be stale because the control
thread runs prior to any event which forwards the clock.

In such a case the timers from the unplugged CPU are queued on the live CPU
based on the stale clock which can cause large delays due to increased
granularity of the outer timer wheels which are far away from base::clk.

But there is a worse problem than that. The following sequence of events
illustrates it:

 - CPU0 timer1 is queued expires = 59969 and base->clk = 59131.

   The timer is queued at wheel level 2, with resulting expiry time = 60032
   (due to level granularity).

 - CPU1 enters idle @60007, with next timer expiry @60020.

 - CPU0 is hot-unplugged @60009

 - CPU1 exits idle and runs the control thread which migrates the
   timers from CPU0

   timer1 is now queued in level 0 for immediate handling in the next
   softirq because the requested expiry time 59969 is before CPU1 base->clk
   60007

 - CPU1 runs code which forwards the base clock, which succeeds because the
   next expiring timer, which was collected at idle entry time, is still set
   to 60020.

   So it forwards beyond 60007 and therefore fails to expire the migrated
   timer1. That timer gets expired when the wheel wraps around again, which
   takes between 63 and 630ms depending on the HZ setting.

Address both problems by invoking forward_timer_base() for the control CPU's
timer base. All other places, which might run into a similar problem
(mod_timer()/add_timer_on()) already invoke forward_timer_base() to avoid
that.
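
The numbers in the sequence above can be reproduced with a small user-space
sketch of the wheel arithmetic. The constants are assumed to match the
non-cascading wheel layout (6 bits per level, clock shift of 3 between
levels); the helper names wheel_level() and effective_expiry() are
hypothetical and are not the kernel's functions.

```c
/* Assumed wheel layout: 64 buckets per level, each level 8x coarser. */
#define LVL_CLK_SHIFT 3
#define LVL_BITS      6
#define LVL_SIZE      (1UL << LVL_BITS)
#define LVL_DEPTH     9 /* assumed depth, not relevant for small deltas */
#define LVL_SHIFT(n)  ((n) * LVL_CLK_SHIFT)
#define LVL_GRAN(n)   (1UL << LVL_SHIFT(n))
#define LVL_START(n)  ((LVL_SIZE - 1) << (((n) - 1) * LVL_CLK_SHIFT))

/* Hypothetical helper: pick the wheel level for a timer relative to clk. */
int wheel_level(unsigned long expires, unsigned long clk)
{
	unsigned long delta = expires - clk;
	int lvl;

	/* Already expired relative to clk: handled in level 0. */
	if ((long)delta < 0)
		return 0;

	for (lvl = 1; lvl < LVL_DEPTH; lvl++) {
		if (delta < LVL_START(lvl))
			return lvl - 1;
	}
	return LVL_DEPTH - 1;
}

/*
 * Hypothetical helper: expiry rounded up to the level granularity.
 * Only meaningful for timers that are not already expired.
 */
unsigned long effective_expiry(unsigned long expires, int lvl)
{
	unsigned long idx = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);

	return idx << LVL_SHIFT(lvl);
}
```

With expires = 59969 and clk = 59131 the delta is 838, which lands in level 2
(granularity 64) and rounds up to 60032. Against the stale-free CPU1 clock of
60007 the same timer is already expired and falls into level 0.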

[ tglx: Massaged comment and changelog ]

Fixes: a683f390b9 ("timers: Forward the wheel clock whenever possible")
Co-developed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: linux-arm-msm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180118115022.6368-1-clingutla@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11 16:21:28 +01:00
Lingutla Chandrasekhar
ea6a32473b Revert "time: Run deferrable timers on other CPUs when tick_do_timer_cpu is busy"
Only the watchdog was using deferrable timers, and it no longer does. So
revert the change and let tick_do_timer_cpu run the deferrable timers.

This reverts 'commit 4fe122d73b ("time: Run deferrable timers on
other CPUs when tick_do_timer_cpu is busy")'

Change-Id: I59fcd7cdb216230864cd948d242d563ade6437c8
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
2018-03-02 13:24:22 +05:30
Linux Build Service Account
b7e257f0a6 Merge "Merge android-4.9-o.80 (a9fd318) into msm-4.9"
2018-02-09 16:29:04 -08:00
Pavankumar Kondeti
4fe122d73b time: Run deferrable timers on other CPUs when tick_do_timer_cpu is busy
Deferrable timers are processed by the tick_do_timer_cpu. If softirqs are
deferred to the ksoftirqd task on this CPU, it may take longer to process
the deferrable timers under heavy load. Allow deferrable timers to be
processed on other CPUs when ksoftirqd is active on the tick_do_timer_cpu.

Change-Id: Ic7b39415b5efd3239a28e41460708a3bcfdb47b2
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2018-02-06 12:18:17 +05:30
Blagovest Kolenichev
ec23108aed Merge android-4.9-o.78 (bf3b339) into msm-4.9
* refs/heads/tmp-bf3b339:
  Linux 4.9.78
  MIPS: AR7: ensure the port type's FCR value is used
  x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
  x86/pti: Document fix wrong index
  kprobes/x86: Disable optimizing on the function jumps to indirect thunk
  kprobes/x86: Blacklist indirect thunk functions for kprobes
  retpoline: Introduce start/end markers of indirect thunk
  x86/mce: Make machine check speculation protected
  usbip: fix warning in vhci_hcd_probe/lockdep_init_map
  x86/cpu, x86/pti: Do not enable PTI on AMD processors
  arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
  dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6
  dm btree: fix serious bug in btree_split_beneath()
  workqueue: avoid hard lockups in show_workqueue_state()
  libata: apply MAX_SEC_1024 to all LITEON EP1 series devices
  proc: fix coredump vs read /proc/*/stat race
  scripts/gdb/linux/tasks.py: fix get_thread_info
  can: peak: fix potential bug in packet fragmentation
  ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7
  ARM: sunxi_defconfig: Enable CMA
  phy: work around 'phys' references to usb-nop-xceiv devices
  tracing: Fix converting enum's from the map in trace_event_eval_update()
  Input: twl4030-vibra - fix sibling-node lookup
  Input: twl6040-vibra - fix child-node lookup
  Input: 88pm860x-ts - fix child-node lookup
  Input: ALPS - fix multi-touch decoding on SS4 plus touchpads
  perf tools: Fix build with ARCH=x86_64
  x86/apic/vector: Fix off by one in error path
  pipe: avoid round_pipe_size() nr_pages overflow on 32-bit
  x86/tsc: Fix erroneous TSC rate on Skylake Xeon
  x86/mm/pkeys: Fix fill_sig_info_pkey
  module: Add retpoline tag to VERMAGIC
  x86/cpufeature: Move processor tracing out of scattered features
  objtool: Improve error message for bad file argument
  x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
  x86/retpoline: Fill RSB on context switch for affected CPUs
  sched/deadline: Zero out positive runtime after throttling constrained tasks
  scsi: hpsa: fix volume offline state
  iser-target: Fix possible use-after-free in connection establishment error
  af_key: fix buffer overread in parse_exthdrs()
  af_key: fix buffer overread in verify_address_len()
  timers: Unconditionally check deferrable base
  ALSA: hda - Apply the existing quirk to iMac 14,1
  ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant
  ALSA: pcm: Remove yet superfluous WARN_ON()
  ALSA: seq: Make ioctls race-free
  futex: Prevent overflow by strengthen input validation
  scsi: sg: disable SET_FORCE_LOW_DMA
  libnvdimm, btt: Fix an incompatibility in the log layout
  FROMLIST: arm64: kpti: Fix the interaction between ASID switching and software PAN
  FROMLIST: arm64: Move post_ttbr_update_workaround to C code

Conflicts:
	arch/arm64/include/asm/efi.h
	arch/arm64/include/asm/mmu_context.h
	arch/arm64/mm/context.c
	drivers/scsi/sg.c
	kernel/workqueue.c

Change-Id: Icbdef53178fe3e325386cbae73edea918d23f519
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2018-01-24 08:33:10 -08:00
Thomas Gleixner
676109b28c timers: Unconditionally check deferrable base
commit ed4bbf7910b28ce3c691aef28d245585eaabda06 upstream.

When the timer base is checked for expired timers then the deferrable base
must be checked as well. This was missed when making the deferrable base
independent of base::nohz_active.

Fixes: ced6d5c11d3e ("timers: Use deferrable base independent of base::nohz_active")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: rt@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 19:57:04 +01:00
Blagovest Kolenichev
42d425962e Merge android-4.9-o.74 (127372f) into msm-4.9
* refs/heads/tmp-127372f:
  Linux 4.9.74
  mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP
  tty: fix tty_ldisc_receive_buf() documentation
  n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)
  x86/smpboot: Remove stale TLB flush invocations
  nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
  timers: Reinitialize per cpu bases on hotplug
  timers: Invoke timer_start_debug() where it makes sense
  timers: Use deferrable base independent of base::nohz_active
  usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201
  USB: Fix off by one in type-specific length check of BOS SSP capability
  usb: add RESET_RESUME for ELSA MicroLink 56K
  usb: Add device quirk for Logitech HD Pro Webcam C925e
  USB: serial: option: adding support for YUGA CLM920-NC5
  USB: serial: option: add support for Telit ME910 PID 0x1101
  USB: serial: qcserial: add Sierra Wireless EM7565
  USB: serial: ftdi_sio: add id for Airbus DS P8GR
  usbip: vhci: stop printing kernel pointer addresses in messages
  usbip: stub: stop printing kernel pointer addresses in messages
  usbip: prevent leaking socket pointer address in messages
  usbip: fix usbip bind writing random string after command in match_busid
  s390/qeth: update takeover IPs after configuration change
  s390/qeth: lock IP table while applying takeover changes
  s390/qeth: don't apply takeover changes to RXIP
  s390/qeth: apply takeover changes when mode is toggled
  net/mlx5: Fix error flow in CREATE_QP command
  net/mlx5e: Prevent possible races in VXLAN control flow
  net/mlx5e: Add refcount to VXLAN structure
  net/mlx5e: Fix possible deadlock of VXLAN lock
  net/mlx5e: Fix features check of IPv6 traffic
  net/mlx5: Fix rate limit packet pacing naming and struct
  tcp: invalidate rate samples during SACK reneging
  sock: free skb in skb_complete_tx_timestamp on error
  net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround
  net: Fix double free and memory corruption in get_net_ns_by_id()
  net: fec: Allow reception of frames bigger than 1522 bytes
  net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks
  ipv4: Fix use-after-free when flushing FIB tables
  adding missing rcu_read_unlock in ipxip6_rcv
  sctp: Replace use of sockets_allocated with specified macro.
  net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case
  net: ipv4: fix for a race condition in raw_sendmsg
  tg3: Fix rx hang on MTU change with 5717/5719
  tcp md5sig: Use skb's saddr when replying to an incoming segment
  tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
  RDS: Check cmsg_len before dereferencing CMSG_DATA
  ptr_ring: add barriers
  net: reevalulate autoflowlabel setting after sysctl setting
  net: qmi_wwan: add Sierra EM7565 1199:9091
  netlink: Add netns check on taps
  net: igmp: Use correct source address on IGMPv3 reports
  net: fec: unmap the xmit buffer that are not transferred by DMA
  ipv6: mcast: better catch silly mtu values
  ipv4: igmp: guard against silly MTU values
  kbuild: add '-fno-stack-check' to kernel build options
  x86/mm/64: Fix reboot interaction with CR4.PCIDE
  x86/mm: Enable CR4.PCIDE on supported systems
  x86/mm: Add the 'nopcid' boot option to turn off PCID
  x86/mm: Disable PCID on 32-bit kernels
  x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code
  x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
  x86/mm: Make flush_tlb_mm_range() more predictable
  x86/mm: Remove flush_tlb() and flush_tlb_current_task()
  x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
  ALSA: hda - fix headset mic detection issue on a Dell machine
  ALSA: hda: Drop useless WARN_ON()
  ASoC: tlv320aic31xx: Fix GPIO1 register definition
  ASoC: twl4030: fix child-node lookup
  ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
  ASoC: da7218: fix fix child-node lookup
  ASoC: wm_adsp: Fix validation of firmware and coeff lengths
  iw_cxgb4: Only validate the MSN for successful completions
  ring-buffer: Mask out the info bits when returning buffer page length
  tracing: Fix crash when it fails to alloc ring buffer
  tracing: Fix possible double free on failure of allocating trace buffer
  tracing: Remove extra zeroing out of the ring buffer page
  sync objtool's copy of x86-opcode-map.txt

Conflicts:
	include/linux/cpuhotplug.h
	kernel/time/timer.c

Change-Id: I0198e2b75715d13acd86237321966774cd6d9f1d
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2018-01-18 07:01:08 -08:00
Thomas Gleixner
249d4a9b32 timers: Reinitialize per cpu bases on hotplug
commit 26456f87aca7157c057de65c9414b37f1ab881d1 upstream.

The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
them with a potentially stale clk and next_expiry value, which can cause
trouble when the CPU is plugged.

Add a prepare callback which forwards the clock, sets next_expiry to far in
the future and resets the control flags to a known state.

Set base->must_forward_clk so the first timer which is queued will try to
forward the clock to current jiffies.
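
What such a prepare step does can be modelled in user space as follows. This
is a sketch under assumptions, not the kernel callback: struct
timer_base_model, prepare_cpu_bases(), the number of bases and the
"far in the future" delta are all illustrative stand-ins.

```c
#include <stdbool.h>

#define NR_BASES             2                 /* assumed: standard + deferrable */
#define NEXT_TIMER_MAX_DELTA ((1UL << 30) - 1) /* assumed "far future" offset */

/* Hypothetical model of the per-CPU state described above. */
struct timer_base_model {
	unsigned long clk;         /* wheel clock */
	unsigned long next_expiry; /* cached next expiry */
	bool is_idle;              /* control flag */
	bool must_forward_clk;     /* forward clk when the first timer is queued */
};

/* Reset every base of an incoming CPU to a known state. */
void prepare_cpu_bases(struct timer_base_model bases[NR_BASES],
		       unsigned long now)
{
	for (int b = 0; b < NR_BASES; b++) {
		bases[b].clk = now;                                /* forward the clock */
		bases[b].next_expiry = now + NEXT_TIMER_MAX_DELTA; /* nothing pending */
		bases[b].is_idle = false;                          /* known flag state */
		bases[b].must_forward_clk = true;
	}
}
```

Passing the current jiffies value as now discards whatever stale clk and
next_expiry the base carried from before the CPU went offline.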

Fixes: 500462a9de ("timers: Switch to a non-cascading wheel")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:35:17 +01:00
Thomas Gleixner
574e543ff9 timers: Invoke timer_start_debug() where it makes sense
commit fd45bb77ad682be728d1002431d77b8c73342836 upstream.

The timer start debug function is called before the proper timer base is
set. As a consequence the trace data contains the stale CPU and flags
values.

Call the debug function after setting the new base and flags.

Fixes: 500462a9de ("timers: Switch to a non-cascading wheel")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: rt@linutronix.de
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: https://lkml.kernel.org/r/20171222145337.792907137@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:35:17 +01:00
Anna-Maria Gleixner
d840687aa8 timers: Use deferrable base independent of base::nohz_active
commit ced6d5c11d3e7b342f1a80f908e6756ebd4b8ddd upstream.

During boot and before base::nohz_active is set in the timer bases, deferrable
timers are enqueued into the standard timer base. This works correctly as
long as base::nohz_active is false.

Once base::nohz_active is set and a timer which was enqueued before that
is accessed, the lock selector code chooses the lock of the deferred
base. This causes unlocked access to the standard base and, in case the
timer is removed, it does not clear the pending flag in the standard base
bitmap, which causes get_next_timer_interrupt() to return bogus values.

To prevent that, the deferrable timers must be enqueued in the deferrable
base, even when base::nohz_active is not set. Those deferrable timers also
need to be expired unconditionally.

Fixes: 500462a9de ("timers: Switch to a non-cascading wheel")
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: rt@linutronix.de
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20171222145337.633328378@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:35:17 +01:00
Maria Yu
219fe504c6 kernel: time: Fix low resolution timer not fire in 32bit case
A low resolution timer does not fire at its given expiry time when
that time is equal to this CPU's base clk time. This is caused by
32-bit integer overflow.
When pos_up is 0 and pos_down is -1, the unsigned add does not
see pos_up + clk as bigger than pos_down + clk. So add
a cast to u64 to get the expected result.

Change-Id: I45777a1fd282d8f70ba94528b04fce2f0436d7e4
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
2017-11-09 15:45:27 +08:00
Kyle Yan
cd02e634b8 Merge remote-tracking branch '4.9/tmp-379e3b2' into 4.9
* 4.9/tmp-379e3b2:
  ANDROID: binder: fix transaction leak.
  ANDROID: binder: Add tracing for binder priority inheritance.
  Linux 4.9.53
  swiotlb-xen: implement xen_swiotlb_dma_mmap callback
  video: fbdev: aty: do not leak uninitialized padding in clk to userspace
  KVM: VMX: use cmpxchg64
  cxl: Fix driver use count
  KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
  KVM: VMX: do not change SN bit in vmx_update_pi_irte()
  timer/sysclt: Restrict timer migration sysctl values to 0 and 1
  gfs2: Fix debugfs glocks dump
  x86/fpu: Don't let userspace set bogus xcomp_bv
  x86/mm: Fix fault error path using unsafe vma pointer
  btrfs: prevent to set invalid default subvolid
  btrfs: propagate error to btrfs_cmp_data_prepare caller
  btrfs: fix NULL pointer dereference from free_reloc_roots()
  PCI: Fix race condition with driver_override
  etnaviv: fix gem object list corruption
  xfs: validate bdev support for DAX inode flag
  kvm: nVMX: Don't allow L2 to access the hardware CR8
  KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
  kvm/x86: Handle async PF in RCU read-side critical sections
  KVM: VMX: simplify and fix vmx_vcpu_pi_load
  KVM: VMX: avoid double list add with VT-d posted interrupts
  KVM: VMX: extract __pi_post_block
  arm64: fault: Route pte translation faults via do_translation_fault
  arm64: Make sure SPsel is always set
  seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter()
  selftests/seccomp: Support glibc 2.26 siginfo_t.h
  iw_cxgb4: put ep reference in pass_accept_req()
  iw_cxgb4: remove the stid on listen create failure
  bsg-lib: don't free job in bsg_prepare_job
  nl80211: check for the required netlink attributes presence
  vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets
  SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags
  SMB: Validate negotiate (to protect against downgrade) even if signing off
  SMB3: Warn user if trying to sign connection that authenticated as guest
  Fix SMB3.1.1 guest authentication to Samba
  PM: core: Fix device_pm_check_callbacks()
  s390/mm: fix write access check in gup_huge_pmd()
  powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGS
  powerpc/tm: Flush TM only if CPU has TM feature
  powerpc/pseries: Fix parent_dn reference leak in add_dt_node()
  KEYS: prevent KEYCTL_READ on negative key
  KEYS: prevent creating a different user's keyrings
  KEYS: fix writing past end of user-supplied buffer in keyring_read()
  security/keys: rewrite all of big_key crypto
  security/keys: properly zero out sensitive key material in big_key
  crypto: talitos - fix hashing
  crypto: talitos - fix sha224
  crypto: talitos - Don't provide setkey for non hmac hashing algs.
  crypto: drbg - fix freeing of resources
  drm/radeon: disable hard reset in hibernate for APUs
  scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
  md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list
  md/raid5: fix a race condition in stripe batch
  tracing: Erase irqsoff trace with empty write
  tracing: Fix trace_pipe behavior for instance traces
  KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list
  KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
  genirq: Make sparse_irq_lock protect what it should protect
  mac80211: flush hw_roc_start work before cancelling the ROC
  mac80211_hwsim: Use proper TX power
  mac80211: fix VLAN handling with TXQs
  fs/proc: Report eip/esp in /prod/PID/stat for coredumping
  cifs: release auth_key.response for reconnect.
  cifs: release cifs root_cred after exit_cifs
  ANDROID: add script to fetch android kernel config fragments
  FROMLIST: binder: fix use-after-free in binder_transaction()
  UPSTREAM: ipv6: fib: Unlink replaced routes from their nodes
  Linux 4.9.52
  bcache: fix bch_hprint crash and improve output
  bcache: fix for gc and write-back race
  bcache: Correct return value for sysfs attach errors
  bcache: correct cache_dirty_target in __update_writeback_rate()
  bcache: do not subtract sectors_to_gc for bypassed IO
  bcache: Fix leak of bdev reference
  bcache: initialize dirty stripes in flash_dev_run()
  PM / devfreq: Fix memory leak when fail to register device
  media: uvcvideo: Prevent heap overflow when accessing mapped controls
  media: v4l2-compat-ioctl32: Fix timespec conversion
  s390/mm: fix race on mm->context.flush_mm
  s390/mm: fix local TLB flushing vs. detach of an mm address space
  net/netfilter/nf_conntrack_core: Fix net_conntrack_lock()
  PCI: pciehp: Report power fault only once until we clear it
  PCI: shpchp: Enable bridge bus mastering if MSI is enabled
  ARC: Re-enable MMU upon Machine Check exception
  tracing: Apply trace_clock changes to instance max buffer
  tracing: Add barrier to trace_printk() buffer nesting modification
  ftrace: Fix memleak when unregistering dynamic ops when tracing disabled
  ftrace: Fix selftest goto location on error
  scsi: qla2xxx: Fix an integer overflow in sysfs code
  scsi: qla2xxx: Correction to vha->vref_count timeout
  scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE
  scsi: sg: factor out sg_fill_request_table()
  scsi: sg: off by one in sg_ioctl()
  scsi: sg: use standard lists for sg_requests
  scsi: sg: remove 'save_scat_len'
  scsi: storvsc: fix memory leak on ring buffer busy
  scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead
  scsi: megaraid_sas: Check valid aen class range to avoid kernel panic
  scsi: megaraid_sas: set minimum value of resetwaittime to be 1 secs
  scsi: zfcp: trace high part of "new" 64 bit SCSI LUN
  scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
  scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records
  scsi: zfcp: fix missing trace records for early returns in TMF eh handlers
  scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA
  scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records
  scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path
  scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
  skd: Submit requests to firmware before triggering the doorbell
  skd: Avoid that module unloading triggers a use-after-free
  md/bitmap: disable bitmap_resize for file-backed bitmaps.
  block: Relax a check in blk_start_queue()
  powerpc: Fix DAR reporting when alignment handler faults
  ext4: fix quota inconsistency during orphan cleanup for read-only mounts
  ext4: fix incorrect quotaoff if the quota feature is enabled
  crypto: AF_ALG - remove SGL terminator indicator when chaining
  crypto: ccp - Fix XTS-AES-128 support on v5 CCPs
  MIPS: math-emu: <MADDF|MSUBF>.D: Fix accuracy (64-bit case)
  MIPS: math-emu: <MADDF|MSUBF>.S: Fix accuracy (32-bit case)
  MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Clean up "maddf_flags" enumeration
  MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix some cases of zero inputs
  MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix some cases of infinite inputs
  MIPS: math-emu: <MADDF|MSUBF>.<D|S>: Fix NaN propagation
  MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately
  MIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs
  MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs
  MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs
  MIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative
  MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero
  MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation
  Input: i8042 - add Gigabyte P57 to the keyboard reset table
  pinctrl/amd: save pin registers over suspend/resume
  tty: fix __tty_insert_flip_char regression
  tty: improve tty_insert_flip_char() slow path
  tty: improve tty_insert_flip_char() fast path
  IB/addr: Fix setting source address in addr6_resolve()
  drm/sun4i: Implement drm_driver lastclose to restore fbdev console
  IB/{qib, hfi1}: Avoid flow control testing for RDMA write operation
  orangefs: Don't clear SGID when inheriting ACLs
  mm: prevent double decrease of nr_reserved_highatomic
  NFSv4: Fix callback server shutdown
  SUNRPC: Refactor svc_set_num_threads()
  UPSTREAM: drm/atomic: Handle -EDEADLK with out-fences correctly
  UPSTREAM: sched/fair: Fix FTQ noise bench regression
  UPSTREAM: fib_rules: fix error return code
  UPSTREAM: ipv4: add missing initialization for flowi4_uid
  ANDROID: Squashfs: optimize reading uncompressed data
  ANDROID: Squashfs: implement .readpages()
  ANDROID: Squashfs: replace buffer_head with BIO
  ANDROID: Squashfs: refactor page_actor
  ANDROID: Squashfs: remove the FILE_CACHE option
  FROMLIST: android: binder: Don't get mm from task
  FROMLIST: android: binder: Remove unused vma argument
  FROMLIST: android: binder: Drop lru lock in isolate callback
  ANDROID: Use sk_uid to replace uid get from socket file
  ANDROID: nf: xt_qtaguid: fix handling for cases where tunnels are used.
  Revert "ANDROID: Use sk_uid to replace uid get from socket file"
  ANDROID: USB gadget: mtp: Fix hang in ioctl(MTP_RECEIVE_FILE) for WritePartialObject

Conflicts:
	drivers/android/binder_alloc.c
	drivers/media/v4l2-core/v4l2-compat-ioctl32.c
	drivers/scsi/sg.c
	drivers/usb/gadget/function/f_mtp.c
	net/netfilter/xt_qtaguid.c
	net/wireless/nl80211.c

Change-Id: I6af673bd4b920bb229fe238a3e96b2330fa18263
Signed-off-by: Kyle Yan <kyan@codeaurora.org>
2017-10-17 14:48:18 -07:00
Myungho Jung
4c00015385 timer/sysclt: Restrict timer migration sysctl values to 0 and 1
commit b94bf594cf8ed67cdd0439e70fa939783471597a upstream.

timer_migration sysctl acts as a boolean switch, so the allowed values
should be restricted to 0 and 1.

Add the necessary extra fields to the sysctl table entry to enforce that.
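
The effect of attaching bounds to a table entry can be illustrated with a
user-space model. This is only a sketch: struct bounded_entry and
bounded_write() are hypothetical names and do not reproduce the kernel's
sysctl API. The min and max pointers stand in for the extra fields mentioned
above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a table entry carrying optional bounds. */
struct bounded_entry {
	const char *name;
	int *data;      /* the variable being controlled */
	const int *min; /* lower bound, or NULL for none */
	const int *max; /* upper bound, or NULL for none */
};

/* Store value only if it is within the entry's bounds. */
int bounded_write(const struct bounded_entry *e, int value)
{
	if ((e->min && value < *e->min) || (e->max && value > *e->max))
		return -EINVAL; /* out of range: leave data untouched */

	*e->data = value;
	return 0;
}
```

With min pointing at 0 and max pointing at 1, a write of 2 is rejected and the
stored value keeps its previous setting, which is the boolean behaviour the
changelog asks for.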

[ tglx: Rewrote changelog ]

Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Link: http://lkml.kernel.org/r/1492640690-3550-1-git-send-email-mhjungk@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kazuhiro Hayashi <kazuhiro3.hayashi@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:44:04 +02:00
Kyle Yan
7d337cc7f9 Merge remote-tracking branch '4.9/tmp-85e1c01' into 4.9
* 4.9/tmp-85e1c01:
  Linux 4.9.48
  epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
  kvm: arm/arm64: Force reading uncached stage2 PGD
  drm/ttm: Fix accounting error when fail to get pages for pool
  xfrm: policy: check policy direction value
  lib/mpi: kunmap after finishing accessing buffer
  wl1251: add a missing spin_lock_init()
  CIFS: remove endian related sparse warning
  CIFS: Fix maximum SMB2 header size
  alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
  cpuset: Fix incorrect memory_pressure control file mapping
  cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
  ceph: fix readpage from fscache
  mm, madvise: ensure poisoned pages are removed from per-cpu lists
  mm, uprobes: fix multiple free of ->uprobes_state.xol_area
  crypto: algif_skcipher - only call put_page on referenced and used pages
  i2c: ismt: Return EMSGSIZE for block reads with bogus length
  i2c: ismt: Don't duplicate the receive length for block reads
  irqchip: mips-gic: SYNC after enabling GIC region
  ANDROID: fiq_debugger: Fix minor bug in code
  ANDROID: configs: remove requirement for CONFIG_SYNC
  FROMLIST: binder: fix an ret value override
  FROMLIST: binder: fix memory corruption in binder_transaction binder
  Linux 4.9.47
  lz4: fix bogus gcc warning
  scsi: sg: reset 'res_in_use' after unlinking reserved array
  scsi: sg: protect accesses to 'reserved' page array
  locking/spinlock/debug: Remove spinlock lockup detection code
  arm64: fpsimd: Prevent registers leaking across exec
  x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
  arm64: mm: abort uaccess retries upon fatal signal
  kvm: arm/arm64: Fix race in resetting stage2 PGD
  gcov: support GCC 7.1
  staging: wilc1000: simplify vif[i]->ndev accesses
  scsi: isci: avoid array subscript warning
  p54: memset(0) whole array
  FROMLIST: android: binder: Add page usage in binder stats
  FROMLIST: android: binder: Add shrinker tracepoints
  FROMLIST: android: binder: Add global lru shrinker to binder
  FROMLIST: android: binder: Move buffer out of area shared with user space
  FROMLIST: android: binder: Add allocator selftest
  FROMLIST: android: binder: Refactor prev and next buffer into a helper function
  android: android-base.config: enable IP6_NF_MATCH_RPFILTER
  Linux 4.9.46
  powerpc/mm: Ensure cpumask update is ordered
  ACPI: EC: Fix regression related to wrong ECDT initialization order
  ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal
  ACPI: ioapic: Clear on-stack resource before using it
  ntb: transport shouldn't disable link due to bogus values in SPADs
  ntb: ntb_test: ensure the link is up before trying to configure the mws
  ntb: no sleep in ntb_async_tx_submit
  NTB: ntb_test: fix bug printing ntb_perf results
  ntb_transport: fix bug calculating num_qps_mw
  ntb_transport: fix qp count bug
  Clarify (and fix) MAX_LFS_FILESIZE macros
  staging: rtl8188eu: add RNX-N150NUB support
  iio: hid-sensor-trigger: Fix the race with user space powering up sensors
  iio: imu: adis16480: Fix acceleration scale factor for adis16480
  ANDROID: binder: fix proc->tsk check.
  binder: Use wake up hint for synchronous transactions.
  binder: use group leader instead of open thread
  Revert "android: binder: Sanity check at binder ioctl"
  Bluetooth: bnep: fix possible might sleep error in bnep_session
  Bluetooth: cmtp: fix possible might sleep error in cmtp_session
  Bluetooth: hidp: fix possible might sleep error in hidp_session_thread
  netfilter: nat: fix src map lookup
  Revert "leds: handle suspend/resume in heartbeat trigger"
  net: sunrpc: svcsock: fix NULL-pointer exception
  x86/mm: Fix use-after-free of ldt_struct
  timers: Fix excessive granularity of new timers after a nohz idle
  perf/x86/intel/rapl: Make package handling more robust
  perf probe: Fix --funcs to show correct symbols for offline module
  perf/core: Fix group {cpu,task} validation
  ftrace: Check for null ret_stack on profile function graph entry function
  nfsd: Limit end of page list when decoding NFSv4 WRITE
  cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()
  cifs: Fix df output for users with quota limits
  kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured
  tracing: Fix freeing of filter in create_filter() when set_str is false
  tracing: Fix kmemleak in tracing_map_array_free()
  tracing: Call clear_boot_tracer() at lateinit_sync
  drm: rcar-du: Fix H/V sync signal polarity configuration
  drm: rcar-du: Fix display timing controller parameter
  drm: rcar-du: Fix crash in encoder failure error path
  drm/atomic: If the atomic check fails, return its value first
  drm: Release driver tracking before making the object available again
  mm/memblock.c: reversed logic in memblock_discard()
  fork: fix incorrect fput of ->exe_file causing use-after-free
  mm/madvise.c: fix freeing of locked page with MADV_FREE
  i2c: designware: Fix system suspend
  mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled
  ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
  ALSA: firewire: fix NULL pointer dereference when releasing uninitialized data of iso-resource
  ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)
  ALSA: core: Fix unexpected error at replacing user TLV
  ALSA: usb-audio: Add delay quirk for H650e/Jabra 550a USB headsets
  KVM: x86: block guest protection keys unless the host has them enabled
  KVM: s390: sthyi: fix specification exception detection
  KVM: s390: sthyi: fix sthyi inline assembly
  Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad
  Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310
  Input: trackpoint - add new trackpoint firmware ID
  bpf/verifier: fix min/max handling in BPF_SUB
  bpf: fix mixed signed/unsigned derived min/max value bounds
  bpf, verifier: fix alu ops against map_value{, _adj} register types
  bpf: adjust verifier heuristics
  bpf, verifier: add additional patterns to evaluate_reg_imm_alu
  net_sched: fix order of queue length updates in qdisc_replace()
  net: sched: fix NULL pointer dereference when action calls some targets
  irda: do not leak initialized list.dev to userspace
  net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
  tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
  ipv6: repair fib6 tree in failure case
  ipv6: reset fn->rr_ptr when replacing route
  tipc: fix use-after-free
  sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
  nfp: fix infinite loop on umapping cleanup
  ipv4: better IP_MAX_MTU enforcement
  ptr_ring: use kmalloc_array()
  openvswitch: fix skb_panic due to the incorrect actions attrlen
  bpf: fix bpf_trace_printk on 32 bit archs
  net_sched: remove warning from qdisc_hash_add
  net_sched/sfq: update hierarchical backlog when drop packet
  ipv4: fix NULL dereference in free_fib_info_rcu()
  dccp: defer ccid_hc_tx_delete() at dismantle time
  dccp: purge write queue in dccp_destroy_sock()
  af_key: do not use GFP_KERNEL in atomic contexts
  sparc64: remove unnecessary log message
  ANDROID: NFC: st21nfca: Fix memory OOB and leak issues in connectivity events handler
  Linux 4.9.45
  usb: qmi_wwan: add D-Link DWM-222 device ID
  usb: optimize acpi companion search for usb port devices
  pids: make task_tgid_nr_ns() safe
  Sanitize 'move_pages()' permission checks
  genirq/ipi: Fixup checks against nr_cpu_ids
  genirq: Restore trigger settings in irq_modify_status()
  irqchip/atmel-aic: Fix unbalanced refcount in aic_common_rtc_irq_fixup()
  irqchip/atmel-aic: Fix unbalanced of_node_put() in aic_common_irq_fixup()
  x86/asm/64: Clear AC on NMI entries
  xen-blkfront: use a right index when checking requests
  powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
  blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL
  xen: fix bio vec merging
  mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes
  mm/mempolicy: fix use after free when calling get_mempolicy
  mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS
  mm: discard memblock data later
  ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices
  ALSA: usb-audio: Apply sample rate quirk to Sennheiser headset
  ALSA: seq: 2nd attempt at fixing race creating a queue
  Input: elan_i2c - Add antoher Lenovo ACPI ID for upcoming Lenovo NB
  Input: elan_i2c - add ELAN0608 to the ACPI table
  crypto: x86/sha1 - Fix reads beyond the number of blocks passed
  crypto: ixp4xx - Fix error handling path in 'aead_perform()'
  parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo
  audit: Fix use after free in audit_remove_watch_rule()
  netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister
  ANDROID: check dir value of xfrm_userpolicy_id
  ANDROID: NFC: Fix possible memory corruption when handling SHDLC I-Frame commands
  ANDROID: nfc: fdp: Fix possible buffer overflow in WCS4000 NFC driver
  ANDROID: NFC: st21nfca: Fix out of bounds kernel access when handling ATR_REQ
  ANDROID: usb: gadget: assign no-op request complete callbacks
  ANDROID: usb: gadget: configfs: fix null ptr in android_disconnect
  ANDROID: uid_sys_stats: Fix implicit declaration of get_cmdline()
  uid_sys_stats: log task io with a debug flag
  Linux 4.9.44
  MIPS: DEC: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression
  pinctrl: meson-gxbb: Add missing GPIODV_18 pin entry
  pinctrl: samsung: Remove bogus irq_[un]mask from resource management
  pinctrl: uniphier: fix WARN_ON() of pingroups dump on LD20
  pinctrl: uniphier: fix WARN_ON() of pingroups dump on LD11
  pinctrl: intel: merrifield: Correct UART pin lists
  pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
  pnfs/blocklayout: require 64-bit sector_t
  iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits
  usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume
  usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter
  usb: core: unlink urbs from the tail of the endpoint's urb_list
  USB: Check for dropped connection before switching to full speed
  usb: renesas_usbhs: Fix UGCTRL2 value for R-Car Gen3
  usb: gadget: udc: renesas_usb3: Fix usb_gadget_giveback_request() calling
  uas: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069
  staging: comedi: comedi_fops: do not call blocking ops when !TASK_RUNNING
  iio: light: tsl2563: use correct event code
  iio: accel: bmc150: Always restore device to normal mode after suspend-resume
  staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read
  USB: hcd: Mark secondary HCD as dead if the primary one died
  usb: musb: fix tx fifo flush handling again
  USB: serial: pl2303: add new ATEN device id
  USB: serial: cp210x: add support for Qivicon USB ZigBee dongle
  USB: serial: option: add D-Link DWM-222 device ID
  drm/i915: Fix out-of-bounds array access in bdw_load_gamma_lut
  drm/etnaviv: Fix off-by-one error in reloc checking
  nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays
  mmc: mmc: correct the logic for setting HS400ES signal voltage
  nand: fix wrong default oob layout for small pages using soft ecc
  fuse: initialize the flock flag in fuse_file on allocation
  target: Fix node_acl demo-mode + uncached dynamic shutdown regression
  iscsi-target: Fix iscsi_np reset hung task during parallel delete
  iscsi-target: fix memory leak in iscsit_setup_text_cmd()
  mtd: nand: Fix timing setup for NANDs that do not support SET FEATURES
  xtensa: don't limit csum_partial export by CONFIG_NET
  xtensa: mm/cache: add missing EXPORT_SYMBOLs
  xtensa: fix cache aliasing handling code for WT cache
  futex: Remove unnecessary warning from get_futex_key
  mm: fix list corruptions on shmem shrinklist
  mm: ratelimit PFNs busy info message
  ANDROID: Use sk_uid to replace uid get from socket file
  Linux 4.9.43
  Revert "ARM: dts: sun8i: Support DTB build for NanoPi M1"
  KVM: arm/arm64: Handle hva aging while destroying the vm
  sparc64: Prevent perf from running during super critical sections
  udp: consistently apply ufo or fragmentation
  revert "ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output"
  revert "net: account for current skb length when deciding about UFO"
  packet: fix tp_reserve race in packet_set_ring
  igmp: Fix regression caused by igmp sysctl namespace code.
  net: avoid skb_warn_bad_offload false positives on UFO
  tcp: fastopen: tcp_connect() must refresh the route
  net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
  net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
  bpf, s390: fix jit branch offset related to ldimm64
  net: fix keepalive code vs TCP_FASTOPEN_CONNECT
  tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states
  ppp: fix xmit recursion detection on ppp channels
  ppp: Fix false xmit recursion detect with two ppp devices
  Linux 4.9.42
  workqueue: implicit ordered attribute should be overridable
  net: phy: Fix PHY unbind crash
  net: account for current skb length when deciding about UFO
  ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
  net/mlx5: E-Switch, Re-enable RoCE on mode change only after FDB destroy
  mm: don't dereference struct page fields of invalid pages
  signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
  lib/Kconfig.debug: fix frv build failure
  mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER
  ARM: 8632/1: ftrace: fix syscall name matching
  virtio_blk: fix panic in initialization error path
  nbd: blk_mq_init_queue returns an error code on failure, not NULL
  iw_cxgb4: do not send RX_DATA_ACK CPLs after close/abort
  ARM: dts: sunxi: Change node name for pwrseq pin on Olinuxino-lime2-emmc
  ARM: dts: sun8i: Support DTB build for NanoPi M1
  drm/virtio: fix framebuffer sparse warning
  scsi: qla2xxx: Get mutex lock before checking optrom_state
  clk/samsung: exynos542x: mark some clocks as critical
  ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int
  phy state machine: failsafe leave invalid RUNNING state
  netfilter: use fwmark_reflect in nf_send_reset
  ASoC: rt5645: set sel_i2s_pre_div1 to 2
  spi: spi-axi: Free resources on error path
  x86/boot: Add missing declaration of string functions
  tg3: Fix race condition in tg3_get_stats64().
  net: phy: dp83867: fix irq generation
  sh_eth: R8A7740 supports packet shecksumming
  sh_eth: fix EESIPR values for SH77{34|63}
  wext: handle NULL extra data in iwe_stream_add_point better
  sparc64: Fix exception handling in UltraSPARC-III memcpy.
  sparc64: Measure receiver forward progress to avoid send mondo timeout
  xen-netback: correctly schedule rate-limited queues
  net: phy: Correctly process PHY_HALTED in phy_stop_machine()
  net/mlx5e: Schedule overflow check work to mlx5e workqueue
  net/mlx5e: Fix wrong delay calculation for overflow check scheduling
  net/mlx5e: Fix outer_header_zero() check size
  net/mlx5: Fix command bad flow on command entry allocation failure
  net/mlx5: Consider tx_enabled in all modes on remap
  sctp: fix the check for _sctp_walk_params and _sctp_walk_errors
  sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()
  dccp: fix a memleak for dccp_feat_init err process
  dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly
  dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly
  net: ethernet: nb8800: Handle all 4 RGMII modes identically
  ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()
  packet: fix use-after-free in prb_retire_rx_blk_timer_expired()
  openvswitch: fix potential out of bound access in parse_ct
  mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled
  rtnetlink: allocate more memory for dev_set_mac_address()
  ipv4: initialize fib_trie prior to register_netdev_notifier call.
  net: dsa: b53: Add missing ARL entries for BCM53125
  ipv6: avoid overflow of offset in ip6_find_1stfragopt
  net: Zero terminate ifr_name in dev_ifname().
  ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()
  tcp_bbr: init pacing rate on first RTT sample
  tcp_bbr: remove sk_pacing_rate=0 transient during init
  tcp_bbr: introduce bbr_init_pacing_rate_from_rtt() helper
  tcp_bbr: introduce bbr_bw_to_pacing_rate() helper
  tcp_bbr: cut pacing rate only if filled pipe
  saa7164: fix double fetch PCIe access condition
  Btrfs: fix early ENOSPC due to delalloc
  f2fs: sanity check checkpoint segno and blkoff
  media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds
  mmc: core: Use device_property_read instead of of_property_read
  mmc: dw_mmc: Use device_property_read instead of of_property_read
  iscsi-target: Fix initial login PDU asynchronous socket close OOPs
  media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl
  ARM: dts: tango4: Request RGMII RX and TX clock delays
  ARM: dts: armada-38x: Fix irq type for pca955
  ext4: fix overflow caused by missing cast in ext4_resize_fs()
  ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
  gpiolib: skip unwanted events, don't convert them to opposite edge
  iommu/amd: Enable ga_log_intr when enabling guest_mode
  powerpc/64: Fix __check_irq_replay missing decrementer interrupt
  powerpc/tm: Fix saving of TM SPRs in core dump
  timers: Fix overflow in get_next_timer_interrupt
  mm/page_alloc: Remove kernel address exposure in free_reserved_area()
  KVM: async_pf: make rcu irq exit if not triggered from idle task
  ASoC: do not close shared backend dailink
  drm/amdgpu: Fix undue fallthroughs in golden registers initialization
  ALSA: hda - Fix speaker output from VAIO VPCL14M1R
  cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
  mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
  mmc: core: Fix access to HS400-ES devices
  device property: Make dev_fwnode() public
  mmc: sdhci-of-at91: force card detect value for non removable devices
  NFSv4: Fix EXCHANGE_ID corrupt verifier issue
  brcmfmac: fix memleak due to calling brcmf_sdiod_sgtable_alloc() twice
  iwlwifi: dvm: prevent an out of bounds access
  workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
  libata: array underflow in ata_find_dev()
  cgroup: fix error return value from cgroup_subtree_control()
  cgroup: create dfl_root files on subsys registration
  parisc: Handle vma's whose context is not current in flush_cache_range
  ANDROID: binder: don't queue async transactions to thread.
  ANDROID: binder: don't enqueue death notifications to thread todo.
  ANDROID: binder: call poll_wait() unconditionally.
  ANDROID: keychord: Fix for a memory leak in keychord.
  ANDROID: keychord: Fix races in keychord_write.
  android: configs: move quota-related configs to recommended
  ANDROID: sdcardfs: override credential for ioctl to lower fs
  ANDROID: xt_qtaguid: handle properly request sockets

Conflicts:
	drivers/staging/android/fiq_debugger/fiq_debugger.c
	include/linux/sched.h
	kernel/locking/spinlock_debug.c
	sound/soc/soc-pcm.c

Change-Id: I163a8c98f1737eeb01b9c8a0636a91d552ef349f
Signed-off-by: Kyle Yan <kyan@codeaurora.org>
2017-09-07 14:32:09 -07:00
Nicholas Piggin
70b3fd5ce2 timers: Fix excessive granularity of new timers after a nohz idle
commit 2fe59f507a65dbd734b990a11ebc7488f6f87a24 upstream.

When a timer base is idle, it is forwarded when a new timer is added
to ensure that granularity does not become excessive. When not idle,
the timer tick is expected to increment the base.

However there are several problems:

- If an existing timer is modified, the base is forwarded only after
  the index is calculated.

- The base is not forwarded by add_timer_on.

- There is a window after a timer is restarted from a nohz idle, after
  it is marked not-idle and before the timer tick on this CPU, where a
  timer may be added but the ancient base does not get forwarded.

These result in excessive granularity (a 1 jiffy timeout can blow out
to 100s of jiffies), which causes the RCU lockup detector to trigger,
among other things.

Fix this by keeping track of whether the timer base has been idle
since it was last run or forwarded, and if so then forward it before
adding a new timer.

There is still a case where mod_timer optimises the case of a pending
timer mod with the same expiry time, where the timer can see excessive
granularity relative to the new, shorter interval. A comment is added,
but it's not changed because it is an important fastpath for
networking.

This has been tested and found to fix the RCU softlockup messages.

Testing was also done with tracing to measure requested versus
achieved wakeup latencies for all non-deferrable timers in an idle
system (with no lockup watchdogs running). Wakeup latency relative to
absolute latency is calculated (note this suffers from round-up skew
at low absolute times) and analysed:

             max     avg      std
upstream   506.0    1.20     4.68
patched      2.0    1.08     0.15

The bug was noticed due to the lockup detector Kconfig changes
dropping it out of people's .configs and resulting in larger base
clk skew. When the lockup detectors are enabled, no CPU can go idle for
longer than 4 seconds, which limits the granularity errors.
Sub-optimal timer behaviour is observable on a smaller scale in that
case:

             max     avg      std
upstream     9.0    1.05     0.19
patched      2.0    1.04     0.11

Fixes: a683f390b9 ("timers: Forward the wheel clock whenever possible")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: David Miller <davem@davemloft.net>
Cc: dzickus@redhat.com
Cc: sfr@canb.auug.org.au
Cc: mpe@ellerman.id.au
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: linuxarm@huawei.com
Cc: abdhalee@linux.vnet.ibm.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: akpm@linux-foundation.org
Cc: paulmck@linux.vnet.ibm.com
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-30 10:21:51 +02:00
Channagoud Kadabi
ce49c27f13 kernel: time: Fix accuracy for low resolution timer
The timer wheel calculates the index for any timer from its expiry
value and the granularity of its level. Because of the level granularity,
a timer does not fire at the exact requested time; it expires at
expires + granularity instead. This happens in the timer code when the
index for each timer is calculated from the expiry and the granularity at
each level:

 expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);

For devfreq drivers the requirement is to fire the timer at the exact
time. If the timer does not expire at the exact time, the driver takes
much longer to react and increase the device frequency. The devfreq driver
registers a timer with a 10 ms expiry, but due to the slack in the timer
code the expiry happens at 20 ms. For example, the frame rendering time is
16 ms. If the devfreq driver reacts after 20 ms instead of 10 ms, that is
well past one frame rendering time.

Timers with 10 ms to 630 ms expiry fall under level 0. To overcome the
granularity issue for level 0 timers with low expiry values, do not add
the granularity; a new function, calc_index_min_granularity, is introduced
for this.
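
A userspace sketch of the difference, assuming the level constants of the 4.9 timer wheel (3 bits of clock shift per level, 64 buckets per level). The body of calc_index_min_granularity() shown here is an assumption about what "do not add the granularity" means for level 0; the actual patch may differ.

```c
#include <assert.h>

/* Wheel geometry, assumed to match kernel/time/timer.c in 4.9. */
#define LVL_CLK_SHIFT 3
#define LVL_SHIFT(n)  ((n) * LVL_CLK_SHIFT)
#define LVL_GRAN(n)   (1UL << LVL_SHIFT(n))
#define LVL_BITS      6
#define LVL_SIZE      (1UL << LVL_BITS)
#define LVL_MASK      (LVL_SIZE - 1)
#define LVL_OFFS(n)   ((n) * LVL_SIZE)
#define LVL_DEPTH     8

/* Standard index: round the expiry up by one granule of the level. */
static unsigned calc_index(unsigned long expires, unsigned lvl)
{
	assert(lvl < LVL_DEPTH);
	expires = (expires + LVL_GRAN(lvl)) >> LVL_SHIFT(lvl);
	return (unsigned)(LVL_OFFS(lvl) + (expires & LVL_MASK));
}

/* Hypothetical level 0 variant: use the expiry as-is, no rounding up. */
static unsigned calc_index_min_granularity(unsigned long expires)
{
	return (unsigned)(LVL_OFFS(0) + (expires & LVL_MASK));
}
```

With HZ=100, a 10 ms timer armed at jiffy 0 has expires = 1. The standard calculation places it in bucket 2 (20 ms), while the variant without the added granularity places it in bucket 1 (10 ms).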

With the above approach, if the timer interrupt on the CPU is delayed for
a long time, there is a chance of missing the timer if the base clk is
forwarded to jiffies. To account for this corner case, modify
the next_pending_bucket function to choose the least of the expiries.
next_pending_bucket starts at the index based on the base clk value and
increments until the end of the bucket.

 ------------------------------
|   |   |            |   |     |
| 0 | 1 | -----------|   |  63 |
|   |   |            |   |     |
 ------------------------------
             ^
[start]      | [pos]       [end]

In the diagram above, pos is the position based on the current base clk
value. The current code looks for the first pending timer from pos to end.
But there is a chance that a pending timer between start and pos has an
earlier expiry. Modify the implementation to pick the least of the
expiries.
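
The two search strategies can be contrasted with a small userspace model. It follows the description in this message rather than the kernel source: the pending bitmap, the per-bucket expiry array, and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WHEEL_SIZE 64u
#define NO_TIMER   (-1)

/* Behaviour as described above: scan only from pos to the end of the level. */
static int first_pending_from_pos(uint64_t pending, unsigned pos)
{
	assert(pos < WHEEL_SIZE);
	for (unsigned idx = pos; idx < WHEEL_SIZE; idx++)
		if (pending & (UINT64_C(1) << idx))
			return (int)idx;
	return NO_TIMER;
}

/* Proposed behaviour: visit every bucket and keep the least expiry. */
static int earliest_pending_bucket(uint64_t pending,
				   const unsigned long expiry[WHEEL_SIZE],
				   unsigned pos)
{
	int best = NO_TIMER;

	assert(pos < WHEEL_SIZE);
	for (unsigned step = 0; step < WHEEL_SIZE; step++) {
		unsigned idx = (pos + step) % WHEEL_SIZE;

		if (!(pending & (UINT64_C(1) << idx)))
			continue;
		/* Wrap-safe comparison of the two expiry values. */
		if (best == NO_TIMER || (long)(expiry[idx] - expiry[best]) < 0)
			best = (int)idx;
	}
	return best;
}

/* Example: pos = 20, timers pending in bucket 5 (earlier) and bucket 40. */
static int demo_bucket(int use_least_expiry)
{
	unsigned long expiry[WHEEL_SIZE] = { 0 };
	uint64_t pending = 0;

	expiry[5] = 1005;
	pending |= UINT64_C(1) << 5;
	expiry[40] = 1040;
	pending |= UINT64_C(1) << 40;

	return use_least_expiry ? earliest_pending_bucket(pending, expiry, 20)
				: first_pending_from_pos(pending, 20);
}
```

In the example the forward-only scan reports bucket 40, while the least-expiry scan reports bucket 5, which sits between start and pos and holds the earlier timer.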

Change-Id: I60f6f4394de4b5f409829de9734645e1a0d7659e
Signed-off-by: Channagoud Kadabi <ckadabi@codeaurora.org>
2017-08-29 13:57:12 -07:00
Matija Glavinic Pecotic
9ef8b23b94 timers: Fix overflow in get_next_timer_interrupt
commit 34f41c0316ed52b0b44542491d89278efdaa70e4 upstream.

With, for example, HZ=100, a timer 430 jiffies in the future, and a 32-bit
unsigned int, the right-hand side of the expression overflows the unsigned
int, which results in wrong values being returned.

Type cast the multiplier to 64bit to avoid that issue.
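
The arithmetic can be checked with a short userspace sketch. It assumes a 32-bit unsigned int and a tick length of 10,000,000 ns (HZ=100); the function names are illustrative and do not appear in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define HZ        100u
#define TICK_NSEC (1000000000u / HZ)   /* 10,000,000 ns per jiffy */

/* Multiplication carried out in 32 bits: the product wraps modulo 2^32. */
static uint64_t delta_ns_32bit(uint32_t delta_jiffies)
{
	assert(sizeof(unsigned int) == 4);
	return delta_jiffies * TICK_NSEC;
}

/* Widening the multiplier first keeps the full 64-bit product. */
static uint64_t delta_ns_64bit(uint32_t delta_jiffies)
{
	return (uint64_t)delta_jiffies * TICK_NSEC;
}
```

429 jiffies still fit (4,290,000,000 ns is below 2^32 = 4,294,967,296), but 430 jiffies give 4,300,000,000 ns, which wraps to 5,032,704 ns in the 32-bit version.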

Fixes: 46c8f0b077 ("timers: Fix get_next_timer_interrupt() computation")
Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Cc: khilman@baylibre.com
Cc: akpm@linux-foundation.org
Link: http://lkml.kernel.org/r/a7900f04-2a21-c9fd-67be-ab334d459ee5@nokia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11 08:49:30 -07:00
Prasad Sodagudi
602c4e27af sched: Add a check for cpu unbound deferrable timers
Add a check for the expiry of CPU-unbound deferrable timers and raise
the softirq to handle the expired timers, so that the CPU can
process the CPU-unbound deferrable timers as early as possible
when a CPU tries to enter or exit the idle loop.

Change-Id: Ieffa74fa22a4d25493f5590b5ac1e0d784fcbbad
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2017-07-17 19:23:35 -07:00
Kees Cook
a967be8dec time: Remove CONFIG_TIMER_STATS
Currently CONFIG_TIMER_STATS exposes process information across namespaces:

kernel/time/timer_list.c print_timer():

        SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);

/proc/timer_list:

 #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570

Given that the tracer can give the same information, this patch entirely
removes CONFIG_TIMER_STATS.

Change-Id: I66e06ae2d6e32c309824310d3d9bf54d1047eab1
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Xing Gao <xgao01@email.wm.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jessica Frazelle <me@jessfraz.com>
Cc: kernel-hardening@lists.openwall.com
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Git-commit: dfb4357da6ddbdf57d583ba64361c9d792b0e0b1
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[ohaugan@codeaurora.org: Fixed merge conflicts]
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2017-05-11 13:26:42 -07:00
Vikram Mulukutla
78a643ea96 timer: Update code that migrates timers and hrtimers during isolation
__migrate_timers() can be called from both hotplug and isolation
contexts. When called from the isolation context, we might sometimes
encounter running timers. This is OK since the currently running or
just expired timers are off of the timer wheel and so everything else
can be migrated off.

In the case of hrtimers, we must wait until all callbacks have finished.
However, a udelay is important while waiting to allow for store-exclusive
fairness when run_hrtimer is attempting to grab the hrtimer base lock.

Change-Id: I4dccc66e09819a44b2f9597408a6a3ac4e11f5d7
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2017-04-24 11:38:30 -07:00
Kyle Yan
df06519aa2 timer: Fix incorrect parenthesis in timer
The timer code was missing parentheses in one of the checks that
differentiates between the global deferrable timer and the per-CPU
deferrable timer.

Change-Id: I6894da3c3ca8ba01fe267d35aa22f2ec4303cd88
Signed-off-by: Kyle Yan <kyan@codeaurora.org>
2017-03-13 17:11:21 -07:00
Kyle Yan
e980f1e966 timer: Initialize global deferrable timer
Initialize timer_base_deferrable variables properly along with
the initialization of the per cpu timers.

Change-Id: I14599cb6ab2fcc657edc7489ee1a55535183e3db
Signed-off-by: Kyle Yan <kyan@codeaurora.org>
2017-03-07 15:04:20 -08:00
Kyle Yan
c1f109ce0a timer: Add a global deferrable timer
Add a global deferrable timer in addition to the per-cpu deferrable timer
to allow deferrable timers without the TIMER_PINNED flag to run
on any active CPU.

Change-Id: I8e6b77cef972589912ad18f324c46c936fbbb96f
Signed-off-by: Kyle Yan <kyan@codeaurora.org>
2017-03-02 13:43:20 -08:00
Olav Haugan
0f3f78edd7 timer: Do not require CPUSETS to be enabled for migration
Do not require CPUSETS to be enabled to allow migration of timers and
hrtimers.

Change-Id: Ib911a0d34c250c4df020bdb265b92d2b8df8db93
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-4.9]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2017-02-07 14:50:20 -08:00
Santosh Shukla
e92935e2b4 timer: Add function to migrate timers
Add a function to migrate timers that will be used by a later patch set.

Change-Id: I370e404001344e635a663822b07557abbe0f6f52
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Updated commit text and fixed trivial merge conflict]
Git-commit: 3633b88d8fcb4273807574c27c328b6908a741e5
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-4.9]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2017-02-07 14:50:19 -08:00
Viresh Kumar
9536efe77a timer: create timer_quiesce_cpu() to isolate CPU from timers
To isolate CPUs from timers through sysfs using cpusets, we need some
support from the timer core, i.e. a routine timer_quiesce_cpu() that
migrates away all the unpinned timers but does not touch the pinned ones.

This patch creates this routine.

Change-Id: I8624e0659b86b7b8fa425a3fafdb0784fe005124
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[forward port to 3.18]
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4. Fixes for compilation error]
Git-commit: 313910b70ea0c73f8789d9189c11e1f339080646
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-4.9 and rebase to change
					patch dependency order.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2017-02-07 14:50:13 -08:00
Thomas Gleixner
6bad6bccf2 timers: Prevent base clock corruption when forwarding
When a timer is enqueued we try to forward the timer base clock. This
mechanism has two issues:

1) Forwarding a remote base unlocked

The forwarding function is called from get_target_base() with the current
timer base lock held. But if the new target base is a different base than
the current base (can happen with NOHZ, sigh!) then the forwarding is done
on an unlocked base. This can lead to corruption of base->clk.

Solution is simple: Invoke the forwarding after the target base is locked.

2) Possible corruption due to jiffies advancing

This is similar to the issue in get_next_timer_interrupt() which was fixed
in the previous patch. jiffies can advance between check and assignment,
thereby advancing base->clk beyond the next expiry value.

So we need to read jiffies into a local variable once and do the checks and
assignment with the local copy.

Fixes: a683f390b93f("timers: Forward the wheel clock whenever possible")
Reported-by: Ashton Holmes <scoopta@gmail.com>
Reported-by: Michael Thayer <michael.thayer@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Necasek <michal.necasek@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: knut.osmundsen@oracle.com
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161022110552.253640125@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-25 16:32:50 +02:00
Thomas Gleixner
041ad7bc75 timers: Prevent base clock rewind when forwarding clock
Ashton and Michael reported that kernel versions 4.8 and later suffer from
USB timeouts which are caused by the timer wheel rework.

This is caused by a bug in the base clock forwarding mechanism, which leads
to timers expiring early. The scenario which leads to this is:

run_timers()
  while (jiffies >= base->clk) {
    collect_expired_timers();
    base->clk++;
    expire_timers();
  }

So base->clk = jiffies + 1. Now the cpu goes idle:

idle()
  get_next_timer_interrupt()
    nextevt = __next_timer_interrupt();
    if (time_after(nextevt, base->clk))
        base->clk = jiffies;

jiffies has not advanced since run_timers(), so this assignment effectively
decrements base->clk by one.

base->clk is the index into the timer wheel arrays. So let's assume the
following state after the base->clk increment in run_timers():

 jiffies = 0
 base->clk = 1

A timer gets enqueued with an expiry delta of 63 ticks (which is the case
with the USB timeout and HZ=250) so the resulting bucket index is:

  base->clk + delta = 1 + 63 = 64

The timer goes into the first wheel level. The array size is 64 so it ends
up in bucket 0, which is correct as it takes 63 ticks to advance base->clk
to index into bucket 0 again.

If the cpu goes idle before jiffies advance, then the bug in the forwarding
mechanism sets base->clk back to 0, so the next invocation of run_timers()
at the next tick will index into bucket 0 and therefore expire the timer 62
ticks too early.
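
The bucket arithmetic in this example can be checked with a small sketch. The helper names and the SIM_LVL0_SIZE constant are hypothetical; only the array size of 64 and the values 1, 63 and 0 come from the text above:

```c
#define SIM_LVL0_SIZE 64UL	/* level 0 array size from the text */

/* Bucket a level-0 timer lands in when enqueued at wheel clock clk. */
static unsigned long sim_bucket(unsigned long clk, unsigned long delta)
{
	return (clk + delta) % SIM_LVL0_SIZE;
}

/* Number of base->clk increments until clk indexes bucket idx. */
static unsigned long sim_steps_to_bucket(unsigned long clk, unsigned long idx)
{
	/* Unsigned wrap-around is harmless because 64 divides 2^N. */
	return (idx - clk) % SIM_LVL0_SIZE;
}
```

With base->clk = 1, sim_bucket(1, 63) is 0 and that bucket is 63 increments away. Once base->clk has been rewound to 0, the same bucket is 0 increments away, so the next run_timers() indexes it immediately.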

Instead of blindly setting base->clk to jiffies we must make the forwarding
conditional on jiffies > base->clk, but we cannot use jiffies for this as
we might run into the following issue:

  if (time_after(jiffies, base->clk)) {
    if (time_after(nextevt, base->clk))
       base->clk = jiffies;

jiffies can increment between the check and the assignment far enough to
advance beyond nextevt. So we need to use a stable value for checking.

get_next_timer_interrupt() has the basej argument which is the jiffies
value snapshot taken in the calling code. So we can just use that.
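
A hedged sketch of forwarding that is conditional on a stable snapshot. sim_time_after() and sim_forward_clk() are hypothetical names; basej and nextevt play the roles described above. It illustrates the rule and is not the patch itself:

```c
#include <stdbool.h>

/* Wrap-safe "a is later than b", the same idea as time_after(). */
static bool sim_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Returns the new wheel clock; it never moves backwards. */
static unsigned long sim_forward_clk(unsigned long clk, unsigned long basej,
				     unsigned long nextevt)
{
	if (!sim_time_after(basej, clk))
		return clk;		/* forwarding would rewind: keep clk */
	if (sim_time_after(nextevt, basej))
		return basej;		/* next event lies in the future */
	if (sim_time_after(nextevt, clk))
		return nextevt;		/* do not step past a due event */
	return clk;
}
```

For the scenario above (clk = 1, snapshot 0) the function returns 1, so the wheel clock is no longer decremented.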

Thanks to Ashton for bisecting and providing trace data!

Fixes: a683f390b9 ("timers: Forward the wheel clock whenever possible")
Reported-by: Ashton Holmes <scoopta@gmail.com>
Reported-by: Michael Thayer <michael.thayer@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Necasek <michal.necasek@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: knut.osmundsen@oracle.com
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161022110552.175308322@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-25 16:32:50 +02:00
Thomas Gleixner
4da9152a43 timers: Lock base for same bucket optimization
Linus stumbled over the unlocked modification of the timer expiry value in
mod_timer() which is an optimization for timers which stay in the same
bucket - due to the bucket granularity - despite their expiry time getting
updated.

The optimization itself still makes sense even if we take the lock, because
in case that the bucket stays the same, we avoid the pointless
dequeue/enqueue dance.

Make the check and the modification of timer->expires protected by the base
lock and shuffle the remaining code around so we can keep the lock held
when we actually have to requeue the timer to a different bucket.

Fixes: f00c0afdfa ("timers: Implement optimization for same expiry time in mod_timer()")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-10-25 16:27:39 +02:00
Thomas Gleixner
b831275a35 timers: Plug locking race vs. timer migration
Linus noticed that lock_timer_base() lacks a READ_ONCE() for accessing the
timer flags. As a consequence the compiler is allowed to reload the flags
between the initial check for TIMER_MIGRATION and the following timer base
computation and the spin lock of the base.

While this has not been observed (yet), we need to make sure that it never
happens.
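
A small sketch of why a single forced read matters. The macro and the flag layout below (SIM_READ_ONCE_UINT, SIM_CPUMASK, SIM_MIGRATING) are hypothetical illustrations, not the kernel's definitions. The point is that the migration check and the base computation both use one snapshot of the flags:

```c
/* Force exactly one load of an unsigned int (illustrative stand-in). */
#define SIM_READ_ONCE_UINT(x) (*(const volatile unsigned int *)&(x))

/* Hypothetical flag layout: low bits hold the CPU, one bit marks migration. */
#define SIM_CPUMASK	0x0003FFFFu
#define SIM_MIGRATING	0x00040000u

/*
 * Returns the CPU index encoded in the flags, or -1 if the timer is
 * being migrated. A real caller would retry in the -1 case.
 */
static int sim_base_cpu(unsigned int *flags)
{
	unsigned int tf = SIM_READ_ONCE_UINT(*flags);	/* one snapshot */

	if (tf & SIM_MIGRATING)
		return -1;

	/* Computed from the same snapshot that passed the check. */
	return (int)(tf & SIM_CPUMASK);
}
```

Without the forced single load, a compiler may legally read the flags twice, so the value used for the base computation could differ from the value that was checked.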

Fixes: 0eeda71bc3 ("timer: Replace timer base by a cpu index")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-10-25 16:27:39 +02:00
Emese Revfy
0766f788eb latent_entropy: Mark functions with __latent_entropy
The __latent_entropy gcc attribute can be used only on functions and
variables.  If it is on a function then the plugin will instrument it for
gathering control-flow entropy. If the attribute is on a variable then
the plugin will initialize it with random contents.  The variable must
be an integer, an integer array type or a structure with integer fields.

These specific functions have been selected because they are init
functions (to help gather boot-time entropy), are called at unpredictable
times, or they have variable loops, each of which provides some level of
latent entropy.

Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: expanded commit message]
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-10-10 14:51:45 -07:00
Chris Metcalf
46c8f0b077 timers: Fix get_next_timer_interrupt() computation
The tick_nohz_stop_sched_tick() routine is not properly
canceling the sched timer when nothing is pending, because
get_next_timer_interrupt() is no longer returning KTIME_MAX in
that case.  This causes periodic interrupts when none are needed.

When determining the next interrupt time, we first use
__next_timer_interrupt() to get the first expiring timer in the
timer wheel.  If no timer is found, we return the base clock value
plus NEXT_TIMER_MAX_DELTA to indicate there is no timer in the
timer wheel.

Back in get_next_timer_interrupt(), we set the "expires" value
by converting the timer wheel expiry (in ticks) to a nsec value.
But we don't want to do this if the timer wheel expiry value
indicates no timer; we want to return KTIME_MAX.

Prior to commit 500462a9de ("timers: Switch to a non-cascading
wheel") we checked base->active_timers to see if any timers
were active, and if not, we didn't touch the expiry value and so
properly returned KTIME_MAX.  Now we don't have active_timers.

To fix this, we now just check the timer wheel expiry value to
see if it is "now + NEXT_TIMER_MAX_DELTA", and if it is, we don't
try to compute a new value based on it, but instead simply let the
KTIME_MAX value in expires remain.
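
The check described above can be sketched as follows. The constants and the function name are hypothetical stand-ins (SIM_TICK_NSEC assumes HZ=250), and the code only illustrates the "leave KTIME_MAX alone when the wheel is empty" rule:

```c
#include <stdint.h>

#define SIM_KTIME_MAX	INT64_MAX
#define SIM_MAX_DELTA	((1UL << 30) - 1)	/* "no timer" marker (assumed) */
#define SIM_TICK_NSEC	4000000LL		/* one tick at HZ=250 (assumed) */

/*
 * clk:     wheel clock the expiry was computed against
 * basej:   jiffies snapshot, basem: the same instant in nanoseconds
 * nextevt: wheel expiry in ticks, or clk + SIM_MAX_DELTA if the wheel is empty
 */
static int64_t sim_next_event_ns(unsigned long clk, unsigned long basej,
				 int64_t basem, unsigned long nextevt)
{
	int64_t expires = SIM_KTIME_MAX;

	if (nextevt == clk + SIM_MAX_DELTA)
		return expires;		/* empty wheel: keep KTIME_MAX */
	if ((long)(nextevt - basej) <= 0)
		return basem;		/* already due */
	return basem + (int64_t)(nextevt - basej) * SIM_TICK_NSEC;
}
```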

Fixes: 500462a9de "timers: Switch to a non-cascading wheel"
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1470688147-22287-1-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-08-09 09:31:55 +02:00
Richard Cochran
24f73b9971 timers/core: Convert to hotplug state machine
When tearing down, call timers_dead_cpu() before notify_dead().
There is a hidden dependency between:

 - timers
 - block multiqueue
 - rcutree

If timers_dead_cpu() comes later than blk_mq_queue_reinit_notify()
that latter function causes a RCU stall.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153337.566790058@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:41:42 +02:00
Ingo Molnar
4b4b20852d Merge branch 'timers/fast-wheel' into timers/core 2016-07-07 10:35:28 +02:00
Anna-Maria Gleixner
f00c0afdfa timers: Implement optimization for same expiry time in mod_timer()
The existing optimization for same expiry time in mod_timer() checks whether
the timer expiry time is the same as the new requested expiry time. In the old
timer wheel implementation this does not take the slack batching into account,
neither does the new implementation evaluate whether the new expiry time will
requeue the timer to the same bucket.

To optimize that, we can calculate the resulting bucket and check if the new
expiry time is different from the current expiry time. This calculation
happens outside the base lock held region. If the resulting bucket is the same
we can avoid taking the base lock and requeueing the timer.

If the timer needs to be requeued then we have to check under the base lock
whether the base time has changed between the lockless calculation and taking
the lock. If it has changed we need to recalculate under the lock.
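
A simplified sketch of the same-bucket test. The index calculation below is an assumption made for illustration: it uses 8 levels of 64 buckets with an 8x granularity step per level, and it omits the rounding and locking of the real code. All names are hypothetical:

```c
#define SIM_LVL_CLK_SHIFT	3			/* each level is 8x coarser */
#define SIM_LVL_BITS		6
#define SIM_LVL_SIZE		(1UL << SIM_LVL_BITS)	/* 64 buckets per level */
#define SIM_LVL_DEPTH		8

/* Simplified bucket index for a timer expiring at expires (in ticks). */
static unsigned int sim_wheel_index(unsigned long expires, unsigned long clk)
{
	unsigned long delta = expires - clk;
	unsigned int lvl = 0;

	/* Pick the first level whose range still covers delta. */
	while (lvl < SIM_LVL_DEPTH - 1 &&
	       delta >= (SIM_LVL_SIZE << (SIM_LVL_CLK_SHIFT * lvl)))
		lvl++;

	return lvl * SIM_LVL_SIZE +
	       ((expires >> (SIM_LVL_CLK_SHIFT * lvl)) & (SIM_LVL_SIZE - 1));
}

/* Nonzero if changing the expiry keeps the timer in its current bucket. */
static int sim_same_bucket(unsigned long old_expires,
			   unsigned long new_expires, unsigned long clk)
{
	return sim_wheel_index(old_expires, clk) ==
	       sim_wheel_index(new_expires, clk);
}
```

In this model a timer moved from tick 1000 to tick 1010 stays in its bucket and needs no requeue, while a move from tick 10 to tick 11 in level 0 always changes the bucket.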

This optimization takes effect for timers which are enqueued into the less
granular wheel levels (1 and above). With a simple test case the functionality
has been verified:

            Before        After
 Match:       5.5%        86.6%
 Requeue:    94.5%        13.4%
 Recalc:                  <0.01%

In the non optimized case the timer is requeued in 94.5% of the cases. With
the index optimization in place the requeue rate drops to 13.4%. The case
where the lockless index calculation has to be redone is less than 0.01%.

With a real world test case (networking) we observed the following changes:

            Before        After
 Match:      97.8%        99.7%
 Requeue:     2.2%         0.3%
 Recalc:                  <0.001%

That means two percent fewer lock/requeue/unlock operations done in one of
the hot path use cases of timers.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.778527749@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:35:12 +02:00
Anna-Maria Gleixner
ffdf047728 timers: Split out index calculation
For further optimizations we need to separate index calculation
from queueing. No functional change.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.691159619@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:35:12 +02:00
Thomas Gleixner
4e85876a9d timers: Only wake softirq if necessary
With the wheel forwarding in place and with the HZ=1000 4ms folding we can
avoid running the softirq at all.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.607650550@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:35:11 +02:00
Thomas Gleixner
a683f390b9 timers: Forward the wheel clock whenever possible
The wheel clock is stale when a CPU goes into a long idle sleep. This has the
side effect that timers which are queued end up in the outer wheel levels.
That results in coarser granularity.

To solve this, we keep track of the idle state and forward the wheel clock
whenever possible.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.512039360@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:35:11 +02:00
Anna-Maria Gleixner
236968383c timers: Optimize collect_expired_timers() for NOHZ
After a NOHZ idle sleep the timer wheel must be forwarded to current jiffies.
There might be expired timers so the current code loops and checks the expired
buckets for timers. This can take quite some time for long NOHZ idle periods.

The pending bitmask in the timer base allows us to do a quick search for the
next expiring timer and therefore a fast forward of the base time which
prevents pointless long lasting loops.
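
A sketch of the bitmap search for a single 64-bucket level. The function name is hypothetical and __builtin_ctzll is a GCC/Clang builtin, so this assumes one of those compilers. It shows how a pending bitmask replaces a bucket-by-bucket loop:

```c
#include <stdint.h>

/*
 * pending: bit n is set if bucket n of a 64-bucket level holds timers.
 * start:   bucket the wheel clock currently indexes.
 * Returns the distance in buckets to the next pending bucket, searching
 * forward with wrap-around, or -1 if nothing is pending.
 */
static int sim_next_pending(uint64_t pending, unsigned int start)
{
	uint64_t rot;

	if (!pending)
		return -1;

	start &= 63;
	/* Rotate so that bit 0 corresponds to bucket 'start'. */
	rot = start ? (pending >> start) | (pending << (64 - start)) : pending;

	return __builtin_ctzll(rot);	/* index of the lowest set bit */
}
```

The result tells the caller how far the base time can be advanced in one step instead of incrementing it tick by tick.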

For a 3 second idle sleep this reduces the catchup time from ~1ms to 5us.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.351296290@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:35:10 +02:00
Anna-Maria Gleixner
73420fea80 timers: Move __run_timers() function
Move __run_timers() below __next_timer_interrupt() and next_pending_bucket()
in preparation for __run_timers() NOHZ optimization.

No functional change.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.271872665@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:35:09 +02:00
Thomas Gleixner
53bf837b78 timers: Remove set_timer_slack() leftovers
We now have implicit batching in the timer wheel. The slack API is no longer
used, so remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrew F. Davis <afd@ti.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jaehoon Chung <jh80.chung@samsung.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Nyman <mathias.nyman@intel.com>
Cc: Pali Rohár <pali.rohar@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:35:09 +02:00
Thomas Gleixner
500462a9de timers: Switch to a non-cascading wheel
The current timer wheel has some drawbacks:

1) Cascading:

   Cascading can be an unbounded operation and is completely pointless in most
   cases because the vast majority of the timer wheel timers are canceled or
   rearmed before expiration. (They are used as timeout safeguards, not as
   real timers to measure time.)

2) No fast lookup of the next expiring timer:

   In NOHZ scenarios the first timer soft interrupt after a long NOHZ period
   must fast forward the base time to the current value of jiffies. As we
   have no way to find the next expiring timer fast, the code loops linearly
   and increments the base time one by one and checks for expired timers
   in each step. This causes unbounded overhead spikes exactly in the moment
   when we should wake up as fast as possible.

After a thorough analysis of real world data gathered on laptops,
workstations, webservers and other machines (thanks Chris!) I came to the
conclusion that the current 'classic' timer wheel implementation can be
modified to address the above issues.

The vast majority of timer wheel timers is canceled or rearmed before
expiry. Most of them are timeouts for networking and other I/O tasks. The
nature of timeouts is to catch the exception from normal operation (TCP ack
timed out, disk does not respond, etc.). For these kinds of timeouts the
accuracy of the timeout is not really a concern. Timeouts are very often
approximate worst-case values and in case the timeout fires, we already
waited for a long time and performance is down the drain already.

The few timers which actually expire can be split into two categories:

 1) Short expiry times which expect reasonably accurate expiry

 2) Long term expiry times are inaccurate today already due to the
    batching which is done for NOHZ automatically and also via the
    set_timer_slack() API.

So for long term expiry timers we can avoid the cascading property and just
leave them in the less granular outer wheels until expiry or
cancelation. Timers which are armed with a timeout larger than the wheel
capacity are no longer cascaded. We expire them with the longest possible
timeout (6+ days). We have not observed such timeouts in our data collection,
but at least we handle them, applying the rule of the least surprise.

To avoid extending the wheel levels for HZ=1000 so we can accommodate the
longest observed timeouts (5 days in the network conntrack code) we reduce the
first level granularity on HZ=1000 to 4ms, which effectively is the same as
the HZ=250 behaviour. From our data analysis there is nothing which relies on
that 1ms granularity and as a side effect we get better batching and timer
locality for the networking code as well.

Contrary to the classic wheel the granularity of the next wheel is not the
capacity of the first wheel. The granularities of the wheels are in the
currently chosen setting 8 times the granularity of the previous wheel.

So for HZ=250 we end up with the following granularity levels:

 Level Offset   Granularity                  Range
     0      0          4 ms                 0 ms -        252 ms
     1     64         32 ms               256 ms -       2044 ms (256ms - ~2s)
     2    128        256 ms              2048 ms -      16380 ms (~2s   - ~16s)
     3    192       2048 ms (~2s)       16384 ms -     131068 ms (~16s  - ~2m)
     4    256      16384 ms (~16s)     131072 ms -    1048572 ms (~2m   - ~17m)
     5    320     131072 ms (~2m)     1048576 ms -    8388604 ms (~17m  - ~2h)
     6    384    1048576 ms (~17m)    8388608 ms -   67108863 ms (~2h   - ~18h)
     7    448    8388608 ms (~2h)    67108864 ms -  536870911 ms (~18h  - ~6d)

That's a worst case inaccuracy of 12.5% for the timers which are queued at the
beginning of a level.
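
The Granularity and range-start columns of the table, and the 12.5% figure, follow from two numbers: a 4 ms tick at HZ=250 and a factor of 8 between levels. A sketch with hypothetical helper names that reproduces them:

```c
#define SIM_LVL_CLK_SHIFT	3	/* granularity grows 8x per level */
#define SIM_TICK_MS		4ULL	/* one tick at HZ=250 */
#define SIM_LVL_SIZE		64ULL	/* buckets per level */

/* Granularity of a level in milliseconds. */
static unsigned long long sim_gran_ms(unsigned int level)
{
	return SIM_TICK_MS << (SIM_LVL_CLK_SHIFT * level);
}

/* First timeout, in milliseconds, that is queued into a level. */
static unsigned long long sim_range_start_ms(unsigned int level)
{
	if (level == 0)
		return 0;
	return (SIM_LVL_SIZE * SIM_TICK_MS) <<
	       (SIM_LVL_CLK_SHIFT * (level - 1));
}

/* Worst-case rounding in percent at the start of a level (level >= 1). */
static double sim_worst_case_pct(unsigned int level)
{
	return 100.0 * (double)sim_gran_ms(level) /
	       (double)sim_range_start_ms(level);
}
```

For every level from 1 to 7 the granularity is one eighth of the range start (for example 32 ms against 256 ms), which is where the 12.5% comes from.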

So the new wheel concept addresses the old issues:

1) Cascading is avoided completely

2) By keeping the timers in the bucket until expiry/cancelation we can track
   the buckets which have timers enqueued in a bucket bitmap and therefore can
   look up the next expiring timer very fast and O(1).

A further benefit of the concept is that the slack calculation which is done
on every timer start is no longer necessary because the granularity levels
provide natural batching already.

Our extensive testing with various loads did not show any performance
degradation vs. the current wheel implementation.

This patch does not address the 'fast lookup' issue as we wanted to make sure
that there is no regression introduced by the wheel redesign. The
optimizations are in follow up patches.

This patch contains fixes from Anna-Maria Gleixner and Richard Cochran.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.108621834@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:35:09 +02:00
Thomas Gleixner
494af3ed78 timers: Give a few structs and members proper names
Some of the names in the internal implementation of the timer code
are no longer correct and others are simply too long to type.

Clean it up before we switch the wheel implementation over to
the new scheme.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094341.948752516@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:35:08 +02:00
Thomas Gleixner
177ec0a0a5 timers: Remove the deprecated mod_timer_pinned() API
We switched all users to initialize the timers as pinned and call
mod_timer(). Remove the now unused timer API function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094341.706205231@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:35:06 +02:00
Thomas Gleixner
e675447bda timers: Make 'pinned' a timer property
We want to move the timer migration logic from a 'push' to a 'pull' model.

Under the current 'push' model pinned timers are handled via
a runtime API variant: mod_timer_pinned().

The 'pull' model requires us to store the pinned attribute of a timer
in the timer_list structure itself, as a new TIMER_PINNED bit in
timer->flags.

This flag must be set at initialization time and the timer APIs
recognize the flag.

This patch:

 - Implements the new flag and associated new-style initialization
   methods

 - makes mod_timer() recognize new-style pinned timers,

 - and adds some migration helper facility to allow
   step by step conversion of old-style to new-style
   pinned timers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094341.049338558@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-07 10:25:13 +02:00
Bjorn Helgaas
b5227d03b7 timers: Clarify usleep_range() function comment
Update the usleep_range() function comment to make it clear that it can
only be used in non-atomic context.

Previously we claimed usleep_range() was a drop-in replacement for udelay()
where wakeup is flexible.  But that's only true in non-atomic contexts,
where it's possible to sleep instead of delay.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20160531212302.28502.44995.stgit@bhelgaas-glaptop2.roam.corp.google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-06-10 00:59:14 +02:00
Du, Changbin
b9fdac7f66 debugobjects: insulate non-fixup logic related to static obj from fixup callbacks
When activating a static object we need to make sure that the object is
tracked in the object tracker.  If it is a non-static object then the
activation is illegal.

In the previous implementation, each subsystem needed to take care of this in
its fixup callbacks.  We can move this logic into the debugobjects core instead,
which removes duplicated code and leaves *pure* fixup callbacks.

To achieve this, a new callback "is_static_object" is introduced to let
the type specific code decide whether an object is static or not.  If
it is, we add it to the object tracker; otherwise we give a warning and
invoke the fixup callback.

This change has passed the debugobjects selftest, and I also did some testing
with all debugobjects support enabled.

Finally, I have a concern about the fixups: can a fixup change an object
that is in an incorrect state?  The 'addr' may not point to any valid
object if a non-static object is not tracked.  Changing such an object
can then overwrite someone else's memory and cause unexpected behaviour.
For example, timer_fixup_activate binds the timer to the function
stub_timer.

Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com
[changbin.du@intel.com: improve code comments where invoke the new is_static_object callback]
  Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Du, Changbin
e3252464da timer: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to
change (debugobjects: make fixup functions return bool instead of int).

Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Andrew Morton
69b27baf00 sched: add schedule_timeout_idle()
This will be needed in the patch "mm, oom: introduce oom reaper".

Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00