Commit Graph

130 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
5ddfe4cd13 Merge 4.4.208 into android-4.4-p
Changes in 4.4.208
	btrfs: do not leak reloc root if we fail to read the fs root
	btrfs: handle ENOENT in btrfs_uuid_tree_iterate
	ALSA: hda/ca0132 - Keep power on during processing DSP response
	ALSA: hda/ca0132 - Avoid endless loop
	drm: mst: Fix query_payload ack reply struct
	iio: light: bh1750: Resolve compiler warning and make code more readable
	spi: Add call to spi_slave_abort() function when spidev driver is released
	staging: rtl8188eu: fix possible null dereference
	rtlwifi: prevent memory leak in rtl_usb_probe
	IB/iser: bound protection_sg size by data_sg size
	media: am437x-vpfe: Setting STD to current value is not an error
	media: i2c: ov2659: fix s_stream return value
	media: i2c: ov2659: Fix missing 720p register config
	media: ov6650: Fix stored frame format not in sync with hardware
	tools/power/cpupower: Fix initializer override in hsw_ext_cstates
	usb: renesas_usbhs: add suspend event support in gadget mode
	hwrng: omap3-rom - Call clk_disable_unprepare() on exit only if not idled
	regulator: max8907: Fix the usage of uninitialized variable in max8907_regulator_probe()
	media: flexcop-usb: fix NULL-ptr deref in flexcop_usb_transfer_init()
	samples: pktgen: fix proc_cmd command result check logic
	mwifiex: pcie: Fix memory leak in mwifiex_pcie_init_evt_ring
	media: ti-vpe: vpe: fix a v4l2-compliance warning about invalid pixel format
	media: ti-vpe: vpe: fix a v4l2-compliance failure about frame sequence number
	media: ti-vpe: vpe: Make sure YUYV is set as default format
	extcon: sm5502: Reset registers during initialization
	x86/mm: Use the correct function type for native_set_fixmap()
	perf report: Add warning when libunwind not compiled in
	iio: adc: max1027: Reset the device at probe time
	Bluetooth: hci_core: fix init for HCI_USER_CHANNEL
	drm/gma500: fix memory disclosures due to uninitialized bytes
	x86/ioapic: Prevent inconsistent state when moving an interrupt
	arm64: psci: Reduce the waiting time for cpu_psci_cpu_kill()
	libata: Ensure ata_port probe has completed before detach
	pinctrl: sh-pfc: sh7734: Fix duplicate TCLK1_B
	bnx2x: Fix PF-VF communication over multi-cos queues.
	spi: img-spfi: fix potential double release
	rtlwifi: fix memory leak in rtl92c_set_fw_rsvdpagepkt()
	perf probe: Fix to find range-only function instance
	perf probe: Fix to list probe event with correct line number
	perf probe: Walk function lines in lexical blocks
	perf probe: Fix to probe an inline function which has no entry pc
	perf probe: Fix to show ranges of variables in functions without entry_pc
	perf probe: Fix to show inlined function callsite without entry_pc
	perf probe: Skip overlapped location on searching variables
	perf probe: Return a better scope DIE if there is no best scope
	perf probe: Fix to show calling lines of inlined functions
	perf probe: Skip end-of-sequence and non statement lines
	perf probe: Filter out instances except for inlined subroutine and subprogram
	ath10k: fix get invalid tx rate for Mesh metric
	media: pvrusb2: Fix oops on tear-down when radio support is not present
	media: si470x-i2c: add missed operations in remove
	EDAC/ghes: Fix grain calculation
	spi: pxa2xx: Add missed security checks
	ASoC: rt5677: Mark reg RT5677_PWR_ANLG2 as volatile
	parport: load lowlevel driver if ports not found
	cpufreq: Register drivers only after CPU devices have been registered
	x86/crash: Add a forward declaration of struct kimage
	spi: tegra20-slink: add missed clk_unprepare
	btrfs: don't prematurely free work in end_workqueue_fn()
	iwlwifi: check kasprintf() return value
	fbtft: Make sure string is NULL terminated
	crypto: sun4i-ss - Fix 64-bit size_t warnings on sun4i-ss-hash.c
	crypto: vmx - Avoid weird build failures
	libtraceevent: Fix memory leakage in copy_filter_type
	net: phy: initialise phydev speed and duplex sanely
	Revert "mmc: sdhci: Fix incorrect switch to HS mode"
	usb: xhci: Fix build warning seen with CONFIG_PM=n
	btrfs: do not call synchronize_srcu() in inode_tree_del
	btrfs: return error pointer from alloc_test_extent_buffer
	btrfs: abort transaction after failed inode updates in create_subvol
	Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
	ALSA: pcm: Avoid possible info leaks from PCM stream buffers
	af_packet: set defaule value for tmo
	fjes: fix missed check in fjes_acpi_add
	mod_devicetable: fix PHY module format
	net: hisilicon: Fix a BUG trigered by wrong bytes_compl
	net: nfc: nci: fix a possible sleep-in-atomic-context bug in nci_uart_tty_receive()
	net: qlogic: Fix error paths in ql_alloc_large_buffers()
	net: usb: lan78xx: Fix suspend/resume PHY register access error
	sctp: fully initialize v4 addr in some functions
	net: dst: Force 4-byte alignment of dst_metrics
	usbip: Fix error path of vhci_recv_ret_submit()
	USB: EHCI: Do not return -EPIPE when hub is disconnected
	platform/x86: hp-wmi: Make buffer for HPWMI_FEATURE2_QUERY 128 bytes
	staging: comedi: gsc_hpdi: check dma_alloc_coherent() return value
	ext4: check for directory entries too close to block end
	powerpc/irq: fix stack overflow verification
	mmc: sdhci-of-esdhc: fix P2020 errata handling
	perf probe: Fix to show function entry line as probe-able
	scsi: mpt3sas: Fix clear pending bit in ioctl status
	scsi: lpfc: Fix locking on mailbox command completion
	Input: atmel_mxt_ts - disable IRQ across suspend
	iommu/tegra-smmu: Fix page tables in > 4 GiB memory
	scsi: target: compare full CHAP_A Algorithm strings
	scsi: lpfc: Fix SLI3 hba in loop mode not discovering devices
	scsi: csiostor: Don't enable IRQs too early
	powerpc/pseries: Mark accumulate_stolen_time() as notrace
	dma-debug: add a schedule point in debug_dma_dump_mappings()
	clocksource/drivers/asm9260: Add a check for of_clk_get
	powerpc/security/book3s64: Report L1TF status in sysfs
	jbd2: Fix statistics for the number of logged blocks
	scsi: tracing: Fix handling of TRANSFER LENGTH == 0 for READ(6) and WRITE(6)
	scsi: lpfc: Fix duplicate unreg_rpi error in port offline flow
	clk: qcom: Allow constant ratio freq tables for rcg
	irqchip/irq-bcm7038-l1: Enable parent IRQ if necessary
	irqchip: ingenic: Error out if IRQ domain creation failed
	fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
	scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferences
	scsi: ufs: fix potential bug which ends in system hang
	powerpc/pseries/cmm: Implement release() function for sysfs device
	powerpc/security: Fix wrong message when RFI Flush is disable
	clk: pxa: fix one of the pxa RTC clocks
	bcache: at least try to shrink 1 node in bch_mca_scan()
	HID: Improve Windows Precision Touchpad detection.
	ext4: work around deleting a file with i_nlink == 0 safely
	scsi: pm80xx: Fix for SATA device discovery
	scsi: target: iscsi: Wait for all commands to finish before freeing a session
	gpio: mpc8xxx: Don't overwrite default irq_set_type callback
	scripts/kallsyms: fix definitely-lost memory leak
	cdrom: respect device capabilities during opening action
	perf regs: Make perf_reg_name() return "unknown" instead of NULL
	libfdt: define INT32_MAX and UINT32_MAX in libfdt_env.h
	s390/cpum_sf: Check for SDBT and SDB consistency
	ocfs2: fix passing zero to 'PTR_ERR' warning
	kernel: sysctl: make drop_caches write-only
	ALSA: hda - Downgrade error message for single-cmd fallback
	Make filldir[64]() verify the directory entry filename is valid
	filldir[64]: remove WARN_ON_ONCE() for bad directory entries
	net: davinci_cpdma: use dma_addr_t for DMA address
	netfilter: ebtables: compat: reject all padding in matches/watchers
	6pack,mkiss: fix possible deadlock
	netfilter: bridge: make sure to pull arp header in br_nf_forward_arp()
	net: icmp: fix data-race in cmp_global_allow()
	hrtimer: Annotate lockless access to timer->state
	mmc: sdhci: Update the tuning failed messages to pr_debug level
	tcp: do not send empty skb from tcp_write_xmit()
	Linux 4.4.208

Change-Id: I1c710061be5b595f822b45a87d852b85512d7783
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-01-04 13:50:26 +01:00
Eric Dumazet
8f9fc3e649 hrtimer: Annotate lockless access to timer->state
commit 56144737e67329c9aaed15f942d46a6302e2e3d8 upstream.

syzbot reported various data-race caused by hrtimer_is_queued() reading
timer->state. A READ_ONCE() is required there to silence the warning.

Also add the corresponding WRITE_ONCE() when timer->state is set.

In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid
loading timer->state twice.

KCSAN reported these cases:

BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check

write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0:
 __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
 __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
 __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1:
 tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline]
 tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225
 tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044
 tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558
 tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717
 tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696
 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
 sk_backlog_rcv include/net/sock.h:945 [inline]
 __release_sock+0x135/0x1e0 net/core/sock.c:2435
 release_sock+0x61/0x160 net/core/sock.c:2951
 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
 sock_sendmsg_nosec net/socket.c:637 [inline]
 sock_sendmsg+0x9f/0xc0 net/socket.c:657

BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check

write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0:
 __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991
 __run_hrtimer kernel/time/hrtimer.c:1496 [inline]
 __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576
 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:373 [inline]
 irq_exit+0xbb/0xe0 kernel/softirq.c:413
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830

read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1:
 __tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265
 tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline]
 tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708
 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561
 sk_backlog_rcv include/net/sock.h:945 [inline]
 __release_sock+0x135/0x1e0 net/core/sock.c:2435
 release_sock+0x61/0x160 net/core/sock.c:2951
 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145
 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393
 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434
 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807
 sock_sendmsg_nosec net/socket.c:637 [inline]
 sock_sendmsg+0x9f/0xc0 net/socket.c:657
 __sys_sendto+0x21f/0x320 net/socket.c:1952
 __do_sys_sendto net/socket.c:1964 [inline]
 __se_sys_sendto net/socket.c:1960 [inline]
 __x64_sys_sendto+0x89/0xb0 net/socket.c:1960
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

[ tglx: Added comments ]

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191106174804.74723-1-edumazet@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04 13:34:38 +01:00
Dmitry Shmidt
b558f17a13 Merge tag 'v4.4.16' into android-4.4.y
This is the 4.4.16 stable release

Change-Id: Ibaf7b7e03695e1acebc654a2ca1a4bfcc48fcea4
2016-08-01 15:57:55 -07:00
John Stultz
286d25ba46 BACKPORT: timer: convert timer_slack_ns from unsigned long to u64
This backports da8b44d5a9f8bf26da637b7336508ca534d6b319 from upstream.

This patchset introduces a /proc/<pid>/timerslack_ns interface which
would allow controlling processes to be able to set the timerslack value
on other processes in order to save power by avoiding wakeups (Something
Android currently does via out-of-tree patches).

The first patch tries to fix the internal timer_slack_ns usage which was
defined as a long, which limits the slack range to ~4 seconds on 32bit
systems.  It converts it to a u64, which provides the same basically
unlimited slack (500 years) on both 32bit and 64bit machines.

The second patch introduces the /proc/<pid>/timerslack_ns interface
which allows the full 64bit slack range for a task to be read or set on
both 32bit and 64bit machines.

With these two patches, on a 32bit machine, after setting the slack on
bash to 10 seconds:

$ time sleep 1

real    0m10.747s
user    0m0.001s
sys     0m0.005s

The first patch is a little ugly, since I had to chase the slack delta
arguments through a number of functions converting them to u64s.  Let me
know if it makes sense to break that up more or not.

Other than that things are fairly straightforward.

This patch (of 2):

The timer_slack_ns value in the task struct is currently a unsigned
long.  This means that on 32bit applications, the maximum slack is just
over 4 seconds.  However, on 64bit machines, its much much larger (~500
years).

This disparity could make application development a little (as well as
the default_slack) to a u64.  This means both 32bit and 64bit systems
have the same effective internal slack range.

Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify
the interface as a unsigned long, so we preserve that limitation on
32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned
long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
actually larger then what can be stored by an unsigned long.

This patch also modifies hrtimer functions which specified the slack
delta as a unsigned long.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-01 18:12:06 -07:00
Thomas Gleixner
dd0d511548 hrtimer: Handle remaining time proper for TIME_LOW_RES
commit 203cbf77de59fc8f13502dcfd11350c6d4a5c95f upstream.

If CONFIG_TIME_LOW_RES is enabled we add a jiffie to the relative timeout to
prevent short sleeps, but we do not account for that in interfaces which
retrieve the remaining time.

Helge observed that timerfd can return a remaining time larger than the
relative timeout. That's not expected and breaks userland test programs.

Store the information that the timer was armed relative and provide functions
to adjust the remaining time. To avoid bloating the hrtimer struct make state
a u8, which as a bonus results in better code on x86 at least.

Reported-and-tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: dhowells@redhat.com
Link: http://lkml.kernel.org/r/20160114164159.273328486@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-17 12:30:57 -08:00
Thomas Gleixner
683be13a28 timer: Minimize nohz off overhead
If nohz is disabled on the kernel command line the [hr]timer code
still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
avoid the overhead.

Before:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu

After:
  48.73%  hog       [.] main
  15.36%  [kernel]  [k] _raw_spin_lock_irqsave
   9.77%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.61%  [kernel]  [k] lock_timer_base.isra.38
   6.42%  [kernel]  [k] mod_timer
   3.90%  [kernel]  [k] detach_if_pending
   3.76%  [kernel]  [k] del_timer
   2.41%  [kernel]  [k] internal_add_timer
   1.39%  [kernel]  [k] __internal_add_timer
   0.76%  [kernel]  [k] timerfn

We probably should have a cached value for nohz full in the per cpu
bases as well to avoid the cpumask check. The base cache line is hot
already, the cpumask not necessarily.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19 15:18:28 +02:00
Thomas Gleixner
bc7a34b8b9 timer: Reduce timer migration overhead if disabled
Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.

We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.

The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.

With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.

Before:
  47.76%  hog       [.] main
  14.84%  [kernel]  [k] _raw_spin_lock_irqsave
   9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.71%  [kernel]  [k] mod_timer
   6.24%  [kernel]  [k] lock_timer_base.isra.38
   3.76%  [kernel]  [k] detach_if_pending
   3.71%  [kernel]  [k] del_timer
   2.50%  [kernel]  [k] internal_add_timer
   1.51%  [kernel]  [k] get_nohz_timer_target
   1.28%  [kernel]  [k] __internal_add_timer
   0.78%  [kernel]  [k] timerfn
   0.48%  [kernel]  [k] wake_up_nohz_cpu

After:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu


Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19 15:18:28 +02:00
Peter Zijlstra
887d9dc989 hrtimer: Allow hrtimer::function() to free the timer
Currently an hrtimer callback function cannot free its own timer
because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK
after it. Freeing the timer would result in a clear use-after-free.

Solve this by using a scheme similar to regular timers; track the
current running timer in hrtimer_clock_base::running.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: wanpeng.li@linux.intel.com
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.471563047@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19 00:09:56 +02:00
Oleg Nesterov
c04dca02bc hrtimer: Remove HRTIMER_STATE_MIGRATE
I do not understand HRTIMER_STATE_MIGRATE. Unless I am totally
confused it looks buggy and simply unneeded.

migrate_hrtimer_list() sets it to keep hrtimer_active() == T, but this
is not enough: this can fool, say, hrtimer_is_queued() in
dequeue_signal().

Can't migrate_hrtimer_list() simply use HRTIMER_STATE_ENQUEUED?
This fixes the race and we can kill STATE_MIGRATE.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: wanpeng.li@linux.intel.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.072387650@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19 00:09:56 +02:00
Borislav Petkov
d711b8b30c hrtimers: Make sure hrtimer_resolution is unsigned int
... in the !CONFIG_HIGH_RES_TIMERS case too. And thus fix warnings like
this one:

net/sched/sch_api.c: In function ‘psched_show’:
net/sched/sch_api.c:1891:6: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
      (u32)NSEC_PER_SEC / hrtimer_resolution);

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1433583000-32090-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
2015-06-08 15:46:06 +02:00
Thomas Gleixner
61699e1307 hrtimer: Remove hrtimer_start() return value
No user was ever interested whether the timer was active or not when
it was started. All abusers of the return value are gone, so get rid
of it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203503.483556394@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:52 +02:00
Thomas Gleixner
02a171af1a hrtimer: Make hrtimer_start() a inline wrapper
No point for an extra export just to set the extra argument of
hrtimer_start_range_ns() to 0.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.808544539@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00
Thomas Gleixner
58f1f803f1 hrtimer: Get rid of __hrtimer_start_range_ns()
No more callers. Remove the leftovers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.707871492@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00
Thomas Gleixner
c1ad348b45 tick: Nohz: Rework next timer evaluation
The evaluation of the next timer in the nohz code is based on jiffies
while all the tick internals are nano seconds based. We have also to
convert hrtimer nanoseconds to jiffies in the !highres case. That's
just wrong and introduces interesting corner cases.

Turn it around and convert the next timer wheel timer expiry and the
rcu event to clock monotonic and base all calculations on
nanoseconds. That identifies the case where no timer is pending
clearly with an absolute expiry value of KTIME_MAX.

Makes the code more readable and gets rid of the jiffies magic in the
nohz code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Thomas Gleixner
c6eb3f70d4 hrtimer: Get rid of hrtimer softirq
hrtimer softirq is a leftover from the initial implementation and
serves only the purpose to handle the enqueueing of already expired
timers in the high resolution timer mode. We discussed whether we
change the return value and force all start sites to handle that the
timer is already expired, but that would be a Herculean task and I'm
not sure whether its a good idea to enforce that handling on
everyone.

A simpler solution is to enforce a timer interrupt instead of raising
and scheduling a softirq. Just use the existing infrastructure to do
so and remove all the softirq leftovers.

The HRTIMER softirq enum is now unused, but kept around because trace
parsers rely on the existing numbering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.840834708@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Thomas Gleixner
895bdfa793 hrtimer: Keep pointer to first timer and simplify __remove_hrtimer()
__remove_hrtimer() needs to evaluate the expiry time to figure out
whether the timer which is removed is eventually the first expiring
timer on the cpu. Keep a pointer to it, which is lazily updated, so we
can avoid the evaluation dance and retrieve the information from there.

Generates slightly better code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.752838019@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Thomas Gleixner
b8e38413ac hrtimer: Align the hrtimer clock bases as well
We don't use cacheline_align here because that might waste lot of
space on 32bit machine with 64 bytes cachelines and on 64bit machines
with 128 bytes cachelines.

The size of struct hrtimer_clock_base is 64byte on 64bit and 32byte on
32bit machines. So we utilize the cache lines proper.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.498165771@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:49 +02:00
Thomas Gleixner
6d9a141139 hrtimer: Cache line align the hrtimer cpu base
We really want that data structure to start at a cache line boundary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.417597627@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:49 +02:00
Thomas Gleixner
e19ffe8be2 hrtimer: Use bits for various boolean indicators
No point in wasting 12 byte storage space. Generates better code as well.

Text size reduction:
       x8664 -64, i386 -16, ARM -132, ARM64 -0, power64 -48

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.227955358@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:49 +02:00
Thomas Gleixner
868a3e915f hrtimer: Make offset update smarter
On every tick/hrtimer interrupt we update the offset variables of the
clock bases. That's silly because these offsets change very seldom.

Add a sequence counter to the time keeping code which keeps track of
the offset updates (clock_was_set()). Have a sequence cache in the
hrtimer cpu bases to evaluate whether the offsets must be updated or
not. This allows us later to avoid pointless cacheline pollution.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20150414203501.132820245@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
2015-04-22 17:06:49 +02:00
Thomas Gleixner
21d6d52a1b hrtimer: Get rid of softirq time
The softirq time field in the clock bases is an optimization from the
early days of hrtimers. It provides a coarse "jiffies" like time
mostly for self rearming timers.

But that comes with a price:
    - Larger code size
    - Extra storage space
    - Duplicated functions with really small differences
   
The benefit of this is optimization is marginal for contemporary
systems.

Consolidate everything on the high resolution timer
implementation. This makes further optimizations possible.

Text size reduction:
       x8664 -95, i386 -356, ARM -148, ARM64 -40, power64 -16

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.039977424@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:49 +02:00
Thomas Gleixner
a6ffebce7f hrtimer: Make the statistics fields smaller
No point in having usigned long for /proc/timer_list statistics. Make
them unsigned int.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203500.959773467@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:49 +02:00
Thomas Gleixner
056a3cacbc hrtimer: Get rid of hrtimer_get_res()
The resolution is directly accessible now. So its simpler just to fill
in the values of the timespec and be done with it.

Text size reduction (combined with "hrtimer: Get rid of the resolution
field in hrtimer_clock_base"):
       x8664 -61, i386 -221, ARM -60, power64 -48

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203500.879888080@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:49 +02:00
Thomas Gleixner
398ca17fb5 hrtimer: Get rid of the resolution field in hrtimer_clock_base
The field has no value because all clock bases have the same
resolution. The resolution only changes when we switch to high
resolution timer mode. We can evaluate that from a single static
variable as well. In the !HIGHRES case its simply a constant.

Export the variable, so we can simplify the usage sites.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203500.645454122@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:48 +02:00
Thomas Gleixner
91e5a2170e hrtimer: Document hrtimer_forward[_now]() proper
Document the calling context conditions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150413210035.178751779@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:48 +02:00
Thomas Gleixner
9bc7491906 hrtimer: Prevent stale expiry time in hrtimer_interrupt()
hrtimer_interrupt() has the following subtle issue:

hrtimer_interrupt()
  lock(cpu_base);
  expires_next = KTIME_MAX;

  expire_timers(CLOCK_MONOTONIC);
  expires = get_next_timer(CLOCK_MONOTONIC);
  if (expires < expires_next)
    expires_next = expires;

  expire_timers(CLOCK_REALTIME);
    unlock(cpu_base);
    wakeup()
    hrtimer_start(CLOCK_MONOTONIC, newtimer);
    lock(cpu_base();  
  expires = get_next_timer(CLOCK_REALTIME);
  if (expires < expires_next)
    expires_next = expires;

So because we already evaluated the next expiring timer of
CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be
earlier than the overall next expiry time in hrtimer_interrupt().

To solve this, remove the caching of the next expiry value from
hrtimer_interrupt() and reevaluate all active clock bases for the next
expiry value. To avoid another code duplication, create a shared
evaluation function and use it for hrtimer_get_next_event(),
hrtimer_force_reprogram() and hrtimer_interrupt().

There is another subtlety in this mechanism:

While hrtimer_interrupt() is running, we want to avoid to touch the
hardware device because we will reprogram it anyway at the end of
hrtimer_interrupt(). This works nicely for hrtimers which get rearmed
via the HRTIMER_RESTART mechanism, because we drop out when the
callback on that CPU is running. But that fails, if a new timer gets
enqueued like in the example above.

This has another implication: While hrtimer_interrupt() is running we
refuse remote enqueueing of timers - see hrtimer_interrupt() and
hrtimer_check_target().

hrtimer_interrupt() tries to prevent this by setting cpu_base->expires
to KTIME_MAX, but that fails if a new timer gets queued.

Prevent both the hardware access and the remote enqueue
explicitely. We can loosen the restriction on the remote enqueue now
due to reevaluation of the next expiry value, but that needs a
seperate patch.

Folded in a fix from Vignesh Radhakrishnan.

Reported-and-tested-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Based-on-patch-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: vigneshr@codeaurora.org
Cc: john.stultz@linaro.org
Cc: viresh.kumar@linaro.org
Cc: fweisbec@gmail.com
Cc: cl@linux.com
Cc: stuart.w.hayes@gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-23 12:13:20 +01:00
Thomas Gleixner
8b094cd03b time: Consolidate the time accessor prototypes
Right now we have time related prototypes in 3 different header
files. Move it to a single timekeeping header file and move the core
internal stuff into a core private header.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-07-23 10:17:54 -07:00
Thomas Gleixner
166afb6451 ktime: Sanitize ktime_to_us/ms conversion
With the plain nanoseconds based ktime_t we can simply use
ktime_divns() instead of going through loops and hoops of
timespec/timeval conversion.

Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-07-23 10:16:50 -07:00
John Stultz
76f4108892 hrtimer: Cleanup hrtimer accessors to the timekepeing state
Rather then having two similar but totally different implementations
that provide timekeeping state to the hrtimer code, try to unify the
two implementations to be more simliar.

Thus this clarifies ktime_get_update_offsets to
ktime_get_update_offsets_now and changes get_xtime...  to
ktime_get_update_offsets_tick.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-07-23 10:16:50 -07:00
Viresh Kumar
cddd02489f hrtimer: Store cpu-number in struct hrtimer_cpu_base
In lowres mode, hrtimers are serviced by the tick instead of a clock
event. Now it works well as long as the tick stays periodic but we
must also make sure that the hrtimers are serviced in dynticks mode.

Part of that job consist in kicking a dynticks hrtimer target in order
to make it reconsider the next tick to schedule to correctly handle the
hrtimer's expiring time. And that part isn't handled by the hrtimers
subsystem.

To prepare for fixing this, we need __hrtimer_start_range_ns() to be
able to resolve the CPU target associated to a hrtimer's object
'cpu_base' so that the kick can be centralized there.

So lets store it in the 'struct hrtimer_cpu_base' to resolve the CPU
without overhead. It is set once at CPU's online notification.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1403393357-2070-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-23 11:23:47 +02:00
Viresh Kumar
e2e680fb75 hrtimer: Rearrange comments in the order struct members are declared
Rearrange kernel doc comments in the order members of struct hrtimer are
declared.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: fweisbec@gmail.com
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1db1a3cfbe8a9ea49396af75c6ac04a2e67e3ab0.1395226248.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-20 12:35:45 +01:00
John Stultz
90adda98b8 hrtimer: Add hrtimer support for CLOCK_TAI
Add hrtimer support for CLOCK_TAI, as well as posix timer interfaces.

Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-03-22 16:19:59 -07:00
Thomas Gleixner
f6c06abfb3 timekeeping: Provide hrtimer update function
To finally fix the infamous leap second issue and other race windows
caused by functions which change the offsets between the various time
bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a
function which atomically gets the current monotonic time and updates
the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic
overhead. The previous patch which provides ktime_t offsets allows us
to make this function almost as cheap as ktime_get() which is going to
be replaced in hrtimer_interrupt().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-07-11 23:34:39 +02:00
John Stultz
f55a6faa38 hrtimer: Provide clock_was_set_delayed()
clock_was_set() cannot be called from hard interrupt context because
it calls on_each_cpu().

For fixing the widely reported leap seconds issue it is necessary to
call it from hard interrupt context, i.e. the timer tick code, which
does the timekeeping updates.

Provide a new function which denotes it in the hrtimer cpu base
structure of the cpu on which it is called and raise the hrtimer
softirq. We then execute the clock_was_set() notificiation from
softirq context in run_hrtimer_softirq(). The hrtimer softirq is
rarely used, so polling the flag there is not a performance issue.

[ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
  rid of all this ifdeffery ASAP ]

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-07-11 23:34:37 +02:00
Vitaliy Ivanov
4d258b25d9 Fix some kernel-doc warnings
Fix 'make htmldocs' warnings:

  Warning(/include/linux/hrtimer.h:153): No description found for parameter 'clockid'
  Warning(/include/linux/device.h:604): Excess struct/union/enum/typedef member 'of_match' description in 'device'
  Warning(/include/net/sock.h:349): Excess struct/union/enum/typedef member 'sk_rmem_alloc' description in 'sock'

Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-27 16:06:19 -07:00
Thomas Gleixner
68fa61c026 hrtimers: Reorder clock bases
The ordering of the clock bases is historical due to the
CLOCK_REALTIME and CLOCK_MONOTONIC constants. Now the hrtimer bases
have their own enumeration due to the gap between CLOCK_MONOTONIC and
CLOCK_BOOTTIME. So we can be more clever as most timers end up on the
CLOCK_MONOTONIC base due to the virtue of POSIX declaring that
relative CLOCK_REALTIME timers are not affected by time changes. In
desktop environments this is slowly changing as applications switch to
absolute timers, but I've observed empty CLOCK_REALTIME bases often
enough. There is no performance penalty or overhead when
CLOCK_REALTIME timers are active, but in case they are not we don't
skip over a full cache line.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2011-05-23 13:59:54 +02:00
Thomas Gleixner
ab8177bc53 hrtimers: Avoid touching inactive timer bases
Instead of iterating over all possible timer bases avoid it by marking
the active bases in the cpu base.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2011-05-23 13:59:54 +02:00
Thomas Gleixner
f24444b01b hrtimers: Make struct hrtimer_cpu_base layout less stupid
In the HIGHRES=y case we access the members at the end of struct
hrtimer_cpu_base first and then the one at the beginning. Move the
hrtimer data to front, so we have linear progressing access.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2011-05-23 13:59:53 +02:00
Thomas Gleixner
9ec2690758 timerfd: Manage cancelable timers in timerfd
Peter is concerned about the extra scan of CLOCK_REALTIME_COS in the
timer interrupt. Yes, I did not think about it, because the solution
was so elegant. I didn't like the extra list in timerfd when it was
proposed some time ago, but with a rcu based list the list walk it's
less horrible than the original global lock, which was held over the
list iteration.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2011-05-23 13:59:53 +02:00
Thomas Gleixner
99ee5315da timerfd: Allow timers to be cancelled when clock was set
Some applications must be aware of clock realtime being set
backward. A simple example is a clock applet which arms a timer for
the next minute display. If clock realtime is set backward then the
applet displays a stale time for the amount of time which the clock
was set backwards. Due to that applications poll the time because we
don't have an interface.

Extend the timerfd interface by adding a flag which puts the timer
onto a different internal realtime clock. All timers on this clock are
expired whenever the clock was set.

The timerfd core records the monotonic offset when the timer is
created. When the timer is armed, then the current offset is compared
to the previous recorded offset. When it has changed, then
timerfd_settime returns -ECANCELED. When a timer is read the offset is
compared and if it changed -ECANCELED returned to user space. Periodic
timers are not rearmed in the cancelation case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Chris Friesen <chris.friesen@genband.com>
Tested-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Reviewed-by: Alexander Shishkin <virtuoso@slind.org>
Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104271359580.3323%40ionos%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-02 21:39:15 +02:00
Thomas Gleixner
b12a03ce48 hrtimers: Prepare for cancel on clock was set timers
Make clock_was_set() unconditional and rename hres_timers_resume to
hrtimers_resume. This is a preparatory patch for hrtimers which are
cancelled when clock realtime was set.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-02 21:37:58 +02:00
Thomas Gleixner
53370d2e8c hrtimer: Update hrtimer->state documentation
We changed some of the state bits and combinations thereof over time,
but never updated the documentation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-10 19:15:58 +01:00
John Stultz
70a08cca12 timers: Add CLOCK_BOOTTIME hrtimer base
CLOCK_MONOTONIC stops while the system is in suspend. This is because
to applications system suspend is invisible. However, there is a
growing set of applications that are wanting to be suspend-aware,
but do not want to deal with the complications of CLOCK_REALTIME
(which might jump around if settimeofday is called).

For these applications, I propose a new clockid: CLOCK_BOOTTIME.
CLOCK_BOOTTIME is idential to CLOCK_MONOTONIC, except it also
includes any time spent in suspend.

This patch add hrtimer base for CLOCK_BOOTTIME, using
get_monotonic_boottime/ktime_get_boottime, to allow
in kernel users to set timers against.

CC: Jamie Lokier <jamie@shareable.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Alexander Shishkin <virtuoso@slind.org>
CC: Arve Hjønnevåg <arve@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-02-21 12:53:08 -08:00
John Stultz
abb3a4ea2e time: Introduce get_monotonic_boottime and ktime_get_boottime
This adds new functions that return the monotonic time since boot
(in other words, CLOCK_MONOTONIC + suspend time).

CC: Jamie Lokier <jamie@shareable.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Alexander Shishkin <virtuoso@slind.org>
CC: Arve Hjønnevåg <arve@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-02-21 12:53:05 -08:00
John Stultz
e06383db9e hrtimers: extend hrtimer base code to handle more then 2 clockids
The hrtimer code is written mainly with CLOCK_REALTIME and CLOCK_MONOTONIC
in mind. These are clockids 0 and 1 resepctively. However, if we are
to introduce any new hrtimer bases, using new clockids, we have to skip
the cputimers (clockids 2,3) as well as other clockids that may not impelement
timers.

This patch adds a little bit of indirection between the clockid and
the base, so that we can extend the base by one when we add
a new clockid at number 7 or so.

CC: Jamie Lokier <jamie@shareable.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Alexander Shishkin <virtuoso@slind.org>
CC: Arve Hjønnevåg <arve@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-02-21 12:53:04 -08:00
Randy Dunlap
175881db89 hrtimer.h: fix kernel-doc warning
Fix new kernel-doc notation warning in hrtimer.h:

  Warning(include/linux/hrtimer.h:150): Excess struct/union/enum/typedef member 'first' description in 'hrtimer_clock_base'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-10 07:38:53 -08:00
John Stultz
998adc3dda hrtimers: Convert hrtimers to use timerlist infrastructure
Converts the hrtimer code to use the new timerlist infrastructure

Signed-off-by: John Stultz <john.stultz@linaro.org>
LKML Reference: <1290136329-18291-3-git-send-email-john.stultz@linaro.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
CC: Alessandro Zummo <a.zummo@towertech.it>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Richard Cochran <richardcochran@gmail.com>
2010-12-10 11:54:35 -08:00
Yong Zhang
5e4f083f78 hrtimer: Remove stale comment on curr_timer
curr_timer doesn't resident in struct hrtimer_cpu_base anymore.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
LKML-Reference: <1287892253-2587-1-git-send-email-yong.zhang0@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-11-10 12:39:52 +01:00
Carsten Emde
351b3f7a21 hrtimers: Provide schedule_hrtimeout for CLOCK_REALTIME
The current version of schedule_hrtimeout() always uses the
monotonic clock. Some system calls such as mq_timedsend()
and mq_timedreceive(), however, require the use of the wall
clock due to the definition of the system call.

This patch provides the infrastructure to use schedule_hrtimeout() 
with a CLOCK_REALTIME timer.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Tested-by: Pradyumna Sampath <pradysam@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Veen <arjan@infradead.org>
LKML-Reference: <20100402204331.167439615@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-04-06 21:50:03 +02:00
Thomas Gleixner
ecb49d1a63 hrtimers: Convert to raw_spinlocks
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:34 +01:00