Changes in 4.19.307
PCI: mediatek: Clear interrupt status before dispatching handler
include/linux/units.h: add helpers for kelvin to/from Celsius conversion
units: Add Watt units
units: change from 'L' to 'UL'
units: add the HZ macros
serial: sc16is7xx: set safe default SPI clock frequency
driver core: add device probe log helper
spi: introduce SPI_MODE_X_MASK macro
serial: sc16is7xx: add check for unsupported SPI modes during probe
ext4: allow for the last group to be marked as trimmed
crypto: api - Disallow identical driver names
PM: hibernate: Enforce ordering during image compression/decompression
hwrng: core - Fix page fault dead lock on mmap-ed hwrng
rpmsg: virtio: Free driver_override when rpmsg_remove()
parisc/firmware: Fix F-extend for PDC addresses
nouveau/vmm: don't set addr on the fail path to avoid warning
block: Remove special-casing of compound pages
powerpc: Use always instead of always-y in for crtsavres.o
x86/CPU/AMD: Fix disabling XSAVES on AMD family 0x17 due to erratum
driver core: Annotate dev_err_probe() with __must_check
Revert "driver core: Annotate dev_err_probe() with __must_check"
driver code: print symbolic error code
drivers: core: fix kernel-doc markup for dev_err_probe()
net/smc: fix illegal rmb_desc access in SMC-D connection dump
vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING
llc: make llc_ui_sendmsg() more robust against bonding changes
llc: Drop support for ETH_P_TR_802_2.
net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv
tracing: Ensure visibility when inserting an element into tracing_map
tcp: Add memory barrier to tcp_push()
netlink: fix potential sleeping issue in mqueue_flush_file
net/mlx5: Use kfree(ft->g) in arfs_create_groups()
net/mlx5e: fix a double-free in arfs_create_groups
netfilter: nf_tables: restrict anonymous set and map names to 16 bytes
fjes: fix memleaks in fjes_hw_setup
net: fec: fix the unhandled context fault from smmu
btrfs: don't warn if discard range is not aligned to sector
btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args
netfilter: nf_tables: reject QUEUE/DROP verdict parameters
gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-04
drm: Don't unref the same fb many times by mistake due to deadlock handling
drm/bridge: nxp-ptn3460: fix i2c_master_send() error checking
drm/bridge: nxp-ptn3460: simplify some error checking
drm/exynos: gsc: minor fix for loop iteration in gsc_runtime_resume
gpio: eic-sprd: Clear interrupt after set the interrupt type
mips: Call lose_fpu(0) before initializing fcr31 in mips_set_personality_nan
tick/sched: Preserve number of idle sleeps across CPU hotplug events
x86/entry/ia32: Ensure s32 is sign extended to s64
net/sched: cbs: Fix not adding cbs instance to list
powerpc/mm: Fix null-pointer dereference in pgtable_cache_add
powerpc: Fix build error due to is_valid_bugaddr()
powerpc/mm: Fix build failures due to arch_reserved_kernel_pages()
powerpc/lib: Validate size for vector operations
audit: Send netlink ACK before setting connection in auditd_set
ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop
PNP: ACPI: fix fortify warning
ACPI: extlog: fix NULL pointer dereference check
FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTree
UBSAN: array-index-out-of-bounds in dtSplitRoot
jfs: fix slab-out-of-bounds Read in dtSearch
jfs: fix array-index-out-of-bounds in dbAdjTree
jfs: fix uaf in jfs_evict_inode
pstore/ram: Fix crash when setting number of cpus to an odd number
crypto: stm32/crc32 - fix parsing list of devices
afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*()
rxrpc_find_service_conn_rcu: fix the usage of read_seqbegin_or_lock()
jfs: fix array-index-out-of-bounds in diNewExt
s390/ptrace: handle setting of fpc register correctly
KVM: s390: fix setting of fpc register
SUNRPC: Fix a suspicious RCU usage warning
ext4: fix inconsistent between segment fstrim and full fstrim
ext4: unify the type of flexbg_size to unsigned int
ext4: remove unnecessary check from alloc_flex_gd()
ext4: avoid online resizing failures due to oversized flex bg
scsi: lpfc: Fix possible file string name overflow when updating firmware
PCI: Add no PM reset quirk for NVIDIA Spectrum devices
bonding: return -ENOMEM instead of BUG in alb_upper_dev_walk
ARM: dts: imx7s: Fix lcdif compatible
ARM: dts: imx7s: Fix nand-controller #size-cells
wifi: ath9k: Fix potential array-index-out-of-bounds read in ath9k_htc_txstatus()
bpf: Add map and need_defer parameters to .map_fd_put_ptr()
scsi: libfc: Don't schedule abort twice
scsi: libfc: Fix up timeout error in fc_fcp_rec_error()
ARM: dts: rockchip: fix rk3036 hdmi ports node
ARM: dts: imx25/27-eukrea: Fix RTC node name
ARM: dts: imx: Use flash@0,0 pattern
ARM: dts: imx27: Fix sram node
ARM: dts: imx1: Fix sram node
ARM: dts: imx27-apf27dev: Fix LED name
ARM: dts: imx23-sansa: Use preferred i2c-gpios properties
ARM: dts: imx23/28: Fix the DMA controller node name
md: Whenassemble the array, consult the superblock of the freshest device
wifi: rtl8xxxu: Add additional USB IDs for RTL8192EU devices
wifi: rtlwifi: rtl8723{be,ae}: using calculate_bit_shift()
wifi: cfg80211: free beacon_ies when overridden from hidden BSS
f2fs: fix to check return value of f2fs_reserve_new_block()
ASoC: doc: Fix undefined SND_SOC_DAPM_NOPM argument
fast_dput(): handle underflows gracefully
RDMA/IPoIB: Fix error code return in ipoib_mcast_join
drm/drm_file: fix use of uninitialized variable
drm/framebuffer: Fix use of uninitialized variable
drm/mipi-dsi: Fix detach call without attach
media: stk1160: Fixed high volume of stk1160_dbg messages
media: rockchip: rga: fix swizzling for RGB formats
PCI: add INTEL_HDA_ARL to pci_ids.h
ALSA: hda: Intel: add HDA_ARL PCI ID support
drm/exynos: Call drm_atomic_helper_shutdown() at shutdown/unbind time
IB/ipoib: Fix mcast list locking
media: ddbridge: fix an error code problem in ddb_probe
drm/msm/dpu: Ratelimit framedone timeout msgs
clk: hi3620: Fix memory leak in hi3620_mmc_clk_init()
clk: mmp: pxa168: Fix memory leak in pxa168_clk_init()
drm/amdgpu: Let KFD sync with VM fences
drm/amdgpu: Drop 'fence' check in 'to_amdgpu_amdkfd_fence()'
leds: trigger: panic: Don't register panic notifier if creating the trigger failed
um: Fix naming clash between UML and scheduler
um: Don't use vfprintf() for os_info()
um: net: Fix return type of uml_net_start_xmit()
mfd: ti_am335x_tscadc: Fix TI SoC dependencies
PCI: Only override AMD USB controller if required
usb: hub: Replace hardcoded quirk value with BIT() macro
libsubcmd: Fix memory leak in uniq()
virtio_net: Fix "‘%d’ directive writing between 1 and 11 bytes into a region of size 10" warnings
blk-mq: fix IO hang from sbitmap wakeup race
ceph: fix deadlock or deadcode of misusing dget()
drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
wifi: cfg80211: fix RCU dereference in __cfg80211_bss_update
scsi: isci: Fix an error code problem in isci_io_request_build()
net: remove unneeded break
ixgbe: Remove non-inclusive language
ixgbe: Refactor returning internal error codes
ixgbe: Refactor overtemp event handling
ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550()
ipv6: Ensure natural alignment of const ipv6 loopback and router addresses
llc: call sock_orphan() at release time
netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
net: ipv4: fix a memleak in ip_setup_cork
af_unix: fix lockdep positive in sk_diag_dump_icons()
net: sysfs: Fix /sys/class/net/<iface> path
HID: apple: Add support for the 2021 Magic Keyboard
HID: apple: Swap the Fn and Left Control keys on Apple keyboards
HID: apple: Add 2021 magic keyboard FN key mapping
bonding: remove print in bond_verify_device_path
dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV
phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP
atm: idt77252: fix a memleak in open_card_ubr0
hwmon: (aspeed-pwm-tacho) mutex for tach reading
hwmon: (coretemp) Fix out-of-bounds memory access
hwmon: (coretemp) Fix bogus core_id to attr name mapping
inet: read sk->sk_family once in inet_recv_error()
rxrpc: Fix response to PING RESPONSE ACKs to a dead call
tipc: Check the bearer type before calling tipc_udp_nl_bearer_add()
ppp_async: limit MRU to 64K
netfilter: nft_compat: reject unused compat flag
netfilter: nft_compat: restrict match/target protocol to u16
net/af_iucv: clean up a try_then_request_module()
USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
USB: serial: option: add Fibocom FM101-GL variant
USB: serial: cp210x: add ID for IMST iM871A-USB
Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
vhost: use kzalloc() instead of kmalloc() followed by memset()
hrtimer: Report offline hrtimer enqueue
btrfs: forbid creating subvol qgroups
btrfs: send: return EOPNOTSUPP on unknown flags
spi: ppc4xx: Drop write-only variable
ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()
Documentation: net-sysfs: describe missing statistics
net: sysfs: Fix /sys/class/net/<iface> path for statistics
MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
i40e: Fix waiting for queues of all VSIs to be disabled
tracing/trigger: Fix to return error if failed to alloc snapshot
mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
HID: wacom: generic: Avoid reporting a serial of '0' to userspace
HID: wacom: Do not register input devices until after hid_hw_start
USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
usb: f_mass_storage: forbid async queue when shutdown happen
scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
firewire: core: correct documentation of fw_csr_string() kernel API
nfc: nci: free rx_data_reassembly skb on NCI device cleanup
xen-netback: properly sync TX responses
binder: signal epoll threads of self-work
ext4: fix double-free of blocks due to wrong extents moved_len
staging: iio: ad5933: fix type mismatch regression
ring-buffer: Clean ring_buffer_poll_wait() error return
serial: max310x: set default value when reading clock ready bit
serial: max310x: improve crystal stable clock detection
x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
ALSA: hda/conexant: Add quirk for SWS JS201D
nilfs2: fix data corruption in dsync block recovery for small block sizes
nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
nfp: use correct macro for LengthSelect in BAR config
irqchip/irq-brcmstb-l2: Add write memory barrier before exit
pmdomain: core: Move the unused cleanup to a _sync initcall
Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"
sched/membarrier: reduce the ability to hammer on sys_membarrier
nilfs2: fix potential bug in end_buffer_async_write
lsm: new security_file_ioctl_compat() hook
netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()
Linux 4.19.307
Change-Id: Ib05aec445afe9920e2502bcfce1c52db76e27139
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit dad6a09f3148257ac1773cd90934d721d68ab595 upstream.
The hrtimers migration on CPU-down hotplug process has been moved
earlier, before the CPU actually goes to die. This leaves a small window
of opportunity to queue an hrtimer in a blind spot, leaving it ignored.
For example a practical case has been reported with RCU waking up a
SCHED_FIFO task right before the CPUHP_AP_IDLE_DEAD stage, queuing that
way a sched/rt timer to the local offline CPU.
Make sure such situations never go unnoticed and warn when that happens.
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240129235646.3171983-4-boqun.feng@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 9a2fc41acb which is
commit 5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94 upstream.
It breaks the Android kernel abi and can be brought back in the future
in an abi-safe way if it is really needed.
Bug: 161946584
Change-Id: Ifdc5474de6e7488eb24959df1697b61b2e3e7880
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Changes in 4.19.302
spi: imx: add a device specific prepare_message callback
spi: imx: move wml setting to later than setup_transfer
spi: imx: correct wml as the last sg length
spi: imx: mx51-ecspi: Move some initialisation to prepare_message hook.
media: davinci: vpif_capture: fix potential double free
hrtimers: Push pending hrtimers away from outgoing CPU earlier
netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test
tg3: Move the [rt]x_dropped counters to tg3_napi
tg3: Increment tx_dropped in tg3_tso_bug()
kconfig: fix memory leak from range properties
drm/amdgpu: correct chunk_ptr to a pointer to chunk.
ipv6: fix potential NULL deref in fib6_add()
hv_netvsc: rndis_filter needs to select NLS
net: arcnet: Fix RESET flag handling
net: arcnet: com20020 fix error handling
arcnet: restoring support for multiple Sohard Arcnet cards
ipv4: ip_gre: Avoid skb_pull() failure in ipgre_xmit()
net: hns: fix fake link up on xge port
netfilter: xt_owner: Add supplementary groups option
netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
tcp: do not accept ACK of bytes we never sent
RDMA/bnxt_re: Correct module description string
hwmon: (acpi_power_meter) Fix 4.29 MW bug
tracing: Fix a warning when allocating buffered events fails
scsi: be2iscsi: Fix a memleak in beiscsi_init_wrb_handle()
ARM: imx: Check return value of devm_kasprintf in imx_mmdc_perf_init
ARM: dts: imx: make gpt node name generic
ARM: dts: imx7: Declare timers compatible with fsl,imx6dl-gpt
ALSA: pcm: fix out-of-bounds in snd_pcm_state_names
packet: Move reference count in packet_sock to atomic_long_t
nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()
tracing: Always update snapshot buffer size
tracing: Fix incomplete locking when disabling buffered events
tracing: Fix a possible race when disabling buffered events
perf/core: Add a new read format to get a number of lost samples
perf: Fix perf_event_validate_size()
gpiolib: sysfs: Fix error handling on failed export
usb: gadget: f_hid: fix report descriptor allocation
parport: Add support for Brainboxes IX/UC/PX parallel cards
usb: typec: class: fix typec_altmode_put_partner to put plugs
serial: sc16is7xx: address RX timeout interrupt errata
serial: 8250_omap: Add earlycon support for the AM654 UART controller
x86/CPU/AMD: Check vendor in the AMD microcode callback
KVM: s390/mm: Properly reset no-dat
nilfs2: fix missing error check for sb_set_blocksize call
netlink: don't call ->netlink_bind with table lock held
genetlink: add CAP_NET_ADMIN test for multicast bind
psample: Require 'CAP_NET_ADMIN' when joining "packets" group
drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
IB/isert: Fix unaligned immediate-data handling
devcoredump : Serialize devcd_del work
devcoredump: Send uevent once devcd is ready
Linux 4.19.302
Change-Id: If04a1c5d3950ac7c1cbe4b71df951dcf3e8e8ed1
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94 ]
2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug")
solved the straight forward CPU hotplug deadlock vs. the scheduler
bandwidth timer. Yu discovered a more involved variant where a task which
has a bandwidth timer started on the outgoing CPU holds a lock and then
gets throttled. If the lock required by one of the CPU hotplug callbacks
the hotplug operation deadlocks because the unthrottling timer event is not
handled on the dying CPU and can only be recovered once the control CPU
reaches the hotplug state which pulls the pending hrtimers from the dead
CPU.
Solve this by pushing the hrtimers away from the dying CPU in the dying
callbacks. Nothing can queue a hrtimer on the dying CPU at that point because
all other CPUs spin in stop_machine() with interrupts disabled and once the
operation is finished the CPU is marked offline.
Reported-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liu Tie <liutie4@huawei.com>
Link: https://lore.kernel.org/r/87a5rphara.ffs@tglx
Signed-off-by: Sasha Levin <sashal@kernel.org>
Try to mitigate potential future driver core api changes by adding a
padding to struct hrtimer.
Based on a change made to the RHEL/CENTOS 8 kernel.
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5432e05386265281d993199599c6f9dcd17a9daf
Try to mitigate potential future driver core api changes by adding a
padding to struct hrtimer.
Based on a change made to the RHEL/CENTOS 8 kernel.
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5432e05386265281d993199599c6f9dcd17a9daf
* refs/heads/tmp-5d7443e:
Linux 4.19.93
spi: fsl: use platform_get_irq() instead of of_irq_to_resource()
pinctrl: baytrail: Really serialize all register accesses
tty/serial: atmel: fix out of range clock divider handling
spi: fsl: don't map irq during probe
gtp: avoid zero size hashtable
gtp: fix an use-after-free in ipv4_pdp_find()
gtp: fix wrong condition in gtp_genl_dump_pdp()
tcp: do not send empty skb from tcp_write_xmit()
tcp/dccp: fix possible race __inet_lookup_established()
net: marvell: mvpp2: phylink requires the link interrupt
gtp: do not allow adding duplicate tid and ms_addr pdp context
net/dst: do not confirm neighbor for vxlan and geneve pmtu update
sit: do not confirm neighbor when do pmtu update
vti: do not confirm neighbor when do pmtu update
tunnel: do not confirm neighbor when do pmtu update
net/dst: add new function skb_dst_update_pmtu_no_confirm
gtp: do not confirm neighbor when do pmtu update
ip6_gre: do not confirm neighbor when do pmtu update
net: add bool confirm_neigh parameter for dst_ops.update_pmtu
vhost/vsock: accept only packets with the right dst_cid
udp: fix integer overflow while computing available space in sk_rcvbuf
tcp: Fix highest_sack and highest_sack_seq
ptp: fix the race between the release of ptp_clock and cdev
net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs
net/mlxfw: Fix out-of-memory error in mfa2 flash burning
net: ena: fix napi handler misbehavior when the napi budget is zero
hrtimer: Annotate lockless access to timer->state
net: icmp: fix data-race in cmp_global_allow()
net: add a READ_ONCE() in skb_peek_tail()
inetpeer: fix data-race in inet_putpeer / inet_putpeer
netfilter: bridge: make sure to pull arp header in br_nf_forward_arp()
6pack,mkiss: fix possible deadlock
netfilter: ebtables: compat: reject all padding in matches/watchers
filldir[64]: remove WARN_ON_ONCE() for bad directory entries
Make filldir[64]() verify the directory entry filename is valid
perf strbuf: Remove redundant va_end() in strbuf_addv()
bonding: fix active-backup transition after link failure
ALSA: hda - Downgrade error message for single-cmd fallback
netfilter: nf_queue: enqueue skbs with NULL dst
net, sysctl: Fix compiler warning when only cBPF is present
x86/mce: Fix possibly incorrect severity calculation on AMD
Revert "powerpc/vcpu: Assume dedicated processors as non-preempt"
userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
kernel: sysctl: make drop_caches write-only
mailbox: imx: Fix Tx doorbell shutdown path
ocfs2: fix passing zero to 'PTR_ERR' warning
s390/cpum_sf: Check for SDBT and SDB consistency
libfdt: define INT32_MAX and UINT32_MAX in libfdt_env.h
s390/zcrypt: handle new reply code FILTERED_BY_HYPERVISOR
perf regs: Make perf_reg_name() return "unknown" instead of NULL
perf script: Fix brstackinsn for AUXTRACE
cdrom: respect device capabilities during opening action
powerpc: Don't add -mabi= flags when building with Clang
scripts/kallsyms: fix definitely-lost memory leak
apparmor: fix unsigned len comparison with less than zero
gpio: mpc8xxx: Don't overwrite default irq_set_type callback
scsi: target: iscsi: Wait for all commands to finish before freeing a session
scsi: iscsi: Don't send data to unbound connection
scsi: NCR5380: Add disconnect_mask module parameter
scsi: scsi_debug: num_tgts must be >= 0
scsi: ufs: Fix error handing during hibern8 enter
scsi: pm80xx: Fix for SATA device discovery
watchdog: Fix the race between the release of watchdog_core_data and cdev
HID: rmi: Check that the RMI_STARTED bit is set before unregistering the RMI transport device
HID: Improve Windows Precision Touchpad detection.
libnvdimm/btt: fix variable 'rc' set but not used
ARM: 8937/1: spectre-v2: remove Brahma-B53 from hardening
HID: logitech-hidpp: Silence intermittent get_battery_capacity errors
HID: quirks: Add quirk for HP MSU1465 PIXART OEM mouse
bcache: at least try to shrink 1 node in bch_mca_scan()
clk: pxa: fix one of the pxa RTC clocks
scsi: atari_scsi: sun3_scsi: Set sg_tablesize to 1 instead of SG_NONE
powerpc/security: Fix wrong message when RFI Flush is disable
PCI: rpaphp: Correctly match ibm, my-drc-index to drc-name when using drc-info
PCI: rpaphp: Annotate and correctly byte swap DRC properties
PCI: rpaphp: Don't rely on firmware feature to imply drc-info support
powerpc/pseries/cmm: Implement release() function for sysfs device
scsi: ufs: fix potential bug which ends in system hang
PCI: rpaphp: Fix up pointer to first drc-info entry
scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferences
fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
irqchip: ingenic: Error out if IRQ domain creation failed
irqchip/irq-bcm7038-l1: Enable parent IRQ if necessary
clk: clk-gpio: propagate rate change to parent
clk: qcom: Allow constant ratio freq tables for rcg
f2fs: fix to update dir's i_pino during cross_rename
scsi: lpfc: Fix duplicate unreg_rpi error in port offline flow
scsi: tracing: Fix handling of TRANSFER LENGTH == 0 for READ(6) and WRITE(6)
jbd2: Fix statistics for the number of logged blocks
ext4: iomap that extends beyond EOF should be marked dirty
ext4: update direct I/O read lock pattern for IOCB_NOWAIT
powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warning
powerpc/security/book3s64: Report L1TF status in sysfs
clocksource/drivers/timer-of: Use unique device name instead of timer
clocksource/drivers/asm9260: Add a check for of_clk_get
leds: lm3692x: Handle failure to probe the regulator
dma-debug: add a schedule point in debug_dma_dump_mappings()
powerpc/tools: Don't quote $objdump in scripts
powerpc/pseries: Don't fail hash page table insert for bolted mapping
powerpc/pseries: Mark accumulate_stolen_time() as notrace
scsi: hisi_sas: Replace in_softirq() check in hisi_sas_task_exec()
scsi: csiostor: Don't enable IRQs too early
scsi: lpfc: Fix SLI3 hba in loop mode not discovering devices
scsi: target: compare full CHAP_A Algorithm strings
dmaengine: xilinx_dma: Clear desc_pendingcount in xilinx_dma_reset
iommu/tegra-smmu: Fix page tables in > 4 GiB memory
iommu: rockchip: Free domain on .domain_free
f2fs: fix to update time in lazytime mode
Input: atmel_mxt_ts - disable IRQ across suspend
scsi: lpfc: Fix locking on mailbox command completion
scsi: mpt3sas: Fix clear pending bit in ioctl status
scsi: lpfc: Fix discovery failures when target device connectivity bounces
Conflicts:
drivers/scsi/ufs/ufshcd.c
kernel/time/hrtimer.c
Change-Id: I9b69dbbc7cd2efa1c016ea10091fed73010b4c32
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
To isolate CPUs (isolate from hrtimers) from sysfs using cpusets, we need some
support from the hrtimer core. i.e. A routine hrtimer_quiesce_cpu() which would
migrate away all the unpinned hrtimers, but shouldn't touch the pinned ones.
This patch creates this routine.
Change-Id: Icf93622bdeed9c6c28eb1aca1637bc5b9522a533
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[forward port to 3.18]
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: d4d50a0ddc35e58ee95137ba4d14e74fea8b682f
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-4.9]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[satyap@codeaurora.org: Port to msm-4.19 and fix merge conflicts]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
'Pinned' information would be required in migrate_hrtimers() now, as we can
migrate non-pinned timers away without a hotplug (i.e. with cpuset.quiesce). And
so we may need to identify pinned timers now, as we can't migrate them.
This patch reuses the timer->state variable for setting this flag as there were
enough number of free bits available in this variable. And there is no point
increasing size of this struct by adding another field.
Change-Id: Id2cdd74903598e31fab6999c4cbab9c2fd17fe23
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[forward port to 3.18]
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: 62feaf1ed0b64c04868d143d8bdb92d60dc3189b
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[markivx: Port to 4.14, fix merge conflicts due to lack of timer stats code]
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
[satyap@codeaurora.org: Port to 4.19, fix merge conflicts]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
Revert commits
92af4dcb4e ("tracing: Unify the "boot" and "mono" tracing clocks")
127bfa5f43 ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior")
7250a4047a ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior")
d6c7270e91 ("timekeeping: Remove boot time specific code")
f2d6fdbfd2 ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior")
d6ed449afd ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock")
72199320d4 ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock")
As stated in the pull request for the unification of CLOCK_MONOTONIC and
CLOCK_BOOTTIME, it was clear that we might have to revert the change.
As reported by several folks systemd and other applications rely on the
documented behaviour of CLOCK_MONOTONIC on Linux and break with the above
changes. After resume daemons time out and other timeout related issues are
observed. Rafael compiled this list:
* systemd kills daemons on resume, after >WatchdogSec seconds
of suspending (Genki Sky). [Verified that that's because systemd uses
CLOCK_MONOTONIC and expects it to not include the suspend time.]
* systemd-journald misbehaves after resume:
systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal
corrupted or uncleanly shut down, renaming and replacing.
(Mike Galbraith).
* NetworkManager reports "networking disabled" and networking is broken
after resume 50% of the time (Pavel). [May be because of systemd.]
* MATE desktop dims the display and starts the screensaver right after
system resume (Pavel).
* Full system hang during resume (me). [May be due to systemd or NM or both.]
That happens on debian and open suse systems.
It's sad, that these problems were neither catched in -next nor by those
folks who expressed interest in this change.
Reported-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Reported-by: Genki Sky <sky@genki.is>,
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
* pm-cpuidle:
tick-sched: avoid a maybe-uninitialized warning
cpuidle: Add definition of residency to sysfs documentation
time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
nohz: Avoid duplication of code related to got_idle_tick
nohz: Gather tick_sched booleans under a common flag field
cpuidle: menu: Avoid selecting shallow states with stopped tick
cpuidle: menu: Refine idle state selection for running tick
sched: idle: Select idle state before stopping the tick
time: hrtimer: Introduce hrtimer_next_event_without()
time: tick-sched: Split tick_nohz_stop_sched_tick()
cpuidle: Return nohz hint from cpuidle_select()
jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
sched: idle: Do not stop the tick before cpuidle_idle_call()
sched: idle: Do not stop the tick upfront in the idle loop
time: tick-sched: Reorganize idle tick management code
* pm-qos:
PM / QoS: mark expected switch fall-throughs
The next set of changes will need to compute the time to the next
hrtimer event over all hrtimers except for the scheduler tick one.
To that end introduce a new helper function,
hrtimer_next_event_without(), for computing the time until the next
hrtimer event over all timers except for one and modify the underlying
code in __hrtimer_next_event_base() to prepare it for being called by
that new function.
No intentional changes in functionality.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
hrtimer callbacks are always invoked in hard interrupt context. Several
users in tree require soft interrupt context for their callbacks and
achieve this by combining a hrtimer with a tasklet. The hrtimer schedules
the tasklet in hard interrupt context and the tasklet callback gets invoked
in softirq context later.
That's suboptimal and aside of that the real-time patch moves most of the
hrtimers into softirq context. So adding native support for hrtimers
expiring in softirq context is a valuable extension for both mainline and
the RT patch set.
Each valid hrtimer clock id has two associated hrtimer clock bases: one for
timers expiring in hardirq context and one for timers expiring in softirq
context.
Implement the functionality to associate a hrtimer with the hard or softirq
related clock bases and update the relevant functions to take them into
account when the next expiry time needs to be evaluated.
Add a check into the hard interrupt context handler functions to check
whether the first expiring softirq based timer has expired. If it's expired
the softirq is raised and the accounting of softirq based timers to
evaluate the next expiry time for programming the timer hardware is skipped
until the softirq processing has finished. At the end of the softirq
processing the regular processing is resumed.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-29-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently hrtimer callback functions are always executed in hard interrupt
context. Users of hrtimers, which need their timer function to be executed
in soft interrupt context, make use of tasklets to get the proper context.
Add additional hrtimer clock bases for timers which must expire in softirq
context, so the detour via the tasklet can be avoided. This is also
required for RT, where the majority of hrtimer is moved into softirq
hrtimer context.
The selection of the expiry mode happens via a mode bit. Introduce
HRTIMER_MODE_SOFT and the matching combinations with the ABS/REL/PINNED
bits and update the decoding of hrtimer_mode in tracepoints.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-27-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_cpu_base.next_timer stores the pointer to the next expiring timer
in a CPU base.
This pointer cannot be dereferenced and is solely used to check whether a
hrtimer which is removed is the hrtimer which is the first to expire in the
CPU base. If this is the case, then the timer hardware needs to be
reprogrammed to avoid an extra interrupt for nothing.
Again, this is conditional functionality, but there is no compelling reason
to make this conditional. As a preparation, hrtimer_cpu_base.next_timer
needs to be available unconditonally.
Aside of that the upcoming support for softirq based hrtimers requires access
to this pointer unconditionally as well, so our motivation is not entirely
simplicity based.
Make the update of hrtimer_cpu_base.next_timer unconditional and remove the
#ifdef cruft. The impact on CONFIG_HIGH_RES_TIMERS=n && CONFIG_NOHZ=n is
marginal as it's just a store on an already dirtied cacheline.
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-17-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_cpu_base.expires_next is used to cache the next event armed in the
timer hardware. The value is used to check whether an hrtimer can be
enqueued remotely. If the new hrtimer is expiring before expires_next, then
remote enqueue is not possible as the remote hrtimer hardware cannot be
accessed for reprogramming to an earlier expiry time.
The remote enqueue check is currently conditional on
CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active. There is no
compelling reason to make this conditional.
Move hrtimer_cpu_base.expires_next out of the CONFIG_HIGH_RES_TIMERS=y
guarded area and remove the conditionals in hrtimer_check_target().
The check is currently a NOOP for the CONFIG_HIGH_RES_TIMERS=n and the
!hrtimer_cpu_base.hres_active case because in these cases nothing updates
hrtimer_cpu_base.expires_next yet. This will be changed with later patches
which further reduce the #ifdef zoo in this code.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-16-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The pointer to the currently running timer is stored in hrtimer_cpu_base
before the base lock is dropped and the callback is invoked.
This results in two levels of indirections and the upcoming support for
softirq based hrtimer requires splitting the "running" storage into soft
and hard IRQ context expiry.
Storing both in the cpu base would require conditionals in all code paths
accessing that information.
It's possible to have a per clock base sequence count and running pointer
without changing the semantics of the related mechanisms because the timer
base pointer cannot be changed while a timer is running the callback.
Unfortunately this makes cpu_clock base larger than 32 bytes on 32-bit
kernels. Instead of having huge gaps due to alignment, remove the alignment
and let the compiler pack CPU base for 32-bit kernels. The resulting cache access
patterns are fortunately not really different from the current
behaviour. On 64-bit kernels the 64-byte alignment stays and the behaviour is
unchanged. This was determined by analyzing the resulting layout and
looking at the number of cache lines involved for the frequently used
clocks.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-12-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
schedule_hrtimeout_range_clock() uses an 'int clock' parameter for the
clock ID, instead of the customary predefined "clockid_t" type.
In hrtimer coding style the canonical variable name for the clock ID is
'clock_id', therefore change the name of the parameter here as well
to make it all consistent.
While at it, clean up the description for the 'clock_id' and 'mode'
function parameters. The clock modes and the clock IDs are not
restricted as the comment suggests.
Fix the mode description as well for the callers of schedule_hrtimeout_range_clock().
No functional changes intended.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-5-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Usage of these apis and their compat versions makes
the syscalls: clock_nanosleep and nanosleep and
their compat implementations simpler.
This is a preparatory patch to isolate data conversions to
struct timespec64 at userspace boundaries. This helps contain
the changes needed to transition to new y2038 safe types.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In our quest to simplify <linux/sched.h>'s header dependencies, remove
the <linux/wait.h> inclusion from <linux/hrtimer.h> - which does
not appear to be necessary, as hrtimer.h does not use waitqueues.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.
Get rid of the union and just keep ktime_t as simple typedef of type s64.
The conversion was done with coccinelle and some manual mopping up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
This patchset introduces a /proc/<pid>/timerslack_ns interface which
would allow controlling processes to be able to set the timerslack value
on other processes in order to save power by avoiding wakeups (Something
Android currently does via out-of-tree patches).
The first patch tries to fix the internal timer_slack_ns usage which was
defined as a long, which limits the slack range to ~4 seconds on 32bit
systems. It converts it to a u64, which provides the same basically
unlimited slack (500 years) on both 32bit and 64bit machines.
The second patch introduces the /proc/<pid>/timerslack_ns interface
which allows the full 64bit slack range for a task to be read or set on
both 32bit and 64bit machines.
With these two patches, on a 32bit machine, after setting the slack on
bash to 10 seconds:
$ time sleep 1
real 0m10.747s
user 0m0.001s
sys 0m0.005s
The first patch is a little ugly, since I had to chase the slack delta
arguments through a number of functions converting them to u64s. Let me
know if it makes sense to break that up more or not.
Other than that things are fairly straightforward.
This patch (of 2):
The timer_slack_ns value in the task struct is currently a unsigned
long. This means that on 32bit applications, the maximum slack is just
over 4 seconds. However, on 64bit machines, its much much larger (~500
years).
This disparity could make application development a little (as well as
the default_slack) to a u64. This means both 32bit and 64bit systems
have the same effective internal slack range.
Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify
the interface as a unsigned long, so we preserve that limitation on
32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned
long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
actually larger then what can be stored by an unsigned long.
This patch also modifies hrtimer functions which specified the slack
delta as a unsigned long.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If CONFIG_TIME_LOW_RES is enabled we add a jiffie to the relative timeout to
prevent short sleeps, but we do not account for that in interfaces which
retrieve the remaining time.
Helge observed that timerfd can return a remaining time larger than the
relative timeout. That's not expected and breaks userland test programs.
Store the information that the timer was armed relative and provide functions
to adjust the remaining time. To avoid bloating the hrtimer struct make state
a u8, which as a bonus results in better code on x86 at least.
Reported-and-tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: dhowells@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160114164159.273328486@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If nohz is disabled on the kernel command line the [hr]timer code
still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
avoid the overhead.
Before:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
After:
48.73% hog [.] main
15.36% [kernel] [k] _raw_spin_lock_irqsave
9.77% [kernel] [k] _raw_spin_unlock_irqrestore
6.61% [kernel] [k] lock_timer_base.isra.38
6.42% [kernel] [k] mod_timer
3.90% [kernel] [k] detach_if_pending
3.76% [kernel] [k] del_timer
2.41% [kernel] [k] internal_add_timer
1.39% [kernel] [k] __internal_add_timer
0.76% [kernel] [k] timerfn
We probably should have a cached value for nohz full in the per cpu
bases as well to avoid the cpumask check. The base cache line is hot
already, the cpumask not necessarily.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.
We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.
The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.
With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.
Before:
47.76% hog [.] main
14.84% [kernel] [k] _raw_spin_lock_irqsave
9.55% [kernel] [k] _raw_spin_unlock_irqrestore
6.71% [kernel] [k] mod_timer
6.24% [kernel] [k] lock_timer_base.isra.38
3.76% [kernel] [k] detach_if_pending
3.71% [kernel] [k] del_timer
2.50% [kernel] [k] internal_add_timer
1.51% [kernel] [k] get_nohz_timer_target
1.28% [kernel] [k] __internal_add_timer
0.78% [kernel] [k] timerfn
0.48% [kernel] [k] wake_up_nohz_cpu
After:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
... in the !CONFIG_HIGH_RES_TIMERS case too. And thus fix warnings like
this one:
net/sched/sch_api.c: In function ‘psched_show’:
net/sched/sch_api.c:1891:6: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
(u32)NSEC_PER_SEC / hrtimer_resolution);
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1433583000-32090-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>