Changes in 4.4.181: (244 commits)
x86/speculation/mds: Revert CPU buffer clear on double fault exit
x86/speculation/mds: Improve CPU buffer clear documentation
ARM: exynos: Fix a leaked reference by adding missing of_node_put
crypto: vmx - fix copy-paste error in CTR mode
crypto: crct10dif-generic - fix use via crypto_shash_digest()
crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
ALSA: usb-audio: Fix a memory leak bug
ALSA: hda/hdmi - Consider eld_valid when reporting jack event
ALSA: hda/realtek - EAPD turn on later
ASoC: max98090: Fix restore of DAPM Muxes
ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
mm/mincore.c: make mincore() more conservative
ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
ext4: actually request zeroing of inode table after grow
ext4: fix ext4_show_options for file systems w/o journal
Btrfs: do not start a transaction at iterate_extent_inodes()
bcache: fix a race between cache register and cacheset unregister
bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
ipmi:ssif: compare block number correctly for multi-part return messages
crypto: gcm - Fix error return code in crypto_gcm_create_common()
crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
crypto: chacha20poly1305 - set cra_name correctly
crypto: salsa20 - don't access already-freed walk.iv
crypto: arm/aes-neonbs - don't access already-freed walk.iv
writeback: synchronize sync(2) against cgroup writeback membership switches
fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
ext4: zero out the unused memory region in the extent tree block
ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
net: avoid weird emergency message
net/mlx4_core: Change the error print to info print
ppp: deflate: Fix possible crash in deflate_init
tipc: switch order of device registration to fix a crash
tipc: fix modprobe tipc failed after switch order of device registration
stm class: Fix channel free in stm output free path
md: add mddev->pers to avoid potential NULL pointer dereference
intel_th: msu: Fix single mode with IOMMU
of: fix clang -Wunsequenced for be32_to_cpu()
cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level()
media: ov6650: Fix sensor possibly not detected on probe
NFS4: Fix v4.0 client state corruption when mount
clk: tegra: Fix PLLM programming on Tegra124+ when PMC overrides divider
fuse: fix writepages on 32bit
fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114
ceph: flush dirty inodes before proceeding with remount
tracing: Fix partial reading of trace event's id file
memory: tegra: Fix integer overflow on tick value calculation
perf intel-pt: Fix instructions sampling rate
perf intel-pt: Fix improved sample timestamp
perf intel-pt: Fix sample timestamp wrt non-taken branches
fbdev: sm712fb: fix brightness control on reboot, don't set SR30
fbdev: sm712fb: fix VRAM detection, don't set SR70/71/74/75
fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F
fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA
fbdev: sm712fb: fix crashes during framebuffer writes by correctly mapping VRAM
fbdev: sm712fb: fix support for 1024x768-16 mode
fbdev: sm712fb: use 1024x768 by default on non-MIPS, fix garbled display
fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting
PCI: Mark Atheros AR9462 to avoid bus reset
dm delay: fix a crash when invalid device is specified
xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink
xfrm6_tunnel: Fix potential panic when unloading xfrm6_tunnel module
vti4: ipip tunnel deregistration fixes.
xfrm4: Fix uninitialized memory read in _decode_session4
KVM: arm/arm64: Ensure vcpu target is unset on reset failure
power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
perf bench numa: Add define for RUSAGE_THREAD if not present
Revert "Don't jump to compute_result state from check_result state"
md/raid: raid5 preserve the writeback action after the parity check
btrfs: Honour FITRIM range constraints during free space trim
fbdev: sm712fb: fix memory frequency by avoiding a switch/case fallthrough
ext4: do not delete unlinked inode from orphan list on failed truncate
KVM: x86: fix return value for reserved EFER
bio: fix improper use of smp_mb__before_atomic()
Revert "scsi: sd: Keep disk read-only when re-reading partition"
crypto: vmx - CTR: always increment IV as quadword
gfs2: Fix sign extension bug in gfs2_update_stats
Btrfs: fix race between ranged fsync and writeback of adjacent ranges
btrfs: sysfs: don't leak memory when failing add fsid
fbdev: fix divide error in fb_var_to_videomode
hugetlb: use same fault hash key for shared and private mappings
fbdev: fix WARNING in __alloc_pages_nodemask bug
media: cpia2: Fix use-after-free in cpia2_exit
media: vivid: use vfree() instead of kfree() for dev->bitmap_cap
ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit
at76c50x-usb: Don't register led_trigger if usb_register_driver failed
perf tools: No need to include bitops.h in util.h
tools include: Adopt linux/bits.h
gfs2: Fix lru_count going negative
cxgb4: Fix error path in cxgb4_init_module
mmc: core: Verify SD bus width
powerpc/boot: Fix missing check of lseek() return value
ASoC: imx: fix fiq dependencies
spi: pxa2xx: fix SCR (divisor) calculation
brcm80211: potential NULL dereference in brcmf_cfg80211_vndr_cmds_dcmd_handler()
rtc: 88pm860x: prevent use-after-free on device remove
w1: fix the resume command API
dmaengine: pl330: _stop: clear interrupt status
mac80211/cfg80211: update bss channel on channel switch
ASoC: fsl_sai: Update is_slave_mode with correct value
mwifiex: prevent an array overflow
net: cw1200: fix a NULL pointer dereference
bcache: return error immediately in bch_journal_replay()
bcache: fix failure in journal relplay
bcache: add failure check to run_cache_set() for journal replay
bcache: avoid clang -Wunintialized warning
x86/build: Move _etext to actual end of .text
smpboot: Place the __percpu annotation correctly
x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault()
mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions
HID: logitech-hidpp: use RAP instead of FAP to get the protocol version
pinctrl: pistachio: fix leaked of_node references
dmaengine: at_xdmac: remove BUG_ON macro in tasklet
media: coda: clear error return value before picture run
media: ov6650: Move v4l2_clk_get() to ov6650_video_probe() helper
media: au0828: stop video streaming only when last user stops
media: ov2659: make S_FMT succeed even if requested format doesn't match
audit: fix a memory leak bug
media: au0828: Fix NULL pointer dereference in au0828_analog_stream_enable()
media: pvrusb2: Prevent a buffer overflow
powerpc/numa: improve control of topology updates
sched/core: Check quota and period overflow at usec to nsec conversion
sched/core: Handle overflow in cpu_shares_write_u64
USB: core: Don't unbind interfaces following device reset failure
x86/irq/64: Limit IST stack overflow check to #DB stack
i40e: don't allow changes to HW VLAN stripping on active port VLANs
RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure
hwmon: (vt1211) Use request_muxed_region for Super-IO accesses
hwmon: (smsc47m1) Use request_muxed_region for Super-IO accesses
hwmon: (smsc47b397) Use request_muxed_region for Super-IO accesses
hwmon: (pc87427) Use request_muxed_region for Super-IO accesses
hwmon: (f71805f) Use request_muxed_region for Super-IO accesses
scsi: libsas: Do discovery on empty PHY to update PHY info
mmc_spi: add a status check for spi_sync_locked
mmc: sdhci-of-esdhc: add erratum eSDHC5 support
mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support
PM / core: Propagate dev->power.wakeup_path when no callbacks
extcon: arizona: Disable mic detect if running when driver is removed
s390: cio: fix cio_irb declaration
cpufreq: ppc_cbe: fix possible object reference leak
cpufreq/pasemi: fix possible object reference leak
cpufreq: pmac32: fix possible object reference leak
x86/build: Keep local relocations with ld.lld
iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion
iio: hmc5843: fix potential NULL pointer dereferences
iio: common: ssp_sensors: Initialize calculated_time in ssp_common_process_data
rtlwifi: fix a potential NULL pointer dereference
brcmfmac: fix missing checks for kmemdup
b43: shut up clang -Wuninitialized variable warning
brcmfmac: convert dev_init_lock mutex to completion
brcmfmac: fix race during disconnect when USB completion is in progress
scsi: ufs: Fix regulator load and icc-level configuration
scsi: ufs: Avoid configuring regulator with undefined voltage range
arm64: cpu_ops: fix a leaked reference by adding missing of_node_put
x86/ia32: Fix ia32_restore_sigcontext() AC leak
chardev: add additional check for minor range overlap
HID: core: move Usage Page concatenation to Main item
ASoC: eukrea-tlv320: fix a leaked reference by adding missing of_node_put
ASoC: fsl_utils: fix a leaked reference by adding missing of_node_put
cxgb3/l2t: Fix undefined behaviour
spi: tegra114: reset controller on probe
media: wl128x: prevent two potential buffer overflows
virtio_console: initialize vtermno value for ports
tty: ipwireless: fix missing checks for ioremap
rcutorture: Fix cleanup path for invalid torture_type strings
usb: core: Add PM runtime calls to usb_hcd_platform_shutdown
scsi: qla4xxx: avoid freeing unallocated dma memory
media: m88ds3103: serialize reset messages in m88ds3103_set_frontend
media: go7007: avoid clang frame overflow warning with KASAN
media: saa7146: avoid high stack usage with clang
scsi: lpfc: Fix SLI3 commands being issued on SLI4 devices
spi : spi-topcliff-pch: Fix to handle empty DMA buffers
spi: rspi: Fix sequencer reset during initialization
spi: Fix zero length xfer bug
ASoC: davinci-mcasp: Fix clang warning without CONFIG_PM
ipv6: Consider sk_bound_dev_if when binding a raw socket to an address
llc: fix skb leak in llc_build_and_send_ui_pkt()
net-gro: fix use-after-free read in napi_gro_frags()
net: stmmac: fix reset gpio free missing
usbnet: fix kernel crash after disconnect
tipc: Avoid copying bytes beyond the supplied data
bnxt_en: Fix aggregation buffer leak under OOM condition.
net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
crypto: vmx - ghash: do nosimd fallback manually
xen/pciback: Don't disable PCI_COMMAND on PCI device reset.
Revert "tipc: fix modprobe tipc failed after switch order of device registration"
tipc: fix modprobe tipc failed after switch order of device registration -v2
sparc64: Fix regression in non-hypervisor TLB flush xcall
include/linux/bitops.h: sanitize rotate primitives
xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()
usb: xhci: avoid null pointer deref when bos field is NULL
USB: Fix slab-out-of-bounds write in usb_get_bos_descriptor
USB: sisusbvga: fix oops in error path of sisusb_probe
USB: Add LPM quirk for Surface Dock GigE adapter
USB: rio500: refuse more than one device at a time
USB: rio500: fix memory leak in close after disconnect
media: usb: siano: Fix general protection fault in smsusb
media: usb: siano: Fix false-positive "uninitialized variable" warning
media: smsusb: better handle optional alignment
scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove
scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)
Btrfs: fix race updating log root item during fsync
ALSA: hda/realtek - Set default power save node to 0
drm/nouveau/i2c: Disable i2c bus access after ->fini()
tty: serial: msm_serial: Fix XON/XOFF
tty: max310x: Fix external crystal register setup
memcg: make it work on sparse non-0-node systems
kernel/signal.c: trace_signal_deliver when signal_group_exit
CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM
binder: Replace "%p" with "%pK" for stable
binder: replace "%p" with "%pK"
net: create skb_gso_validate_mac_len()
bnx2x: disable GSO where gso_size is too big for hardware
brcmfmac: Add length checks on firmware events
brcmfmac: screening firmware event packet
brcmfmac: revise handling events in receive path
brcmfmac: fix incorrect event channel deduction
brcmfmac: add length checks in scheduled scan result handler
brcmfmac: add subtype check for event handling in data path
userfaultfd: don't pin the user memory in userfaultfd_file_create()
Revert "x86/build: Move _etext to actual end of .text"
net: cdc_ncm: GetNtbFormat endian fix
usb: gadget: fix request length error for isoc transfer
media: uvcvideo: Fix uvc_alloc_entity() allocation alignment
ethtool: fix potential userspace buffer overflow
neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit
net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query
net: rds: fix memory leak in rds_ib_flush_mr_pool
pktgen: do not sleep with the thread lock held.
rcu: locking and unlocking need to always be at least barriers
parisc: Use implicit space register selection for loading the coherence index of I/O pdirs
fuse: fallocate: fix return with locked inode
MIPS: pistachio: Build uImage.gz by default
genwqe: Prevent an integer overflow in the ioctl
drm/gma500/cdv: Check vbt config bits when detecting lvds panels
fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
fuse: Add FOPEN_STREAM to use stream_open()
ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled
ethtool: check the return value of get_regs_len
Linux 4.4.181
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Conflicts:
drivers/android/binder.c
[ Upstream commit b813afae7ab6a5e91b4e16cc567331d9c2ae1f04 ]
If the specified rcutorture.torture_type is not in the rcu_torture_init()
function's torture_ops[] array, rcutorture prints some console messages
and then invokes rcu_torture_cleanup() to set state so that a future
torture test can run. However, rcu_torture_cleanup() also attempts to
end the test that didn't actually start, and in doing so relies on the
value of cur_ops, a value that is not particularly relevant in this case.
This can result in confusing output or even follow-on failures due to
attempts to use facilities that have not been properly initialized.
This commit therefore sets the value of cur_ops to NULL in this case
and inserts a check near the beginning of rcu_torture_cleanup(),
thus avoiding relying on an irrelevant cur_ops value.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 4.4.177: (231 commits)
ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES
KEYS: allow reaching the keys quotas exactly
mfd: ti_am335x_tscadc: Use PLATFORM_DEVID_AUTO while registering mfd cells
mfd: twl-core: Fix section annotations on {,un}protect_pm_master
mfd: db8500-prcmu: Fix some section annotations
mfd: ab8500-core: Return zero in get_register_interruptible()
mfd: qcom_rpm: write fw_version to CTRL_REG
mfd: wm5110: Add missing ASRC rate register
mfd: mc13xxx: Fix a missing check of a register-read failure
net: hns: Fix use after free identified by SLUB debug
MIPS: ath79: Enable OF serial ports in the default config
scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param
scsi: isci: initialize shost fully before calling scsi_add_host()
MIPS: jazz: fix 64bit build
isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
atm: he: fix sign-extension overflow on large shift
leds: lp5523: fix a missing check of return value of lp55xx_read
isdn: avm: Fix string plus integer warning from Clang
RDMA/srp: Rework SCSI device reset handling
KEYS: user: Align the payload buffer
KEYS: always initialize keyring_index_key::desc_len
batman-adv: fix uninit-value in batadv_interface_tx()
net/packet: fix 4gb buffer limit due to overflow check
team: avoid complex list operations in team_nl_cmd_options_set()
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
ARCv2: Enable unaligned access in early ASM code
Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
libceph: handle an empty authorize reply
scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
drm/msm: Unblock writer if reader closes file
ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field
ALSA: compress: prevent potential divide by zero bugs
thermal: int340x_thermal: Fix a NULL vs IS_ERR() check
usb: dwc3: gadget: Fix the uninitialized link_state when udc starts
usb: gadget: Potential NULL dereference on allocation error
ASoC: dapm: change snprintf to scnprintf for possible overflow
ASoC: imx-audmux: change snprintf to scnprintf for possible overflow
ARC: fix __ffs return value to avoid build warnings
mac80211: fix miscounting of ttl-dropped frames
serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling
scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state()
net: altera_tse: fix connect_local_phy error path
ibmveth: Do not process frames after calling napi_reschedule
mac80211: don't initiate TDLS connection if station is not associated to AP
cfg80211: extend range deviation for DMG
KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
arm/arm64: KVM: Feed initialized memory to MMIO accesses
KVM: arm/arm64: Fix MMIO emulation data handling
powerpc: Always initialize input array when calling epapr_hypercall()
mmc: spi: Fix card detection during probe
mm: enforce min addr even if capable() in expand_downwards()
x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
USB: serial: option: add Telit ME910 ECM composition
USB: serial: cp210x: add ID for Ingenico 3070
USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485
cpufreq: Use struct kobj_attribute instead of struct global_attr
sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
ncpfs: fix build warning of strncpy
isdn: isdn_tty: fix build warning of strncpy
staging: lustre: fix buffer overflow of string buffer
net-sysfs: Fix mem leak in netdev_register_kobject
sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
team: Free BPF filter when unregistering netdev
bnxt_en: Drop oversize TX packets to prevent errors.
net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails
xen-netback: fix occasional leak of grant ref mappings under memory pressure
net: Add __icmp_send helper.
net: avoid use IPCB in cipso_v4_error
net: phy: Micrel KSZ8061: link failure after cable connect
x86/CPU/AMD: Set the CPB bit unconditionally on F17h
applicom: Fix potential Spectre v1 vulnerabilities
MIPS: irq: Allocate accurate order pages for irq stack
hugetlbfs: fix races and page leaks during migration
netlabel: fix out-of-bounds memory accesses
net: dsa: mv88e6xxx: Fix u64 statistics
ip6mr: Do not call __IP6_INC_STATS() from preemptible context
media: uvcvideo: Fix 'type' check leading to overflow
vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel
perf tools: Handle TOPOLOGY headers with no CPU
IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
ipvs: Fix signed integer overflow when setsockopt timeout
iommu/amd: Fix IOMMU page flush when detach device from a domain
xtensa: SMP: fix ccount_timer_shutdown
xtensa: SMP: fix secondary CPU initialization
xtensa: smp_lx200_defconfig: fix vectors clash
xtensa: SMP: mark each possible CPU as present
xtensa: SMP: limit number of possible CPUs by NR_CPUS
net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case
net: hns: Fix wrong read accesses via Clause 45 MDIO protocol
net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()
gpio: vf610: Mask all GPIO interrupts
nfs: Fix NULL pointer dereference of dev_name
scsi: libfc: free skb when receiving invalid flogi resp
platform/x86: Fix unmet dependency warning for SAMSUNG_Q10
cifs: fix computation for MAX_SMB2_HDR_SIZE
x86/kexec: Don't setup EFI info if EFI runtime is not enabled
x86_64: increase stack size for KASAN_EXTRA
mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
autofs: drop dentry reference only when it is never used
autofs: fix error return in autofs_fill_super()
ARM: pxa: ssp: unneeded to free devm_ allocated data
irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
dmaengine: dmatest: Abort test in case of mapping error
s390/qeth: fix use-after-free in error path
perf symbols: Filter out hidden symbols from labels
MIPS: Remove function size check in get_frame_info()
Input: wacom_serial4 - add support for Wacom ArtPad II tablet
Input: elan_i2c - add id for touchpad found in Lenovo s21e-20
iscsi_ibft: Fix missing break in switch statement
futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()
ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU
Revert "x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls"
ARM: dts: exynos: Do not ignore real-world fuse values for thermal zone 0 on Exynos5420
udplite: call proper backlog handlers
netfilter: x_tables: enforce nul-terminated table name from getsockopt GET_ENTRIES
netfilter: nfnetlink_log: just returns error for unknown command
netfilter: nfnetlink_acct: validate NFACCT_FILTER parameters
netfilter: nf_conntrack_tcp: Fix stack out of bounds when parsing TCP options
KEYS: restrict /proc/keys by credentials at open time
l2tp: fix infoleak in l2tp_ip6_recvmsg()
net: hsr: fix memory leak in hsr_dev_finalize()
net: sit: fix UBSAN Undefined behaviour in check_6rd
net/x25: fix use-after-free in x25_device_event()
net/x25: reset state in x25_connect()
pptp: dst_release sk_dst_cache in pptp_sock_destruct
ravb: Decrease TxFIFO depth of Q3 and Q2 to one
route: set the deleted fnhe fnhe_daddr to 0 in ip_del_fnhe to fix a race
tcp: handle inet_csk_reqsk_queue_add() failures
net/mlx4_core: Fix reset flow when in command polling mode
net/mlx4_core: Fix qp mtt size calculation
net/x25: fix a race in x25_bind()
mdio_bus: Fix use-after-free on device_register fails
net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
missing barriers in some of unix_sock ->addr and ->path accesses
ipvlan: disallow userns cap_net_admin to change global mode/flags
vxlan: test dev->flags & IFF_UP before calling gro_cells_receive()
vxlan: Fix GRO cells race condition between receive and link delete
net/hsr: fix possible crash in add_timer()
gro_cells: make sure device is up in gro_cells_receive()
tcp/dccp: remove reqsk_put() from inet_child_forget()
ALSA: bebob: use more identical mod_alias for Saffire Pro 10 I/O against Liquid Saffire 56
fs/9p: use fscache mutex rather than spinlock
It's wrong to add len to sector_nr in raid10 reshape twice
media: videobuf2-v4l2: drop WARN_ON in vb2_warn_zero_bytesused()
9p: use inode->i_lock to protect i_size_write() under 32-bit
9p/net: fix memory leak in p9_client_create
ASoC: fsl_esai: fix register setting issue in RIGHT_J mode
stm class: Fix an endless loop in channel allocation
crypto: caam - fixed handling of sg list
crypto: ahash - fix another early termination in hash walk
gpu: ipu-v3: Fix i.MX51 CSI control registers offset
gpu: ipu-v3: Fix CSI offsets for imx53
s390/dasd: fix using offset into zero size array error
ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized
Input: matrix_keypad - use flush_delayed_work()
i2c: cadence: Fix the hold bit setting
Input: st-keyscan - fix potential zalloc NULL dereference
ARM: 8824/1: fix a migrating irq bug when hotplug cpu
assoc_array: Fix shortcut creation
scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
net: systemport: Fix reception of BPDUs
pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins
net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
ASoC: topology: free created components in tplg load error
arm64: Relax GIC version check during early boot
tmpfs: fix link accounting when a tmpfile is linked in
ARC: uacces: remove lp_start, lp_end from clobber list
phonet: fix building with clang
mac80211_hwsim: propagate genlmsg_reply return code
net: set static variable an initial value in atl2_probe()
tmpfs: fix uninitialized return value in shmem_link
stm class: Prevent division by zero
crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling
CIFS: Fix read after write for files with read caching
tracing: Do not free iter->trace in fail path of tracing_open_pipe()
ACPI / device_sysfs: Avoid OF modalias creation for removed device
regulator: s2mps11: Fix steps for buck7, buck8 and LDO35
regulator: s2mpa01: Fix step values for some LDOs
clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR
clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown
s390/virtio: handle find on invalid queue gracefully
scsi: virtio_scsi: don't send sc payload with tmfs
scsi: target/iscsi: Avoid iscsit_release_commands_from_conn() deadlock
m68k: Add -ffreestanding to CFLAGS
btrfs: ensure that a DUP or RAID1 block group has exactly two stripes
Btrfs: fix corruption reading shared and compressed extents after hole punching
crypto: pcbc - remove bogus memcpy()s with src == dest
cpufreq: tegra124: add missing of_node_put()
cpufreq: pxa2xx: remove incorrect __init annotation
ext4: fix crash during online resizing
ext2: Fix underflow in ext2_max_size()
clk: ingenic: Fix round_rate misbehaving with non-integer dividers
dmaengine: usb-dmac: Make DMAC system sleep callbacks explicit
mm/vmalloc: fix size check for remap_vmalloc_range_partial()
kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv
intel_th: Don't reference unassigned outputs
parport_pc: fix find_superio io compare code, should use equal test.
i2c: tegra: fix maximum transfer size
perf bench: Copy kernel files needed to build mem{cpy,set} x86_64 benchmarks
serial: 8250_pci: Fix number of ports for ACCES serial cards
serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup()
jbd2: clear dirty flag when revoking a buffer from an older transaction
jbd2: fix compile warning when using JBUFFER_TRACE
powerpc/32: Clear on-stack exception marker upon exception return
powerpc/wii: properly disable use of BATs when requested.
powerpc/powernv: Make opal log only readable by root
powerpc/83xx: Also save/restore SPRG4-7 during suspend
ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify
dm: fix to_sector() for 32bit
NFS41: pop some layoutget errors to application
perf intel-pt: Fix CYC timestamp calculation after OVF
perf auxtrace: Define auxtrace record alignment
perf intel-pt: Fix overlap calculation for padding
md: Fix failed allocation of md_register_thread
NFS: Fix an I/O request leakage in nfs_do_recoalesce
NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()
nfsd: fix memory corruption caused by readdir
nfsd: fix wrong check in write_v4_end_grace()
PM / wakeup: Rework wakeup source timer cancellation
rcu: Do RCU GP kthread self-wakeup from softirq and interrupt
media: uvcvideo: Avoid NULL pointer dereference at the end of streaming
drm/radeon/evergreen_cs: fix missing break in switch statement
KVM: nVMX: Sign extend displacements of VMX instr's mem operands
KVM: nVMX: Ignore limit checks on VMX instructions using flat segments
KVM: X86: Fix residual mmio emulation request to userspace
Linux 4.4.177
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Conflicts:
arch/arm/kernel/irq.c
sound/core/compress_offload.c
commit 1d1f898df6586c5ea9aeaf349f13089c6fa37903 upstream.
The rcu_gp_kthread_wake() function is invoked when it might be necessary
to wake the RCU grace-period kthread. Because self-wakeups are normally
a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from
this kthread, it naturally refuses to do the wakeup.
Unfortunately, natural though it might be, this heuristic fails when
rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
that interrupted the grace-period kthread just after the final check of
the wait-event condition but just before the schedule() call. In this
case, a wakeup is required, even though the call to rcu_gp_kthread_wake()
is within the RCU grace-period kthread's context. Failing to provide
this wakeup can result in grace periods failing to start, which in turn
results in out-of-memory conditions.
This race window is quite narrow, but it actually did happen during real
testing. It would of course need to be fixed even if it was strictly
theoretical in nature.
This patch does not Cc stable because it does not apply cleanly to
earlier kernel versions.
Fixes: 48a7639ce8 ("rcu: Make callers awaken grace-period kthread")
Reported-by: "He, Bo" <bo.he@intel.com>
Co-developed-by: "Zhang, Jun" <jun.zhang@intel.com>
Co-developed-by: "He, Bo" <bo.he@intel.com>
Co-developed-by: "xiao, jin" <jin.xiao@intel.com>
Co-developed-by: Bai, Jie A <jie.a.bai@intel.com>
Signed-off: "Zhang, Jun" <jun.zhang@intel.com>
Signed-off: "He, Bo" <bo.he@intel.com>
Signed-off: "xiao, jin" <jin.xiao@intel.com>
Signed-off: Bai, Jie A <jie.a.bai@intel.com>
Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
[ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
!in_serving_softirq() to avoid redundant wakeups and to also handle the
interrupt-handler scenario as well as the softirq-handler scenario that
actually occurred in testing. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.4.174: (35 commits)
inet: frags: change inet_frags_init_net() return value
inet: frags: add a pointer to struct netns_frags
inet: frags: refactor ipfrag_init()
inet: frags: refactor ipv6_frag_init()
inet: frags: refactor lowpan_net_frag_init()
rhashtable: add rhashtable_lookup_get_insert_key()
rhashtable: Add rhashtable_lookup()
rhashtable: add schedule points
inet: frags: use rhashtables for reassembly units
net: ieee802154: 6lowpan: fix frag reassembly
ipfrag: really prevent allocation on netns exit
inet: frags: remove some helpers
inet: frags: get rif of inet_frag_evicting()
inet: frags: remove inet_frag_maybe_warn_overflow()
inet: frags: break the 2GB limit for frags storage
inet: frags: do not clone skb in ip_expire()
ipv6: frags: rewrite ip6_expire_frag_queue()
rhashtable: reorganize struct rhashtable layout
inet: frags: reorganize struct netns_frags
inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
inet: frags: fix ip6frag_low_thresh boundary
ip: discard IPv4 datagrams with overlapping segments.
net: modify skb_rbtree_purge to return the truesize of all purged skbs.
ipv6: defrag: drop non-last frags smaller than min mtu
net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
ip: use rb trees for IP frag queue.
ip: add helpers to process in-order fragments faster.
ip: process in-order fragments efficiently
ip: frags: fix crash in ip_do_fragment()
ipv4: frags: precedence bug in ip_expire()
inet: frags: better deal with smp races
net: fix pskb_trim_rcsum_slow() with odd trim offset
net: ipv4: do not handle duplicate fragments as overlapping
rcu: Force boolean subscript for expedited stall warnings
Linux 4.4.174
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
commit ec3833ed02ae6ef2a933ece9de7cbab0c64c699e upstream.
The cpu_online() function can return values other than 0 and 1, which
can result in subscript overflow when applied to a two-element array.
This commit allows for this behavior by using "!!" on the return value
from cpu_online() when used as a subscript.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: "Rantala, Tommi" <tommi.t.rantala@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28585a832602747cbfa88ad8934013177a3aae38 upstream.
A number of architecture invoke rcu_irq_enter() on exception entry in
order to allow RCU read-side critical sections in the exception handler
when the exception is from an idle or nohz_full CPU. This works, at
least unless the exception happens in an NMI handler. In that case,
rcu_nmi_enter() would already have exited the extended quiescent state,
which would mean that rcu_irq_enter() would (incorrectly) cause RCU
to think that it is again in an extended quiescent state. This will
in turn result in lockdep splats in response to later RCU read-side
critical sections.
This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to
take no action if there is an rcu_nmi_enter() in effect, thus avoiding
the unscheduled return to RCU quiescent state. This in turn should
make the kernel safe for on-demand RCU voyeurism.
Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Every RCU stall need to be debugged, So collect the ram
dumps on every RCU stall to debug further by inducing
non secure watchdog bite whenever rcu stall detected.
Change-Id: I6c1cfddc92f06b48c3f22fe9970b70f2ec670bf6
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing). Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system. A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/). However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.
kcov does not aim to collect as much coverage as possible. It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g. scheduler, locking).
Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes. Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch). I've
dropped the second mode for simplicity.
This patch adds the necessary support on kernel side. The complimentary
compiler support was added in gcc revision 231296.
We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:
https://github.com/google/syzkaller/wiki/Found-Bugs
We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation". For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.
Why not gcov. Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
typical coverage can be just a dozen of basic blocks (e.g. an invalid
input). In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M). Cost of
kcov depends only on number of executed basic blocks/edges. On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.
kcov exposes kernel PCs and control flow to user-space which is
insecure. But debugfs should not be mapped as user accessible.
Based on a patch by Quentin Casasnovas.
[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 5c9a8750a6409c63a0f01d51a9024861022f6593)
Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: Ia7d116ded35eb31a514406f864d7d591606c65f3
x86_64:allmodconfig fails to build with the following error.
ERROR: "rcu_sync_lockdep_assert" [kernel/locking/locktorture.ko] undefined!
Introduced by commit 3228c5eb7a ("RFC: FROMLIST: locking/percpu-rwsem:
Optimize readers and reduce global impact"). The applied upstream version
exports the missing symbol, so let's do the same.
Change-Id: If4e516715c3415fe8c82090f287174857561550d
Fixes: 3228c5eb7a ("RFC: FROMLIST: locking/percpu-rwsem: Optimize ...")
Signed-off-by: Guenter Roeck <groeck@chromium.org>
The current percpu-rwsem read side is entirely free of serializing insns
at the cost of having a synchronize_sched() in the write path.
The latency of the synchronize_sched() is too high for cgroups. The
commit 1ed1328792 talks about the write path being a fairly cold path
but this is not the case for Android which moves task to the foreground
cgroup and back around binder IPC calls from foreground processes to
background processes, so it is significantly hotter than human initiated
operations.
Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
problem, hopefully it should not be that slow after another commit
80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global
impact").
We could just add rcu_sync_enter() into cgroup_init() but we do not want
another synchronize_sched() at boot time, so this patch adds the new helper
which doesn't block but currently can only be called before the first use.
Cc: Tejun Heo <tj@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[jstultz: backported to 4.4]
Change-Id: I34aa9c394d3052779b56976693e96d861bd255f2
Mailing-list-URL: https://lkml.org/lkml/2016/8/11/557
Signed-off-by: John Stultz <john.stultz@linaro.org>
The current percpu-rwsem read side is entirely free of serializing insns
at the cost of having a synchronize_sched() in the write path.
The latency of the synchronize_sched() is too high for cgroups. The
commit 1ed1328792 talks about the write path being a fairly cold path
but this is not the case for Android which moves task to the foreground
cgroup and back around binder IPC calls from foreground processes to
background processes, so it is significantly hotter than human initiated
operations.
Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
problem, hopefully it should not be that slow after another commit
80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global
impact").
We could just add rcu_sync_enter() into cgroup_init() but we do not want
another synchronize_sched() at boot time, so this patch adds the new helper
which doesn't block but currently can only be called before the first use.
Cc: Tejun Heo <tj@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[jstultz: backported to 4.4]
Change-Id: I34aa9c394d3052779b56976693e96d861bd255f2
Mailing-list-URL: https://lkml.org/lkml/2016/8/11/557
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-commit: 0c3240a1ef
Git-repo: https://android.googlesource.com/kernel/common/+/android-4.4
Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>
Earlier versions of synchronize_sched_expedited() can prematurely end
grace periods due to the fact that a CPU marked as cpu_is_offline()
can still be using RCU read-side critical sections during the time that
CPU makes its last pass through the scheduler and into the idle loop
and during the time that a given CPU is in the process of coming online.
This commit therefore eliminates this window by adding additional
interaction with the CPU-hotplug operations.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit redirects synchronize_rcu_expedited()'s wait to
synchronize_sched_expedited_wait(), thus enabling RCU CPU
stall warnings.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds task-print ability to the expedited RCU CPU stall
warning messages in preparation for adding stall warnings to
synchornize_rcu_expedited().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the RCU CPU stall warning message print online/offline
indications immediately after the CPU number. A "O" indicates global
offline, a "." global online, and a "o" indicates RCU believes that the
CPU is offline for the current grace period and "." otherwise, and an
"N" indicates that RCU believes that the CPU will be offline for the
next grace period, and "." otherwise, all right after the CPU number.
So for CPU 10, you would normally see "10-...:" indicating that everything
believes that the CPU is online.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that sync_sched_exp_select_cpus() and sync_rcu_exp_select_cpus()
are identical aside from the the argument to smp_call_function_single(),
this commit consolidates them with a functional argument.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit brings sync_sched_exp_select_cpus() into alignment with
sync_rcu_exp_select_cpus(), as a first step towards consolidating them
into one function.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that synchronize_sched_expedited() uses IPIs, a hook in
rcu_sched_qs(), and the ->expmask field in the rcu_node combining
tree, it is no longer necessary to exclude CPU hotplug. Any
races with CPU hotplug will be detected when attempting to send
the IPI. This commit therefore removes the code excluding
CPU hotplug operations.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This reverts commit af859beaab (rcu: Silence lockdep false positive
for expedited grace periods). Because synchronize_rcu_expedited()
no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex
acquisition is no longer nested, so the false positive no longer happens.
This commit therefore removes the extra lockdep data structures, as they
are no longer needed.
This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait()
to smp_call_function_single(), thus moving from an IPI and a pair of
context switches to an IPI and a single pass through the scheduler.
Of course, if the scheduler actually does decide to switch to a different
task, there will still be a pair of context switches, but there would
likely have been a pair of context switches anyway, just a bit later.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The torturing_tasks() function is used only in kernels built with
CONFIG_PROVE_RCU=y, so the second definition can result in unused-function
compiler warnings. This commit adds __maybe_unused to suppress these
warnings.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The rcutorture module has a list of torture types, and specifying a
type not on this list is supposed to cleanly fail the module load.
Unfortunately, the "fail" happens without the "cleanly". This commit
therefore adds the needed clean-up after an incorrect torture_type.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Miller <davem@davemloft.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
1. Rename __rcu_sync_is_idle() to rcu_sync_lockdep_assert() and
change it to use rcu_lockdep_assert().
2. Change rcu_sync_is_idle() to return rsp->gp_state == GP_IDLE
unconditonally, this way we can remove the same check from
rcu_sync_lockdep_assert() and clearly isolate the debugging
code.
Note: rcu_sync_enter()->wait_event(gp_state == GP_PASSED) needs
another CONFIG_PROVE_RCU check, the same as is done in ->sync(); but
this needs some simple preparations in the core RCU code to avoid the
code duplication.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit allows rcu_sync structures to be safely deallocated,
The trick is to add a new ->wait field to the gp_ops array.
This field is a pointer to the rcu_barrier() function corresponding
to the flavor of RCU in question. This allows a new rcu_sync_dtor()
to wait for any outstanding callbacks before freeing the rcu_sync
structure.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit validates that the caller of rcu_sync_is_idle() holds the
corresponding type of RCU read-side lock, but only in kernels built
with CONFIG_PROVE_RCU=y. This validation is carried out via a new
rcu_sync_ops->held() method that is checked within rcu_sync_is_idle().
Note that although this does add code to the fast path, it only does so
in kernels built with CONFIG_PROVE_RCU=y.
Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit adds the new struct rcu_sync_ops which holds sync/call
methods, and turns the function pointers in rcu_sync_struct into an array
of struct rcu_sync_ops. This simplifies the "init" helpers by collapsing
a switch statement and explicit multiple definitions into a simple
assignment and a helper macro, respectively.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The rcu_sync infrastructure can be thought of as infrastructure to be
used to implement reader-writer primitives having extremely lightweight
readers during times when there are no writers. The first use is in
the percpu_rwsem used by the VFS subsystem.
This infrastructure is functionally equivalent to
struct rcu_sync_struct {
atomic_t counter;
};
/* Check possibility of fast-path read-side operations. */
static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
{
return atomic_read(&rss->counter) == 0;
}
/* Tell readers to use slowpaths. */
static inline void rcu_sync_enter(struct rcu_sync_struct *rss)
{
atomic_inc(&rss->counter);
synchronize_sched();
}
/* Allow readers to once again use fastpaths. */
static inline void rcu_sync_exit(struct rcu_sync_struct *rss)
{
synchronize_sched();
atomic_dec(&rss->counter);
}
The main difference is that it records the state and only calls
synchronize_sched() if required. At least some of the calls to
synchronize_sched() will be optimized away when rcu_sync_enter() and
rcu_sync_exit() are invoked repeatedly in quick succession.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit moves cond_resched_rcu_qs() into stutter_wait(), saving
a line and also avoiding RCU CPU stall warnings from all torture
loops containing a stutter_wait().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit corrects the comment for the values of the ->gp_state field,
which previously incorrectly said that these were for the ->gp_flags
field.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Commit commit 4cdfc175c2 ("rcu: Move quiescent-state forcing
into kthread") started the process of folding the old ->fqs_state into
->gp_state, but did not complete it. This situation does not cause
any malfunction, but can result in extremely confusing trace output.
This commit completes this task of eliminating ->fqs_state in favor
of ->gp_state.
The old ->fqs_state was also used to decide when to collect dyntick-idle
snapshots. For this purpose, we add a boolean variable into the kthread,
which is set on the first call to rcu_gp_fqs() for a given grace period
and clear otherwise.
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Currently, __srcu_read_lock() cannot be invoked from restricted
environments because it contains calls to preempt_disable() and
preempt_enable(), both of which can invoke lockdep, which is a bad
idea in some restricted execution modes. This commit therefore moves
the preempt_disable() and preempt_enable() from __srcu_read_lock()
to srcu_read_lock(). It also inserts the preempt_disable() and
preempt_enable() around the call to __srcu_read_lock() in do_exit().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit makes the RCU CPU stall warning message print online/offline
indications immediately after a hyphen following the CPU number. A "O"
indicates that the global CPU-hotplug system believes that the CPU is
online, a "o" that RCU perceived the CPU to be online at the beginning
of the current expedited grace period, and an "N" that RCU currently
believes that it will perceive the CPU as being online at the beginning
of the next expedited grace period, with "." otherwise for all three
indications. So for CPU 10, you would normally see "10-OoN:" indicating
that everything believes that the CPU is online.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit loosens rcutree.rcu_fanout_leaf range checks
and replaces a panic() with a fallback to compile-time values.
This fallback is accompanied by a WARN_ON(), and both occur when the
rcutree.rcu_fanout_leaf value is too small to accommodate the number of
CPUs. For example, given the current four-level limit for the rcu_node
tree, a system with more than 16 CPUs built with CONFIG_FANOUT=2 must
have rcutree.rcu_fanout_leaf larger than 2.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Because preempt_disable() maps to barrier() for non-debug builds,
it forces the compiler to spill and reload registers. Because Tree
RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
barrier() instances generate needless extra code for each instance of
rcu_read_lock() and rcu_read_unlock(). This extra code slows down Tree
RCU and bloats Tiny RCU.
This commit therefore removes the preempt_disable() and preempt_enable()
from the non-preemptible implementations of __rcu_read_lock() and
__rcu_read_unlock(), respectively. However, for debug purposes,
preempt_disable() and preempt_enable() are still invoked if
CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
atomic sections in non-preemptible kernels.
However, Tiny and Tree RCU operates by coalescing all RCU read-side
critical sections on a given CPU that lie between successive quiescent
states. It is therefore necessary to compensate for removing barriers
from __rcu_read_lock() and __rcu_read_unlock() by adding them to a
couple of the RCU functions invoked during quiescent states, namely to
rcu_all_qs() and rcu_note_context_switch(). However, note that the latter
is more paranoia than necessity, at least until link-time optimizations
become more aggressive.
This is based on an earlier patch by Paul E. McKenney, fixing
a bug encountered in kernels built with CONFIG_PREEMPT=n and
CONFIG_PREEMPT_COUNT=y.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
We have had the call_rcu_func_t typedef for a quite awhile, but we still
use explicit function pointer types in some places. These types can
confuse cscope and can be hard to read. This patch therefore replaces
these types with the call_rcu_func_t typedef.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
As we now have rcu_callback_t typedefs as the type of rcu callbacks, we
should use it in call_rcu*() and friends as the type of parameters. This
could save us a few lines of code and make it clear which function
requires an rcu callbacks rather than other callbacks as its argument.
Besides, this can also help cscope to generate a better database for
code reading.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit converts the rcu_data structure's ->cpu_no_qs field
to a union. The bytewise side of this union allows individual access
to indications as to whether this CPU needs to find a quiescent state
for a normal (.norm) and/or expedited (.exp) grace period. The setwise
side of the union allows testing whether or not a quiescent state is
needed at all, for either type of grace period.
For now, only .norm is used. A later commit will introduce the expedited
usage.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit inverts the sense of the rcu_data structure's ->passed_quiesce
field and renames it to ->cpu_no_qs. This will allow a later commit to
use an "aggregate OR" operation to test expedited as well as normal grace
periods without added overhead.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
An upcoming commit needs to invert the sense of the ->passed_quiesce
rcu_data structure field, so this commit is taking this opportunity
to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.
So if !rdp->core_needs_qs, then this CPU need not concern itself with
quiescent states, in particular, it need not acquire its leaf rcu_node
structure's ->lock to check. Otherwise, it needs to report the next
quiescent state.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently, synchronize_sched_expedited() uses a single global counter
to track the number of remaining context switches that the current
expedited grace period must wait on. This is problematic on large
systems, where the resulting memory contention can be pathological.
This commit therefore makes synchronize_sched_expedited() instead use
the combining tree in the same manner as synchronize_rcu_expedited(),
keeping memory contention down to a dull roar.
This commit creates a temporary function sync_sched_exp_select_cpus()
that is very similar to sync_rcu_exp_select_cpus(). A later commit
will consolidate these two functions, which becomes possible when
synchronize_sched_expedited() switches from stop_one_cpu_nowait() to
smp_call_function_single().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The current preemptible-RCU expedited grace-period algorithm invokes
synchronize_sched_expedited() to enqueue all tasks currently running
in a preemptible-RCU read-side critical section, then waits for all the
->blkd_tasks lists to drain. This works, but results in both an IPI and
a double context switch even on CPUs that do not happen to be running
in a preemptible RCU read-side critical section.
This commit implements a new algorithm that causes less OS jitter.
This new algorithm IPIs all online CPUs that are not idle (from an
RCU perspective), but refrains from self-IPIs. If a CPU receiving
this IPI is not in a preemptible RCU read-side critical section (or
is just now exiting one), it pushes quiescence up the rcu_node tree,
otherwise, it sets a flag that will be handled by the upcoming outermost
rcu_read_unlock(), which will then push quiescence up the tree.
The expedited grace period must of course wait on any pre-existing blocked
readers, and newly blocked readers must be queued carefully based on
the state of both the normal and the expedited grace periods. This
new queueing approach also avoids the need to update boost state,
courtesy of the fact that blocked tasks are no longer ever migrated to
the root rcu_node structure.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit replaces sync_rcu_preempt_exp_init1(() and
sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
and sync_exp_reset_tree(), which will also be used by
synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
contains code specific to synchronize_rcu_expedited().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>