* refs/heads/tmp-dabb11d:
Revert "dt-bindings: clock: renesas: rcar-usb2-clock-sel: Fix typo in example"
Linux 4.19.94
perf/x86/intel/bts: Fix the use of page_private()
xen/blkback: Avoid unmapping unmapped grant pages
s390/smp: fix physical to logical CPU map for SMT
ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps
net: add annotations on hh->hh_len lockless accesses
xfs: periodically yield scrub threads to the scheduler
ath9k_htc: Discard undersized packets
ath9k_htc: Modify byte order for an error message
net: core: limit nested device depth
tcp: annotate tp->rcv_nxt lockless reads
rxrpc: Fix possible NULL pointer access in ICMP handling
KVM: PPC: Book3S HV: use smp_mb() when setting/clearing host_ipi flag
selftests: rtnetlink: add addresses with fixed life time
powerpc/pseries/hvconsole: Fix stack overread via udbg
drm/mst: Fix MST sideband up-reply failure handling
scsi: qedf: Do not retry ELS request if qedf_alloc_cmd fails
bdev: Refresh bdev size for disks without partitioning
bdev: Factor out bdev revalidation into a common helper
fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP
tty: serial: msm_serial: Fix lockup for sysrq and oops
arm64: dts: meson: odroid-c2: Disable usb_otg bus to avoid power failed warning
dt-bindings: clock: renesas: rcar-usb2-clock-sel: Fix typo in example
media: usb: fix memory leak in af9005_identify_state
regulator: ab8500: Remove AB8505 USB regulator
media: flexcop-usb: ensure -EIO is returned on error condition
Bluetooth: Fix memory leak in hci_connect_le_scan
Bluetooth: delete a stray unlock
Bluetooth: btusb: fix PM leak in error case of setup
platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table
xfs: don't check for AG deadlock for realtime files in bunmapi
ACPI: sysfs: Change ACPI_MASKABLE_GPE_MAX to 0x100
HID: i2c-hid: Reset ALPS touchpads on resume
nfsd4: fix up replay_matches_cache()
PM / devfreq: Check NULL governor in available_governors_show
drm/msm: include linux/sched/task.h
ftrace: Avoid potential division by zero in function profiler
arm64: Revert support for execute-only user mappings
exit: panic before exit_mm() on global init exit
ALSA: firewire-motu: Correct a typo in the clock proc string
ALSA: cs4236: fix error return comparison of an unsigned integer
apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
tracing: Fix endianness bug in histogram trigger
tracing: Have the histogram compare functions convert to u64 first
tracing: Avoid memory leak in process_system_preds()
tracing: Fix lock inversion in trace_event_enable_tgid_record()
rseq/selftests: Fix: Namespace gettid() for compatibility with glibc 2.30
riscv: ftrace: correct the condition logic in function graph tracer
gpiolib: fix up emulated open drain outputs
libata: Fix retrieving of active qcs
ata: ahci_brcm: BCM7425 AHCI requires AHCI_HFLAG_DELAY_ENGINE
ata: ahci_brcm: Add missing clock management during recovery
ata: ahci_brcm: Allow optional reset controller to be used
ata: ahci_brcm: Fix AHCI resources management
ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()
compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE
compat_ioctl: block: handle Persistent Reservations
dmaengine: Fix access to uninitialized dma_slave_caps
locks: print unsigned ino in /proc/locks
pstore/ram: Write new dumps to start of recycled zones
mm: move_pages: return valid node id in status if the page is already on the target node
memcg: account security cred as well to kmemcg
mm/zsmalloc.c: fix the migrated zspage statistics.
media: cec: check 'transmit_in_progress', not 'transmitting'
media: cec: avoid decrementing transmit_queue_sz if it is 0
media: cec: CEC 2.0-only bcast messages were ignored
media: pulse8-cec: fix lost cec_transmit_attempt_done() call
MIPS: Avoid VDSO ABI breakage due to global register variable
drm/sun4i: hdmi: Remove duplicate cleanup calls
ALSA: hda/realtek - Add headset Mic no shutup for ALC283
ALSA: usb-audio: set the interface format after resume on Dell WD19
ALSA: usb-audio: fix set_format altsetting sanity check
ALSA: ice1724: Fix sleep-in-atomic in Infrasonic Quartet support code
netfilter: nft_tproxy: Fix port selector on Big Endian
drm: limit to INT_MAX in create_blob ioctl
taskstats: fix data-race
xfs: fix mount failure crash on invalid iclog memory access
ALSA: hda - fixup for the bass speaker on Lenovo Carbon X1 7th gen
ALSA: hda/realtek - Enable the bass speaker of ASUS UX431FLC
ALSA: hda/realtek - Add Bass Speaker and fixed dac for bass speaker
PM / hibernate: memory_bm_find_bit(): Tighten node optimisation
xen/balloon: fix ballooned page accounting without hotplug enabled
xen-blkback: prevent premature module unload
IB/mlx5: Fix steering rule of drop and count
IB/mlx4: Follow mirror sequence of device add during device removal
s390/cpum_sf: Avoid SBD overflow condition in irq handler
s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits
md: raid1: check rdev before reference in raid1_sync_request func
afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
net: make socket read/write_iter() honor IOCB_NOWAIT
usb: gadget: fix wrong endpoint desc
drm/nouveau: Move the declaration of struct nouveau_conn_atom up a bit
scsi: libsas: stop discovering if oob mode is disconnected
scsi: iscsi: qla4xxx: fix double free in probe
scsi: qla2xxx: Ignore PORT UPDATE after N2N PLOGI
scsi: qla2xxx: Send Notify ACK after N2N PLOGI
scsi: qla2xxx: Configure local loop for N2N target
scsi: qla2xxx: Fix PLOGI payload and ELS IOCB dump length
scsi: qla2xxx: Don't call qlt_async_event twice
scsi: qla2xxx: Drop superfluous INIT_WORK of del_work
scsi: lpfc: Fix memory leak on lpfc_bsg_write_ebuf_set func
rxe: correctly calculate iCRC for unaligned payloads
RDMA/cma: add missed unregister_pernet_subsys in init failure
afs: Fix SELinux setting security label on /afs
afs: Fix afs_find_server lookups for ipv4 peers
PM / devfreq: Don't fail devfreq_dev_release if not in list
PM / devfreq: Set scaling_max_freq to max on OPP notifier error
PM / devfreq: Fix devfreq_notifier_call returning errno
iio: adc: max9611: Fix too short conversion time delay
drm/amd/display: Fixed kernel panic when booting with DP-to-HDMI dongle
drm/amdgpu: add cache flush workaround to gfx8 emit_fence
drm/amdgpu: add check before enabling/disabling broadcast mode
nvme-fc: fix double-free scenarios on hw queues
nvme_fc: add module to ops template to allow module references
Conflicts:
drivers/devfreq/devfreq.c
drivers/gpu/drm/drm_dp_mst_topology.c
drivers/gpu/drm/drm_property.c
Change-Id: Iad80571bea0a2197ea36a70c425c61a66c0cf0bc
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
commit 43cf75d96409a20ef06b756877a2e72b10a026fc upstream.
Currently, when global init and all threads in its thread-group have exited
we panic via:
  do_exit()
  -> exit_notify()
     -> forget_original_parent()
        -> find_child_reaper()
This makes it hard to extract a useable coredump for global init from a
kernel crashdump because by the time we panic exit_mm() will have already
released global init's mm.
This patch moves the panic further up before exit_mm() is called. As was the
case previously, we only panic when global init and all its threads in the
thread-group have exited.
Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
[christian.brauner@ubuntu.com: fix typo, rewrite commit message]
Link: https://lore.kernel.org/r/1576736993-10121-1-git-send-email-qiwuchen55@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
upstream commit 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")
Originally, the rule used to be that you'd have to do access_ok()
separately, and then user_access_begin() before actually doing the
direct (optimized) user access.
But experience has shown that people then decide not to do access_ok()
at all, and instead rely on it being implied by other operations or
similar. Which makes it very hard to verify that the access has
actually been range-checked.
If you use the unsafe direct user accesses, hardware features (either
SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
Access Never - on ARM) do force you to use user_access_begin(). But
nothing really forces the range check.
By putting the range check into user_access_begin(), we actually force
people to do the right thing (tm), and the range check will be visible
near the actual accesses. We have way too long a history of people
trying to avoid them.
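
The rule described above can be sketched as a small user-space model. This is
not the kernel implementation: MODEL_USER_LIMIT and every model_* name are
hypothetical stand-ins chosen for illustration, and the address limit is an
assumed value. The point it tries to show is only that the "begin" step
refuses to open the access window unless the range check passes, so a caller
cannot skip the check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound of the user address range (illustrative only). */
#define MODEL_USER_LIMIT UINT64_C(0x0000800000000000)

/* Counts open access windows; stands in for SMAP/PAN being lifted. */
static int model_access_depth;

/* Overflow-safe range check: [addr, addr + size) must lie below the limit. */
static bool model_access_ok(uint64_t addr, uint64_t size)
{
    return size <= MODEL_USER_LIMIT && addr <= MODEL_USER_LIMIT - size;
}

/* The range check lives inside "begin", so it cannot be forgotten. */
static bool model_user_access_begin(uint64_t addr, uint64_t size)
{
    if (!model_access_ok(addr, size))
        return false;
    model_access_depth++;
    return true;
}

static void model_user_access_end(void)
{
    assert(model_access_depth > 0);
    model_access_depth--;
}

/* Caller pattern: bail out with -EFAULT when "begin" refuses the range. */
static int model_store(uint64_t addr, uint64_t size)
{
    if (!model_user_access_begin(addr, size))
        return -EFAULT;
    /* The unsafe direct user accesses would go here. */
    model_user_access_end();
    return 0;
}
```

With this shape, model_store(0x1000, 8) succeeds, while a range that starts at
the limit or wraps around the address space is rejected before any access is
attempted.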
Bug: 135368228
Change-Id: I4ca0e4566ea080fa148c5e768bb1a0b6f7201c01
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* refs/heads/tmp-5118163:
Linux 4.19.66
spi: bcm2835: Fix 3-wire mode if DMA is enabled
cgroup: Fix css_task_iter_advance_css_set() cset skip condition
cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
cgroup: Include dying leaders with live threads in PROCS iterations
cgroup: Implement css_task_iter_skip()
cgroup: Call cgroup_release() before __exit_signal()
compat_ioctl: pppoe: fix PPPOEIOCSFWD handling
r8169: don't use MSI before RTL8168d
net/mlx5e: Prevent encap flow counter update async to user query
net/mlx5: Fix modify_cq_in alignment
tun: mark small packets as owned by the tap sock
tipc: compat: allow tipc commands without arguments
ocelot: Cancel delayed work before wq destruction
NFC: nfcmrvl: fix gpio-handling regression
net/smc: do not schedule tx_work in SMC_CLOSED state
net: sched: use temporary variable for actions indexes
net sched: update vlan action for batched events operations
net: sched: Fix a possible null-pointer dereference in dequeue_func()
net: qualcomm: rmnet: Fix incorrect UL checksum offload logic
net: phylink: Fix flow control for fixed-link
net/mlx5: Use reversed order when unregister devices
net/mlx5e: always initialize frag->last_in_page
net: fix ifindex collision during namespace removal
net: bridge: mcast: don't delete permanent entries when fast leave is enabled
net: bridge: delete local fdb on device init failure
mvpp2: refactor MTU change code
mvpp2: fix panic on module removal
mlxsw: spectrum: Fix error path in mlxsw_sp_module_init()
ipip: validate header length in ipip_tunnel_xmit
ip6_tunnel: fix possible use-after-free on xmit
ip6_gre: reload ipv6h in prepare_ip6gre_xmit_ipv6
ife: error out when nla attributes are empty
bnx2x: Disable multi-cos feature.
atm: iphase: Fix Spectre v1 vulnerability
IB: directly cast the sockaddr union to aockaddr
HID: Add quirk for HP X1200 PIXART OEM mouse
HID: wacom: fix bit shift for Cintiq Companion 2
libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant
libnvdimm/region: Register badblocks before namespaces
libnvdimm/bus: Prevent duplicate device_unregister() calls
drivers/base: Introduce kill_device()
driver core: Establish order of operations for device_add and device_del via bitflag
gcc-9: don't warn about uninitialized variable
scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure
Change-Id: Id10478edd741fd55ffa841c4ed7c608d9ece3bbb
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
There is a race between reading task->exit_state in pidfd_poll and
writing it after do_notify_parent calls do_notify_pidfd. Expected
sequence of events is:
CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
  tsk->exit_state = EXIT_DEAD
                                 pidfd_poll
                                    if (tsk->exit_state)
However nothing prevents the following sequence:
CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
                                 pidfd_poll
                                    if (tsk->exit_state)
  tsk->exit_state = EXIT_DEAD
This causes a polling task to wait forever, since poll blocks because
exit_state is 0 and the waiting task is not notified again. A stress
test continuously doing pidfd poll and process exits uncovered this bug.
To fix it, we make sure that the task's exit_state is always set before
calling do_notify_pidfd.
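
A minimal sketch of the kind of user-space check that exercises this path,
assuming a kernel that provides pidfd_open() and pidfd polling. The helper
name poll_child_exit() is hypothetical, and the fallback syscall number 434 is
the value used on x86-64 and arm64 among others; it is only used when the libc
headers do not define SYS_pidfd_open. A stress test would call the helper in a
loop; on an affected kernel an occasional call would hit the timeout instead
of seeing POLLIN.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* assumed number; see the note above */
#endif

/*
 * Fork a child that exits immediately and poll its pidfd.
 * Returns 1 if poll() reported the exit, 0 if pidfd_open() failed
 * (for example ENOSYS on older kernels), and -1 on timeout or error.
 */
static int poll_child_exit(int timeout_ms)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);

    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0) {
        waitpid(pid, NULL, 0); /* still reap the child */
        return 0;
    }

    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    int ready = (n == 1) && (pfd.revents & POLLIN);

    close(pidfd);
    waitpid(pid, NULL, 0);
    return ready ? 1 : -1;
}
```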
Fixes: b53b0b9d9a6 ("pidfd: add polling support")
Cc: kernel-team@android.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
[christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
Signed-off-by: Christian Brauner <christian@brauner.io>
(cherry picked from commit b191d6491be67cef2b3fa83015561caca1394ab9)
Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: Ife81348ae3c3b6f5f8caac0c2f9bacd656582b31
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
commit 6b115bf58e6f013ca75e7115aabcbd56c20ff31d upstream.
cgroup_release() calls cgroup_subsys->release() which is used by the
pids controller to uncharge its pid. We want to use it to manage
iteration of dying tasks which requires putting it before
__unhash_process(). Move cgroup_release() above __exit_signal().
While this makes it uncharge before the pid is freed, pid is RCU freed
anyway and the window is very narrow.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-d885da6:
Revert "coresight: etm4x: Add support to enable ETMv4.2"
Revert "usb: dwc3: gadget: Fix OTG events when gadget driver isn't loaded"
Linux 4.19.34
kprobes/x86: Blacklist non-attachable interrupt functions
bcache: fix potential div-zero error of writeback_rate_p_term_inverse
ACPI / video: Extend chassis-type detection with a "Lunch Box" check
net: stmmac: Avoid one more sometimes uninitialized Clang warning
drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers
Input: soc_button_array - fix mapping of the 5th GPIO in a PNP0C40 device
dmaengine: tegra: avoid overflow of byte tracking
clk: rockchip: fix frac settings of GPLL clock for rk3328
clk: meson: clean-up clock registration
drm/fb-helper: fix leaks in error path of drm_fb_helper_fbdev_setup
x86/build: Mark per-CPU symbols as absolute explicitly for LLD
wlcore: Fix memory leak in case wl12xx_fetch_firmware failure
brcmfmac: Use firmware_request_nowarn for the clm_blob
selinux: do not override context on context mounts
x86/build: Specify elf_i386 linker emulation explicitly for i386 objects
drm/nouveau: Stop using drm_crtc_force_disable
drm: Auto-set allow_fb_modifiers when given modifiers at plane init
pinctrl: meson: meson8b: add the eth_rxd2 and eth_rxd3 pins
regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting
media: s5p-jpeg: Check for fmt_ver_flag when doing fmt enumeration
media: rcar-vin: Allow independent VIN link enablement
netfilter: physdev: relax br_netfilter dependency
dmaengine: qcom_hidma: initialize tx flags in hidma_prep_dma_*
dmaengine: qcom_hidma: assign channel cookie correctly
dmaengine: imx-dma: fix warning comparison of distinct pointer types
cpu/hotplug: Mute hotplug lockdep during init
hpet: Fix missing '=' character in the __setup() code of hpet_mmap_enable
f2fs: UBSAN: set boolean value iostat_enable correctly
HID: intel-ish: ipc: handle PIMR before ish_wakeup also clear PISR busy_clear bit
soc/tegra: fuse: Fix illegal free of IO base address
hwrng: virtio - Avoid repeated init of completion
media: mt9m111: set initial frame size other than 0x0
perf script python: Add trace_context extension module to sys.modules
perf script python: Use PyBytes for attr in trace-event-python
platform/x86: intel-hid: Missing power button release on some Dell models
usb: dwc3: gadget: Fix OTG events when gadget driver isn't loaded
ALSA: dice: add support for Solid State Logic Duende Classic/Mini
drm/amd/display: Enable vblank interrupt during CRC capture
powerpc/pseries: Perform full re-add of CPU for topology update post-migration
tty: increase the default flip buffer limit to 2*640K
backlight: pwm_bl: Use gpiod_get_value_cansleep() to get initial state
cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting
powerpc/64s: Clear on-stack exception marker upon exception return
selftests/bpf: skip verifier tests for unsupported program types
bpf: fix missing prototype warnings
block, bfq: fix in-service-queue check for queue merging
ARM: avoid Cortex-A9 livelock on tight dmb loops
ARM: 8830/1: NOMMU: Toggle only bits in EXC_RETURN we are really care of
mt7601u: bump supported EEPROM version
soc: qcom: gsbi: Fix error handling in gsbi_probe()
efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted
ARM: dts: lpc32xx: Remove leading 0x and 0s from bindings notation
drm/vkms: Bugfix extra vblank frame
sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
efi/memattr: Don't bail on zero VA if it equals the region's PA
sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe
iwlwifi: mvm: fix RFH config command with >=10 CPUs
staging: spi: mt7621: Add return code check on device_reset()
i2c: of: Try to find an I2C adapter matching the parent
platform/x86: intel_pmc_core: Fix PCH IP sts reading
e1000e: Exclude device from suspend direct complete optimization
e1000e: fix cyclic resets at link up with active tx
perf/aux: Make perf_event accessible to setup_aux()
drm/amd/display: Disconnect mpcc when changing tg
drm/amd/display: Don't re-program planes for DPMS changes
drm: rcar-du: add missing of_node_put
cdrom: Fix race condition in cdrom_sysctl_register
fbdev: fbmem: fix memory access if logo is bigger than the screen
net: phy: consider latched link-down status in polling mode
iw_cxgb4: fix srqidx leak during connection abort
net: marvell: mvpp2: fix stuck in-band SGMII negotiation
genirq: Avoid summation loops for /proc/stat
bcache: improve sysfs_strtoul_clamp()
bcache: fix potential div-zero error of writeback_rate_i_term_inverse
bcache: fix input overflow to sequential_cutoff
bcache: fix input overflow to cache set sysfs file io_error_halflife
sched/topology: Fix percpu data types in struct sd_data & struct s_data
usb: f_fs: Avoid crash due to out-of-scope stack ptr access
ath10k: fix shadow register implementation for WCN3990
ALSA: PCM: check if ops are defined before suspending PCM
ARM: dts: meson8b: fix the Ethernet data line signals in eth_rgmii_pins
ARM: 8833/1: Ensure that NEON code always compiles with Clang
netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm
kprobes: Prohibit probing on RCU debug routine
kprobes: Prohibit probing on bsearch()
selftests: skip seccomp get_metadata test if not real root
ACPI / video: Refactor and fix dmi_is_desktop()
iwlwifi: pcie: fix emergency path
perf report: Add s390 diagnosic sampling descriptor size
leds: lp55xx: fix null deref on firmware load failure
jbd2: fix race when writing superblock
cgroup, rstat: Don't flush subtree root unless necessary
HID: intel-ish-hid: avoid binding wrong ishtp_cl_device
vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
xen/gntdev: Do not destroy context while dma-bufs are in use
mt76: usb: do not run mt76u_queues_deinit twice
media: mtk-jpeg: Correct return type for mem2mem buffer helpers
media: mx2_emmaprp: Correct return type for mem2mem buffer helpers
media: s5p-g2d: Correct return type for mem2mem buffer helpers
media: rockchip/rga: Correct return type for mem2mem buffer helpers
media: s5p-jpeg: Correct return type for mem2mem buffer helpers
media: sh_veu: Correct return type for mem2mem buffer helpers
media: ov7740: fix runtime pm initialization
SoC: imx-sgtl5000: add missing put_device()
perf report: Don't shadow inlined symbol with different addr range
mwifiex: don't advertise IBSS features without FW support
perf test: Fix failure of 'evsel-tp-sched' test on s390
drm/amd/display: Clear stream->mode_changed after commit
scsi: fcoe: make use of fip_mode enum complete
scsi: megaraid_sas: return error when create DMA pool failed
s390/ism: ignore some errors during deregistration
efi: cper: Fix possible out-of-bounds access
cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies
ASoC: qcom: Fix of-node refcount unbalance in qcom_snd_parse_of()
perf annotate: Fix getting source line failure
clk: fractional-divider: check parent rate only if flag is set
IB/mlx4: Increase the timeout for CM cache
loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
platform/mellanox: mlxreg-hotplug: Fix KASAN warning
platform/x86: ideapad-laptop: Fix no_hw_rfkill_list for Lenovo RESCUER R720-15IKBN
mlxsw: spectrum: Avoid -Wformat-truncation warnings
e1000e: Fix -Wformat-truncation warnings
net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat
mmc: omap: fix the maximum timeout setting
btrfs: qgroup: Make qgroup async transaction commit more aggressive
powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback
iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tables
ARM: 8840/1: use a raw_spinlock_t in unwind
serial: 8250_pxa: honor the port number from devicetree
coresight: etm4x: Add support to enable ETMv4.2
powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc
kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables
usb: chipidea: Grab the (legacy) USB PHY by phandle first
crypto: cavium/zip - fix collision with generic cra_driver_name
crypto: crypto4xx - add missing of_node_put after of_device_is_available
mt76: fix a leaked reference by adding a missing of_node_put
wil6210: check null pointer in _wil_cfg80211_merge_extra_ies
PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()
tools lib traceevent: Fix buffer overflow in arg_eval
fs: fix guard_bio_eod to check for real EOD errors
jbd2: fix invalid descriptor block checksum
netfilter: conntrack: tcp: only close if RST matches exact sequence
netfilter: nf_tables: check the result of dereferencing base_chain->stats
cifs: Fix NULL pointer dereference of devname
cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED
f2fs: fix to check inline_xattr_size boundary correctly
dm thin: add sanity checks to thin-pool and external snapshot creation
cifs: use correct format characters
page_poison: play nicely with KASAN
fs/file.c: initialize init_files.resize_wait
f2fs: do not use mutex lock in atomic context
ocfs2: fix a panic problem caused by o2cb_ctl
mm/slab.c: kmemleak no scan alien caches
mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512!
mm, mempolicy: fix uninit memory access
memcg: killed threads should not invoke memcg OOM killer
mm,oom: don't kill global init via memory.oom.group
mm, swap: bounds check swap_info array accesses to avoid NULL derefs
mm/page_ext.c: fix an imbalance with kmemleak
mm/cma.c: cma_declare_contiguous: correct err handling
mm/sparse: fix a bad comparison
perf c2c: Fix c2c report for empty numa node
x86/hyperv: Fix kernel panic when kexec on HyperV
iio: adc: fix warning in Qualcomm PM8xxx HK/XOADC driver
scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO
scsi: hisi_sas: Set PHY linkrate when disconnected
libbpf: force fixdep compilation at the start of the build
enic: fix build warning without CONFIG_CPUMASK_OFFSTACK
net: stmmac: Avoid sometimes uninitialized Clang warnings
sysctl: handle overflow for file-max
include/linux/relay.h: fix percpu annotation in struct rchan
gpio: gpio-omap: fix level interrupt idling
net/mlx5: Avoid panic when setting vport mac, getting vport config
net/mlx5: Avoid panic when setting vport rate
tracing: kdb: Fix ftdump to not sleep
f2fs: fix to avoid deadlock in f2fs_read_inline_dir()
f2fs: fix to adapt small inline xattr space in __find_inline_xattr()
h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
CIFS: fix POSIX lock leak and invalid ptr deref
tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped
tty/serial: atmel: Add is_half_duplex helper
ext4: cleanup bh release code in ext4_ind_remove_space()
arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
ANDROID: cuttlefish_defconfig: Enable CONFIG_OVERLAY_FS
ANDROID: cuttlefish: enable CONFIG_NET_SCH_INGRESS=y
Conflicts:
drivers/usb/gadget/function/f_fs.c
mm/page_alloc.c
Change-Id: Ia2a8e99bfdae84d3933749f45ba86f33c5acd713
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
[ Upstream commit 51bee5abeab2058ea5813c5615d6197a23dbf041 ]
The only user of cgroup_subsys->free() callback is pids_cgrp_subsys which
needs pids_free() to uncharge the pid.
However, ->free() is called from __put_task_struct()->cgroup_free() and this
is too late. Even the trivial program which does
	for (;;) {
		int pid = fork();
		assert(pid >= 0);
		if (pid)
			wait(NULL);
		else
			exit(0);
	}
can run out of limits because release_task()->call_rcu(delayed_put_task_struct)
implies an RCU gp after the task/pid goes away and before the final put().
Test-case:
mkdir -p /tmp/CG
mount -t cgroup2 none /tmp/CG
echo '+pids' > /tmp/CG/cgroup.subtree_control
mkdir /tmp/CG/PID
echo 2 > /tmp/CG/PID/pids.max
perl -e 'while ($p = fork) { wait; } $p // die "fork failed: $!\n"' &
echo $! > /tmp/CG/PID/cgroup.procs
Without this patch the forking process fails soon after migration.
Rename cgroup_subsys->free() to cgroup_subsys->release() and move the callsite
into the new helper, cgroup_release(), called by release_task() which actually
frees the pid(s).
Reported-by: Herton R. Krzesinski <hkrzesin@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* refs/heads/tmp-36d178b:
Linux 4.19.27
x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
MIPS: eBPF: Fix icache flush end address
MIPS: BCM63XX: provide DMA masks for ethernet devices
MIPS: fix truncation in __cmpxchg_small for short values
hugetlbfs: fix races and page leaks during migration
drm: Block fb changes for async plane updates
mm: enforce min addr even if capable() in expand_downwards()
mmc: sdhci-esdhc-imx: correct the fix of ERR004536
mmc: cqhci: Fix a tiny potential memory leak on error condition
mmc: cqhci: fix space allocated for transfer descriptor
mmc: core: Fix NULL ptr crash from mmc_should_fail_request
mmc: tmio: fix access width of Block Count Register
mmc: tmio_mmc_core: don't claim spurious interrupts
mmc: spi: Fix card detection during probe
kvm: selftests: Fix region overlap check in kvm_util
KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
svm: Fix AVIC incomplete IPI emulation
cfg80211: extend range deviation for DMG
mac80211: Add attribute aligned(2) to struct 'action'
mac80211: don't initiate TDLS connection if station is not associated to AP
ibmveth: Do not process frames after calling napi_reschedule
net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP
net: usb: asix: ax88772_bind return error when hw_reset fail
drm/msm: Fix A6XX support for opp-level
nvme-multipath: drop optimization for static ANA group IDs
nvme-rdma: fix timeout handler
hv_netvsc: Fix hash key value reset after other ops
hv_netvsc: Refactor assignments of struct netvsc_device_info
hv_netvsc: Fix ethtool change hash key error
net: altera_tse: fix connect_local_phy error path
scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state()
scsi: lpfc: nvmet: avoid hang / use-after-free when destroying targetport
scsi: lpfc: nvme: avoid hang / use-after-free when destroying localport
writeback: synchronize sync(2) against cgroup writeback membership switches
direct-io: allow direct writes to empty inodes
staging: android: ion: Support cpu access during dma_buf_detach
drm/sun4i: hdmi: Fix usage of TMDS clock
serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling
tty: serial: qcom_geni_serial: Allow mctrl when flow control is disabled
drm/amd/powerplay: OD setting fix on Vega10
locking/rwsem: Fix (possible) missed wakeup
futex: Fix (possible) missed wakeup
sched/wake_q: Fix wakeup ordering for wake_q
sched/wait: Fix rcuwait_wake_up() ordering
mac80211: fix miscounting of ttl-dropped frames
staging: rtl8723bs: Fix build error with Clang when inlining is disabled
drivers: thermal: int340x_thermal: Fix sysfs race condition
ARC: show_regs: lockdep: avoid page allocator...
ARC: fix __ffs return value to avoid build warnings
irqchip/gic-v3-mbi: Fix uninitialized mbi_lock
selftests: gpio-mockup-chardev: Check asprintf() for error
selftests: seccomp: use LDLIBS instead of LDFLAGS
phy: ath79-usb: Fix the main reset name to match the DT binding
phy: ath79-usb: Fix the power on error path
selftests/vm/gup_benchmark.c: match gup struct to kernel
ASoC: imx-audmux: change snprintf to scnprintf for possible overflow
ASoC: dapm: change snprintf to scnprintf for possible overflow
ASoC: rt5682: Fix PLL source register definitions
x86/mm/mem_encrypt: Fix erroneous sizeof()
genirq: Make sure the initial affinity is not empty
selftests: rtc: rtctest: add alarm test on minute boundary
selftests: rtc: rtctest: fix alarm tests
usb: gadget: Potential NULL dereference on allocation error
usb: dwc3: gadget: Fix the uninitialized link_state when udc starts
usb: dwc3: gadget: synchronize_irq dwc irq in suspend
thermal: int340x_thermal: Fix a NULL vs IS_ERR() check
clk: vc5: Abort clock configuration without upstream clock
clk: sysfs: fix invalid JSON in clk_dump
clk: tegra: dfll: Fix a potential Oop in remove()
ASoC: Variable "val" in function rt274_i2c_probe() could be uninitialized
ALSA: compress: prevent potential divide by zero bugs
ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field
drm/msm: Unblock writer if reader closes file
scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
mac80211: Change default tx_sk_pacing_shift to 7
genirq/matrix: Improve target CPU selection for managed interrupts.
irq/matrix: Spread managed interrupts on allocation
irq/matrix: Split out the CPU selection code into a helper
FROMGIT: binder: create node flag to request sender's security context
Modify include/uapi/linux/android/binder.h, as commit:
FROMGIT: binder: create node flag to request sender's security context
introduces enums and structures, which are already defined in other
userspace files that include the binder uapi file. Thus, the
redeclaration of these enums and structures can lead to
build errors. To avoid this, guard the redundant declarations
in the uapi header with the __KERNEL__ header guard, so they
are not exported to userspace.
Conflicts:
drivers/staging/android/ion/ion.c
sound/core/compress_offload.c
Change-Id: Ibf422b9b32ea1315515c33036b20ae635b8c8e4c
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
[ Upstream commit 6dc080eeb2ba01973bfff0d79844d7a59e12542e ]
For some peculiar reason rcuwait_wake_up() has the right barrier in
the comment, but not in the code.
This mistake has been observed to cause a deadlock in the following
situation:
    P1                                  P2

    percpu_up_read()                    percpu_down_write()
      rcu_sync_is_idle() // false
                                          rcu_sync_enter()
                                          ...
      __percpu_up_read()

[S] ,-  __this_cpu_dec(*sem->read_count)
    |   smp_rmb();
[L] |   task = rcu_dereference(w->task) // NULL
    |
    |                               [S]     w->task = current
    |                                       smp_mb();
    |                               [L]     readers_active_check() // fail
    `-> <store happens here>
Where the smp_rmb() (obviously) fails to constrain the store.
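
The store/load pattern in the diagram above can be sketched in portable C11,
with a sequentially consistent fence standing in for smp_mb(). This is an
illustrative model, not the kernel code: read_count, waiter_task,
reader_release() and writer_prepare_wait() are hypothetical names, and the two
functions are meant to run on different threads. With a full fence between the
store and the load on both sides, the outcome where the reader sees no waiter
and the writer still sees an active reader is ruled out; a read-only barrier
on the reader side gives no such guarantee.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_int read_count = 1;          /* models *sem->read_count */
static _Atomic(void *) waiter_task = NULL; /* models w->task */

/*
 * Reader side: drop the read count, then look for a waiter to wake.
 * Returns the waiter, or NULL if none is registered yet.
 */
static void *reader_release(void)
{
    atomic_fetch_sub_explicit(&read_count, 1, memory_order_relaxed); /* [S] */
    atomic_thread_fence(memory_order_seq_cst); /* full barrier, not read-only */
    return atomic_load_explicit(&waiter_task, memory_order_relaxed); /* [L] */
}

/*
 * Writer side: register as the waiter, then check for active readers.
 * Returns true if no readers remain, so the writer need not sleep.
 */
static bool writer_prepare_wait(void *self)
{
    atomic_store_explicit(&waiter_task, self, memory_order_relaxed);     /* [S] */
    atomic_thread_fence(memory_order_seq_cst);                           /* full barrier */
    return atomic_load_explicit(&read_count, memory_order_relaxed) == 0; /* [L] */
}
```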
[ peterz: Added changelog. ]
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8f95c90ceb ("sched/wait, RCU: Introduce rcuwait machinery")
Link: https://lkml.kernel.org/r/1543590656-7157-1-git-send-email-prsood@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* refs/heads/tmp-c28f73f:
Linux 4.19.20
cifs: Always resolve hostname before reconnecting
md/raid5: fix 'out of memory' during raid cache recovery
of: overlay: do not duplicate properties from overlay for new nodes
of: overlay: use prop add changeset entry for property in new nodes
of: overlay: add missing of_node_get() in __of_attach_node_sysfs
of: overlay: add tests to validate kfrees from overlay removal
of: Convert to using %pOFn instead of device_node.name
mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
mm: hwpoison: use do_send_sig_info() instead of force_sig()
mm, oom: fix use-after-free in oom_kill_process
mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
oom, oom_reaper: do not enqueue same task twice
mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
btrfs: On error always free subvol_name in btrfs_mount
Btrfs: fix deadlock when allocating tree block during leaf/node split
mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
IB/hfi1: Remove overly conservative VM_EXEC flag check
ALSA: hda/realtek - Fixed hp_pin no value
ALSA: usb-audio: Add Opus #3 to quirks for native DSD support
mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay
mmc: bcm2835: Fix DMA channel leak on probe error
gfs2: Revert "Fix loop in gfs2_rbm_find"
gpio: sprd: Fix incorrect irq type setting for the async EIC
gpio: sprd: Fix the incorrect data register
gpio: pcf857x: Fix interrupts on multiple instances
gpiolib: fix line event timestamps for nested irqs
gpio: altera-a10sr: Set proper output level for direction_output
arm64: hibernate: Clean the __hyp_text to PoC after resume
arm64: hyp-stub: Forbid kprobing of the hyp-stub
arm64: Do not issue IPIs for user executable ptes
arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
NFS: Fix up return value on fatal errors in nfs_page_async_flush()
selftests/seccomp: Enhance per-arch ptrace syscall skip tests
iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()
fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
CIFS: Do not consider -ENODATA as stat failure for reads
CIFS: Fix trace command logging for SMB2 reads and writes
CIFS: Do not count -ENODATA as failure for query directory
virtio_net: Differentiate sk_buff and xdp_frame on freeing
virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs
virtio_net: Don't process redirected XDP frames when XDP is disabled
virtio_net: Fix out of bounds access of sq
virtio_net: Fix not restoring real_num_rx_queues
virtio_net: Don't call free_old_xmit_skbs for xdp_frames
virtio_net: Don't enable NAPI when interface is down
sctp: set flow sport from saddr only when it's 0
sctp: set chunk transport correctly when it's a new asoc
Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"
ip6mr: Fix notifiers call on mroute_clean_tables()
net/mlx5e: Allow MAC invalidation while spoofchk is ON
sctp: improve the events for sctp stream adding
net: ip6_gre: always reports o_key to userspace
vhost: fix OOB in get_rx_bufs()
ucc_geth: Reset BQL queue when stopping device
tun: move the call to tun_set_real_num_queues
sctp: improve the events for sctp stream reset
ravb: expand rx descriptor data to accommodate hw checksum
net: set default network namespace in init_dummy_netdev()
net/rose: fix NULL ax25_cb kernel panic
netrom: switch to sock timer API
net/mlx4_core: Add masking for a few queries on HCA caps
net: ip_gre: use erspan key field for tunnel lookup
net: ip_gre: always reports o_key to userspace
l2tp: fix reading optional fields of L2TPv3
l2tp: copy 4 more bytes to linear part if necessary
ipvlan, l3mdev: fix broken l3s mode wrt local routes
ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
ipv6: Consider sk_bound_dev_if when binding a socket to an address
drm/msm/gpu: fix building without debugfs
Fix "net: ipv4: do not handle duplicate fragments as overlapping"
UPSTREAM: net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP
UPSTREAM: binder: filter out nodes when showing binder procs
UPSTREAM: xfrm: Make set-mark default behavior backward compatible
ANDROID: cuttlefish_defconfig: Enable CONFIG_RTC_HCTOSYS
Conflicts:
mm/oom_kill.c
Change-Id: I95452a9f286b95924096afd2dfe439f9d638d404
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
commit 8fb335e078378c8426fabeed1ebee1fbf915690c upstream.
Currently, exit_ptrace() adds all ptraced tasks in a dead list, then
zap_pid_ns_processes() waits on all tasks in a current pidns, and only
then are tasks from the dead list released.
zap_pid_ns_processes() can get stuck on waiting tasks from the dead
list. In this case, we will have one unkillable process with one or
more dead children.
Thanks to Oleg for the advice to release tasks in find_child_reaper().
Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com
Fixes: 7c8bd2322c ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This snapshot is taken from msm-4.14 as of
commit 871eac76e6be567 ("sched: Improve the scheduler").
Change-Id: Ib4e0b39526d3009cedebb626ece5a767d8247846
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
This passes the information we already have at the call site
into group_send_sig_info, ultimately allowing us to better handle
signals sent to a group of processes.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Everywhere except in the pid array we distinguish between a task's pid and
a task's tgid (thread group id). Even in the enumeration we want that
distinction sometimes so we have added __PIDTYPE_TGID. With leader_pid
we almost have an implementation of PIDTYPE_TGID in struct signal_struct.
Add PIDTYPE_TGID as a first class member of the pid_type enumeration and
into the pids array. Then remove the __PIDTYPE_TGID special case and the
leader_pid in signal_struct.
The net size increase is just an extra pointer added to struct pid and
an extra pair of pointers of an hlist_node added to task_struct.
The effect on code maintenance is the removal of a number of special
cases today and the potential to remove many more special cases as
PIDTYPE_TGID gets used to its fullest. The long term potential
is allowing zombie thread group leaders to exit, which will remove
a lot more special cases in the code.
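A simplified sketch of the resulting shape, assuming a per-task array indexed by the pid type. Every demo_ name here is hypothetical and the structures are far smaller than the kernel's; the sketch only shows that the thread group id becomes an ordinary array slot rather than a special case.

```c
#include <stddef.h>

/* Hypothetical mirror of the enumeration with TGID as a first-class entry. */
enum demo_pid_type {
	DEMO_PIDTYPE_PID,
	DEMO_PIDTYPE_TGID,
	DEMO_PIDTYPE_PGID,
	DEMO_PIDTYPE_SID,
	DEMO_PIDTYPE_MAX,
};

struct demo_pid {
	int nr;
};

struct demo_task {
	/* One slot per pid type, so TGID needs no separate field. */
	struct demo_pid *pids[DEMO_PIDTYPE_MAX];
};

/* Look up any id of a task the same way; returns 0 if the slot is empty. */
static int demo_task_nr(const struct demo_task *t, enum demo_pid_type type)
{
	const struct demo_pid *pid = t->pids[type];

	return pid ? pid->nr : 0;
}
```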
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
gcc toggle -fisolate-erroneous-paths-dereference (default at -O2
onwards) isolates faulty code paths such as null pointer access, divide
by zero etc. If gcc port doesn't implement __builtin_trap, an abort() is
generated which causes kernel link error.
In this case, gcc is generating abort due to 'divide by zero' in
lib/mpi/mpih-div.c.
Currently 'frv' and 'arc' are failing. Other architectures were broken
the same way before; m32r, for example, was fixed by commit d22e3d69ee
("m32r: fix build failure").
Let's define this weak function which is common for all arch and fix the
problem permanently. We can even remove the arch specific 'abort' after
this is done.
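The weak-symbol mechanism this relies on can be sketched in user space as follows, assuming a GCC or Clang toolchain that supports __attribute__((weak)). The function name demo_trap_fallback() is hypothetical; it stands in for a generic fallback that an architecture could replace with its own strong definition.

```c
/*
 * Generic fallback. Because it is weak, the linker silently prefers a
 * strong definition of the same symbol if any other object provides one.
 * Hypothetical name, chosen for illustration only.
 */
__attribute__((weak)) int demo_trap_fallback(void)
{
	return -1;	/* value returned when nothing overrides the fallback */
}
```

When no other object file defines demo_trap_fallback(), calls resolve to this definition, so the link succeeds either way.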
Link: http://lkml.kernel.org/r/1513118956-8718-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
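To show what the two rewrite rules produce, here is a small user-space sketch. The READ_ONCE() and WRITE_ONCE() definitions below are simplified stand-ins written for this example (they assume the GCC/Clang __typeof__ extension) and are not the kernel's macros; shared_flag, set_flag() and get_flag() are hypothetical.

```c
/* Simplified stand-ins: a volatile access through a cast pointer. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int shared_flag;

/* Before the conversion: ACCESS_ONCE(shared_flag) = 1; */
static void set_flag(void)
{
	WRITE_ONCE(shared_flag, 1);
}

/* Before the conversion: return ACCESS_ONCE(shared_flag); */
static int get_flag(void)
{
	return READ_ONCE(shared_flag);
}
```

The first rule matches assignments, so stores become WRITE_ONCE(); every remaining use is a load and becomes READ_ONCE().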
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As pointed out by Linus and David, the earlier waitid() fix resulted in
a (currently harmless) unbalanced user_access_end() call. This fixes it
to just directly return EFAULT on access_ok() failure.
Fixes: 96ca579a1e ("waitid(): Add missing access_ok() checks")
Acked-by: David Daney <david.daney@cavium.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel_waitid() can return a PID, an error or 0. rusage is filled in the first
case and waitid(2) rusage should've been copied out exactly in that case, *not*
whenever kernel_waitid() has not returned an error. Compat variant shares that
braino; none of kernel_wait4() callers do, so the below ought to fix it.
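The three-way return value can be sketched as below. This is an illustration under assumptions, not the kernel source: demo_kernel_waitid(), demo_waitid() and struct demo_rusage are hypothetical stand-ins, and a plain struct assignment stands in for copy_to_user().

```c
#include <errno.h>
#include <stdbool.h>

struct demo_rusage {
	long utime;
};

/* Stand-in: returns a PID (> 0), 0 for "nothing to report", or -errno. */
static long demo_kernel_waitid(long outcome, struct demo_rusage *ru)
{
	if (outcome > 0)
		ru->utime = 42;	/* rusage is filled only when a PID is found */
	return outcome;
}

/* Returns 0 or -errno; *copied reports whether rusage was copied out. */
static long demo_waitid(long outcome, struct demo_rusage *user_ru, bool *copied)
{
	struct demo_rusage r;
	long err = demo_kernel_waitid(outcome, &r);

	*copied = false;
	if (err > 0) {		/* a PID: the only case where r is valid */
		err = 0;
		*user_ru = r;	/* stands in for copy_to_user() */
		*copied = true;
	}
	return err;
}
```

Testing for err > 0 rather than err >= 0 is what keeps uninitialized rusage data from being copied out in the "nothing found" case.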
Reported-and-tested-by: Alexander Potapenko <glider@google.com>
Fixes: ce72a16fa7 ("wait4(2)/waitid(2): separate copying rusage to userland")
Cc: stable@vger.kernel.org # v4.13
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull namespace updates from Eric Biederman:
"Life has been busy and I have not gotten half as much done this round
as I would have liked. I delayed it so that a minor conflict
resolution with the mips tree could spend a little time in linux-next
before I sent this pull request.
This includes two long delayed user namespace changes from Kirill
Tkhai. It also includes a very useful change from Serge Hallyn that
allows the security capability attribute to be used inside of user
namespaces. The practical effect of this is people can now untar
tarballs and install rpms in user namespaces. It had been suggested to
generalize this and encode some of the namespace information
in the xattr name. Upon close inspection that makes the
things that should be hard easy and the things that should be easy
more expensive.
Then there is my bugfix/cleanup for signal injection that removes the
magic encoding of the siginfo union member from the kernel internal
si_code. The mips folks reported that the case where I had used FPE_FIXME
is impossible, so I have removed FPE_FIXME from mips, while at the
same time including a return statement in that case to keep gcc from
complaining about uninitialized variables.
I almost finished the work to make copy_siginfo_to_user a trivial
copy to user. The code is available at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3
But I did not have time/energy to get the code posted and reviewed
before the merge window opened.
I was able to see that the security excuse for just copying fields
that we know are initialized doesn't work in practice: there are buggy
initializations that don't initialize the proper fields in siginfo. So
we still sometimes copy uninitialized data to userspace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
Introduce v3 namespaced file capabilities
mips/signal: In force_fcr31_sig return in the impossible case
signal: Remove kernel interal si_code magic
fcntl: Don't use ambiguous SIG_POLL si_codes
prctl: Allow local CAP_SYS_ADMIN changing exe_file
security: Use user_namespace::level to avoid redundant iterations in cap_capable()
userns,pidns: Verify the userns for new pid namespaces
signal/testing: Don't look for __SI_FAULT in userspace
signal/mips: Document a conflict with SI_USER with SIGFPE
signal/sparc: Document a conflict with SI_USER with SIGFPE
signal/ia64: Document a conflict with SI_USER with SIGFPE
signal/alpha: Document a conflict with SI_USER for SIGTRAP
Pull locking updates from Ingo Molnar:
- Add 'cross-release' support to lockdep, which allows APIs like
completions, where it's not the 'owner' who releases the lock, to be
tracked. It's all activated automatically under
CONFIG_PROVE_LOCKING=y.
- Clean up (restructure) the x86 atomics op implementation to be more
readable, in preparation of KASAN annotations. (Dmitry Vyukov)
- Fix static keys (Paolo Bonzini)
- Add killable versions of down_read() et al (Kirill Tkhai)
- Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)
- Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)
- Remove smp_mb__before_spinlock() and convert its usages, introduce
smp_mb__after_spinlock() (Peter Zijlstra)
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
locking/lockdep/selftests: Fix mixed read-write ABBA tests
sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
smp: Avoid using two cache lines for struct call_single_data
locking/lockdep: Untangle xhlock history save/restore from task independence
locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
futex: Remove duplicated code and fix undefined behaviour
Documentation/locking/atomic: Finish the document...
locking/lockdep: Fix workqueue crossrelease annotation
workqueue/lockdep: 'Fix' flush_work() annotation
locking/lockdep/selftests: Add mixed read-write ABBA tests
mm, locking/barriers: Clarify tlb_flush_pending() barriers
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
locking/lockdep: Explicitly initialize wq_barrier::done::map
locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
locking/refcounts, x86/asm: Implement fast refcount overflow protection
locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
...
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and
it appears that all callers could do just as well with a lock/unlock pair.
This commit therefore replaces the spin_unlock_wait() call in do_exit()
with spin_lock() followed immediately by spin_unlock(). This should be
safe from a performance perspective because the lock is a per-task lock,
and this is happening only at task-exit time.
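The replacement idiom can be sketched in user space with a toy spinlock built on C11 atomic_flag. The demo_ names are hypothetical and this is not the kernel's spinlock; it only shows that acquiring and immediately releasing the lock waits out any current holder.

```c
#include <stdatomic.h>

struct demo_spinlock {
	atomic_flag locked;
};

static void demo_spin_lock(struct demo_spinlock *l)
{
	/* Spin until the flag was previously clear, i.e. we took the lock. */
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		;
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Stand-in for a spin_unlock_wait()-style call: lock, then unlock at once. */
static void demo_wait_for_holder(struct demo_spinlock *l)
{
	demo_spin_lock(l);
	demo_spin_unlock(l);
}
```

Unlike a bare wait, the pair has ordinary acquire/release semantics, so there is no special ordering contract left to define.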
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Currently, the exit-time support for TASKS_RCU is open-coded in do_exit().
This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish()
APIs for do_exit() use. This has the benefit of confining the use of the
tasks_rcu_exit_srcu variable to one file, allowing it to become static.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
struct siginfo is a union and the kernel since 2.4 has been hiding a union
tag in the high 16bits of si_code using the values:
__SI_KILL
__SI_TIMER
__SI_POLL
__SI_FAULT
__SI_CHLD
__SI_RT
__SI_MESGQ
__SI_SYS
While this looks plausible on the surface, in practice this situation has
not worked well.
- Injected positive signals are not copied to user space properly
unless they have these magic high bits set.
- Injected positive signals are not reported properly by signalfd
unless they have these magic high bits set.
- These kernel internal values leaked to userspace via ptrace_peek_siginfo
- It was possible to inject these kernel internal values and cause
the kernel to misbehave.
- Kernel developers got confused and expected these kernel internal values
in userspace in kernel self tests.
- Kernel developers got confused and set si_code to __SI_FAULT which
is SI_USER in userspace which causes userspace to think an ordinary user
sent the signal and that it was not kernel generated.
- The values make it impossible to reorganize the code to transform
siginfo_copy_to_user into a plain copy_to_user. As si_code must
be massaged before being passed to userspace.
So remove these kernel internal si codes and make the kernel code simpler
and more maintainable.
To replace these kernel internal magic si_codes introduce the helper
function siginfo_layout, that takes a signal number and an si_code and
computes which union member of siginfo is being used. Have
siginfo_layout return an enumeration so that gcc will have enough
information to warn if a switch statement does not handle all of union
members.
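A much-simplified sketch of such a helper follows. It is not the kernel's siginfo_layout(): the demo_ names are hypothetical, the si_code constants are restated locally as assumptions about the Linux values, and the architecture-specific special cases are left out.

```c
#include <signal.h>

/* Assumed Linux si_code values for signals not generated by a fault. */
enum { DEMO_SI_USER = 0, DEMO_SI_QUEUE = -1, DEMO_SI_TIMER = -2 };

enum demo_siginfo_layout {
	DEMO_SIL_KILL,
	DEMO_SIL_TIMER,
	DEMO_SIL_POLL,
	DEMO_SIL_FAULT,
	DEMO_SIL_CHLD,
	DEMO_SIL_RT,
	DEMO_SIL_SYS,
};

/* Compute which union member is in use from the signal and its si_code. */
static enum demo_siginfo_layout demo_siginfo_layout(int sig, int si_code)
{
	if (si_code == DEMO_SI_TIMER)
		return DEMO_SIL_TIMER;
	if (si_code < 0)
		return DEMO_SIL_RT;
	if (si_code == DEMO_SI_USER)
		return DEMO_SIL_KILL;

	/* Positive si_code: kernel generated, so the signal picks the member. */
	switch (sig) {
	case SIGILL: case SIGFPE: case SIGSEGV: case SIGBUS: case SIGTRAP:
		return DEMO_SIL_FAULT;
	case SIGCHLD:
		return DEMO_SIL_CHLD;
	case SIGIO:
		return DEMO_SIL_POLL;
	case SIGSYS:
		return DEMO_SIL_SYS;
	default:
		return DEMO_SIL_KILL;
	}
}
```

Because the result is an enumeration, a caller that switches on it without a default label can get a compiler warning when a member is left unhandled.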
A couple of architectures have a messed up ABI that defines signal
specific duplications of SI_USER which causes more special cases in
siginfo_layout than I would like. The good news is only problem
architectures pay the cost.
Update all of the code that used the previous magic __SI_ values to
use the new SIL_ values and to call siginfo_layout to get those
values. Except where not all of the cases are handled, remove the
defaults in the switch statements so that if a new case is missed in
the future the lack will show up at compile time.
Modify the code that copies siginfo si_code to userspace to just copy
the value and not cast si_code to a short first. The high bits are no
longer used to hold a magic union member.
Fixup the siginfo header files to stop including the __SI_ values in
their constants and for the headers that were missing it to properly
update the number of si_codes for each signal type.
The fixes to copy_siginfo_from_user32 implementations have the
interesting property that several of them previously should never have
worked, as the __SI_ values they depended upon were kernel internal.
With that dependency gone those implementations should work much
better.
The idea of not passing the __SI_ values out to userspace and then
not reinserting them has been tested with criu and criu worked without
changes.
Ref: 2.4.0-test1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
We lose the distinction between "found a PID" and "nothing, but that's not
an error" a bit too early in waitid(). Easily fixed, fortunately...
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Fixes: 67d7ddded3 ("waitid(2): leave copyout of siginfo to syscall itself")
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull wait syscall updates from Al Viro:
"Consolidating sys_wait* and compat counterparts.
Gets rid of set_fs()/double-copy mess, simplifies the whole thing
(lifting the copyouts to the syscalls means less headache in the part
that does actual work - fewer failure exits, to start with), gets rid
of the overhead of field-by-field __put_user()"
* 'work.sys_wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
osf_wait4: switch to kernel_wait4()
waitid(): switch copyout of siginfo to unsafe_put_user()
wait_task_zombie: consolidate info logics
kill wait_noreap_copyout()
lift getrusage() from wait_noreap_copyout()
waitid(2): leave copyout of siginfo to syscall itself
kernel_wait4()/kernel_waitid(): delay copying status to userland
wait4(2)/waitid(2): separate copying rusage to userland
move compat wait4 and waitid next to native variants
Rename:
wait_queue_t => wait_queue_entry_t
'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.
Start sorting this out by renaming it to 'wait_queue_entry_t'.
This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
have kernel_waitid() collect the information needed for siginfo into
a small structure (waitid_info) passed to it; deal with copyout in
sys_waitid()/compat_sys_waitid().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
New helpers: kernel_waitid() and kernel_wait4(). sys_waitid(),
sys_wait4() and their compat variants switched to those. Copying
struct rusage to userland is left to syscall itself. For
compat_sys_wait4() that eliminates the use of set_fs() completely.
For compat_sys_waitid() it's still needed (for siginfo handling);
that will change shortly.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Patch series "userfaultfd non-cooperative further update for 4.11 merge
window".
Unfortunately I noticed one relevant bug in userfaultfd_exit while doing
more testing. I've been doing testing before and this was also tested
by kbuild bot and exercised by the selftest, but this bug never
reproduced before.
I dropped userfaultfd_exit as result. I dropped it because of
implementation difficulty in receiving signals in __mmput and because I
think -ENOSPC as result from the background UFFDIO_COPY should be enough
already.
Before I decided to remove userfaultfd_exit, I noticed userfaultfd_exit
wasn't exercised by the selftest and when I tried to exercise it, after
moving it to a more correct place in __mmput where it would make more
sense and where the vma list is stable, it resulted in the
event_wait_completion in D state. So then I added the second patch to
be sure even if we call userfaultfd_event_wait_completion too late
during task exit(), we won't risk generating tasks in D state. The
same check exists in handle_userfault() for the same reason, except it
makes a difference there, while here is just a robustness check and it's
run under WARN_ON_ONCE.
While looking at the userfaultfd_event_wait_completion() function I
looked back at its callers too while at it and I think it's not ok to
stop executing dup_fctx on the fcs list because we rely on
userfaultfd_event_wait_completion to execute
userfaultfd_ctx_put(fctx->orig) which is paired against
userfaultfd_ctx_get(fctx->orig) in dup_userfault just before
list_add(fcs). This change only takes care of fctx->orig but this area
also needs further review looking for similar problems in fctx->new.
The only patch that is urgent is the first because it's a use after
free during an SMP race condition that affects all processes if
CONFIG_USERFAULTFD=y. Very hard to reproduce though and probably
impossible without SLUB poisoning enabled.
This patch (of 3):
I once reproduced this oops with the userfaultfd selftest, it's not
easily reproducible and it requires SLUB poisoning to reproduce.
general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 2 PID: 18421 Comm: userfaultfd Tainted: G ------------ T 3.10.0+ #15
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
task: ffff8801f83b9440 ti: ffff8801f833c000 task.ti: ffff8801f833c000
RIP: 0010:[<ffffffff81451299>] [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0
RSP: 0018:ffff8801f833fe80 EFLAGS: 00010202
RAX: ffff8801f833ffd8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff8801f83b9440
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800baf18600
RBP: ffff8801f833fee8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: ffffffff8127ceb3 R12: 0000000000000000
R13: ffff8800baf186b0 R14: ffff8801f83b99f8 R15: 00007faed746c700
FS: 0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007faf0966f028 CR3: 0000000001bc6000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
do_exit+0x297/0xd10
SyS_exit+0x17/0x20
tracesys+0xdd/0xe2
Code: 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 48 83 ec 58 48 8b 1f 48 85 db 75 11 eb 73 66 0f 1f 44 00 00 48 8b 5b 10 48 85 db 74 64 <4c> 8b a3 b8 00 00 00 4d 85 e4 74 eb 41 f6 84 24 2c 01 00 00 80
RIP [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0
RSP <ffff8801f833fe80>
---[ end trace 9fecd6dcb442846a ]---
In the debugger I located the "mm" pointer in the stack and walking
mm->mmap->vm_next through the end shows the vma->vm_next list is fully
consistent and it is null terminated list as expected. So this has to
be an SMP race condition where userfaultfd_exit was running while the
vma list was being modified by another CPU.
When userfaultfd_exit() ran, one of the ->vm_next pointers pointed to
SLAB_POISON (RBX is the vma pointer and is 0x6b6b..).
The reason is that it's not running in __mmput but while there are still
other threads running and it's not holding the mmap_sem (it can't as it
has to wait for the event to be received by the manager). So this is a use
after free that was happening for all processes.
One more implementation problem aside from the race condition:
userfaultfd_exit has really to check a flag in mm->flags before walking
the vma or it's going to slow down the exit() path for regular tasks.
One more implementation problem: at that point signals can't be
delivered so it would also create a task in D state if the manager
doesn't read the event.
The major design issue: it overall looks superfluous as the manager can
check for -ENOSPC in the background transfer:
if (mmget_not_zero(ctx->mm)) {
[..]
} else {
return -ENOSPC;
}
It's safer to roll it back and re-introduce it later if at all.
[rppt@linux.vnet.ibm.com: documentation fixup after removal of UFFD_EVENT_EXIT]
Link: http://lkml.kernel.org/r/1488345437-4364-1-git-send-email-rppt@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20170224181957.19736-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task_stack.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>