Commit Graph

592 Commits

Author SHA1 Message Date
Ivaylo Georgiev
44bb576a7a Merge android-4.19.73 (8ca5759) into msm-4.19
* refs/heads/tmp-8ca5759:
  BACKPORT: make 'user_access_begin()' do 'access_ok()'
  ABI update for 4.19.72
  ANDROID: first pass cuttlefish GKI modularization
  ANDROID: GKI: enable CONFIG_TIPC for x86
  ANDROID: GKI: enable CONFIG_SPI for x86
  ANDROID: update abi for 4.19.69
  ANDROID: update ABI dump
  UPSTREAM: lib/test_meminit.c: use GFP_ATOMIC in RCU critical section
  UPSTREAM: mm: slub: Fix slab walking for init_on_free
  UPSTREAM: lib/test_meminit.c: minor test fixes
  UPSTREAM: lib/test_meminit.c: fix -Wmaybe-uninitialized false positive
  UPSTREAM: lib: introduce test_meminit module
  UPSTREAM: mm: init: report memory auto-initialization features at boot time
  UPSTREAM: mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
  UPSTREAM: arm64: move jump_label_init() before parse_early_param()
  ANDROID: update ABI dump
  ANDROID: gki_defconfig: enable CONFIG_QCOM_{COMMAND_DB,RPMH,PDC}
  ANDROID: cuttlefish: overlayfs: regression
  ANDROID: gki_defconfig enable CONFIG_SPARSEMEM_VMEMMAP
  ANDROID: update ABI for EFI, SCHED_TUNE
  ANDROID: gki_defconfig: Enable SCHED_TUNE
  ANDROID: gki_defconfig: Minimally enable EFI
  ANDROID: Add a tracepoint for mapping inode to full path
  ANDROID: update ABI for CONFIG_NR_CPUS=32
  ANDROID: gki_defconfig: set CONFIG_NR_CPUS=32
  ANDROID: gki_defconfig: set CONFIG_NR_CPUS=32 (x86_64)
  ANDROID: update ABI for CONFIG_TIPC
  ANDROID: gki_defconfig: enable CONFIG_TIPC
  BACKPORT: arch: add pidfd and io_uring syscalls everywhere
  ANDROID: update ABI dump
  UPSTREAM: dma-buf: add show_fdinfo handler
  UPSTREAM: dma-buf: add DMA_BUF_SET_NAME ioctls
  UPSTREAM: dma-buf: give each buffer a full-fledged inode
  ANDROID: Update the expected ABI
  UPSTREAM: drm/virtio: Fix cache entry creation race.
  UPSTREAM: drm/virtio: Wake up all waiters when capset response comes in.
  UPSTREAM: drm/virtio: Ensure cached capset entries are valid before copying.
  UPSTREAM: drm/virtio: use u64_to_user_ptr macro
  UPSTREAM: drm/virtio: remove irrelevant DRM_UNLOCKED flag
  UPSTREAM: drm/virtio: Remove redundant return type
  UPSTREAM: drm/virtio: allocate fences with GFP_KERNEL
  UPSTREAM: drm/virtio: add trace events for commands
  UPSTREAM: drm/virtio: trace drm_fence_emit
  BACKPORT: drm/virtio: set seqno for dma-fence
  UPSTREAM: drm/virtio: move drm_connector_update_edid_property() call
  UPSTREAM: drm/virtio: add missing drm_atomic_helper_shutdown() call.
  UPSTREAM: drm/virtio: rework resource creation workflow.
  UPSTREAM: drm/virtio: params struct for virtio_gpu_cmd_create_resource_3d()
  UPSTREAM: drm/virtio: params struct for virtio_gpu_cmd_create_resource()
  UPSTREAM: drm/virtio: use struct to pass params to virtio_gpu_object_create()
  UPSTREAM: drm/virtio: move virtio_gpu_object_{attach, detach} calls.
  UPSTREAM: drm/virtio: add virtio-gpu-features debugfs file.
  UPSTREAM: drm/virtio: remove set but not used variable 'vgdev'
  BACKPORT: drm/virtio: implement prime export
  UPSTREAM: drm/virtio: remove prime pin/unpin callbacks.
  UPSTREAM: drm/virtio: implement prime mmap
  BACKPORT: Revert "drm/virtio: drop prime import/export callbacks"
  UPSTREAM: drm/virtio: drop prime import/export callbacks
  UPSTREAM: drm/virtio: do NOT reuse resource ids
  UPSTREAM: drm/virtio: drop virtio_gpu_fence_cleanup()
  UPSTREAM: drm/virtio: fix pageflip flush
  UPSTREAM: drm/virtio: log error responses
  UPSTREAM: drm/virtio: Add missing virtqueue reset
  UPSTREAM: drm/virtio: Remove incorrect kfree()
  UPSTREAM: drm/virtio: switch to generic fbdev emulation
  UPSTREAM: drm/virtio: virtio_gpu_cmd_resource_create_3d: drop unused fence arg
  UPSTREAM: drm/virtio: fence: pass plain pointer
  UPSTREAM: drm/virtio: add edid support
  UPSTREAM: virtio-gpu: add VIRTIO_GPU_F_EDID feature
  UPSTREAM: drm/virtio: fix memory leak of vfpriv on error return path
  UPSTREAM: drm/virtio: bump driver version after explicit synchronization addition
  UPSTREAM: drm/virtio: add in/out fence support for explicit synchronization
  UPSTREAM: drm/virtio: add uapi for in and out explicit fences
  UPSTREAM: drm/virtio: add virtio_gpu_alloc_fence()
  UPSTREAM: drm/virtio: Use IDAs more efficiently
  UPSTREAM: drm/virtio: Handle error from virtio_gpu_resource_id_get
  UPSTREAM: gpu/drm/virtio/virtgpu_vq.c: Use kmem_cache_zalloc
  UPSTREAM: drm/virtio: Handle context ID allocation errors
  UPSTREAM: drm/virtio: Replace IDRs with IDAs
  UPSTREAM: drm/virtio: fix resource id handling
  UPSTREAM: drm/virtio: drop resource_id argument.
  UPSTREAM: drm/virtio: use virtio_gpu_object->hw_res_handle in virtio_gpu_resource_create_ioctl()
  UPSTREAM: drm/virtio: use virtio_gpu_object->hw_res_handle in virtio_gpu_mode_dumb_create()
  UPSTREAM: drm/virtio: use virtio_gpu_object->hw_res_handle in virtio_gpufb_create()
  BACKPORT: drm/virtio: track created object state
  UPSTREAM: drm/virtio: document drm_dev_set_unique workaround
  UPSTREAM: virtio: Support prime objects vmap/vunmap
  BACKPORT: virtio: Rework virtio_gpu_object_kmap()
  UPSTREAM: drm/virtio: pass virtio_gpu_object to virtio_gpu_cmd_transfer_to_host_{2d, 3d}
  UPSTREAM: drm/virtio: add dma sync for dma mapped virtio gpu framebuffer pages
  UPSTREAM: drm/virtio: Remove set but not used variable 'bo'
  UPSTREAM: drm/virtio: add iommu support.
  UPSTREAM: drm/virtio: add virtio_gpu_object_detach() function
  UPSTREAM: drm/virtio: track virtual output state
  UPSTREAM: drm/virtio: fix bounds check in virtio_gpu_cmd_get_capset()
  UPSTREAM: drm/virtio: Replace ttm_bo_unref with ttm_bo_put
  UPSTREAM: drm/virtio: Replace ttm_bo_reference with ttm_bo_get
  UPSTREAM: drm/virtio: Replace drm_dev_unref with drm_dev_put
  UPSTREAM: gpu: drm: virtio: code cleanup
  UPSTREAM: drm: byteorder: add DRM_FORMAT_HOST_*
  UPSTREAM: drm: add drm_connector_attach_edid_property()
  UPSTREAM: drm/prime: Add drm_gem_prime_mmap()
  ANDROID: Remove unused cuttlefish build infra
  f2fs: fix build error on android tracepoints
  ANDROID: sched/fair: Cap transient util in stune
  ANDROID: update ABI for 4.19.66
  Adding GKI Ramdisk to gki config
  ANDROID: Removed unnecessary modules from cuttlefish.
  UPSTREAM: pidfd: fix a poll race when setting exit_state
  BACKPORT: arch: wire-up pidfd_open()
  UPSTREAM: pid: add pidfd_open()
  UPSTREAM: pidfd: add polling support
  UPSTREAM: signal: improve comments
  UPSTREAM: fork: do not release lock that wasn't taken
  UPSTREAM: signal: support CLONE_PIDFD with pidfd_send_signal
  UPSTREAM: clone: add CLONE_PIDFD
  UPSTREAM: Make anon_inodes unconditional
  UPSTREAM: signal: use fdget() since we don't allow O_PATH
  UPSTREAM: signal: don't silently convert SI_USER signals to non-current pidfd
  BACKPORT: signal: add pidfd_send_signal() syscall

Conflicts:
	arch/arm64/configs/cuttlefish_defconfig
	arch/x86/configs/x86_64_cuttlefish_defconfig
	arch/x86/entry/syscalls/syscall_64.tbl
	build.config.cuttlefish.aarch64
	build.config.cuttlefish.x86_64
	drivers/dma-buf/dma-buf.c
	fs/userfaultfd.c
	include/linux/dma-buf.h
	kernel/sched/fair.c

Change-Id: I65d7949be7c228000f94ad9118f2d80a8fa45a1b
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2020-02-24 07:44:16 -08:00
Ivaylo Georgiev
9e52135608 Merge android-4.19-q.94 (dabb11d) into msm-4.19
* refs/heads/tmp-dabb11d:
  Revert "dt-bindings: clock: renesas: rcar-usb2-clock-sel: Fix typo in example"
  Linux 4.19.94
  perf/x86/intel/bts: Fix the use of page_private()
  xen/blkback: Avoid unmapping unmapped grant pages
  s390/smp: fix physical to logical CPU map for SMT
  ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps
  net: add annotations on hh->hh_len lockless accesses
  xfs: periodically yield scrub threads to the scheduler
  ath9k_htc: Discard undersized packets
  ath9k_htc: Modify byte order for an error message
  net: core: limit nested device depth
  tcp: annotate tp->rcv_nxt lockless reads
  rxrpc: Fix possible NULL pointer access in ICMP handling
  KVM: PPC: Book3S HV: use smp_mb() when setting/clearing host_ipi flag
  selftests: rtnetlink: add addresses with fixed life time
  powerpc/pseries/hvconsole: Fix stack overread via udbg
  drm/mst: Fix MST sideband up-reply failure handling
  scsi: qedf: Do not retry ELS request if qedf_alloc_cmd fails
  bdev: Refresh bdev size for disks without partitioning
  bdev: Factor out bdev revalidation into a common helper
  fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP
  tty: serial: msm_serial: Fix lockup for sysrq and oops
  arm64: dts: meson: odroid-c2: Disable usb_otg bus to avoid power failed warning
  dt-bindings: clock: renesas: rcar-usb2-clock-sel: Fix typo in example
  media: usb: fix memory leak in af9005_identify_state
  regulator: ab8500: Remove AB8505 USB regulator
  media: flexcop-usb: ensure -EIO is returned on error condition
  Bluetooth: Fix memory leak in hci_connect_le_scan
  Bluetooth: delete a stray unlock
  Bluetooth: btusb: fix PM leak in error case of setup
  platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table
  xfs: don't check for AG deadlock for realtime files in bunmapi
  ACPI: sysfs: Change ACPI_MASKABLE_GPE_MAX to 0x100
  HID: i2c-hid: Reset ALPS touchpads on resume
  nfsd4: fix up replay_matches_cache()
  PM / devfreq: Check NULL governor in available_governors_show
  drm/msm: include linux/sched/task.h
  ftrace: Avoid potential division by zero in function profiler
  arm64: Revert support for execute-only user mappings
  exit: panic before exit_mm() on global init exit
  ALSA: firewire-motu: Correct a typo in the clock proc string
  ALSA: cs4236: fix error return comparison of an unsigned integer
  apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
  tracing: Fix endianness bug in histogram trigger
  tracing: Have the histogram compare functions convert to u64 first
  tracing: Avoid memory leak in process_system_preds()
  tracing: Fix lock inversion in trace_event_enable_tgid_record()
  rseq/selftests: Fix: Namespace gettid() for compatibility with glibc 2.30
  riscv: ftrace: correct the condition logic in function graph tracer
  gpiolib: fix up emulated open drain outputs
  libata: Fix retrieving of active qcs
  ata: ahci_brcm: BCM7425 AHCI requires AHCI_HFLAG_DELAY_ENGINE
  ata: ahci_brcm: Add missing clock management during recovery
  ata: ahci_brcm: Allow optional reset controller to be used
  ata: ahci_brcm: Fix AHCI resources management
  ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()
  compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE
  compat_ioctl: block: handle Persistent Reservations
  dmaengine: Fix access to uninitialized dma_slave_caps
  locks: print unsigned ino in /proc/locks
  pstore/ram: Write new dumps to start of recycled zones
  mm: move_pages: return valid node id in status if the page is already on the target node
  memcg: account security cred as well to kmemcg
  mm/zsmalloc.c: fix the migrated zspage statistics.
  media: cec: check 'transmit_in_progress', not 'transmitting'
  media: cec: avoid decrementing transmit_queue_sz if it is 0
  media: cec: CEC 2.0-only bcast messages were ignored
  media: pulse8-cec: fix lost cec_transmit_attempt_done() call
  MIPS: Avoid VDSO ABI breakage due to global register variable
  drm/sun4i: hdmi: Remove duplicate cleanup calls
  ALSA: hda/realtek - Add headset Mic no shutup for ALC283
  ALSA: usb-audio: set the interface format after resume on Dell WD19
  ALSA: usb-audio: fix set_format altsetting sanity check
  ALSA: ice1724: Fix sleep-in-atomic in Infrasonic Quartet support code
  netfilter: nft_tproxy: Fix port selector on Big Endian
  drm: limit to INT_MAX in create_blob ioctl
  taskstats: fix data-race
  xfs: fix mount failure crash on invalid iclog memory access
  ALSA: hda - fixup for the bass speaker on Lenovo Carbon X1 7th gen
  ALSA: hda/realtek - Enable the bass speaker of ASUS UX431FLC
  ALSA: hda/realtek - Add Bass Speaker and fixed dac for bass speaker
  PM / hibernate: memory_bm_find_bit(): Tighten node optimisation
  xen/balloon: fix ballooned page accounting without hotplug enabled
  xen-blkback: prevent premature module unload
  IB/mlx5: Fix steering rule of drop and count
  IB/mlx4: Follow mirror sequence of device add during device removal
  s390/cpum_sf: Avoid SBD overflow condition in irq handler
  s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits
  md: raid1: check rdev before reference in raid1_sync_request func
  afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
  net: make socket read/write_iter() honor IOCB_NOWAIT
  usb: gadget: fix wrong endpoint desc
  drm/nouveau: Move the declaration of struct nouveau_conn_atom up a bit
  scsi: libsas: stop discovering if oob mode is disconnected
  scsi: iscsi: qla4xxx: fix double free in probe
  scsi: qla2xxx: Ignore PORT UPDATE after N2N PLOGI
  scsi: qla2xxx: Send Notify ACK after N2N PLOGI
  scsi: qla2xxx: Configure local loop for N2N target
  scsi: qla2xxx: Fix PLOGI payload and ELS IOCB dump length
  scsi: qla2xxx: Don't call qlt_async_event twice
  scsi: qla2xxx: Drop superfluous INIT_WORK of del_work
  scsi: lpfc: Fix memory leak on lpfc_bsg_write_ebuf_set func
  rxe: correctly calculate iCRC for unaligned payloads
  RDMA/cma: add missed unregister_pernet_subsys in init failure
  afs: Fix SELinux setting security label on /afs
  afs: Fix afs_find_server lookups for ipv4 peers
  PM / devfreq: Don't fail devfreq_dev_release if not in list
  PM / devfreq: Set scaling_max_freq to max on OPP notifier error
  PM / devfreq: Fix devfreq_notifier_call returning errno
  iio: adc: max9611: Fix too short conversion time delay
  drm/amd/display: Fixed kernel panic when booting with DP-to-HDMI dongle
  drm/amdgpu: add cache flush workaround to gfx8 emit_fence
  drm/amdgpu: add check before enabling/disabling broadcast mode
  nvme-fc: fix double-free scenarios on hw queues
  nvme_fc: add module to ops template to allow module references

Conflicts:
	drivers/devfreq/devfreq.c
	drivers/gpu/drm/drm_dp_mst_topology.c
	drivers/gpu/drm/drm_property.c

Change-Id: Iad80571bea0a2197ea36a70c425c61a66c0cf0bc
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2020-02-03 21:41:48 -08:00
chenqiwu
b9227aacdc exit: panic before exit_mm() on global init exit
commit 43cf75d96409a20ef06b756877a2e72b10a026fc upstream.

Currently, when global init and all threads in its thread-group have exited
we panic via:
do_exit()
-> exit_notify()
   -> forget_original_parent()
      -> find_child_reaper()
This makes it hard to extract a useable coredump for global init from a
kernel crashdump because by the time we panic exit_mm() will have already
released global init's mm.
This patch moves the panic futher up before exit_mm() is called. As was the
case previously, we only panic when global init and all its threads in the
thread-group have exited.

Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
[christian.brauner@ubuntu.com: fix typo, rewrite commit message]
Link: https://lore.kernel.org/r/1576736993-10121-1-git-send-email-qiwuchen55@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:02 +01:00
Linus Torvalds
ac351de9dd BACKPORT: make 'user_access_begin()' do 'access_ok()'
upstream commit 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")

Originally, the rule used to be that you'd have to do access_ok()
separately, and then user_access_begin() before actually doing the
direct (optimized) user access.

But experience has shown that people then decide not to do access_ok()
at all, and instead rely on it being implied by other operations or
similar.  Which makes it very hard to verify that the access has
actually been range-checked.

If you use the unsafe direct user accesses, hardware features (either
SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
Access Never - on ARM) do force you to use user_access_begin().  But
nothing really forces the range check.

By putting the range check into user_access_begin(), we actually force
people to do the right thing (tm), and the range check vill be visible
near the actual accesses.  We have way too long a history of people
trying to avoid them.

Bug: 135368228
Change-Id: I4ca0e4566ea080fa148c5e768bb1a0b6f7201c01
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-12 11:28:03 +00:00
Ivaylo Georgiev
a8afaca46b Merge android-4.19-q.66 (5118163) into msm-4.19
* refs/heads/tmp-5118163:
  Linux 4.19.66
  spi: bcm2835: Fix 3-wire mode if DMA is enabled
  cgroup: Fix css_task_iter_advance_css_set() cset skip condition
  cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
  cgroup: Include dying leaders with live threads in PROCS iterations
  cgroup: Implement css_task_iter_skip()
  cgroup: Call cgroup_release() before __exit_signal()
  compat_ioctl: pppoe: fix PPPOEIOCSFWD handling
  r8169: don't use MSI before RTL8168d
  net/mlx5e: Prevent encap flow counter update async to user query
  net/mlx5: Fix modify_cq_in alignment
  tun: mark small packets as owned by the tap sock
  tipc: compat: allow tipc commands without arguments
  ocelot: Cancel delayed work before wq destruction
  NFC: nfcmrvl: fix gpio-handling regression
  net/smc: do not schedule tx_work in SMC_CLOSED state
  net: sched: use temporary variable for actions indexes
  net sched: update vlan action for batched events operations
  net: sched: Fix a possible null-pointer dereference in dequeue_func()
  net: qualcomm: rmnet: Fix incorrect UL checksum offload logic
  net: phylink: Fix flow control for fixed-link
  net/mlx5: Use reversed order when unregister devices
  net/mlx5e: always initialize frag->last_in_page
  net: fix ifindex collision during namespace removal
  net: bridge: mcast: don't delete permanent entries when fast leave is enabled
  net: bridge: delete local fdb on device init failure
  mvpp2: refactor MTU change code
  mvpp2: fix panic on module removal
  mlxsw: spectrum: Fix error path in mlxsw_sp_module_init()
  ipip: validate header length in ipip_tunnel_xmit
  ip6_tunnel: fix possible use-after-free on xmit
  ip6_gre: reload ipv6h in prepare_ip6gre_xmit_ipv6
  ife: error out when nla attributes are empty
  bnx2x: Disable multi-cos feature.
  atm: iphase: Fix Spectre v1 vulnerability
  IB: directly cast the sockaddr union to aockaddr
  HID: Add quirk for HP X1200 PIXART OEM mouse
  HID: wacom: fix bit shift for Cintiq Companion 2
  libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
  libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant
  libnvdimm/region: Register badblocks before namespaces
  libnvdimm/bus: Prevent duplicate device_unregister() calls
  drivers/base: Introduce kill_device()
  driver core: Establish order of operations for device_add and device_del via bitflag
  gcc-9: don't warn about uninitialized variable
  scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure

Change-Id: Id10478edd741fd55ffa841c4ed7c608d9ece3bbb
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-08-14 04:56:34 -07:00
Suren Baghdasaryan
0c4addb718 UPSTREAM: pidfd: fix a poll race when setting exit_state
There is a race between reading task->exit_state in pidfd_poll and
writing it after do_notify_parent calls do_notify_pidfd. Expected
sequence of events is:

CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
  tsk->exit_state = EXIT_DEAD
                                  pidfd_poll
                                     if (tsk->exit_state)

However nothing prevents the following sequence:

CPU 0                            CPU 1
------------------------------------------------
exit_notify
  do_notify_parent
    do_notify_pidfd
                                   pidfd_poll
                                      if (tsk->exit_state)
  tsk->exit_state = EXIT_DEAD

This causes a polling task to wait forever, since poll blocks because
exit_state is 0 and the waiting task is not notified again. A stress
test continuously doing pidfd poll and process exits uncovered this bug.

To fix it, we make sure that the task's exit_state is always set before
calling do_notify_pidfd.

Fixes: b53b0b9d9a6 ("pidfd: add polling support")
Cc: kernel-team@android.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
[christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
Signed-off-by: Christian Brauner <christian@brauner.io>

(cherry picked from commit b191d6491be67cef2b3fa83015561caca1394ab9)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: Ife81348ae3c3b6f5f8caac0c2f9bacd656582b31
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-08-12 13:36:37 -04:00
Tejun Heo
7528e95b75 cgroup: Call cgroup_release() before __exit_signal()
commit 6b115bf58e6f013ca75e7115aabcbd56c20ff31d upstream.

cgroup_release() calls cgroup_subsys->release() which is used by the
pids controller to uncharge its pid.  We want to use it to manage
iteration of dying tasks which requires putting it before
__unhash_process().  Move cgroup_release() above __exit_signal().
While this makes it uncharge before the pid is freed, pid is RCU freed
anyway and the window is very narrow.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-09 17:52:34 +02:00
Ivaylo Georgiev
35c3faa19f Merge android-4.19.34 (d885da6) into msm-4.19
* refs/heads/tmp-d885da6:
  Revert "coresight: etm4x: Add support to enable ETMv4.2"
  Revert "usb: dwc3: gadget: Fix OTG events when gadget driver isn't loaded"
  Linux 4.19.34
  kprobes/x86: Blacklist non-attachable interrupt functions
  bcache: fix potential div-zero error of writeback_rate_p_term_inverse
  ACPI / video: Extend chassis-type detection with a "Lunch Box" check
  net: stmmac: Avoid one more sometimes uninitialized Clang warning
  drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers
  Input: soc_button_array - fix mapping of the 5th GPIO in a PNP0C40 device
  dmaengine: tegra: avoid overflow of byte tracking
  clk: rockchip: fix frac settings of GPLL clock for rk3328
  clk: meson: clean-up clock registration
  drm/fb-helper: fix leaks in error path of drm_fb_helper_fbdev_setup
  x86/build: Mark per-CPU symbols as absolute explicitly for LLD
  wlcore: Fix memory leak in case wl12xx_fetch_firmware failure
  brcmfmac: Use firmware_request_nowarn for the clm_blob
  selinux: do not override context on context mounts
  x86/build: Specify elf_i386 linker emulation explicitly for i386 objects
  drm/nouveau: Stop using drm_crtc_force_disable
  drm: Auto-set allow_fb_modifiers when given modifiers at plane init
  pinctrl: meson: meson8b: add the eth_rxd2 and eth_rxd3 pins
  regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting
  media: s5p-jpeg: Check for fmt_ver_flag when doing fmt enumeration
  media: rcar-vin: Allow independent VIN link enablement
  netfilter: physdev: relax br_netfilter dependency
  dmaengine: qcom_hidma: initialize tx flags in hidma_prep_dma_*
  dmaengine: qcom_hidma: assign channel cookie correctly
  dmaengine: imx-dma: fix warning comparison of distinct pointer types
  cpu/hotplug: Mute hotplug lockdep during init
  hpet: Fix missing '=' character in the __setup() code of hpet_mmap_enable
  f2fs: UBSAN: set boolean value iostat_enable correctly
  HID: intel-ish: ipc: handle PIMR before ish_wakeup also clear PISR busy_clear bit
  soc/tegra: fuse: Fix illegal free of IO base address
  hwrng: virtio - Avoid repeated init of completion
  media: mt9m111: set initial frame size other than 0x0
  perf script python: Add trace_context extension module to sys.modules
  perf script python: Use PyBytes for attr in trace-event-python
  platform/x86: intel-hid: Missing power button release on some Dell models
  usb: dwc3: gadget: Fix OTG events when gadget driver isn't loaded
  ALSA: dice: add support for Solid State Logic Duende Classic/Mini
  drm/amd/display: Enable vblank interrupt during CRC capture
  powerpc/pseries: Perform full re-add of CPU for topology update post-migration
  tty: increase the default flip buffer limit to 2*640K
  backlight: pwm_bl: Use gpiod_get_value_cansleep() to get initial state
  cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting
  powerpc/64s: Clear on-stack exception marker upon exception return
  selftests/bpf: skip verifier tests for unsupported program types
  bpf: fix missing prototype warnings
  block, bfq: fix in-service-queue check for queue merging
  ARM: avoid Cortex-A9 livelock on tight dmb loops
  ARM: 8830/1: NOMMU: Toggle only bits in EXC_RETURN we are really care of
  mt7601u: bump supported EEPROM version
  soc: qcom: gsbi: Fix error handling in gsbi_probe()
  efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted
  ARM: dts: lpc32xx: Remove leading 0x and 0s from bindings notation
  drm/vkms: Bugfix extra vblank frame
  sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
  efi/memattr: Don't bail on zero VA if it equals the region's PA
  sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
  ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe
  iwlwifi: mvm: fix RFH config command with >=10 CPUs
  staging: spi: mt7621: Add return code check on device_reset()
  i2c: of: Try to find an I2C adapter matching the parent
  platform/x86: intel_pmc_core: Fix PCH IP sts reading
  e1000e: Exclude device from suspend direct complete optimization
  e1000e: fix cyclic resets at link up with active tx
  perf/aux: Make perf_event accessible to setup_aux()
  drm/amd/display: Disconnect mpcc when changing tg
  drm/amd/display: Don't re-program planes for DPMS changes
  drm: rcar-du: add missing of_node_put
  cdrom: Fix race condition in cdrom_sysctl_register
  fbdev: fbmem: fix memory access if logo is bigger than the screen
  net: phy: consider latched link-down status in polling mode
  iw_cxgb4: fix srqidx leak during connection abort
  net: marvell: mvpp2: fix stuck in-band SGMII negotiation
  genirq: Avoid summation loops for /proc/stat
  bcache: improve sysfs_strtoul_clamp()
  bcache: fix potential div-zero error of writeback_rate_i_term_inverse
  bcache: fix input overflow to sequential_cutoff
  bcache: fix input overflow to cache set sysfs file io_error_halflife
  sched/topology: Fix percpu data types in struct sd_data & struct s_data
  usb: f_fs: Avoid crash due to out-of-scope stack ptr access
  ath10k: fix shadow register implementation for WCN3990
  ALSA: PCM: check if ops are defined before suspending PCM
  ARM: dts: meson8b: fix the Ethernet data line signals in eth_rgmii_pins
  ARM: 8833/1: Ensure that NEON code always compiles with Clang
  netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm
  kprobes: Prohibit probing on RCU debug routine
  kprobes: Prohibit probing on bsearch()
  selftests: skip seccomp get_metadata test if not real root
  ACPI / video: Refactor and fix dmi_is_desktop()
  iwlwifi: pcie: fix emergency path
  perf report: Add s390 diagnosic sampling descriptor size
  leds: lp55xx: fix null deref on firmware load failure
  jbd2: fix race when writing superblock
  cgroup, rstat: Don't flush subtree root unless necessary
  HID: intel-ish-hid: avoid binding wrong ishtp_cl_device
  vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
  xen/gntdev: Do not destroy context while dma-bufs are in use
  mt76: usb: do not run mt76u_queues_deinit twice
  media: mtk-jpeg: Correct return type for mem2mem buffer helpers
  media: mx2_emmaprp: Correct return type for mem2mem buffer helpers
  media: s5p-g2d: Correct return type for mem2mem buffer helpers
  media: rockchip/rga: Correct return type for mem2mem buffer helpers
  media: s5p-jpeg: Correct return type for mem2mem buffer helpers
  media: sh_veu: Correct return type for mem2mem buffer helpers
  media: ov7740: fix runtime pm initialization
  SoC: imx-sgtl5000: add missing put_device()
  perf report: Don't shadow inlined symbol with different addr range
  mwifiex: don't advertise IBSS features without FW support
  perf test: Fix failure of 'evsel-tp-sched' test on s390
  drm/amd/display: Clear stream->mode_changed after commit
  scsi: fcoe: make use of fip_mode enum complete
  scsi: megaraid_sas: return error when create DMA pool failed
  s390/ism: ignore some errors during deregistration
  efi: cper: Fix possible out-of-bounds access
  cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies
  ASoC: qcom: Fix of-node refcount unbalance in qcom_snd_parse_of()
  perf annotate: Fix getting source line failure
  clk: fractional-divider: check parent rate only if flag is set
  IB/mlx4: Increase the timeout for CM cache
  loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
  platform/mellanox: mlxreg-hotplug: Fix KASAN warning
  platform/x86: ideapad-laptop: Fix no_hw_rfkill_list for Lenovo RESCUER R720-15IKBN
  mlxsw: spectrum: Avoid -Wformat-truncation warnings
  e1000e: Fix -Wformat-truncation warnings
  net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat
  mmc: omap: fix the maximum timeout setting
  btrfs: qgroup: Make qgroup async transaction commit more aggressive
  powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback
  iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tables
  ARM: 8840/1: use a raw_spinlock_t in unwind
  serial: 8250_pxa: honor the port number from devicetree
  coresight: etm4x: Add support to enable ETMv4.2
  powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc
  kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
  scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
  powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables
  usb: chipidea: Grab the (legacy) USB PHY by phandle first
  crypto: cavium/zip - fix collision with generic cra_driver_name
  crypto: crypto4xx - add missing of_node_put after of_device_is_available
  mt76: fix a leaked reference by adding a missing of_node_put
  wil6210: check null pointer in _wil_cfg80211_merge_extra_ies
  PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()
  tools lib traceevent: Fix buffer overflow in arg_eval
  fs: fix guard_bio_eod to check for real EOD errors
  jbd2: fix invalid descriptor block checksum
  netfilter: conntrack: tcp: only close if RST matches exact sequence
  netfilter: nf_tables: check the result of dereferencing base_chain->stats
  cifs: Fix NULL pointer dereference of devname
  cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED
  f2fs: fix to check inline_xattr_size boundary correctly
  dm thin: add sanity checks to thin-pool and external snapshot creation
  cifs: use correct format characters
  page_poison: play nicely with KASAN
  fs/file.c: initialize init_files.resize_wait
  f2fs: do not use mutex lock in atomic context
  ocfs2: fix a panic problem caused by o2cb_ctl
  mm/slab.c: kmemleak no scan alien caches
  mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512!
  mm, mempolicy: fix uninit memory access
  memcg: killed threads should not invoke memcg OOM killer
  mm,oom: don't kill global init via memory.oom.group
  mm, swap: bounds check swap_info array accesses to avoid NULL derefs
  mm/page_ext.c: fix an imbalance with kmemleak
  mm/cma.c: cma_declare_contiguous: correct err handling
  mm/sparse: fix a bad comparison
  perf c2c: Fix c2c report for empty numa node
  x86/hyperv: Fix kernel panic when kexec on HyperV
  iio: adc: fix warning in Qualcomm PM8xxx HK/XOADC driver
  scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO
  scsi: hisi_sas: Set PHY linkrate when disconnected
  libbpf: force fixdep compilation at the start of the build
  enic: fix build warning without CONFIG_CPUMASK_OFFSTACK
  net: stmmac: Avoid sometimes uninitialized Clang warnings
  sysctl: handle overflow for file-max
  include/linux/relay.h: fix percpu annotation in struct rchan
  gpio: gpio-omap: fix level interrupt idling
  net/mlx5: Avoid panic when setting vport mac, getting vport config
  net/mlx5: Avoid panic when setting vport rate
  tracing: kdb: Fix ftdump to not sleep
  f2fs: fix to avoid deadlock in f2fs_read_inline_dir()
  f2fs: fix to adapt small inline xattr space in __find_inline_xattr()
  h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
  CIFS: fix POSIX lock leak and invalid ptr deref
  tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped
  tty/serial: atmel: Add is_half_duplex helper
  ext4: cleanup bh release code in ext4_ind_remove_space()
  arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
  ANDROID: cuttlefish_defconfig: Enable CONFIG_OVERLAY_FS
  ANDROID: cuttlefish: enable CONFIG_NET_SCH_INGRESS=y

Conflicts:
	drivers/usb/gadget/function/f_fs.c
	mm/page_alloc.c

Change-Id: Ia2a8e99bfdae84d3933749f45ba86f33c5acd713
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-05-16 04:50:34 -07:00
Oleg Nesterov
d0bc74c563 cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting
[ Upstream commit 51bee5abeab2058ea5813c5615d6197a23dbf041 ]

The only user of cgroup_subsys->free() callback is pids_cgrp_subsys which
needs pids_free() to uncharge the pid.

However, ->free() is called from __put_task_struct()->cgroup_free() and this
is too late. Even the trivial program which does

	for (;;) {
		int pid = fork();
		assert(pid >= 0);
		if (pid)
			wait(NULL);
		else
			exit(0);
	}

can run out of limits because release_task()->call_rcu(delayed_put_task_struct)
implies an RCU gp after the task/pid goes away and before the final put().

Test-case:

	mkdir -p /tmp/CG
	mount -t cgroup2 none /tmp/CG
	echo '+pids' > /tmp/CG/cgroup.subtree_control

	mkdir /tmp/CG/PID
	echo 2 > /tmp/CG/PID/pids.max

	perl -e 'while ($p = fork) { wait; } $p // die "fork failed: $!\n"' &
	echo $! > /tmp/CG/PID/cgroup.procs

Without this patch the forking process fails soon after migration.

Rename cgroup_subsys->free() to cgroup_subsys->release() and move the callsite
into the new helper, cgroup_release(), called by release_task() which actually
frees the pid(s).

Reported-by: Herton R. Krzesinski <hkrzesin@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:33:13 +02:00
Ivaylo Georgiev
bd72c908a7 Merge android-4.19.27 (36d178b) into msm-4.19
* refs/heads/tmp-36d178b:
  Linux 4.19.27
  x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
  MIPS: eBPF: Fix icache flush end address
  MIPS: BCM63XX: provide DMA masks for ethernet devices
  MIPS: fix truncation in __cmpxchg_small for short values
  hugetlbfs: fix races and page leaks during migration
  drm: Block fb changes for async plane updates
  mm: enforce min addr even if capable() in expand_downwards()
  mmc: sdhci-esdhc-imx: correct the fix of ERR004536
  mmc: cqhci: Fix a tiny potential memory leak on error condition
  mmc: cqhci: fix space allocated for transfer descriptor
  mmc: core: Fix NULL ptr crash from mmc_should_fail_request
  mmc: tmio: fix access width of Block Count Register
  mmc: tmio_mmc_core: don't claim spurious interrupts
  mmc: spi: Fix card detection during probe
  kvm: selftests: Fix region overlap check in kvm_util
  KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
  svm: Fix AVIC incomplete IPI emulation
  cfg80211: extend range deviation for DMG
  mac80211: Add attribute aligned(2) to struct 'action'
  mac80211: don't initiate TDLS connection if station is not associated to AP
  ibmveth: Do not process frames after calling napi_reschedule
  net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP
  net: usb: asix: ax88772_bind return error when hw_reset fail
  drm/msm: Fix A6XX support for opp-level
  nvme-multipath: drop optimization for static ANA group IDs
  nvme-rdma: fix timeout handler
  hv_netvsc: Fix hash key value reset after other ops
  hv_netvsc: Refactor assignments of struct netvsc_device_info
  hv_netvsc: Fix ethtool change hash key error
  net: altera_tse: fix connect_local_phy error path
  scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state()
  scsi: lpfc: nvmet: avoid hang / use-after-free when destroying targetport
  scsi: lpfc: nvme: avoid hang / use-after-free when destroying localport
  writeback: synchronize sync(2) against cgroup writeback membership switches
  direct-io: allow direct writes to empty inodes
  staging: android: ion: Support cpu access during dma_buf_detach
  drm/sun4i: hdmi: Fix usage of TMDS clock
  serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling
  tty: serial: qcom_geni_serial: Allow mctrl when flow control is disabled
  drm/amd/powerplay: OD setting fix on Vega10
  locking/rwsem: Fix (possible) missed wakeup
  futex: Fix (possible) missed wakeup
  sched/wake_q: Fix wakeup ordering for wake_q
  sched/wait: Fix rcuwait_wake_up() ordering
  mac80211: fix miscounting of ttl-dropped frames
  staging: rtl8723bs: Fix build error with Clang when inlining is disabled
  drivers: thermal: int340x_thermal: Fix sysfs race condition
  ARC: show_regs: lockdep: avoid page allocator...
  ARC: fix __ffs return value to avoid build warnings
  irqchip/gic-v3-mbi: Fix uninitialized mbi_lock
  selftests: gpio-mockup-chardev: Check asprintf() for error
  selftests: seccomp: use LDLIBS instead of LDFLAGS
  phy: ath79-usb: Fix the main reset name to match the DT binding
  phy: ath79-usb: Fix the power on error path
  selftests/vm/gup_benchmark.c: match gup struct to kernel
  ASoC: imx-audmux: change snprintf to scnprintf for possible overflow
  ASoC: dapm: change snprintf to scnprintf for possible overflow
  ASoC: rt5682: Fix PLL source register definitions
  x86/mm/mem_encrypt: Fix erroneous sizeof()
  genirq: Make sure the initial affinity is not empty
  selftests: rtc: rtctest: add alarm test on minute boundary
  selftests: rtc: rtctest: fix alarm tests
  usb: gadget: Potential NULL dereference on allocation error
  usb: dwc3: gadget: Fix the uninitialized link_state when udc starts
  usb: dwc3: gadget: synchronize_irq dwc irq in suspend
  thermal: int340x_thermal: Fix a NULL vs IS_ERR() check
  clk: vc5: Abort clock configuration without upstream clock
  clk: sysfs: fix invalid JSON in clk_dump
  clk: tegra: dfll: Fix a potential Oop in remove()
  ASoC: Variable "val" in function rt274_i2c_probe() could be uninitialized
  ALSA: compress: prevent potential divide by zero bugs
  ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field
  drm/msm: Unblock writer if reader closes file
  scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
  mac80211: Change default tx_sk_pacing_shift to 7
  genirq/matrix: Improve target CPU selection for managed interrupts.
  irq/matrix: Spread managed interrupts on allocation
  irq/matrix: Split out the CPU selection code into a helper
  FROMGIT: binder: create node flag to request sender's security context

  Modify include/uapi/linux/android/binder.h, as commit:

  FROMGIT: binder: create node flag to request sender's security context

  introduces enums and structures, which are already defined in other
  userspace files that include the binder uapi file. Thus, the
  redeclaration of these enums and structures can lead to
  build errors. To avoid this, guard the redundant declarations
  in the uapi header with the __KERNEL__ header guard, so they
  are not exported to userspace.

Conflicts:
	drivers/staging/android/ion/ion.c
	sound/core/compress_offload.c

Change-Id: Ibf422b9b32ea1315515c33036b20ae635b8c8e4c
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2019-03-21 01:42:04 -07:00
Prateek Sood
5024f0a29a sched/wait: Fix rcuwait_wake_up() ordering
[ Upstream commit 6dc080eeb2ba01973bfff0d79844d7a59e12542e ]

For some peculiar reason rcuwait_wake_up() has the right barrier in
the comment, but not in the code.

This mistake has been observed to cause a deadlock in the following
situation:

    P1					P2

    percpu_up_read()			percpu_down_write()
      rcu_sync_is_idle() // false
					  rcu_sync_enter()
					  ...
      __percpu_up_read()

[S] ,-  __this_cpu_dec(*sem->read_count)
    |   smp_rmb();
[L] |   task = rcu_dereference(w->task) // NULL
    |
    |				    [S]	    w->task = current
    |					    smp_mb();
    |				    [L]	    readers_active_check() // fail
    `-> <store happens here>

Where the smp_rmb() (obviously) fails to constrain the store.

[ peterz: Added changelog. ]

Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8f95c90ceb ("sched/wait, RCU: Introduce rcuwait machinery")
Link: https://lkml.kernel.org/r/1543590656-7157-1-git-send-email-prsood@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-05 17:58:49 +01:00
Ivaylo Georgiev
c26cab8987 Merge android-4.19.20 (c28f73f) into msm-4.19
* refs/heads/tmp-c28f73f:
  Linux 4.19.20
  cifs: Always resolve hostname before reconnecting
  md/raid5: fix 'out of memory' during raid cache recovery
  of: overlay: do not duplicate properties from overlay for new nodes
  of: overlay: use prop add changeset entry for property in new nodes
  of: overlay: add missing of_node_get() in __of_attach_node_sysfs
  of: overlay: add tests to validate kfrees from overlay removal
  of: Convert to using %pOFn instead of device_node.name
  mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
  mm: hwpoison: use do_send_sig_info() instead of force_sig()
  mm, oom: fix use-after-free in oom_kill_process
  mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
  oom, oom_reaper: do not enqueue same task twice
  mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
  kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
  btrfs: On error always free subvol_name in btrfs_mount
  Btrfs: fix deadlock when allocating tree block during leaf/node split
  mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
  platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
  platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
  IB/hfi1: Remove overly conservative VM_EXEC flag check
  ALSA: hda/realtek - Fixed hp_pin no value
  ALSA: usb-audio: Add Opus #3 to quirks for native DSD support
  mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay
  mmc: bcm2835: Fix DMA channel leak on probe error
  gfs2: Revert "Fix loop in gfs2_rbm_find"
  gpio: sprd: Fix incorrect irq type setting for the async EIC
  gpio: sprd: Fix the incorrect data register
  gpio: pcf857x: Fix interrupts on multiple instances
  gpiolib: fix line event timestamps for nested irqs
  gpio: altera-a10sr: Set proper output level for direction_output
  arm64: hibernate: Clean the __hyp_text to PoC after resume
  arm64: hyp-stub: Forbid kprobing of the hyp-stub
  arm64: Do not issue IPIs for user executable ptes
  arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
  ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
  NFS: Fix up return value on fatal errors in nfs_page_async_flush()
  selftests/seccomp: Enhance per-arch ptrace syscall skip tests
  iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()
  fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
  CIFS: Do not consider -ENODATA as stat failure for reads
  CIFS: Fix trace command logging for SMB2 reads and writes
  CIFS: Do not count -ENODATA as failure for query directory
  virtio_net: Differentiate sk_buff and xdp_frame on freeing
  virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs
  virtio_net: Don't process redirected XDP frames when XDP is disabled
  virtio_net: Fix out of bounds access of sq
  virtio_net: Fix not restoring real_num_rx_queues
  virtio_net: Don't call free_old_xmit_skbs for xdp_frames
  virtio_net: Don't enable NAPI when interface is down
  sctp: set flow sport from saddr only when it's 0
  sctp: set chunk transport correctly when it's a new asoc
  Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"
  ip6mr: Fix notifiers call on mroute_clean_tables()
  net/mlx5e: Allow MAC invalidation while spoofchk is ON
  sctp: improve the events for sctp stream adding
  net: ip6_gre: always reports o_key to userspace
  vhost: fix OOB in get_rx_bufs()
  ucc_geth: Reset BQL queue when stopping device
  tun: move the call to tun_set_real_num_queues
  sctp: improve the events for sctp stream reset
  ravb: expand rx descriptor data to accommodate hw checksum
  net: set default network namespace in init_dummy_netdev()
  net/rose: fix NULL ax25_cb kernel panic
  netrom: switch to sock timer API
  net/mlx4_core: Add masking for a few queries on HCA caps
  net: ip_gre: use erspan key field for tunnel lookup
  net: ip_gre: always reports o_key to userspace
  l2tp: fix reading optional fields of L2TPv3
  l2tp: copy 4 more bytes to linear part if necessary
  ipvlan, l3mdev: fix broken l3s mode wrt local routes
  ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
  ipv6: Consider sk_bound_dev_if when binding a socket to an address
  drm/msm/gpu: fix building without debugfs
  Fix "net: ipv4: do not handle duplicate fragments as overlapping"
  UPSTREAM: net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP
  UPSTREAM: binder: filter out nodes when showing binder procs
  UPSTREAM: xfrm: Make set-mark default behavior backward compatible
  ANDROID: cuttlefish_defconfig: Enable CONFIG_RTC_HCTOSYS

Conflicts:
	mm/oom_kill.c

Change-Id: I95452a9f286b95924096afd2dfe439f9d638d404
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-02-27 02:23:31 -08:00
Andrei Vagin
c7122344f9 kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
commit 8fb335e078378c8426fabeed1ebee1fbf915690c upstream.

Currently, exit_ptrace() adds all ptraced tasks in a dead list, then
zap_pid_ns_processes() waits on all tasks in a current pidns, and only
then are tasks from the dead list released.

zap_pid_ns_processes() can get stuck on waiting tasks from the dead
list.  In this case, we will have one unkillable process with one or
more dead children.

Thanks to Oleg for the advice to release tasks in find_child_reaper().

Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com
Fixes: 7c8bd2322c ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:30:14 +01:00
Satya Durga Srinivasu Prabhala
7ebdf76d85 sched: Add snapshot of Window Assisted Load Tracking (WALT)
This snapshot is taken from msm-4.14 as of
commit 871eac76e6be567 ("sched: Improve the scheduler").

Change-Id: Ib4e0b39526d3009cedebb626ece5a767d8247846
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2019-01-02 10:56:07 -08:00
Eric W. Biederman
0102498083 signal: Pass pid type into group_send_sig_info
This passes the information we already have at the call sight
into group_send_sig_info.  Ultimatelly allowing for to better handle
signals sent to a group of processes.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21 12:57:35 -05:00
Eric W. Biederman
6883f81aac pid: Implement PIDTYPE_TGID
Everywhere except in the pid array we distinguish between a tasks pid and
a tasks tgid (thread group id).  Even in the enumeration we want that
distinction sometimes so we have added __PIDTYPE_TGID.  With leader_pid
we almost have an implementation of PIDTYPE_TGID in struct signal_struct.

Add PIDTYPE_TGID as a first class member of the pid_type enumeration and
into the pids array.  Then remove the __PIDTYPE_TGID special case and the
leader_pid in signal_struct.

The net size increase is just an extra pointer added to struct pid and
an extra pair of pointers of an hlist_node added to task_struct.

The effect on code maintenance is the removal of a number of special
cases today and the potential to remove many more special cases as
PIDTYPE_TGID gets used to it's fullest.  The long term potential
is allowing zombie thread group leaders to exit, which will remove
a lot more special cases in the code.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21 10:43:12 -05:00
Eric W. Biederman
1fb53567a3 pids: Move task_pid_type into sched/signal.h
The function is general and inline so there is no need
to hide it inside of exit.c

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21 10:43:12 -05:00
Dominik Brodowski
d300b61081 kernel: use kernel_wait4() instead of sys_wait4()
All call sites of sys_wait4() set *rusage to NULL. Therefore, there is
no need for the copy_to_user() handling of *rusage, and we can use
kernel_wait4() directly.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02 20:14:51 +02:00
Andrew Morton
dc8635b78c kernel/exit.c: export abort() to modules
gcc -fisolate-erroneous-paths-dereference can generate calls to abort()
from modular code too.

[arnd@arndb.de: drop duplicate exports of abort()]
  Link: http://lkml.kernel.org/r/20180102103311.706364-1-arnd@arndb.de
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-04 16:45:09 -08:00
Sudip Mukherjee
7c2c11b208 arch: define weak abort()
gcc toggle -fisolate-erroneous-paths-dereference (default at -O2
onwards) isolates faulty code paths such as null pointer access, divide
by zero etc.  If gcc port doesnt implement __builtin_trap, an abort() is
generated which causes kernel link error.

In this case, gcc is generating abort due to 'divide by zero' in
lib/mpi/mpih-div.c.

Currently 'frv' and 'arc' are failing.  Previously other arch was also
broken like m32r was fixed by commit d22e3d69ee ("m32r: fix build
failure").

Let's define this weak function which is common for all arch and fix the
problem permanently.  We can even remove the arch specific 'abort' after
this is done.

Link: http://lkml.kernel.org/r/1513118956-8718-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14 16:00:49 -08:00
Mark Rutland
6aa7de0591 locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25 11:01:08 +02:00
Kees Cook
1c9fec470b waitid(): Avoid unbalanced user_access_end() on access_ok() error
As pointed out by Linus and David, the earlier waitid() fix resulted in
a (currently harmless) unbalanced user_access_end() call.  This fixes it
to just directly return EFAULT on access_ok() failure.

Fixes: 96ca579a1e ("waitid(): Add missing access_ok() checks")
Acked-by: David Daney <david.daney@cavium.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-20 15:32:54 -04:00
Kees Cook
96ca579a1e waitid(): Add missing access_ok() checks
Adds missing access_ok() checks.

CVE-2017-5123

Reported-by: Chris Salls <chrissalls5@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 4c48abe91b ("waitid(): switch copyout of siginfo to unsafe_put_user()")
Cc: stable@kernel.org # 4.13
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-09 17:03:31 -07:00
Al Viro
6c85501f2f fix infoleak in waitid(2)
kernel_waitid() can return a PID, an error or 0.  rusage is filled in the first
case and waitid(2) rusage should've been copied out exactly in that case, *not*
whenever kernel_waitid() has not returned an error.  Compat variant shares that
braino; none of kernel_wait4() callers do, so the below ought to fix it.

Reported-and-tested-by: Alexander Potapenko <glider@google.com>
Fixes: ce72a16fa7 ("wait4(2)/waitid(2): separate copying rusage to userland")
Cc: stable@vger.kernel.org # v4.13
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-29 13:43:15 -04:00
Linus Torvalds
dd198ce714 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "Life has been busy and I have not gotten half as much done this round
  as I would have liked. I delayed it so that a minor conflict
  resolution with the mips tree could spend a little time in linux-next
  before I sent this pull request.

  This includes two long delayed user namespace changes from Kirill
  Tkhai. It also includes a very useful change from Serge Hallyn that
  allows the security capability attribute to be used inside of user
  namespaces. The practical effect of this is people can now untar
  tarballs and install rpms in user namespaces. It had been suggested to
  generalize this and encode some of the namespace information
  information in the xattr name. Upon close inspection that makes the
  things that should be hard easy and the things that should be easy
  more expensive.

  Then there is my bugfix/cleanup for signal injection that removes the
  magic encoding of the siginfo union member from the kernel internal
  si_code. The mips folks reported the case where I had used FPE_FIXME
  me is impossible so I have remove FPE_FIXME from mips, while at the
  same time including a return statement in that case to keep gcc from
  complaining about unitialized variables.

  I almost finished the work to get make copy_siginfo_to_user a trivial
  copy to user. The code is available at:

     git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3

  But I did not have time/energy to get the code posted and reviewed
  before the merge window opened.

  I was able to see that the security excuse for just copying fields
  that we know are initialized doesn't work in practice there are buggy
  initializations that don't initialize the proper fields in siginfo. So
  we still sometimes copy unitialized data to userspace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  Introduce v3 namespaced file capabilities
  mips/signal: In force_fcr31_sig return in the impossible case
  signal: Remove kernel interal si_code magic
  fcntl: Don't use ambiguous SIG_POLL si_codes
  prctl: Allow local CAP_SYS_ADMIN changing exe_file
  security: Use user_namespace::level to avoid redundant iterations in cap_capable()
  userns,pidns: Verify the userns for new pid namespaces
  signal/testing: Don't look for __SI_FAULT in userspace
  signal/mips: Document a conflict with SI_USER with SIGFPE
  signal/sparc: Document a conflict with SI_USER with SIGFPE
  signal/ia64: Document a conflict with SI_USER with SIGFPE
  signal/alpha: Document a conflict with SI_USER for SIGTRAP
2017-09-11 18:34:47 -07:00
Linus Torvalds
5f82e71a00 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:

 - Add 'cross-release' support to lockdep, which allows APIs like
   completions, where it's not the 'owner' who releases the lock, to be
   tracked. It's all activated automatically under
   CONFIG_PROVE_LOCKING=y.

 - Clean up (restructure) the x86 atomics op implementation to be more
   readable, in preparation of KASAN annotations. (Dmitry Vyukov)

 - Fix static keys (Paolo Bonzini)

 - Add killable versions of down_read() et al (Kirill Tkhai)

 - Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)

 - Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)

 - Remove smp_mb__before_spinlock() and convert its usages, introduce
   smp_mb__after_spinlock() (Peter Zijlstra)

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
  locking/lockdep/selftests: Fix mixed read-write ABBA tests
  sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
  acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
  locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
  smp: Avoid using two cache lines for struct call_single_data
  locking/lockdep: Untangle xhlock history save/restore from task independence
  locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
  futex: Remove duplicated code and fix undefined behaviour
  Documentation/locking/atomic: Finish the document...
  locking/lockdep: Fix workqueue crossrelease annotation
  workqueue/lockdep: 'Fix' flush_work() annotation
  locking/lockdep/selftests: Add mixed read-write ABBA tests
  mm, locking/barriers: Clarify tlb_flush_pending() barriers
  locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
  locking/lockdep: Explicitly initialize wq_barrier::done::map
  locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
  locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
  locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
  locking/refcounts, x86/asm: Implement fast refcount overflow protection
  locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
  ...
2017-09-04 11:52:29 -07:00
Paul E. McKenney
656e7c0c0a Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', 'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD
doc.2017.08.17a: Documentation updates.
fixes.2017.08.17a: RCU fixes.
hotplug.2017.07.25b: CPU-hotplug updates.
misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts).
spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait().
srcu.2017.07.27c: SRCU updates.
torture.2017.07.24c: Torture-test updates.
2017-08-17 08:10:04 -07:00
Paul E. McKenney
8083f29349 exit: Replace spin_unlock_wait() with lock/unlock pair
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and
it appears that all callers could do just as well with a lock/unlock pair.
This commit therefore replaces the spin_unlock_wait() call in do_exit()
with spin_lock() followed immediately by spin_unlock().  This should be
safe from a performance perspective because the lock is a per-task lock,
and this is happening only at task-exit time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-17 08:08:57 -07:00
Paul E. McKenney
ccdd29ffff rcu: Create reasonable API for do_exit() TASKS_RCU processing
Currently, the exit-time support for TASKS_RCU is open-coded in do_exit().
This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish()
APIs for do_exit() use.  This has the benefit of confining the use of the
tasks_rcu_exit_srcu variable to one file, allowing it to become static.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:05 -07:00
Byungchul Park
b09be676e0 locking/lockdep: Implement the 'crossrelease' feature
Lockdep is a runtime locking correctness validator that detects and
reports a deadlock or its possibility by checking dependencies between
locks. It's useful since it does not report just an actual deadlock but
also the possibility of a deadlock that has not actually happened yet.
That enables problems to be fixed before they affect real systems.

However, this facility is only applicable to typical locks, such as
spinlocks and mutexes, which are normally released within the context in
which they were acquired. However, synchronization primitives like page
locks or completions, which are allowed to be released in any context,
also create dependencies and can cause a deadlock.

So lockdep should track these locks to do a better job. The 'crossrelease'
implementation makes these primitives also be tracked.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:07 +02:00
Eric W. Biederman
cc731525f2 signal: Remove kernel interal si_code magic
struct siginfo is a union and the kernel since 2.4 has been hiding a union
tag in the high 16bits of si_code using the values:
__SI_KILL
__SI_TIMER
__SI_POLL
__SI_FAULT
__SI_CHLD
__SI_RT
__SI_MESGQ
__SI_SYS

While this looks plausible on the surface, in practice this situation has
not worked well.

- Injected positive signals are not copied to user space properly
  unless they have these magic high bits set.

- Injected positive signals are not reported properly by signalfd
  unless they have these magic high bits set.

- These kernel internal values leaked to userspace via ptrace_peek_siginfo

- It was possible to inject these kernel internal values and cause the
  the kernel to misbehave.

- Kernel developers got confused and expected these kernel internal values
  in userspace in kernel self tests.

- Kernel developers got confused and set si_code to __SI_FAULT which
  is SI_USER in userspace which causes userspace to think an ordinary user
  sent the signal and that it was not kernel generated.

- The values make it impossible to reorganize the code to transform
  siginfo_copy_to_user into a plain copy_to_user.  As si_code must
  be massaged before being passed to userspace.

So remove these kernel internal si codes and make the kernel code simpler
and more maintainable.

To replace these kernel internal magic si_codes introduce the helper
function siginfo_layout, that takes a signal number and an si_code and
computes which union member of siginfo is being used.  Have
siginfo_layout return an enumeration so that gcc will have enough
information to warn if a switch statement does not handle all of union
members.

A couple of architectures have a messed up ABI that defines signal
specific duplications of SI_USER which causes more special cases in
siginfo_layout than I would like.  The good news is only problem
architectures pay the cost.

Update all of the code that used the previous magic __SI_ values to
use the new SIL_ values and to call siginfo_layout to get those
values.  Escept where not all of the cases are handled remove the
defaults in the switch statements so that if a new case is missed in
the future the lack will show up at compile time.

Modify the code that copies siginfo si_code to userspace to just copy
the value and not cast si_code to a short first.  The high bits are no
longer used to hold a magic union member.

Fixup the siginfo header files to stop including the __SI_ values in
their constants and for the headers that were missing it to properly
update the number of si_codes for each signal type.

The fixes to copy_siginfo_from_user32 implementations has the
interesting property that several of them perviously should never have
worked as the __SI_ values they depended up where kernel internal.
With that dependency gone those implementations should work much
better.

The idea of not passing the __SI_ values out to userspace and then
not reinserting them has been tested with criu and criu worked without
changes.

Ref: 2.4.0-test1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-07-24 14:30:28 -05:00
zhongjiang
dd83c161fb kernel/exit.c: avoid undefined behaviour when calling wait4()
wait4(-2147483648, 0x20, 0, 0xdd0000) triggers:
UBSAN: Undefined behaviour in kernel/exit.c:1651:9

The related calltrace is as follows:

  negation of -2147483648 cannot be represented in type 'int':
  CPU: 9 PID: 16482 Comm: zj Tainted: G    B          ---- -------   3.10.0-327.53.58.71.x86_64+ #66
  Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285          /BC11BTSA              , BIOS CTSAV036 04/27/2011
  Call Trace:
    dump_stack+0x19/0x1b
    ubsan_epilogue+0xd/0x50
    __ubsan_handle_negate_overflow+0x109/0x14e
    SyS_wait4+0x1cb/0x1e0
    system_call_fastpath+0x16/0x1b

Exclude the overflow to avoid the UBSAN warning.

Link: http://lkml.kernel.org/r/1497264618-20212-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhongjiang <zhongjiang@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:36 -07:00
Al Viro
634a816095 fix waitid(2) breakage
We lose the distinction between "found a PID" and "nothing, but that's not
an error" a bit too early in waitid().  Easily fixed, fortunately...

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Fixes: 67d7ddded3 ("waitid(2): leave copyout of siginfo to syscall itself")
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-08 11:26:39 -04:00
Mike Rapoport
57ecbd3831 kernel/exit.c: don't include unused userfaultfd_k.h
Commit dd0db88d80 ("userfaultfd: non-cooperative: rollback
userfaultfd_exit") removed userfaultfd callback from exit() which makes
the include of <linux/userfaultfd_k.h> unnecessary.

Link: http://lkml.kernel.org/r/1494930907-3060-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Linus Torvalds
4be95131bf Merge branch 'work.sys_wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull wait syscall updates from Al Viro:
 "Consolidating sys_wait* and compat counterparts.

  Gets rid of set_fs()/double-copy mess, simplifies the whole thing
  (lifting the copyouts to the syscalls means less headache in the part
  that does actual work - fewer failure exits, to start with), gets rid
  of the overhead of field-by-field __put_user()"

* 'work.sys_wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  osf_wait4: switch to kernel_wait4()
  waitid(): switch copyout of siginfo to unsafe_put_user()
  wait_task_zombie: consolidate info logics
  kill wait_noreap_copyout()
  lift getrusage() from wait_noreap_copyout()
  waitid(2): leave copyout of siginfo to syscall itself
  kernel_wait4()/kernel_waitid(): delay copying status to userland
  wait4(2)/waitid(2): separate copying rusage to userland
  move compat wait4 and waitid next to native variants
2017-07-05 14:10:19 -07:00
Davidlohr Bueso
f11cc0760b sched/core: Drop the unused try_get_task_struct() helper function
This function was introduced by:

  150593bf86 ("sched/api: Introduce task_rcu_dereference() and try_get_task_struct()")

... to allow easier usage of task_rcu_dereference(), however no users
were ever added. Drop the helper.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/20170615023730.22827-1-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:48:37 +02:00
Ingo Molnar
ac6424b981 sched/wait: Rename wait_queue_t => wait_queue_entry_t
Rename:

	wait_queue_t		=>	wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'.

This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:18:27 +02:00
Al Viro
92ebce5ac5 osf_wait4: switch to kernel_wait4()
... and sanitize copying rusage to userland

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21 13:16:26 -04:00
Al Viro
4c48abe91b waitid(): switch copyout of siginfo to unsafe_put_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21 13:16:18 -04:00
Al Viro
76d9871e11 wait_task_zombie: consolidate info logics
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21 13:14:59 -04:00
Al Viro
bb380ec33a kill wait_noreap_copyout()
folds into callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21 13:13:53 -04:00
Al Viro
e61a250229 lift getrusage() from wait_noreap_copyout()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21 13:13:17 -04:00
Al Viro
67d7ddded3 waitid(2): leave copyout of siginfo to syscall itself
have kernel_waitid() collect the information needed for siginfo into
a small structure (waitid_info) passed to it; deal with copyout in
sys_waitid()/compat_sys_waitid().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21 13:13:07 -04:00
Al Viro
359566faef kernel_wait4()/kernel_waitid(): delay copying status to userland
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21 13:11:07 -04:00
Al Viro
ce72a16fa7 wait4(2)/waitid(2): separate copying rusage to userland
New helpers: kernel_waitid() and kernel_wait4().  sys_waitid(),
sys_wait4() and their compat variants switched to those.  Copying
struct rusage to userland is left to syscall itself.  For
compat_sys_wait4() that eliminates the use of set_fs() completely.
For compat_sys_waitid() it's still needed (for siginfo handling);
that will change shortly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21 13:11:00 -04:00
Al Viro
7e95a22590 move compat wait4 and waitid next to native variants
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21 13:10:51 -04:00
Andrea Arcangeli
dd0db88d80 userfaultfd: non-cooperative: rollback userfaultfd_exit
Patch series "userfaultfd non-cooperative further update for 4.11 merge
window".

Unfortunately I noticed one relevant bug in userfaultfd_exit while doing
more testing.  I've been doing testing before and this was also tested
by kbuild bot and exercised by the selftest, but this bug never
reproduced before.

I dropped userfaultfd_exit as result.  I dropped it because of
implementation difficulty in receiving signals in __mmput and because I
think -ENOSPC as result from the background UFFDIO_COPY should be enough
already.

Before I decided to remove userfaultfd_exit, I noticed userfaultfd_exit
wasn't exercised by the selftest and when I tried to exercise it, after
moving it to a more correct place in __mmput where it would make more
sense and where the vma list is stable, it resulted in the
event_wait_completion in D state.  So then I added the second patch to
be sure even if we call userfaultfd_event_wait_completion too late
during task exit(), we won't risk to generate tasks in D state.  The
same check exists in handle_userfault() for the same reason, except it
makes a difference there, while here is just a robustness check and it's
run under WARN_ON_ONCE.

While looking at the userfaultfd_event_wait_completion() function I
looked back at its callers too while at it and I think it's not ok to
stop executing dup_fctx on the fcs list because we relay on
userfaultfd_event_wait_completion to execute
userfaultfd_ctx_put(fctx->orig) which is paired against
userfaultfd_ctx_get(fctx->orig) in dup_userfault just before
list_add(fcs).  This change only takes care of fctx->orig but this area
also needs further review looking for similar problems in fctx->new.

The only patch that is urgent is the first because it's an use after
free during a SMP race condition that affects all processes if
CONFIG_USERFAULTFD=y.  Very hard to reproduce though and probably
impossible without SLUB poisoning enabled.

This patch (of 3):

I once reproduced this oops with the userfaultfd selftest, it's not
easily reproducible and it requires SLUB poisoning to reproduce.

    general protection fault: 0000 [#1] SMP
    Modules linked in:
    CPU: 2 PID: 18421 Comm: userfaultfd Tainted: G               ------------ T 3.10.0+ #15
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
    task: ffff8801f83b9440 ti: ffff8801f833c000 task.ti: ffff8801f833c000
    RIP: 0010:[<ffffffff81451299>]  [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0
    RSP: 0018:ffff8801f833fe80  EFLAGS: 00010202
    RAX: ffff8801f833ffd8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff8801f83b9440
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800baf18600
    RBP: ffff8801f833fee8 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: ffffffff8127ceb3 R12: 0000000000000000
    R13: ffff8800baf186b0 R14: ffff8801f83b99f8 R15: 00007faed746c700
    FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007faf0966f028 CR3: 0000000001bc6000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
      do_exit+0x297/0xd10
      SyS_exit+0x17/0x20
      tracesys+0xdd/0xe2
    Code: 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 48 83 ec 58 48 8b 1f 48 85 db 75 11 eb 73 66 0f 1f 44 00 00 48 8b 5b 10 48 85 db 74 64 <4c> 8b a3 b8 00 00 00 4d 85 e4 74 eb 41 f6 84 24 2c 01 00 00 80
    RIP  [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0
     RSP <ffff8801f833fe80>
    ---[ end trace 9fecd6dcb442846a ]---

In the debugger I located the "mm" pointer in the stack and walking
mm->mmap->vm_next through the end shows the vma->vm_next list is fully
consistent and it is null terminated list as expected.  So this has to
be an SMP race condition where userfaultfd_exit was running while the
vma list was being modified by another CPU.

When userfaultfd_exit() run one of the ->vm_next pointers pointed to
SLAB_POISON (RBX is the vma pointer and is 0x6b6b..).

The reason is that it's not running in __mmput but while there are still
other threads running and it's not holding the mmap_sem (it can't as it
has to wait the even to be received by the manager).  So this is an use
after free that was happening for all processes.

One more implementation problem aside from the race condition:
userfaultfd_exit has really to check a flag in mm->flags before walking
the vma or it's going to slowdown the exit() path for regular tasks.

One more implementation problem: at that point signals can't be
delivered so it would also create a task in D state if the manager
doesn't read the event.

The major design issue: it overall looks superfluous as the manager can
check for -ENOSPC in the background transfer:

	if (mmget_not_zero(ctx->mm)) {
[..]
	} else {
		return -ENOSPC;
	}

It's safer to roll it back and re-introduce it later if at all.

[rppt@linux.vnet.ibm.com: documentation fixup after removal of UFFD_EVENT_EXIT]
  Link: http://lkml.kernel.org/r/1488345437-4364-1-git-send-email-rppt@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20170224181957.19736-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09 17:01:09 -08:00
Ingo Molnar
32ef5517c2 sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h>
Introduce a trivial, mostly empty <linux/sched/cputime.h> header
to prepare for the moving of cputime functionality out of sched.h.

Update all code that relies on these facilities.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:39 +01:00
Ingo Molnar
68db0cf106 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h>
We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task_stack.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:36 +01:00
Ingo Molnar
299300258d sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:35 +01:00