352 Commits

Author SHA1 Message Date
Matsvei Niaverau
c8c286e8e9 Merge branch 'android-4.14-stable' into lineage-20 2023-11-16 12:48:55 +01:00
Greg Kroah-Hartman
376d860a9a Merge 4.14.304 into android-4.14-stable
Changes in 4.14.304
	pNFS/filelayout: Fix coalescing test for single DS
	net/ethtool/ioctl: return -EOPNOTSUPP if we have no phy stats
	RDMA/srp: Move large values to a new enum for gcc13
	f2fs: let's avoid panic if extent_tree is not created
	nilfs2: fix general protection fault in nilfs_btree_insert()
	xhci-pci: set the dma max_seg_size
	usb: xhci: Check endpoint is valid before dereferencing it
	prlimit: do_prlimit needs to have a speculation check
	USB: serial: option: add Quectel EM05-G (GR) modem
	USB: serial: option: add Quectel EM05-G (CS) modem
	USB: serial: option: add Quectel EM05-G (RS) modem
	USB: serial: option: add Quectel EC200U modem
	USB: serial: option: add Quectel EM05CN (SG) modem
	USB: serial: option: add Quectel EM05CN modem
	USB: misc: iowarrior: fix up header size for USB_DEVICE_ID_CODEMERCS_IOW100
	usb: core: hub: disable autosuspend for TI TUSB8041
	USB: serial: cp210x: add SCALANCE LPE-9000 device id
	usb: host: ehci-fsl: Fix module alias
	usb: gadget: g_webcam: Send color matching descriptor per frame
	usb: gadget: f_ncm: fix potential NULL ptr deref in ncm_bitrate()
	usb-storage: apply IGNORE_UAS only for HIKSEMI MD202 on RTL9210
	serial: pch_uart: Pass correct sg to dma_unmap_sg()
	serial: atmel: fix incorrect baudrate setup
	gsmi: fix null-deref in gsmi_get_variable
	x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN
	Linux 4.14.304

Change-Id: I1d0be4a225148a9a518b88a5f9146278d41198c8
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-01-25 15:44:17 +00:00
Greg Kroah-Hartman
291a0395bb prlimit: do_prlimit needs to have a speculation check
commit 739790605705ddcf18f21782b9c99ad7d53a8c11 upstream.

do_prlimit() adds the user-controlled resource value to a pointer that
will subsequently be dereferenced.  In order to help prevent this
codepath from being used as a spectre "gadget" a barrier needs to be
added after checking the range.

Reported-by: Jordy Zomer <jordyzomer@google.com>
Tested-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24 07:05:18 +01:00
Greg Kroah-Hartman
1a9762f5e7 Merge 4.14.248 into android-4.14-stable
Changes in 4.14.248
	s390/bpf: Fix optimizing out zero-extensions
	rcu: Fix missed wakeup of exp_wq waiters
	apparmor: remove duplicate macro list_entry_is_head()
	crypto: talitos - fix max key size for sha384 and sha512
	sctp: validate chunk size in __rcv_asconf_lookup
	sctp: add param size validation for SCTP_PARAM_SET_PRIMARY
	dmaengine: acpi: Avoid comparison GSI with Linux vIRQ
	thermal/drivers/exynos: Fix an error code in exynos_tmu_probe()
	9p/trans_virtio: Remove sysfs file on probe failure
	prctl: allow to setup brk for et_dyn executables
	profiling: fix shift-out-of-bounds bugs
	pwm: lpc32xx: Don't modify HW state in .probe() after the PWM chip was registered
	Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
	parisc: Move pci_dev_is_behind_card_dino to where it is used
	dmaengine: ioat: depends on !UML
	dmaengine: xilinx_dma: Set DMA mask for coherent APIs
	ceph: lockdep annotations for try_nonblocking_invalidate
	nilfs2: fix memory leak in nilfs_sysfs_create_device_group
	nilfs2: fix NULL pointer in nilfs_##name##_attr_release
	nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
	nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
	nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
	nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
	pwm: rockchip: Don't modify HW state in .remove() callback
	blk-throttle: fix UAF by deleteing timer in blk_throtl_exit()
	drm/nouveau/nvkm: Replace -ENOSYS with -ENODEV
	Linux 4.14.248

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8aca967b6e6877f9760b0609491b408d8bcdfdea
2021-09-26 13:56:50 +02:00
Cyrill Gorcunov
5f83a3a14b prctl: allow to setup brk for et_dyn executables
commit e1fbbd073137a9d63279f6bf363151a938347640 upstream.

Keno Fischer reported that when a binray loaded via ld-linux-x the
prctl(PR_SET_MM_MAP) doesn't allow to setup brk value because it lays
before mm:end_data.

For example a test program shows

 | # ~/t
 |
 | start_code      401000
 | end_code        401a15
 | start_stack     7ffce4577dd0
 | start_data	   403e10
 | end_data        40408c
 | start_brk	   b5b000
 | sbrk(0)         b5b000

and when executed via ld-linux

 | # /lib64/ld-linux-x86-64.so.2 ~/t
 |
 | start_code      7fc25b0a4000
 | end_code        7fc25b0c4524
 | start_stack     7fffcc6b2400
 | start_data	   7fc25b0ce4c0
 | end_data        7fc25b0cff98
 | start_brk	   55555710c000
 | sbrk(0)         55555710c000

This of course prevent criu from restoring such programs.  Looking into
how kernel operates with brk/start_brk inside brk() syscall I don't see
any problem if we allow to setup brk/start_brk without checking for
end_data.  Even if someone pass some weird address here on a purpose then
the worst possible result will be an unexpected unmapping of existing vma
(own vma, since prctl works with the callers memory) but test for
RLIMIT_DATA is still valid and a user won't be able to gain more memory in
case of expanding VMAs via new values shipped with prctl call.

Link: https://lkml.kernel.org/r/20210121221207.GB2174@grain
Fixes: bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Andrey Vagin <avagin@gmail.com>
Tested-by: Andrey Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-26 13:37:28 +02:00
JianMin Liu
675bd78068 [ALPS04791510] task-turbo: Introduce to Task Turbo
A new option CONFIG_MTK_TASK_TURBO for task-turbo feature.
  Task turbo provide enhancement of APP launch and lock latency
  via Preempting lock waiting queue and more oppotunity to occupy
  CPU resource. user can apply pid to turbo the specific task via
  turbo_pid interface.

  If task-turbo enabled
    1) When app is TOP-APP group, turbo UI thread/Render thread
    2) Inherit turbo abilty to lock holder and binders target task
    3) turbo User-specified task

  How to enable(default off):
   Task-turbo for launch:
    - echo 15 > /sys/module/task_turbo/parameters/feats

   Task-turbo for lock latency:
    - echo 7  > /sys/module/task_turbo/parameters/feats

  Related proc node and setting:
    a. cat /proc/[pid]/task/[tid]/turbo
      - query turbo task status

    b. echo [pid] > /sys/module/task_turbo/parameters/turbo_pid
      - turbo specific task by pid

    c. echo pid > /sys/module/task_turbo/parameters/unset_turbo_pid
      - de-turbo specific task by pid

MTK-Commit-Id: cd06fe7846efde21e4af495da3406bca40876739

Change-Id: Ic7f0ccc00332cf1feb39bb6b9a55bf756228187d
CR-Id: ALPS04791510
Feature: System Performance
Signed-off-by: JianMin Liu <jian-min.liu@mediatek.com>
2021-01-29 01:03:15 +08:00
Greg Kroah-Hartman
c1013a481e Merge 4.14.200 into android-4.14-stable
Changes in 4.14.200
	af_key: pfkey_dump needs parameter validation
	phy: qcom-qmp: Use correct values for ipq8074 PCIe Gen2 PHY init
	KVM: fix memory leak in kvm_io_bus_unregister_dev()
	kprobes: fix kill kprobe which has been marked as gone
	mm/thp: fix __split_huge_pmd_locked() for migration PMD
	RDMA/ucma: ucma_context reference leak in error path
	hdlc_ppp: add range checks in ppp_cp_parse_cr()
	ip: fix tos reflection in ack and reset packets
	net: ipv6: fix kconfig dependency warning for IPV6_SEG6_HMAC
	tipc: fix shutdown() of connection oriented socket
	tipc: use skb_unshare() instead in tipc_buf_append()
	bnxt_en: Protect bnxt_set_eee() and bnxt_set_pauseparam() with mutex.
	net: phy: Avoid NPD upon phy_detach() when driver is unbound
	net: add __must_check to skb_put_padto()
	ipv4: Update exception handling for multipath routes via same device
	geneve: add transport ports in route lookup for geneve
	serial: 8250: Avoid error message on reprobe
	mm: fix double page fault on arm64 if PTE_AF is cleared
	scsi: aacraid: fix illegal IO beyond last LBA
	m68k: q40: Fix info-leak in rtc_ioctl
	gma/gma500: fix a memory disclosure bug due to uninitialized bytes
	ASoC: kirkwood: fix IRQ error handling
	media: smiapp: Fix error handling at NVM reading
	arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
	x86/ioapic: Unbreak check_timer()
	ALSA: usb-audio: Add delay quirk for H570e USB headsets
	ALSA: hda/realtek - Couldn't detect Mic if booting with headset plugged
	PM / devfreq: tegra30: Fix integer overflow on CPU's freq max out
	scsi: fnic: fix use after free
	clk/ti/adpll: allocate room for terminating null
	mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup()
	mfd: mfd-core: Protect against NULL call-back function pointer
	tracing: Adding NULL checks for trace_array descriptor pointer
	bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
	RDMA/i40iw: Fix potential use after free
	xfs: fix attr leaf header freemap.size underflow
	RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()'
	mmc: core: Fix size overflow for mmc partitions
	gfs2: clean up iopen glock mess in gfs2_create_inode
	debugfs: Fix !DEBUG_FS debugfs_create_automount
	CIFS: Properly process SMB3 lease breaks
	kernel/sys.c: avoid copying possible padding bytes in copy_to_user
	neigh_stat_seq_next() should increase position index
	rt_cpu_seq_next should increase position index
	seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier
	media: ti-vpe: cal: Restrict DMA to avoid memory corruption
	ACPI: EC: Reference count query handlers under lock
	dmaengine: zynqmp_dma: fix burst length configuration
	powerpc/eeh: Only dump stack once if an MMIO loop is detected
	tracing: Set kernel_stack's caller size properly
	ar5523: Add USB ID of SMCWUSBT-G2 wireless adapter
	selftests/ftrace: fix glob selftest
	tools/power/x86/intel_pstate_tracer: changes for python 3 compatibility
	Bluetooth: Fix refcount use-after-free issue
	mm: pagewalk: fix termination condition in walk_pte_range()
	Bluetooth: prefetch channel before killing sock
	KVM: fix overflow of zero page refcount with ksm running
	ALSA: hda: Clear RIRB status before reading WP
	skbuff: fix a data race in skb_queue_len()
	audit: CONFIG_CHANGE don't log internal bookkeeping as an event
	selinux: sel_avc_get_stat_idx should increase position index
	scsi: lpfc: Fix RQ buffer leakage when no IOCBs available
	scsi: lpfc: Fix coverity errors in fmdi attribute handling
	drm/omap: fix possible object reference leak
	perf test: Fix test trace+probe_vfs_getname.sh on s390
	RDMA/rxe: Fix configuration of atomic queue pair attributes
	KVM: x86: fix incorrect comparison in trace event
	media: staging/imx: Missing assignment in imx_media_capture_device_register()
	x86/pkeys: Add check for pkey "overflow"
	bpf: Remove recursion prevention from rcu free callback
	dmaengine: tegra-apb: Prevent race conditions on channel's freeing
	media: go7007: Fix URB type for interrupt handling
	Bluetooth: guard against controllers sending zero'd events
	timekeeping: Prevent 32bit truncation in scale64_check_overflow()
	ext4: fix a data race at inode->i_disksize
	mm: avoid data corruption on CoW fault into PFN-mapped VMA
	drm/amdgpu: increase atombios cmd timeout
	ath10k: use kzalloc to read for ath10k_sdio_hif_diag_read
	scsi: aacraid: Disabling TM path and only processing IOP reset
	Bluetooth: L2CAP: handle l2cap config request during open state
	media: tda10071: fix unsigned sign extension overflow
	xfs: don't ever return a stale pointer from __xfs_dir3_free_read
	tpm: ibmvtpm: Wait for buffer to be set before proceeding
	rtc: ds1374: fix possible race condition
	tracing: Use address-of operator on section symbols
	serial: 8250_port: Don't service RX FIFO if throttled
	serial: 8250_omap: Fix sleeping function called from invalid context during probe
	serial: 8250: 8250_omap: Terminate DMA before pushing data on RX timeout
	perf cpumap: Fix snprintf overflow check
	cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn
	tools: gpio-hammer: Avoid potential overflow in main
	RDMA/rxe: Set sys_image_guid to be aligned with HW IB devices
	SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()'
	svcrdma: Fix leak of transport addresses
	ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len
	ALSA: usb-audio: Fix case when USB MIDI interface has more than one extra endpoint descriptor
	NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
	mm/kmemleak.c: use address-of operator on section symbols
	mm/filemap.c: clear page error before actual read
	mm/vmscan.c: fix data races using kswapd_classzone_idx
	mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area
	scsi: qedi: Fix termination timeouts in session logout
	serial: uartps: Wait for tx_empty in console setup
	KVM: Remove CREATE_IRQCHIP/SET_PIT2 race
	bdev: Reduce time holding bd_mutex in sync in blkdev_close()
	drivers: char: tlclk.c: Avoid data race between init and interrupt handler
	staging:r8188eu: avoid skb_clone for amsdu to msdu conversion
	sparc64: vcc: Fix error return code in vcc_probe()
	arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]
	dt-bindings: sound: wm8994: Correct required supplies based on actual implementaion
	atm: fix a memory leak of vcc->user_back
	power: supply: max17040: Correct voltage reading
	phy: samsung: s5pv210-usb2: Add delay after reset
	Bluetooth: Handle Inquiry Cancel error after Inquiry Complete
	USB: EHCI: ehci-mv: fix error handling in mv_ehci_probe()
	tty: serial: samsung: Correct clock selection logic
	ALSA: hda: Fix potential race in unsol event handler
	powerpc/traps: Make unrecoverable NMIs die instead of panic
	fuse: don't check refcount after stealing page
	USB: EHCI: ehci-mv: fix less than zero comparison of an unsigned int
	arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
	e1000: Do not perform reset in reset_task if we are already down
	drm/nouveau/debugfs: fix runtime pm imbalance on error
	printk: handle blank console arguments passed in.
	usb: dwc3: Increase timeout for CmdAct cleared by device controller
	btrfs: don't force read-only after error in drop snapshot
	vfio/pci: fix memory leaks of eventfd ctx
	perf util: Fix memory leak of prefix_if_not_in
	perf kcore_copy: Fix module map when there are no modules loaded
	mtd: rawnand: omap_elm: Fix runtime PM imbalance on error
	ceph: fix potential race in ceph_check_caps
	mm/swap_state: fix a data race in swapin_nr_pages
	rapidio: avoid data race between file operation callbacks and mport_cdev_add().
	mtd: parser: cmdline: Support MTD names containing one or more colons
	x86/speculation/mds: Mark mds_user_clear_cpu_buffers() __always_inline
	vfio/pci: Clear error and request eventfd ctx after releasing
	cifs: Fix double add page to memcg when cifs_readpages
	scsi: libfc: Handling of extra kref
	scsi: libfc: Skip additional kref updating work event
	selftests/x86/syscall_nt: Clear weird flags after each test
	vfio/pci: fix racy on error and request eventfd ctx
	btrfs: qgroup: fix data leak caused by race between writeback and truncate
	s390/init: add missing __init annotations
	i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()
	objtool: Fix noreturn detection for ignored functions
	ieee802154: fix one possible memleak in ca8210_dev_com_init
	ieee802154/adf7242: check status of adf7242_read_reg
	clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init()
	mwifiex: Increase AES key storage size to 256 bits
	batman-adv: bla: fix type misuse for backbone_gw hash indexing
	atm: eni: fix the missed pci_disable_device() for eni_init_one()
	batman-adv: mcast/TT: fix wrongly dropped or rerouted packets
	mac802154: tx: fix use-after-free
	drm/vc4/vc4_hdmi: fill ASoC card owner
	net: qed: RDMA personality shouldn't fail VF load
	batman-adv: Add missing include for in_interrupt()
	batman-adv: mcast: fix duplicate mcast packets in BLA backbone from mesh
	ALSA: asihpi: fix iounmap in error handler
	MIPS: Add the missing 'CPU_1074K' into __get_cpu_type()
	s390/dasd: Fix zero write for FBA devices
	kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
	mm, THP, swap: fix allocating cluster for swapfile by mistake
	lib/string.c: implement stpcpy
	ata: define AC_ERR_OK
	ata: make qc_prep return ata_completion_errors
	ata: sata_mv, avoid trigerrable BUG_ON
	Linux 4.14.200

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3d3049dca196c46cb6b2a66d60a5a6a3a099efbb
2020-10-01 17:59:29 +02:00
Joe Perches
62fd2fcca3 kernel/sys.c: avoid copying possible padding bytes in copy_to_user
[ Upstream commit 5e1aada08cd19ea652b2d32a250501d09b02ff2e ]

Initialization is not guaranteed to zero padding bytes so use an
explicit memset instead to avoid leaking any kernel content in any
possible padding bytes.

Link: http://lkml.kernel.org/r/dfa331c00881d61c8ee51577a082d8bebd61805c.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:12:30 +02:00
Catalin Marinas
cde9b8ce92 UPSTREAM: arm64: Tighten the PR_{SET, GET}_TAGGED_ADDR_CTRL prctl() unused arguments
(Upstream commit 3e91ec89f527b9870fe42dcbdb74fd389d123a95).

Require that arg{3,4,5} of the PR_{SET,GET}_TAGGED_ADDR_CTRL prctl and
arg2 of the PR_GET_TAGGED_ADDR_CTRL prctl() are zero rather than ignored
for future extensions.

Acked-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: I8bb5c3eb4728440880c971d77904f7e45b571ddc
2019-10-07 14:56:02 -04:00
Catalin Marinas
aa5a358032 BACKPORT: arm64: Introduce prctl() options to control the tagged user addresses ABI
(Upstream commit 63f0c60379650d82250f22e4cf4137ef3dc4f43d).

It is not desirable to relax the ABI to allow tagged user addresses into
the kernel indiscriminately. This patch introduces a prctl() interface
for enabling or disabling the tagged ABI with a global sysctl control
for preventing applications from enabling the relaxed ABI (meant for
testing user-space prctl() return error checking without reconfiguring
the kernel). The ABI properties are inherited by threads of the same
application and fork()'ed children but cleared on execve(). A Kconfig
option allows the overall disabling of the relaxed ABI.

The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
MTE-specific settings like imprecise vs precise exceptions.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Change-Id: I2d52c5589b05415faab315c116245f1058d64750
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
2019-10-07 14:56:02 -04:00
Greg Kroah-Hartman
cfee25d274 Merge 4.14.126 into android-4.14
Changes in 4.14.126
	rapidio: fix a NULL pointer dereference when create_workqueue() fails
	fs/fat/file.c: issue flush after the writeback of FAT
	sysctl: return -EINVAL if val violates minmax
	ipc: prevent lockup on alloc_msg and free_msg
	ARM: prevent tracing IPI_CPU_BACKTRACE
	mm/hmm: select mmu notifier when selecting HMM
	hugetlbfs: on restore reserve error path retain subpool reservation
	mem-hotplug: fix node spanned pages when we have a node with only ZONE_MOVABLE
	mm/cma.c: fix crash on CMA allocation if bitmap allocation fails
	mm/cma.c: fix the bitmap status to show failed allocation reason
	mm/cma_debug.c: fix the break condition in cma_maxchunk_get()
	mm/slab.c: fix an infinite loop in leaks_show()
	kernel/sys.c: prctl: fix false positive in validate_prctl_map()
	thermal: rcar_gen3_thermal: disable interrupt in .remove
	drivers: thermal: tsens: Don't print error message on -EPROBE_DEFER
	mfd: tps65912-spi: Add missing of table registration
	mfd: intel-lpss: Set the device in reset state when init
	drm/nouveau/disp/dp: respect sink limits when selecting failsafe link configuration
	mfd: twl6040: Fix device init errors for ACCCTL register
	perf/x86/intel: Allow PEBS multi-entry in watermark mode
	drm/bridge: adv7511: Fix low refresh rate selection
	objtool: Don't use ignore flag for fake jumps
	EDAC/mpc85xx: Prevent building as a module
	pwm: meson: Use the spin-lock only to protect register modifications
	ntp: Allow TAI-UTC offset to be set to zero
	f2fs: fix to avoid panic in do_recover_data()
	f2fs: fix to clear dirty inode in error path of f2fs_iget()
	f2fs: fix to avoid panic in dec_valid_block_count()
	f2fs: fix to do sanity check on valid block count of segment
	percpu: remove spurious lock dependency between percpu and sched
	configfs: fix possible use-after-free in configfs_register_group
	uml: fix a boot splat wrt use of cpu_all_mask
	mmc: mmci: Prevent polling for busy detection in IRQ context
	watchdog: imx2_wdt: Fix set_timeout for big timeout values
	watchdog: fix compile time error of pretimeout governors
	blk-mq: move cancel of requeue_work into blk_mq_release
	iommu/vt-d: Set intel_iommu_gfx_mapped correctly
	misc: pci_endpoint_test: Fix test_reg_bar to be updated in pci_endpoint_test
	nvme-pci: unquiesce admin queue on shutdown
	ALSA: hda - Register irq handler after the chip initialization
	nvmem: core: fix read buffer in place
	fuse: retrieve: cap requested size to negotiated max_write
	nfsd: allow fh_want_write to be called twice
	vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING"
	x86/PCI: Fix PCI IRQ routing table memory leak
	platform/chrome: cros_ec_proto: check for NULL transfer function
	PCI: keystone: Prevent ARM32 specific code to be compiled for ARM64
	soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher
	clk: rockchip: Turn on "aclk_dmac1" for suspend on rk3288
	soc: rockchip: Set the proper PWM for rk3288
	ARM: dts: imx51: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
	ARM: dts: imx50: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
	ARM: dts: imx53: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
	ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ahb" clock to SDMA
	ARM: dts: imx7d: Specify IMX7D_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx6ul: Specify IMX6UL_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG as "ipg" clock to SDMA
	PCI: rpadlpar: Fix leaked device_node references in add/remove paths
	platform/x86: intel_pmc_ipc: adding error handling
	power: supply: max14656: fix potential use-before-alloc
	PCI: rcar: Fix a potential NULL pointer dereference
	PCI: rcar: Fix 64bit MSI message address handling
	video: hgafb: fix potential NULL pointer dereference
	video: imsttfb: fix potential NULL pointer dereferences
	block, bfq: increase idling for weight-raised queues
	PCI: xilinx: Check for __get_free_pages() failure
	gpio: gpio-omap: add check for off wake capable gpios
	dmaengine: idma64: Use actual device for DMA transfers
	pwm: tiehrpwm: Update shadow register for disabling PWMs
	ARM: dts: exynos: Always enable necessary APIO_1V8 and ABB_1V8 regulators on Arndale Octa
	pwm: Fix deadlock warning when removing PWM device
	ARM: exynos: Fix undefined instruction during Exynos5422 resume
	usb: typec: fusb302: Check vconn is off when we start toggling
	gpio: vf610: Do not share irq_chip
	percpu: do not search past bitmap when allocating an area
	Revert "Bluetooth: Align minimum encryption key size for LE and BR/EDR connections"
	Revert "drm/nouveau: add kconfig option to turn off nouveau legacy contexts. (v3)"
	drm: don't block fb changes for async plane updates
	ALSA: seq: Cover unsubscribe_port() in list_mutex
	Linux 4.14.126

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-15 15:56:56 +02:00
Cyrill Gorcunov
a84bd98db8 kernel/sys.c: prctl: fix false positive in validate_prctl_map()
[ Upstream commit a9e73998f9d705c94a8dca9687633adc0f24a19a ]

While validating new map we require the @start_data to be strictly less
than @end_data, which is fine for regular applications (this is why this
nit didn't trigger for that long).  These members are set from executable
loaders such as elf handers, still it is pretty valid to have a loadable
data section with zero size in file, in such case the start_data is equal
to end_data once kernel loader finishes.

As a result when we're trying to restore such programs the procedure fails
and the kernel returns -EINVAL.  From the image dump of a program:

 | "mm_start_code": "0x400000",
 | "mm_end_code": "0x8f5fb4",
 | "mm_start_data": "0xf1bfb0",
 | "mm_end_data": "0xf1bfb0",

Thus we need to change validate_prctl_map from strictly less to less or
equal operator use.

Link: http://lkml.kernel.org/r/20190408143554.GY1421@uranus.lan
Fixes: f606b77f1a ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Andrey Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15 11:54:52 +02:00
Greg Kroah-Hartman
fc59235394 Merge 4.14.69 into android-4.14
Changes in 4.14.69
	net: 6lowpan: fix reserved space for single frames
	net: mac802154: tx: expand tailroom if necessary
	9p/net: Fix zero-copy path in the 9p virtio transport
	spi: davinci: fix a NULL pointer dereference
	spi: pxa2xx: Add support for Intel Ice Lake
	spi: spi-fsl-dspi: Fix imprecise abort on VF500 during probe
	spi: cadence: Change usleep_range() to udelay(), for atomic context
	mmc: renesas_sdhi_internal_dmac: fix #define RST_RESERVED_BITS
	readahead: stricter check for bdi io_pages
	block: blk_init_allocated_queue() set q->fq as NULL in the fail case
	block: really disable runtime-pm for blk-mq
	drm/i915/userptr: reject zero user_size
	libertas: fix suspend and resume for SDIO connected cards
	media: Revert "[media] tvp5150: fix pad format frame height"
	mailbox: xgene-slimpro: Fix potential NULL pointer dereference
	Replace magic for trusting the secondary keyring with #define
	Fix kexec forbidding kernels signed with keys in the secondary keyring to boot
	powerpc/fadump: handle crash memory ranges array index overflow
	powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
	PCI: Add wrappers for dev_printk()
	powerpc/powernv/pci: Work around races in PCI bridge enabling
	cxl: Fix wrong comparison in cxl_adapter_context_get()
	ib_srpt: Fix a use-after-free in srpt_close_ch()
	RDMA/rxe: Set wqe->status correctly if an unexpected response is received
	9p: fix multiple NULL-pointer-dereferences
	fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed
	9p/virtio: fix off-by-one error in sg list bounds check
	net/9p/client.c: version pointer uninitialized
	net/9p/trans_fd.c: fix race-condition by flushing workqueue before the kfree()
	dm integrity: change 'suspending' variable from bool to int
	dm thin: stop no_space_timeout worker when switching to write-mode
	dm cache metadata: save in-core policy_hint_size to on-disk superblock
	dm cache metadata: set dirty on all cache blocks after a crash
	dm crypt: don't decrease device limits
	uart: fix race between uart_put_char() and uart_shutdown()
	Drivers: hv: vmbus: Reset the channel callback in vmbus_onoffer_rescind()
	iio: sca3000: Fix missing return in switch
	iio: ad9523: Fix displayed phase
	iio: ad9523: Fix return value for ad952x_store()
	extcon: Release locking when sending the notification of connector state
	vmw_balloon: fix inflation of 64-bit GFNs
	vmw_balloon: do not use 2MB without batching
	vmw_balloon: VMCI_DOORBELL_SET does not check status
	vmw_balloon: fix VMCI use when balloon built into kernel
	rtc: omap: fix potential crash on power off
	tracing: Do not call start/stop() functions when tracing_on does not change
	tracing/blktrace: Fix to allow setting same value
	printk/tracing: Do not trace printk_nmi_enter()
	livepatch: Validate module/old func name length
	uprobes: Use synchronize_rcu() not synchronize_sched()
	mfd: hi655x: Fix regmap area declared size for hi655x
	ovl: fix wrong use of impure dir cache in ovl_iterate()
	drivers/block/zram/zram_drv.c: fix bug storing backing_dev
	cpufreq: governor: Avoid accessing invalid governor_data
	PM / sleep: wakeup: Fix build error caused by missing SRCU support
	KVM: VMX: fixes for vmentry_l1d_flush module parameter
	KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages
	xtensa: limit offsets in __loop_cache_{all,page}
	xtensa: increase ranges in ___invalidate_{i,d}cache_all
	block, bfq: return nbytes and not zero from struct cftype .write() method
	pnfs/blocklayout: off by one in bl_map_stripe()
	NFSv4 client live hangs after live data migration recovery
	NFSv4: Fix locking in pnfs_generic_recover_commit_reqs
	NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence()
	ARM: tegra: Fix Tegra30 Cardhu PCA954x reset
	mm/tlb: Remove tlb_remove_table() non-concurrent condition
	iommu/vt-d: Add definitions for PFSID
	iommu/vt-d: Fix dev iotlb pfsid use
	sys: don't hold uts_sem while accessing userspace memory
	userns: move user access out of the mutex
	ubifs: Fix memory leak in lprobs self-check
	Revert "UBIFS: Fix potential integer overflow in allocation"
	ubifs: Check data node size before truncate
	ubifs: xattr: Don't operate on deleted inodes
	ubifs: Fix synced_i_size calculation for xattr inodes
	pwm: tiehrpwm: Don't use emulation mode bits to control PWM output
	pwm: tiehrpwm: Fix disabling of output of PWMs
	fb: fix lost console when the user unplugs a USB adapter
	udlfb: set optimal write delay
	getxattr: use correct xattr length
	libnvdimm: fix ars_status output length calculation
	bcache: release dc->writeback_lock properly in bch_writeback_thread()
	cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()
	perf auxtrace: Fix queue resize
	crypto: vmx - Fix sleep-in-atomic bugs
	crypto: caam - fix DMA mapping direction for RSA forms 2 & 3
	crypto: caam/jr - fix descriptor DMA unmapping
	crypto: caam/qi - fix error path in xts setkey
	fs/quota: Fix spectre gadget in do_quotactl
	arm64: mm: always enable CONFIG_HOLES_IN_ZONE
	Linux 4.14.69

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-09-10 09:21:07 +02:00
Jann Horn
b692c405a1 sys: don't hold uts_sem while accessing userspace memory
commit 42a0cc3478584d4d63f68f2f5af021ddbea771fa upstream.

Holding uts_sem as a writer while accessing userspace memory allows a
namespace admin to stall all processes that attempt to take uts_sem.
Instead, move data through stack buffers and don't access userspace memory
while uts_sem is held.

Cc: stable@vger.kernel.org
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09 19:56:00 +02:00
Greg Kroah-Hartman
503f6fecb8 Merge 4.14.45 into android-4.14
Changes in 4.14.45
	MIPS: c-r4k: Fix data corruption related to cache coherence
	MIPS: ptrace: Expose FIR register through FP regset
	MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
	KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
	affs_lookup(): close a race with affs_remove_link()
	fs: don't scan the inode cache before SB_BORN is set
	aio: fix io_destroy(2) vs. lookup_ioctx() race
	ALSA: timer: Fix pause event notification
	do d_instantiate/unlock_new_inode combinations safely
	mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
	mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
	mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
	libata: Blacklist some Sandisk SSDs for NCQ
	libata: blacklist Micron 500IT SSD with MU01 firmware
	xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
	drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
	arm64: lse: Add early clobbers to some input/output asm operands
	powerpc/64s: Clear PCR on boot
	IB/hfi1: Use after free race condition in send context error path
	IB/umem: Use the correct mm during ib_umem_release
	sr: pass down correctly sized SCSI sense buffer
	idr: fix invalid ptr dereference on item delete
	Revert "ipc/shm: Fix shmat mmap nil-page protection"
	ipc/shm: fix shmat() nil address after round-down when remapping
	mm/kasan: don't vfree() nonexistent vm_area
	kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
	kasan: fix memory hotplug during boot
	kernel/sys.c: fix potential Spectre v1 issue
	KVM/VMX: Expose SSBD properly to guests
	KVM: s390: vsie: fix < 8k check for the itdba
	KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
	kvm: x86: IA32_ARCH_CAPABILITIES is always supported
	x86/kvm: fix LAPIC timer drift when guest uses periodic mode
	powerpc/64s: Improve RFI L1-D cache flush fallback
	powerpc/pseries: Support firmware disable of RFI flush
	powerpc/powernv: Support firmware disable of RFI flush
	powerpc/rfi-flush: Move the logic to avoid a redo into the debugfs code
	powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again
	powerpc/rfi-flush: Always enable fallback flush on pseries
	powerpc/rfi-flush: Differentiate enabled and patched flush types
	powerpc/rfi-flush: Call setup_rfi_flush() after LPM migration
	powerpc/pseries: Add new H_GET_CPU_CHARACTERISTICS flags
	powerpc: Add security feature flags for Spectre/Meltdown
	powerpc/pseries: Set or clear security feature flags
	powerpc/powernv: Set or clear security feature flags
	powerpc/64s: Move cpu_show_meltdown()
	powerpc/64s: Enhance the information in cpu_show_meltdown()
	powerpc/powernv: Use the security flags in pnv_setup_rfi_flush()
	powerpc/pseries: Use the security flags in pseries_setup_rfi_flush()
	powerpc/64s: Wire up cpu_show_spectre_v1()
	powerpc/64s: Wire up cpu_show_spectre_v2()
	powerpc/pseries: Fix clearing of security feature flags
	powerpc: Move default security feature flags
	powerpc/pseries: Restore default security feature flags on setup
	powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
	powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit
	MIPS: generic: Fix machine compatible matching
	mac80211: mesh: fix wrong mesh TTL offset calculation
	ARC: Fix malformed ARC_EMUL_UNALIGNED default
	ptr_ring: prevent integer overflow when calculating size
	arm64: dts: rockchip: fix rock64 gmac2io stability issues
	arm64: dts: rockchip: correct ep-gpios for rk3399-sapphire
	libata: Fix compile warning with ATA_DEBUG enabled
	selftests: sync: missing CFLAGS while compiling
	selftest/vDSO: fix O=
	selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
	selftests: memfd: add config fragment for fuse
	ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
	ARM: OMAP3: Fix prm wake interrupt for resume
	ARM: OMAP2+: Fix sar_base inititalization for HS omaps
	ARM: OMAP1: clock: Fix debugfs_create_*() usage
	ibmvnic: Wait until reset is complete to set carrier on
	ibmvnic: Free RX socket buffer in case of adapter error
	ibmvnic: Clean RX pool buffers during device close
	tls: retrun the correct IV in getsockopt
	xhci: workaround for AMD Promontory disabled ports wakeup
	IB/uverbs: Fix method merging in uverbs_ioctl_merge
	IB/uverbs: Fix possible oops with duplicate ioctl attributes
	IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroy
	arm64: dts: rockchip: Fix DWMMC clocks
	ARM: dts: rockchip: Fix DWMMC clocks
	iwlwifi: mvm: fix security bug in PN checking
	iwlwifi: mvm: fix IBSS for devices that support station type API
	iwlwifi: mvm: always init rs with 20mhz bandwidth rates
	NFC: llcp: Limit size of SDP URI
	rxrpc: Work around usercopy check
	MD: Free bioset when md_run fails
	md: fix md_write_start() deadlock w/o metadata devices
	s390/dasd: fix handling of internal requests
	xfrm: do not call rcu_read_unlock when afinfo is NULL in xfrm_get_tos
	mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
	mac80211: fix a possible leak of station stats
	mac80211: fix calling sleeping function in atomic context
	cfg80211: clear wep keys after disconnection
	mac80211: Do not disconnect on invalid operating class
	mac80211: Fix sending ADDBA response for an ongoing session
	gpu: ipu-v3: pre: fix device node leak in ipu_pre_lookup_by_phandle
	gpu: ipu-v3: prg: fix device node leak in ipu_prg_lookup_by_phandle
	md raid10: fix NULL deference in handle_write_completed()
	drm/exynos: g2d: use monotonic timestamps
	drm/exynos: fix comparison to bitshift when dealing with a mask
	drm/meson: fix vsync buffer update
	arm64: perf: correct PMUVer probing
	RDMA/bnxt_re: Unpin SQ and RQ memory if QP create fails
	RDMA/bnxt_re: Fix system crash during load/unload
	ibmvnic: Check for NULL skb's in NAPI poll routine
	net/mlx5e: Return error if prio is specified when offloading eswitch vlan push
	locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
	md: raid5: avoid string overflow warning
	virtio_net: fix XDP code path in receive_small()
	kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
	bug.h: work around GCC PR82365 in BUG()
	selftests/memfd: add run_fuse_test.sh to TEST_FILES
	seccomp: add a selftest for get_metadata
	soc: imx: gpc: de-register power domains only if initialized
	powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
	s390/cio: fix ccw_device_start_timeout API
	s390/cio: fix return code after missing interrupt
	s390/cio: clear timer when terminating driver I/O
	selftests/bpf/test_maps: exit child process without error in ENOMEM case
	PKCS#7: fix direct verification of SignerInfo signature
	arm64: dts: cavium: fix PCI bus dtc warnings
	nfs: system crashes after NFS4ERR_MOVED recovery
	ARM: OMAP: Fix dmtimer init for omap1
	smsc75xx: fix smsc75xx_set_features()
	regulatory: add NUL to request alpha2
	integrity/security: fix digsig.c build error with header file
	x86/intel_rdt: Fix incorrect returned value when creating rdgroup sub-directory in resctrl file system
	locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
	x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
	mac80211: drop frames with unexpected DS bits from fast-rx to slow path
	arm64: fix unwind_frame() for filtered out fn for function graph tracing
	macvlan: fix use-after-free in macvlan_common_newlink()
	KVM: nVMX: Don't halt vcpu when L1 is injecting events to L2
	kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD builds
	ARM: dts: imx6dl: Include correct dtsi file for Engicam i.CoreM6 DualLite/Solo RQS
	fs: dcache: Avoid livelock between d_alloc_parallel and __d_add
	fs: dcache: Use READ_ONCE when accessing i_dir_seq
	md: fix a potential deadlock of raid5/raid10 reshape
	md/raid1: fix NULL pointer dereference
	batman-adv: fix packet checksum in receive path
	batman-adv: invalidate checksum on fragment reassembly
	netfilter: ipt_CLUSTERIP: put config struct if we can't increment ct refcount
	netfilter: ipt_CLUSTERIP: put config instead of freeing it
	netfilter: ebtables: convert BUG_ONs to WARN_ONs
	batman-adv: Ignore invalid batadv_iv_gw during netlink send
	batman-adv: Ignore invalid batadv_v_gw during netlink send
	batman-adv: Fix netlink dumping of BLA claims
	batman-adv: Fix netlink dumping of BLA backbones
	nvme-pci: Fix nvme queue cleanup if IRQ setup fails
	clocksource/drivers/fsl_ftm_timer: Fix error return checking
	libceph, ceph: avoid memory leak when specifying same option several times
	ceph: fix dentry leak when failing to init debugfs
	xen/pvcalls: fix null pointer dereference on map->sock
	ARM: orion5x: Revert commit 4904dbda41.
	qrtr: add MODULE_ALIAS macro to smd
	selftests/futex: Fix line continuation in Makefile
	r8152: fix tx packets accounting
	virtio-gpu: fix ioctl and expose the fixed status to userspace.
	dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
	bcache: fix kcrashes with fio in RAID5 backend dev
	ip_gre: fix IFLA_MTU ignored on NEWLINK
	ip6_tunnel: fix IFLA_MTU ignored on NEWLINK
	sit: fix IFLA_MTU ignored on NEWLINK
	nbd: fix return value in error handling path
	ARM: dts: NSP: Fix amount of RAM on BCM958625HR
	ARM: dts: bcm283x: Fix unit address of local_intc
	powerpc/boot: Fix random libfdt related build errors
	clocksource/drivers/mips-gic-timer: Use correct shift count to extract data
	gianfar: Fix Rx byte accounting for ndev stats
	net/tcp/illinois: replace broken algorithm reference link
	nvmet: fix PSDT field check in command format
	net/smc: use link_id of server in confirm link reply
	mlxsw: core: Fix flex keys scratchpad offset conflict
	mlxsw: spectrum: Treat IPv6 unregistered multicast as broadcast
	spectrum: Reference count VLAN entries
	ARC: mcip: halt GFRC counter when ARC cores halt
	ARC: mcip: update MCIP debug mask when the new cpu came online
	ARC: setup cpu possible mask according to possible-cpus dts property
	ipvs: remove IPS_NAT_MASK check to fix passive FTP
	IB/mlx: Set slid to zero in Ethernet completion struct
	RDMA/bnxt_re: Unconditionly fence non wire memory operations
	RDMA/bnxt_re: Fix incorrect DB offset calculation
	RDMA/bnxt_re: Fix the ib_reg failure cleanup
	xen/pirq: fix error path cleanup when binding MSIs
	drm/amd/amdgpu: Correct VRAM width for APUs with GMC9
	xfrm: Fix ESN sequence number handling for IPsec GSO packets.
	arm64: dts: rockchip: Fix rk3399-gru-* s2r (pinctrl hogs, wifi reset)
	drm/sun4i: Fix dclk_set_phase
	btrfs: use kvzalloc to allocate btrfs_fs_info
	Btrfs: send, fix issuing write op when processing hole in no data mode
	Btrfs: fix log replay failure after linking special file and fsync
	ceph: fix potential memory leak in init_caches()
	block: display the correct diskname for bio
	nvme-pci: Fix EEH failure on ppc
	nvme: pci: pass max vectors as num_possible_cpus() to pci_alloc_irq_vectors
	selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
	net: ethtool: don't ignore return from driver get_fecparam method
	iwlwifi: mvm: fix TX of CCMP 256
	iwlwifi: mvm: Fix channel switch for count 0 and 1
	iwlwifi: mvm: fix assert 0x2B00 on older FWs
	iwlwifi: avoid collecting firmware dump if not loaded
	iwlwifi: mvm: fix "failed to remove key" message
	iwlwifi: mvm: Direct multicast frames to the correct station
	iwlwifi: mvm: Correctly set the tid for mcast queue
	rds: Incorrect reference counting in TCP socket creation
	watchdog: f71808e_wdt: Fix magic close handling
	watchdog: sbsa: use 32-bit read for WCV
	batman-adv: Fix multicast packet loss with a single WANT_ALL_IPV4/6 flag
	hv_netvsc: use napi_schedule_irqoff
	hv_netvsc: filter multicast/broadcast
	hv_netvsc: propagate rx filters to VF
	ARM: dts: rockchip: Add missing #sound-dai-cells on rk3288
	perf record: Fix crash in pipe mode
	e1000e: Fix check_for_link return value with autoneg off
	e1000e: allocate ring descriptors with dma_zalloc_coherent
	ia64/err-inject: Use get_user_pages_fast()
	RDMA/qedr: Fix kernel panic when running fio over NFSoRDMA
	RDMA/qedr: Fix iWARP write and send with immediate
	IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDs
	IB/mlx4: Include GID type when deleting GIDs from HW table under RoCE
	IB/mlx5: Fix an error code in __mlx5_ib_modify_qp()
	fbdev: Fixing arbitrary kernel leak in case FBIOGETCMAP_SPARC in sbusfb_ioctl_helper().
	fsl/fman: avoid sleeping in atomic context while adding an address
	qed: Free RoCE ILT Memory on rmmod qedr
	net: qcom/emac: Use proper free methods during TX
	net: smsc911x: Fix unload crash when link is up
	IB/core: Fix possible crash to access NULL netdev
	cxgb4: do not set needs_free_netdev for mgmt dev's
	xen-blkfront: move negotiate_mq to cover all cases of new VBDs
	xen: xenbus: use put_device() instead of kfree()
	hv_netvsc: fix filter flags
	hv_netvsc: fix locking for rx_mode
	hv_netvsc: fix locking during VF setup
	ARM: davinci: fix the GPIO lookup for omapl138-hawk
	arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery
	selftests/vm/run_vmtests: adjust hugetlb size according to nr_cpus
	lib/test_kmod.c: fix limit check on number of test devices created
	dmaengine: mv_xor_v2: Fix clock resource by adding a register clock
	netfilter: ebtables: fix erroneous reject of last rule
	can: m_can: change comparison to bitshift when dealing with a mask
	can: m_can: select pinctrl state in each suspend/resume function
	bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().
	workqueue: use put_device() instead of kfree()
	ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu
	sunvnet: does not support GSO for sctp
	KVM: arm/arm64: vgic: Add missing irq_lock to vgic_mmio_read_pending
	gpu: ipu-v3: prg: avoid possible array underflow
	drm/imx: move arming of the vblank event to atomic_flush
	drm/nouveau/bl: fix backlight regression
	xfrm: fix rcu_read_unlock usage in xfrm_local_error
	iwlwifi: mvm: set the correct tid when we flush the MCAST sta
	iwlwifi: mvm: Correctly set IGTK for AP
	iwlwifi: mvm: fix error checking for multi/broadcast sta
	net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
	vlan: Fix out of order vlan headers with reorder header off
	batman-adv: fix header size check in batadv_dbg_arp()
	net/sched: fix NULL dereference in the error path of tcf_sample_init()
	batman-adv: Fix skbuff rcsum on packet reroute
	vti4: Don't count header length twice on tunnel setup
	ip_tunnel: Clamp MTU to bounds on new link
	vti4: Don't override MTU passed on link creation via IFLA_MTU
	vti6: Fix dev->max_mtu setting
	iwlwifi: mvm: Increase session protection time after CS
	iwlwifi: mvm: clear tx queue id when unreserving aggregation queue
	iwlwifi: mvm: make sure internal station has a valid id
	iwlwifi: mvm: fix array out of bounds reference
	drm/tegra: Shutdown on driver unbind
	perf/cgroup: Fix child event counting bug
	brcmfmac: Fix check for ISO3166 code
	kbuild: make scripts/adjust_autoksyms.sh robust against timestamp races
	RDMA/ucma: Correct option size check using optlen
	RDMA/qedr: fix QP's ack timeout configuration
	RDMA/qedr: Fix rc initialization on CNQ allocation failure
	RDMA/qedr: Fix QP state initialization race
	net/sched: fix idr leak on the error path of tcf_bpf_init()
	net/sched: fix idr leak in the error path of tcf_simp_init()
	net/sched: fix idr leak in the error path of tcf_act_police_init()
	net/sched: fix idr leak in the error path of tcp_pedit_init()
	net/sched: fix idr leak in the error path of __tcf_ipt_init()
	net/sched: fix idr leak in the error path of tcf_skbmod_init()
	net: dsa: Fix functional dsa-loop dependency on FIXED_PHY
	drm/ast: Fixed 1280x800 Display Issue
	mm/mempolicy.c: avoid use uninitialized preferred_node
	mm, thp: do not cause memcg oom for thp
	xfrm: Fix transport mode skb control buffer usage.
	selftests: ftrace: Add probe event argument syntax testcase
	selftests: ftrace: Add a testcase for string type with kprobe_event
	selftests: ftrace: Add a testcase for probepoint
	drm/amdkfd: Fix scratch memory with HWS enabled
	batman-adv: fix multicast-via-unicast transmission with AP isolation
	batman-adv: fix packet loss for broadcasted DHCP packets to a server
	ARM: 8748/1: mm: Define vdso_start, vdso_end as array
	lan78xx: Set ASD in MAC_CR when EEE is enabled.
	net: qmi_wwan: add BroadMobi BM806U 2020:2033
	bonding: fix the err path for dev hwaddr sync in bond_enslave
	net: dsa: mt7530: fix module autoloading for OF platform drivers
	net/mlx5: Make eswitch support to depend on switchdev
	perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs
	x86/alternatives: Fixup alternative_call_2
	llc: properly handle dev_queue_xmit() return value
	builddeb: Fix header package regarding dtc source links
	qede: Fix barrier usage after tx doorbell write.
	mm, slab: memcg_link the SLAB's kmem_cache
	mm/page_owner: fix recursion bug after changing skip entries
	mm/vmstat.c: fix vmstat_update() preemption BUG
	mm/kmemleak.c: wait for scan completion before disabling free
	hv_netvsc: enable multicast if necessary
	qede: Do not drop rx-checksum invalidated packets.
	net: Fix untag for vlan packets without ethernet header
	vlan: Fix vlan insertion for packets without ethernet header
	net: mvneta: fix enable of all initialized RXQs
	sh: fix debug trap failure to process signals before return to user
	firmware: dmi_scan: Fix UUID length safety check
	nvme: don't send keep-alives to the discovery controller
	Btrfs: clean up resources during umount after trans is aborted
	Btrfs: fix loss of prealloc extents past i_size after fsync log replay
	x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
	x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
	fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
	swap: divide-by-zero when zero length swap file on ssd
	z3fold: fix memory leak
	sr: get/drop reference to device in revalidate and check_events
	Force log to disk before reading the AGF during a fstrim
	cpufreq: CPPC: Initialize shared perf capabilities of CPUs
	powerpc/fscr: Enable interrupts earlier before calling get_user()
	perf tools: Fix perf builds with clang support
	perf clang: Add support for recent clang versions
	dp83640: Ensure against premature access to PHY registers after reset
	ibmvnic: Zero used TX descriptor counter on reset
	mm/ksm: fix interaction with THP
	mm: fix races between address_space dereference and free in page_evicatable
	mm: thp: fix potential clearing to referenced flag in page_idle_clear_pte_refs_one()
	Btrfs: bail out on error during replay_dir_deletes
	Btrfs: fix NULL pointer dereference in log_dir_items
	btrfs: Fix possible softlock on single core machines
	IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
	ocfs2/dlm: don't handle migrate lockres if already in shutdown
	powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
	sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
	x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
	KVM: VMX: raise internal error for exception during invalid protected mode state
	lan78xx: Connect phy early
	fscache: Fix hanging wait on page discarded by writeback
	sparc64: Make atomic_xchg() an inline function rather than a macro.
	net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
	net: bgmac: Correctly annotate register space
	powerpc/64s: sreset panic if there is no debugger or crash dump handlers
	btrfs: tests/qgroup: Fix wrong tree backref level
	Btrfs: fix copy_items() return value when logging an inode
	btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
	btrfs: qgroup: Fix root item corruption when multiple same source snapshots are created with quota enabled
	rxrpc: Fix Tx ring annotation after initial Tx failure
	rxrpc: Don't treat call aborts as conn aborts
	xen/acpi: off by one in read_acpi_id()
	drivers: macintosh: rack-meter: really fix bogus memsets
	ACPI: acpi_pad: Fix memory leak in power saving threads
	powerpc/mpic: Check if cpu_possible() in mpic_physmask()
	ieee802154: ca8210: fix uninitialised data read
	ath10k: advertize beacon_int_min_gcd
	iommu/amd: Take into account that alloc_dev_data() may return NULL
	intel_th: Use correct method of finding hub
	m68k: set dma and coherent masks for platform FEC ethernets
	iwlwifi: mvm: check if mac80211_queue is valid in iwl_mvm_disable_txq
	parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode
	hwmon: (nct6775) Fix writing pwmX_mode
	powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer
	powerpc/perf: Fix kernel address leak via sampling registers
	rsi: fix kernel panic observed on 64bit machine
	tools/thermal: tmon: fix for segfault
	selftests: Print the test we're running to /dev/kmsg
	net/mlx5: Protect from command bit overflow
	watchdog: davinci_wdt: fix error handling in davinci_wdt_probe()
	ath10k: Fix kernel panic while using worker (ath10k_sta_rc_update_wk)
	nvme-pci: disable APST for Samsung NVMe SSD 960 EVO + ASUS PRIME Z370-A
	ath9k: fix crash in spectral scan
	cxgb4: Setup FW queues before registering netdev
	ima: Fix Kconfig to select TPM 2.0 CRB interface
	ima: Fallback to the builtin hash algorithm
	watchdog: aspeed: Allow configuring for alternate boot
	virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
	arm: dts: socfpga: fix GIC PPI warning
	ext4: don't complain about incorrect features when probing
	drm/vmwgfx: Unpin the screen object backup buffer when not used
	iommu/mediatek: Fix protect memory setting
	cpufreq: cppc_cpufreq: Fix cppc_cpufreq_init() failure path
	IB/mlx5: Set the default active rate and width to QDR and 4X
	zorro: Set up z->dev.dma_mask for the DMA API
	bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
	remoteproc: imx_rproc: Fix an error handling path in 'imx_rproc_probe()'
	dt-bindings: add device tree binding for Allwinner H6 main CCU
	ACPICA: Events: add a return on failure from acpi_hw_register_read
	ACPICA: Fix memory leak on unusual memory leak
	ACPICA: acpi: acpica: fix acpi operand cache leak in nseval.c
	cxgb4: Fix queue free path of ULD drivers
	i2c: mv64xxx: Apply errata delay only in standard mode
	KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use
	perf top: Fix top.call-graph config option reading
	perf stat: Fix core dump when flag T is used
	IB/core: Honor port_num while resolving GID for IB link layer
	drm/amdkfd: add missing include of mm.h
	coresight: Use %px to print pcsr instead of %p
	regulator: gpio: Fix some error handling paths in 'gpio_regulator_probe()'
	spi: bcm-qspi: fIX some error handling paths
	net/smc: pay attention to MAX_ORDER for CQ entries
	MIPS: ath79: Fix AR724X_PLL_REG_PCIE_CONFIG offset
	PCI: Restore config space on runtime resume despite being unbound
	watchdog: dw: RMW the control register
	watchdog: aspeed: Fix translation of reset mode to ctrl register
	ipmi_ssif: Fix kernel panic at msg_done_handler
	drm/meson: Fix some error handling paths in 'meson_drv_bind_master()'
	drm/meson: Fix an un-handled error path in 'meson_drv_bind_master()'
	powerpc: Add missing prototype for arch_irq_work_raise()
	powerpc/powernv/npu: Fix deadlock in mmio_invalidate()
	cxl: Check if PSL data-cache is available before issue flush request
	f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range
	f2fs: fix to clear CP_TRIMMED_FLAG
	f2fs: fix to check extent cache in f2fs_drop_extent_tree
	perf/core: Fix installing cgroup events on CPU
	max17042: propagate of_node to power supply device
	perf/core: Fix perf_output_read_group()
	drm/panel: simple: Fix the bus format for the Ontat panel
	hwmon: (pmbus/max8688) Accept negative page register values
	hwmon: (pmbus/adm1275) Accept negative page register values
	perf/x86/intel: Properly save/restore the PMU state in the NMI handler
	cdrom: do not call check_disk_change() inside cdrom_open()
	efi/arm*: Only register page tables when they exist
	perf/x86/intel: Fix large period handling on Broadwell CPUs
	perf/x86/intel: Fix event update for auto-reload
	arm64: dts: qcom: Fix SPI5 config on MSM8996
	soc: qcom: wcnss_ctrl: Fix increment in NV upload
	gfs2: Fix fallocate chunk size
	x86/devicetree: Initialize device tree before using it
	x86/devicetree: Fix device IRQ settings in DT
	phy: rockchip-emmc: retry calpad busy trimming
	ALSA: vmaster: Propagate slave error
	phy: qcom-qmp: Fix phy pipe clock gating
	drm/bridge: sii902x: Retry status read after DDI I2C
	tools: hv: fix compiler warnings about major/target_fname
	block: null_blk: fix 'Invalid parameters' when loading module
	dmaengine: pl330: fix a race condition in case of threaded irqs
	dmaengine: rcar-dmac: Check the done lists in rcar_dmac_chan_get_residue()
	enic: enable rq before updating rq descriptors
	watchdog: asm9260_wdt: fix error handling in asm9260_wdt_probe()
	hwrng: stm32 - add reset during probe
	pinctrl: devicetree: Fix dt_to_map_one_config handling of hogs
	pinctrl: artpec6: dt: add missing pin group uart5nocts
	vfio-ccw: fence off transport mode
	dmaengine: qcom: bam_dma: get num-channels and num-ees from dt
	drm: omapdrm: dss: Move initialization code from component bind to probe
	ARM: dts: dra71-evm: Correct evm_sd regulator max voltage
	drm/amdgpu: disable GFX ring and disable PQ wptr in hw_fini
	drm/amdgpu: adjust timeout for ib_ring_tests(v2)
	net: stmmac: ensure that the device has released ownership before reading data
	net: stmmac: ensure that the MSS desc is the last desc to set the own bit
	cpufreq: Reorder cpufreq_online() error code path
	dpaa_eth: fix SG mapping
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9220
	udf: Provide saner default for invalid uid / gid
	ixgbe: prevent ptp_rx_hang from running when in FILTER_ALL mode
	sh_eth: fix TSU init on SH7734/R8A7740
	power: supply: ltc2941-battery-gauge: Fix temperature units
	ARM: dts: bcm283x: Fix probing of bcm2835-i2s
	ARM: dts: bcm283x: Fix pin function of JTAG pins
	PCMCIA / PM: Avoid noirq suspend aborts during suspend-to-idle
	audit: return on memory error to avoid null pointer dereference
	net: stmmac: call correct function in stmmac_mac_config_rx_queues_routing()
	rcu: Call touch_nmi_watchdog() while printing stall warnings
	pinctrl: sh-pfc: r8a7796: Fix MOD_SEL register pin assignment for SSI pins group
	dpaa_eth: fix pause capability advertisement logic
	MIPS: Octeon: Fix logging messages with spurious periods after newlines
	drm/rockchip: Respect page offset for PRIME mmap calls
	x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified
	perf test: Fix test case inet_pton to accept inlines.
	perf report: Fix wrong jump arrow
	perf tests: Use arch__compare_symbol_names to compare symbols
	perf report: Fix memory corruption in --branch-history mode --branch-history
	perf tests: Fix dwarf unwind for stripped binaries
	selftests/net: fixes psock_fanout eBPF test case
	netlabel: If PF_INET6, check sk_buff ip header version
	drm: rcar-du: lvds: Fix LVDS startup on R-Car Gen3
	drm: rcar-du: lvds: Fix LVDS startup on R-Car Gen2
	ARM: dts: at91: tse850: use the correct compatible for the eeprom
	regmap: Correct comparison in regmap_cached
	i40e: Add delay after EMP reset for firmware to recover
	ARM: dts: imx7d: cl-som-imx7: fix pinctrl_enet
	ARM: dts: porter: Fix HDMI output routing
	regulator: of: Add a missing 'of_node_put()' in an error handling path of 'of_regulator_match()'
	pinctrl: msm: Use dynamic GPIO numbering
	pinctrl: mcp23s08: spi: Fix regmap debugfs entries
	kdb: make "mdr" command repeat
	drm/vmwgfx: Set dmabuf_size when vmw_dmabuf_init is successful
	Linux 4.14.45

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-05-30 13:17:17 +02:00
Gustavo A. R. Silva
058dfcf9c2 kernel/sys.c: fix potential Spectre v1 issue
commit 23d6aef74da86a33fa6bb75f79565e0a16ee97c2 upstream.

`resource' can be controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

  kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
  kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)

Fix this by sanitizing *resource* before using it to index
current->signal->rlim

Notice that given that speculation windows are large, the policy is to
kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:51:50 +02:00
Greg Kroah-Hartman
4c9e0a9b25 Merge 4.14.43 into android-4.14
Changes in 4.14.43
	usbip: usbip_host: refine probe and disconnect debug msgs to be useful
	usbip: usbip_host: delete device from busid_table after rebind
	usbip: usbip_host: run rebind from exit when module is removed
	usbip: usbip_host: fix NULL-ptr deref and use-after-free errors
	usbip: usbip_host: fix bad unlock balance during stub_probe()
	ALSA: usb: mixer: volume quirk for CM102-A+/102S+
	ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist
	ALSA: control: fix a redundant-copy issue
	spi: pxa2xx: Allow 64-bit DMA
	spi: bcm-qspi: Avoid setting MSPI_CDRAM_PCS for spi-nor master
	spi: bcm-qspi: Always read and set BSPI_MAST_N_BOOT_CTRL
	KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls
	KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock
	powerpc: Don't preempt_disable() in show_cpuinfo()
	vfio: ccw: fix cleanup if cp_prefetch fails
	tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
	tee: shm: fix use-after-free via temporarily dropped reference
	netfilter: nf_tables: free set name in error path
	netfilter: nf_tables: can't fail after linking rule into active rule list
	netfilter: nf_socket: Fix out of bounds access in nf_sk_lookup_slow_v{4,6}
	i2c: designware: fix poll-after-enable regression
	powerpc/powernv: Fix NVRAM sleep in invalid context when crashing
	drm: Match sysfs name in link removal to link creation
	lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly
	radix tree: fix multi-order iteration race
	mm: don't allow deferred pages with NEED_PER_CPU_KM
	drm/i915/gen9: Add WaClearHIZ_WM_CHICKEN3 for bxt and glk
	s390/qdio: fix access to uninitialized qdio_q fields
	s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero
	s390/qdio: don't release memory in qdio_setup_irq()
	s390: remove indirect branch from do_softirq_own_stack
	x86/pkeys: Override pkey when moving away from PROT_EXEC
	x86/pkeys: Do not special case protection key 0
	efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' definition for mixed mode
	ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstr
	x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
	tick/broadcast: Use for_each_cpu() specially on UP kernels
	ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed
	ARM: 8770/1: kprobes: Prohibit probing on optimized_callback
	ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions
	Btrfs: fix xattr loss after power failure
	Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting
	btrfs: property: Set incompat flag if lzo/zstd compression is set
	btrfs: fix crash when trying to resume balance without the resume flag
	btrfs: Split btrfs_del_delalloc_inode into 2 functions
	btrfs: Fix delalloc inodes invalidation during transaction abort
	btrfs: fix reading stale metadata blocks after degraded raid1 mounts
	x86/nospec: Simplify alternative_msr_write()
	x86/bugs: Concentrate bug detection into a separate function
	x86/bugs: Concentrate bug reporting into a separate function
	x86/bugs: Read SPEC_CTRL MSR during boot and re-use reserved bits
	x86/bugs, KVM: Support the combination of guest and host IBRS
	x86/bugs: Expose /sys/../spec_store_bypass
	x86/cpufeatures: Add X86_FEATURE_RDS
	x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation
	x86/bugs/intel: Set proper CPU features and setup RDS
	x86/bugs: Whitelist allowed SPEC_CTRL MSR values
	x86/bugs/AMD: Add support to disable RDS on Fam[15,16,17]h if requested
	x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest
	x86/speculation: Create spec-ctrl.h to avoid include hell
	prctl: Add speculation control prctls
	x86/process: Allow runtime control of Speculative Store Bypass
	x86/speculation: Add prctl for Speculative Store Bypass mitigation
	nospec: Allow getting/setting on non-current task
	proc: Provide details on speculation flaw mitigations
	seccomp: Enable speculation flaw mitigations
	x86/bugs: Make boot modes __ro_after_init
	prctl: Add force disable speculation
	seccomp: Use PR_SPEC_FORCE_DISABLE
	seccomp: Add filter flag to opt-out of SSB mitigation
	seccomp: Move speculation migitation control to arch code
	x86/speculation: Make "seccomp" the default mode for Speculative Store Bypass
	x86/bugs: Rename _RDS to _SSBD
	proc: Use underscores for SSBD in 'status'
	Documentation/spec_ctrl: Do some minor cleanups
	x86/bugs: Fix __ssb_select_mitigation() return type
	x86/bugs: Make cpu_show_common() static
	x86/bugs: Fix the parameters alignment and missing void
	x86/cpu: Make alternative_msr_write work for 32-bit code
	KVM: SVM: Move spec control call after restore of GS
	x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
	x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
	x86/cpufeatures: Disentangle SSBD enumeration
	x86/cpufeatures: Add FEATURE_ZEN
	x86/speculation: Handle HT correctly on AMD
	x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
	x86/speculation: Add virtualized speculative store bypass disable support
	x86/speculation: Rework speculative_store_bypass_update()
	x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
	x86/bugs: Expose x86_spec_ctrl_base directly
	x86/bugs: Remove x86_spec_ctrl_set()
	x86/bugs: Rework spec_ctrl base and mask logic
	x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
	KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
	x86/bugs: Rename SSBD_NO to SSB_NO
	Linux 4.14.43

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-05-22 20:17:10 +02:00
Kees Cook
7d1254a148 nospec: Allow getting/setting on non-current task
commit 7bbf1373e228840bb0295a2ca26d548ef37f448e upstream

Adjust arch_prctl_get/set_spec_ctrl() to operate on tasks other than
current.

This is needed both for /proc/$pid/status queries and for seccomp (since
thread-syncing can trigger seccomp in non-current threads).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-22 18:54:03 +02:00
Thomas Gleixner
33f6a06810 prctl: Add speculation control prctls
commit b617cfc858161140d69cc0b5cc211996b557a1c7 upstream

Add two new prctls to control aspects of speculation related vulnerabilites
and their mitigations to provide finer grained control over performance
impacting mitigations.

PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bit 0-2 with
the following meaning:

Bit  Define           Description
0    PR_SPEC_PRCTL    Mitigation can be controlled per task by
                      PR_SET_SPECULATION_CTRL
1    PR_SPEC_ENABLE   The speculation feature is enabled, mitigation is
                      disabled
2    PR_SPEC_DISABLE  The speculation feature is disabled, mitigation is
                      enabled

If all bits are 0 the CPU is not affected by the speculation misfeature.

If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.

PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
is selected by arg2 of prctl(2) per task. arg3 is used to hand in the
control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.

The common return values are:

EINVAL  prctl is not implemented by the architecture or the unused prctl()
        arguments are not 0
ENODEV  arg2 is selecting a not supported speculation misfeature

PR_SET_SPECULATION_CTRL has these additional return values:

ERANGE  arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE
ENXIO   prctl control of the selected speculation misfeature is disabled

The first supported controlable speculation misfeature is
PR_SPEC_STORE_BYPASS. Add the define so this can be shared between
architectures.

Based on an initial patch from Tim Chen and mostly rewritten.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-22 18:54:03 +02:00
Colin Cross
8392add7be ANDROID: mm: add a field to store names for private anonymous memory
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory.  When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.

This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma.  vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.

Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.

The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas.  If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".

The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen.  This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging.  The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.

Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list.  Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.

Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>

[AmitP: Fix get_user_pages_remote() call to align with upstream commit
        5b56d49fc3 ("mm: add locked parameter to get_user_pages_remote()")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2017-12-18 21:11:22 +05:30
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Kirill Tkhai
4d28df6152 prctl: Allow local CAP_SYS_ADMIN changing exe_file
During checkpointing and restore of userspace tasks
we bumped into the situation, that it's not possible
to restore the tasks, which user namespace does not
have uid 0 or gid 0 mapped.

People create user namespace mappings like they want,
and there is no a limitation on obligatory uid and gid
"must be mapped". So, if there is no uid 0 or gid 0
in the mapping, it's impossible to restore mm->exe_file
of the processes belonging to this user namespace.

Also, there is no a workaround. It's impossible
to create a temporary uid/gid mapping, because
only one write to /proc/[pid]/uid_map and gid_map
is allowed during a namespace lifetime.
If there is an entry, then no more mapings can't be
written. If there isn't an entry, we can't write
there too, otherwise user task won't be able
to do that in the future.

The patch changes the check, and looks for CAP_SYS_ADMIN
instead of zero uid and gid. This allows to restore
a task independently of its user namespace mappings.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Serge Hallyn <serge@hallyn.com>
CC: "Eric W. Biederman" <ebiederm@xmission.com>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Michal Hocko <mhocko@suse.com>
CC: Andrei Vagin <avagin@openvz.org>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
CC: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-07-20 07:46:07 -05:00
Al Viro
58c7ffc074 fix a braino in compat_sys_getrlimit()
Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
Fixes: commit d9e968cb9f "getrlimit()/setrlimit(): move compat to native"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 09:15:00 -07:00
Michal Hocko
1860033237 mm: make PR_SET_THP_DISABLE immediately active
PR_SET_THP_DISABLE has a rather subtle semantic.  It doesn't affect any
existing mapping because it only updated mm->def_flags which is a
template for new mappings.

The mappings created after prctl(PR_SET_THP_DISABLE) have VM_NOHUGEPAGE
flag set.  This can be quite surprising for all those applications which
do not do prctl(); fork() & exec() and want to control their own THP
behavior.

Another usecase when the immediate semantic of the prctl might be useful
is a combination of pre- and post-copy migration of containers with
CRIU.  In this case CRIU populates a part of a memory region with data
that was saved during the pre-copy stage.  Afterwards, the region is
registered with userfaultfd and CRIU expects to get page faults for the
parts of the region that were not yet populated.  However, khugepaged
collapses the pages and the expected page faults do not occur.

In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
temporary mechanism for enabling/disabling THP process wide.

Implementation wise, a new MMF_DISABLE_THP flag is added.  This flag is
tested when decision whether to use huge pages is taken either during
page fault of at the time of THP collapse.

It should be noted, that the new implementation makes PR_SET_THP_DISABLE
master override to any per-VMA setting, which was not the case
previously.

Fixes: a0715cc226 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE")
Link: http://lkml.kernel.org/r/1496415802-30944-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:31 -07:00
Linus Torvalds
c856863988 Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc compat stuff updates from Al Viro:
 "This part is basically untangling various compat stuff. Compat
  syscalls moved to their native counterparts, getting rid of quite a
  bit of double-copying and/or set_fs() uses. A lot of field-by-field
  copyin/copyout killed off.

   - kernel/compat.c is much closer to containing just the
     copyin/copyout of compat structs. Not all compat syscalls are gone
     from it yet, but it's getting there.

   - ipc/compat_mq.c killed off completely.

   - block/compat_ioctl.c cleaned up; floppy compat ioctls moved to
     drivers/block/floppy.c where they belong. Yes, there are several
     drivers that implement some of the same ioctls. Some are m68k and
     one is 32bit-only pmac. drivers/block/floppy.c is the only one in
     that bunch that can be built on biarch"

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mqueue: move compat syscalls to native ones
  usbdevfs: get rid of field-by-field copyin
  compat_hdio_ioctl: get rid of set_fs()
  take floppy compat ioctls to sodding floppy.c
  ipmi: get rid of field-by-field __get_user()
  ipmi: get COMPAT_IPMICTL_RECEIVE_MSG in sync with the native one
  rt_sigtimedwait(): move compat to native
  select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()
  put_compat_rusage(): switch to copy_to_user()
  sigpending(): move compat to native
  getrlimit()/setrlimit(): move compat to native
  times(2): move compat to native
  compat_{get,put}_bitmap(): use unsafe_{get,put}_user()
  fb_get_fscreeninfo(): don't bother with do_fb_ioctl()
  do_sigaltstack(): lift copying to/from userland into callers
  take compat_sys_old_getrlimit() to native syscall
  trim __ARCH_WANT_SYS_OLD_GETRLIMIT
2017-07-06 20:57:13 -07:00
Al Viro
d9e968cb9f getrlimit()/setrlimit(): move compat to native
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09 23:51:33 -04:00
Al Viro
ca2406ed58 times(2): move compat to native
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09 23:51:28 -04:00
Al Viro
613763a1f0 take compat_sys_old_getrlimit() to native syscall
... and sanitize the ifdefs in there

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-27 15:38:06 -04:00
Al Viro
ce72a16fa7 wait4(2)/waitid(2): separate copying rusage to userland
New helpers: kernel_waitid() and kernel_wait4().  sys_waitid(),
sys_wait4() and their compat variants switched to those.  Copying
struct rusage to userland is left to syscall itself.  For
compat_sys_wait4() that eliminates the use of set_fs() completely.
For compat_sys_waitid() it's still needed (for siginfo handling);
that will change shortly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-21 13:11:00 -04:00
Linus Torvalds
e579dde654 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "This is a set of small fixes that were mostly stumbled over during
  more significant development. This proc fix and the fix to
  posix-timers are the most significant of the lot.

  There is a lot of good development going on but unfortunately it
  didn't quite make the merge window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Fix unbalanced hard link numbers
  signal: Make kill_proc_info static
  rlimit: Properly call security_task_setrlimit
  signal: Remove unused definition of sig_user_definied
  ia64: Remove unused IA64_TASK_SIGHAND_OFFSET and IA64_SIGHAND_SIGLOCK_OFFSET
  ipc: Remove unused declaration of recompute_msgmni
  posix-timers: Correct sanity check in posix_cpu_nsleep
  sysctl: Remove dead register_sysctl_root
2017-05-05 11:08:43 -07:00
Eric W. Biederman
cad4ea546b rlimit: Properly call security_task_setrlimit
Modify do_prlimit to call security_task_setrlimit passing the task
whose rlimit we are changing not the tsk->group_leader.

In general this should not matter as the lsms implementing
security_task_setrlimit apparmor and selinux both examine the
task->cred to see what should be allowed on the destination task.

That task->cred is shared between tasks created with CLONE_THREAD
unless thread keyrings are in play, in which case both apparmor and
selinux create duplicate security contexts.

So the only time when it will matter which thread is passed to
security_task_setrlimit is if one of the threads of a process performs
an operation that changes only it's credentials.  At which point if a
thread has done that we don't want to hide that information from the
lsms.

So fix the call of security_task_setrlimit.  With the removal
of tsk->group_leader this makes the code slightly faster,
more comprehensible and maintainable.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-04-21 16:08:19 -05:00
Stephen Smalley
791ec491c3 prlimit,security,selinux: add a security hook for prlimit
When SELinux was first added to the kernel, a process could only get
and set its own resource limits via getrlimit(2) and setrlimit(2), so no
MAC checks were required for those operations, and thus no security hooks
were defined for them. Later, SELinux introduced a hook for setlimit(2)
with a check if the hard limit was being changed in order to be able to
rely on the hard limit value as a safe reset point upon context
transitions.

Later on, when prlimit(2) was added to the kernel with the ability to get
or set resource limits (hard or soft) of another process, LSM/SELinux was
not updated other than to pass the target process to the setrlimit hook.
This resulted in incomplete control over both getting and setting the
resource limits of another process.

Add a new security_task_prlimit() hook to the check_prlimit_permission()
function to provide complete mediation.  The hook is only called when
acting on another task, and only if the existing DAC/capability checks
would allow access.  Pass flags down to the hook to indicate whether the
prlimit(2) call will read, write, or both read and write the resource
limits of the target process.

The existing security_task_setrlimit() hook is left alone; it continues
to serve a purpose in supporting the ability to make decisions based on
the old and/or new resource limit values when setting limits.  This
is consistent with the DAC/capability logic, where
check_prlimit_permission() performs generic DAC/capability checks for
acting on another task, while do_prlimit() performs a capability check
based on a comparison of the old and new resource limits.  Fix the
inline documentation for the hook to match the code.

Implement the new hook for SELinux.  For setting resource limits, we
reuse the existing setrlimit permission.  Note that this does overload
the setrlimit permission to mean the ability to set the resource limit
(soft or hard) of another process or the ability to change one's own
hard limit.  For getting resource limits, a new getrlimit permission
is defined.  This was not originally defined since getrlimit(2) could
only be used to obtain a process' own limits.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-06 10:43:47 +11:00
Ingo Molnar
32ef5517c2 sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h>
Introduce a trivial, mostly empty <linux/sched/cputime.h> header
to prepare for the moving of cputime functionality out of sched.h.

Update all code that relies on these facilities.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:39 +01:00
Ingo Molnar
299300258d sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:35 +01:00
Ingo Molnar
03441a3482 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/stat.h>
We are going to split <linux/sched/stat.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/stat.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:34 +01:00
Ingo Molnar
f7ccbae45c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/coredump.h>
We are going to split <linux/sched/coredump.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/coredump.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:28 +01:00
Ingo Molnar
6e84f31522 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

   mm_alloc()
   __mmdrop()
   mmdrop()
   mmdrop_async_fn()
   mmdrop_async()
   mmget_not_zero()
   mmput()
   mmput_async()
   get_task_mm()
   mm_access()
   mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:28 +01:00
Ingo Molnar
4eb5aaa3af sched/headers: Prepare for new header dependencies before moving code to <linux/sched/autogroup.h>
We are going to split <linux/sched/autogroup.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/autogroup.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:28 +01:00
Ingo Molnar
4f17722c72 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/loadavg.h>
We are going to split <linux/sched/loadavg.h> out of <linux/sched.h>, which
will have to be picked up from a couple of .c files.

Create a trivial placeholder <linux/sched/topology.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:27 +01:00
Linus Torvalds
f1ef09fde1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "There is a lot here. A lot of these changes result in subtle user
  visible differences in kernel behavior. I don't expect anything will
  care but I will revert/fix things immediately if any regressions show
  up.

  From Seth Forshee there is a continuation of the work to make the vfs
  ready for unpriviled mounts. We had thought the previous changes
  prevented the creation of files outside of s_user_ns of a filesystem,
  but it turns we missed the O_CREAT path. Ooops.

  Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
  standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
  children that are forked after the prctl are considered and not
  children forked before the prctl. The only known user of this prctl
  systemd forks all children after the prctl. So no userspace
  regressions will occur. Holding earlier forked children to the same
  rules as later forked children creates a semantic that is sane enough
  to allow checkpoing of processes that use this feature.

  There is a long delayed change by Nikolay Borisov to limit inotify
  instances inside a user namespace.

  Michael Kerrisk extends the API for files used to maniuplate
  namespaces with two new trivial ioctls to allow discovery of the
  hierachy and properties of namespaces.

  Konstantin Khlebnikov with the help of Al Viro adds code that when a
  network namespace exits purges it's sysctl entries from the dcache. As
  in some circumstances this could use a lot of memory.

  Vivek Goyal fixed a bug with stacked filesystems where the permissions
  on the wrong inode were being checked.

  I continue previous work on ptracing across exec. Allowing a file to
  be setuid across exec while being ptraced if the tracer has enough
  credentials in the user namespace, and if the process has CAP_SETUID
  in it's own namespace. Proc files for setuid or otherwise undumpable
  executables are now owned by the root in the user namespace of their
  mm. Allowing debugging of setuid applications in containers to work
  better.

  A bug I introduced with permission checking and automount is now
  fixed. The big change is to mark the mounts that the kernel initiates
  as a result of an automount. This allows the permission checks in sget
  to be safely suppressed for this kind of mount. As the permission
  check happened when the original filesystem was mounted.

  Finally a special case in the mount namespace is removed preventing
  unbounded chains in the mount hash table, and making the semantics
  simpler which benefits CRIU.

  The vfs fix along with related work in ima and evm I believe makes us
  ready to finish developing and merge fully unprivileged mounts of the
  fuse filesystem. The cleanups of the mount namespace makes discussing
  how to fix the worst case complexity of umount. The stacked filesystem
  fixes pave the way for adding multiple mappings for the filesystem
  uids so that efficient and safer containers can be implemented"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc/sysctl: Don't grab i_lock under sysctl_lock.
  vfs: Use upper filesystem inode in bprm_fill_uid()
  proc/sysctl: prune stale dentries during unregistering
  mnt: Tuck mounts under others instead of creating shadow/side mounts.
  prctl: propagate has_child_subreaper flag to every descendant
  introduce the walk_process_tree() helper
  nsfs: Add an ioctl() to return owner UID of a userns
  fs: Better permission checking for submounts
  exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
  vfs: open() with O_CREAT should not create inodes with unknown ids
  nsfs: Add an ioctl() to return the namespace type
  proc: Better ownership of files for non-dumpable tasks in user namespaces
  exec: Remove LSM_UNSAFE_PTRACE_CAP
  exec: Test the ptracer's saved cred to see if the tracee can gain caps
  exec: Don't reset euid and egid when the tracee has CAP_SETUID
  inotify: Convert to using per-namespace limits
2017-02-23 20:33:51 -08:00
Pavel Tikhomirov
749860ce24 prctl: propagate has_child_subreaper flag to every descendant
If process forks some children when it has is_child_subreaper
flag enabled they will inherit has_child_subreaper flag - first
group, when is_child_subreaper is disabled forked children will
not inherit it - second group. So child-subreaper does not reparent
all his descendants when their parents die. Having these two
differently behaving groups can lead to confusion. Also it is
a problem for CRIU, as when we restore process tree we need to
somehow determine which descendants belong to which group and
much harder - to put them exactly to these group.

To simplify these we can add a propagation of has_child_subreaper
flag on PR_SET_CHILD_SUBREAPER, walking all descendants of child-
subreaper to setup has_child_subreaper flag.

In common cases when process like systemd first sets itself to
be a child-subreaper and only after that forks its services, we will
have zero-length list of descendants to walk. Testing with binary
subtree of 2^15 processes prctl took < 0.007 sec and has shown close
to linear dependency(~0.2 * n * usec) on lower numbers of processes.

Moreover, I doubt someone intentionaly pre-forks the children whitch
should reparent to init before becoming subreaper, because some our
ancestor migh have had is_child_subreaper flag while forking our
sub-tree and our childs will all inherit has_child_subreaper flag,
and we have no way to influence it. And only way to check if we have
no has_child_subreaper flag is to create some childs, kill them and
see where they will reparent to.

Using walk_process_tree helper to walk subtree, thanks to Oleg! Timing
seems to be the same.

Optimize:

a) When descendant already has has_child_subreaper flag all his subtree
has it too already.

* for a) to be true need to move has_child_subreaper inheritance under
the same tasklist_lock with adding task to its ->real_parent->children
as without it process can inherit zero has_child_subreaper, then we
set 1 to it's parent flag, check that parent has no more children, and
only after child with wrong flag is added to the tree.

* Also make these inheritance more clear by using real_parent instead of
current, as on clone(CLONE_PARENT) if current has is_child_subreaper
and real_parent has no is_child_subreaper or has_child_subreaper, child
will have has_child_subreaper flag set without actually having a
subreaper in it's ancestors.

b) When some descendant is child_reaper, it's subtree is in different
pidns from us(original child-subreaper) and processes from other pidns
will never reparent to us.

So we can skip their(a,b) subtree from walk.

v2: switch to walk_process_tree() general helper, move
has_child_subreaper inheritance
v3: remove csr_descendant leftover, change current to real_parent
in has_child_subreaper inheritance
v4: small commit message fix

Fixes: ebec18a6d3 ("prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-02-03 16:55:15 +13:00
Frederic Weisbecker
5613fda9a5 sched/cputime: Convert task/group cputime to nsecs
Now that most cputime readers use the transition API which return the
task cputime in old style cputime_t, we can safely store the cputime in
nsecs. This will eventually make cputime statistics less opaque and more
granular. Back and forth convertions between cputime_t and nsecs in order
to deal with cputime_t random granularity won't be needed anymore.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 09:13:49 +01:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Linus Torvalds
e34bac726d Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - various misc bits

 - most of MM (quite a lot of MM material is awaiting the merge of
   linux-next dependencies)

 - kasan

 - printk updates

 - procfs updates

 - MAINTAINERS

 - /lib updates

 - checkpatch updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (123 commits)
  init: reduce rootwait polling interval time to 5ms
  binfmt_elf: use vmalloc() for allocation of vma_filesz
  checkpatch: don't emit unified-diff error for rename-only patches
  checkpatch: don't check c99 types like uint8_t under tools
  checkpatch: avoid multiple line dereferences
  checkpatch: don't check .pl files, improve absolute path commit log test
  scripts/checkpatch.pl: fix spelling
  checkpatch: don't try to get maintained status when --no-tree is given
  lib/ida: document locking requirements a bit better
  lib/rbtree.c: fix typo in comment of ____rb_erase_color
  lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM
  MAINTAINERS: add drm and drm/i915 irc channels
  MAINTAINERS: add "C:" for URI for chat where developers hang out
  MAINTAINERS: add drm and drm/i915 bug filing info
  MAINTAINERS: add "B:" for URI where to file bugs
  get_maintainer: look for arbitrary letter prefixes in sections
  printk: add Kconfig option to set default console loglevel
  printk/sound: handle more message headers
  printk/btrfs: handle more message headers
  printk/kdb: handle more message headers
  ...
2016-12-12 20:50:02 -08:00
Stanislav Kinsburskiy
3fb4afd9a5 prctl: remove one-shot limitation for changing exe link
This limitation came with the reason to remove "another way for
malicious code to obscure a compromised program and masquerade as a
benign process" by allowing "security-concious program can use this
prctl once during its early initialization to ensure the prctl cannot
later be abused for this purpose":

    http://marc.info/?l=linux-kernel&m=133160684517468&w=2

This explanation doesn't look sufficient.  The only thing "exe" link is
indicating is the file, used to execve, which is basically nothing and
not reliable immediately after process has returned from execve system
call.

Moreover, to use this feture, all the mappings to previous exe file have
to be unmapped and all the new exe file permissions must be satisfied.

Which means, that changing exe link is very similar to calling execve on
the binary.

The need to remove this limitations comes from migration of NFS mount
point, which is not accessible during restore and replaced by other file
system.  Because of this exe link has to be changed twice.

[akpm@linux-foundation.org: fix up comment]
Link: http://lkml.kernel.org/r/20160927153755.9337.69650.stgit@localhost.localdomain
Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:06 -08:00
Nicolas Pitre
baa73d9e47 posix-timers: Make them configurable
Some embedded systems have no use for them.  This removes about
25KB from the kernel binary size when configured out.

Corresponding syscalls are routed to a stub logging the attempt to
use those syscalls which should be enough of a clue if they were
disabled without proper consideration. They are: timer_create,
timer_gettime: timer_getoverrun, timer_settime, timer_delete,
clock_adjtime, setitimer, getitimer, alarm.

The clock_settime, clock_gettime, clock_getres and clock_nanosleep
syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
majority of use cases with very little code.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-kbuild@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Michal Marek <mmarek@suse.com>
Cc: Edward Cree <ecree@solarflare.com>
Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:26:35 +01:00
Michal Hocko
17b0573d77 prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
PR_SET_THP_DISABLE requires mmap_sem for write.  If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving.  Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
John Stultz
da8b44d5a9 timer: convert timer_slack_ns from unsigned long to u64
This patchset introduces a /proc/<pid>/timerslack_ns interface which
would allow controlling processes to be able to set the timerslack value
on other processes in order to save power by avoiding wakeups (Something
Android currently does via out-of-tree patches).

The first patch tries to fix the internal timer_slack_ns usage which was
defined as a long, which limits the slack range to ~4 seconds on 32bit
systems.  It converts it to a u64, which provides the same basically
unlimited slack (500 years) on both 32bit and 64bit machines.

The second patch introduces the /proc/<pid>/timerslack_ns interface
which allows the full 64bit slack range for a task to be read or set on
both 32bit and 64bit machines.

With these two patches, on a 32bit machine, after setting the slack on
bash to 10 seconds:

$ time sleep 1

real    0m10.747s
user    0m0.001s
sys     0m0.005s

The first patch is a little ugly, since I had to chase the slack delta
arguments through a number of functions converting them to u64s.  Let me
know if it makes sense to break that up more or not.

Other than that things are fairly straightforward.

This patch (of 2):

The timer_slack_ns value in the task struct is currently a unsigned
long.  This means that on 32bit applications, the maximum slack is just
over 4 seconds.  However, on 64bit machines, its much much larger (~500
years).

This disparity could make application development a little (as well as
the default_slack) to a u64.  This means both 32bit and 64bit systems
have the same effective internal slack range.

Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify
the interface as a unsigned long, so we preserve that limitation on
32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned
long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
actually larger then what can be stored by an unsigned long.

This patch also modifies hrtimer functions which specified the slack
delta as a unsigned long.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Mateusz Guzik
ddf1d398e5 prctl: take mmap sem for writing to protect against others
An unprivileged user can trigger an oops on a kernel with
CONFIG_CHECKPOINT_RESTORE.

proc_pid_cmdline_read takes mmap_sem for reading and obtains args + env
start/end values. These get sanity checked as follows:
        BUG_ON(arg_start > arg_end);
        BUG_ON(env_start > env_end);

These can be changed by prctl_set_mm. Turns out also takes the semaphore for
reading, effectively rendering it useless. This results in:

  kernel BUG at fs/proc/base.c:240!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: virtio_net
  CPU: 0 PID: 925 Comm: a.out Not tainted 4.4.0-rc8-next-20160105dupa+ #71
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff880077a68000 ti: ffff8800784d0000 task.ti: ffff8800784d0000
  RIP: proc_pid_cmdline_read+0x520/0x530
  RSP: 0018:ffff8800784d3db8  EFLAGS: 00010206
  RAX: ffff880077c5b6b0 RBX: ffff8800784d3f18 RCX: 0000000000000000
  RDX: 0000000000000002 RSI: 00007f78e8857000 RDI: 0000000000000246
  RBP: ffff8800784d3e40 R08: 0000000000000008 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000050
  R13: 00007f78e8857800 R14: ffff88006fcef000 R15: ffff880077c5b600
  FS:  00007f78e884a740(0000) GS:ffff88007b200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007f78e8361770 CR3: 00000000790a5000 CR4: 00000000000006f0
  Call Trace:
    __vfs_read+0x37/0x100
    vfs_read+0x82/0x130
    SyS_read+0x58/0xd0
    entry_SYSCALL_64_fastpath+0x12/0x76
  Code: 4c 8b 7d a8 eb e9 48 8b 9d 78 ff ff ff 4c 8b 7d 90 48 8b 03 48 39 45 a8 0f 87 f0 fe ff ff e9 d1 fe ff ff 4c 8b 7d 90 eb c6 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
  RIP   proc_pid_cmdline_read+0x520/0x530
  ---[ end trace 97882617ae9c6818 ]---

Turns out there are instances where the code just reads aformentioned
values without locking whatsoever - namely environ_read and get_cmdline.

Interestingly these functions look quite resilient against bogus values,
but I don't believe this should be relied upon.

The first patch gets rid of the oops bug by grabbing mmap_sem for
writing.

The second patch is optional and puts locking around aformentioned
consumers for safety.  Consumers of other fields don't seem to benefit
from similar treatment and are left untouched.

This patch (of 2):

The code was taking the semaphore for reading, which does not protect
against readers nor concurrent modifications.

The problem could cause a sanity checks to fail in procfs's cmdline
reader, resulting in an OOPS.

Note that some functions perform an unlocked read of various mm fields,
but they seem to be fine despite possible modificaton.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jarod Wilson <jarod@redhat.com>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anshuman Khandual <anshuman.linux@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20 17:09:18 -08:00
Ben Segall
8639b46139 pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
setpriority(PRIO_USER, 0, x) will change the priority of tasks outside of
the current pid namespace.  This is in contrast to both the other modes of
setpriority and the example of kill(-1).  Fix this.  getpriority and
ioprio have the same failure mode, fix them too.

Eric said:

: After some more thinking about it this patch sounds justifiable.
:
: My goal with namespaces is not to build perfect isolation mechanisms
: as that can get into ill defined territory, but to build well defined
: mechanisms.  And to handle the corner cases so you can use only
: a single namespace with well defined results.
:
: In this case you have found the two interfaces I am aware of that
: identify processes by uid instead of by pid.  Which quite frankly is
: weird.  Unfortunately the weird unexpected cases are hard to handle
: in the usual way.
:
: I was hoping for a little more information.  Changes like this one we
: have to be careful of because someone might be depending on the current
: behavior.  I don't think they are and I do think this make sense as part
: of the pid namespace.

Signed-off-by: Ben Segall <bsegall@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ambrose Feinstein <ambrose@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00