Commit Graph

468 Commits

Author SHA1 Message Date
backslashxx
244454b1e5 KernelSU: integrate scope-minimized manual hooks -k4.9 v1.5
This refactors original KSU hooks to replace deep kernel function hooks with targeted hooks.
This backports KernelSU pr#1657 and having pr#2084 elements (32-bit sucompat).
It reduces the scope of kernel function interception and still maintains full fucntionality.

This commit is a squash of the following:
*	fs/exec: do_execve: ksu_handle_execve_sucompat hook
*	fs/exec: compat_do_execve: ksu_handle_execve_sucompat hook
	fs/open: sys_faccessat: ksu_handle_faccessat hook
*	fs/read_write: sys_read: ksu_handle_sys_read hook
*	fs/stat: sys_newfstatat: ksu_handle_stat hook
*	fs/stat: sys_fstatat64: ksu_handle_stat hook
*	drivers: input: input_event: ksu_handle_input_handle_event hook

references: KernelSU pr#1657, pr#2084
	https://kernelsu.org/guide/how-to-integrate-for-non-gki.html

Signed-off-by: backslashxx <118538522+backslashxx@users.noreply.github.com>
2025-08-19 19:31:28 +02:00
Greg Kroah-Hartman
d1605fc9c1 Merge 4.9.317 into android-4.9-q
Changes in 4.9.317
	net: af_key: check encryption module availability consistency
	drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers
	assoc_array: Fix BUG_ON during garbage collect
	drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency()
	block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
	exec: Force single empty string when argv is empty
	dm crypt: make printing of the key constant-time
	dm stats: add cond_resched when looping over entries
	dm verity: set DM_TARGET_IMMUTABLE feature flag
	tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe()
	NFSD: Fix possible sleep during nfsd4_release_lockowner()
	bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes
	Linux 4.9.317

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iaab75e44222ad2a768b4226ae422c39d1d1017c3
2022-06-06 10:59:55 +02:00
Kees Cook
41f6ea5b9a exec: Force single empty string when argv is empty
commit dcd46d897adb70d63e025f175a00a89797d31a43 upstream.

Quoting[1] Ariadne Conill:

"In several other operating systems, it is a hard requirement that the
second argument to execve(2) be the name of a program, thus prohibiting
a scenario where argc < 1. POSIX 2017 also recommends this behaviour,
but it is not an explicit requirement[2]:

    The argument arg0 should point to a filename string that is
    associated with the process being started by one of the exec
    functions.
...
Interestingly, Michael Kerrisk opened an issue about this in 2008[3],
but there was no consensus to support fixing this issue then.
Hopefully now that CVE-2021-4034 shows practical exploitative use[4]
of this bug in a shellcode, we can reconsider.

This issue is being tracked in the KSPP issue tracker[5]."

While the initial code searches[6][7] turned up what appeared to be
mostly corner case tests, trying to that just reject argv == NULL
(or an immediately terminated pointer list) quickly started tripping[8]
existing userspace programs.

The next best approach is forcing a single empty string into argv and
adjusting argc to match. The number of programs depending on argc == 0
seems a smaller set than those calling execve with a NULL argv.

Account for the additional stack space in bprm_stack_limits(). Inject an
empty string when argc == 0 (and set argc = 1). Warn about the case so
userspace has some notice about the change:

    process './argc0' launched './argc0' with NULL argv: empty string added

Additionally WARN() and reject NULL argv usage for kernel threads.

[1] https://lore.kernel.org/lkml/20220127000724.15106-1-ariadne@dereferenced.org/
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html
[3] https://bugzilla.kernel.org/show_bug.cgi?id=8408
[4] https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
[5] https://github.com/KSPP/linux/issues/176
[6] https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+*NULL&literal=0
[7] https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%2C%5Cs*NULL&literal=0
[8] https://lore.kernel.org/lkml/20220131144352.GE16385@xsang-OptiPlex-9020/

Reported-by: Ariadne Conill <ariadne@dereferenced.org>
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Ariadne Conill <ariadne@dereferenced.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20220201000947.2453721-1-keescook@chromium.org
[vegard: fixed conflicts due to missing
 886d7de631da71e30909980fdbf318f7caade262^- and
 3950e975431bc914f7e81b8f2a2dbdf2064acb0f^- and
 655c16a8ce9c15842547f40ce23fd148aeccc074]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-06 08:19:46 +02:00
Greg Kroah-Hartman
bcf0e3b543 Merge 4.9.288 into android-4.9-q
Changes in 4.9.288
	ALSA: seq: Fix a potential UAF by wrong private_free call order
	s390: fix strrchr() implementation
	xhci: Enable trust tx length quirk for Fresco FL11 USB controller
	cb710: avoid NULL pointer subtraction
	efi/cper: use stack buffer for error record decoding
	efi: Change down_interruptible() in virt_efi_reset_system() to down_trylock()
	Input: xpad - add support for another USB ID of Nacon GC-100
	USB: serial: qcserial: add EM9191 QDL support
	USB: serial: option: add Telit LE910Cx composition 0x1204
	nvmem: Fix shift-out-of-bound (UBSAN) with byte size cells
	iio: adc128s052: Fix the error handling path of 'adc128_probe()'
	iio: light: opt3001: Fixed timeout error when 0 lux
	iio: ssp_sensors: add more range checking in ssp_parse_dataframe()
	iio: ssp_sensors: fix error code in ssp_print_mcu_debug()
	net: arc: select CRC32
	net: korina: select CRC32
	net: encx24j600: check error in devm_regmap_init_encx24j600
	ethernet: s2io: fix setting mac address during resume
	nfc: fix error handling of nfc_proto_register()
	NFC: digital: fix possible memory leak in digital_tg_listen_mdaa()
	NFC: digital: fix possible memory leak in digital_in_send_sdd_req()
	pata_legacy: fix a couple uninitialized variable bugs
	drm/msm: Fix null pointer dereference on pointer edp
	drm/msm/dsi: fix off by one in dsi_bus_clk_enable error handling
	r8152: select CRC32 and CRYPTO/CRYPTO_HASH/CRYPTO_SHA256
	xtensa: xtfpga: use CONFIG_USE_OF instead of CONFIG_OF
	xtensa: xtfpga: Try software restart before simulating CPU reset
	NFSD: Keep existing listeners on portlist error
	netfilter: ipvs: make global sysctl readonly in non-init netns
	NIOS2: irqflags: rename a redefined register name
	can: rcar_can: fix suspend/resume
	can: peak_usb: pcan_usb_fd_decode_status(): fix back to ERROR_ACTIVE state notification
	can: peak_pci: peak_pci_remove(): fix UAF
	ocfs2: fix data corruption after conversion from inline format
	ocfs2: mount fails with buffer overflow in strlen
	elfcore: correct reference to CONFIG_UML
	vfs: check fd has read access in kernel_read_file_from_fd()
	ALSA: usb-audio: Provide quirk for Sennheiser GSP670 Headset
	ASoC: DAPM: Fix missing kctl change notifications
	nfc: nci: fix the UAF of rf_conn_info object
	isdn: cpai: check ctr->cnr to avoid array index out of bound
	netfilter: Kconfig: use 'default y' instead of 'm' for bool config option
	ARM: dts: spear3xx: Fix gmac node
	isdn: mISDN: Fix sleeping function called from invalid context
	platform/x86: intel_scu_ipc: Update timeout value in comment
	ALSA: hda: avoid write to STATESTS if controller is in reset
	net: mdiobus: Fix memory leak in __mdiobus_register
	tracing: Have all levels of checks prevent recursion
	ARM: 9122/1: select HAVE_FUTEX_CMPXCHG
	Linux 4.9.288

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8aaac774905fc04e20f969573f54832c3f18ad14
2021-10-27 09:44:25 +02:00
Matthew Wilcox (Oracle)
52ed5a196b vfs: check fd has read access in kernel_read_file_from_fd()
commit 032146cda85566abcd1c4884d9d23e4e30a07e9a upstream.

If we open a file without read access and then pass the fd to a syscall
whose implementation calls kernel_read_file_from_fd(), we get a warning
from __kernel_read():

        if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))

This currently affects both finit_module() and kexec_file_load(), but it
could affect other syscalls in the future.

Link: https://lkml.kernel.org/r/20211007220110.600005-1-willy@infradead.org
Fixes: b844f0ecbc ("vfs: define kernel_copy_file_from_fd()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Hao Sun <sunhao.th@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-27 09:34:00 +02:00
Greg Kroah-Hartman
2a2b02a000 Merge 4.9.255 into android-4.9-q
Changes in 4.9.255
	ACPI: sysfs: Prefer "compatible" modalias
	wext: fix NULL-ptr-dereference with cfg80211's lack of commit()
	net: usb: qmi_wwan: added support for Thales Cinterion PLSx3 modem family
	y2038: futex: Move compat implementation into futex.c
	futex: Move futex exit handling into futex code
	futex: Replace PF_EXITPIDONE with a state
	exit/exec: Seperate mm_release()
	futex: Split futex_mm_release() for exit/exec
	futex: Set task::futex_state to DEAD right after handling futex exit
	futex: Mark the begin of futex exit explicitly
	futex: Sanitize exit state handling
	futex: Provide state handling for exec() as well
	futex: Add mutex around futex exit
	futex: Provide distinct return value when owner is exiting
	futex: Prevent exit livelock
	KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[]
	KVM: x86: get smi pending status correctly
	leds: trigger: fix potential deadlock with libata
	mt7601u: fix kernel crash unplugging the device
	mt7601u: fix rx buffer refcounting
	ARM: imx: build suspend-imx6.S with arm instruction set
	netfilter: nft_dynset: add timeout extension to template
	xfrm: Fix oops in xfrm_replay_advance_bmp
	RDMA/cxgb4: Fix the reported max_recv_sge value
	iwlwifi: pcie: use jiffies for memory read spin time limit
	iwlwifi: pcie: reschedule in long-running memory reads
	mac80211: pause TX while changing interface type
	can: dev: prevent potential information leak in can_fill_info()
	iommu/vt-d: Gracefully handle DMAR units with no supported address widths
	iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not built
	NFC: fix resource leak when target index is invalid
	NFC: fix possible resource leak
	Linux 4.9.255

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1ead684216d7f27b8209f4d680f40b3619d16e3a
2021-02-03 23:44:54 +01:00
Thomas Gleixner
394ff1207f exit/exec: Seperate mm_release()
commit 4610ba7ad877fafc0a25a30c6c82015304120426 upstream.

mm_release() contains the futex exit handling. mm_release() is called from
do_exit()->exit_mm() and from exec()->exec_mm().

In the exit_mm() case PF_EXITING and the futex state is updated. In the
exec_mm() case these states are not touched.

As the futex exit code needs further protections against exit races, this
needs to be split into two functions.

Preparatory only, no functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20191106224556.240518241@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-03 23:19:49 +01:00
Greg Kroah-Hartman
5791e89d21 Merge 4.9.224 into android-4.9-q
Changes in 4.9.224
	USB: serial: qcserial: Add DW5816e support
	dp83640: reverse arguments to list_add_tail
	fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checks
	net: macsec: preserve ingress frame ordering
	net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc()
	net: usb: qmi_wwan: add support for DW5816e
	sch_choke: avoid potential panic in choke_reset()
	sch_sfq: validate silly quantum values
	bnxt_en: Fix VLAN acceleration handling in bnxt_fix_features().
	net/mlx5: Fix forced completion access non initialized command entry
	net/mlx5: Fix command entry leak in Internal Error State
	bnxt_en: Improve AER slot reset.
	Revert "ACPI / video: Add force_native quirk for HP Pavilion dv6"
	binfmt_elf: move brk out of mmap when doing direct loader exec
	USB: uas: add quirk for LaCie 2Big Quadra
	USB: serial: garmin_gps: add sanity checking for data length
	tracing: Add a vmalloc_sync_mappings() for safe measure
	mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
	batman-adv: fix batadv_nc_random_weight_tq
	batman-adv: Fix refcnt leak in batadv_show_throughput_override
	batman-adv: Fix refcnt leak in batadv_store_throughput_override
	batman-adv: Fix refcnt leak in batadv_v_ogm_process
	objtool: Fix stack offset tracking for indirect CFAs
	scripts/decodecode: fix trapping instruction formatting
	binfmt_elf: Do not move brk for INTERP-less ET_EXEC
	ext4: add cond_resched() to ext4_protect_reserved_inode
	net: ipv6: add net argument to ip6_dst_lookup_flow
	net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup
	blktrace: Fix potential deadlock between delete & sysfs ops
	blktrace: fix unlocked access to init/start-stop/teardown
	blktrace: fix trace mutex deadlock
	blktrace: Protect q->blk_trace with RCU
	blktrace: fix dereference after null check
	ptp: do not explicitly set drvdata in ptp_clock_register()
	ptp: use is_visible method to hide unused attributes
	ptp: create "pins" together with the rest of attributes
	chardev: add helper function to register char devs with a struct device
	ptp: Fix pass zero to ERR_PTR() in ptp_clock_register
	ptp: fix the race between the release of ptp_clock and cdev
	ptp: free ptp device pin descriptors properly
	shmem: fix possible deadlocks on shmlock_user_lock
	net/sonic: Fix a resource leak in an error handling path in 'jazz_sonic_probe()'
	net: moxa: Fix a potential double 'free_irq()'
	drop_monitor: work around gcc-10 stringop-overflow warning
	scsi: sg: add sg_remove_request in sg_write
	spi: spi-dw: Add lock protect dw_spi rx/tx to prevent concurrent calls
	cifs: Check for timeout on Negotiate stage
	cifs: Fix a race condition with cifs_echo_request
	dmaengine: pch_dma.c: Avoid data race between probe and irq handler
	dmaengine: mmp_tdma: Reset channel error on release
	ALSA: hda/hdmi: fix race in monitor detection during probe
	drm/qxl: lost qxl_bo_kunmap_atomic_page in qxl_image_init_helper()
	ipc/util.c: sysvipc_find_ipc() incorrectly updates position index
	pinctrl: cherryview: Add missing spinlock usage in chv_gpio_irq_handler
	i40iw: Fix error handling in i40iw_manage_arp_cache()
	netfilter: conntrack: avoid gcc-10 zero-length-bounds warning
	IB/mlx4: Test return value of calls to ib_get_cached_pkey
	pnp: Use list_for_each_entry() instead of open coding
	gcc-10 warnings: fix low-hanging fruit
	kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
	Stop the ad-hoc games with -Wno-maybe-initialized
	net: phy: micrel: Use strlcpy() for ethtool::get_strings
	gcc-10: avoid shadowing standard library 'free()' in crypto
	gcc-10: disable 'zero-length-bounds' warning for now
	gcc-10: disable 'array-bounds' warning for now
	gcc-10: disable 'stringop-overflow' warning for now
	gcc-10: disable 'restrict' warning for now
	net: fix a potential recursive NETDEV_FEAT_CHANGE
	netlabel: cope with NULL catmap
	Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu"
	net: ipv4: really enforce backoff for redirects
	netprio_cgroup: Fix unlimited memory leak of v2 cgroups
	ALSA: hda/realtek - Limit int mic boost for Thinkpad T530
	ALSA: rawmidi: Initialize allocated buffers
	ALSA: rawmidi: Fix racy buffer resize under concurrent accesses
	ALSA: usb-audio: Add control message quirk delay for Kingston HyperX headset
	USB: gadget: fix illegal array access in binding with UDC
	usb: xhci: Fix NULL pointer dereference when enqueuing trbs from urb sg list
	ARM: dts: imx27-phytec-phycard-s-rdk: Fix the I2C1 pinctrl entries
	x86: Fix early boot crash on gcc-10, third try
	exec: Move would_dump into flush_old_exec
	usb: gadget: net2272: Fix a memory leak in an error handling path in 'net2272_plat_probe()'
	usb: gadget: audio: Fix a missing error return value in audio_bind()
	usb: gadget: legacy: fix error return code in gncm_bind()
	usb: gadget: legacy: fix error return code in cdc_bind()
	Revert "ALSA: hda/realtek: Fix pop noise on ALC225"
	ARM: dts: r8a73a4: Add missing CMT1 interrupts
	ARM: dts: r8a7740: Add missing extal2 to CPG node
	KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce
	Makefile: disallow data races on gcc-10 as well
	Linux 4.9.224

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib9fdff8dd9e9c92a2e750251b98eb0e4c9496fba
2020-05-20 11:46:34 +02:00
Eric W. Biederman
72d5fb7f67 exec: Move would_dump into flush_old_exec
commit f87d1c9559164294040e58f5e3b74a162bf7c6e8 upstream.

I goofed when I added mm->user_ns support to would_dump.  I missed the
fact that in the case of binfmt_loader, binfmt_em86, binfmt_misc, and
binfmt_script bprm->file is reassigned.  Which made the move of
would_dump from setup_new_exec to __do_execve_file before exec_binprm
incorrect as it can result in would_dump running on the script instead
of the interpreter of the script.

The net result is that the code stopped making unreadable interpreters
undumpable.  Which allows them to be ptraced and written to disk
without special permissions.  Oops.

The move was necessary because the call in set_new_exec was after
bprm->mm was no longer valid.

To correct this mistake move the misplaced would_dump from
__do_execve_file into flos_old_exec, before exec_mmap is called.

I tested and confirmed that without this fix I can attach with gdb to
a script with an unreadable interpreter, and with this fix I can not.

Cc: stable@vger.kernel.org
Fixes: f84df2a6f268 ("exec: Ensure mm->user_ns contains the execed files")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20 08:15:41 +02:00
Greg Kroah-Hartman
90aa47514b Merge 4.9.220 into android-4.9-q
Changes in 4.9.220
	bus: sunxi-rsb: Return correct data when mixing 16-bit and 8-bit reads
	net: vxge: fix wrong __VA_ARGS__ usage
	qlcnic: Fix bad kzalloc null test
	i2c: st: fix missing struct parameter description
	irqchip/versatile-fpga: Handle chained IRQs properly
	sched: Avoid scale real weight down to zero
	selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault
	libata: Remove extra scsi_host_put() in ata_scsi_add_hosts()
	gfs2: Don't demote a glock until its revokes are written
	x86/boot: Use unsigned comparison for addresses
	locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps()
	btrfs: remove a BUG_ON() from merge_reloc_roots()
	btrfs: track reloc roots based on their commit root bytenr
	misc: rtsx: set correct pcr_ops for rts522A
	ASoC: fix regwmask
	ASoC: dapm: connect virtual mux with default value
	ASoC: dpcm: allow start or stop during pause for backend
	ASoC: topology: use name_prefix for new kcontrol
	usb: gadget: f_fs: Fix use after free issue as part of queue failure
	usb: gadget: composite: Inform controller driver of self-powered
	ALSA: usb-audio: Add mixer workaround for TRX40 and co
	ALSA: hda: Add driver blacklist
	ALSA: hda: Fix potential access overflow in beep helper
	ALSA: ice1724: Fix invalid access for enumerated ctl items
	ALSA: pcm: oss: Fix regression by buffer overflow fix
	media: ti-vpe: cal: fix disable_irqs to only the intended target
	acpi/x86: ignore unspecified bit positions in the ACPI global lock field
	thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=n
	KEYS: reaching the keys quotas correctly
	irqchip/versatile-fpga: Apply clear-mask earlier
	MIPS: OCTEON: irq: Fix potential NULL pointer dereference
	ath9k: Handle txpower changes even when TPC is disabled
	signal: Extend exec_id to 64bits
	x86/entry/32: Add missing ASM_CLAC to general_protection entry
	KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
	KVM: s390: vsie: Fix delivery of addressing exceptions
	KVM: x86: Allocate new rmap and large page tracking when moving memslot
	KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
	KVM: VMX: fix crash cleanup when KVM wasn't used
	btrfs: drop block from cache on error in relocation
	crypto: mxs-dcp - fix scatterlist linearization for hash
	ALSA: hda: Initialize power_state field properly
	x86/speculation: Remove redundant arch_smt_update() invocation
	tools: gpio: Fix out-of-tree build regression
	mm: Use fixed constant in page_frag_alloc instead of size + 1
	dm verity fec: fix memory leak in verity_fec_dtr
	scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point
	arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
	rtc: omap: Use define directive for PIN_CONFIG_ACTIVE_HIGH
	ext4: fix a data race at inode->i_blocks
	ocfs2: no need try to truncate file beyond i_size
	s390/diag: fix display of diagnose call statistics
	Input: i8042 - add Acer Aspire 5738z to nomux list
	kmod: make request_module() return an error when autoloading is disabled
	cpufreq: powernv: Fix use-after-free
	hfsplus: fix crash and filesystem corruption when deleting files
	libata: Return correct status in sata_pmp_eh_recover_pm() when ATA_DFLAG_DETACH is set
	powerpc/64/tm: Don't let userspace set regs->trap via sigreturn
	Btrfs: fix crash during unmount due to race with delayed inode workers
	drm/dp_mst: Fix clearing payload state on topology disable
	drm: Remove PageReserved manipulation from drm_pci_alloc
	ipmi: fix hung processes in __get_guid()
	powerpc/fsl_booke: Avoid creating duplicate tlb1 entry
	misc: echo: Remove unnecessary parentheses and simplify check for zero
	mfd: dln2: Fix sanity checking for endpoints
	hsr: check protocol version in hsr_newlink()
	net: ipv4: devinet: Fix crash when add/del multicast IP with autojoin
	net: qrtr: send msgs from local of same id as broadcast
	net: ipv6: do not consider routes via gateways for anycast address check
	scsi: ufs: Fix ufshcd_hold() caused scheduling while atomic
	jbd2: improve comments about freeing data buffers whose page mapping is NULL
	ext4: fix incorrect group count in ext4_fill_super error message
	ext4: fix incorrect inodes per group in error message
	ASoC: Intel: mrfld: fix incorrect check on p->sink
	ASoC: Intel: mrfld: return error codes when an error occurs
	ALSA: usb-audio: Don't override ignore_ctl_error value from the map
	btrfs: check commit root generation in should_ignore_root
	mac80211_hwsim: Use kstrndup() in place of kasprintf()
	ext4: do not zeroout extents beyond i_disksize
	dm flakey: check for null arg_name in parse_features()
	kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBD
	scsi: target: remove boilerplate code
	scsi: target: fix hang when multiple threads try to destroy the same iscsi session
	tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation
	objtool: Fix switch table detection in .text.unlikely
	scsi: sg: add sg_remove_request in sg_common_write
	ALSA: hda: Don't release card at firmware loading error
	video: fbdev: sis: Remove unnecessary parentheses and commented code
	drm: NULL pointer dereference [null-pointer-deref] (CWE 476) problem
	Revert "gpio: set up initial state from .get_direction()"
	wil6210: increase firmware ready timeout
	wil6210: fix temperature debugfs
	scsi: ufs: make sure all interrupts are processed
	scsi: ufs: ufs-qcom: remove broken hci version quirk
	wil6210: rate limit wil_rx_refill error
	rtc: pm8xxx: Fix issue in RTC write path
	wil6210: fix length check in __wmi_send
	soc: qcom: smem: Use le32_to_cpu for comparison
	of: fix missing kobject init for !SYSFS && OF_DYNAMIC config
	arm64: cpu_errata: include required headers
	of: unittest: kmemleak in of_unittest_platform_populate()
	clk: at91: usb: continue if clk_hw_round_rate() return zero
	power: supply: bq27xxx_battery: Silence deferred-probe error
	clk: tegra: Fix Tegra PMC clock out parents
	NFS: direct.c: Fix memory leak of dreq when nfs_get_lock_context fails
	s390/cpuinfo: fix wrong output when CPU0 is offline
	powerpc/maple: Fix declaration made after definition
	ext4: do not commit super on read-only bdev
	percpu_counter: fix a data race at vm_committed_as
	compiler.h: fix error in BUILD_BUG_ON() reporting
	KVM: s390: vsie: Fix possible race when shadowing region 3 tables
	NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
	ext2: fix empty body warnings when -Wextra is used
	ext2: fix debug reference to ext2_xattr_cache
	libnvdimm: Out of bounds read in __nd_ioctl()
	iommu/amd: Fix the configuration of GCR3 table root pointer
	fbdev: potential information leak in do_fb_ioctl()
	tty: evh_bytechan: Fix out of bounds accesses
	locktorture: Print ratio of acquisitions, not failures
	mtd: lpddr: Fix a double free in probe()
	mtd: phram: fix a double free issue in error path
	x86/CPU: Add native CPUID variants returning a single datum
	x86/microcode/intel: replace sync_core() with native_cpuid_reg(eax)
	x86/vdso: Fix lsl operand order
	Linux 4.9.220

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I130bead53d151b84c03bac575c0f3760e14538a6
2020-04-24 08:34:09 +02:00
Eric W. Biederman
110012a2c9 signal: Extend exec_id to 64bits
commit d1e7fd6462ca9fc76650fbe6ca800e35b24267da upstream.

Replace the 32bit exec_id with a 64bit exec_id to make it impossible
to wrap the exec_id counter.  With care an attacker can cause exec_id
wrap and send arbitrary signals to a newly exec'd parent.  This
bypasses the signal sending checks if the parent changes their
credentials during exec.

The severity of this problem can been seen that in my limited testing
of a 32bit exec_id it can take as little as 19s to exec 65536 times.
Which means that it can take as little as 14 days to wrap a 32bit
exec_id.  Adam Zabrocki has succeeded wrapping the self_exe_id in 7
days.  Even my slower timing is in the uptime of a typical server.
Which means self_exec_id is simply a speed bump today, and if exec
gets noticably faster self_exec_id won't even be a speed bump.

Extending self_exec_id to 64bits introduces a problem on 32bit
architectures where reading self_exec_id is no longer atomic and can
take two read instructions.  Which means that is is possible to hit
a window where the read value of exec_id does not match the written
value.  So with very lucky timing after this change this still
remains expoiltable.

I have updated the update of exec_id on exec to use WRITE_ONCE
and the read of exec_id in do_notify_parent to use READ_ONCE
to make it clear that there is no locking between these two
locations.

Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl
Fixes: 2.3.23pre2
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-24 07:58:54 +02:00
Greg Kroah-Hartman
0eb90dd8f7 Merge 4.9.187 into android-4.9-q
Changes in 4.9.187
	MIPS: ath79: fix ar933x uart parity mode
	MIPS: fix build on non-linux hosts
	arm64/efi: Mark __efistub_stext_offset as an absolute symbol explicitly
	dmaengine: imx-sdma: fix use-after-free on probe error path
	ath10k: Do not send probe response template for mesh
	ath9k: Check for errors when reading SREV register
	ath6kl: add some bounds checking
	ath: DFS JP domain W56 fixed pulse type 3 RADAR detection
	batman-adv: fix for leaked TVLV handler.
	media: dvb: usb: fix use after free in dvb_usb_device_exit
	crypto: talitos - fix skcipher failure due to wrong output IV
	media: marvell-ccic: fix DMA s/g desc number calculation
	media: vpss: fix a potential NULL pointer dereference
	media: media_device_enum_links32: clean a reserved field
	net: stmmac: dwmac1000: Clear unused address entries
	net: stmmac: dwmac4/5: Clear unused address entries
	signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig
	af_key: fix leaks in key_pol_get_resp and dump_sp.
	xfrm: Fix xfrm sel prefix length validation
	media: mc-device.c: don't memset __user pointer contents
	media: staging: media: davinci_vpfe: - Fix for memory leak if decoder initialization fails.
	net: phy: Check against net_device being NULL
	crypto: talitos - properly handle split ICV.
	crypto: talitos - Align SEC1 accesses to 32 bits boundaries.
	tua6100: Avoid build warnings.
	locking/lockdep: Fix merging of hlocks with non-zero references
	media: wl128x: Fix some error handling in fm_v4l2_init_video_device()
	cpupower : frequency-set -r option misses the last cpu in related cpu list
	net: fec: Do not use netdev messages too early
	net: axienet: Fix race condition causing TX hang
	s390/qdio: handle PENDING state for QEBSM devices
	perf cs-etm: Properly set the value of 'old' and 'head' in snapshot mode
	perf test 6: Fix missing kvm module load for s390
	gpio: omap: fix lack of irqstatus_raw0 for OMAP4
	gpio: omap: ensure irq is enabled before wakeup
	regmap: fix bulk writes on paged registers
	bpf: silence warning messages in core
	rcu: Force inlining of rcu_read_lock()
	blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration
	xfrm: fix sa selector validation
	perf evsel: Make perf_evsel__name() accept a NULL argument
	vhost_net: disable zerocopy by default
	ipoib: correcly show a VF hardware address
	EDAC/sysfs: Fix memory leak when creating a csrow object
	ipsec: select crypto ciphers for xfrm_algo
	media: i2c: fix warning same module names
	ntp: Limit TAI-UTC offset
	timer_list: Guard procfs specific code
	acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
	media: coda: fix mpeg2 sequence number handling
	media: coda: increment sequence offset for the last returned frame
	mt7601u: do not schedule rx_tasklet when the device has been disconnected
	x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c
	mt7601u: fix possible memory leak when the device is disconnected
	ath10k: fix PCIE device wake up failed
	perf tools: Increase MAX_NR_CPUS and MAX_CACHES
	libata: don't request sense data on !ZAC ATA devices
	clocksource/drivers/exynos_mct: Increase priority over ARM arch timer
	rslib: Fix decoding of shortened codes
	rslib: Fix handling of of caller provided syndrome
	ixgbe: Check DDM existence in transceiver before access
	crypto: asymmetric_keys - select CRYPTO_HASH where needed
	EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec
	bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush()
	iwlwifi: mvm: Drop large non sta frames
	net: usb: asix: init MAC address buffers
	gpiolib: Fix references to gpiod_[gs]et_*value_cansleep() variants
	Bluetooth: hci_bcsp: Fix memory leak in rx_skb
	Bluetooth: 6lowpan: search for destination address in all peers
	Bluetooth: Check state in l2cap_disconnect_rsp
	Bluetooth: validate BLE connection interval updates
	gtp: fix Illegal context switch in RCU read-side critical section.
	gtp: fix use-after-free in gtp_newlink()
	xen: let alloc_xenballooned_pages() fail if not enough memory free
	scsi: NCR5380: Reduce goto statements in NCR5380_select()
	scsi: NCR5380: Always re-enable reselection interrupt
	scsi: mac_scsi: Increase PIO/PDMA transfer length threshold
	crypto: ghash - fix unaligned memory access in ghash_setkey()
	crypto: arm64/sha1-ce - correct digest for empty data in finup
	crypto: arm64/sha2-ce - correct digest for empty data in finup
	crypto: chacha20poly1305 - fix atomic sleep when using async algorithm
	crypto: crypto4xx - fix a potential double free in ppc4xx_trng_probe
	Input: gtco - bounds check collection indent level
	regulator: s2mps11: Fix buck7 and buck8 wrong voltages
	arm64: tegra: Update Jetson TX1 GPU regulator timings
	iwlwifi: pcie: don't service an interrupt that was masked
	tracing/snapshot: Resize spare buffer if size changed
	NFSv4: Handle the special Linux file open access mode
	lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE
	ALSA: seq: Break too long mutex context in the write loop
	ALSA: hda/realtek: apply ALC891 headset fixup to one Dell machine
	media: v4l2: Test type instead of cfg->type in v4l2_ctrl_new_custom()
	media: coda: Remove unbalanced and unneeded mutex unlock
	KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
	arm64: tegra: Fix AGIC register range
	fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.
	drm/nouveau/i2c: Enable i2c pads & busses during preinit
	padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
	9p/virtio: Add cleanup path in p9_virtio_init
	PCI: Do not poll for PME if the device is in D3cold
	Btrfs: add missing inode version, ctime and mtime updates when punching hole
	libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
	take floppy compat ioctls to sodding floppy.c
	floppy: fix div-by-zero in setup_format_params
	floppy: fix out-of-bounds read in next_valid_format
	floppy: fix invalid pointer dereference in drive_name
	floppy: fix out-of-bounds read in copy_buffer
	coda: pass the host file in vma->vm_file on mmap
	gpu: ipu-v3: ipu-ic: Fix saturation bit offset in TPMEM
	crypto: ccp - Validate the the error value used to index error messages
	PCI: hv: Delete the device earlier from hbus->children for hot-remove
	PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
	crypto: caam - limit output IV to CBC to work around CTR mode DMA issue
	um: Allow building and running on older hosts
	um: Fix FP register size for XSTATE/XSAVE
	parisc: Ensure userspace privilege for ptraced processes in regset functions
	parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1
	powerpc/32s: fix suspend/resume when IBATs 4-7 are used
	powerpc/watchpoint: Restore NV GPRs while returning from exception
	eCryptfs: fix a couple type promotion bugs
	intel_th: msu: Fix single mode with disabled IOMMU
	Bluetooth: Add SMP workaround Microsoft Surface Precision Mouse bug
	usb: Handle USB3 remote wakeup for LPM enabled devices correctly
	dm bufio: fix deadlock with loop device
	compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
	compiler.h: Add read_word_at_a_time() function.
	lib/strscpy: Shut up KASAN false-positives in strscpy()
	ext4: allow directory holes
	bnx2x: Prevent load reordering in tx completion processing
	bnx2x: Prevent ptp_task to be rescheduled indefinitely
	caif-hsi: fix possible deadlock in cfhsi_exit_module()
	igmp: fix memory leak in igmpv3_del_delrec()
	ipv4: don't set IPv6 only flags to IPv4 addresses
	net: bcmgenet: use promisc for unsupported filters
	net: dsa: mv88e6xxx: wait after reset deactivation
	net: neigh: fix multiple neigh timer scheduling
	net: openvswitch: fix csum updates for MPLS actions
	nfc: fix potential illegal memory access
	rxrpc: Fix send on a connected, but unbound socket
	sky2: Disable MSI on ASUS P6T
	vrf: make sure skb->data contains ip header to make routing
	macsec: fix use-after-free of skb during RX
	macsec: fix checksumming after decryption
	netrom: fix a memory leak in nr_rx_frame()
	netrom: hold sock when setting skb->destructor
	bonding: validate ip header before check IPPROTO_IGMP
	tcp: Reset bytes_acked and bytes_received when disconnecting
	net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling
	net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query
	net: bridge: stp: don't cache eth dest pointer before skb pull
	perf/x86/amd/uncore: Rename 'L2' to 'LLC'
	perf/x86/amd/uncore: Get correct number of cores sharing last level cache
	perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id
	NFSv4: Fix open create exclusive when the server reboots
	nfsd: increase DRC cache limit
	nfsd: give out fewer session slots as limit approaches
	nfsd: fix performance-limiting session calculation
	nfsd: Fix overflow causing non-working mounts on 1 TB machines
	drm/panel: simple: Fix panel_simple_dsi_probe
	usb: core: hub: Disable hub-initiated U1/U2
	tty: max310x: Fix invalid baudrate divisors calculator
	pinctrl: rockchip: fix leaked of_node references
	tty: serial: cpm_uart - fix init when SMC is relocated
	drm/bridge: tc358767: read display_props in get_modes()
	drm/bridge: sii902x: pixel clock unit is 10kHz instead of 1kHz
	memstick: Fix error cleanup path of memstick_init
	tty/serial: digicolor: Fix digicolor-usart already registered warning
	tty: serial: msm_serial: avoid system lockup condition
	serial: 8250: Fix TX interrupt handling condition
	drm/virtio: Add memory barriers for capset cache.
	phy: renesas: rcar-gen2: Fix memory leak at error paths
	drm/rockchip: Properly adjust to a true clock in adjusted_mode
	tty: serial_core: Set port active bit in uart_port_activate
	usb: gadget: Zero ffs_io_data
	powerpc/pci/of: Fix OF flags parsing for 64bit BARs
	PCI: sysfs: Ignore lockdep for remove attribute
	kbuild: Add -Werror=unknown-warning-option to CLANG_FLAGS
	PCI: xilinx-nwl: Fix Multi MSI data programming
	iio: iio-utils: Fix possible incorrect mask calculation
	recordmcount: Fix spurious mcount entries on powerpc
	mfd: core: Set fwnode for created devices
	mfd: arizona: Fix undefined behavior
	mfd: hi655x-pmic: Fix missing return value check for devm_regmap_init_mmio_clk
	um: Silence lockdep complaint about mmap_sem
	powerpc/4xx/uic: clear pending interrupt after irq type/pol change
	RDMA/i40iw: Set queue pair state when being queried
	serial: sh-sci: Terminate TX DMA during buffer flushing
	serial: sh-sci: Fix TX DMA buffer flushing and workqueue races
	kallsyms: exclude kasan local symbols on s390
	perf test mmap-thread-lookup: Initialize variable to suppress memory sanitizer warning
	RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM
	powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h
	f2fs: avoid out-of-range memory access
	mailbox: handle failed named mailbox channel request
	powerpc/eeh: Handle hugepages in ioremap space
	sh: prevent warnings when using iounmap
	mm/kmemleak.c: fix check for softirq context
	9p: pass the correct prototype to read_cache_page
	mm/mmu_notifier: use hlist_add_head_rcu()
	locking/lockdep: Fix lock used or unused stats error
	locking/lockdep: Hide unused 'class' variable
	usb: wusbcore: fix unbalanced get/put cluster_id
	usb: pci-quirks: Correct AMD PLL quirk detection
	x86/sysfb_efi: Add quirks for some devices with swapped width and height
	x86/speculation/mds: Apply more accurate check on hypervisor platform
	hpet: Fix division by zero in hpet_time_div()
	ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1
	ALSA: hda - Add a conexant codec entry to let mute led work
	powerpc/tm: Fix oops on sigreturn on systems without TM
	access: avoid the RCU grace period for the temporary subjective credentials
	ipv6: check sk sk_type and protocol early in ip_mroute_set/getsockopt
	tcp: reset sk_send_head in tcp_write_queue_purge
	arm64: dts: marvell: Fix A37xx UART0 register size
	i2c: qup: fixed releasing dma without flush operation completion
	arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ
	ISDN: hfcsusb: checking idx of ep configuration
	media: au0828: fix null dereference in error path
	media: cpia2_usb: first wake up, then free in disconnect
	media: radio-raremono: change devm_k*alloc to k*alloc
	Bluetooth: hci_uart: check for missing tty operations
	sched/fair: Don't free p->numa_faults with concurrent readers
	drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
	ceph: hold i_ceph_lock when removing caps for freeing inode
	Linux 4.9.187

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-08-04 09:50:32 +02:00
Jann Horn
837ffc9723 sched/fair: Don't free p->numa_faults with concurrent readers
commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream.

When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04 09:33:45 +02:00
Greg Kroah-Hartman
429c144a84 Merge 4.9.163 into android-4.9
Changes in 4.9.163
	USB: serial: option: add Telit ME910 ECM composition
	USB: serial: cp210x: add ID for Ingenico 3070
	USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485
	cpufreq: Use struct kobj_attribute instead of struct global_attr
	ncpfs: fix build warning of strncpy
	isdn: isdn_tty: fix build warning of strncpy
	staging: comedi: ni_660x: fix missing break in switch statement
	staging: wilc1000: fix to set correct value for 'vif_num'
	staging: android: ion: fix sys heap pool's gfp_flags
	ip6mr: Do not call __IP6_INC_STATS() from preemptible context
	net-sysfs: Fix mem leak in netdev_register_kobject
	sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
	team: Free BPF filter when unregistering netdev
	bnxt_en: Drop oversize TX packets to prevent errors.
	hv_netvsc: Fix IP header checksum for coalesced packets
	net: dsa: mv88e6xxx: Fix u64 statistics
	netlabel: fix out-of-bounds memory accesses
	net: netem: fix skb length BUG_ON in __skb_to_sgvec
	net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails
	net: sit: fix memory leak in sit_init_net()
	xen-netback: don't populate the hash cache on XenBus disconnect
	xen-netback: fix occasional leak of grant ref mappings under memory pressure
	net: Add __icmp_send helper.
	net: avoid use IPCB in cipso_v4_error
	tun: fix blocking read
	tun: remove unnecessary memory barrier
	net: phy: Micrel KSZ8061: link failure after cable connect
	x86/CPU/AMD: Set the CPB bit unconditionally on F17h
	applicom: Fix potential Spectre v1 vulnerabilities
	MIPS: irq: Allocate accurate order pages for irq stack
	hugetlbfs: fix races and page leaks during migration
	exec: Fix mem leak in kernel_read_file
	media: uvcvideo: Fix 'type' check leading to overflow
	vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel
	perf core: Fix perf_proc_update_handler() bug
	perf tools: Handle TOPOLOGY headers with no CPU
	IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
	iommu/amd: Call free_iova_fast with pfn in map_sg
	iommu/amd: Unmap all mapped pages in error path of map_sg
	ipvs: Fix signed integer overflow when setsockopt timeout
	iommu/amd: Fix IOMMU page flush when detach device from a domain
	xtensa: SMP: fix ccount_timer_shutdown
	xtensa: SMP: fix secondary CPU initialization
	xtensa: smp_lx200_defconfig: fix vectors clash
	xtensa: SMP: mark each possible CPU as present
	xtensa: SMP: limit number of possible CPUs by NR_CPUS
	net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case
	net: hns: Fix for missing of_node_put() after of_parse_phandle()
	net: hns: Fix wrong read accesses via Clause 45 MDIO protocol
	net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()
	gpio: vf610: Mask all GPIO interrupts
	nfs: Fix NULL pointer dereference of dev_name
	qed: Fix VF probe failure while FLR
	scsi: libfc: free skb when receiving invalid flogi resp
	platform/x86: Fix unmet dependency warning for SAMSUNG_Q10
	cifs: fix computation for MAX_SMB2_HDR_SIZE
	arm64: kprobe: Always blacklist the KVM world-switch code
	x86/kexec: Don't setup EFI info if EFI runtime is not enabled
	x86_64: increase stack size for KASAN_EXTRA
	mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
	mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
	fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
	autofs: drop dentry reference only when it is never used
	autofs: fix error return in autofs_fill_super()
	soc: fsl: qbman: avoid race in clearing QMan interrupt
	ARM: pxa: ssp: unneeded to free devm_ allocated data
	arm64: dts: add msm8996 compatible to gicv3
	usb: phy: fix link errors
	irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
	drm/sun4i: tcon: Prepare and enable TCON channel 0 clock at init
	dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
	vsock/virtio: fix kernel panic after device hot-unplug
	vsock/virtio: reset connected sockets on device removal
	dmaengine: dmatest: Abort test in case of mapping error
	selftests: netfilter: fix config fragment CONFIG_NF_TABLES_INET
	selftests: netfilter: add simple masq/redirect test cases
	netfilter: nf_nat: skip nat clash resolution for same-origin entries
	s390/qeth: fix use-after-free in error path
	perf symbols: Filter out hidden symbols from labels
	MIPS: Remove function size check in get_frame_info()
	fs: ratelimit __find_get_block_slow() failure message.
	Input: wacom_serial4 - add support for Wacom ArtPad II tablet
	Input: elan_i2c - add id for touchpad found in Lenovo s21e-20
	iscsi_ibft: Fix missing break in switch statement
	scsi: aacraid: Fix missing break in switch statement
	futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()
	ARM: dts: exynos: Fix pinctrl definition for eMMC RTSN line on Odroid X2/U3
	ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU
	drm: disable uncached DMA optimization for ARM and arm64
	ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+
	ARM: dts: exynos: Do not ignore real-world fuse values for thermal zone 0 on Exynos5420
	perf/x86/intel: Make cpuc allocations consistent
	perf/x86/intel: Generalize dynamic constraint creation
	x86: Add TSX Force Abort CPUID/MSR
	Linux 4.9.163

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-03-13 14:20:38 -07:00
YueHaibing
dd6734e179 exec: Fix mem leak in kernel_read_file
commit f612acfae86af7ecad754ae6a46019be9da05b8e upstream.

syzkaller report this:
BUG: memory leak
unreferenced object 0xffffc9000488d000 (size 9195520):
  comm "syz-executor.0", pid 2752, jiffies 4294787496 (age 18.757s)
  hex dump (first 32 bytes):
    ff ff ff ff ff ff ff ff a8 00 00 00 01 00 00 00  ................
    02 00 00 00 00 00 00 00 80 a1 7a c1 ff ff ff ff  ..........z.....
  backtrace:
    [<000000000863775c>] __vmalloc_node mm/vmalloc.c:1795 [inline]
    [<000000000863775c>] __vmalloc_node_flags mm/vmalloc.c:1809 [inline]
    [<000000000863775c>] vmalloc+0x8c/0xb0 mm/vmalloc.c:1831
    [<000000003f668111>] kernel_read_file+0x58f/0x7d0 fs/exec.c:924
    [<000000002385813f>] kernel_read_file_from_fd+0x49/0x80 fs/exec.c:993
    [<0000000011953ff1>] __do_sys_finit_module+0x13b/0x2a0 kernel/module.c:3895
    [<000000006f58491f>] do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
    [<00000000ee78baf4>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<00000000241f889b>] 0xffffffffffffffff

It should goto 'out_free' lable to free allocated buf while kernel_read
fails.

Fixes: 39d637af5a ("vfs: forbid write access when reading a file into memory")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thibaut Sautereau <thibaut@sautereau.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-13 14:04:55 -07:00
Greg Kroah-Hartman
52be322125 Merge 4.9.116 into android-4.9
Changes in 4.9.116
	MIPS: ath79: fix register address in ath79_ddr_wb_flush()
	MIPS: Fix off-by-one in pci_resource_to_user()
	ip: hash fragments consistently
	ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
	net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
	net: skb_segment() should not return NULL
	net/mlx5: Adjust clock overflow work period
	net/mlx5e: Don't allow aRFS for encapsulated packets
	net/mlx5e: Fix quota counting in aRFS expire flow
	multicast: do not restore deleted record source filter mode to new one
	net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv
	rtnetlink: add rtnl_link_state check in rtnl_configure_link
	tcp: fix dctcp delayed ACK schedule
	tcp: helpers to send special DCTCP ack
	tcp: do not cancel delay-AcK on DCTCP special ACK
	tcp: do not delay ACK in DCTCP upon CE status change
	tcp: free batches of packets in tcp_prune_ofo_queue()
	tcp: avoid collapses in tcp_prune_queue() if possible
	tcp: detect malicious patterns in tcp_collapse_ofo_queue()
	tcp: call tcp_drop() from tcp_data_queue_ofo()
	usb: cdc_acm: Add quirk for Castles VEGA3000
	usb: core: handle hub C_PORT_OVER_CURRENT condition
	usb: gadget: f_fs: Only return delayed status when len is 0
	driver core: Partially revert "driver core: correct device's shutdown order"
	can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK
	can: xilinx_can: fix power management handling
	can: xilinx_can: fix recovery from error states not being propagated
	can: xilinx_can: fix device dropping off bus on RX overrun
	can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
	can: xilinx_can: fix incorrect clear of non-processed interrupts
	can: xilinx_can: fix RX overflow interrupt not being enabled
	turn off -Wattribute-alias
	exec: avoid gcc-8 warning for get_task_comm
	Linux 4.9.116

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-07-31 21:27:31 +02:00
Arnd Bergmann
b9dd13488a exec: avoid gcc-8 warning for get_task_comm
commit 3756f6401c302617c5e091081ca4d26ab604bec5 upstream.

gcc-8 warns about using strncpy() with the source size as the limit:

  fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]

This is indeed slightly suspicious, as it protects us from source
arguments without NUL-termination, but does not guarantee that the
destination is terminated.

This keeps the strncpy() to ensure we have properly padded target
buffer, but ensures that we use the correct length, by passing the
actual length of the destination buffer as well as adding a build-time
check to ensure it is exactly TASK_COMM_LEN.

There are only 23 callsites which I all reviewed to ensure this is
currently the case.  We could get away with doing only the check or
passing the right length, but it doesn't hurt to do both.

Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-28 07:49:14 +02:00
Greg Kroah-Hartman
14accea70e Merge 4.9.39 into android-4.9
Changes in 4.9.39
	xen-netfront: Rework the fix for Rx stall during OOM and network stress
	net_sched: fix error recovery at qdisc creation
	net: sched: Fix one possible panic when no destroy callback
	net/phy: micrel: configure intterupts after autoneg workaround
	ipv6: avoid unregistering inet6_dev for loopback
	net: dp83640: Avoid NULL pointer dereference.
	tcp: reset sk_rx_dst in tcp_disconnect()
	net: prevent sign extension in dev_get_stats()
	bridge: mdb: fix leak on complete_info ptr on fail path
	rocker: move dereference before free
	bpf: prevent leaking pointer via xadd on unpriviledged
	net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
	net/mlx5: Cancel delayed recovery work when unloading the driver
	liquidio: fix bug in soft reset failure detection
	net/mlx5e: Fix TX carrier errors report in get stats ndo
	ipv6: dad: don't remove dynamic addresses if link is down
	vxlan: fix hlist corruption
	net: core: Fix slab-out-of-bounds in netdev_stats_to_stats64
	net: ipv6: Compare lwstate in detecting duplicate nexthops
	vrf: fix bug_on triggered by rx when destroying a vrf
	rds: tcp: use sock_create_lite() to create the accept socket
	brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
	brcmfmac: Fix a memory leak in error handling path in 'brcmf_cfg80211_attach'
	brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain
	sfc: don't read beyond unicast address list
	cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
	cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
	cfg80211: Check if PMKID attribute is of expected size
	cfg80211: Check if NAN service ID is of expected size
	irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
	parisc: Report SIGSEGV instead of SIGBUS when running out of stack
	parisc: use compat_sys_keyctl()
	parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs
	parisc/mm: Ensure IRQs are off in switch_mm()
	tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth
	thp, mm: fix crash due race in MADV_FREE handling
	kernel/extable.c: mark core_kernel_text notrace
	mm/list_lru.c: fix list_lru_count_node() to be race free
	fs/dcache.c: fix spin lockup issue on nlru->lock
	checkpatch: silence perl 5.26.0 unescaped left brace warnings
	binfmt_elf: use ELF_ET_DYN_BASE only for PIE
	arm: move ELF_ET_DYN_BASE to 4MB
	arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
	powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
	s390: reduce ELF_ET_DYN_BASE
	exec: Limit arg stack to at most 75% of _STK_LIM
	ARM64: dts: marvell: armada37xx: Fix timer interrupt specifiers
	vt: fix unchecked __put_user() in tioclinux ioctls
	rcu: Add memory barriers for NOCB leader wakeup
	nvmem: core: fix leaks on registration errors
	mnt: In umount propagation reparent in a separate pass
	mnt: In propgate_umount handle visiting mounts in any order
	mnt: Make propagate_umount less slow for overlapping mount propagation trees
	selftests/capabilities: Fix the test_execve test
	mm: fix overflow check in expand_upwards()
	crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD
	crypto: atmel - only treat EBUSY as transient if backlog
	crypto: sha1-ssse3 - Disable avx2
	crypto: caam - properly set IV after {en,de}crypt
	crypto: caam - fix signals handling
	Revert "sched/core: Optimize SCHED_SMT"
	sched/fair, cpumask: Export for_each_cpu_wrap()
	sched/topology: Fix building of overlapping sched-groups
	sched/topology: Optimize build_group_mask()
	sched/topology: Fix overlapping sched_group_mask
	PM / wakeirq: Convert to SRCU
	PM / QoS: return -EINVAL for bogus strings
	tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
	kvm: vmx: Do not disable intercepts for BNDCFGS
	kvm: x86: Guest BNDCFGS requires guest MPX support
	kvm: vmx: Check value written to IA32_BNDCFGS
	kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
	4.9.39

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-21 08:55:50 +02:00
Kees Cook
f31c4f65dd exec: Limit arg stack to at most 75% of _STK_LIM
commit da029c11e6b12f321f36dac8771e833b65cec962 upstream.

To avoid pathological stack usage or the need to special-case setuid
execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:42:22 +02:00
Greg Kroah-Hartman
75d78c7eda Merge 4.9.35 into android-4.9
Changes in 4.9.35
	clk: sunxi-ng: a31: Correct lcd1-ch1 clock register offset
	xen/blkback: fix disconnect while I/Os in flight
	xen-blkback: don't leak stack data via response ring
	ALSA: firewire-lib: Fix stall of process context at packet error
	ALSA: pcm: Don't treat NULL chmap as a fatal error
	fs/exec.c: account for argv/envp pointers
	powerpc/perf: Fix oops when kthread execs user process
	autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL
	lib/cmdline.c: fix get_options() overflow while parsing ranges
	perf/x86/intel: Add 1G DTLB load/store miss support for SKL
	KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows
	KVM: PPC: Book3S HV: Preserve userspace HTM state properly
	KVM: PPC: Book3S HV: Context-switch EBB registers properly
	CIFS: Improve readdir verbosity
	cxgb4: notify uP to route ctrlq compl to rdma rspq
	HID: Add quirk for Dell PIXART OEM mouse
	signal: Only reschedule timers on signals timers have sent
	powerpc/kprobes: Pause function_graph tracing during jprobes handling
	powerpc/64s: Handle data breakpoints in Radix mode
	Input: i8042 - add Fujitsu Lifebook AH544 to notimeout list
	brcmfmac: add parameter to pass error code in firmware callback
	brcmfmac: use firmware callback upon failure to load
	brcmfmac: unbind all devices upon failure in firmware callback
	time: Fix clock->read(clock) race around clocksource changes
	time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
	arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
	target: Fix kref->refcount underflow in transport_cmd_finish_abort
	iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP
	iscsi-target: Reject immediate data underflow larger than SCSI transfer length
	drm/radeon: add a PX quirk for another K53TK variant
	drm/radeon: add a quirk for Toshiba Satellite L20-183
	drm/amdgpu/atom: fix ps allocation size for EnableDispPowerGating
	drm/amdgpu: adjust default display clock
	rxrpc: Fix several cases where a padded len isn't checked in ticket decode
	of: Add check to of_scan_flat_dt() before accessing initial_boot_params
	mtd: spi-nor: fix spansion quad enable
	usb: gadget: f_fs: avoid out of bounds access on comp_desc
	rt2x00: avoid introducing a USB dependency in the rt2x00lib module
	net: phy: Initialize mdio clock at probe function
	dmaengine: bcm2835: Fix cyclic DMA period splitting
	spi: double time out tolerance
	net: phy: fix marvell phy status reading
	jump label: fix passing kbuild_cflags when checking for asm goto support
	brcmfmac: fix uninitialized warning in brcmf_usb_probe_phase2()
	Linux 4.9.35

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-06-29 14:26:07 +02:00
Kees Cook
3d6848e491 fs/exec.c: account for argv/envp pointers
commit 98da7d08850fb8bdeb395d6368ed15753304aa0c upstream.

When limiting the argv/envp strings during exec to 1/4 of the stack limit,
the storage of the pointers to the strings was not included.  This means
that an exec with huge numbers of tiny strings could eat 1/4 of the stack
limit in strings and then additional space would be later used by the
pointers to the strings.

For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721
single-byte strings would consume less than 2MB of stack, the max (8MB /
4) amount allowed, but the pointers to the strings would consume the
remaining additional stack space (1677721 * 4 == 6710884).

The result (1677721 + 6710884 == 8388605) would exhaust stack space
entirely.  Controlling this stack exhaustion could result in
pathological behavior in setuid binaries (CVE-2017-1000365).

[akpm@linux-foundation.org: additional commenting from Kees]
Fixes: b6a2fea393 ("mm: variable length argument support")
Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Qualys Security Advisory <qsa@qualys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29 13:00:28 +02:00
Dmitry Shmidt
cd08287396 Merge tag 'v4.9.6' into android-4.9
This is the 4.9.6 stable release

Change-Id: I318df4b9d706d50c13fe3969d734117c25fc94bc
2017-01-31 13:55:27 -08:00
Daniel Rosenberg
bbcd0ffae5 ANDROID: vfs: Add permission2 for filesystems with per mount permissions
This allows filesystems to use their mount private data to
influence the permssions they return in permission2. It has
been separated into a new call to avoid disrupting current
permission users.

Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-01-31 10:47:22 -08:00
Eric W. Biederman
e747b4ae3b ptrace: Capture the ptracer's creds not PT_PTRACE_CAP
commit 64b875f7ac8a5d60a4e191479299e931ee949b67 upstream.

When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
overlooked.  This can result in incorrect behavior when an application
like strace traces an exec of a setuid executable.

Further PT_PTRACE_CAP does not have enough information for making good
security decisions as it does not report which user namespace the
capability is in.  This has already allowed one mistake through
insufficient granulariy.

I found this issue when I was testing another corner case of exec and
discovered that I could not get strace to set PT_PTRACE_CAP even when
running strace as root with a full set of caps.

This change fixes the above issue with strace allowing stracing as
root a setuid executable without disabling setuid.  More fundamentaly
this change allows what is allowable at all times, by using the correct
information in it's decision.

Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06 10:40:13 +01:00
Aleksa Sarai
c1df5a6371 fs: exec: apply CLOEXEC before changing dumpable task flags
commit 613cc2b6f272c1a8ad33aefa21cad77af23139f7 upstream.

If you have a process that has set itself to be non-dumpable, and it
then undergoes exec(2), any CLOEXEC file descriptors it has open are
"exposed" during a race window between the dumpable flags of the process
being reset for exec(2) and CLOEXEC being applied to the file
descriptors. This can be exploited by a process by attempting to access
/proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.

The race in question is after set_dumpable has been (for get_link,
though the trace is basically the same for readlink):

[vfs]
-> proc_pid_link_inode_operations.get_link
   -> proc_pid_get_link
      -> proc_fd_access_allowed
         -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);

Which will return 0, during the race window and CLOEXEC file descriptors
will still be open during this window because do_close_on_exec has not
been called yet. As a result, the ordering of these calls should be
reversed to avoid this race window.

This is of particular concern to container runtimes, where joining a
PID namespace with file descriptors referring to the host filesystem
can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
against access of CLOEXEC file descriptors -- file descriptors which may
reference filesystem objects the container shouldn't have access to).

Cc: dev@opencontainers.org
Reported-by: Michael Crosby <crosbymichael@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06 10:40:12 +01:00
Eric W. Biederman
21245b8635 exec: Ensure mm->user_ns contains the execed files
commit f84df2a6f268de584a201e8911384a2d244876e3 upstream.

When the user namespace support was merged the need to prevent
ptrace from revealing the contents of an unreadable executable
was overlooked.

Correct this oversight by ensuring that the executed file
or files are in mm->user_ns, by adjusting mm->user_ns.

Use the new function privileged_wrt_inode_uidgid to see if
the executable is a member of the user namespace, and as such
if having CAP_SYS_PTRACE in the user namespace should allow
tracing the executable.  If not update mm->user_ns to
the parent user namespace until an appropriate parent is found.

Reported-by: Jann Horn <jann@thejh.net>
Fixes: 9e4a36ece6 ("userns: Fail exec for suid and sgid binaries with ids outside our user namespace.")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06 10:40:12 +01:00
Lorenzo Stoakes
9beae1ea89 mm: replace get_user_pages_remote() write/force parameters with gup_flags
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:12:02 -07:00
Linus Torvalds
8e7106a607 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
 "This series is all about Nicolas flat format support for MMU systems.

  Traditional m68k no-MMU flat format binaries can now be run on m68k
  MMU enabled systems too.  The series includes some nice cleanups of
  the binfmt_flat code and converts it to using proper user space
  accessor functions.

  With all this in place you can boot and run a complete no-MMU flat
  format based user space on an MMU enabled system"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: enable binfmt_flat on systems with an MMU
  binfmt_flat: allow compressed flat binary format to work on MMU systems
  binfmt_flat: add MMU-specific support
  binfmt_flat: update libraries' data segment pointer with userspace accessors
  binfmt_flat: use clear_user() rather than memset() to clear .bss
  binfmt_flat: use proper user space accessors with old relocs code
  binfmt_flat: use proper user space accessors with relocs processing code
  binfmt_flat: clean up create_flat_tables() and stack accesses
  binfmt_flat: use generic transfer_args_to_stack()
  elf_fdpic_transfer_args_to_stack(): make it generic
  binfmt_flat: prevent kernel dammage from corrupted executable headers
  binfmt_flat: convert printk invocations to their modern form
  binfmt_flat: assorted cleanups
  m68k: use same start_thread() on MMU and no-MMU
  m68k: fix file path comment
  m68k: fix bFLT executable running on MMU enabled systems
2016-08-04 18:04:44 -04:00
Stephen Boyd
a098ecd2fa firmware: support loading into a pre-allocated buffer
Some systems are memory constrained but they need to load very large
firmwares.  The firmware subsystem allows drivers to request this
firmware be loaded from the filesystem, but this requires that the
entire firmware be loaded into kernel memory first before it's provided
to the driver.  This can lead to a situation where we map the firmware
twice, once to load the firmware into kernel memory and once to copy the
firmware into the final resting place.

This creates needless memory pressure and delays loading because we have
to copy from kernel memory to somewhere else.  Let's add a
request_firmware_into_buf() API that allows drivers to request firmware
be loaded directly into a pre-allocated buffer.  This skips the
intermediate step of allocating a buffer in kernel memory to hold the
firmware image while it's read from the filesystem.  It also requires
that drivers know how much memory they'll require before requesting the
firmware and negates any benefits of firmware caching because the
firmware layer doesn't manage the buffer lifetime.

For a 16MB buffer, about half the time is spent performing a memcpy from
the buffer to the final resting place.  I see loading times go from
0.081171 seconds to 0.047696 seconds after applying this patch.  Plus
the vmalloc pressure is reduced.

This is based on a patch from Vikram Mulukutla on codeaurora.org:
  https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.18/commit/drivers/base/firmware_class.c?h=rel/msm-3.18&id=0a328c5f6cd999f5c591f172216835636f39bcb5

Link: http://lkml.kernel.org/r/20160607164741.31849-4-stephen.boyd@linaro.org
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:10 -04:00
Nicolas Pitre
7e7ec6a934 elf_fdpic_transfer_args_to_stack(): make it generic
This copying of arguments and environment is common to both NOMMU
binary formats we support. Let's make the elf_fdpic version available
to the flat format as well.

While at it, improve the code a bit not to copy below the actual
data area.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-25 16:51:49 +10:00
Andy Lutomirski
380cf5ba6b fs: Treat foreign mounts as nosuid
If a process gets access to a mount from a different user
namespace, that process should not be able to take advantage of
setuid files or selinux entrypoints from that filesystem.  Prevent
this by treating mounts from other mount namespaces and those not
owned by current_user_ns() or an ancestor as nosuid.

This will make it safer to allow more complex filesystems to be
mounted in non-root user namespaces.

This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
setgid, and file capability bits can no longer be abused if code in
a user namespace were to clear nosuid on an untrusted filesystem,
but this patch, by itself, is insufficient to protect the system
from abuse of files that, when execed, would increase MAC privilege.

As a more concrete explanation, any task that can manipulate a
vfsmount associated with a given user namespace already has
capabilities in that namespace and all of its descendents.  If they
can cause a malicious setuid, setgid, or file-caps executable to
appear in that mount, then that executable will only allow them to
elevate privileges in exactly the set of namespaces in which they
are already privileges.

On the other hand, if they can cause a malicious executable to
appear with a dangerous MAC label, running it could change the
caller's security context in a way that should not have been
possible, even inside the namespace in which the task is confined.

As a hardening measure, this would have made CVE-2014-5207 much
more difficult to exploit.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2016-06-24 10:40:41 -05:00
Michal Hocko
f268dfe905 exec: make exec path waiting for mmap_sem killable
setup_arg_pages requires mmap_sem for write.  If the waiting task gets
killed by the oom killer it would block oom_reaper from asynchronous
address space reclaim and reduce the chances of timely OOM resolving.
Wait for the lock in the killable mode and return with EINTR if the task
got killed while waiting.  All the callers are already handling error
path and the fatal signal doesn't need any additional treatment.

The same applies to __bprm_mm_init.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Oleg Nesterov
9eb8a659de exec: remove the no longer needed remove_arg_zero()->free_arg_page()
remove_arg_zero() does free_arg_page() for no reason.  This was needed
before and only if CONFIG_MMU=y: see commit 4fc75ff481 ("exec: fix
remove_arg_zero"), install_arg_page() was called for every page != NULL
in bprm->page[] array.  Today install_arg_page() has already gone and
free_arg_page() is nop after another commit b6a2fea393 ("mm: variable
length argument support").

CONFIG_MMU=n does free_arg_pages() in free_bprm() and thus it doesn't
need remove_arg_zero()->free_arg_page() too; apart from get_arg_page()
it never checks if the page in bprm->page[] was allocated or not, so the
"extra" non-freed page is fine.  OTOH, this free_arg_page() can add the
minor pessimization, the caller is going to do copy_strings_kernel()
right after remove_arg_zero() which will likely need to re-allocate the
same page again.

And as Hujunjie pointed out, the "offset == PAGE_SIZE" check is wrong
because we are going to increment bprm->p once again before return, so
CONFIG_MMU=n "leaks" the page anyway if '0' is the final byte in this
page.

NOTE: remove_arg_zero() assumes that argv[0] is null-terminated but this
is not necessarily true.  copy_strings() does "len = strnlen_user(...)",
then copy_from_user(len) but another thread or debuger can overwrite the
trailing '0' in between.  Afaics nothing really bad can happen because
we must always have the null-terminated bprm->filename copied by the 1st
copy_strings_kernel(), but perhaps we should change this code to check
"bprm->p < bprm->exec" anyway, and/or change copy_strings() to ensure
that the last byte in string is always zero.

Link: http://lkml.kernel.org/r/20160517155335.GA31435@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported by: hujunjie <jj.net@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Linus Torvalds
f4f27d0028 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Highlights:

   - A new LSM, "LoadPin", from Kees Cook is added, which allows forcing
     of modules and firmware to be loaded from a specific device (this
     is from ChromeOS, where the device as a whole is verified
     cryptographically via dm-verity).

     This is disabled by default but can be configured to be enabled by
     default (don't do this if you don't know what you're doing).

   - Keys: allow authentication data to be stored in an asymmetric key.
     Lots of general fixes and updates.

   - SELinux: add restrictions for loading of kernel modules via
     finit_module().  Distinguish non-init user namespace capability
     checks.  Apply execstack check on thread stacks"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (48 commits)
  LSM: LoadPin: provide enablement CONFIG
  Yama: use atomic allocations when reporting
  seccomp: Fix comment typo
  ima: add support for creating files using the mknodat syscall
  ima: fix ima_inode_post_setattr
  vfs: forbid write access when reading a file into memory
  fs: fix over-zealous use of "const"
  selinux: apply execstack check on thread stacks
  selinux: distinguish non-init user namespace capability checks
  LSM: LoadPin for kernel file loading restrictions
  fs: define a string representation of the kernel_read_file_id enumeration
  Yama: consolidate error reporting
  string_helpers: add kstrdup_quotable_file
  string_helpers: add kstrdup_quotable_cmdline
  string_helpers: add kstrdup_quotable
  selinux: check ss_initialized before revalidating an inode label
  selinux: delay inode label lookup as long as possible
  selinux: don't revalidate an inode's label when explicitly setting it
  selinux: Change bool variable name to index.
  KEYS: Add KEYCTL_DH_COMPUTE command
  ...
2016-05-19 09:21:36 -07:00
Kees Cook
cb6fd68fdd exec: clarify reasoning for euid/egid reset
This section of code initially looks redundant, but is required. This
improves the comment to explain more clearly why the reset is needed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-17 13:56:53 -07:00
Dmitry Kasatkin
39d637af5a vfs: forbid write access when reading a file into memory
This patch is based on top of the "vfs: support for a common kernel file
loader" patch set.  In general when the kernel is reading a file into
memory it does not want anything else writing to it.

The kernel currently only forbids write access to a file being executed.
This patch extends this locking to files being read by the kernel.

Changelog:
- moved function to kernel_read_file() - Mimi
- updated patch description - Mimi

Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@huawei.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
2016-05-01 09:23:51 -04:00
Linus Torvalds
643ad15d47 Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 protection key support from Ingo Molnar:
 "This tree adds support for a new memory protection hardware feature
  that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

  There's a background article at LWN.net:

      https://lwn.net/Articles/643797/

  The gist is that protection keys allow the encoding of
  user-controllable permission masks in the pte.  So instead of having a
  fixed protection mask in the pte (which needs a system call to change
  and works on a per page basis), the user can map a (handful of)
  protection mask variants and can change the masks runtime relatively
  cheaply, without having to change every single page in the affected
  virtual memory range.

  This allows the dynamic switching of the protection bits of large
  amounts of virtual memory, via user-space instructions.  It also
  allows more precise control of MMU permission bits: for example the
  executable bit is separate from the read bit (see more about that
  below).

  This tree adds the MM infrastructure and low level x86 glue needed for
  that, plus it adds a high level API to make use of protection keys -
  if a user-space application calls:

        mmap(..., PROT_EXEC);

  or

        mprotect(ptr, sz, PROT_EXEC);

  (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
  this special case, and will set a special protection key on this
  memory range.  It also sets the appropriate bits in the Protection
  Keys User Rights (PKRU) register so that the memory becomes unreadable
  and unwritable.

  So using protection keys the kernel is able to implement 'true'
  PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
  PROT_READ as well.  Unreadable executable mappings have security
  advantages: they cannot be read via information leaks to figure out
  ASLR details, nor can they be scanned for ROP gadgets - and they
  cannot be used by exploits for data purposes either.

  We know about no user-space code that relies on pure PROT_EXEC
  mappings today, but binary loaders could start making use of this new
  feature to map binaries and libraries in a more secure fashion.

  There is other pending pkeys work that offers more high level system
  call APIs to manage protection keys - but those are not part of this
  pull request.

  Right now there's a Kconfig that controls this feature
  (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
  (like most x86 CPU feature enablement code that has no runtime
  overhead), but it's not user-configurable at the moment.  If there's
  any serious problem with this then we can make it configurable and/or
  flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
  mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
  x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
  mm/core, x86/mm/pkeys: Add execute-only protection keys support
  x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
  x86/mm/pkeys: Allow kernel to modify user pkey rights register
  x86/fpu: Allow setting of XSAVE state
  x86/mm: Factor out LDT init from context init
  mm/core, x86/mm/pkeys: Add arch_validate_pkey()
  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
  x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
  x86/mm/pkeys: Add Kconfig prompt to existing config option
  x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
  x86/mm/pkeys: Dump PKRU with other kernel registers
  mm/core, x86/mm/pkeys: Differentiate instruction fetches
  x86/mm/pkeys: Optimize fault handling in access_error()
  mm/core: Do not enforce PKEY permissions on remote mm access
  um, pkeys: Add UML arch_*_access_permitted() methods
  mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
  x86/mm/gup: Simplify get_user_pages() PTE bit handling
  ...
2016-03-20 19:08:56 -07:00
Mimi Zohar
b844f0ecbc vfs: define kernel_copy_file_from_fd()
This patch defines kernel_read_file_from_fd(), a wrapper for the VFS
common kernel_read_file().

Changelog:
- Separated from the kernel modules patch
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2016-02-21 09:06:10 -05:00
Mimi Zohar
39eeb4fb97 security: define kernel_read_file hook
The kernel_read_file security hook is called prior to reading the file
into memory.

Changelog v4+:
- export security_kernel_read_file()

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
2016-02-21 09:06:09 -05:00
Mimi Zohar
09596b94f7 vfs: define kernel_read_file_from_path
This patch defines kernel_read_file_from_path(), a wrapper for the VFS
common kernel_read_file().

Changelog:
- revert error msg regression - reported by Sergey Senozhatsky
- Separated from the IMA patch

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
2016-02-21 08:55:00 -05:00
Mimi Zohar
bc8ca5b92d vfs: define kernel_read_file_id enumeration
To differentiate between the kernel_read_file() callers, this patch
defines a new enumeration named kernel_read_file_id and includes the
caller identifier as an argument.

Subsequent patches define READING_KEXEC_IMAGE, READING_KEXEC_INITRAMFS,
READING_FIRMWARE, READING_MODULE, and READING_POLICY.

Changelog v3:
- Replace the IMA specific enumeration with a generic one.

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
2016-02-18 17:14:04 -05:00
Mimi Zohar
b44a7dfc6f vfs: define a generic function to read a file from the kernel
For a while it was looked down upon to directly read files from Linux.
These days there exists a few mechanisms in the kernel that do just
this though to load a file into a local buffer.  There are minor but
important checks differences on each.  This patch set is the first
attempt at resolving some of these differences.

This patch introduces a common function for reading files from the kernel
with the corresponding security post-read hook and function.

Changelog v4+:
- export security_kernel_post_read_file() - Fengguang Wu
v3:
- additional bounds checking - Luis
v2:
- To simplify patch review, re-ordered patches

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
2016-02-18 17:14:03 -05:00
Dave Hansen
1e9877902d mm/gup: Introduce get_user_pages_remote()
For protection keys, we need to understand whether protections
should be enforced in software or not.  In general, we enforce
protections when working on our own task, but not when on others.
We call these "current" and "remote" operations.

This patch introduces a new get_user_pages() variant:

        get_user_pages_remote()

Which is a replacement for when get_user_pages() is called on
non-current tsk/mm.

We also introduce a new gup flag: FOLL_REMOTE which can be used
for the "__" gup variants to get this new behavior.

The uprobes is_trap_at_addr() location holds mmap_sem and
calls get_user_pages(current->mm) on an instruction address.  This
makes it a pretty unique gup caller.  Being an instruction access
and also really originating from the kernel (vs. the app), I opted
to consider this a 'remote' access where protection keys will not
be enforced.

Without protection keys, this patch should not change any behavior.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210154.3F0E51EA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 10:04:09 +01:00
Al Viro
5955102c99 wrappers for ->i_mutex access
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 18:04:28 -05:00
Al Viro
62fb4a155f don't carry MAY_OPEN in op->acc_mode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:28:40 -05:00
Eric W. Biederman
90f8572b0f vfs: Commit to never having exectuables on proc and sysfs.
Today proc and sysfs do not contain any executable files.  Several
applications today mount proc or sysfs without noexec and nosuid and
then depend on there being no exectuables files on proc or sysfs.
Having any executable files show on proc or sysfs would cause
a user space visible regression, and most likely security problems.

Therefore commit to never allowing executables on proc and sysfs by
adding a new flag to mark them as filesystems without executables and
enforce that flag.

Test the flag where MNT_NOEXEC is tested today, so that the only user
visible effect will be that exectuables will be treated as if the
execute bit is cleared.

The filesystems proc and sysfs do not currently incoporate any
executable files so this does not result in any user visible effects.

This makes it unnecessary to vet changes to proc and sysfs tightly for
adding exectuable files or changes to chattr that would modify
existing files, as no matter what the individual file say they will
not be treated as exectuable files by the vfs.

Not having to vet changes to closely is important as without this we
are only one proc_create call (or another goof up in the
implementation of notify_change) from having problematic executables
on proc.  Those mistakes are all too easy to make and would create
a situation where there are security issues or the assumptions of
some program having to be broken (and cause userspace regressions).

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-10 10:39:25 -05:00
Helge Deller
d045c77c1a parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures
On architectures where the stack grows upwards (CONFIG_STACK_GROWSUP=y,
currently parisc and metag only) stack randomization sometimes leads to crashes
when the stack ulimit is set to lower values than STACK_RND_MASK (which is 8 MB
by default if not defined in arch-specific headers).

The problem is, that when the stack vm_area_struct is set up in fs/exec.c, the
additional space needed for the stack randomization (as defined by the value of
STACK_RND_MASK) was not taken into account yet and as such, when the stack
randomization code added a random offset to the stack start, the stack
effectively got smaller than what the user defined via rlimit_max(RLIMIT_STACK)
which then sometimes leads to out-of-stack situations and crashes.

This patch fixes it by adding the maximum possible amount of memory (based on
STACK_RND_MASK) which theoretically could be added by the stack randomization
code to the initial stack size. That way, the user-defined stack size is always
guaranteed to be at minimum what is defined via rlimit_max(RLIMIT_STACK).

This bug is currently not visible on the metag architecture, because on metag
STACK_RND_MASK is defined to 0 which effectively disables stack randomization.

The changes to fs/exec.c are inside an "#ifdef CONFIG_STACK_GROWSUP"
section, so it does not affect other platformws beside those where the
stack grows upwards (parisc and metag).

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Cc: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Cc: stable@vger.kernel.org # v3.16+
2015-05-12 22:03:44 +02:00
Jann Horn
8b01fc86b9 fs: take i_mutex during prepare_binprm for set[ug]id executables
This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-19 13:46:21 -07:00
Kirill Tkhai
dfcce791fb fs/exec.c:de_thread: move notify_count write under lock
We set sig->notify_count = -1 between RELEASE and ACQUIRE operations:

	spin_unlock_irq(lock);
	...
	if (!thread_group_leader(tsk)) {
		...
                for (;;) {
			sig->notify_count = -1;
                        write_lock_irq(&tasklist_lock);

There are no restriction on it so other processors may see this STORE
mixed with other STOREs in both areas limited by the spinlocks.

Probably, it may be reordered with the above

	sig->group_exit_task = tsk;
	sig->notify_count = zap_other_threads(tsk);

in some way.

Set it under tasklist_lock locked to be sure nothing will be reordered.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:07 -04:00
Davidlohr Bueso
6e399cd144 prctl: avoid using mmap_sem for exe_file serialization
Oleg cleverly suggested using xchg() to set the new mm->exe_file instead
of calling set_mm_exe_file() which requires some form of serialization --
mmap_sem in this case.  For archs that do not have atomic rmw instructions
we still fallback to a spinlock alternative, so this should always be
safe.  As such, we only need the mmap_sem for looking up the backing
vm_file, which can be done sharing the lock.  Naturally, this means we
need to manually deal with both the new and old file reference counting,
and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can
probably be deleted in the future anyway.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:07 -04:00