bka
634 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e7a8a5f04c |
Backport new vmalloc for "large performance benefits"
This is a backport from Linux 5.2-rc1 of a patch series to greatly enhance vmalloc's performance especially on embedded systems, plus all of its dependencies that were missing in kernel 4.9. For all the informations, refer to LKML: https://lkml.org/lkml/2018/10/19/786 Brief informations: Currently an allocation of the new VA area is done over busy list iteration until a suitable hole is found between two busy areas. Therefore each new allocation causes the list being grown. Due to long list and different permissive parameters an allocation can take a long time on embedded devices(milliseconds). This patch organizes the vmalloc memory layout into free areas of the VMALLOC_START-VMALLOC_END range. It uses a red-black tree that keeps blocks sorted by their offsets in pair with linked list keeping the free space in order of increasing addresses. Quote Phoronix: With this patch from Uladzislau Rezki, calling vmalloc() can take up to 67% less time compared to the behavior on Linux 5.1 and prior, at least with tests done by the developer under QEMU. Personal tests are showing that the device is more responsive when memory pressure is high and when huge allocations are to be done, it's also noticeably faster in this case, like when starting Chrome with more than 100 opened tabs after a system reboot (so, an uncached complete load of it). Shameless kanged from: https://github.com/sonyxperiadev/kernel / Pull Request 2016 |
||
|
|
0b84e6eefd |
Merge 4.9.292 into android-4.9-q
Changes in 4.9.292 staging: ion: Prevent incorrect reference counting behavour USB: serial: option: add Telit LE910S1 0x9200 composition USB: serial: option: add Fibocom FM101-GL variants usb: hub: Fix usb enumeration issue due to address0 race usb: hub: Fix locking issues with address0_mutex binder: fix test regression due to sender_euid change ALSA: ctxfi: Fix out-of-range access staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect() fuse: fix page stealing xen: don't continue xenstore initialization in case of errors xen: detect uninitialized xenbus in xenbus_init tracing: Fix pid filtering when triggers are attached ARM: dts: BCM5301X: Add interrupt properties to GPIO node ASoC: topology: Add missing rwsem around snd_ctl_remove() calls net: ieee802154: handle iftypes as u32 NFSv42: Don't fail clone() unless the OP_CLONE operation failed ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE scsi: mpt3sas: Fix kernel panic during drive powercycle test drm/vc4: fix error code in vc4_create_object() PM: hibernate: use correct mode for swsusp_close() tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows tracing: Check pid filtering when creating events hugetlbfs: flush TLBs correctly after huge_pmd_unshare vhost/vsock: fix incorrect used length reported to the guest proc/vmcore: fix clearing user buffer by properly using clear_user() NFC: add NCI_UNREG flag to eliminate the race fuse: release pipe buf after last use xen: sync include/xen/interface/io/ring.h with Xen's newest version xen/blkfront: read response from backend only once xen/blkfront: don't take local copy of a request from the ring page xen/blkfront: don't trust the backend response data blindly xen/netfront: read response from backend only once xen/netfront: don't read data from request on the ring page xen/netfront: disentangle tx_skb_freelist xen/netfront: don't trust the backend response data blindly tty: hvc: replace BUG_ON() with negative return value shm: extend forced shm destroy to support objects from several IPC nses NFSv42: Fix pagecache invalidation after COPY/CLONE hugetlb: take PMD sharing into account when flushing tlb/caches net: return correct error code platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep s390/setup: avoid using memblock_enforce_memory_limit thermal: core: Reset previous low and high trip during thermal zone init scsi: iscsi: Unblock session then wake up error handler ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port() net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock() vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit kprobes: Limit max data_size of the kretprobe instances sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl fs: add fget_many() and fput_many() fget: check that the fd still exists after getting a ref to it natsemi: xtensa: fix section mismatch warnings net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings() siphash: use _unaligned version by default net/rds: correct socket tunable error in rds_tcp_tune() parisc: Fix "make install" on newer debian releases vgacon: Propagate console boot parameters before calling `vc_resize' tty: serial: msm_serial: Deactivate RX DMA for polling support serial: pl011: Add ACPI SBSA UART match id serial: core: fix transmit-buffer reset and memleak Linux 4.9.292 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I98a677406cfca6a63fffa4a94e45edbd45fd671a |
||
|
|
cc927c1ddb |
shm: extend forced shm destroy to support objects from several IPC nses
commit 85b6d24646e4125c591639841169baa98a2da503 upstream.
Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task
as it was before but from shp's object itself, then call
shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
lifetime, so, if we get shp object from the task->sysvshm.shm_clist
while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes:
|
||
|
|
5791e89d21 |
Merge 4.9.224 into android-4.9-q
Changes in 4.9.224 USB: serial: qcserial: Add DW5816e support dp83640: reverse arguments to list_add_tail fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checks net: macsec: preserve ingress frame ordering net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc() net: usb: qmi_wwan: add support for DW5816e sch_choke: avoid potential panic in choke_reset() sch_sfq: validate silly quantum values bnxt_en: Fix VLAN acceleration handling in bnxt_fix_features(). net/mlx5: Fix forced completion access non initialized command entry net/mlx5: Fix command entry leak in Internal Error State bnxt_en: Improve AER slot reset. Revert "ACPI / video: Add force_native quirk for HP Pavilion dv6" binfmt_elf: move brk out of mmap when doing direct loader exec USB: uas: add quirk for LaCie 2Big Quadra USB: serial: garmin_gps: add sanity checking for data length tracing: Add a vmalloc_sync_mappings() for safe measure mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() batman-adv: fix batadv_nc_random_weight_tq batman-adv: Fix refcnt leak in batadv_show_throughput_override batman-adv: Fix refcnt leak in batadv_store_throughput_override batman-adv: Fix refcnt leak in batadv_v_ogm_process objtool: Fix stack offset tracking for indirect CFAs scripts/decodecode: fix trapping instruction formatting binfmt_elf: Do not move brk for INTERP-less ET_EXEC ext4: add cond_resched() to ext4_protect_reserved_inode net: ipv6: add net argument to ip6_dst_lookup_flow net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup blktrace: Fix potential deadlock between delete & sysfs ops blktrace: fix unlocked access to init/start-stop/teardown blktrace: fix trace mutex deadlock blktrace: Protect q->blk_trace with RCU blktrace: fix dereference after null check ptp: do not explicitly set drvdata in ptp_clock_register() ptp: use is_visible method to hide unused attributes ptp: create "pins" together with the rest of attributes chardev: add helper function to register char devs with a struct device ptp: Fix pass zero to ERR_PTR() in ptp_clock_register ptp: fix the race between the release of ptp_clock and cdev ptp: free ptp device pin descriptors properly shmem: fix possible deadlocks on shmlock_user_lock net/sonic: Fix a resource leak in an error handling path in 'jazz_sonic_probe()' net: moxa: Fix a potential double 'free_irq()' drop_monitor: work around gcc-10 stringop-overflow warning scsi: sg: add sg_remove_request in sg_write spi: spi-dw: Add lock protect dw_spi rx/tx to prevent concurrent calls cifs: Check for timeout on Negotiate stage cifs: Fix a race condition with cifs_echo_request dmaengine: pch_dma.c: Avoid data race between probe and irq handler dmaengine: mmp_tdma: Reset channel error on release ALSA: hda/hdmi: fix race in monitor detection during probe drm/qxl: lost qxl_bo_kunmap_atomic_page in qxl_image_init_helper() ipc/util.c: sysvipc_find_ipc() incorrectly updates position index pinctrl: cherryview: Add missing spinlock usage in chv_gpio_irq_handler i40iw: Fix error handling in i40iw_manage_arp_cache() netfilter: conntrack: avoid gcc-10 zero-length-bounds warning IB/mlx4: Test return value of calls to ib_get_cached_pkey pnp: Use list_for_each_entry() instead of open coding gcc-10 warnings: fix low-hanging fruit kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig Stop the ad-hoc games with -Wno-maybe-initialized net: phy: micrel: Use strlcpy() for ethtool::get_strings gcc-10: avoid shadowing standard library 'free()' in crypto gcc-10: disable 'zero-length-bounds' warning for now gcc-10: disable 'array-bounds' warning for now gcc-10: disable 'stringop-overflow' warning for now gcc-10: disable 'restrict' warning for now net: fix a potential recursive NETDEV_FEAT_CHANGE netlabel: cope with NULL catmap Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu" net: ipv4: really enforce backoff for redirects netprio_cgroup: Fix unlimited memory leak of v2 cgroups ALSA: hda/realtek - Limit int mic boost for Thinkpad T530 ALSA: rawmidi: Initialize allocated buffers ALSA: rawmidi: Fix racy buffer resize under concurrent accesses ALSA: usb-audio: Add control message quirk delay for Kingston HyperX headset USB: gadget: fix illegal array access in binding with UDC usb: xhci: Fix NULL pointer dereference when enqueuing trbs from urb sg list ARM: dts: imx27-phytec-phycard-s-rdk: Fix the I2C1 pinctrl entries x86: Fix early boot crash on gcc-10, third try exec: Move would_dump into flush_old_exec usb: gadget: net2272: Fix a memory leak in an error handling path in 'net2272_plat_probe()' usb: gadget: audio: Fix a missing error return value in audio_bind() usb: gadget: legacy: fix error return code in gncm_bind() usb: gadget: legacy: fix error return code in cdc_bind() Revert "ALSA: hda/realtek: Fix pop noise on ALC225" ARM: dts: r8a73a4: Add missing CMT1 interrupts ARM: dts: r8a7740: Add missing extal2 to CPG node KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce Makefile: disallow data races on gcc-10 as well Linux 4.9.224 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ib9fdff8dd9e9c92a2e750251b98eb0e4c9496fba |
||
|
|
27634d8333 |
ipc/util.c: sysvipc_find_ipc() incorrectly updates position index
[ Upstream commit 5e698222c70257d13ae0816720dde57c56f81e15 ]
Commit 89163f93c6f9 ("ipc/util.c: sysvipc_find_ipc() should increase
position index") is causing this bug (seen on 5.6.8):
# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
# ipcmk -Q
Message queue id: 0
# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0x82db8127 0 root 644 0 0
# ipcmk -Q
Message queue id: 1
# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0x82db8127 0 root 644 0 0
0x76d1fb2a 1 root 644 0 0
# ipcrm -q 0
# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0x76d1fb2a 1 root 644 0 0
0x76d1fb2a 1 root 644 0 0
# ipcmk -Q
Message queue id: 2
# ipcrm -q 2
# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0x76d1fb2a 1 root 644 0 0
0x76d1fb2a 1 root 644 0 0
# ipcmk -Q
Message queue id: 3
# ipcrm -q 1
# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0x7c982867 3 root 644 0 0
0x7c982867 3 root 644 0 0
0x7c982867 3 root 644 0 0
0x7c982867 3 root 644 0 0
Whenever an IPC item with a low id is deleted, the items with higher ids
are duplicated, as if filling a hole.
new_pos should jump through hole of unused ids, pos can be updated
inside "for" cycle.
Fixes: 89163f93c6f9 ("ipc/util.c: sysvipc_find_ipc() should increase position index")
Reported-by: Andreas Schwab <schwab@suse.de>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/4921fe9b-9385-a2b4-1dc4-1099be6d2e39@virtuozzo.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
cf77b7c4f2 |
Merge 4.9.221 into android-4.9-q
Changes in 4.9.221
ext4: fix extent_status fragmentation for plain files
net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()
net: ipv4: avoid unused variable warning for sysctl
drm/msm: Use the correct dma_sync calls harder
crypto: mxs-dcp - make symbols 'sha1_null_hash' and 'sha256_null_hash' static
vti4: removed duplicate log message.
watchdog: reset last_hw_keepalive time at start
scsi: lpfc: Fix kasan slab-out-of-bounds error in lpfc_unreg_login
ceph: return ceph_mdsc_do_request() errors from __get_parent()
ceph: don't skip updating wanted caps when cap is stale
pwm: rcar: Fix late Runtime PM enablement
scsi: iscsi: Report unbind session event when the target has been removed
ASoC: Intel: atom: Take the drv->lock mutex before calling sst_send_slot_map()
kernel/gcov/fs.c: gcov_seq_next() should increase position index
ipc/util.c: sysvipc_find_ipc() should increase position index
s390/cio: avoid duplicated 'ADD' uevents
pwm: renesas-tpu: Fix late Runtime PM enablement
pwm: bcm2835: Dynamically allocate base
PCI/ASPM: Allow re-enabling Clock PM
ipv6: fix restrict IPV6_ADDRFORM operation
macsec: avoid to set wrong mtu
macvlan: fix null dereference in macvlan_device_event()
net: netrom: Fix potential nr_neigh refcnt leak in nr_add_node
net/x25: Fix x25_neigh refcnt leak when receiving frame
tcp: cache line align MAX_TCP_HEADER
team: fix hang in team_mode_get()
net: dsa: b53: Fix ARL register definitions
xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish
ALSA: hda: Remove ASUS ROG Zenith from the blacklist
iio: xilinx-xadc: Fix ADC-B powerdown
iio: xilinx-xadc: Fix clearing interrupt when enabling trigger
iio: xilinx-xadc: Fix sequencer configuration for aux channels in simultaneous mode
fs/namespace.c: fix mountpoint reference counter race
USB: sisusbvga: Change port variable from signed to unsigned
USB: Add USB_QUIRK_DELAY_CTRL_MSG and USB_QUIRK_DELAY_INIT for Corsair K70 RGB RAPIDFIRE
USB: core: Fix free-while-in-use bug in the USB S-Glibrary
USB: hub: Fix handling of connect changes during sleep
overflow.h: Add arithmetic shift helper
vmalloc: fix remap_vmalloc_range() bounds checks
ALSA: usx2y: Fix potential NULL dereference
ALSA: usb-audio: Fix usb audio refcnt leak when getting spdif
ALSA: usb-audio: Filter out unsupported sample rates on Focusrite devices
tpm/tpm_tis: Free IRQ if probing fails
KVM: Check validity of resolved slot when searching memslots
KVM: VMX: Enable machine check support for 32bit targets
tty: hvc: fix buffer overflow during hvc_alloc().
tty: rocket, avoid OOB access
usb-storage: Add unusual_devs entry for JMicron JMS566
audit: check the length of userspace generated audit records
ASoC: dapm: fixup dapm kcontrol widget
ARM: imx: provide v7_cpu_resume() only on ARM_CPU_SUSPEND=y
staging: comedi: dt2815: fix writing hi byte of analog output
staging: comedi: Fix comedi_device refcnt leak in comedi_open
staging: vt6656: Fix drivers TBTT timing counter.
staging: vt6656: Power save stop wake_up_count wrap around.
UAS: no use logging any details in case of ENODEV
UAS: fix deadlock in error handling and PM flushing work
usb: f_fs: Clear OS Extended descriptor counts to zero in ffs_data_reset()
remoteproc: Fix wrong rvring index computation
fuse: fix possibly missed wake-up after abort
mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer
usb: gadget: udc: bdc: Remove unnecessary NULL checks in bdc_req_complete
nfsd: memory corruption in nfsd4_lock()
net/cxgb4: Check the return from t4_query_params properly
perf/core: fix parent pid/tid in task exit events
bpf, x86: Fix encoding for lower 8-bit registers in BPF_STX BPF_B
xfs: fix partially uninitialized structure in xfs_reflink_remap_extent
scsi: target: fix PR IN / READ FULL STATUS for FC
objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings
objtool: Support Clang non-section symbols in ORC dump
xen/xenbus: ensure xenbus_map_ring_valloc() returns proper grant status
ext4: convert BUG_ON's to WARN_ON's in mballoc.c
hwmon: (jc42) Fix name to have no illegal characters
ext4: avoid declaring fs inconsistent due to invalid file handles
ext4: protect journal inode's blocks using block_validity
ext4: don't perform block validity checks on the journal inode
ext4: fix block validity checks for journal inodes using indirect blocks
ext4: unsigned int compared against zero
ext4: check for non-zero journal inum in ext4_calculate_overhead
propagate_one(): mnt_set_mountpoint() needs mount_lock
Linux 4.9.221
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ica420bdbb6c18fff9d0e6abffc180beefff0c5b9
|
||
|
|
2edb90c2d9 |
ipc/util.c: sysvipc_find_ipc() should increase position index
[ Upstream commit 89163f93c6f969da5811af5377cc10173583123b ] If seq_file .next function does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: NeilBrown <neilb@suse.com> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/b7a20945-e315-8bb0-21e6-3875c14a8494@virtuozzo.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
60fbe7034f |
Merge 4.9.215 into android-4.9-q
Changes in 4.9.215
x86/vdso: Use RDPID in preference to LSL when available
KVM: x86: emulate RDPID
ALSA: hda: Use scnprintf() for printing texts for sysfs/procfs
ecryptfs: fix a memory leak bug in parse_tag_1_packet()
ecryptfs: fix a memory leak bug in ecryptfs_init_messaging()
ALSA: usb-audio: Apply sample rate quirk for Audioengine D1
ext4: don't assume that mmp_nodename/bdevname have NUL
ext4: fix checksum errors with indexed dirs
ext4: improve explanation of a mount failure caused by a misconfigured kernel
Btrfs: fix race between using extent maps and merging them
btrfs: log message when rw remount is attempted with unclean tree-log
perf/x86/amd: Add missing L2 misses event spec to AMD Family 17h's event map
padata: Remove broken queue flushing
s390/time: Fix clk type in get_tod_clock
perf/x86/intel: Fix inaccurate period in context switch for auto-reload
hwmon: (pmbus/ltc2978) Fix PMBus polling of MFR_COMMON definitions.
jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()
jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
btrfs: print message when tree-log replay starts
scsi: qla2xxx: fix a potential NULL pointer dereference
Revert "KVM: VMX: Add non-canonical check on writes to RTIT address MSRs"
drm/gma500: Fixup fbdev stolen size usage evaluation
cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
brcmfmac: Fix use after free in brcmf_sdio_readframes()
gianfar: Fix TX timestamping with a stacked DSA driver
pinctrl: sh-pfc: sh7264: Fix CAN function GPIOs
pxa168fb: Fix the function used to release some memory in an error handling path
media: i2c: mt9v032: fix enum mbus codes and frame sizes
powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number
gpio: gpio-grgpio: fix possible sleep-in-atomic-context bugs in grgpio_irq_map/unmap()
media: sti: bdisp: fix a possible sleep-in-atomic-context bug in bdisp_device_run()
pinctrl: baytrail: Do not clear IRQ flags on direct-irq enabled pins
efi/x86: Map the entire EFI vendor string before copying it
MIPS: Loongson: Fix potential NULL dereference in loongson3_platform_init()
sparc: Add .exit.data section.
uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()
usb: gadget: udc: fix possible sleep-in-atomic-context bugs in gr_probe()
jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal
x86/sysfb: Fix check for bad VRAM size
tracing: Fix tracing_stat return values in error handling paths
tracing: Fix very unlikely race of registering two stat tracers
ext4, jbd2: ensure panic when aborting with zero errno
kconfig: fix broken dependency in randconfig-generated .config
clk: qcom: rcg2: Don't crash if our parent can't be found; return an error
drm/amdgpu: remove 4 set but not used variable in amdgpu_atombios_get_connector_info_from_object_table
regulator: rk808: Lower log level on optional GPIOs being not available
net/wan/fsl_ucc_hdlc: reject muram offsets above 64K
PCI/IOV: Fix memory leak in pci_iov_add_virtfn()
NFC: port100: Convert cpu_to_le16(le16_to_cpu(E1) + E2) to use le16_add_cpu().
media: v4l2-device.h: Explicitly compare grp{id,mask} to zero in v4l2_device macros
reiserfs: Fix spurious unlock in reiserfs_fill_super() error handling
ALSA: usx2y: Adjust indentation in snd_usX2Y_hwdep_dsp_status
b43legacy: Fix -Wcast-function-type
ipw2x00: Fix -Wcast-function-type
iwlegacy: Fix -Wcast-function-type
rtlwifi: rtl_pci: Fix -Wcast-function-type
orinoco: avoid assertion in case of NULL pointer
ACPICA: Disassembler: create buffer fields in ACPI_PARSE_LOAD_PASS1
scsi: aic7xxx: Adjust indentation in ahc_find_syncrate
drm/mediatek: handle events when enabling/disabling crtc
ARM: dts: r8a7779: Add device node for ARM global timer
x86/vdso: Provide missing include file
PM / devfreq: rk3399_dmc: Add COMPILE_TEST and HAVE_ARM_SMCCC dependency
pinctrl: sh-pfc: sh7269: Fix CAN function GPIOs
RDMA/rxe: Fix error type of mmap_offset
ALSA: sh: Fix compile warning wrt const
tools lib api fs: Fix gcc9 stringop-truncation compilation error
usbip: Fix unsafe unaligned pointer usage
udf: Fix free space reporting for metadata and virtual partitions
soc/tegra: fuse: Correct straps' address for older Tegra124 device trees
rcu: Use WRITE_ONCE() for assignments to ->pprev for hlist_nulls
Input: edt-ft5x06 - work around first register access error
wan: ixp4xx_hss: fix compile-testing on 64-bit
ASoC: atmel: fix build error with CONFIG_SND_ATMEL_SOC_DMA=m
tty: synclinkmp: Adjust indentation in several functions
tty: synclink_gt: Adjust indentation in several functions
driver core: platform: Prevent resouce overflow from causing infinite loops
driver core: Print device when resources present in really_probe()
vme: bridges: reduce stack usage
drm/nouveau/gr/gk20a,gm200-: add terminators to method lists read from fw
drm/nouveau: Fix copy-paste error in nouveau_fence_wait_uevent_handler
drm/vmwgfx: prevent memory leak in vmw_cmdbuf_res_add
usb: musb: omap2430: Get rid of musb .set_vbus for omap2430 glue
iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
scsi: iscsi: Don't destroy session if there are outstanding connections
arm64: fix alternatives with LLVM's integrated assembler
pwm: omap-dmtimer: Remove PWM chip in .remove before making it unfunctional
cmd64x: potential buffer overflow in cmd64x_program_timings()
ide: serverworks: potential overflow in svwks_set_pio_mode()
remoteproc: Initialize rproc_class before use
x86/decoder: Add TEST opcode to Group3-2
s390/ftrace: generate traced function stack frame
driver core: platform: fix u32 greater or equal to zero comparison
ALSA: hda - Add docking station support for Lenovo Thinkpad T420s
powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV
jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record
ARM: 8951/1: Fix Kexec compilation issue.
hostap: Adjust indentation in prism2_hostapd_add_sta
iwlegacy: ensure loop counter addr does not wrap and cause an infinite loop
cifs: fix NULL dereference in match_prepath
irqchip/gic-v3: Only provision redistributors that are enabled in ACPI
drm/nouveau/disp/nv50-: prevent oops when no channel method map provided
ftrace: fpid_next() should increase position index
trigger_next should increase position index
radeon: insert 10ms sleep in dce5_crtc_load_lut
ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()
lib/scatterlist.c: adjust indentation in __sg_alloc_table
reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
bcache: explicity type cast in bset_bkey_last()
irqchip/gic-v3-its: Reference to its_invall_cmd descriptor when building INVALL
iwlwifi: mvm: Fix thermal zone registration
microblaze: Prevent the overflow of the start
brd: check and limit max_part par
help_next should increase position index
selinux: ensure we cleanup the internal AVC counters on error in avc_update()
enic: prevent waking up stopped tx queues over watchdog reset
net/sched: matchall: add missing validation of TCA_MATCHALL_FLAGS
net/sched: flower: add missing validation of TCA_FLOWER_FLAGS
floppy: check FDC index for errors before assigning it
vt: selection, handle pending signals in paste_selection
staging: android: ashmem: Disallow ashmem memory from being remapped
staging: vt6656: fix sign of rx_dbm to bb_pre_ed_rssi.
xhci: Force Maximum Packet size for Full-speed bulk devices to valid range.
usb: uas: fix a plug & unplug racing
USB: Fix novation SourceControl XL after suspend
USB: hub: Don't record a connect-change event during reset-resume
staging: rtl8188eu: Fix potential security hole
staging: rtl8188eu: Fix potential overuse of kernel memory
x86/mce/amd: Publish the bank pointer only after setup has succeeded
x86/mce/amd: Fix kobject lifetime
tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode
tty: serial: imx: setup the correct sg entry for tx dma
Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"
xhci: apply XHCI_PME_STUCK_QUIRK to Intel Comet Lake platforms
KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI
VT_RESIZEX: get rid of field-by-field copyin
vt: vt_ioctl: fix race in VT_RESIZEX
lib/stackdepot.c: fix global out-of-bounds in stack_slabs
KVM: nVMX: Don't emulate instructions in guest mode
netfilter: xt_bpf: add overflow checks
ext4: fix a data race in EXT4_I(inode)->i_disksize
ext4: add cond_resched() to __ext4_find_entry()
ext4: fix mount failure with quota configured as module
ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
KVM: nVMX: Refactor IO bitmap checks into helper function
KVM: nVMX: Check IO instruction VM-exit conditions
KVM: apic: avoid calculating pending eoi from an uninitialized val
Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents
scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout"
scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session"
usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus
staging: greybus: use after free in gb_audio_manager_remove_all()
ecryptfs: replace BUG_ON with error handling code
ALSA: rawmidi: Avoid bit fields for state flags
ALSA: seq: Avoid concurrent access to queue flags
ALSA: seq: Fix concurrent access to queue current tick/time
netfilter: xt_hashlimit: limit the max size of hashtable
ata: ahci: Add shutdown to freeze hardware resources of ahci
xen: Enable interrupts when calling _cond_resched()
s390/mm: Explicitly compare PAGE_DEFAULT_KEY against zero in storage_key_init_range
Linux 4.9.215
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4c663321dde48cd2a324e59acb70c99f75f9344e
|
||
|
|
cc79150ad3 |
Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"
commit edf28f4061afe4c2d9eb1c3323d90e882c1d6800 upstream. This reverts commit |
||
|
|
4ebd29edaf |
Merge 4.9.188 into android-4.9-q
Changes in 4.9.188 ARM: riscpc: fix DMA ARM: dts: rockchip: Make rk3288-veyron-minnie run at hs200 ARM: dts: rockchip: Make rk3288-veyron-mickey's emmc work again ARM: dts: rockchip: Mark that the rk3288 timer might stop in suspend ftrace: Enable trampoline when rec count returns back to one kernel/module.c: Only return -EEXIST for modules that have finished loading MIPS: lantiq: Fix bitfield masking dmaengine: rcar-dmac: Reject zero-length slave DMA requests fs/adfs: super: fix use-after-free bug btrfs: fix minimum number of chunk errors for DUP ceph: fix improper use of smp_mb__before_atomic() ceph: return -ERANGE if virtual xattr value didn't fit in buffer scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized ACPI: fix false-positive -Wuninitialized warning be2net: Signal that the device cannot transmit during reconfiguration x86/apic: Silence -Wtype-limits compiler warnings x86: math-emu: Hide clang warnings for 16-bit overflow mm/cma.c: fail if fixed declaration can't be honored coda: add error handling for fget coda: fix build using bare-metal toolchain uapi linux/coda_psdev.h: move upc_req definition from uapi to kernel side headers drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings ipc/mqueue.c: only perform resource calculation if user valid x86/kvm: Don't call kvm_spurious_fault() from .fixup x86, boot: Remove multiple copy of static function sanitize_boot_params() kbuild: initialize CLANG_FLAGS correctly in the top Makefile Btrfs: fix incremental send failure after deduplication mmc: dw_mmc: Fix occasional hang after tuning on eMMC gpiolib: fix incorrect IRQ requesting of an active-low lineevent selinux: fix memory leak in policydb_init() s390/dasd: fix endless loop after read unit address configuration drivers/perf: arm_pmu: Fix failure path in PM notifier xen/swiotlb: fix condition for calling xen_destroy_contiguous_region() IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping infiniband: fix race condition between infiniband mlx4, mlx5 driver and core dumping coredump: fix race condition between collapse_huge_page() and core dumping eeprom: at24: make spd world-readable again Backport minimal compiler_attributes.h to support GCC 9 include/linux/module.h: copy __init/__exit attrs to init/cleanup_module objtool: Support GCC 9 cold subfunction naming scheme x86, mm, gup: prevent get_page() race with munmap in paravirt guest Linux 4.9.188 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
62369d5c01 |
ipc/mqueue.c: only perform resource calculation if user valid
[ Upstream commit a318f12ed8843cfac53198390c74a565c632f417 ]
Andreas Christoforou reported:
UBSAN: Undefined behaviour in ipc/mqueue.c:414:49 signed integer overflow:
9 * 2305843009213693951 cannot be represented in type 'long int'
...
Call Trace:
mqueue_evict_inode+0x8e7/0xa10 ipc/mqueue.c:414
evict+0x472/0x8c0 fs/inode.c:558
iput_final fs/inode.c:1547 [inline]
iput+0x51d/0x8c0 fs/inode.c:1573
mqueue_get_inode+0x8eb/0x1070 ipc/mqueue.c:320
mqueue_create_attr+0x198/0x440 ipc/mqueue.c:459
vfs_mkobj+0x39e/0x580 fs/namei.c:2892
prepare_open ipc/mqueue.c:731 [inline]
do_mq_open+0x6da/0x8e0 ipc/mqueue.c:771
Which could be triggered by:
struct mq_attr attr = {
.mq_flags = 0,
.mq_maxmsg = 9,
.mq_msgsize = 0x1fffffffffffffff,
.mq_curmsgs = 0,
};
if (mq_open("/testing", 0x40, 3, &attr) == (mqd_t) -1)
perror("mq_open");
mqueue_get_inode() was correctly rejecting the giant mq_msgsize, and
preparing to return -EINVAL. During the cleanup, it calls
mqueue_evict_inode() which performed resource usage tracking math for
updating "user", before checking if there was a valid "user" at all
(which would indicate that the calculations would be sane). Instead,
delay this check to after seeing a valid "user".
The overflow was real, but the results went unused, so while the flaw is
harmless, it's noisy for kernel fuzzers, so just fix it by moving the
calculation under the non-NULL "user" where it actually gets used.
Link: http://lkml.kernel.org/r/201906072207.ECB65450@keescook
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Andreas Christoforou <andreaschristofo@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
a0b21f86b2 |
Merge 4.9.183 into android-4.9-q
Changes in 4.9.183 rapidio: fix a NULL pointer dereference when create_workqueue() fails fs/fat/file.c: issue flush after the writeback of FAT sysctl: return -EINVAL if val violates minmax ipc: prevent lockup on alloc_msg and free_msg ARM: prevent tracing IPI_CPU_BACKTRACE hugetlbfs: on restore reserve error path retain subpool reservation mem-hotplug: fix node spanned pages when we have a node with only ZONE_MOVABLE mm/cma.c: fix crash on CMA allocation if bitmap allocation fails mm/cma_debug.c: fix the break condition in cma_maxchunk_get() mm/slab.c: fix an infinite loop in leaks_show() kernel/sys.c: prctl: fix false positive in validate_prctl_map() drivers: thermal: tsens: Don't print error message on -EPROBE_DEFER mfd: tps65912-spi: Add missing of table registration mfd: intel-lpss: Set the device in reset state when init mfd: twl6040: Fix device init errors for ACCCTL register perf/x86/intel: Allow PEBS multi-entry in watermark mode drm/bridge: adv7511: Fix low refresh rate selection objtool: Don't use ignore flag for fake jumps pwm: meson: Use the spin-lock only to protect register modifications ntp: Allow TAI-UTC offset to be set to zero f2fs: fix to avoid panic in do_recover_data() f2fs: fix to clear dirty inode in error path of f2fs_iget() f2fs: fix to do sanity check on valid block count of segment configfs: fix possible use-after-free in configfs_register_group uml: fix a boot splat wrt use of cpu_all_mask watchdog: imx2_wdt: Fix set_timeout for big timeout values watchdog: fix compile time error of pretimeout governors iommu/vt-d: Set intel_iommu_gfx_mapped correctly ALSA: hda - Register irq handler after the chip initialization nvmem: core: fix read buffer in place fuse: retrieve: cap requested size to negotiated max_write nfsd: allow fh_want_write to be called twice x86/PCI: Fix PCI IRQ routing table memory leak platform/chrome: cros_ec_proto: check for NULL transfer function soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher clk: rockchip: Turn on "aclk_dmac1" for suspend on rk3288 ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ahb" clock to SDMA ARM: dts: imx7d: Specify IMX7D_CLK_IPG as "ipg" clock to SDMA ARM: dts: imx6ul: Specify IMX6UL_CLK_IPG as "ipg" clock to SDMA ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ipg" clock to SDMA ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG as "ipg" clock to SDMA PCI: rpadlpar: Fix leaked device_node references in add/remove paths platform/x86: intel_pmc_ipc: adding error handling PCI: rcar: Fix a potential NULL pointer dereference PCI: rcar: Fix 64bit MSI message address handling video: hgafb: fix potential NULL pointer dereference video: imsttfb: fix potential NULL pointer dereferences PCI: xilinx: Check for __get_free_pages() failure gpio: gpio-omap: add check for off wake capable gpios dmaengine: idma64: Use actual device for DMA transfers pwm: tiehrpwm: Update shadow register for disabling PWMs ARM: dts: exynos: Always enable necessary APIO_1V8 and ABB_1V8 regulators on Arndale Octa pwm: Fix deadlock warning when removing PWM device ARM: exynos: Fix undefined instruction during Exynos5422 resume Revert "Bluetooth: Align minimum encryption key size for LE and BR/EDR connections" ALSA: seq: Cover unsubscribe_port() in list_mutex ALSA: oxfw: allow PCM capture for Stanton SCS.1m libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node fs/ocfs2: fix race in ocfs2_dentry_attach_lock() signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO ptrace: restore smp_rmb() in __ptrace_may_access() media: v4l2-ioctl: clear fields in s_parm i2c: acorn: fix i2c warning bcache: fix stack corruption by PRECEDING_KEY() cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css() ASoC: cs42xx8: Add regcache mask dirty ASoC: fsl_asrc: Fix the issue about unsupported rate x86/uaccess, kcov: Disable stack protector ALSA: seq: Protect in-kernel ioctl calls with mutex ALSA: seq: Fix race of get-subscription call vs port-delete ioctls Revert "ALSA: seq: Protect in-kernel ioctl calls with mutex" Drivers: misc: fix out-of-bounds access in function param_set_kgdbts_var scsi: lpfc: add check for loss of ndlp when sending RRQ arm64/mm: Inhibit huge-vmap with ptdump scsi: bnx2fc: fix incorrect cast to u64 on shift operation selftests/timers: Add missing fflush(stdout) calls usbnet: ipheth: fix racing condition KVM: x86/pmu: do not mask the value that is written to fixed PMUs KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION drm/vmwgfx: integer underflow in vmw_cmd_dx_set_shader() leading to an invalid read drm/vmwgfx: NULL pointer dereference from vmw_cmd_dx_view_define() usb: dwc2: Fix DMA cache alignment issues USB: Fix chipmunk-like voice when using Logitech C270 for recording audio. USB: usb-storage: Add new ID to ums-realtek USB: serial: pl2303: add Allied Telesis VT-Kit3 USB: serial: option: add support for Simcom SIM7500/SIM7600 RNDIS mode USB: serial: option: add Telit 0x1260 and 0x1261 compositions rtc: pcf8523: don't return invalid date when battery is low ax25: fix inconsistent lock state in ax25_destroy_timer be2net: Fix number of Rx queues used for flow hashing ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero lapb: fixed leak of control-blocks. neigh: fix use-after-free read in pneigh_get_next sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg Revert "staging: vc04_services: prevent integer overflow in create_pagelist()" perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints selftests: netfilter: missing error check when setting up veth interface mISDN: make sure device name is NUL terminated x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor perf/ring_buffer: Fix exposing a temporarily decreased data_head perf/ring_buffer: Add ordering to rb->nest increment gpio: fix gpio-adp5588 build errors net: tulip: de4x5: Drop redundant MODULE_DEVICE_TABLE() i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr configfs: Fix use-after-free when accessing sd->s_dentry perf data: Fix 'strncat may truncate' build failure with recent gcc perf record: Fix s390 missing module symbol and warning for non-root users ia64: fix build errors by exporting paddr_to_nid() KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu net: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route() scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask scsi: libsas: delete sas port if expander discover failed mlxsw: spectrum: Prevent force of 56G Abort file_remove_privs() for non-reg. files Linux 4.9.183 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
7b5598c8ad |
ipc: prevent lockup on alloc_msg and free_msg
[ Upstream commit d6a2946a88f524a47cc9b79279667137899db807 ] msgctl10 of ltp triggers the following lockup When CONFIG_KASAN is enabled on large memory SMP systems, the pages initialization can take a long time, if msgctl10 requests a huge block memory, and it will block rcu scheduler, so release cpu actively. After adding schedule() in free_msg, free_msg can not be called when holding spinlock, so adding msg to a tmp list, and free it out of spinlock rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: Tasks blocked on level-1 rcu_node (CPUs 16-31): P32505 rcu: Tasks blocked on level-1 rcu_node (CPUs 48-63): P34978 rcu: (detected by 11, t=35024 jiffies, g=44237529, q=16542267) msgctl10 R running task 21608 32505 2794 0x00000082 Call Trace: preempt_schedule_irq+0x4c/0xb0 retint_kernel+0x1b/0x2d RIP: 0010:__is_insn_slot_addr+0xfb/0x250 Code: 82 1d 00 48 8b 9b 90 00 00 00 4c 89 f7 49 c1 ee 03 e8 59 83 1d 00 48 b8 00 00 00 00 00 fc ff df 4c 39 eb 48 89 9d 58 ff ff ff <41> c6 04 06 f8 74 66 4c 8d 75 98 4c 89 f1 48 c1 e9 03 48 01 c8 48 RSP: 0018:ffff88bce041f758 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: dffffc0000000000 RBX: ffffffff8471bc50 RCX: ffffffff828a2a57 RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88bce041f780 RBP: ffff88bce041f828 R08: ffffed15f3f4c5b3 R09: ffffed15f3f4c5b3 R10: 0000000000000001 R11: ffffed15f3f4c5b2 R12: 000000318aee9b73 R13: ffffffff8471bc50 R14: 1ffff1179c083ef0 R15: 1ffff1179c083eec kernel_text_address+0xc1/0x100 __kernel_text_address+0xe/0x30 unwind_get_return_address+0x2f/0x50 __save_stack_trace+0x92/0x100 create_object+0x380/0x650 __kmalloc+0x14c/0x2b0 load_msg+0x38/0x1a0 do_msgsnd+0x19e/0xcf0 do_syscall_64+0x117/0x400 entry_SYSCALL_64_after_hwframe+0x49/0xbe rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: Tasks blocked on level-1 rcu_node (CPUs 0-15): P32170 rcu: (detected by 14, t=35016 jiffies, g=44237525, q=12423063) msgctl10 R running task 21608 32170 32155 0x00000082 Call Trace: preempt_schedule_irq+0x4c/0xb0 retint_kernel+0x1b/0x2d RIP: 0010:lock_acquire+0x4d/0x340 Code: 48 81 ec c0 00 00 00 45 89 c6 4d 89 cf 48 8d 6c 24 20 48 89 3c 24 48 8d bb e4 0c 00 00 89 74 24 0c 48 c7 44 24 20 b3 8a b5 41 <48> c1 ed 03 48 c7 44 24 28 b4 25 18 84 48 c7 44 24 30 d0 54 7a 82 RSP: 0018:ffff88af83417738 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: dffffc0000000000 RBX: ffff88bd335f3080 RCX: 0000000000000002 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88bd335f3d64 RBP: ffff88af83417758 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: ffffed13f3f745b2 R12: 0000000000000000 R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000 is_bpf_text_address+0x32/0xe0 kernel_text_address+0xec/0x100 __kernel_text_address+0xe/0x30 unwind_get_return_address+0x2f/0x50 __save_stack_trace+0x92/0x100 save_stack+0x32/0xb0 __kasan_slab_free+0x130/0x180 kfree+0xfa/0x2d0 free_msg+0x24/0x50 do_msgrcv+0x508/0xe60 do_syscall_64+0x117/0x400 entry_SYSCALL_64_after_hwframe+0x49/0xbe Davidlohr said: "So after releasing the lock, the msg rbtree/list is empty and new calls will not see those in the newly populated tmp_msg list, and therefore they cannot access the delayed msg freeing pointers, which is good. Also the fact that the node_cache is now freed before the actual messages seems to be harmless as this is wanted for msg_insert() avoiding GFP_ATOMIC allocations, and after releasing the info->lock the thing is freed anyway so it should not change things" Link: http://lkml.kernel.org/r/1552029161-4957-1-git-send-email-lirongqing@baidu.com Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
9797dcb8c7 |
Merge 4.9.104 into android-4.9
Changes in 4.9.104
MIPS: c-r4k: Fix data corruption related to cache coherence
MIPS: ptrace: Expose FIR register through FP regset
MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
affs_lookup(): close a race with affs_remove_link()
aio: fix io_destroy(2) vs. lookup_ioctx() race
ALSA: timer: Fix pause event notification
do d_instantiate/unlock_new_inode combinations safely
mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
libata: Blacklist some Sandisk SSDs for NCQ
libata: blacklist Micron 500IT SSD with MU01 firmware
xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
IB/hfi1: Use after free race condition in send context error path
Revert "ipc/shm: Fix shmat mmap nil-page protection"
ipc/shm: fix shmat() nil address after round-down when remapping
kasan: fix memory hotplug during boot
kernel/sys.c: fix potential Spectre v1 issue
kernel/signal.c: avoid undefined behaviour in kill_something_info
KVM/VMX: Expose SSBD properly to guests
KVM: s390: vsie: fix < 8k check for the itdba
KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
kvm: x86: IA32_ARCH_CAPABILITIES is always supported
firewire-ohci: work around oversized DMA reads on JMicron controllers
x86/tsc: Allow TSC calibration without PIT
NFSv4: always set NFS_LOCK_LOST when a lock is lost.
ALSA: hda - Use IS_REACHABLE() for dependency on input
kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460
tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account
PCI: Add function 1 DMA alias quirk for Marvell 9128
Input: psmouse - fix Synaptics detection when protocol is disabled
i40iw: Zero-out consumer key on allocate stag for FMR
tools lib traceevent: Simplify pointer print logic and fix %pF
perf callchain: Fix attr.sample_max_stack setting
tools lib traceevent: Fix get_field_str() for dynamic strings
perf record: Fix failed memory allocation for get_cpuid_str
iommu/vt-d: Use domain instead of cache fetching
dm thin: fix documentation relative to low water mark threshold
net: stmmac: dwmac-meson8b: fix setting the RGMII TX clock on Meson8b
net: stmmac: dwmac-meson8b: propagate rate changes to the parent clock
nfs: Do not convert nfs_idmap_cache_timeout to jiffies
watchdog: sp5100_tco: Fix watchdog disable bit
kconfig: Don't leak main menus during parsing
kconfig: Fix automatic menu creation mem leak
kconfig: Fix expr_free() E_NOT leak
mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()
ipmi/powernv: Fix error return code in ipmi_powernv_probe()
Btrfs: set plug for fsync
btrfs: Fix out of bounds access in btrfs_search_slot
Btrfs: fix scrub to repair raid6 corruption
btrfs: fail mount when sb flag is not in BTRFS_SUPER_FLAG_SUPP
HID: roccat: prevent an out of bounds read in kovaplus_profile_activated()
fm10k: fix "failed to kill vid" message for VF
device property: Define type of PROPERTY_ENRTY_*() macros
jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
powerpc/numa: Ensure nodes initialized for hotplug
RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
ntb_transport: Fix bug with max_mw_size parameter
gianfar: prevent integer wrapping in the rx handler
tcp_nv: fix potential integer overflow in tcpnv_acked
kvm: Map PFN-type memory regions as writable (if possible)
ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid
ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute
ocfs2: return error when we attempt to access a dirty bh in jbd2
mm/mempolicy: fix the check of nodemask from user
mm/mempolicy: add nodes_empty check in SYSC_migrate_pages
asm-generic: provide generic_pmdp_establish()
sparc64: update pmdp_invalidate() to return old pmd value
mm: thp: use down_read_trylock() in khugepaged to avoid long block
mm: pin address_space before dereferencing it while isolating an LRU page
mm/fadvise: discard partial page if endbyte is also EOF
openvswitch: Remove padding from packet before L3+ conntrack processing
IB/ipoib: Fix for potential no-carrier state
drm/nouveau/pmu/fuc: don't use movw directly anymore
netfilter: ipv6: nf_defrag: Kill frag queue on RFC2460 failure
x86/power: Fix swsusp_arch_resume prototype
firmware: dmi_scan: Fix handling of empty DMI strings
ACPI: processor_perflib: Do not send _PPC change notification if not ready
ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs
bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y
MIPS: generic: Fix machine compatible matching
MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS
xen-netfront: Fix race between device setup and open
xen/grant-table: Use put_page instead of free_page
RDS: IB: Fix null pointer issue
arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics
proc: fix /proc/*/map_files lookup
cifs: silence compiler warnings showing up with gcc-8.0.0
bcache: properly set task state in bch_writeback_thread()
bcache: fix for allocator and register thread race
bcache: fix for data collapse after re-attaching an attached device
bcache: return attach error when no cache set exist
tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
bpf: fix rlimit in reuseport net selftest
vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
locking/qspinlock: Ensure node->count is updated before initialising node
irqchip/gic-v3: Ignore disabled ITS nodes
cpumask: Make for_each_cpu_wrap() available on UP as well
irqchip/gic-v3: Change pr_debug message to pr_devel
ARC: Fix malformed ARC_EMUL_UNALIGNED default
ptr_ring: prevent integer overflow when calculating size
libata: Fix compile warning with ATA_DEBUG enabled
selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
selftests: memfd: add config fragment for fuse
ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
ARM: OMAP3: Fix prm wake interrupt for resume
ARM: OMAP1: clock: Fix debugfs_create_*() usage
ibmvnic: Free RX socket buffer in case of adapter error
iwlwifi: mvm: fix security bug in PN checking
iwlwifi: mvm: always init rs with 20mhz bandwidth rates
NFC: llcp: Limit size of SDP URI
rxrpc: Work around usercopy check
mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
mac80211: fix a possible leak of station stats
mac80211: fix calling sleeping function in atomic context
mac80211: Do not disconnect on invalid operating class
md raid10: fix NULL deference in handle_write_completed()
drm/exynos: g2d: use monotonic timestamps
drm/exynos: fix comparison to bitshift when dealing with a mask
locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
md: raid5: avoid string overflow warning
kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
s390/cio: fix ccw_device_start_timeout API
s390/cio: fix return code after missing interrupt
s390/cio: clear timer when terminating driver I/O
PKCS#7: fix direct verification of SignerInfo signature
ARM: OMAP: Fix dmtimer init for omap1
smsc75xx: fix smsc75xx_set_features()
regulatory: add NUL to request alpha2
integrity/security: fix digsig.c build error with header file
locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
mac80211: drop frames with unexpected DS bits from fast-rx to slow path
arm64: fix unwind_frame() for filtered out fn for function graph tracing
macvlan: fix use-after-free in macvlan_common_newlink()
kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD builds
fs: dcache: Avoid livelock between d_alloc_parallel and __d_add
fs: dcache: Use READ_ONCE when accessing i_dir_seq
md: fix a potential deadlock of raid5/raid10 reshape
md/raid1: fix NULL pointer dereference
batman-adv: fix packet checksum in receive path
batman-adv: invalidate checksum on fragment reassembly
netfilter: ebtables: convert BUG_ONs to WARN_ONs
batman-adv: Ignore invalid batadv_iv_gw during netlink send
batman-adv: Ignore invalid batadv_v_gw during netlink send
batman-adv: Fix netlink dumping of BLA claims
batman-adv: Fix netlink dumping of BLA backbones
nvme-pci: Fix nvme queue cleanup if IRQ setup fails
clocksource/drivers/fsl_ftm_timer: Fix error return checking
ceph: fix dentry leak when failing to init debugfs
ARM: orion5x: Revert commit
|
||
|
|
9c798bc19e |
ipc/shm: fix shmat() nil address after round-down when remapping
commit 8f89c007b6dec16a1793cb88de88fcc02117bbbc upstream. shmat()'s SHM_REMAP option forbids passing a nil address for; this is in fact the very first thing we check for. Andrea reported that for SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check, but we need to check again if the address was rounded down to nil. As of this patch, such cases will return -EINVAL. Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805 Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
2ef44a3c1a |
Revert "ipc/shm: Fix shmat mmap nil-page protection"
commit a73ab244f0dad8fffb3291b905f73e2d3eaa7c00 upstream.
Patch series "ipc/shm: shmat() fixes around nil-page".
These patches fix two issues reported[1] a while back by Joe and Andrea
around how shmat(2) behaves with nil-page.
The first reverts a commit that it was incorrectly thought that mapping
nil-page (address=0) was a no no with MAP_FIXED. This is not the case,
with the exception of SHM_REMAP; which is address in the second patch.
I chose two patches because it is easier to backport and it explicitly
reverts bogus behaviour. Both patches ought to be in -stable and ltp
testcases need updated (the added testcase around the cve can be
modified to just test for SHM_RND|SHM_REMAP).
[1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805
This patch (of 2):
Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
worked on the idea that we should not be mapping as root addr=0 and
MAP_FIXED. However, it was reported that this scenario is in fact
valid, thus making the patch both bogus and breaks userspace as well.
For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
initialization[1].
[1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
320d53a9d0 |
Merge 4.9.96 into android-4.9
Changes in 4.9.96 tty: make n_tty_read() always abort if hangup is in progress ubifs: Check ubifs_wbuf_sync() return code ubi: fastmap: Don't flush fastmap work on detach ubi: Fix error for write access ubi: Reject MLC NAND fs/reiserfs/journal.c: add missing resierfs_warning() arg resource: fix integer overflow at reallocation ipc/shm: fix use-after-free of shm file via remap_file_pages() mm, slab: reschedule cache_reap() on the same CPU usb: musb: gadget: misplaced out of bounds check usb: gadget: udc: core: update usb_ep_queue() documentation ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property ARM: dts: exynos: Fix IOMMU support for GScaler devices on Exynos5250 ARM: dts: at91: sama5d4: fix pinctrl compatible string spi: Fix scatterlist elements size in spi_map_buf xen-netfront: Fix hang on device removal regmap: Fix reversed bounds check in regmap_raw_write() ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status() USB: gadget: f_midi: fixing a possible double-free in f_midi USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw usb: dwc3: pci: Properly cleanup resource smb3: Fix root directory when server returns inode number of zero HID: i2c-hid: fix size check and type usage powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write() powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops HID: Fix hid_report_len usage HID: core: Fix size as type u32 ASoC: ssm2602: Replace reg_default_raw with reg_default thunderbolt: Resume control channel after hibernation image is created irqchip/gic: Take lock when updating irq type random: use a tighter cap in credit_entropy_bits_safe() jbd2: if the journal is aborted then don't allow update of the log tail ext4: don't update checksum of new initialized bitmaps ext4: protect i_disksize update by i_data_sem in direct write path ext4: fail ext4_iget for root directory if unallocated RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device RDMA/rxe: Fix an out-of-bounds read ALSA: pcm: Fix UAF at PCM release via PCM timer access IB/srp: Fix srp_abort() IB/srp: Fix completion vector assignment algorithm dmaengine: at_xdmac: fix rare residue corruption libnvdimm, namespace: use a safe lookup for dimm device name nfit, address-range-scrub: fix scrub in-progress reporting um: Compile with modern headers um: Use POSIX ucontext_t instead of struct ucontext iommu/vt-d: Fix a potential memory leak mmc: jz4740: Fix race condition in IRQ mask update clk: mvebu: armada-38x: add support for 1866MHz variants clk: mvebu: armada-38x: add support for missing clocks clk: fix false-positive Wmaybe-uninitialized warning clk: bcm2835: De-assert/assert PLL reset signal when appropriate pwm: rcar: Fix a condition to prevent mismatch value setting to duty thermal: imx: Fix race condition in imx_thermal_probe() dt-bindings: clock: mediatek: add binding for fixed-factor clock axisel_d4 watchdog: f71808e_wdt: Fix WD_EN register read vfio/pci: Virtualize Maximum Read Request Size ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation ALSA: pcm: Avoid potential races between OSS ioctls and read/write ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation ext4: don't allow r/w mounts if metadata blocks overlap the superblock drm/amdgpu: Add an ATPX quirk for hybrid laptop drm/amdgpu: Fix always_valid bos multiple LRU insertions. drm/amdgpu: Fix PCIe lane width calculation drm/rockchip: Clear all interrupts before requesting the IRQ drm/radeon: Fix PCIe lane width calculation ALSA: line6: Use correct endpoint type for midi output ALSA: rawmidi: Fix missing input substream checks in compat ioctls ALSA: hda - New VIA controller suppor no-snoop path random: fix crng_ready() test random: crng_reseed() should lock the crng instance that it is modifying random: add new ioctl RNDRESEEDCRNG HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device MIPS: uaccess: Add micromips clobbers to bzero invocation MIPS: memset.S: EVA & fault support for small_memset MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup MIPS: memset.S: Fix clobber of v1 in last_fixup powerpc/eeh: Fix enabling bridge MMIO windows powerpc/lib: Fix off-by-one in alternate feature patching udf: Fix leak of UTF-16 surrogates into encoded strings jffs2_kill_sb(): deal with failed allocations hypfs_kill_super(): deal with failed allocations orangefs_kill_sb(): deal with allocation failures rpc_pipefs: fix double-dput() Don't leak MNT_INTERNAL away from internal mounts autofs: mount point create should honour passed in mode mm/filemap.c: fix NULL pointer in page_cache_tree_insert() fanotify: fix logic of events on child writeback: safer lock nesting block/mq: fix potential deadlock during cpu hotplug Linux 4.9.96 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
570ef10de6 |
ipc/shm: fix use-after-free of shm file via remap_file_pages()
commit 3f05317d9889ab75c7190dcd39491d2a97921984 upstream. syzbot reported a use-after-free of shm_file_data(file)->file->f_op in shm_get_unmapped_area(), called via sys_remap_file_pages(). Unfortunately it couldn't generate a reproducer, but I found a bug which I think caused it. When remap_file_pages() is passed a full System V shared memory segment, the memory is first unmapped, then a new map is created using the ->vm_file. Between these steps, the shm ID can be removed and reused for a new shm segment. But, shm_mmap() only checks whether the ID is currently valid before calling the underlying file's ->mmap(); it doesn't check whether it was reused. Thus it can use the wrong underlying file, one that was already freed. Fix this by making the "outer" shm file (the one that gets put in ->vm_file) hold a reference to the real shm file, and by making __shm_open() require that the file associated with the shm ID matches the one associated with the "outer" file. Taking the reference to the real shm file is needed to fully solve the problem, since otherwise sfd->file could point to a freed file, which then could be reallocated for the reused shm ID, causing the wrong shm segment to be mapped (and without the required permission checks). Commit |
||
|
|
05baf14727 |
Merge tag 'v4.9.93' into android-4.9
This is the 4.9.93 stable release Change-Id: I4293d83f45982c6fd479bddbf9b0f811248ddc30 Signed-off-by: Greg Hackmann <ghackmann@google.com> |
||
|
|
86c8c892d3 |
ipc/shm.c: add split function to shm_vm_ops
commit 3d942ee079b917b24e2a0c5f18d35ac8ec9fee48 upstream. If System V shmget/shmat operations are used to create a hugetlbfs backed mapping, it is possible to munmap part of the mapping and split the underlying vma such that it is not huge page aligned. This will untimately result in the following BUG: kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310! Oops: Exception in kernel mode, sig: 5 [#1] LE SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G C E 4.15.0-10-generic #11-Ubuntu NIP: c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009 REGS: c000003fbcdcf810 TRAP: 0700 Tainted: G C E (4.15.0-10-generic) MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24002222 XER: 20040000 CFAR: c00000000036ee44 SOFTE: 1 NIP __unmap_hugepage_range+0xa4/0x760 LR __unmap_hugepage_range_final+0x28/0x50 Call Trace: 0x7115e4e00000 (unreliable) __unmap_hugepage_range_final+0x28/0x50 unmap_single_vma+0x11c/0x190 unmap_vmas+0x94/0x140 exit_mmap+0x9c/0x1d0 mmput+0xa8/0x1d0 do_exit+0x360/0xc80 do_group_exit+0x60/0x100 SyS_exit_group+0x24/0x30 system_call+0x58/0x6c ---[ end trace ee88f958a1c62605 ]--- This bug was introduced by commit 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct"). A split function was added to vm_operations_struct to determine if a mapping can be split. This was mostly for device-dax and hugetlbfs mappings which have specific alignment constraints. Mappings initiated via shmget/shmat have their original vm_ops overwritten with shm_vm_ops. shm_vm_ops functions will call back to the original vm_ops if needed. Add such a split function to shm_vm_ops. Link: http://lkml.kernel.org/r/20180321161314.7711-1-mike.kravetz@oracle.com Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
71f1469722 |
Merge 4.9.79 into android-4.9
Changes in 4.9.79 x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels orangefs: use list_for_each_entry_safe in purge_waiting_ops orangefs: initialize op on loop restart in orangefs_devreq_read usbip: prevent vhci_hcd driver from leaking a socket pointer address usbip: Fix implicit fallthrough warning usbip: Fix potential format overflow in userspace tools can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2 Prevent timer value 0 for MWAITX drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled drivers: base: cacheinfo: fix boot error message when acpi is enabled mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack hwpoison, memcg: forcibly uncharge LRU pages cma: fix calculation of aligned offset mm, page_alloc: fix potential false positive in __zone_watermark_ok ipc: msg, make msgrcv work with LONG_MIN ACPI / scan: Prefer devices without _HID/_CID for _ADR matching ACPICA: Namespace: fix operand cache leak netfilter: nfnetlink_cthelper: Add missing permission checks netfilter: xt_osf: Add missing permission checks reiserfs: fix race in prealloc discard reiserfs: don't preallocate blocks for extended attributes fs/fcntl: f_setown, avoid undefined behaviour scsi: libiscsi: fix shifting of DID_REQUEUE host byte Revert "module: Add retpoline tag to VERMAGIC" mm: fix 100% CPU kswapd busyloop on unreclaimable nodes Input: trackpoint - force 3 buttons if 0 button is reported orangefs: fix deadlock; do not write i_size in read_iter um: link vmlinux with -no-pie vsyscall: Fix permissions for emulate mode with KAISER/PTI eventpoll.h: add missing epoll event masks dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL ipv6: fix udpv6 sendmsg crash caused by too small MTU ipv6: ip6_make_skb() needs to clear cork.base.dst lan78xx: Fix failure in USB Full Speed net: igmp: fix source address check for IGMPv3 reports net: qdisc_pkt_len_init() should be more robust net: tcp: close sock if net namespace is exiting pppoe: take ->needed_headroom of lower device into account on xmit r8169: fix memory corruption on retrieval of hardware statistics. sctp: do not allow the v4 socket to bind a v4mapped v6 address sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf tipc: fix a memory leak in tipc_nl_node_get_link() vmxnet3: repair memory leak net: Allow neigh contructor functions ability to modify the primary_key ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY ppp: unlock all_ppp_mutex before registering device be2net: restore properly promisc mode after queues reconfiguration ip6_gre: init dev->mtu and dev->hard_header_len correctly gso: validate gso_type in GSO handlers mlxsw: spectrum_router: Don't log an error on missing neighbor tun: fix a memory leak for tfile->tx_array flow_dissector: properly cap thoff field perf/x86/amd/power: Do not load AMD power module on !AMD platforms x86/microcode/intel: Extend BDW late-loading further with LLC size check hrtimer: Reset hrtimer cpu base proper on CPU hotplug x86: bpf_jit: small optimization in emit_bpf_tail_call() bpf: fix bpf_tail_call() x64 JIT bpf: introduce BPF_JIT_ALWAYS_ON config bpf: arsh is not supported in 32 bit alu thus reject it bpf: avoid false sharing of map refcount with max_entries bpf: fix divides by zero bpf: fix 32-bit divide by zero bpf: reject stores into ctx via st and xadd nfsd: auth: Fix gid sorting when rootsquash enabled Linux 4.9.79 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
542cde0e3c |
ipc: msg, make msgrcv work with LONG_MIN
commit 999898355e08ae3b92dfd0a08db706e0c6703d30 upstream.
When LONG_MIN is passed to msgrcv, one would expect to recieve any
message. But convert_mode does *msgtyp = -*msgtyp and -LONG_MIN is
undefined. In particular, with my gcc -LONG_MIN produces -LONG_MIN
again.
So handle this case properly by assigning LONG_MAX to *msgtyp if
LONG_MIN was specified as msgtyp to msgrcv.
This code:
long msg[] = { 100, 200 };
int m = msgget(IPC_PRIVATE, IPC_CREAT | 0644);
msgsnd(m, &msg, sizeof(msg), 0);
msgrcv(m, &msg, sizeof(msg), LONG_MIN, 0);
produces currently nothing:
msgget(IPC_PRIVATE, IPC_CREAT|0644) = 65538
msgsnd(65538, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
msgrcv(65538, ...
Except a UBSAN warning:
UBSAN: Undefined behaviour in ipc/msg.c:745:13
negation of -9223372036854775808 cannot be represented in type 'long int':
With the patch, I see what I expect:
msgget(IPC_PRIVATE, IPC_CREAT|0644) = 0
msgsnd(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
msgrcv(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, -9223372036854775808, 0) = 16
Link: http://lkml.kernel.org/r/20161024082633.10148-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
3f353c3ed4 |
Merge 4.9.38 into android-4.9
Changes in 4.9.38 mqueue: fix a use-after-free in sys_mq_notify() Add "shutdown" to "struct class". tpm: Issue a TPM2_Shutdown for TPM2 devices. tools include: Add a __fallthrough statement tools string: Use __fallthrough in perf_atoll() tools strfilter: Use __fallthrough perf top: Use __fallthrough perf thread_map: Correctly size buffer used with dirent->dt_name perf intel-pt: Use __fallthrough perf tests: Avoid possible truncation with dirent->d_name + snprintf perf bench numa: Avoid possible truncation when using snprintf() perf header: Fix handling of PERF_EVENT_UPDATE__SCALE perf scripting perl: Fix compile error with some perl5 versions perf probe: Fix to probe on gcc generated symbols for offline kernel perf probe: Add error checks to offline probe post-processing md: fix incorrect use of lexx_to_cpu in does_sb_need_changing md: fix super_offset endianness in super_1_rdev_size_change locking/rwsem-spinlock: Fix EINTR branch in __down_write_common() staging: vt6556: vnt_start Fix missing call to vnt_key_init_table. staging: comedi: fix clean-up of comedi_class in comedi_init() crypto: caam - fix gfp allocation flags (part I) crypto: rsa-pkcs1pad - use constant time memory comparison for MACs ext4: check return value of kstrtoull correctly in reserved_clusters_store x86/mm/pat: Don't report PAT on CPUs that don't support it saa7134: fix warm Medion 7134 EEPROM read Linux 4.9.38 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
e6952841ad |
mqueue: fix a use-after-free in sys_mq_notify()
commit f991af3daabaecff34684fd51fac80319d1baad1 upstream. The retry logic for netlink_attachskb() inside sys_mq_notify() is nasty and vulnerable: 1) The sock refcnt is already released when retry is needed 2) The fd is controllable by user-space because we already release the file refcnt so we when retry but the fd has been just closed by user-space during this small window, we end up calling netlink_detachskb() on the error path which releases the sock again, later when the user-space closes this socket a use-after-free could be triggered. Setting 'sock' to NULL here should be sufficient to fix it. Reported-by: GeneBlue <geneblue.mail@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
78b2dcc0df |
ipc/shm: Fix shmat mmap nil-page protection
am:
|
||
|
|
270e84a1e6 |
ipc/shm: Fix shmat mmap nil-page protection
commit 95e91b831f87ac8e1f8ed50c14d709089b4e01b8 upstream.
The issue is described here, with a nice testcase:
https://bugzilla.kernel.org/show_bug.cgi?id=192931
The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
the address rounded down to 0. For the regular mmap case, the
protection mentioned above is that the kernel gets to generate the
address -- arch_get_unmapped_area() will always check for MAP_FIXED and
return that address. So by the time we do security_mmap_addr(0) things
get funky for shmat().
The testcase itself shows that while a regular user crashes, root will
not have a problem attaching a nil-page. There are two possible fixes
to this. The first, and which this patch does, is to simply allow root
to crash as well -- this is also regular mmap behavior, ie when hacking
up the testcase and adding mmap(... |MAP_FIXED). While this approach
is the safer option, the second alternative is to ignore SHM_RND if the
rounded address is 0, thus only having MAP_SHARED flags. This makes the
behavior of shmat() identical to the mmap() case. The downside of this
is obviously user visible, but does make sense in that it maintains
semantics after the round-down wrt 0 address and mmap.
Passes shm related ltp tests.
Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Gareth Evans <gareth.evans@contextis.co.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
|
bbcd0ffae5 |
ANDROID: vfs: Add permission2 for filesystems with per mount permissions
This allows filesystems to use their mount private data to influence the permssions they return in permission2. It has been separated into a new call to avoid disrupting current permission users. Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca Signed-off-by: Daniel Rosenberg <drosen@google.com> |
||
|
|
8c8d4d4520 |
ipc: account for kmem usage on mqueue and msg
When kmem accounting switched from account by default to only account if flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out. The production use case at hand is that mqueues should be customizable via sysctls in Docker containers in a Kubernetes cluster. This can only be safely allowed to the users of the cluster (without the risk that they can cause resource shortage on a node, influencing other users' containers) if all resources they control are bounded, i.e. accounted for. Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.com Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Reported-by: Stefan Schimanski <sttts@redhat.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Stefan Schimanski <sttts@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
2a1613a586 |
ipc/sem.c: add cond_resched in exit_sme
In CONFIG_PREEMPT=n kernel a softlockup was observed while the for loop in exit_sem. Apparently it's possible for the loop to take quite a long time and it doesn't have a scheduling point in it. Since the codes is executing under an rcu read section this may also cause rcu stalls, which in turn block synchronize_rcu operations, which more or less de-stabilises the whole system. Fix this by introducing a cond_resched() at the beginning of the loop. So this patch fixes the following: NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [httpd:18119] CPU: 10 PID: 18119 Comm: httpd Tainted: G O 4.4.20-clouder2 #6 Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015 task: ffff88348d695280 ti: ffff881c95550000 task.ti: ffff881c95550000 RIP: 0010:[<ffffffff81614bc7>] [<ffffffff81614bc7>] _raw_spin_lock+0x17/0x30 RSP: 0018:ffff881c95553e40 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff883161b1eea8 RCX: 000000000000000d RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff883161b1eea4 RBP: ffff881c95553ea0 R08: ffff881c95553e68 R09: ffff883fef376f88 R10: ffff881fffb58c20 R11: ffffea0072556600 R12: ffff883161b1eea0 R13: ffff88348d695280 R14: ffff883dec427000 R15: ffff8831621672a0 FS: 0000000000000000(0000) GS:ffff881fffb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3b3723e020 CR3: 0000000001c0a000 CR4: 00000000001406e0 Call Trace: ? exit_sem+0x7c/0x280 do_exit+0x338/0xb40 do_group_exit+0x43/0xd0 SyS_exit_group+0x14/0x20 entry_SYSCALL_64_fastpath+0x16/0x6e Link: http://lkml.kernel.org/r/1475154992-6363-1-git-send-email-kernel@kyup.com Signed-off-by: Nikolay Borisov <kernel@kyup.com> Cc: Herton R. Krzesinski <herton@redhat.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
ed27f9122c |
ipc/msg: avoid waking sender upon full queue
Blocked tasks queued in q_senders waiting for their message to fit in the queue are blindly awoken every time we think there's a remote chance this might happen. This could cause numerous (and expensive -- thundering herd-ish) bogus wakeups if the queue is still really full. Adding to the scheduling cost/overhead, there's also the fact that we need to take the ipc object lock and requeue ourselves in the q_senders list. By keeping track of the blocked sender's message size, we can know previously if the wakeup ought to occur or not. Otherwise, to maintain the current wakeup order we just move it to the tail. This is exactly what occurs right now if the sender needs to go back to sleep. The case of EIDRM is left completely untouched, as we need to wakeup all the tasks, and shouldn't be playing games in the first place. This patch was seen to save on the 'msgctl10' ltp testcase ~15% in context switches (avg out of ten runs). Although these tests are really about functionality (as opposed to performance), is does show the direct benefits of the optimization. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
d0d6a2a95e |
ipc/msg: make ss_wakeup() kill arg boolean
... 'tis annoying. Link: http://lkml.kernel.org/r/1469748819-19484-4-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
e3658538bf |
ipc/msg: batch queue sender wakeups
Currently the use of wake_qs in sysv msg queues are only for the receiver tasks that are blocked on the queue. But blocked sender tasks (due to queue size constraints) still are awoken with the ipc object lock held, which can be a problem particularly for small sized queues and far from gracious for -rt (just like it was for the receiver side). The paths that actually wakeup a sender are obviously related to when we are either getting rid of the queue or after (some) space is freed-up after a receiver takes the msg (msgrcv). Furthermore, with the exception of msgrcv, we can always piggy-back on expunge_all that has its own tasks lined-up for waking. Finally, upon unlinking the message, it should be no problem delaying the wakeups a bit until after we've released the lock. Link: http://lkml.kernel.org/r/1469748819-19484-3-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
ee51636ca5 |
ipc/msg: implement lockless pipelined wakeups
This patch moves the wakeup_process() invocation so it is not done under the ipc global lock by making use of a lockless wake_q. With this change, the waiter is woken up once the message has been assigned and it does not need to loop on SMP if the message points to NULL. In the signal case we still need to check the pointer under the lock to verify the state. This change should also avoid the introduction of preempt_disable() in -RT which avoids a busy-loop which pools for the NULL -> !NULL change if the waiter has a higher priority compared to the waker. By making use of wake_qs, the logic of sysv msg queues is greatly simplified (and very well suited as we can batch lockless wakeups), particularly around the lockless receive algorithm. This has been tested with Manred's pmsg-shared tool on a "AMD A10-7800 Radeon R7, 12 Compute Cores 4C+8G": test | before | after | diff -----------------|------------|------------|---------- pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 % pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 % pmsg-shared 2 60 | 22,884,224 | 24,278,200 | + ~6.09 % Link: http://lkml.kernel.org/r/1469748819-19484-2-git-send-email-dave@stgolabs.net Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
5864a2fd30 |
ipc/sem.c: fix complex_count vs. simple op race
Commit |
||
|
|
101105b171 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro: ">rename2() work from Miklos + current_time() from Deepa" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Replace current_fs_time() with current_time() fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps fs: Replace CURRENT_TIME with current_time() for inode timestamps fs: proc: Delete inode time initializations in proc_alloc_inode() vfs: Add current_time() api vfs: add note about i_op->rename changes to porting fs: rename "rename2" i_op to "rename" vfs: remove unused i_op->rename fs: make remaining filesystems use .rename2 libfs: support RENAME_NOREPLACE in simple_rename() fs: support RENAME_NOREPLACE for local filesystems ncpfs: fix unused variable warning |
||
|
|
078cd8279e |
fs: Replace CURRENT_TIME with current_time() for inode timestamps
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_time() instead. CURRENT_TIME is also not y2038 safe. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. As part of the effort current_time() will be extended to do range checks. Hence, it is necessary for all file system timestamps to use current_time(). Also, current_time() will be transitioned along with vfs to be y2038 safe. Note that whenever a single call to current_time() is used to change timestamps in different inodes, it is because they share the same time granularity. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Felipe Balbi <balbi@kernel.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
7872559664 |
Merge branch 'nsfs-ioctls' into HEAD
From: Andrey Vagin <avagin@openvz.org> Each namespace has an owning user namespace and now there is not way to discover these relationships. Pid and user namepaces are hierarchical. There is no way to discover parent-child relationships too. Why we may want to know relationships between namespaces? One use would be visualization, in order to understand the running system. Another would be to answer the question: what capability does process X have to perform operations on a resource governed by namespace Y? One more use-case (which usually called abnormal) is checkpoint/restart. In CRIU we are going to dump and restore nested namespaces. There [1] was a discussion about which interface to choose to determing relationships between namespaces. Eric suggested to add two ioctl-s [2]: > Grumble, Grumble. I think this may actually a case for creating ioctls > for these two cases. Now that random nsfs file descriptors are bind > mountable the original reason for using proc files is not as pressing. > > One ioctl for the user namespace that owns a file descriptor. > One ioctl for the parent namespace of a namespace file descriptor. Here is an implementaions of these ioctl-s. $ man man7/namespaces.7 ... Since Linux 4.X, the following ioctl(2) calls are supported for namespace file descriptors. The correct syntax is: fd = ioctl(ns_fd, ioctl_type); where ioctl_type is one of the following: NS_GET_USERNS Returns a file descriptor that refers to an owning user names‐ pace. NS_GET_PARENT Returns a file descriptor that refers to a parent namespace. This ioctl(2) can be used for pid and user namespaces. For user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same meaning. In addition to generic ioctl(2) errors, the following specific ones can occur: EINVAL NS_GET_PARENT was called for a nonhierarchical namespace. EPERM The requested namespace is outside of the current namespace scope. [1] https://lkml.org/lkml/2016/7/6/158 [2] https://lkml.org/lkml/2016/7/9/101 Changes for v2: * don't return ENOENT for init_user_ns and init_pid_ns. There is nothing outside of the init namespace, so we can return EPERM in this case too. > The fewer special cases the easier the code is to get > correct, and the easier it is to read. // Eric Changes for v3: * rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: "W. Trevor King" <wking@tremily.us> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@canonical.com> |
||
|
|
bcac25a58b |
kernel: add a helper to get an owning user namespace for a namespace
Return -EPERM if an owning user namespace is outside of a process
current user namespace.
v2: In a first version ns_get_owner returned ENOENT for init_user_ns.
This special cases was removed from this version. There is nothing
outside of init_user_ns, so we can return EPERM.
v3: rename ns->get_owner() to ns->owner(). get_* usually means that it
grabs a reference.
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
||
|
|
df75e7748b |
userns: When the per user per user namespace limit is reached return ENOSPC
The current error codes returned when a the per user per user namespace limit are hit (EINVAL, EUSERS, and ENFILE) are wrong. I asked for advice on linux-api and it we made clear that those were the wrong error code, but a correct effor code was not suggested. The best general error code I have found for hitting a resource limit is ENOSPC. It is not perfect but as it is unambiguous it will serve until someone comes up with a better error code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
|
|
aba3566163 |
ipcns: Add a limit on the number of ipc namespaces
Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
|
|
3bd080e4d8 |
ipc: delete "nr_ipc_ns"
Write-only variable. Link: http://lkml.kernel.org/r/20160708214356.GA6785@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9b24fef9f0 |
sysv, ipc: fix security-layer leaking
Commit |
||
|
|
a867d7349e |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns vfs updates from Eric Biederman:
"This tree contains some very long awaited work on generalizing the
user namespace support for mounting filesystems to include filesystems
with a backing store. The real world target is fuse but the goal is
to update the vfs to allow any filesystem to be supported. This
patchset is based on a lot of code review and testing to approach that
goal.
While looking at what is needed to support the fuse filesystem it
became clear that there were things like xattrs for security modules
that needed special treatment. That the resolution of those concerns
would not be fuse specific. That sorting out these general issues
made most sense at the generic level, where the right people could be
drawn into the conversation, and the issues could be solved for
everyone.
At a high level what this patchset does a couple of simple things:
- Add a user namespace owner (s_user_ns) to struct super_block.
- Teach the vfs to handle filesystem uids and gids not mapping into
to kuids and kgids and being reported as INVALID_UID and
INVALID_GID in vfs data structures.
By assigning a user namespace owner filesystems that are mounted with
only user namespace privilege can be detected. This allows security
modules and the like to know which mounts may not be trusted. This
also allows the set of uids and gids that are communicated to the
filesystem to be capped at the set of kuids and kgids that are in the
owning user namespace of the filesystem.
One of the crazier corner casees this handles is the case of inodes
whose i_uid or i_gid are not mapped into the vfs. Most of the code
simply doesn't care but it is easy to confuse the inode writeback path
so no operation that could cause an inode write-back is permitted for
such inodes (aka only reads are allowed).
This set of changes starts out by cleaning up the code paths involved
in user namespace permirted mounts. Then when things are clean enough
adds code that cleanly sets s_user_ns. Then additional restrictions
are added that are possible now that the filesystem superblock
contains owner information.
These changes should not affect anyone in practice, but there are some
parts of these restrictions that are changes in behavior.
- Andy's restriction on suid executables that does not honor the
suid bit when the path is from another mount namespace (think
/proc/[pid]/fd/) or when the filesystem was mounted by a less
privileged user.
- The replacement of the user namespace implicit setting of MNT_NODEV
with implicitly setting SB_I_NODEV on the filesystem superblock
instead.
Using SB_I_NODEV is a stronger form that happens to make this state
user invisible. The user visibility can be managed but it caused
problems when it was introduced from applications reasonably
expecting mount flags to be what they were set to.
There is a little bit of work remaining before it is safe to support
mounting filesystems with backing store in user namespaces, beyond
what is in this set of changes.
- Verifying the mounter has permission to read/write the block device
during mount.
- Teaching the integrity modules IMA and EVM to handle filesystems
mounted with only user namespace root and to reduce trust in their
security xattrs accordingly.
- Capturing the mounters credentials and using that for permission
checks in d_automount and the like. (Given that overlayfs already
does this, and we need the work in d_automount it make sense to
generalize this case).
Furthermore there are a few changes that are on the wishlist:
- Get all filesystems supporting posix acls using the generic posix
acls so that posix_acl_fix_xattr_from_user and
posix_acl_fix_xattr_to_user may be removed. [Maintainability]
- Reducing the permission checks in places such as remount to allow
the superblock owner to perform them.
- Allowing the superblock owner to chown files with unmapped uids and
gids to something that is mapped so the files may be treated
normally.
I am not considering even obvious relaxations of permission checks
until it is clear there are no more corner cases that need to be
locked down and handled generically.
Many thanks to Seth Forshee who kept this code alive, and putting up
with me rewriting substantial portions of what he did to handle more
corner cases, and for his diligent testing and reviewing of my
changes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
fs: Call d_automount with the filesystems creds
fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
evm: Translate user/group ids relative to s_user_ns when computing HMAC
dquot: For now explicitly don't support filesystems outside of init_user_ns
quota: Handle quota data stored in s_user_ns in quota_setxquota
quota: Ensure qids map to the filesystem
vfs: Don't create inodes with a uid or gid unknown to the vfs
vfs: Don't modify inodes with a uid or gid unknown to the vfs
cred: Reject inodes with invalid ids in set_create_file_as()
fs: Check for invalid i_uid in may_follow_link()
vfs: Verify acls are valid within superblock's s_user_ns.
userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
fs: Refuse uid/gid changes which don't map into s_user_ns
selinux: Add support for unprivileged mounts from user namespaces
Smack: Handle labels consistently in untrusted mounts
Smack: Add support for unprivileged mounts from user namespaces
fs: Treat foreign mounts as nosuid
fs: Limit file caps to the user namespace of the super block
userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
userns: Remove implicit MNT_NODEV fragility.
...
|
||
|
|
4595ef88d1 |
shmem: make shmem_inode_info::lock irq-safe
We are going to need to call shmem_charge() under tree_lock to get accoutning right on collapse of small tmpfs pages into a huge one. The problem is that tree_lock is irq-safe and lockdep is not happy, that we take irq-unsafe lock under irq-safe[1]. Let's convert the lock to irq-safe. [1] https://gist.github.com/kiryl/80c0149e03ed35dfaf26628b8e03cdbc Link: http://lkml.kernel.org/r/1466021202-61880-34-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c01d5b3007 |
shmem: get_unmapped_area align huge page
Provide a shmem_get_unmapped_area method in file_operations, called at mmap time to decide the mapping address. It could be conditional on CONFIG_TRANSPARENT_HUGEPAGE, but save #ifdefs in other places by making it unconditional. shmem_get_unmapped_area() first calls the usual mm->get_unmapped_area (which we treat as a black box, highly dependent on architecture and config and executable layout). Lots of conditions, and in most cases it just goes with the address that chose; but when our huge stars are rightly aligned, yet that did not provide a suitable address, go back to ask for a larger arena, within which to align the mapping suitably. There have to be some direct calls to shmem_get_unmapped_area(), not via the file_operations: because of the way shmem_zero_setup() is called to create a shmem object late in the mmap sequence, when MAP_SHARED is requested with MAP_ANONYMOUS or /dev/zero. Though this only matters when /proc/sys/vm/shmem_huge has been set. Link: http://lkml.kernel.org/r/1466021202-61880-29-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a2982cc922 |
vfs: Generalize filesystem nodev handling.
Introduce a function may_open_dev that tests MNT_NODEV and a new superblock flab SB_I_NODEV. Use this new function in all of the places where MNT_NODEV was previously tested. Add the new SB_I_NODEV s_iflag to proc, sysfs, and mqueuefs as those filesystems should never support device nodes, and a simple superblock flags makes that very hard to get wrong. With SB_I_NODEV set if any device nodes somehow manage to show up on on a filesystem those device nodes will be unopenable. Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
|
|
3ee690143c |
ipc/mqueue: The mqueue filesystem should never contain executables
Set SB_I_NOEXEC on mqueuefs to ensure small implementation mistakes do not result in executable on mqueuefs by accident. Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
|
|
d91ee87d8d |
vfs: Pass data, ns, and ns->userns to mount_ns
Today what is normally called data (the mount options) is not passed to fill_super through mount_ns. Pass the mount options and the namespace separately to mount_ns so that filesystems such as proc that have mount options, can use mount_ns. Pass the user namespace to mount_ns so that the standard permission check that verifies the mounter has permissions over the namespace can be performed in mount_ns instead of in each filesystems .mount method. Thus removing the duplication between mqueuefs and proc in terms of permission checks. The extra permission check does not currently affect the rpc_pipefs filesystem and the nfsd filesystem as those filesystems do not currently allow unprivileged mounts. Without unpvileged mounts it is guaranteed that the caller has already passed capable(CAP_SYS_ADMIN) which guarantees extra permission check will pass. Update rpc_pipefs and the nfsd filesystem to ensure that the network namespace reference is always taken in fill_super and always put in kill_sb so that the logic is simpler and so that errors originating inside of fill_super do not cause a network namespace leak. Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
|
|
b236017acf |
ipc: Initialize ipc_namespace->user_ns early.
Allow the ipc namespace initialization code to depend on ns->user_ns being set during initialization. In particular this allows mq_init_ns to use ns->user_ns for permission checks and initializating s_user_ns while the the mq filesystem is being mounted. Acked-by: Seth Forshee <seth.forshee@canonical.com> Suggested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
|
|
be3e784498 |
locking/spinlock: Update spin_unlock_wait() users
With the modified semantics of spin_unlock_wait() a number of explicit barriers can be removed. Also update the comment for the do_exit() usecase, as that was somewhat stale/obscure. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |