bka
861 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
17e8f8bca2 |
UPSTREAM: close_range(): fix the logics in descriptor table trimming
commit 678379e1d4f7443b170939525d3312cfc37bf86b upstream. Cloning a descriptor table picks the size that would cover all currently opened files. That's fine for clone() and unshare(), but for close_range() there's an additional twist - we clone before we close, and it would be a shame to have close_range(3, ~0U, CLOSE_RANGE_UNSHARE) leave us with a huge descriptor table when we are not going to keep anything past stderr, just because some large file descriptor used to be open before our call has taken it out. Unfortunately, it had been dealt with in an inherently racy way - sane_fdtable_size() gets a "don't copy anything past that" argument (passed via unshare_fd() and dup_fd()), close_range() decides how much should be trimmed and passes that to unshare_fd(). The problem is, a range that used to extend to the end of descriptor table back when close_range() had looked at it might very well have stuff grown after it by the time dup_fd() has allocated a new files_struct and started to figure out the capacity of fdtable to be attached to that. That leads to interesting pathological cases; at the very least it's a QoI issue, since unshare(CLONE_FILES) is atomic in a sense that it takes a snapshot of descriptor table one might have observed at some point. Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination of unshare(CLONE_FILES) with plain close_range(), ending up with a weird state that would never occur with unshare(2) is confusing, to put it mildly. It's not hard to get rid of - all it takes is passing both ends of the range down to sane_fdtable_size(). There we are under ->files_lock, so the race is trivially avoided. So we do the following: * switch close_files() from calling unshare_fd() to calling dup_fd(). * undo the calling convention change done to unshare_fd() in 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE" * introduce struct fd_range, pass a pointer to that to dup_fd() and sane_fdtable_size() instead of "trim everything past that point" they are currently getting. NULL means "we are not going to be punching any holes"; NR_OPEN_MAX is gone. * make sane_fdtable_size() use find_last_bit() instead of open-coding it; it's easier to follow that way. * while we are at it, have dup_fd() report errors by returning ERR_PTR(), no need to use a separate int *errorp argument. Fixes: 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE" Cc: stable@vger.kernel.org Change-Id: I6782a2edf98970b6c2d662048061e28f7e57b9c9 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com> |
||
|
|
8276673b59 |
BACKPORT: close_range: add CLOSE_RANGE_UNSHARE
One of the use-cases of close_range() is to drop file descriptors just before
execve(). This would usually be expressed in the sequence:
unshare(CLONE_FILES);
close_range(3, ~0U);
as pointed out by Linus it might be desirable to have this be a part of
close_range() itself under a new flag CLOSE_RANGE_UNSHARE.
This expands {dup,unshare)_fd() to take a max_fds argument that indicates the
maximum number of file descriptors to copy from the old struct files. When the
user requests that all file descriptors are supposed to be closed via
close_range(min, max) then we can cap via unshare_fd(min) and hence don't need
to do any of the heavy fput() work for everything above min.
The patch makes it so that if CLOSE_RANGE_UNSHARE is requested and we do in
fact currently share our file descriptor table we create a new private copy.
We then close all fds in the requested range and finally after we're done we
install the new fd table.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I0813045886501e40a45693ee1edad50bdf2b66e5
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
|
||
|
|
c3bb003b5f |
simple_lmk: Report mm as freed as soon as exit_mmap() finishes
exit_mmap() is responsible for freeing the vast majority of an mm's memory; in order to unblock Simple LMK faster, report an mm as freed as soon as exit_mmap() finishes. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> |
||
|
|
1f5833f9ec |
simple_lmk: Introduce Simple Low Memory Killer for Android
This is a complete low memory killer solution for Android that is small and simple. Processes are killed according to the priorities that Android gives them, so that the least important processes are always killed first. Processes are killed until memory deficits are satisfied, as observed from kswapd struggling to free up pages. Simple LMK stops killing processes when kswapd finally goes back to sleep. The only tunables are the desired amount of memory to be freed per reclaim event and desired frequency of reclaim events. Simple LMK tries to free at least the desired amount of memory per reclaim and waits until all of its victims' memory is freed before proceeding to kill more processes. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> |
||
|
|
db4cea55c5 |
BACKPORT: timekeeping: Use proper clock specifier names in functions
This makes boot uniformly boottime and tai uniformly clocktai, to address the remaining oversights. Change-Id: I3463b9045bddeba00d6f9fcf78d63008459c1b9a Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20190621203249.3909-2-Jason@zx2c4.com Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com> |
||
|
|
b355187a03 |
BACKPORT: mm: consolidate page table accounting
Currently, we account page tables separately for each page table level, but that's redundant -- we only make use of total memory allocated to page tables for oom_badness calculation. We also provide the information to userspace, but it has dubious value there too. This patch switches page table accounting to single counter. mm->pgtables_bytes is now used to account all page table levels. We use bytes, because page table size for different levels of page table tree may be different. The change has user-visible effect: we don't have VmPMD and VmPUD reported in /proc/[pid]/status. Not sure if anybody uses them. (As alternative, we can always report 0 kB for them.) OOM-killer report is also slightly changed: we now report pgtables_bytes instead of nr_ptes, nr_pmd, nr_puds. Apart from reducing number of counters per-mm, the benefit is that we now calculate oom_badness() more correctly for machines which have different size of page tables depending on level or where page tables are less than a page in size. The only downside can be debuggability because we do not know which page table level could leak. But I do not remember many bugs that would be caught by separate counters so I wouldn't lose sleep over this. [akpm@linux-foundation.org: fix mm/huge_memory.c] Link: http://lkml.kernel.org/r/20171006100651.44742-2-kirill.shutemov@linux.intel.com Change-Id: Iaede03b89e74d4b23f90cfb72e2fe9b4cc8c3b79 Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> [kirill.shutemov@linux.intel.com: fix build] Link: http://lkml.kernel.org/r/20171016150113.ikfxy3e7zzfvsr4w@black.fi.intel.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com> |
||
|
|
a9dc0cb0f5 |
BACKPORT: mm: introduce wrappers to access mm->nr_ptes
Let's add wrappers for ->nr_ptes with the same interface as for nr_pmd and nr_pud. The patch also makes nr_ptes accounting dependent onto CONFIG_MMU. Page table accounting doesn't make sense if you don't have page tables. It's preparation for consolidation of page-table counters in mm_struct. Link: http://lkml.kernel.org/r/20171006100651.44742-1-kirill.shutemov@linux.intel.com Change-Id: I50ec0423b481e72aca4ea2c7c244b585b064ea53 Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com> |
||
|
|
15faf3fbdf |
BACKPORT: mm: account pud page tables
On a machine with 5-level paging support a process can allocate
significant amount of memory and stay unnoticed by oom-killer and memory
cgroup. The trick is to allocate a lot of PUD page tables. We don't
account PUD page tables, only PMD and PTE.
We already addressed the same issue for PMD page tables, see commit
|
||
|
|
4cfa8da2d6 |
BACKPORT: cgroup: cgroup v2 freezer
Cgroup v1 implements the freezer controller, which provides an ability
to stop the workload in a cgroup and temporarily free up some
resources (cpu, io, network bandwidth and, potentially, memory)
for some other tasks. Cgroup v2 lacks this functionality.
This patch implements freezer for cgroup v2.
Cgroup v2 freezer tries to put tasks into a state similar to jobctl
stop. This means that tasks can be killed, ptraced (using
PTRACE_SEIZE*), and interrupted. It is possible to attach to
a frozen task, get some information (e.g. read registers) and detach.
It's also possible to migrate a frozen tasks to another cgroup.
This differs cgroup v2 freezer from cgroup v1 freezer, which mostly
tried to imitate the system-wide freezer. However uninterruptible
sleep is fine when all tasks are going to be frozen (hibernation case),
it's not the acceptable state for some subset of the system.
Cgroup v2 freezer is not supporting freezing kthreads.
If a non-root cgroup contains kthread, the cgroup still can be frozen,
but the kthread will remain running, the cgroup will be shown
as non-frozen, and the notification will not be delivered.
* PTRACE_ATTACH is not working because non-fatal signal delivery
is blocked in frozen state.
There are some interface differences between cgroup v1 and cgroup v2
freezer too, which are required to conform the cgroup v2 interface
design principles:
1) There is no separate controller, which has to be turned on:
the functionality is always available and is represented by
cgroup.freeze and cgroup.events cgroup control files.
2) The desired state is defined by the cgroup.freeze control file.
Any hierarchical configuration is allowed.
3) The interface is asynchronous. The actual state is available
using cgroup.events control file ("frozen" field). There are no
dedicated transitional states.
4) It's allowed to make any changes with the cgroup hierarchy
(create new cgroups, remove old cgroups, move tasks between cgroups)
no matter if some cgroups are frozen.
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
No-objection-from-me-by: Oleg Nesterov <oleg@redhat.com>
Cc: kernel-team@fb.com
Change-Id: I3404119678cbcd7410aa56e9334055cee79d02fa
(cherry picked from commit 76f969e8948d82e78e1bc4beb6b9465908e74873)
Bug: 154548692
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
|
||
|
|
2a336c668c |
Merge branch 'android-4.14-stable' of https://android.googlesource.com/kernel/common into lineage-19.0
Signed-off-by: SamarV-121 <samarvispute121@pm.me> Change-Id: I26530a14b3cf6acfacd310d46d1e45b03ecaeff9 |
||
|
|
597842aef6 |
kernel: Import oplus changes
Change-Id: Ia97972540645d4b40cffaafd92189546a7be2805 Signed-off-by: Pranaya Deomani <pranayadeomani@protonmail.com> |
||
|
|
1dff798c56 |
Merge 4.14.247 into android-4.14-stable
Changes in 4.14.247 ext4: fix race writing to an inline_data file while its xattrs are changing xtensa: fix kconfig unmet dependency warning for HAVE_FUTEX_CMPXCHG qed: Fix the VF msix vectors flow net: macb: Add a NULL check on desc_ptp qede: Fix memset corruption perf/x86/intel/pt: Fix mask of num_address_ranges perf/x86/amd/ibs: Work around erratum #1197 cryptoloop: add a deprecation warning ARM: 8918/2: only build return_address() if needed ALSA: pcm: fix divide error in snd_pcm_lib_ioctl clk: fix build warning for orphan_list media: stkwebcam: fix memory leak in stk_camera_probe igmp: Add ip_mc_list lock in ip_check_mc_rcu USB: serial: mos7720: improve OOM-handling in read_mos_reg() f2fs: fix potential overflow ath10k: fix recent bandwidth conversion bug ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) s390/disassembler: correct disassembly lines alignment mm/kmemleak.c: make cond_resched() rate-limiting more efficient crypto: talitos - reduce max key size for SEC1 powerpc/module64: Fix comment in R_PPC64_ENTRY handling powerpc/boot: Delete unneeded .globl _zimage_start net: ll_temac: Remove left-over debug message mm/page_alloc: speed up the iteration of max_order Revert "btrfs: compression: don't try to compress if we don't have enough pages" usb: host: xhci-rcar: Don't reload firmware after the completion x86/reboot: Limit Dell Optiplex 990 quirk to early BIOS versions PCI: Call Max Payload Size-related fixup quirks early regmap: fix the offset of register error log crypto: mxs-dcp - Check for DMA mapping errors power: supply: axp288_fuel_gauge: Report register-address on readb / writeb errors crypto: omap-sham - clear dma flags only after omap_sham_update_dma_stop() udf: Check LVID earlier isofs: joliet: Fix iocharset=utf8 mount option nvme-rdma: don't update queue count when failing to set io queues power: supply: max17042_battery: fix typo in MAx17042_TOFF s390/cio: add dev_busid sysfs entry for each subchannel libata: fix ata_host_start() crypto: qat - do not ignore errors from enable_vf2pf_comms() crypto: qat - handle both source of interrupt in VF ISR crypto: qat - fix reuse of completion variable crypto: qat - fix naming for init/shutdown VF to PF notifications crypto: qat - do not export adf_iov_putmsg() udf_get_extendedattr() had no boundary checks. m68k: emu: Fix invalid free in nfeth_cleanup() spi: spi-fsl-dspi: Fix issue with uninitialized dma_slave_config spi: spi-pic32: Fix issue with uninitialized dma_slave_config clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel crypto: qat - use proper type for vf_mask certs: Trigger creation of RSA module signing key if it's not an RSA key soc: rockchip: ROCKCHIP_GRF should not default to y, unconditionally media: dvb-usb: fix uninit-value in dvb_usb_adapter_dvb_init media: dvb-usb: fix uninit-value in vp702x_read_mac_addr media: go7007: remove redundant initialization Bluetooth: sco: prevent information leak in sco_conn_defer_accept() tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos net: cipso: fix warnings in netlbl_cipsov4_add_std i2c: highlander: add IRQ check media: em28xx-input: fix refcount bug in em28xx_usb_disconnect PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently PCI: PM: Enable PME if it can be signaled from D3cold soc: qcom: smsm: Fix missed interrupts if state changes while masked Bluetooth: increase BTNAMSIZ to 21 chars to fix potential buffer overflow arm64: dts: exynos: correct GIC CPU interfaces address range on Exynos7 Bluetooth: fix repeated calls to sco_sock_kill drm/msm/dsi: Fix some reference counted resource leaks usb: gadget: udc: at91: add IRQ check usb: phy: fsl-usb: add IRQ check usb: phy: twl6030: add IRQ checks Bluetooth: Move shutdown callback before flushing tx and rx queue usb: host: ohci-tmio: add IRQ check usb: phy: tahvo: add IRQ check mac80211: Fix insufficient headroom issue for AMSDU usb: gadget: mv_u3d: request_irq() after initializing UDC Bluetooth: add timeout sanity check to hci_inquiry i2c: iop3xx: fix deferred probing i2c: s3c2410: fix IRQ check mmc: dw_mmc: Fix issue with uninitialized dma_slave_config mmc: moxart: Fix issue with uninitialized dma_slave_config CIFS: Fix a potencially linear read overflow i2c: mt65xx: fix IRQ check usb: ehci-orion: Handle errors of clk_prepare_enable() in probe usb: bdc: Fix an error handling path in 'bdc_probe()' when no suitable DMA config is available tty: serial: fsl_lpuart: fix the wrong mapbase value ath6kl: wmi: fix an error code in ath6kl_wmi_sync_point() bcma: Fix memory leak for internally-handled cores ipv4: make exception cache less predictible net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed net: qualcomm: fix QCA7000 checksum handling netns: protect netns ID lookups with RCU tty: Fix data race between tiocsti() and flush_to_ldisc() x86/resctrl: Fix a maybe-uninitialized build warning treated as error KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted IMA: remove -Wmissing-prototypes warning backlight: pwm_bl: Improve bootloader/kernel device handover clk: kirkwood: Fix a clocking boot regression fbmem: don't allow too huge resolutions rtc: tps65910: Correct driver module alias blk-zoned: allow zone management send operations without CAP_SYS_ADMIN blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN PCI/MSI: Skip masking MSI-X on Xen PV powerpc/perf/hv-gpci: Fix counter value parsing xen: fix setting of max_pfn in shared_info include/linux/list.h: add a macro to test if entry is pointing to the head 9p/xen: Fix end of loop tests for list_for_each_entry soc: aspeed: lpc-ctrl: Fix boundary check for mmap crypto: public_key: fix overflow during implicit conversion block: bfq: fix bfq_set_next_ioprio_data() power: supply: max17042: handle fails of reading status register dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc() VMCI: fix NULL pointer dereference when unmapping queue pair media: uvc: don't do DMA on stack media: rc-loopback: return number of emitters rather than error libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs ARM: 9105/1: atags_to_fdt: don't warn about stack size PCI: Restrict ASMedia ASM1062 SATA Max Payload Size Supported PCI: Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure PCI: xilinx-nwl: Enable the clock through CCF PCI: aardvark: Increase polling delay to 1.5s while waiting for PIO response PCI: aardvark: Fix masking and unmasking legacy INTx interrupts HID: input: do not report stylus battery state as "full" RDMA/iwcm: Release resources if iw_cm module initialization fails docs: Fix infiniband uverbs minor number pinctrl: samsung: Fix pinctrl bank pin count vfio: Use config not menuconfig for VFIO_NOIOMMU openrisc: don't printk() unconditionally pinctrl: single: Fix error return code in pcs_parse_bits_in_pinctrl_entry() scsi: qedi: Fix error codes in qedi_alloc_global_queues() MIPS: Malta: fix alignment of the devicetree buffer media: dib8000: rewrite the init prbs logic crypto: mxs-dcp - Use sg_mapping_iter to copy data PCI: Use pci_update_current_state() in pci_enable_device_flags() iio: dac: ad5624r: Fix incorrect handling of an optional regulator. ARM: dts: qcom: apq8064: correct clock names video: fbdev: kyro: fix a DoS bug by restricting user input netlink: Deal with ESRCH error in nlmsg_notify() Smack: Fix wrong semantics in smk_access_entry() usb: host: fotg210: fix the endpoint's transactional opportunities calculation usb: host: fotg210: fix the actual_length of an iso packet usb: gadget: u_ether: fix a potential null pointer dereference usb: gadget: composite: Allow bMaxPower=0 if self-powered staging: board: Fix uninitialized spinlock when attaching genpd tty: serial: jsm: hold port lock when reporting modem line changes bpf/tests: Fix copy-and-paste error in double word test bpf/tests: Do not PASS tests without actually testing the result video: fbdev: asiliantfb: Error out if 'pixclock' equals zero video: fbdev: kyro: Error out if 'pixclock' equals zero video: fbdev: riva: Error out if 'pixclock' equals zero ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs() flow_dissector: Fix out-of-bounds warnings s390/jump_label: print real address in a case of a jump label bug serial: 8250: Define RX trigger levels for OxSemi 950 devices xtensa: ISS: don't panic in rs_init hvsi: don't panic on tty_register_driver failure serial: 8250_pci: make setup_port() parameters explicitly unsigned staging: ks7010: Fix the initialization of the 'sleep_status' structure ata: sata_dwc_460ex: No need to call phy_exit() befre phy_init() Bluetooth: skip invalid hci_sync_conn_complete_evt ASoC: Intel: bytcr_rt5640: Move "Platform Clock" routes to the maps for the matching in-/output media: v4l2-dv-timings.c: fix wrong condition in two for-loops arm64: dts: qcom: sdm660: use reg value for memory node net: ethernet: stmmac: Do not use unreachable() in ipq806x_gmac_probe() Bluetooth: avoid circular locks in sco_sock_connect gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable access in amdgpu_i2c_router_select_ddc_port() ARM: tegra: tamonten: Fix UART pad setting rpc: fix gss_svc_init cleanup on failure staging: rts5208: Fix get_ms_information() heap buffer size gfs2: Don't call dlm after protocol is unmounted mmc: sdhci-of-arasan: Check return value of non-void funtions mmc: rtsx_pci: Fix long reads when clock is prescaled selftests/bpf: Enlarge select() timeout for test_maps cifs: fix wrong release in sess_alloc_buffer() failed path Revert "USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set" usb: musb: musb_dsps: request_irq() after initializing musb usbip: give back URBs for unsent unlink requests during cleanup usbip:vhci_hcd USB port can get stuck in the disabled state ASoC: rockchip: i2s: Fix regmap_ops hang ASoC: rockchip: i2s: Fixup config for DAIFMT_DSP_A/B parport: remove non-zero check on count ath9k: fix OOB read ar9300_eeprom_restore_internal ath9k: fix sleeping in atomic context net: fix NULL pointer reference in cipso_v4_doi_free net: w5100: check return value after calling platform_get_resource() parisc: fix crash with signals and alloca scsi: BusLogic: Fix missing pr_cont() use scsi: qla2xxx: Sync queue idx with queue_pair_map idx cpufreq: powernv: Fix init_chip_info initialization in numa=off mm/hugetlb: initialize hugetlb_usage in mm_init memcg: enable accounting for pids in nested pid namespaces platform/chrome: cros_ec_proto: Send command again when timeout occurs xen: reset legacy rtc flag for PV domU bnx2x: Fix enabling network interfaces without VFs PM: base: power: don't try to use non-existing RTC for storing data x86/mm: Fix kern_addr_valid() to cope with existing but not present entries net-caif: avoid user-triggerable WARN_ON(1) ptp: dp83640: don't define PAGE0 dccp: don't duplicate ccid when cloning dccp sock net/l2tp: Fix reference count leak in l2tp_udp_recv_core r6040: Restore MDIO clock frequency after MAC reset tipc: increase timeout in tipc_sk_enqueue() events: Reuse value read using READ_ONCE instead of re-reading it net/af_unix: fix a data-race in unix_dgram_poll tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() dt-bindings: mtd: gpmc: Fix the ECC bytes vs. OOB bytes equation mfd: Don't use irq_create_mapping() to resolve a mapping PCI: Add ACS quirks for Cavium multi-function devices net: usb: cdc_mbim: avoid altsetting toggling for Telit LN920 ethtool: Fix an error code in cxgb2.c PCI: Sync __pci_register_driver() stub for CONFIG_PCI=n mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()' ARC: export clear_user_page() for modules net: dsa: b53: Fix calculating number of switch ports netfilter: socket: icmp6: fix use-after-scope qlcnic: Remove redundant unlock in qlcnic_pinit_from_rom net: renesas: sh_eth: Fix freeing wrong tx descriptor s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant Linux 4.14.247 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: If4d48fb4bfd873036c9584406e8cf4ebbdb8a655 |
||
|
|
4fef2787d5 |
mm/hugetlb: initialize hugetlb_usage in mm_init
commit 13db8c50477d83ad3e3b9b0ae247e5cd833a7ae4 upstream.
After fork, the child process will get incorrect (2x) hugetlb_usage. If
a process uses 5 2MB hugetlb pages in an anonymous mapping,
HugetlbPages: 10240 kB
and then forks, the child will show,
HugetlbPages: 20480 kB
The reason for double the amount is because hugetlb_usage will be copied
from the parent and then increased when we copy page tables from parent
to child. Child will have 2x actual usage.
Fix this by adding hugetlb_count_init in mm_init.
Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com
Fixes:
|
||
|
|
155b2a3170 |
Merge 4.14.205 into android-4.14-stable
Changes in 4.14.205 drm/i915: Break up error capture compression loops with cond_resched() xen/events: don't use chip_data for legacy IRQs tipc: fix use-after-free in tipc_bcast_get_mode gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP gianfar: Account for Tx PTP timestamp in the skb headroom net: usb: qmi_wwan: add Telit LE910Cx 0x1230 composition sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian platforms sfp: Fix error handing in sfp_probe() Blktrace: bail out early if block debugfs is not configured blktrace: fix debugfs use after free i40e: Fix a potential NULL pointer dereference i40e: add num_vectors checker in iwarp handler i40e: Wrong truncation from u16 to u8 i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c i40e: Memory leak in i40e_config_iwarp_qvlist Fonts: Replace discarded const qualifier ALSA: usb-audio: Add implicit feedback quirk for Qu-16 lib/crc32test: remove extra local_irq_disable/enable kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled mm: always have io_remap_pfn_range() set pgprot_decrypted() gfs2: Wake up when sd_glock_disposal becomes zero ftrace: Fix recursion check for NMI test ftrace: Handle tracing when switching between context tracing: Fix out of bounds write in get_trace_buf futex: Handle transient "ownerless" rtmutex state correctly ARM: dts: sun4i-a10: fix cpu_alert temperature x86/kexec: Use up-to-dated screen_info copy to fill boot params of: Fix reserved-memory overlap detection blk-cgroup: Fix memleak on error path blk-cgroup: Pre-allocate tree node on blkg_conf_prep scsi: core: Don't start concurrent async scan on same host vsock: use ns_capable_noaudit() on socket create drm/vc4: drv: Add error handding for bind ACPI: NFIT: Fix comparison to '-ENXIO' vt: Disable KD_FONT_OP_COPY fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent serial: 8250_mtk: Fix uart_get_baud_rate warning serial: txx9: add missing platform_driver_unregister() on error in serial_txx9_init USB: serial: cyberjack: fix write-URB completion race USB: serial: option: add Quectel EC200T module support USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231 USB: serial: option: add Telit FN980 composition 0x1055 USB: Add NO_LPM quirk for Kingston flash drive usb: mtu3: fix panic in mtu3_gadget_stop() ARC: stack unwinding: avoid indefinite looping Revert "ARC: entry: fix potential EFA clobber when TIF_SYSCALL_TRACE" PM: runtime: Resume the device earlier in __device_release_driver() arm64: dts: marvell: espressobin: add ethernet alias Linux 4.14.205 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I57cdf9a75fc420bc9013c1a8e7228d2e52d44743 |
||
|
|
ee55b8c6bf |
fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent
commit b4e00444cab4c3f3fec876dc0cccc8cbb0d1a948 upstream. current->group_leader->exit_signal may change during copy_process() if current->real_parent exits. Move the assignment inside tasklist_lock to avoid the race. Signed-off-by: Eddy Wu <eddy_wu@trendmicro.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
9ddf99d35b |
Merge 4.14.203 into android-4.14-stable
Changes in 4.14.203 ibmveth: Switch order of ibmveth_helper calls. ibmveth: Identify ingress large send packets. ipv4: Restore flowi4_oif update before call to xfrm_lookup_route mlx4: handle non-napi callers to napi_poll net: usb: qmi_wwan: add Cellient MPL200 card tipc: fix the skb_unshare() in tipc_buf_append() net/ipv4: always honour route mtu during forwarding r8169: fix data corruption issue on RTL8402 binder: fix UAF when releasing todo list ALSA: bebob: potential info leak in hwdep_read() net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup nfc: Ensure presence of NFC_ATTR_FIRMWARE_NAME attribute in nfc_genl_fw_download() tcp: fix to update snd_wl1 in bulk receiver fast path icmp: randomize the global rate limiter cifs: remove bogus debug code cifs: Return the error from crypt_message when enc/dec key not found. KVM: x86/mmu: Commit zap of remaining invalid pages when recovering lpages KVM: SVM: Initialize prev_ga_tag before use ima: Don't ignore errors from crypto_shash_update() crypto: algif_aead - Do not set MAY_BACKLOG on the async path EDAC/i5100: Fix error handling order in i5100_init_one() x86/fpu: Allow multiple bits in clearcpuid= parameter drivers/perf: xgene_pmu: Fix uninitialized resource struct crypto: algif_skcipher - EBUSY on aio should be an error crypto: mediatek - Fix wrong return value in mtk_desc_ring_alloc() crypto: ixp4xx - Fix the size used in a 'dma_free_coherent()' call media: tuner-simple: fix regression in simple_set_radio_freq media: Revert "media: exynos4-is: Add missed check for pinctrl_lookup_state()" media: m5mols: Check function pointer in m5mols_sensor_power media: uvcvideo: Set media controller entity functions media: omap3isp: Fix memleak in isp_probe crypto: omap-sham - fix digcnt register handling with export/import cypto: mediatek - fix leaks in mtk_desc_ring_alloc media: mx2_emmaprp: Fix memleak in emmaprp_probe media: tc358743: initialize variable media: platform: fcp: Fix a reference count leak. media: s5p-mfc: Fix a reference count leak media: ti-vpe: Fix a missing check and reference count leak regulator: resolve supply after creating regulator ath10k: provide survey info as accumulated data Bluetooth: hci_uart: Cancel init work before unregistering ath6kl: prevent potential array overflow in ath6kl_add_new_sta() ath9k: Fix potential out of bounds in ath9k_htc_txcompletion_cb() wcn36xx: Fix reported 802.11n rx_highest rate wcn3660/wcn3680 ASoC: qcom: lpass-platform: fix memory leak ASoC: qcom: lpass-cpu: fix concurrency issue brcmfmac: check ndev pointer mwifiex: Do not use GFP_KERNEL in atomic context drm/gma500: fix error check scsi: qla4xxx: Fix an error handling path in 'qla4xxx_get_host_stats()' scsi: csiostor: Fix wrong return value in csio_hw_prep_fw() backlight: sky81452-backlight: Fix refcount imbalance on error VMCI: check return value of get_user_pages_fast() for errors tty: serial: earlycon dependency tty: hvcs: Don't NULL tty->driver_data until hvcs_cleanup() pty: do tty_flip_buffer_push without port->lock in pty_write pwm: lpss: Fix off by one error in base_unit math in pwm_lpss_prepare() pwm: lpss: Add range limit check for the base_unit register value drivers/virt/fsl_hypervisor: Fix error handling path video: fbdev: vga16fb: fix setting of pixclock because a pass-by-value error video: fbdev: sis: fix null ptr dereference HID: roccat: add bounds checking in kone_sysfs_write_settings() pinctrl: mcp23s08: Fix mcp23x17_regmap initialiser pinctrl: mcp23s08: Fix mcp23x17 precious range ath6kl: wmi: prevent a shift wrapping bug in ath6kl_wmi_delete_pstream_cmd() misc: mic: scif: Fix error handling path ALSA: seq: oss: Avoid mutex lock for a long-time ioctl usb: dwc2: Fix parameter type in function pointer prototype quota: clear padding in v2r1_mem2diskdqb() HID: hid-input: fix stylus battery reporting qtnfmac: fix resource leaks on unsupported iftype error return path net: enic: Cure the enic api locking trainwreck mfd: sm501: Fix leaks in probe() iwlwifi: mvm: split a print to avoid a WARNING in ROC usb: gadget: f_ncm: fix ncm_bitrate for SuperSpeed and above. usb: gadget: u_ether: enable qmult on SuperSpeed Plus as well nl80211: fix non-split wiphy information usb: dwc2: Fix INTR OUT transfers in DDMA mode. scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs() mwifiex: fix double free net: korina: fix kfree of rx/tx descriptor array mm/memcg: fix device private memcg accounting mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary IB/mlx4: Fix starvation in paravirt mux/demux IB/mlx4: Adjust delayed work when a dup is observed powerpc/pseries: Fix missing of_node_put() in rng_init() powerpc/icp-hv: Fix missing of_node_put() in success path mtd: lpddr: fix excessive stack usage with clang mtd: mtdoops: Don't write panic data twice ARM: 9007/1: l2c: fix prefetch bits init in L2X0_AUX_CTRL using DT values arc: plat-hsdk: fix kconfig dependency warning when !RESET_CONTROLLER xfs: limit entries returned when counting fsmap records RDMA/qedr: Fix use of uninitialized field powerpc/tau: Use appropriate temperature sample interval powerpc/tau: Remove duplicated set_thresholds() call powerpc/tau: Disable TAU between measurements perf intel-pt: Fix "context_switch event has no tid" error RDMA/hns: Set the unsupported wr opcode kdb: Fix pager search for multi-line strings overflow: Include header file with SIZE_MAX declaration powerpc/perf: Exclude pmc5/6 from the irrelevant PMU group constraints powerpc/perf/hv-gpci: Fix starting index value cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier IB/rdmavt: Fix sizeof mismatch f2fs: wait for sysfs kobject removal before freeing f2fs_sb_info lib/crc32.c: fix trivial typo in preprocessor condition ramfs: fix nommu mmap with gaps in the page cache rapidio: fix error handling path rapidio: fix the missed put_device() for rio_mport_add_riodev mailbox: avoid timer start from callback i2c: rcar: Auto select RESET_CONTROLLER PCI: iproc: Set affinity mask on MSI interrupts clk: at91: clk-main: update key before writing AT91_CKGR_MOR clk: bcm2835: add missing release if devm_clk_hw_register fails ext4: limit entries returned when counting fsmap records vfio/pci: Clear token on bypass registration failure vfio iommu type1: Fix memory leak in vfio_iommu_type1_pin_pages Input: imx6ul_tsc - clean up some errors in imx6ul_tsc_resume() Input: stmfts - fix a & vs && typo Input: ep93xx_keypad - fix handling of platform_get_irq() error Input: omap4-keypad - fix handling of platform_get_irq() error Input: twl4030_keypad - fix handling of platform_get_irq() error Input: sun4i-ps2 - fix handling of platform_get_irq() error KVM: x86: emulating RDPID failure shall return #UD rather than #GP memory: omap-gpmc: Fix a couple off by ones memory: fsl-corenet-cf: Fix handling of platform_get_irq() error arm64: dts: qcom: msm8916: Fix MDP/DSI interrupts ARM: dts: owl-s500: Fix incorrect PPI interrupt specifiers arm64: dts: zynqmp: Remove additional compatible string for i2c IPs powerpc/powernv/dump: Fix race while processing OPAL dump nvmet: fix uninitialized work for zero kato NTB: hw: amd: fix an issue about leak system resources perf: correct SNOOPX field offset i2c: core: Restore acpi_walk_dep_device_list() getting called after registering the ACPI i2c devs crypto: ccp - fix error handling media: firewire: fix memory leak media: ati_remote: sanity check for both endpoints media: st-delta: Fix reference count leak in delta_run_work media: sti: Fix reference count leaks media: exynos4-is: Fix several reference count leaks due to pm_runtime_get_sync media: exynos4-is: Fix a reference count leak due to pm_runtime_get_sync media: exynos4-is: Fix a reference count leak media: vsp1: Fix runtime PM imbalance on error media: platform: s3c-camif: Fix runtime PM imbalance on error media: platform: sti: hva: Fix runtime PM imbalance on error media: bdisp: Fix runtime PM imbalance on error media: media/pci: prevent memory leak in bttv_probe media: uvcvideo: Ensure all probed info is returned to v4l2 mmc: sdio: Check for CISTPL_VERS_1 buffer size media: saa7134: avoid a shift overflow fs: dlm: fix configfs memory leak media: venus: core: Fix runtime PM imbalance in venus_probe ntfs: add check for mft record size in superblock mac80211: handle lack of sband->bitrates in rates PM: hibernate: remove the bogus call to get_gendisk() in software_resume() scsi: mvumi: Fix error return in mvumi_io_attach() scsi: target: core: Add CONTROL field for trace events mic: vop: copy data to kernel space then write to io memory misc: vop: add round_up(x,4) for vring_size to avoid kernel panic usb: gadget: function: printer: fix use-after-free in __lock_acquire udf: Limit sparing table size udf: Avoid accessing uninitialized data on failed inode read USB: cdc-acm: handle broken union descriptors can: flexcan: flexcan_chip_stop(): add error handling and propagate error value ath9k: hif_usb: fix race condition between usb_get_urb() and usb_kill_anchored_urbs() misc: rtsx: Fix memory leak in rtsx_pci_probe reiserfs: only call unlock_new_inode() if I_NEW xfs: make sure the rt allocator doesn't run off the end usb: ohci: Default to per-port over-current protection Bluetooth: Only mark socket zapped after unlocking scsi: ibmvfc: Fix error return in ibmvfc_probe() brcmsmac: fix memory leak in wlc_phy_attach_lcnphy rtl8xxxu: prevent potential memory leak Fix use after free in get_capset_info callback. scsi: qedi: Protect active command list to avoid list corruption scsi: qedi: Fix list_del corruption while removing active I/O tty: ipwireless: fix error handling ipvs: Fix uninit-value in do_ip_vs_set_ctl() reiserfs: Fix memory leak in reiserfs_parse_options() mwifiex: don't call del_timer_sync() on uninitialized timer brcm80211: fix possible memleak in brcmf_proto_msgbuf_attach usb: core: Solve race condition in anchor cleanup functions scsi: ufs: ufs-qcom: Fix race conditions caused by ufs_qcom_testbus_config() ath10k: check idx validity in __ath10k_htt_rx_ring_fill_n() net: korina: cast KSEG0 address to pointer in kfree tty: serial: fsl_lpuart: fix lpuart32_poll_get_char usb: cdc-acm: add quirk to blacklist ETAS ES58X devices USB: cdc-wdm: Make wdm_flush() interruptible and add wdm_fsync(). eeprom: at25: set minimum read/write access stride to 1 usb: gadget: f_ncm: allow using NCM in SuperSpeed Plus gadgets. powerpc/powernv/opal-dump : Use IRQ_HANDLED instead of numbers in interrupt handler Linux 4.14.203 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia4c1c9fc1a8d03c662b37fbe1448b4fb1f88007a |
||
|
|
fc7d33941b |
mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
[ Upstream commit 67197a4f28d28d0b073ab0427b03cb2ee5382578 ]
Currently __set_oom_adj loops through all processes in the system to keep
oom_score_adj and oom_score_adj_min in sync between processes sharing
their mm. This is done for any task with more that one mm_users, which
includes processes with multiple threads (sharing mm and signals).
However for such processes the loop is unnecessary because their signal
structure is shared as well.
Android updates oom_score_adj whenever a tasks changes its role
(background/foreground/...) or binds to/unbinds from a service, making it
more/less important. Such operation can happen frequently. We noticed
that updates to oom_score_adj became more expensive and after further
investigation found out that the patch mentioned in "Fixes" introduced a
regression. Using Pixel 4 with a typical Android workload, write time to
oom_score_adj increased from ~3.57us to ~362us. Moreover this regression
linearly depends on the number of multi-threaded processes running on the
system.
Mark the mm with a new MMF_MULTIPROCESS flag bit when task is created with
(CLONE_VM && !CLONE_THREAD && !CLONE_VFORK). Change __set_oom_adj to use
MMF_MULTIPROCESS instead of mm_users to decide whether oom_score_adj
update should be synchronized between multiple processes. To prevent
races between clone() and __set_oom_adj(), when oom_score_adj of the
process being cloned might be modified from userspace, we use
oom_adj_mutex. Its scope is changed to global.
The combination of (CLONE_VM && !CLONE_THREAD) is rarely used except for
the case of vfork(). To prevent performance regressions of vfork(), we
skip taking oom_adj_mutex and setting MMF_MULTIPROCESS when CLONE_VFORK is
specified. Clearing the MMF_MULTIPROCESS flag (when the last process
sharing the mm exits) is left out of this patch to keep it simple and
because it is believed that this threading model is rare. Should there
ever be a need for optimizing that case as well, it can be done by hooking
into the exit path, likely following the mm_update_next_owner pattern.
With the combination of (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK) being
quite rare, the regression is gone after the change is applied.
[surenb@google.com: v3]
Link: https://lkml.kernel.org/r/20200902012558.2335613-1-surenb@google.com
Fixes:
|
||
|
|
2f9c776ac2 |
[ALPS05077885] [Do NOT Sync]Merge branch android-4.14 into alps-trunk-r0.basic
[Detail] Parent: |
||
|
|
c21f649209 |
[ALPS04791510] task-turbo: scheduler tuning for task turbo
Give min_nice to turbo task for it has more oppotunity to occupy CPU resource and Turbo task wake-up at BCPU if SUB_FEAT_PREFER_BIGCORE is enable. Related trace event: a. echo 1 > /d/tracing/events/task_turbo/sched_turbo_nice_set/enable - Tracking turbo task set nice value b. echo 1 > /d/tracing/events/sched/sched_set_user_nice/enable - Tracking all task set nice value MTK-Commit-Id: d16f4b17e77536b6469b9af9c2120f69c8861ddd Change-Id: Ib8cd77e5a907513143a3fe22c274afaf1ff54f4b Signed-off-by: JianMin Liu <jian-min.liu@mediatek.com> CR-Id: ALPS04791510 Feature: System Performance |
||
|
|
fe19ab4ae9 |
[ALPS04791510] task-turbo: Introduce to Task Turbo
A new option CONFIG_MTK_TASK_TURBO for task-turbo feature.
Task turbo provide enhancement of APP launch and lock latency
via Preempting lock waiting queue and more oppotunity to occupy
CPU resource. user can apply pid to turbo the specific task via
turbo_pid interface.
If task-turbo enabled
1) When app is TOP-APP group, turbo UI thread/Render thread
2) Inherit turbo abilty to lock holder and binders target task
3) turbo User-specified task
How to enable(default off):
Task-turbo for launch:
- echo 15 > /sys/module/task_turbo/parameters/feats
Task-turbo for lock latency:
- echo 7 > /sys/module/task_turbo/parameters/feats
Related proc node and setting:
a. cat /proc/[pid]/task/[tid]/turbo
- query turbo task status
b. echo [pid] > /sys/module/task_turbo/parameters/turbo_pid
- turbo specific task by pid
c. echo pid > /sys/module/task_turbo/parameters/unset_turbo_pid
- de-turbo specific task by pid
MTK-Commit-Id: cd06fe7846efde21e4af495da3406bca40876739
Change-Id: Ic7f0ccc00332cf1feb39bb6b9a55bf756228187d
CR-Id: ALPS04791510
Feature: System Performance
Signed-off-by: JianMin Liu <jian-min.liu@mediatek.com>
|
||
|
|
ed6188ccce |
[ALPS05012802] [Do NOT Sync]Merge branch android-4.14 into alps-trunk-r0.basic
[Detail] Parent: |
||
|
|
dcfa42e329 |
[ALPS04748062] [Do NOT Sync]Merge branch android-4.14-q into alps-mp-q0.mp1
[Detail] Parent: |
||
|
|
e74031c866 |
[ALPS04721022] FROMLIST: mm: protect against PTE changes done by dup_mmap()
(from https://lore.kernel.org/patchwork/patch/1062672/) Vinayak Menon and Ganesh Mahendran reported that the following scenario may lead to thread being blocked due to data corruption: CPU 1 CPU 2 CPU 3 Process 1, Process 1, Process 1, Thread A Thread B Thread C while (1) { while (1) { while(1) { pthread_mutex_lock(l) pthread_mutex_lock(l) fork pthread_mutex_unlock(l) pthread_mutex_unlock(l) } } } In the details this happens because : CPU 1 CPU 2 CPU 3 fork() copy_pte_range() set PTE rdonly got to next VMA... . PTE is seen rdonly PTE still writable . thread is writing to page . -> page fault . copy the page Thread writes to page . . -> no page fault . update the PTE . flush TLB for that PTE flush TLB PTE are now rdonly So the write done by the CPU 3 is interfering with the page copy operation done by CPU 2, leading to the data corruption. To avoid this we mark all the VMA involved in the COW mechanism as changing by calling vm_write_begin(). This ensures that the speculative page fault handler will not try to handle a fault on these pages. The marker is set until the TLB is flushed, ensuring that all the CPUs will now see the PTE as not writable. Once the TLB is flush, the marker is removed by calling vm_write_end(). The variable last is used to keep tracked of the latest VMA marked to handle the error path where part of the VMA may have been marked. Since multiple VMA from the same mm may have the sequence count increased during this process, the use of the vm_raw_write_begin/end() is required to avoid lockdep false warning messages. Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com> Reported-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> MTK-Commit-Id: 37e7d65151723ab65ea6dc5b48118fd916d750f1 Change-Id: I38dfc4f370f8d9061a8e9e4d14972bb28d18a05d Signed-off-by: Chinwen Chang <chinwen.chang@mediatek.com> CR-Id: ALPS04721022 Feature: Memory Optimization |
||
|
|
b5c6782f4b |
[ALPS04721022] FROMLIST: mm: protect mm_rb tree with a rwlock
(from https://lore.kernel.org/patchwork/patch/906225/) This change is inspired by the Peters proposal patch [1] which was protecting the VMA using SRCU. Unfortunately, SRCU is not scaling well in that particular case, and it is introducing major performance degradation due to excessive scheduling operations. To allow access to the mm_rb tree without grabbing the mmap_sem, this patch is protecting it access using a rwlock. As the mm_rb tree is a O(log n) search it is safe to protect it using such a lock. The VMA cache is not protected by the new rwlock and it should not be used without holding the mmap_sem. To allow the picked VMA structure to be used once the rwlock is released, a use count is added to the VMA structure. When the VMA is allocated it is set to 1. Each time the VMA is picked with the rwlock held its use count is incremented. Each time the VMA is released it is decremented. When the use count hits zero, this means that the VMA is no more used and should be freed. This patch is preparing for 2 kind of VMA access : - as usual, under the control of the mmap_sem, - without holding the mmap_sem for the speculative page fault handler. Access done under the control the mmap_sem doesnt require to grab the rwlock to protect read access to the mm_rb tree, but access in write must be done under the protection of the rwlock too. This affects inserting and removing of elements in the RB tree. The patch is introducing 2 new functions: - vma_get() to find a VMA based on an address by holding the new rwlock. - vma_put() to release the VMA when its no more used. These services are designed to be used when access are made to the RB tree without holding the mmap_sem. When a VMA is removed from the RB tree, its vma->vm_rb field is cleared and we rely on the WMB done when releasing the rwlock to serialize the write with the RMB done in a later patch to check for the VMAs validity. When free_vma is called, the file associated with the VMA is closed immediately, but the policy and the file structure remained in used until the VMAs use count reach 0, which may happens later when exiting an in progress speculative page fault. [1] https://patchwork.kernel.org/patch/5108281/ Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> MTK-Commit-Id: 1f5b0906e926e164f29de930e3c79a31a6de5be6 Change-Id: If843e0ff543ecd85700bc1533315b81ced26075d Signed-off-by: Chinwen Chang <chinwen.chang@mediatek.com> CR-Id: ALPS04721022 Feature: Memory Optimization |
||
|
|
ce32250030 |
[ALPS04721022] FROMLIST: mm: introduce INIT_VMA()
(from https://lore.kernel.org/patchwork/patch/906213/) Some VMA struct fields need to be initialized once the VMA structure is allocated. Currently this only concerns anon_vma_chain field but some other will be added to support the speculative page fault. Instead of spreading the initialization calls all over the code, lets introduce a dedicated inline function. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> MTK-Commit-Id: cf0226c5c67fe28e0b1e93f085f7a99f003154ea Change-Id: I8cf06e24e97eb5b8fcb88ffb737e786af7ef6f68 Signed-off-by: Chinwen Chang <chinwen.chang@mediatek.com> CR-Id: ALPS04721022 Feature: Memory Optimization |
||
|
|
17776fea6a |
[ALPS04522325] [Do NOT Sync]Merge branch android-4.14-q into alps-trunk-q0.basic
[Detail] Parent: |
||
|
|
45de14c3ed |
[ALPS04364611] [Do NOT Sync]Merge branch android-4.14 into alps-trunk-q0.basic
[Detail] Parent: |
||
|
|
f185a86ae7 |
[ALPS04233374] pidmap: Introduce map of pid and its task name
We often lose task name information in exception dump which
make issue analysis more difficult. The reason is we only
take "snapshot" for pid map while exception is happening.
To get complete pid map which even includes terminated or killed
processes, we introduce a more positive way to keep those mappings.
The mapping also has TGID (Thread Group ID) information to
identify who starts the current thread.
This patch fixes incomplete feature porting from commit a29ab7da2b27
("[ALPS04133285] msdc:kernel-4.14 migration").
MTK-Commit-Id: a09c2334db387bedbf5f375343c004ecbf68cb62
Change-Id: Id6db7bf0d576966386ae6797b98ec8b51a7f6fe9
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
CR-Id: ALPS04233374
Feature: Android Exception Engine(AEE)
(cherry picked from commit d1cedea959db6ceba308d78457a3996db0ec32e6)
(cherry picked from commit 335b978aea24c548077cf42399b53b7809ed45fd)
(cherry picked from commit a1a91810ab540f014cfde1465c6baed2099e06a0)
|
||
|
|
97e5e79850 |
[ALPS04256189] mlog: integrate mlog driver
mlog is a time-based memory profiling tool used to record the variation of memory usage. MTK-Commit-Id: 68b8d8ef5b8cad2c7721aa6b6dd551059fe18e3d Change-Id: I1755991492a6d2d83778be47fdfa9f04a50e8912 Signed-off-by: Chinwen Chang <chinwen.chang@mediatek.com> CR-Id: ALPS04256189 Feature: Memory Optimization (cherry-pick from 039a05ee88d5217784cf48295b045858c63b7ffe) |
||
|
|
a7f2106930 |
FROMLIST: add support for Clang's Shadow Call Stack (SCS)
This change adds generic support for Clang's Shadow Call Stack, which uses a shadow stack to protect return addresses from being overwritten by an attacker. Details are available here: https://clang.llvm.org/docs/ShadowCallStack.html Note that security guarantees in the kernel differ from the ones documented for user space. The kernel must store addresses of shadow stacks used by other tasks and interrupt handlers in memory, which means an attacker capable reading and writing arbitrary memory may be able to locate them and hijack control flow by modifying shadow stacks that are not currently in use. Bug: 145210207 Change-Id: Ia5f1650593fa95da4efcf86f84830a20989f161c (am from https://lore.kernel.org/patchwork/patch/1149054/) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> |
||
|
|
84afceb668 |
Merge 4.14.158 into android-4.14
Changes in 4.14.158 Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS" clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX ASoC: compress: fix unsigned integer overflow check reset: Fix memory leak in reset_control_array_put() ASoC: kirkwood: fix external clock probe defer clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume reset: fix reset_control_ops kerneldoc comment clk: at91: avoid sleeping early clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18 idr: Fix idr_alloc_u32 on 32-bit systems x86/resctrl: Prevent NULL pointer dereference when reading mondata clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call net: fec: add missed clk_disable_unprepare in remove bridge: ebtables: don't crash when using dnat target in output chains can: peak_usb: report bus recovery as well can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error watchdog: meson: Fix the wrong value of left time scripts/gdb: fix debugging modules compiled with hot/cold partitioning net: bcmgenet: reapply manual settings to the PHY ceph: return -EINVAL if given fsc mount option on kernel w/o support mac80211: fix station inactive_time shortly after boot block: drbd: remove a stray unlock in __drbd_send_protocol() pwm: bcm-iproc: Prevent unloading the driver module while in use scsi: lpfc: Fix kernel Oops due to null pring pointers scsi: lpfc: Fix dif and first burst use in write commands ARM: dts: Fix up SQ201 flash access ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication parisc: Fix serio address output parisc: Fix HP SDC hpa address output arm64: mm: Prevent mismatched 52-bit VA support arm64: smp: Handle errors reported by the firmware ARM: OMAP1: fix USB configuration for device-only setups RDMA/vmw_pvrdma: Use atomic memory allocation in create AH PM / AVS: SmartReflex: NULL check before some freeing functions is not needed ARM: ks8695: fix section mismatch warning ACPI / LPSS: Ignore acpi_device_fix_up_power() return value scsi: lpfc: Enable Management features for IF_TYPE=6 crypto: user - support incremental algorithm dumps mwifiex: fix potential NULL dereference and use after free mwifiex: debugfs: correct histogram spacing, formatting rtl818x: fix potential use after free xfs: require both realtime inodes to mount ubi: Put MTD device after it is not used ubi: Do not drop UBI device reference before using microblaze: adjust the help to the real behavior microblaze: move "... is ready" messages to arch/microblaze/Makefile iwlwifi: move iwl_nvm_check_version() into dvm gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB kvm: vmx: Set IA32_TSC_AUX for legacy mode guests VSOCK: bind to random port for VMADDR_PORT_ANY mmc: meson-gx: make sure the descriptor is stopped on errors mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET btrfs: only track ref_heads in delayed_ref_updates HID: intel-ish-hid: fixes incorrect error handling serial: 8250: Rate limit serial port rx interrupts during input overruns kprobes/x86/xen: blacklist non-attachable xen interrupt functions xen/pciback: Check dev_data before using it vfio-mdev/samples: Use u8 instead of char for handle functions pinctrl: xway: fix gpio-hog related boot issues net/mlx5: Continue driver initialization despite debugfs failure exofs_mount(): fix leaks on failure exits bnxt_en: Return linux standard errors in bnxt_ethtool.c bnxt_en: query force speeds before disabling autoneg mode. KVM: s390: unregister debug feature on failing arch init pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10 HID: doc: fix wrong data structure reference for UHID_OUTPUT dm flakey: Properly corrupt multi-page bios. gfs2: take jdata unstuff into account in do_grow xfs: Align compat attrlist_by_handle with native implementation. xfs: Fix bulkstat compat ioctls on x32 userspace. IB/qib: Fix an error code in qib_sdma_verbs_send() clocksource/drivers/fttmr010: Fix invalid interrupt register access vxlan: Fix error path in __vxlan_dev_create() powerpc/book3s/32: fix number of bats in p/v_block_mapped() powerpc/xmon: fix dump_segments() drivers/regulator: fix a missing check of return value Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading serial: max310x: Fix tx_empty() callback openrisc: Fix broken paths to arch/or32 RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer scsi: qla2xxx: deadlock by configfs_depend_item scsi: csiostor: fix incorrect dma device in case of vport ath6kl: Only use match sets when firmware supports it ath6kl: Fix off by one error in scan completion powerpc/perf: Fix unit_sel/cache_sel checks powerpc/prom: fix early DEBUG messages powerpc/mm: Make NULL pointer deferences explicit on bad page faults. powerpc/44x/bamboo: Fix PCI range vfio/spapr_tce: Get rid of possible infinite loop powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status drbd: ignore "all zero" peer volume sizes in handshake drbd: reject attach of unsuitable uuids even if connected drbd: do not block when adjusting "disk-options" while IO is frozen drbd: fix print_st_err()'s prototype to match the definition IB/rxe: Make counters thread safe regulator: tps65910: fix a missing check of return value powerpc/83xx: handle machine check caused by watchdog timer powerpc/pseries: Fix node leak in update_lmb_associativity_index() crypto: mxc-scc - fix build warnings on ARM64 pwm: clps711x: Fix period calculation net/netlink_compat: Fix a missing check of nla_parse_nested net/net_namespace: Check the return value of register_pernet_subsys() f2fs: fix to dirty inode synchronously um: Make GCOV depend on !KCOV net: (cpts) fix a missing check of clk_prepare net: stmicro: fix a missing check of clk_prepare net: dsa: bcm_sf2: Propagate error value from mdio_write atl1e: checking the status of atl1e_write_phy_reg tipc: fix a missing check of genlmsg_put net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe() ocfs2: clear journal dirty flag after shutdown journal vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk lib/genalloc.c: use vzalloc_node() to allocate the bitmap fork: fix some -Wmissing-prototypes warnings drivers/base/platform.c: kmemleak ignore a known leak lib/genalloc.c: include vmalloc.h mtd: Check add_mtd_device() ret code tipc: fix memory leak in tipc_nl_compat_publ_dump net/core/neighbour: tell kmemleak about hash tables PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity() net/core/neighbour: fix kmemleak minimal reference count for hash tables serial: 8250: Fix serial8250 initialization crash gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel decnet: fix DN_IFREQ_SIZE net/smc: prevent races between smc_lgr_terminate() and smc_conn_free() blktrace: Show requests without sector tipc: fix skb may be leaky in tipc_link_input sfc: initialise found bitmap in efx_ef10_mtd_probe net: fix possible overflow in __sk_mem_raise_allocated() sctp: don't compare hb_timer expire date before starting it bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id() net: dev: Use unsigned integer as an argument to left-shift kvm: properly check debugfs dentry before using it bpf: drop refcount if bpf_map_new_fd() fails in map_create() net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED iommu/amd: Fix NULL dereference bug in match_hid_uid apparmor: delete the dentry in aafs_remove() to avoid a leak scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery ACPI / APEI: Don't wait to serialise with oops messages when panic()ing ACPI / APEI: Switch estatus pool to use vmalloc memory scsi: libsas: Check SMP PHY control function result powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property() mtd: Remove a debug trace in mtdpart.c mm, gup: add missing refcount overflow checks on s390 clk: at91: fix update bit maps on CFG_MOR write clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated() staging: rtl8192e: fix potential use after free staging: rtl8723bs: Drop ACPI device ids staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P mei: bus: prefix device names on bus with the bus name xfrm: Fix memleak on xfrm state destroy media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE net: macb: fix error format in dev_err() pwm: Clear chip_data in pwm_put() media: atmel: atmel-isc: fix asd memory allocation media: atmel: atmel-isc: fix INIT_WORK misplacement macvlan: schedule bc_work even if error net: psample: fix skb_over_panic openvswitch: fix flow command message size slip: Fix use-after-free Read in slip_open openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info() openvswitch: remove another BUG_ON() tipc: fix link name length check sctp: cache netns in sctp_ep_common net: sched: fix `tc -s class show` no bstats on class with nolock subqueues ext4: add more paranoia checking in ext4_expand_extra_isize handling watchdog: sama5d4: fix WDD value to be always set to max net: macb: Fix SUBNS increment and increase resolution net: macb driver, check for SKBTX_HW_TSTAMP mtd: rawnand: atmel: Fix spelling mistake in error message mtd: rawnand: atmel: fix possible object reference leak mtd: spi-nor: cast to u64 to avoid uint overflows y2038: futex: Move compat implementation into futex.c futex: Prevent robust futex exit race futex: Move futex exit handling into futex code futex: Replace PF_EXITPIDONE with a state exit/exec: Seperate mm_release() futex: Split futex_mm_release() for exit/exec futex: Set task::futex_state to DEAD right after handling futex exit futex: Mark the begin of futex exit explicitly futex: Sanitize exit state handling futex: Provide state handling for exec() as well futex: Add mutex around futex exit futex: Provide distinct return value when owner is exiting futex: Prevent exit livelock HID: core: check whether Usage Page item is after Usage ID items crypto: stm32/hash - Fix hmac issue more than 256 bytes media: stm32-dcmi: fix DMA corruption when stopping streaming hwrng: stm32 - fix unbalanced pm_runtime_enable mailbox: mailbox-test: fix null pointer if no mmio pinctrl: stm32: fix memory leak issue ASoC: stm32: i2s: fix dma configuration ASoC: stm32: i2s: fix 16 bit format support ASoC: stm32: i2s: fix IRQ clearing platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size net: fec: fix clock count mis-match Linux 4.14.158 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
a6dc90f43f |
futex: Split futex_mm_release() for exit/exec
commit 150d71584b12809144b8145b817e83b81158ae5f upstream. To allow separate handling of the futex exit state in the futex exit code for exit and exec, split futex_mm_release() into two functions and invoke them from the corresponding exit/exec_mm_release() callsites. Preparatory only, no functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191106224556.332094221@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
7d79d1c681 |
exit/exec: Seperate mm_release()
commit 4610ba7ad877fafc0a25a30c6c82015304120426 upstream. mm_release() contains the futex exit handling. mm_release() is called from do_exit()->exit_mm() and from exec()->exec_mm(). In the exit_mm() case PF_EXITING and the futex state is updated. In the exec_mm() case these states are not touched. As the futex exit code needs further protections against exit races, this needs to be split into two functions. Preparatory only, no functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191106224556.240518241@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
2f6c5ebbbb |
futex: Move futex exit handling into futex code
commit ba31c1a48538992316cc71ce94fa9cd3e7b427c0 upstream. The futex exit handling is #ifdeffed into mm_release() which is not pretty to begin with. But upcoming changes to address futex exit races need to add more functionality to this exit code. Split it out into a function, move it into futex code and make the various futex exit functions static. Preparatory only and no functional change. Folded build fix from Borislav. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191106224556.049705556@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
7c2ec471cf |
fork: fix some -Wmissing-prototypes warnings
[ Upstream commit fb5bf31722d0805a3f394f7d59f2e8cd07acccb7 ] We get a warning when building kernel with W=1: kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes] kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes] Add the missing declaration in head file to fix this. Also, remove arch_release_thread_stack() completely because no arch seems to implement it since bb9d81264 (arch: remove tile port). Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cn Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
234de92896 |
Merge 4.14.150 into android-4.14
Changes in 4.14.150
panic: ensure preemption is disabled during panic()
f2fs: use EINVAL for superblock with invalid magic
USB: rio500: Remove Rio 500 kernel driver
USB: yurex: Don't retry on unexpected errors
USB: yurex: fix NULL-derefs on disconnect
USB: usb-skeleton: fix runtime PM after driver unbind
USB: usb-skeleton: fix NULL-deref on disconnect
xhci: Fix false warning message about wrong bounce buffer write length
xhci: Prevent device initiated U1/U2 link pm if exit latency is too long
xhci: Check all endpoints for LPM timeout
usb: xhci: wait for CNR controller not ready bit in xhci resume
xhci: Increase STS_SAVE timeout in xhci_suspend()
USB: adutux: remove redundant variable minor
USB: adutux: fix use-after-free on disconnect
USB: adutux: fix NULL-derefs on disconnect
USB: adutux: fix use-after-free on release
USB: iowarrior: fix use-after-free on disconnect
USB: iowarrior: fix use-after-free on release
USB: iowarrior: fix use-after-free after driver unbind
USB: usblp: fix runtime PM after driver unbind
USB: chaoskey: fix use-after-free on release
USB: ldusb: fix NULL-derefs on driver unbind
serial: uartlite: fix exit path null pointer
USB: serial: keyspan: fix NULL-derefs on open() and write()
USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20
USB: serial: option: add Telit FN980 compositions
USB: serial: option: add support for Cinterion CLS8 devices
USB: serial: fix runtime PM after driver unbind
USB: usblcd: fix I/O after disconnect
USB: microtek: fix info-leak at probe
USB: dummy-hcd: fix power budget for SuperSpeed mode
usb: renesas_usbhs: gadget: Do not discard queues in usb_ep_set_{halt,wedge}()
usb: renesas_usbhs: gadget: Fix usb_ep_set_{halt,wedge}() behavior
USB: legousbtower: fix slab info leak at probe
USB: legousbtower: fix deadlock on disconnect
USB: legousbtower: fix potential NULL-deref on disconnect
USB: legousbtower: fix open after failed reset request
USB: legousbtower: fix use-after-free on release
staging: vt6655: Fix memory leak in vt6655_probe
iio: adc: ad799x: fix probe error handling
iio: adc: axp288: Override TS pin bias current for some models
iio: light: opt3001: fix mutex unlock race
efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified
perf llvm: Don't access out-of-scope array
perf inject jit: Fix JIT_CODE_MOVE filename
CIFS: Gracefully handle QueryInfo errors during open
CIFS: Force revalidate inode when dentry is stale
CIFS: Force reval dentry if LOOKUP_REVAL flag is set
kernel/sysctl.c: do not override max_threads provided by userspace
firmware: google: increment VPD key_len properly
gpiolib: don't clear FLAG_IS_OUT when emulating open-drain/open-source
Staging: fbtft: fix memory leak in fbtft_framebuffer_alloc
iio: hx711: add delay until DOUT is ready
iio: adc: hx711: fix bug in sampling of data
btrfs: fix incorrect updating of log root tree
NFS: Fix O_DIRECT accounting of number of bytes read/written
MIPS: Disable Loongson MMI instructions for kernel build
Fix the locking in dcache_readdir() and friends
media: stkwebcam: fix runtime PM after driver unbind
tracing/hwlat: Report total time spent in all NMIs during the sample
tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
ftrace: Get a reference counter for the trace_array on filter files
tracing: Get trace_array reference for available_tracers files
x86/asm: Fix MWAITX C-state hint value
xfs: clear sb->s_fs_info on mount failure
Linux 4.14.150
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
||
|
|
b27c133657 |
kernel/sysctl.c: do not override max_threads provided by userspace
commit b0f53dbc4bc4c371f38b14c391095a3bb8a0bb40 upstream. Partially revert |
||
|
|
e44e96da7f |
UPSTREAM: pidfd: add polling support
This patch adds polling support to pidfd. Android low memory killer (LMK) needs to know when a process dies once it is sent the kill signal. It does so by checking for the existence of /proc/pid which is both racy and slow. For example, if a PID is reused between when LMK sends a kill signal and checks for existence of the PID, since the wrong PID is now possibly checked for existence. Using the polling support, LMK will be able to get notified when a process exists in race-free and fast way, and allows the LMK to do other things (such as by polling on other fds) while awaiting the process being killed to die. For notification to polling processes, we follow the same existing mechanism in the kernel used when the parent of the task group is to be notified of a child's death (do_notify_parent). This is precisely when the tasks waiting on a poll of pidfd are also awakened in this patch. We have decided to include the waitqueue in struct pid for the following reasons: 1. The wait queue has to survive for the lifetime of the poll. Including it in task_struct would not be option in this case because the task can be reaped and destroyed before the poll returns. 2. By including the struct pid for the waitqueue means that during de_thread(), the new thread group leader automatically gets the new waitqueue/pid even though its task_struct is different. Appropriate test cases are added in the second patch to provide coverage of all the cases the patch is handling. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Jonathan Kowalski <bl0pbl33p@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: kernel-team@android.com Reviewed-by: Oleg Nesterov <oleg@redhat.com> Co-developed-by: Daniel Colascione <dancol@google.com> Signed-off-by: Daniel Colascione <dancol@google.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Christian Brauner <christian@brauner.io> (cherry picked from commit b53b0b9d9a613c418057f6cb921c2f40a6f78c24) Bug: 135608568 Test: test program using syscall(__NR_sys_pidfd_open,..) and poll() Change-Id: I02f259d2875bec46b198d580edfbb067f077084e Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
|
|
379fed52b7 |
UPSTREAM: fork: do not release lock that wasn't taken
Avoid calling cgroup_threadgroup_change_end() without having called
cgroup_threadgroup_change_begin() first.
During process creation we need to check whether the cgroup we are in
allows us to fork. To perform this check the cgroup needs to guard itself
against threadgroup changes and takes a lock.
Prior to CLONE_PIDFD the cleanup target "bad_fork_free_pid" would also need
to call cgroup_threadgroup_change_end() because said lock had already been
taken.
However, this is not the case anymore with the addition of CLONE_PIDFD. We
are now allocating a pidfd before we check whether the cgroup we're in can
fork and thus prior to taking the lock. So when copy_process() fails at the
right step it would release a lock we haven't taken.
This bug is not even very subtle to be honest. It's just not very clear
from the naming of cgroup_threadgroup_change_{begin,end}() that a lock is
taken.
Here's the relevant splat:
entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fec849
Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90
90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90
90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078
RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000
RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(depth <= 0)
WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052 __lock_release
kernel/locking/lockdep.c:4052 [inline]
WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052
lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 7744 Comm: syz-executor007 Not tainted 5.1.0+ #4
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
panic+0x2cb/0x65c kernel/panic.c:214
__warn.cold+0x20/0x45 kernel/panic.c:566
report_bug+0x263/0x2b0 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:179 [inline]
fixup_bug arch/x86/kernel/traps.c:174 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:972
RIP: 0010:__lock_release kernel/locking/lockdep.c:4052 [inline]
RIP: 0010:lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321
Code: 0f 85 a0 03 00 00 8b 35 77 66 08 08 85 f6 75 23 48 c7 c6 a0 55 6b 87
48 c7 c7 40 25 6b 87 4c 89 85 70 ff ff ff e8 b7 a9 eb ff <0f> 0b 4c 8b 85
70 ff ff ff 4c 89 ea 4c 89 e6 4c 89 c7 e8 52 63 ff
RSP: 0018:ffff888094117b48 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 1ffff11012822f6f RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815af236 RDI: ffffed1012822f5b
RBP: ffff888094117c00 R08: ffff888092bfc400 R09: fffffbfff113301d
R10: fffffbfff113301c R11: ffffffff889980e3 R12: ffffffff8a451df8
R13: ffffffff8142e71f R14: ffffffff8a44cc80 R15: ffff888094117bd8
percpu_up_read.constprop.0+0xcb/0x110 include/linux/percpu-rwsem.h:92
cgroup_threadgroup_change_end include/linux/cgroup-defs.h:712 [inline]
copy_process.part.0+0x47ff/0x6710 kernel/fork.c:2222
copy_process kernel/fork.c:1772 [inline]
_do_fork+0x25d/0xfd0 kernel/fork.c:2338
__do_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:240 [inline]
__se_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:236 [inline]
__ia32_compat_sys_x86_clone+0xbc/0x140 arch/x86/ia32/sys_ia32.c:236
do_syscall_32_irqs_on arch/x86/entry/common.c:334 [inline]
do_fast_syscall_32+0x281/0xd54 arch/x86/entry/common.c:405
entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fec849
Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90
90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90
90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078
RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000
RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..
Reported-and-tested-by: syzbot+3286e58549edc479faae@syzkaller.appspotmail.com
Fixes: b3e583825266 ("clone: add CLONE_PIDFD")
Signed-off-by: Christian Brauner <christian@brauner.io>
(cherry picked from commit c3b7112df86b769927a60a6d7175988ca3d60f09)
Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: Ib9ecb1e5c0c6e2d062b89c25109ec571570eb497
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
|
||
|
|
4f3acf41da |
BACKPORT: clone: add CLONE_PIDFD
This patchset makes it possible to retrieve pid file descriptors at process creation time by introducing the new flag CLONE_PIDFD to the clone() system call. Linus originally suggested to implement this as a new flag to clone() instead of making it a separate system call. As spotted by Linus, there is exactly one bit for clone() left. CLONE_PIDFD creates file descriptors based on the anonymous inode implementation in the kernel that will also be used to implement the new mount api. They serve as a simple opaque handle on pids. Logically, this makes it possible to interpret a pidfd differently, narrowing or widening the scope of various operations (e.g. signal sending). Thus, a pidfd cannot just refer to a tgid, but also a tid, or in theory - given appropriate flag arguments in relevant syscalls - a process group or session. A pidfd does not represent a privilege. This does not imply it cannot ever be that way but for now this is not the case. A pidfd comes with additional information in fdinfo if the kernel supports procfs. The fdinfo file contains the pid of the process in the callers pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d". As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the parent_tidptr argument of clone. This has the advantage that we can give back the associated pid and the pidfd at the same time. To remove worries about missing metadata access this patchset comes with a sample program that illustrates how a combination of CLONE_PIDFD, and pidfd_send_signal() can be used to gain race-free access to process metadata through /proc/<pid>. The sample program can easily be translated into a helper that would be suitable for inclusion in libc so that users don't have to worry about writing it themselves. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <christian@brauner.io> Co-developed-by: Jann Horn <jannh@google.com> Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> (cherry picked from commit b3e5838252665ee4cfa76b82bdf1198dca81e5be) Conflicts: kernel/fork.c (1. Replaced proc_pid_ns() with its direct implementation.) Bug: 135608568 Test: test program using syscall(__NR_sys_pidfd_open,..) and poll() Change-Id: I3c804a92faea686e5bf7f99df893fe3a5d87ddf7 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
|
|
7870b283a5 |
Merge 4.14.136 into android-4.14-q
Changes in 4.14.136 VSOCK: use TCP state constants for sk_state vsock: correct removal of socket from the list NFS: Fix dentry revalidation on NFSv4 lookup NFS: Refactor nfs_lookup_revalidate() NFSv4: Fix lookup revalidate of regular files arm64: dts: marvell: Fix A37xx UART0 register size i2c: qup: fixed releasing dma without flush operation completion arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ binder: fix possible UAF when freeing buffer ISDN: hfcsusb: checking idx of ep configuration media: au0828: fix null dereference in error path ath10k: Change the warning message string media: cpia2_usb: first wake up, then free in disconnect media: pvrusb2: use a different format for warnings NFS: Cleanup if nfs_match_client is interrupted media: radio-raremono: change devm_k*alloc to k*alloc iommu/vt-d: Don't queue_iova() if there is no flush queue iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA hv_sock: Add support for delayed close Bluetooth: hci_uart: check for missing tty operations sched/fair: Don't free p->numa_faults with concurrent readers drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl Fix allyesconfig output. ceph: hold i_ceph_lock when removing caps for freeing inode ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL Linux 4.14.136 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
20c71e6d5a |
Merge 4.14.136 into android-4.14
Changes in 4.14.136 VSOCK: use TCP state constants for sk_state vsock: correct removal of socket from the list NFS: Fix dentry revalidation on NFSv4 lookup NFS: Refactor nfs_lookup_revalidate() NFSv4: Fix lookup revalidate of regular files arm64: dts: marvell: Fix A37xx UART0 register size i2c: qup: fixed releasing dma without flush operation completion arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ binder: fix possible UAF when freeing buffer ISDN: hfcsusb: checking idx of ep configuration media: au0828: fix null dereference in error path ath10k: Change the warning message string media: cpia2_usb: first wake up, then free in disconnect media: pvrusb2: use a different format for warnings NFS: Cleanup if nfs_match_client is interrupted media: radio-raremono: change devm_k*alloc to k*alloc iommu/vt-d: Don't queue_iova() if there is no flush queue iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA hv_sock: Add support for delayed close Bluetooth: hci_uart: check for missing tty operations sched/fair: Don't free p->numa_faults with concurrent readers drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl Fix allyesconfig output. ceph: hold i_ceph_lock when removing caps for freeing inode ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL Linux 4.14.136 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
d0919216e4 |
sched/fair: Don't free p->numa_faults with concurrent readers
commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream.
When going through execve(), zero out the NUMA fault statistics instead of
freeing them.
During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.
Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes:
|
||
|
|
817de622e8 |
Merge 4.14.121 into android-4.14-q
Changes in 4.14.121
net: core: another layer of lists, around PF_MEMALLOC skb handling
locking/rwsem: Prevent decrement of reader count before increment
PCI: hv: Fix a memory leak in hv_eject_device_work()
PCI: hv: Add hv_pci_remove_slots() when we unload the driver
PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary
x86/speculation/mds: Revert CPU buffer clear on double fault exit
x86/speculation/mds: Improve CPU buffer clear documentation
objtool: Fix function fallthrough detection
ARM: dts: exynos: Fix interrupt for shared EINTs on Exynos5260
ARM: dts: exynos: Fix audio (microphone) routing on Odroid XU3
ARM: exynos: Fix a leaked reference by adding missing of_node_put
power: supply: axp288_charger: Fix unchecked return value
arm64: compat: Reduce address limit
arm64: Clear OSDLR_EL1 on CPU boot
arm64: Save and restore OSDLR_EL1 across suspend/resume
sched/x86: Save [ER]FLAGS on context switch
crypto: chacha20poly1305 - set cra_name correctly
crypto: vmx - fix copy-paste error in CTR mode
crypto: skcipher - don't WARN on unprocessed data after slow walk step
crypto: crct10dif-generic - fix use via crypto_shash_digest()
crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
crypto: rockchip - update IV buffer to contain the next IV
crypto: arm/aes-neonbs - don't access already-freed walk.iv
ALSA: usb-audio: Fix a memory leak bug
ALSA: hda/hdmi - Read the pin sense from register when repolling
ALSA: hda/hdmi - Consider eld_valid when reporting jack event
ALSA: hda/realtek - EAPD turn on later
ASoC: max98090: Fix restore of DAPM Muxes
ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
bpf, arm64: remove prefetch insn in xadd mapping
mm/mincore.c: make mincore() more conservative
ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
userfaultfd: use RCU to free the task struct when fork fails
mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
mfd: max77620: Fix swapped FPS_PERIOD_MAX_US values
mtd: spi-nor: intel-spi: Avoid crossing 4K address boundary on read/write
tty: vt.c: Fix TIOCL_BLANKSCREEN console blanking if blankinterval == 0
tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
jbd2: check superblock mapped prior to committing
ext4: make sanity check in mballoc more strict
ext4: ignore e_value_offs for xattrs with value-in-ea-inode
ext4: avoid drop reference to iloc.bh twice
Btrfs: do not start a transaction during fiemap
Btrfs: do not start a transaction at iterate_extent_inodes()
bcache: fix a race between cache register and cacheset unregister
bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
ext4: fix use-after-free race with debug_want_extra_isize
ext4: actually request zeroing of inode table after grow
ext4: fix ext4_show_options for file systems w/o journal
ipmi:ssif: compare block number correctly for multi-part return messages
crypto: arm64/aes-neonbs - don't access already-freed walk.iv
crypto: salsa20 - don't access already-freed walk.iv
crypto: ccm - fix incompatibility between "ccm" and "ccm_base"
fib_rules: fix error in backport of e9919a24d302 ("fib_rules: return 0...")
fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
ext4: zero out the unused memory region in the extent tree block
ext4: fix data corruption caused by overlapping unaligned and aligned IO
ext4: fix use-after-free in dx_release()
ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
iov_iter: optimize page_copy_sane()
ext4: fix compile error when using BUFFER_TRACE
Linux 4.14.121
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
||
|
|
2470653b00 |
Merge 4.14.121 into android-4.14
Changes in 4.14.121
net: core: another layer of lists, around PF_MEMALLOC skb handling
locking/rwsem: Prevent decrement of reader count before increment
PCI: hv: Fix a memory leak in hv_eject_device_work()
PCI: hv: Add hv_pci_remove_slots() when we unload the driver
PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary
x86/speculation/mds: Revert CPU buffer clear on double fault exit
x86/speculation/mds: Improve CPU buffer clear documentation
objtool: Fix function fallthrough detection
ARM: dts: exynos: Fix interrupt for shared EINTs on Exynos5260
ARM: dts: exynos: Fix audio (microphone) routing on Odroid XU3
ARM: exynos: Fix a leaked reference by adding missing of_node_put
power: supply: axp288_charger: Fix unchecked return value
arm64: compat: Reduce address limit
arm64: Clear OSDLR_EL1 on CPU boot
arm64: Save and restore OSDLR_EL1 across suspend/resume
sched/x86: Save [ER]FLAGS on context switch
crypto: chacha20poly1305 - set cra_name correctly
crypto: vmx - fix copy-paste error in CTR mode
crypto: skcipher - don't WARN on unprocessed data after slow walk step
crypto: crct10dif-generic - fix use via crypto_shash_digest()
crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
crypto: rockchip - update IV buffer to contain the next IV
crypto: arm/aes-neonbs - don't access already-freed walk.iv
ALSA: usb-audio: Fix a memory leak bug
ALSA: hda/hdmi - Read the pin sense from register when repolling
ALSA: hda/hdmi - Consider eld_valid when reporting jack event
ALSA: hda/realtek - EAPD turn on later
ASoC: max98090: Fix restore of DAPM Muxes
ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
bpf, arm64: remove prefetch insn in xadd mapping
mm/mincore.c: make mincore() more conservative
ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
userfaultfd: use RCU to free the task struct when fork fails
mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
mfd: max77620: Fix swapped FPS_PERIOD_MAX_US values
mtd: spi-nor: intel-spi: Avoid crossing 4K address boundary on read/write
tty: vt.c: Fix TIOCL_BLANKSCREEN console blanking if blankinterval == 0
tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
jbd2: check superblock mapped prior to committing
ext4: make sanity check in mballoc more strict
ext4: ignore e_value_offs for xattrs with value-in-ea-inode
ext4: avoid drop reference to iloc.bh twice
Btrfs: do not start a transaction during fiemap
Btrfs: do not start a transaction at iterate_extent_inodes()
bcache: fix a race between cache register and cacheset unregister
bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
ext4: fix use-after-free race with debug_want_extra_isize
ext4: actually request zeroing of inode table after grow
ext4: fix ext4_show_options for file systems w/o journal
ipmi:ssif: compare block number correctly for multi-part return messages
crypto: arm64/aes-neonbs - don't access already-freed walk.iv
crypto: salsa20 - don't access already-freed walk.iv
crypto: ccm - fix incompatibility between "ccm" and "ccm_base"
fib_rules: fix error in backport of e9919a24d302 ("fib_rules: return 0...")
fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
ext4: zero out the unused memory region in the extent tree block
ext4: fix data corruption caused by overlapping unaligned and aligned IO
ext4: fix use-after-free in dx_release()
ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
iov_iter: optimize page_copy_sane()
ext4: fix compile error when using BUFFER_TRACE
Linux 4.14.121
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
||
|
|
851d1a7cc4 |
userfaultfd: use RCU to free the task struct when fork fails
commit c3f3ce049f7d97cc7ec9c01cb51d9ec74e0f37c2 upstream.
The task structure is freed while get_mem_cgroup_from_mm() holds
rcu_read_lock() and dereferences mm->owner.
get_mem_cgroup_from_mm() failing fork()
---- ---
task = mm->owner
mm->owner = NULL;
free(task)
if (task) *task; /* use after free */
The fix consists in freeing the task with RCU also in the fork failure
case, exactly like it always happens for the regular exit(2) path. That
is enough to make the rcu_read_lock hold in get_mem_cgroup_from_mm()
(left side above) effective to avoid a use after free when dereferencing
the task structure.
An alternate possible fix would be to defer the delivery of the
userfaultfd contexts to the monitor until after fork() is guaranteed to
succeed. Such a change would require more changes because it would
create a strict ordering dependency where the uffd methods would need to
be called beyond the last potentially failing branch in order to be
safe. This solution as opposed only adds the dependency to common code
to set mm->owner to NULL and to free the task struct that was pointed by
mm->owner with RCU, if fork ends up failing. The userfaultfd methods
can still be called anywhere during the fork runtime and the monitor
will keep discarding orphaned "mm" coming from failed forks in userland.
This race condition couldn't trigger if CONFIG_MEMCG was set =n at build
time.
[aarcange@redhat.com: improve changelog, reduce #ifdefs per Michal]
Link: http://lkml.kernel.org/r/20190429035752.4508-1-aarcange@redhat.com
Link: http://lkml.kernel.org/r/20190325225636.11635-2-aarcange@redhat.com
Fixes:
|
||
|
|
94eab674ff |
UPSTREAM: psi: pressure stall information for CPU, memory, and IO
When systems are overcommitted and resources become contended, it's hard
to tell exactly the impact this has on workload productivity, or how close
the system is to lockups and OOM kills. In particular, when machines work
multiple jobs concurrently, the impact of overcommit in terms of latency
and throughput on the individual job can be enormous.
In order to maximize hardware utilization without sacrificing individual
job health or risk complete machine lockups, this patch implements a way
to quantify resource pressure in the system.
A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or IO,
respectively. Stall states are aggregate versions of the per-task delay
accounting delays:
cpu: some tasks are runnable but not executing on a CPU
memory: tasks are reclaiming, or waiting for swapin or thrashing cache
io: tasks are waiting for io completions
These percentages of walltime can be thought of as pressure percentages,
and they give a general sense of system health and productivity loss
incurred by resource overcommit. They can also indicate when the system
is approaching lockup scenarios and OOMs.
To do this, psi keeps track of the task states associated with each CPU
and samples the time they spend in stall states. Every 2 seconds, the
samples are averaged across CPUs - weighted by the CPUs' non-idle time to
eliminate artifacts from unused CPUs - and translated into percentages of
walltime. A running average of those percentages is maintained over 10s,
1m, and 5m periods (similar to the loadaverage).
[hannes@cmpxchg.org: doc fixlet, per Randy]
Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org
[hannes@cmpxchg.org: code optimization]
Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org
[hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter]
Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org
[hannes@cmpxchg.org: fix build]
Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org
Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit eb414681d5a07d28d2ff90dc05f69ec6b232ebd2)
Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Id00d23c977169b0c4636d92016fc1fee0274be05
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
|
||
|
|
da2880fa47 |
Merge 4.14.93 into android-4.14
Changes in 4.14.93
pinctrl: meson: fix pull enable register calculation
powerpc: Fix COFF zImage booting on old powermacs
powerpc/mm: Fix linux page tables build with some configs
HID: ite: Add USB id match for another ITE based keyboard rfkill key quirk
ARM: imx: update the cpu power up timing setting on i.mx6sx
ARM: dts: imx7d-nitrogen7: Fix the description of the Wifi clock
Input: restore EV_ABS ABS_RESERVED
checkstack.pl: fix for aarch64
xfrm: Fix error return code in xfrm_output_one()
xfrm: Fix bucket count reported to userspace
xfrm: Fix NULL pointer dereference in xfrm_input when skb_dst_force clears the dst_entry.
netfilter: seqadj: re-load tcp header pointer after possible head reallocation
scsi: bnx2fc: Fix NULL dereference in error handling
Input: omap-keypad - fix idle configuration to not block SoC idle states
Input: synaptics - enable RMI on ThinkPad T560
ibmvnic: Fix non-atomic memory allocation in IRQ context
ieee802154: ca8210: fix possible u8 overflow in ca8210_rx_done
x86/mm: Fix guard hole handling
x86/dump_pagetables: Fix LDT remap address marker
i40e: fix mac filter delete when setting mac address
netfilter: ipset: do not call ipset_nest_end after nla_nest_cancel
netfilter: nat: can't use dst_hold on noref dst
bnx2x: Clear fip MAC when fcoe offload support is disabled
bnx2x: Remove configured vlans as part of unload sequence.
bnx2x: Send update-svid ramrod with retry/poll flags enabled
scsi: target: iscsi: cxgbit: fix csk leak
scsi: target: iscsi: cxgbit: add missing spin_lock_init()
x86, hyperv: remove PCI dependency
drivers: net: xgene: Remove unnecessary forward declarations
w90p910_ether: remove incorrect __init annotation
net: hns: Incorrect offset address used for some registers.
net: hns: All ports can not work when insmod hns ko after rmmod.
net: hns: Some registers use wrong address according to the datasheet.
net: hns: Fixed bug that netdev was opened twice
net: hns: Clean rx fbd when ae stopped.
net: hns: Free irq when exit from abnormal branch
net: hns: Avoid net reset caused by pause frames storm
net: hns: Fix ntuple-filters status error.
net: hns: Add mac pcs config when enable|disable mac
net: hns: Fix ping failed when use net bridge and send multicast
SUNRPC: Fix a race with XPRT_CONNECTING
qed: Fix an error code qed_ll2_start_xmit()
net: macb: fix random memory corruption on RX with 64-bit DMA
net: macb: fix dropped RX frames due to a race
lan78xx: Resolve issue with changing MAC address
vxge: ensure data0 is initialized in when fetching firmware version information
mac80211: free skb fraglist before freeing the skb
kbuild: fix false positive warning/error about missing libelf
virtio: fix test build after uio.h change
gpio: mvebu: only fail on missing clk if pwm is actually to be used
Input: synaptics - enable SMBus for HP EliteBook 840 G4
net: netxen: fix a missing check and an uninitialized use
qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixup
serial/sunsu: fix refcount leak
scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
scsi: lpfc: do not set queue->page_count to 0 if pc_sli4_params.wqpcnt is invalid
genirq/affinity: Don't return with empty affinity masks on error
tools: fix cross-compile var clobbering
fork: record start_time late
zram: fix double free backing device
hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
mm, devm_memremap_pages: kill mapping "System RAM" support
mm, hmm: use devm semantics for hmm_devmem_{add, remove}
mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL
mm, swap: fix swapoff with KSM pages
sunrpc: fix cache_head leak due to queued request
sunrpc: use SVC_NET() in svcauth_gss_* functions
powerpc: avoid -mno-sched-epilog on GCC 4.9 and newer
powerpc: Disable -Wbuiltin-requires-header when setjmp is used
ftrace: Build with CPPFLAGS to get -Qunused-arguments
md: raid10: remove VLAIS
kbuild: add -no-integrated-as Clang option unconditionally
kbuild: consolidate Clang compiler flags
Makefile: Export clang toolchain variables
powerpc/boot: Set target when cross-compiling for clang
raid6/ppc: Fix build for clang
vhost/vsock: fix uninitialized vhost_vsock->guest_cid
dm verity: fix crash on bufio buffer that was allocated with vmalloc
dm zoned: Fix target BIO completion handling
ALSA: cs46xx: Potential NULL dereference in probe
ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit()
ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks
dlm: fixed memory leaks after failed ls_remove_names allocation
dlm: possible memory leak on error path in create_lkb()
dlm: lost put_lkb on error path in receive_convert() and receive_unlock()
dlm: memory leaks on error path in dlm_user_request()
gfs2: Get rid of potential double-freeing in gfs2_create_inode
gfs2: Fix loop in gfs2_rbm_find
b43: Fix error in cordic routine
selinux: policydb - fix byte order and alignment issues
lockd: Show pid of lockd for remote locks
scripts/kallsyms: filter arm64's __efistub_ symbols
arm64: drop linker script hack to hide __efistub_ symbols
arm64: relocatable: fix inconsistencies in linker script and options
powerpc/tm: Set MSR[TS] just prior to recheckpoint
9p/net: put a lower bound on msize
rxe: fix error completion wr_id and qp_num
iommu/vt-d: Handle domain agaw being less than iommu agaw
sched/fair: Fix infinite loop in update_blocked_averages() by reverting
|
||
|
|
3f2e4e1d9a |
fork: record start_time late
commit 7b55851367136b1efd84d98fea81ba57a98304cf upstream. This changes the fork(2) syscall to record the process start_time after initializing the basic task structure but still before making the new process visible to user-space. Technically, we could record the start_time anytime during fork(2). But this might lead to scenarios where a start_time is recorded long before a process becomes visible to user-space. For instance, with userfaultfd(2) and TLS, user-space can delay the execution of fork(2) for an indefinite amount of time (and will, if this causes network access, or similar). By recording the start_time late, it much closer reflects the point in time where the process becomes live and can be observed by other processes. Lastly, this makes it much harder for user-space to predict and control the start_time they get assigned. Previously, user-space could fork a process and stall it in copy_thread_tls() before its pid is allocated, but after its start_time is recorded. This can be misused to later-on cycle through PIDs and resume the stalled fork(2) yielding a process that has the same pid and start_time as a process that existed before. This can be used to circumvent security systems that identify processes by their pid+start_time combination. Even though user-space was always aware that start_time recording is flaky (but several projects are known to still rely on start_time-based identification), changing the start_time to be recorded late will help mitigate existing attacks and make it much harder for user-space to control the start_time a process gets assigned. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
f8223ece3a |
Merge 4.14.70 into android-4.14
Changes in 4.14.70 act_ife: fix a potential use-after-free ipv4: tcp: send zero IPID for RST and ACK sent in SYN-RECV and TIME-WAIT state net: bcmgenet: use MAC link status for fixed phy net: macb: do not disable MDIO bus at open/close time net: sched: Fix memory exposure from short TCA_U32_SEL qlge: Fix netdev features configuration. r8169: add support for NCube 8168 network card tcp: do not restart timewait timer on rst reception vti6: remove !skb->ignore_df check from vti6_xmit() net/sched: act_pedit: fix dump of extended layered op tipc: fix a missing rhashtable_walk_exit() nfp: wait for posted reconfigs when disabling the device sctp: hold transport before accessing its asoc in sctp_transport_get_next mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge vhost: correctly check the iova range when waking virtqueue hv_netvsc: ignore devices that are not PCI hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe() act_ife: move tcfa_lock down to where necessary act_ife: fix a potential deadlock net: sched: action_ife: take reference to meta module cifs: check if SMB2 PDU size has been padded and suppress the warning hfsplus: don't return 0 when fill_super() failed hfs: prevent crash on exit from failed search sunrpc: Don't use stack buffer with scatterlist fork: don't copy inconsistent signal handler state to child reiserfs: change j_timestamp type to time64_t hfsplus: fix NULL dereference in hfsplus_lookup() fs/proc/kcore.c: use __pa_symbol() for KCORE_TEXT list entries fat: validate ->i_start before using scripts: modpost: check memory allocation results virtio: pci-legacy: Validate queue pfn x86/mce: Add notifier_block forward declaration IB/hfi1: Invalid NUMA node information can cause a divide by zero pwm: meson: Fix mux clock names mm/fadvise.c: fix signed overflow UBSAN complaint fs/dcache.c: fix kmemcheck splat at take_dentry_name_snapshot() platform/x86: intel_punit_ipc: fix build errors netfilter: ip6t_rpfilter: set F_IFACE for linklocal addresses s390/kdump: Fix memleak in nt_vmcoreinfo ipvs: fix race between ip_vs_conn_new() and ip_vs_del_dest() mfd: sm501: Set coherent_dma_mask when creating subdevices platform/x86: asus-nb-wmi: Add keymap entry for lid flip action on UX360 netfilter: fix memory leaks on netlink_dump_start error tcp, ulp: add alias for all ulp modules RDMA/hns: Fix usage of bitmap allocation functions return values net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero net: hns3: Fix for phy link issue when using marvell phy driver perf tools: Check for null when copying nsinfo. irqchip/bcm7038-l1: Hide cpu offline callback when building for !SMP net/9p/trans_fd.c: fix race by holding the lock net/9p: fix error path of p9_virtio_probe f2fs: fix to clear PG_checked flag in set_page_dirty() powerpc/uaccess: Enable get_user(u64, *p) on 32-bit powerpc: Fix size calculation using resource_size() perf probe powerpc: Fix trace event post-processing block: bvec_nr_vecs() returns value for wrong slab s390/dasd: fix hanging offline processing due to canceled worker s390/dasd: fix panic for failed online processing ACPI / scan: Initialize status to ACPI_STA_DEFAULT scsi: aic94xx: fix an error code in aic94xx_init() NFSv4: Fix error handling in nfs4_sp4_select_mode() Input: do not use WARN() in input_alloc_absinfo() xen/balloon: fix balloon initialization for PVH Dom0 PCI: mvebu: Fix I/O space end address calculation dm kcopyd: avoid softlockup in run_complete_job staging: comedi: ni_mio_common: fix subdevice flags for PFI subdevice ASoC: rt5677: Fix initialization of rt5677_of_match.data iommu/omap: Fix cache flushes on L2 table entries selftests/powerpc: Kill child processes on SIGINT RDS: IB: fix 'passing zero to ERR_PTR()' warning cfq: Suppress compiler warnings about comparisons smb3: fix reset of bytes read and written stats SMB3: Number of requests sent should be displayed for SMB3 not just CIFS powerpc/platforms/85xx: fix t1042rdb_diu.c build errors & warning powerpc/64s: Make rfi_flush_fallback a little more robust powerpc/pseries: Avoid using the size greater than RTAS_ERROR_LOG_MAX. clk: rockchip: Add pclk_rkpwm_pmu to PMU critical clocks in rk3399 KVM: vmx: track host_state.loaded using a loaded_vmcs pointer kvm: nVMX: Fix fault vector for VMX operation at CPL > 0 btrfs: Exit gracefully when chunk map cannot be inserted to the tree btrfs: replace: Reset on-disk dev stats value after replace btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized btrfs: Don't remove block group that still has pinned down bytes arm64: rockchip: Force CONFIG_PM on Rockchip systems ARM: rockchip: Force CONFIG_PM on Rockchip systems drm/i915/lpe: Mark LPE audio runtime pm as "no callbacks" drm/amdgpu: Fix RLC safe mode test in gfx_v9_0_enter_rlc_safe_mode drm/amd/pp/Polaris12: Fix a chunk of registers missed to program drm/edid: Add 6 bpc quirk for SDC panel in Lenovo B50-80 drm/amdgpu: update tmr mc address drm/amdgpu:add tmr mc address into amdgpu_firmware_info drm/amdgpu:add new firmware id for VCN drm/amdgpu:add VCN support in PSP driver drm/amdgpu:add VCN booting with firmware loaded by PSP uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name debugobjects: Make stack check warning more informative sched/deadline: Fix switching to -deadline lightnvm: pblk: free padded entries in write buffer mm: Fix devm_memremap_pages() collision handling HID: add quirk for another PIXART OEM mouse used by HP usb: dwc3: core: Fix ULPI PHYs and prevent phy_get/ulpi_init during suspend/resume x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear x86/xen: don't write ptes directly in 32-bit PV guests drm/i915: Increase LSPCON timeout kbuild: make missing $DEPMOD a Warning instead of an Error s390/lib: use expoline for all bcr instructions irda: Fix memory leak caused by repeated binds of irda socket irda: Only insert new objects into the global database via setsockopt Revert "ARM: imx_v6_v7_defconfig: Select ULPI support" kvm: x86: Set highest physical address bits in non-present/reserved SPTEs x86: kvm: avoid unused variable warning arm64: cpu_errata: include required headers ASoC: wm8994: Fix missing break in switch arm64: Fix mismatched cache line size detection arm64: Handle mismatched cache type Linux 4.14.70 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |