305 Commits

Author SHA1 Message Date
Nathan Chancellor
79a097e001 Merge 4.4.179 into android-msm-wahoo-4.4
Changes in 4.4.179: (170 commits)
        arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
        arm64: debug: Ensure debug handlers check triggering exception level
        ext4: cleanup bh release code in ext4_ind_remove_space()
        lib/int_sqrt: optimize initial value compute
        tty/serial: atmel: Add is_half_duplex helper
        mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
        i2c: core-smbus: prevent stack corruption on read I2C_BLOCK_DATA
        Bluetooth: Fix decrementing reference count twice in releasing socket
        tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped
        CIFS: fix POSIX lock leak and invalid ptr deref
        h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
        tracing: kdb: Fix ftdump to not sleep
        gpio: gpio-omap: fix level interrupt idling
        sysctl: handle overflow for file-max
        enic: fix build warning without CONFIG_CPUMASK_OFFSTACK
        mm/cma.c: cma_declare_contiguous: correct err handling
        mm/page_ext.c: fix an imbalance with kmemleak
        mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512!
        mm/slab.c: kmemleak no scan alien caches
        ocfs2: fix a panic problem caused by o2cb_ctl
        f2fs: do not use mutex lock in atomic context
        fs/file.c: initialize init_files.resize_wait
        cifs: use correct format characters
        dm thin: add sanity checks to thin-pool and external snapshot creation
        cifs: Fix NULL pointer dereference of devname
        fs: fix guard_bio_eod to check for real EOD errors
        tools lib traceevent: Fix buffer overflow in arg_eval
        usb: chipidea: Grab the (legacy) USB PHY by phandle first
        scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
        coresight: etm4x: Add support to enable ETMv4.2
        ARM: 8840/1: use a raw_spinlock_t in unwind
        mmc: omap: fix the maximum timeout setting
        e1000e: Fix -Wformat-truncation warnings
        IB/mlx4: Increase the timeout for CM cache
        scsi: megaraid_sas: return error when create DMA pool failed
        perf test: Fix failure of 'evsel-tp-sched' test on s390
        SoC: imx-sgtl5000: add missing put_device()
        media: sh_veu: Correct return type for mem2mem buffer helpers
        media: s5p-jpeg: Correct return type for mem2mem buffer helpers
        media: s5p-g2d: Correct return type for mem2mem buffer helpers
        media: mx2_emmaprp: Correct return type for mem2mem buffer helpers
        leds: lp55xx: fix null deref on firmware load failure
        kprobes: Prohibit probing on bsearch()
        ARM: 8833/1: Ensure that NEON code always compiles with Clang
        ALSA: PCM: check if ops are defined before suspending PCM
        bcache: fix input overflow to cache set sysfs file io_error_halflife
        bcache: fix input overflow to sequential_cutoff
        bcache: improve sysfs_strtoul_clamp()
        fbdev: fbmem: fix memory access if logo is bigger than the screen
        cdrom: Fix race condition in cdrom_sysctl_register
        ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe
        soc: qcom: gsbi: Fix error handling in gsbi_probe()
        mt7601u: bump supported EEPROM version
        ARM: avoid Cortex-A9 livelock on tight dmb loops
        tty: increase the default flip buffer limit to 2*640K
        media: mt9m111: set initial frame size other than 0x0
        hwrng: virtio - Avoid repeated init of completion
        soc/tegra: fuse: Fix illegal free of IO base address
        hpet: Fix missing '=' character in the __setup() code of hpet_mmap_enable
        dmaengine: imx-dma: fix warning comparison of distinct pointer types
        netfilter: physdev: relax br_netfilter dependency
        media: s5p-jpeg: Check for fmt_ver_flag when doing fmt enumeration
        regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting
        wlcore: Fix memory leak in case wl12xx_fetch_firmware failure
        x86/build: Mark per-CPU symbols as absolute explicitly for LLD
        dmaengine: tegra: avoid overflow of byte tracking
        drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers
        binfmt_elf: switch to new creds when switching to new mm
        kbuild: clang: choose GCC_TOOLCHAIN_DIR not on LD
        x86/build: Specify elf_i386 linker emulation explicitly for i386 objects
        x86: vdso: Use $LD instead of $CC to link
        x86/vdso: Drop implicit common-page-size linker flag
        lib/string.c: implement a basic bcmp
        tty: mark Siemens R3964 line discipline as BROKEN
        tty: ldisc: add sysctl to prevent autoloading of ldiscs
        ipv6: Fix dangling pointer when ipv6 fragment
        ipv6: sit: reset ip header pointer in ipip6_rcv
        net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock().
        openvswitch: fix flow actions reallocation
        qmi_wwan: add Olicard 600
        sctp: initialize _pad of sockaddr_in before copying to user memory
        tcp: Ensure DCTCP reacts to losses
        netns: provide pure entropy for net_hash_mix()
        net: ethtool: not call vzalloc for zero sized memory request
        ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type
        ALSA: seq: Fix OOB-reads from strlcpy
        include/linux/bitrev.h: fix constant bitrev
        ASoC: fsl_esai: fix channel swap issue when stream starts
        block: do not leak memory in bio_copy_user_iov()
        genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
        ARM: dts: at91: Fix typo in ISC_D0 on PC9
        arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
        xen: Prevent buffer overflow in privcmd ioctl
        sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
        xtensa: fix return_address
        PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
        perf/core: Restore mmap record type correctly
        ext4: add missing brelse() in add_new_gdb_meta_bg()
        ext4: report real fs size after failed resize
        ALSA: echoaudio: add a check for ioremap_nocache
        ALSA: sb8: add a check for request_region
        IB/mlx4: Fix race condition between catas error reset and aliasguid flows
        mmc: davinci: remove extraneous __init annotation
        ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration
        thermal/int340x_thermal: Add additional UUIDs
        thermal/int340x_thermal: fix mode setting
        tools/power turbostat: return the exit status of a command
        perf top: Fix error handling in cmd_top()
        perf evsel: Free evsel->counts in perf_evsel__exit()
        perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test
        perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
        x86/hpet: Prevent potential NULL pointer dereference
        x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
        iommu/vt-d: Check capability before disabling protected memory
        x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
        fix incorrect error code mapping for OBJECTID_NOT_FOUND
        ext4: prohibit fstrim in norecovery mode
        rsi: improve kernel thread handling to fix kernel panic
        9p: do not trust pdu content for stat item size
        9p locks: add mount option for lock retry interval
        f2fs: fix to do sanity check with current segment number
        serial: uartps: console_setup() can't be placed to init section
        ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms
        ACPI / SBS: Fix GPE storm on recent MacBookPro's
        cifs: fallback to older infolevels on findfirst queryinfo retry
        crypto: sha256/arm - fix crash bug in Thumb2 build
        crypto: sha512/arm - fix crash bug in Thumb2 build
        iommu/dmar: Fix buffer overflow during PCI bus notification
        ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
        appletalk: Fix use-after-free in atalk_proc_exit
        lib/div64.c: off by one in shift
        include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
        tpm/tpm_crb: Avoid unaligned reads in crb_recv()
        ovl: fix uid/gid when creating over whiteout
        appletalk: Fix compile regression
        bonding: fix event handling for stacked bonds
        net: atm: Fix potential Spectre v1 vulnerabilities
        net: bridge: multicast: use rcu to access port list from br_multicast_start_querier
        net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv
        tcp: tcp_grow_window() needs to respect tcp_space()
        ipv4: recompile ip options in ipv4_link_failure
        ipv4: ensure rcu_read_lock() in ipv4_link_failure()
        crypto: crypto4xx - properly set IV after de- and encrypt
        modpost: file2alias: go back to simple devtable lookup
        modpost: file2alias: check prototype of handler
        tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete
        KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
        iio/gyro/bmg160: Use millidegrees for temperature scale
        iio: ad_sigma_delta: select channel when reading register
        iio: adc: at91: disable adc channel interrupt in timeout case
        io: accel: kxcjk1013: restore the range after resume.
        staging: comedi: vmk80xx: Fix use of uninitialized semaphore
        staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf
        staging: comedi: ni_usb6501: Fix use of uninitialized mutex
        staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf
        ALSA: core: Fix card races between register and disconnect
        crypto: x86/poly1305 - fix overflow during partial reduction
        arm64: futex: Restore oldval initialization to work around buggy compilers
        x86/kprobes: Verify stack frame on kretprobe
        kprobes: Mark ftrace mcount handler functions nokprobe
        kprobes: Fix error check when reusing optimized probes
        mac80211: do not call driver wake_tx_queue op during reconfig
        Revert "kbuild: use -Oz instead of -Os when using clang"
        sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
        device_cgroup: fix RCU imbalance in error case
        mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
        ALSA: info: Fix racy addition/deletion of nodes
        Revert "locking/lockdep: Add debug_locks check in __lock_downgrade()"
        kernel/sysctl.c: fix out-of-bounds access when setting file-max
        Linux 4.4.179

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>

Conflicts:
	Makefile
	fs/ext4/ioctl.c
2019-04-27 09:07:11 -07:00
Carlos Maiolino
2e5086f3ac fs: fix guard_bio_eod to check for real EOD errors
[ Upstream commit dce30ca9e3b676fb288c33c1f4725a0621361185 ]

guard_bio_eod() can truncate a segment in bio to allow it to do IO on
odd last sectors of a device.

It already checks whether the IO starts past EOD, but it does not
consider that an IO request starting within device boundaries can
contain more than one segment past EOD.

In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
underflow bvec->bv_len.

Fix this by checking if truncated_bytes is lower than PAGE_SIZE.

This situation has been found on filesystems such as isofs and vfat,
which don't check the device size before mount. If the device is
smaller than the filesystem itself, a readahead on such a filesystem
that spans EOD can trigger this situation, leading to a call to
zero_user() with a wrong size, possibly corrupting memory.

I didn't see any crash, or at least didn't let the system run long
enough to check whether memory corruption would be hit somewhere, but
adding instrumentation to guard_bio_eod() to check the size of
truncated_bytes was enough to see the error.
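
The underflow described above can be sketched outside the kernel. This is
only an illustrative model, not the kernel code: the helper name
truncate_last_segment, the 4096-byte PAGE_SIZE, and the 32-bit mask standing
in for an unsigned length field are all assumptions. The message words the
check in terms of PAGE_SIZE; the sketch compares against the segment length,
which is the same value here because the segment is one full page.

```python
U32_MASK = 0xFFFFFFFF  # assumption: the segment length behaves like an unsigned 32-bit value
PAGE_SIZE = 4096       # assumption: 4 KiB pages


def truncate_last_segment(bv_len, truncated_bytes, checked):
    """Model shortening the last segment of a request by truncated_bytes.

    Hypothetical helper for illustration only. Returns the new segment
    length, or None when the guarded variant refuses to touch the segment.
    """
    if checked and truncated_bytes > bv_len:
        # More than one segment lies past EOD: leave the segment alone.
        return None
    # Unsigned subtraction: wraps around when truncated_bytes > bv_len.
    return (bv_len - truncated_bytes) & U32_MASK


# A request whose tail extends one full page plus 512 bytes past EOD.
unchecked = truncate_last_segment(PAGE_SIZE, PAGE_SIZE + 512, checked=False)
guarded = truncate_last_segment(PAGE_SIZE, PAGE_SIZE + 512, checked=True)
# The ordinary case: only the last 512 bytes of the segment are past EOD.
normal = truncate_last_segment(PAGE_SIZE, 512, checked=True)

print(unchecked, guarded, normal)
```

Without the check, the modeled length wraps to a value close to 4 GiB, which
is the kind of wrong size that would then be passed on to a zeroing helper.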

The following script can trigger the error.

MNT=/mnt
IMG=./DISK.img
DEV=/dev/loop0

mkfs.vfat $IMG
mount $IMG $MNT
cp -R /etc $MNT &> /dev/null
umount $MNT

losetup -D

losetup --find --show --sizelimit 16247280 $IMG
mount $DEV $MNT

find $MNT -type f -exec cat {} + >/dev/null

Kudos to Eric Sandeen for coming up with the reproducer above
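
A quick arithmetic check shows why the size limit in the reproducer is
useful for hitting this path. The sketch assumes 512-byte sectors, 4096-byte
pages, and a device capacity counted in whole sectors; none of these values
are stated in the message itself.

```python
SECTOR_SIZE = 512      # assumption: 512-byte sectors
PAGE_SIZE = 4096       # assumption: 4 KiB pages
SIZE_LIMIT = 16247280  # value passed to --sizelimit in the reproducer

# Split the limit into whole sectors/pages and the leftover bytes.
full_sectors, sector_tail = divmod(SIZE_LIMIT, SECTOR_SIZE)
full_pages, page_tail = divmod(SIZE_LIMIT, PAGE_SIZE)

# If capacity is rounded down to whole sectors, this many bytes of the
# final, partial page are still inside the device.
valid_in_last_page = full_sectors * SECTOR_SIZE - full_pages * PAGE_SIZE

print(full_sectors, sector_tail)
print(full_pages, page_tail)
print(valid_in_last_page)
```

Under these assumptions the limit is aligned to neither a sector nor a page,
so the device ends partway through a page and a page-sized read near the end
necessarily spans EOD.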

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-27 09:33:49 +02:00
Thierry Strudel
75c8bc7183 Merged linux-4.4.80 into android-msm-wahoo-4.4
Linux 4.4.80
    ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
    scsi: snic: Return error code on memory allocation failure
    scsi: fnic: Avoid sending reset to firmware when another reset is in progress
    HID: ignore Petzl USB headlamp
    ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion
    sh_eth: enable RX descriptor word 0 shift on SH7734
    nvmem: imx-ocotp: Fix wrong register size
    arm64: mm: fix show_pte KERN_CONT fallout
    vfio-pci: Handle error from pci_iomap
    video: fbdev: cobalt_lcdfb: Handle return NULL error from devm_ioremap
    perf symbols: Robustify reading of build-id from sysfs
    perf tools: Install tools/lib/traceevent plugins with install-bin
    xfrm: Don't use sk_family for socket policy lookups
    tools lib traceevent: Fix prev/next_prio for deadline tasks
    Btrfs: adjust outstanding_extents counter properly when dio write is split
    usb: gadget: Fix copy/pasted error message
    ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
    ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*
    ARM64: zynqmp: Fix i2c node's compatible string
    ARM64: zynqmp: Fix W=1 dtc 1.4 warnings
    dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
    dmaengine: ioatdma: workaround SKX ioatdma version
    dmaengine: ioatdma: Add Skylake PCI Dev ID
    openrisc: Add _text symbol to fix ksym build error
    irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
    ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
    spi: dw: Make debugfs name unique between instances
    ASoC: tlv320aic3x: Mark the RESET register as volatile
    irqchip/keystone: Fix "scheduling while atomic" on rt
    vfio-pci: use 32-bit comparisons for register address for gcc-4.5
    drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set
    drm/msm: Ensure that the hardware write pointer is valid
    net/mlx4: Remove BUG_ON from ICM allocation routine
    ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output
    ARM: dts: n900: Mark eMMC slot with no-sdio and no-sd flags
    r8169: add support for RTL8168 series add-on card.
    x86/mce/AMD: Make the init code more robust
    tpm: Replace device number bitmap with IDR
    tpm: fix a kernel memory leak in tpm-sysfs.c
    xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
    xen/blkback: don't free be structure too early
    sched/cputime: Fix prev steal time accouting during CPU hotplug
    net: skb_needs_check() accepts CHECKSUM_NONE for tx
    pstore: Use dynamic spinlock initializer
    pstore: Correctly initialize spinlock and flags
    pstore: Allow prz to control need for locking
    vlan: Propagate MAC address to VLANs
    /proc/iomem: only expose physical resource addresses to privileged users
    Make file credentials available to the seqfile interfaces
    v4l: s5c73m3: fix negation operator
    dentry name snapshots
    ipmi/watchdog: fix watchdog timeout set on reboot
    libnvdimm, btt: fix btt_rw_page not returning errors
    RDMA/uverbs: Fix the check for port number
    PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
    sched/cgroup: Move sched_online_group() back into css_online() to fix crash
    kaweth: fix oops upon failed memory allocation
    kaweth: fix firmware download
    mpt3sas: Don't overreach ioc->reply_post[] during initialization
    mailbox: handle empty message in tx_tick
    mailbox: skip complete wait event if timer expired
    mailbox: always wait in mbox_send_message for blocking Tx mode
    wil6210: fix deadlock when using fw_no_recovery option
    ath10k: fix null deref on wmi-tlv when trying spectral scan
    isdn/i4l: fix buffer overflow
    isdn: Fix a sleep-in-atomic bug
    net: phy: Do not perform software reset for Generic PHY
    nfc: fdp: fix NULL pointer dereference
    xfs: don't BUG() on mixed direct and mapped I/O
    perf intel-pt: Ensure never to set 'last_ip' when packet 'count' is zero
    perf intel-pt: Use FUP always when scanning for an IP
    perf intel-pt: Fix last_ip usage
    perf intel-pt: Fix ip compression
    drm: rcar-du: Simplify and fix probe error handling
    drm: rcar-du: Perform initialization/cleanup at probe/remove time
    drm/rcar: Nuke preclose hook
    Staging: comedi: comedi_fops: Avoid orphaned proc entry
    Revert "powerpc/numa: Fix percpu allocations to be NUMA aware"
    KVM: PPC: Book3S HV: Save/restore host values of debug registers
    KVM: PPC: Book3S HV: Reload HTM registers explicitly
    KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
    KVM: PPC: Book3S HV: Context-switch EBB registers properly
    drm/nouveau/bar/gf100: fix access to upper half of BAR2
    drm/vmwgfx: Fix gcc-7.1.1 warning
    md/raid5: add thread_group worker async_tx_issue_pending_all
    crypto: authencesn - Fix digest_null crash
    powerpc/pseries: Fix of_node_put() underflow during reconfig remove
    net: reduce skb_warn_bad_offload() noise
    pstore: Make spinlock per zone instead of global
    af_key: Add lock to key dump
Linux 4.4.79
    alarmtimer: don't rate limit one-shot timers
    tracing: Fix kmemleak in instance_rmdir
    spmi: Include OF based modalias in device uevent
    of: device: Export of_device_{get_modalias, uvent_modalias} to modules
    drm/mst: Avoid processing partially received up/down message transactions
    drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req()
    drm/mst: Fix error handling during MST sideband message reception
    RDMA/core: Initialize port_num in qp_attr
    ceph: fix race in concurrent readdir
    staging: rtl8188eu: add TL-WN722N v2 support
    Revert "perf/core: Drop kernel samples even though :u is specified"
    perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
    target: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce
    udf: Fix deadlock between writeback and udf_setsize()
    NFS: only invalidate dentrys that are clearly invalid.
    Input: i8042 - fix crash at boot time
    MIPS: Fix a typo: s/preset/present/ in r2-to-r6 emulation error message
    MIPS: Send SIGILL for linked branches in `__compute_return_epc_for_insn'
    MIPS: Rename `sigill_r6' to `sigill_r2r6' in `__compute_return_epc_for_insn'
    MIPS: Send SIGILL for BPOSGE32 in `__compute_return_epc_for_insn'
    MIPS: math-emu: Prevent wrong ISA mode instruction emulation
    MIPS: Fix unaligned PC interpretation in `compute_return_epc'
    MIPS: Actually decode JALX in `__compute_return_epc_for_insn'
    MIPS: Save static registers before sysmips
    MIPS: Fix MIPS I ISA /proc/cpuinfo reporting
    x86/ioapic: Pass the correct data to unmask_ioapic_irq()
    x86/acpi: Prevent out of bound access caused by broken ACPI tables
    MIPS: Negate error syscall return in trace
    MIPS: Fix mips_atomic_set() with EVA
    MIPS: Fix mips_atomic_set() retry condition
    ftrace: Fix uninitialized variable in match_records()
    vfio: New external user group/file match
    vfio: Fix group release deadlock
    f2fs: Don't clear SGID when inheriting ACLs
    ipmi:ssif: Add missing unlock in error branch
    ipmi: use rcu lock around call to intf->handlers->sender()
    drm/radeon: Fix eDP for single-display iMac10,1 (v2)
    drm/radeon/ci: disable mclk switching for high refresh rates (v2)
    drm/amd/amdgpu: Return error if initiating read out of range on vram
    s390/syscalls: Fix out of bounds arguments access
    Raid5 should update rdev->sectors after reshape
    cx88: Fix regression in initial video standard setting
    x86/xen: allow userspace access during hypercalls
    md: don't use flush_signals in userspace processes
    usb: renesas_usbhs: gadget: disable all eps when the driver stops
    usb: renesas_usbhs: fix usbhsc_resume() for !USBHSF_RUNTIME_PWCTRL
    USB: cdc-acm: add device-id for quirky printer
    usb: storage: return on error to avoid a null pointer dereference
    xhci: Fix NULL pointer dereference when cleaning up streams for removed host
    xhci: fix 20000ms port resume timeout
    ipvs: SNAT packet replies only for NATed connections
    PCI/PM: Restore the status of PCI devices across hibernation
    af_key: Fix sadb_x_ipsecrequest parsing
    powerpc/asm: Mark cr0 as clobbered in mftb()
    powerpc: Fix emulation of mfocrf in emulate_step()
    powerpc: Fix emulation of mcrf in emulate_step()
    powerpc/64: Fix atomic64_inc_not_zero() to return an int
    iscsi-target: Add login_keys_workaround attribute for non RFC initiators
    scsi: ses: do not add a device to an enclosure if enclosure_add_links() fails.
    PM / Domains: Fix unsafe iteration over modified list of domain providers
    PM / Domains: Fix unsafe iteration over modified list of device links
    ASoC: compress: Derive substream from stream based on direction
    wlcore: fix 64K page support
    Bluetooth: use constant time memory comparison for secret values
    perf intel-pt: Clear FUP flag on error
    perf intel-pt: Ensure IP is zero when state is INTEL_PT_STATE_NO_IP
    perf intel-pt: Fix missing stack clear
    perf intel-pt: Improve sample timestamp
    perf intel-pt: Move decoder error setting into one condition
    NFC: Add sockaddr length checks before accessing sa_family in bind handlers
    nfc: Fix the sockaddr length sanitization in llcp_sock_connect
    nfc: Ensure presence of required attributes in the activate_target handler
    NFC: nfcmrvl: fix firmware-management initialisation
    NFC: nfcmrvl: use nfc-device for firmware download
    NFC: nfcmrvl: do not use device-managed resources
    NFC: nfcmrvl_uart: add missing tty-device sanity check
    NFC: fix broken device allocation
    ath9k: fix tx99 bus error
    ath9k: fix tx99 use after free
    thermal: cpu_cooling: Avoid accessing potentially freed structures
    s5p-jpeg: don't return a random width/height
    ir-core: fix gcc-7 warning on bool arithmetic
    disable new gcc-7.1.1 warnings for now
Linux 4.4.78
    kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
    kvm: vmx: Check value written to IA32_BNDCFGS
    kvm: x86: Guest BNDCFGS requires guest MPX support
    kvm: vmx: Do not disable intercepts for BNDCFGS
    KVM: x86: disable MPX if host did not enable MPX XSAVE features
    tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
    PM / QoS: return -EINVAL for bogus strings
    PM / wakeirq: Convert to SRCU
    sched/topology: Optimize build_group_mask()
    sched/topology: Fix overlapping sched_group_mask
    crypto: caam - fix signals handling
    crypto: sha1-ssse3 - Disable avx2
    crypto: atmel - only treat EBUSY as transient if backlog
    crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD
    mm: fix overflow check in expand_upwards()
    tpm: Issue a TPM2_Shutdown for TPM2 devices.
    Add "shutdown" to "struct class".
    tpm: Provide strong locking for device removal
    tpm: Get rid of chip->pdev
    selftests/capabilities: Fix the test_execve test
    mnt: Make propagate_umount less slow for overlapping mount propagation trees
    mnt: In propgate_umount handle visiting mounts in any order
    mnt: In umount propagation reparent in a separate pass
    vt: fix unchecked __put_user() in tioclinux ioctls
    exec: Limit arg stack to at most 75% of _STK_LIM
    s390: reduce ELF_ET_DYN_BASE
    powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
    arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
    arm: move ELF_ET_DYN_BASE to 4MB
    binfmt_elf: use ELF_ET_DYN_BASE only for PIE
    checkpatch: silence perl 5.26.0 unescaped left brace warnings
    fs/dcache.c: fix spin lockup issue on nlru->lock
    mm/list_lru.c: fix list_lru_count_node() to be race free
    kernel/extable.c: mark core_kernel_text notrace
    tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth
    parisc/mm: Ensure IRQs are off in switch_mm()
    parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs
    parisc: use compat_sys_keyctl()
    parisc: Report SIGSEGV instead of SIGBUS when running out of stack
    irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
    cfg80211: Check if PMKID attribute is of expected size
    cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
    cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
    brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
    rds: tcp: use sock_create_lite() to create the accept socket
    vrf: fix bug_on triggered by rx when destroying a vrf
    net: ipv6: Compare lwstate in detecting duplicate nexthops
    ipv6: dad: don't remove dynamic addresses if link is down
    net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
    bpf: prevent leaking pointer via xadd on unpriviledged
    net: prevent sign extension in dev_get_stats()
    tcp: reset sk_rx_dst in tcp_disconnect()
    net: dp83640: Avoid NULL pointer dereference.
    ipv6: avoid unregistering inet6_dev for loopback
    net/phy: micrel: configure intterupts after autoneg workaround
    net: sched: Fix one possible panic when no destroy callback
    net_sched: fix error recovery at qdisc creation
Linux 4.4.77
    saa7134: fix warm Medion 7134 EEPROM read
    x86/mm/pat: Don't report PAT on CPUs that don't support it
    ext4: check return value of kstrtoull correctly in reserved_clusters_store
    staging: comedi: fix clean-up of comedi_class in comedi_init()
    staging: vt6556: vnt_start Fix missing call to vnt_key_init_table.
    tcp: fix tcp_mark_head_lost to check skb len before fragmenting
    md: fix super_offset endianness in super_1_rdev_size_change
    md: fix incorrect use of lexx_to_cpu in does_sb_need_changing
    perf tools: Use readdir() instead of deprecated readdir_r() again
    perf tests: Remove wrong semicolon in while loop in CQM test
    perf trace: Do not process PERF_RECORD_LOST twice
    perf dwarf: Guard !x86_64 definitions under #ifdef else clause
    perf pmu: Fix misleadingly indented assignment (whitespace)
    perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
    perf tools: Remove duplicate const qualifier
    perf script: Use readdir() instead of deprecated readdir_r()
    perf thread_map: Use readdir() instead of deprecated readdir_r()
    perf tools: Use readdir() instead of deprecated readdir_r()
    perf bench numa: Avoid possible truncation when using snprintf()
    perf tests: Avoid possible truncation with dirent->d_name + snprintf
    perf scripting perl: Fix compile error with some perl5 versions
    perf thread_map: Correctly size buffer used with dirent->dt_name
    perf intel-pt: Use __fallthrough
    perf top: Use __fallthrough
    tools strfilter: Use __fallthrough
    tools string: Use __fallthrough in perf_atoll()
    tools include: Add a __fallthrough statement
    mqueue: fix a use-after-free in sys_mq_notify()
    RDMA/uverbs: Check port number supplied by user verbs cmds
    KEYS: Fix an error code in request_master_key()
    ath10k: override CE5 config for QCA9377
    x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings
    x86/tools: Fix gcc-7 warning in relocs.c
    gfs2: Fix glock rhashtable rcu bug
    USB: serial: qcserial: new Sierra Wireless EM7305 device ID
    USB: serial: option: add two Longcheer device ids
    pinctrl: sh-pfc: Update info pointer after SoC-specific init
    pinctrl: mxs: atomically switch mux and drive strength config
    pinctrl: sunxi: Fix SPDIF function name for A83T
    pinctrl: meson: meson8b: fix the NAND DQS pins
    pinctrl: sh-pfc: r8a7791: Fix SCIF2 pinmux data
    sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec
    sysctl: don't print negative flag for proc_douintvec
    mac80211_hwsim: Replace bogus hrtimer clockid
    usb: Fix typo in the definition of Endpoint[out]Request
    usb: usbip: set buffer pointers to NULL after free
    Add USB quirk for HVR-950q to avoid intermittent device resets
    USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
    usb: dwc3: replace %p with %pK
    drm/virtio: don't leak bo on drm_gem_object_init failure
    tracing/kprobes: Allow to create probe with a module name starting with a digit
    mm: fix classzone_idx underflow in shrink_zones()
    bgmac: reset & enable Ethernet core before using it
    driver core: platform: fix race condition with driver_override
    fs: completely ignore unknown open flags
    fs: add a VALID_OPEN_FLAGS
Linux 4.4.76
    KVM: nVMX: Fix exception injection
    KVM: x86: zero base3 of unusable segments
    KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh()
    KVM: x86: fix emulation of RSM and IRET instructions
    cpufreq: s3c2416: double free on driver init error path
    iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
    iommu: Handle default domain attach failure
    iommu/vt-d: Don't over-free page table directories
    ocfs2: o2hb: revert hb threshold to keep compatible
    x86/mm: Fix flush_tlb_page() on Xen
    x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
    ARM: 8685/1: ensure memblock-limit is pmd-aligned
    ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
    sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
    watchdog: bcm281xx: Fix use of uninitialized spinlock.
    xfrm: Oops on error in pfkey_msg2xfrm_state()
    xfrm: NULL dereference on allocation failure
    xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
    jump label: fix passing kbuild_cflags when checking for asm goto support
    ravb: Fix use-after-free on `ifconfig eth0 down`
    sctp: check af before verify address in sctp_addr_id2transport
    net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
    perf probe: Fix to show correct locations for events on modules
    be2net: fix status check in be_cmd_pmac_add()
    s390/ctl_reg: make __ctl_load a full memory barrier
    swiotlb: ensure that page-sized mappings are page-aligned
    coredump: Ensure proper size of sparse core files
    x86/mpx: Use compatible types in comparison to fix sparse error
    mac80211: initialize SMPS field in HT capabilities
    spi: davinci: use dma_mapping_error()
    scsi: lpfc: avoid double free of resource identifiers
    HID: i2c-hid: Add sleep between POWER ON and RESET
    kernel/panic.c: add missing \n
    ibmveth: Add a proper check for the availability of the checksum features
    vxlan: do not age static remote mac entries
    virtio_net: fix PAGE_SIZE > 64k
    vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null
    drm/amdgpu: check ring being ready before using
    net: dsa: Check return value of phy_connect_direct()
    amd-xgbe: Check xgbe_init() return code
    platform/x86: ideapad-laptop: handle ACPI event 1
    scsi: virtio_scsi: Reject commands when virtqueue is broken
    xen-netfront: Fix Rx stall during network stress and OOM
    swiotlb-xen: update dev_addr after swapping pages
    virtio_console: fix a crash in config_work_handler
    Btrfs: fix truncate down when no_holes feature is enabled
    gianfar: Do not reuse pages from emergency reserve
    powerpc/eeh: Enable IO path on permanent error
    net: bgmac: Remove superflous netif_carrier_on()
    net: bgmac: Start transmit queue in bgmac_open
    net: bgmac: Fix SOF bit checking
    bgmac: Fix reversed test of build_skb() return value.
    mtd: bcm47xxpart: don't fail because of bit-flips
    bgmac: fix a missing check for build_skb
    mtd: bcm47xxpart: limit scanned flash area on BCM47XX (MIPS) only
    MIPS: ralink: fix MT7628 wled_an pinmux gpio
    MIPS: ralink: fix MT7628 pinmux typos
    MIPS: ralink: Fix invalid assignment of SoC type
    MIPS: ralink: fix USB frequency scaling
    MIPS: ralink: MT7688 pinmux fixes
    net: korina: Fix NAPI versus resources freeing
    MIPS: ath79: fix regression in PCI window initialization
    net: mvneta: Fix for_each_present_cpu usage
    ARM: dts: BCM5301X: Correct GIC_PPI interrupt flags
    qla2xxx: Fix erroneous invalid handle message
    scsi: lpfc: Set elsiocb contexts to NULL after freeing it
    scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
    KVM: x86: fix fixing of hypercalls
    mm: numa: avoid waiting on freed migrated pages
    block: fix module reference leak on put_disk() call for cgroups throttle
    sysctl: enable strict writes
    usb: gadget: f_fs: Fix possibe deadlock
    drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr
    ALSA: hda - set input_path bitmap to zero after moving it to new place
    ALSA: hda - Fix endless loop of codec configure
    MIPS: Fix IRQ tracing & lockdep when rescheduling
    MIPS: pm-cps: Drop manual cache-line alignment of ready_count
    MIPS: Avoid accidental raw backtrace
    mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
    drm/ast: Handle configuration without P2A bridge
    NFSv4: fix a reference leak caused WARNING messages
    netfilter: synproxy: fix conntrackd interaction
    netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
    rtnetlink: add IFLA_GROUP to ifla_policy
    ipv6: Do not leak throw route references
    sfc: provide dummy definitions of vswitch functions
    net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev
    decnet: always not take dst->__refcnt when inserting dst into hash table
    net/mlx5: Wait for FW readiness before initializing command interface
    ipv6: fix calling in6_ifa_hold incorrectly for dad work
    igmp: add a missing spin_lock_init()
    igmp: acquire pmc lock for ip_mc_clear_src()
    net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx
    Fix an intermittent pr_emerg warning about lo becoming free.
    af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers
    net: Zero ifla_vf_info in rtnl_fill_vfinfo()
    decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
    net: don't call strlen on non-terminated string in dev_set_alias()
    ipv6: release dst on error in ip6_dst_lookup_tail
Linux 4.4.75
    nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too
    nvme/quirk: Add a delay before checking for adapter readiness
    net: phy: fix marvell phy status reading
    net: phy: Initialize mdio clock at probe function
    usb: gadget: f_fs: avoid out of bounds access on comp_desc
    powerpc/slb: Force a full SLB flush when we insert for a bad EA
    mtd: spi-nor: fix spansion quad enable
    of: Add check to of_scan_flat_dt() before accessing initial_boot_params
    rxrpc: Fix several cases where a padded len isn't checked in ticket decode
    USB: usbip: fix nonconforming hub descriptor
    drm/amdgpu: adjust default display clock
    drm/amdgpu/atom: fix ps allocation size for EnableDispPowerGating
    drm/radeon: add a quirk for Toshiba Satellite L20-183
    drm/radeon: add a PX quirk for another K53TK variant
    iscsi-target: Reject immediate data underflow larger than SCSI transfer length
    target: Fix kref->refcount underflow in transport_cmd_finish_abort
    time: Fix clock->read(clock) race around clocksource changes
    Input: i8042 - add Fujitsu Lifebook AH544 to notimeout list
    powerpc/kprobes: Pause function_graph tracing during jprobes handling
    signal: Only reschedule timers on signals timers have sent
    HID: Add quirk for Dell PIXART OEM mouse
    CIFS: Improve readdir verbosity
    KVM: PPC: Book3S HV: Preserve userspace HTM state properly
    lib/cmdline.c: fix get_options() overflow while parsing ranges
    autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL
    fs/exec.c: account for argv/envp pointers
Linux 4.4.74
    mm: fix new crash in unmapped_area_topdown()
    Allow stack to grow up to address space limit
    mm: larger stack guard gap, between vmas
    alarmtimer: Rate limit periodic intervals
    MIPS: Fix bnezc/jialc return address calculation
    usb: dwc3: exynos fix axius clock error path to do cleanup
    alarmtimer: Prevent overflow of relative timers
    genirq: Release resources in __setup_irq() error path
    swap: cond_resched in swap_cgroup_prepare()
    mm/memory-failure.c: use compound_head() flags for huge pages
    USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
    usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
    drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
    usb: r8a66597-hcd: decrease timeout
    usb: r8a66597-hcd: select a different endpoint on timeout
    USB: gadget: dummy_hcd: fix hub-descriptor removable fields
    pvrusb2: reduce stack usage pvr2_eeprom_analyze()
    usb: core: fix potential memory leak in error path during hcd creation
    USB: hub: fix SS max number of ports
    iio: proximity: as3935: recalibrate RCO after resume
    staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data()
    mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode
    x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
    serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
    mac80211: fix IBSS presp allocation size
    mac80211: fix CSA in IBSS mode
    mac80211/wpa: use constant time memory comparison for MACs
    mac80211: don't look at the PM bit of BAR frames
    vb2: Fix an off by one error in 'vb2_plane_vaddr'
    cpufreq: conservative: Allow down_threshold to take values from 1 to 10
    can: gs_usb: fix memory leak in gs_cmd_reset()
    configfs: Fix race between create_link and configfs_rmdir
Linux 4.4.73
    sparc64: make string buffers large enough
    s390/kvm: do not rely on the ILC on kvm host protection fauls
    xtensa: don't use linux IRQ #0
    tipc: ignore requests when the connection state is not CONNECTED
    proc: add a schedule point in proc_pid_readdir()
    romfs: use different way to generate fsid for BLOCK or MTD
    sctp: sctp_addr_id2transport should verify the addr before looking up assoc
    r8152: avoid start_xmit to schedule napi when napi is disabled
    r8152: fix rtl8152_post_reset function
    r8152: re-schedule napi for tx
    nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
    ravb: unmap descriptors when freeing rings
    drm/ast: Fixed system hanged if disable P2A
    drm/nouveau: Don't enabling polling twice on runtime resume
    parisc, parport_gsc: Fixes for printk continuation lines
    net: adaptec: starfire: add checks for dma mapping errors
    pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES
    gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page
    net/mlx4_core: Avoid command timeouts during VF driver device shutdown
    drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers
    drm/nouveau: prevent userspace from deleting client object
    ipv6: fix flow labels when the traffic class is non-0
    FS-Cache: Initialise stores_lock in netfs cookie
    fscache: Clear outstanding writes when disabling a cookie
    fscache: Fix dead object requeue
    ethtool: do not vzalloc(0) on registers dump
    log2: make order_base_2() behave correctly on const input value zero
    kasan: respect /proc/sys/kernel/traceoff_on_warning
    jump label: pass kbuild_cflags when checking for asm goto support
    PM / runtime: Avoid false-positive warnings from might_sleep_if()
    ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
    i2c: piix4: Fix request_region size
    sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
    sierra_net: Skip validating irrelevant fields for IDLE LSIs
    net: hns: Fix the device being used for dma mapping during TX
    NET: mkiss: Fix panic
    NET: Fix /proc/net/arp for AX.25
    ipv6: Inhibit IPv4-mapped src address on the wire.
    ipv6: Handle IPv4-mapped src to in6addr_any dst.
    net: xilinx_emaclite: fix receive buffer overflow
    net: xilinx_emaclite: fix freezes due to unordered I/O
    Call echo service immediately after socket reconnect
    staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
    ARM: dts: imx6dl: Fix the VDD_ARM_CAP voltage for 396MHz operation
    partitions/msdos: FreeBSD UFS2 file systems are not recognized
    s390/vmem: fix identity mapping
Linux 4.4.72
    arm64: ensure extension of smp_store_release value
    arm64: armv8_deprecated: ensure extension of addr
    usercopy: Adjust tests to deal with SMAP/PAN
    RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
    arm64: entry: improve data abort handling of tagged pointers
    arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
    Make __xfs_xattr_put_listen preperly report errors.
    NFSv4: Don't perform cached access checks before we've OPENed the file
    NFS: Ensure we revalidate attributes before using execute_ok()
    mm: consider memblock reservations for deferred memory initialization sizing
    net: better skb->sender_cpu and skb->napi_id cohabitation
    serial: sh-sci: Fix panic when serial console and DMA are enabled
    tty: Drop krefs for interrupted tty lock
    drivers: char: mem: Fix wraparound check to allow mappings up to the end
    ASoC: Fix use-after-free at card unregistration
    ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT
    ALSA: timer: Fix race between read and ioctl
    drm/nouveau/tmr: fully separate alarm execution/pending lists
    drm/vmwgfx: Make sure backup_handle is always valid
    drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()
    drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()
    perf/core: Drop kernel samples even though :u is specified
    powerpc/hotplug-mem: Fix missing endian conversion of aa_index
    powerpc/numa: Fix percpu allocations to be NUMA aware
    powerpc/eeh: Avoid use after free in eeh_handle_special_event()
    scsi: qla2xxx: don't disable a not previously enabled PCI device
    KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
    btrfs: fix memory leak in update_space_info failure path
    btrfs: use correct types for page indices in btrfs_page_exists_in_range
    cxl: Fix error path on bad ioctl
    ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
    ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
    ufs: set correct ->s_maxsize
    ufs: restore maintaining ->i_blocks
    fix ufs_isblockset()
    ufs: restore proper tail allocation
    fs: add i_blocksize()
    cpuset: consider dying css as offline
    Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled
    drm/msm: Expose our reservation object when exporting a dmabuf.
    target: Re-add check to reject control WRITEs with overflow data
    cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
    stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
    random: properly align get_random_int_hash
    drivers: char: random: add get_random_long()
    iio: proximity: as3935: fix AS3935_INT mask
    iio: light: ltr501 Fix interchanged als/ps register field
    staging/lustre/lov: remove set_fs() call from lov_getstripe()
    usb: chipidea: debug: check before accessing ci_role
    usb: chipidea: udc: fix NULL pointer dereference if udc_start failed
    usb: gadget: f_mass_storage: Serialize wake and sleep execution
    ext4: fix fdatasync(2) after extent manipulation operations
    ext4: keep existing extra fields when inode expands
    ext4: fix SEEK_HOLE
    xen-netfront: cast grant table reference first to type int
    xen-netfront: do not cast grant table reference to signed short
    xen/privcmd: Support correctly 64KB page granularity when mapping memory
    dmaengine: ep93xx: Always start from BASE0
    dmaengine: usb-dmac: Fix DMAOR AE bit definition
    KVM: async_pf: avoid async pf injection when in guest mode
    arm: KVM: Allow unaligned accesses at HYP
    KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
    kvm: async_pf: fix rcu_irq_enter() with irqs enabled
    nfsd: Fix up the "supattr_exclcreat" attributes
    nfsd4: fix null dereference on replay
    drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
    crypto: gcm - wait for crypto op not signal safe
    KEYS: fix freeing uninitialized memory in key_update()
    KEYS: fix dereferencing NULL payload with nonzero length
    ptrace: Properly initialize ptracer_cred on fork
    serial: ifx6x60: fix use-after-free on module unload
    arch/sparc: support NR_CPUS = 4096
    sparc64: delete old wrap code
    sparc64: new context wrap
    sparc64: add per-cpu mm of secondary contexts
    sparc64: redefine first version
    sparc64: combine activate_mm and switch_mm
    sparc64: reset mm cpumask after wrap
    sparc: Machine description indices can vary
    sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
    net: bridge: start hello timer only if device is up
    net: ethoc: enable NAPI before poll may be scheduled
    net: ping: do not abuse udp_poll()
    ipv6: Fix leak in ipv6_gso_segment().
    vxlan: fix use-after-free on deletion
    tcp: disallow cwnd undo when switching congestion control
    cxgb4: avoid enabling napi twice to the same queue
    ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
    bnx2x: Fix Multi-Cos
Linux 4.4.71
    xfs: only return -errno or success from attr ->put_listent
    xfs: in _attrlist_by_handle, copy the cursor back to userspace
    xfs: fix unaligned access in xfs_btree_visit_blocks
    xfs: bad assertion for delalloc an extent that start at i_size
    xfs: fix indlen accounting error on partial delalloc conversion
    xfs: wait on new inodes during quotaoff dquot release
    xfs: update ag iterator to support wait on new inodes
    xfs: support ability to wait on new inodes
    xfs: fix up quotacheck buffer list error handling
    xfs: prevent multi-fsb dir readahead from reading random blocks
    xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
    xfs: fix over-copying of getbmap parameters from userspace
    xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
    xfs: Fix missed holes in SEEK_HOLE implementation
    mlock: fix mlock count can not decrease in race condition
    mm/migrate: fix refcount handling when !hugepage_migration_supported()
    drm/gma500/psb: Actually use VBT mode when it is found
    slub/memcg: cure the brainless abuse of sysfs attributes
    ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
    pcmcia: remove left-over %Z format
    drm/radeon: Unbreak HPD handling for r600+
    drm/radeon/ci: disable mclk switching for high refresh rates (v2)
    scsi: mpt3sas: Force request partial completion alignment
    HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
    mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read
    i2c: i2c-tiny-usb: fix buffer not being DMA capable
    vlan: Fix tcp checksum offloads in Q-in-Q vlans
    net: phy: marvell: Limit errata to 88m1101
    netem: fix skb_orphan_partial()
    ipv4: add reference counting to metrics
    sctp: fix ICMP processing if skb is non-linear
    tcp: avoid fastopen API to be used on AF_UNSPEC
    virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
    be2net: Fix offload features for Q-in-Q packets
    ipv6: fix out of bound writes in __ip6_append_data()
    bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
    qmi_wwan: add another Lenovo EM74xx device ID
    bridge: netlink: check vlan_default_pvid range
    ipv6: Check ip6_find_1stfragopt() return value properly.
    ipv6: Prevent overrun when parsing v6 header options
    net: Improve handling of failures on link and route dumps
    tcp: eliminate negative reordering in tcp_clean_rtx_queue
    sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
    sctp: fix src address selection if using secondary addresses for ipv6
    tcp: avoid fragmenting peculiar skbs in SACK
    s390/qeth: avoid null pointer dereference on OSN
    s390/qeth: unbreak OSM and OSN support
    s390/qeth: handle sysfs error during initialization
    ipv6/dccp: do not inherit ipv6_mc_list from parent
    dccp/tcp: do not inherit mc_list from parent
    sparc: Fix -Wstringop-overflow warning

Bug: 62730977
Change-Id: Ifca755d82f9e4b11016f6660298c2c1b073bfb3a
Signed-off-by: Thierry Strudel <tstrudel@google.com>
2017-09-20 16:42:37 -07:00
Fabian Frederick
044470266a fs: add i_blocksize()
commit 93407472a21b82f39c955ea7787e5bc7da100642 upstream.

Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
branch.

This patch also fixes multiple checkpatch warnings: WARNING: Prefer
'unsigned int' to bare use of 'unsigned'

Thanks to Andrew Morton for suggesting more appropriate function instead
of macro.
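
As a rough illustration of the substitution described above, the following sketch shows an open-coded shift next to a small helper that wraps it. The `inode_sketch` type and the `i_blocksize_sketch()` name are hypothetical stand-ins for this example, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for struct inode; only the field used here. */
struct inode_sketch {
	unsigned char i_blkbits;	/* log2 of the block size in bytes */
};

/*
 * Helper in the spirit of i_blocksize(): returns the block size in
 * bytes so callers no longer open-code "1 << inode->i_blkbits".
 */
static inline unsigned int i_blocksize_sketch(const struct inode_sketch *node)
{
	return 1U << node->i_blkbits;
}
```

With `i_blkbits` set to 12, the helper yields 4096, the same value the open-coded shift produces.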

[geliangtang@gmail.com: truncate: use i_blocksize()]
  Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 13:16:24 +02:00
Laura Abbott
a2ac6d40d3 fs/buffer.c: Revoke LRU when trying to drop buffers
When a buffer is added to the LRU list, a reference is taken which is
not dropped until the buffer is evicted from the LRU list. This is the
correct behavior, however this LRU reference will prevent the buffer
from being dropped. This means that the buffer can't actually be dropped
until it is selected for eviction. There's no bound on the time spent
on the LRU list, which means that the buffer may be undroppable for
very long periods of time. Given that migration involves dropping
buffers, the associated page is now unmigratable for long periods of
time as well. CMA relies on being able to migrate a specific range
of pages, so these types of failures make CMA significantly
less reliable, especially under high filesystem usage.

Rather than waiting for the LRU algorithm to eventually kick out
the buffer, explicitly remove the buffer from the LRU list when trying
to drop it. There is still the possibility that the buffer
could be added back on the list, but that indicates the buffer is
still in use and would probably have other 'in use' indicators to
prevent dropping.
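
The idea can be modelled with a toy reference count. This is only a sketch under simplified assumptions: the `buf_sketch` type and all function names are hypothetical, and the real buffer-head LRU is per-CPU and considerably more involved.

```c
#include <assert.h>

/* Toy buffer: a reference count plus a flag for LRU membership. */
struct buf_sketch {
	int refcount;
	int on_lru;
};

/* Adding to the LRU takes a reference, as described above. */
static void sketch_lru_add(struct buf_sketch *b)
{
	if (!b->on_lru) {
		b->on_lru = 1;
		b->refcount++;
	}
}

/* Explicitly remove the buffer from the LRU and drop that reference. */
static void sketch_lru_revoke(struct buf_sketch *b)
{
	if (b->on_lru) {
		b->on_lru = 0;
		b->refcount--;
	}
}

/* A buffer can only be dropped when nobody holds a reference. */
static int sketch_try_drop(const struct buf_sketch *b)
{
	return b->refcount == 0;
}

/* Revoke the LRU reference first, then attempt the drop. */
static int sketch_try_drop_revoking(struct buf_sketch *b)
{
	sketch_lru_revoke(b);
	return sketch_try_drop(b);
}
```

In this model a buffer that sits only on the LRU cannot be dropped by `sketch_try_drop()`, but `sketch_try_drop_revoking()` succeeds. A buffer with another holder still fails, which matches the "still in use" case.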

Change-Id: I253f4ee2069e190c1115afc421dadd27a7fa87dc
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-06-01 15:20:51 -07:00
Venkat Gopalakrishnan
a627c73479 block/fs: make tracking dirty task debug only
Adding a new element "tsk_dirty" to struct page increases the size
of mem_map/vmemmap, restrict this to a debug only functionality to
save a few MB of memory.

Considering a system with 1G of RAM, there will be nearly 262144
pages and thus that many page structures in mem_map/vmemmap.
With pointer size of 8 bytes on a 64 bit system, adding this
pointer to "struct page" means an increase of "2MB" for mem_map.
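
The arithmetic above can be checked with a short sketch. The 4 KiB page size is an assumption: the message does not state it, but 1 GiB divided into 262144 pages implies it. The function names are made up for this example.

```c
#include <assert.h>

/* Assumed page size; implied by 1 GiB / 262144 pages, not stated above. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Number of page structures needed to describe ram_bytes of memory. */
static unsigned long long sketch_page_count(unsigned long long ram_bytes)
{
	return ram_bytes / SKETCH_PAGE_SIZE;
}

/* Extra bytes in mem_map when every page structure grows by field_bytes. */
static unsigned long long sketch_mem_map_growth(unsigned long long ram_bytes,
						unsigned long long field_bytes)
{
	return sketch_page_count(ram_bytes) * field_bytes;
}
```

For 1 GiB of RAM and an 8-byte pointer this gives 262144 * 8 = 2097152 bytes, that is 2 MiB, and the cost scales linearly with the amount of RAM.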

CRs-Fixed: 738692
Change-Id: Idf3217dcbe17cf1ab4d462d2aa8d39da1ffd8b13
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
[venkatg@codeaurora.org: Fixed trivial merge conflict]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:02:01 -07:00
Venkat Gopalakrishnan
014929f975 block/fs: keep track of the task that dirtied the page
Background writes happen in the context of a background thread.
It is very useful to identify the actual task that generated the
request instead of the background task that submitted the request.
Hence keep track of the task when a page gets dirtied and dump
this task info while tracing. Not all the pages in the bio are
dirtied by the same task, but most likely they will be, since the
sectors accessed on the device must be adjacent.

Change-Id: I6afba85a2063dd3350a0141ba87cf8440ce9f777
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
[venkatg@codeaurora.org: Fixed trivial merge conflicts]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:02:00 -07:00
Ross Zwisler
5c50002963 vfs: remove unused wrapper block_page_mkwrite()
The function currently called "__block_page_mkwrite()" used to be called
"block_page_mkwrite()" until a wrapper for this function was added by:

commit 24da4fab5a ("vfs: Create __block_page_mkwrite() helper passing
	error values back")

This wrapper, the current "block_page_mkwrite()", is currently unused.
__block_page_mkwrite() is used directly by ext4, nilfs2 and xfs.

Remove the unused wrapper, rename __block_page_mkwrite() back to
block_page_mkwrite() and update the comment above block_page_mkwrite().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: Jan Kara <jack@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11 02:19:33 -05:00
Michal Hocko
c62d25556b mm, fs: introduce mapping_gfp_constraint()
There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but they are performed in the same
context.

Let's introduce a helper function which makes the restriction explicit and
easier to track.  This patch doesn't introduce any functional changes.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Kent Overstreet
6cf66b4caf fs: use helper bio_add_page() instead of open coding on bi_io_vec
Call pre-defined helper bio_add_page() instead of open coding for
iterating through bi_io_vec[]. Doing that, it's possible to make some
parts in filesystems and mm/page_io.c simpler than before.

Acked-by: Dave Kleikamp <shaggy@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-13 12:32:00 -06:00
Jens Axboe
b7c44ed9d2 block: manipulate bio->bi_flags through helpers
Some places use helpers now, others don't. We only have the 'is set'
helper, add helpers for setting and clearing flags too.

It was a bit of a mess of atomic vs non-atomic access. With
BIO_UPTODATE gone, we don't have any risk of concurrent access to the
flags. So relax the restriction and don't make any of them atomic. The
flags that do have serialization issues (reffed and chained), we
already handle those separately.
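
A minimal sketch of what such non-atomic helpers might look like follows. The `bio_sketch` type and the `sketch_` names are hypothetical; the message above does not name the new helpers, so none of this should be read as the actual kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bio; only the flags word. */
struct bio_sketch {
	unsigned long bi_flags;
};

/* The 'is set' helper: test a single flag bit. */
static inline bool sketch_bio_flagged(const struct bio_sketch *bio,
				      unsigned int bit)
{
	return (bio->bi_flags & (1UL << bit)) != 0;
}

/* Plain (non-atomic) read-modify-write to set a flag bit. */
static inline void sketch_bio_set_flag(struct bio_sketch *bio,
				       unsigned int bit)
{
	bio->bi_flags |= (1UL << bit);
}

/* Plain (non-atomic) read-modify-write to clear a flag bit. */
static inline void sketch_bio_clear_flag(struct bio_sketch *bio,
					 unsigned int bit)
{
	bio->bi_flags &= ~(1UL << bit);
}
```

Plain bit operations like these are only safe when no other context modifies the same word concurrently, which is the condition the message argues now holds.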

Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-29 08:55:20 -06:00
Christoph Hellwig
4246a0b63b block: add a bi_error field to struct bio
Currently we have two different ways to signal an I/O error on a BIO:

 (1) by clearing the BIO_UPTODATE flag
 (2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not being persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario.  Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
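
To illustrate why a stored error value helps with chaining, here is a small user-space model. It is a sketch only: `bio_sketch`, its `bi_parent` link, and the `sketch_` functions are hypothetical and do not mirror the block layer's real structures or completion path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a bio that carries its own error value. */
struct bio_sketch {
	int bi_error;				/* 0 or a negative errno */
	struct bio_sketch *bi_parent;		/* illustrative chaining link */
	void (*bi_end_io)(struct bio_sketch *bio);
};

/* Record the error in the bio itself, then run the completion callback. */
static void sketch_complete(struct bio_sketch *bio, int error)
{
	if (error)
		bio->bi_error = error;
	if (bio->bi_end_io)
		bio->bi_end_io(bio);
}

/* Completion callback for a chained child: pass the first error upward. */
static void sketch_chain_endio(struct bio_sketch *child)
{
	struct bio_sketch *parent = child->bi_parent;

	if (child->bi_error && !parent->bi_error)
		parent->bi_error = child->bi_error;
}
```

Because the error lives in the structure rather than in a callback argument, it persists while the bio is queued and can be copied from child to parent; completing a child with `-EIO` leaves `-EIO` in the parent as well.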

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-29 08:55:15 -06:00
Jens Axboe
d2e73fcceb buffer: remove unusued 'ret' variable
Merge hiccup on my part, due to a clash between the writeback
changes and the EOPNOTSUPP removal in _submit_bh().

Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 09:22:34 -06:00
Tejun Heo
2a81490811 writeback: implement foreign cgroup inode detection
As concurrent write sharing of an inode is expected to be very rare
and memcg only tracks page ownership on first-use basis severely
confining the usefulness of such sharing, cgroup writeback tracks
ownership per-inode.  While the support for concurrent write sharing
of an inode is deemed unnecessary, an inode being written to by
different cgroups at different points in time is a lot more common,
and, more importantly, charging only by first-use can too readily lead
to grossly incorrect behaviors (single foreign page can lead to
gigabytes of writeback to be incorrectly attributed).

To resolve this issue, cgroup writeback detects the majority dirtier
of an inode and will transfer the ownership to it.  To avoid
unnecessary oscillation, the detection mechanism keeps track of
history and gives out the switch verdict only if the foreign usage
pattern is stable over a certain amount of time and/or writeback
attempts.

The detection mechanism has fairly low space and computation overhead.
It adds 8 bytes to struct inode (one int and two u16's) and minimal
amount of calculation per IO.  The detection mechanism converges to
the correct answer usually in several seconds of IO time when there's
a clear majority dirtier.  Even when there isn't, it can reach an
acceptable answer fairly quickly under most circumstances.
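
The message does not spell out the detection algorithm here. One classic low-overhead way to find a majority element in a stream is the Boyer-Moore majority vote, sketched below as an assumption about the general approach rather than a description of the actual code. The type and function names are hypothetical, and the history tracking mentioned above is left out.

```c
#include <assert.h>

/* Constant-space state: the current candidate and its vote balance. */
struct majority_sketch {
	int candidate;		/* id of the presumed majority dirtier */
	unsigned int count;	/* votes for the candidate minus votes against */
};

/* Feed one observation (the id of whoever dirtied this round). */
static void sketch_majority_observe(struct majority_sketch *m, int id)
{
	if (m->count == 0) {
		/* No standing candidate: adopt the new id. */
		m->candidate = id;
		m->count = 1;
	} else if (m->candidate == id) {
		m->count++;
	} else {
		m->count--;
	}
}
```

If one id accounts for more than half of the observations, it ends up as the candidate. When there is no true majority the result is only a guess, which is one reason to require a stable pattern before switching ownership.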

Please see wbc_detach_inode() for more details.

This patch only implements detection.  Following patches will
implement actual switching.

v2: wbc_account_io() now checks whether the wbc is associated with a
    wb before dereferencing it.  This can happen when pageout() is
    writing pages directly without going through the usual writeback
    path.  As pageout() path is single-threaded, we don't want it to
    be blocked behind a slow cgroup and ultimately want it to delegate
    actual writing to the usual writeback path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:40:20 -06:00
Tejun Heo
b16b1deb55 writeback: make writeback_control track the inode being written back
Currently, for cgroup writeback, the IO submission paths directly
associate the bio's with the blkcg from inode_to_wb_blkcg_css();
however, it'd be necessary to keep more writeback context to implement
foreign inode writeback detection.  wbc (writeback_control) is the
natural fit for the extra context - it persists throughout the
writeback of each inode and is passed all the way down to IO
submission paths.

This patch adds wbc_attach_and_unlock_inode(), wbc_detach_inode(), and
wbc_attach_fdatawrite_inode() which are used to associate wbc with the
inode being written back.  IO submission paths now use wbc_init_bio()
instead of directly associating bio's with blkcg themselves.  This
leaves inode_to_wb_blkcg_css() w/o any user.  The function is removed.

wbc currently only tracks the associated wb (bdi_writeback).  Future
patches will add more for foreign inode detection.  The association is
established under i_lock which will be depended upon when migrating
foreign inodes to other wb's.

As currently, once established, inode to wb association never changes,
going through wbc when initializing bio's doesn't cause any behavior
changes.

v2: submit_blk_blkcg() now checks whether the wbc is associated with a
    wb before dereferencing it.  This can happen when pageout() is
    writing pages directly without going through the usual writeback
    path.  As pageout() path is single-threaded, we don't want it to
    be blocked behind a slow cgroup and ultimately want it to delegate
    actual writing to the usual writeback path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:39:48 -06:00
Tejun Heo
bafc0dba1e buffer, writeback: make __block_write_full_page() honor cgroup writeback
[__]block_write_full_page() is used to implement ->writepage in
various filesystems.  All writeback logic is now updated to handle
cgroup writeback and the block cgroup to issue IOs for is encoded in
writeback_control and can be retrieved from the inode; however,
[__]block_write_full_page() currently ignores the blkcg indicated by
inode and issues all bio's without explicit blkcg association.

This patch adds submit_bh_blkcg() which associates the bio with the
specified blkio cgroup before issuing and uses it in
__block_write_full_page() so that the issued bio's are associated with
inode_to_wb_blkcg_css(inode).

v2: Updated for per-inode wb association.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:37:23 -06:00
Greg Thelen
c4843a7593 memcg: add per cgroup dirty page accounting
When modifying PG_Dirty on cached file pages, update the new
MEM_CGROUP_STAT_DIRTY counter.  This is done in the same places where
global NR_FILE_DIRTY is managed.  The new memcg stat is visible in the
per memcg memory.stat cgroupfs file.  The most recent past attempt at
this was http://thread.gmane.org/gmane.linux.kernel.cgroups/8632

The new accounting supports future efforts to add per cgroup dirty
page throttling and writeback.  It also helps an administrator break
down a container's memory usage and provides evidence to understand
memcg oom kills (the new dirty count is included in memcg oom kill
messages).

The ability to move page accounting between memcg
(memory.move_charge_at_immigrate) makes this accounting more
complicated than the global counter.  The existing
mem_cgroup_{begin,end}_page_stat() lock is used to serialize move
accounting with stat updates.
Typical update operation:
	memcg = mem_cgroup_begin_page_stat(page)
	if (TestSetPageDirty()) {
		[...]
		mem_cgroup_update_page_stat(memcg)
	}
	mem_cgroup_end_page_stat(memcg)

Summary of mem_cgroup_end_page_stat() overhead:
- Without CONFIG_MEMCG it's a no-op
- With CONFIG_MEMCG and no inter memcg task movement, it's just
  rcu_read_lock()
- With CONFIG_MEMCG and inter memcg task movement, it's
  rcu_read_lock() + spin_lock_irqsave()

A memcg parameter is added to several routines because their callers
now call mem_cgroup_begin_page_stat(), which returns the memcg later
needed by mem_cgroup_update_page_stat().

Because mem_cgroup_begin_page_stat() may disable interrupts, some
adjustments are needed:
- move __mark_inode_dirty() from __set_page_dirty() to its caller.
  __mark_inode_dirty() locking does not want interrupts disabled.
- use spin_lock_irqsave(tree_lock) rather than spin_lock_irq() in
  __delete_from_page_cache(), replace_page_cache_page(),
  invalidate_complete_page2(), and __remove_mapping().

   text    data     bss      dec    hex filename
8925147 1774832 1785856 12485835 be84cb vmlinux-!CONFIG_MEMCG-before
8925339 1774832 1785856 12486027 be858b vmlinux-!CONFIG_MEMCG-after
                            +192 text bytes
8965977 1784992 1785856 12536825 bf4bf9 vmlinux-CONFIG_MEMCG-before
8966750 1784992 1785856 12537598 bf4efe vmlinux-CONFIG_MEMCG-after
                            +773 text bytes

Performance tests run on v4.0-rc1-36-g4f671fe2f952.  Lower is better for
all metrics; they're all wall clock or cycle counts.  The read and write
fault benchmarks measure only fault time; they do not include I/O time.

* CONFIG_MEMCG not set:
                            baseline                              patched
  kbuild                 1m25.030000(+-0.088% 3 samples)       1m25.426667(+-0.120% 3 samples)
  dd write 100 MiB          0.859211561 +-15.10%                  0.874162885 +-15.03%
  dd write 200 MiB          1.670653105 +-17.87%                  1.669384764 +-11.99%
  dd write 1000 MiB         8.434691190 +-14.15%                  8.474733215 +-14.77%
  read fault cycles       254.0(+-0.000% 10 samples)            253.0(+-0.000% 10 samples)
  write fault cycles     2021.2(+-3.070% 10 samples)           1984.5(+-1.036% 10 samples)

* CONFIG_MEMCG=y root_memcg:
                            baseline                              patched
  kbuild                 1m25.716667(+-0.105% 3 samples)       1m25.686667(+-0.153% 3 samples)
  dd write 100 MiB          0.855650830 +-14.90%                  0.887557919 +-14.90%
  dd write 200 MiB          1.688322953 +-12.72%                  1.667682724 +-13.33%
  dd write 1000 MiB         8.418601605 +-14.30%                  8.673532299 +-15.00%
  read fault cycles       266.0(+-0.000% 10 samples)            266.0(+-0.000% 10 samples)
  write fault cycles     2051.7(+-1.349% 10 samples)           2049.6(+-1.686% 10 samples)

* CONFIG_MEMCG=y non-root_memcg:
                            baseline                              patched
  kbuild                 1m26.120000(+-0.273% 3 samples)       1m25.763333(+-0.127% 3 samples)
  dd write 100 MiB          0.861723964 +-15.25%                  0.818129350 +-14.82%
  dd write 200 MiB          1.669887569 +-13.30%                  1.698645885 +-13.27%
  dd write 1000 MiB         8.383191730 +-14.65%                  8.351742280 +-14.52%
  read fault cycles       265.7(+-0.172% 10 samples)            267.0(+-0.000% 10 samples)
  write fault cycles     2070.6(+-1.512% 10 samples)           2084.4(+-2.148% 10 samples)

As expected anon page faults are not affected by this patch.

tj: Updated to apply on top of the recent cancel_dirty_page() changes.

Signed-off-by: Sha Zhengju <handai.szj@gmail.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:33 -06:00
Tejun Heo
11f81becca page_writeback: revive cancel_dirty_page() in a restricted form
cancel_dirty_page() had some issues and b9ea25152e ("page_writeback:
clean up mess around cancel_dirty_page()") replaced it with
account_page_cleaned() which makes the caller responsible for clearing
the dirty bit; unfortunately, the planned changes for cgroup writeback
support require synchronization between dirty bit manipulation and
stat updates.  While we can open-code such synchronization in each
account_page_cleaned() callsite, that's gonna be unnecessarily awkward
and verbose.

This patch revives cancel_dirty_page() but in a more restricted form.
All it does is TestClearPageDirty() followed by account_page_cleaned()
invocation if the page was dirty.  This helper covers all
account_page_cleaned() usages except for __delete_from_page_cache()
which is a special case anyway and left alone.  As this leaves no
module user for account_page_cleaned(), EXPORT_SYMBOL() is dropped
from it.

This patch just revives cancel_dirty_page() as a trivial wrapper to
replace equivalent usages and doesn't introduce any functional
changes.
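
The test-and-clear followed by conditional accounting described above can
be sketched in user space as follows.  This is only an illustration under
stated assumptions: the struct and function names are hypothetical, C11
atomics stand in for the kernel's page flag helpers, and the real
account_page_cleaned() updates more than a single counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct page or its counters. */
struct page_sketch {
	atomic_bool dirty;
};

struct dirty_stats_sketch {
	atomic_long nr_dirty;
};

/*
 * Atomically test and clear the dirty flag, and adjust the counter only
 * if this caller was the one that cleared it.  Returns true if the page
 * was dirty on entry.
 */
static bool cancel_dirty_sketch(struct page_sketch *page,
				struct dirty_stats_sketch *stats)
{
	assert(page != NULL && stats != NULL);

	if (atomic_exchange(&page->dirty, false)) {
		atomic_fetch_sub(&stats->nr_dirty, 1);
		return true;
	}
	return false;
}
```

Because the flag is cleared and tested in one step, calling the helper a
second time on the same page leaves the counter untouched.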

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02 08:33:33 -06:00
Julia Lawall
f6454b049d block: fix returnvar.cocci warnings
Remove unneeded variable used to store return value.

Generated by: scripts/coccinelle/misc/returnvar.cocci

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-26 14:02:45 -06:00
Christoph Hellwig
b25de9d6da block: remove BIO_EOPNOTSUPP
Since the big barrier rewrite/removal in 2007 we never fail FLUSH or
FUA requests, which means we can remove the magic BIO_EOPNOTSUPP flag
to help propagating those to the buffer_head layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19 09:17:03 -06:00
Konstantin Khlebnikov
b9ea25152e page_writeback: clean up mess around cancel_dirty_page()
This patch replaces cancel_dirty_page() with a helper function
account_page_cleaned() which only updates counters.  It's called from
truncate_complete_page() and from try_to_free_buffers() (hack for ext3).
Page is locked in both cases, page-lock protects against concurrent
dirtiers: see commit 2d6d7f9828 ("mm: protect set_page_dirty() from
ongoing truncation").

delete_from_page_cache() shouldn't be called for dirty pages; they must
be handled by caller (either written or truncated).  This patch treats
final dirty accounting fixup at the end of __delete_from_page_cache() as
a debug check and adds WARN_ON_ONCE() around it.  If something removes
dirty pages without proper handling that might be a bug and unwritten
data might be lost.

Hugetlbfs has no dirty pages accounting, ClearPageDirty() is enough
here.

cancel_dirty_page() in nfs_wb_page_cancel() is redundant.  This is a
helper for nfs_invalidate_page() and it's called only in case of complete
invalidation.

The mess was started in v2.6.20 after commits 46d2277c79 ("Clean up
and make try_to_free_buffers() not race with dirty pages") and
3e67c0987d ("truncate: clear page dirtiness before running
try_to_free_buffers()").  The first was reverted right in v2.6.20 in commit
ecdfc9787f ("Resurrect 'try_to_free_buffers()' VM hackery"), the second in
v2.6.25 commit a2b345642f ("Fix dirty page accounting leak with ext3
data=journal").

Custom fixes were introduced between these points.  NFS in v2.6.23, commit
1b3b4a1a2d ("NFS: Fix a write request leak in nfs_invalidate_page()").
Kludge in __delete_from_page_cache() in v2.6.24, commit 3a6927906f ("Do
dirty page accounting when removing a page from the page cache").  Since
v2.6.25 all of them are redundant.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:01 -07:00
Robert Elliott
432f16e64f fs: clarify rate limit suppressed buffer I/O errors
When quiet_error applies rate limiting to buffer_io_error calls, it is
unclear what the suppression messages refer to because the name is so
generic, particularly if the messages are interleaved with others:

[ 1936.063572] quiet_error: 664293 callbacks suppressed
[ 1936.065297] Buffer I/O error on dev sdr, logical block 257429952, lost async page write
[ 1936.067814] Buffer I/O error on dev sdr, logical block 257429953, lost async page write

Also, the function uses printk_ratelimit(), although printk.h includes a
comment advising "Please don't use... Instead use printk_ratelimited()."

Change buffer_io_error to check the BH_Quiet bit itself, drop the
printk_ratelimit call, and print using printk_ratelimited.

This makes the messages look like:

[  387.208839] buffer_io_error: 676394 callbacks suppressed
[  387.210693] Buffer I/O error on dev sdr, logical block 211291776, lost async page write
[  387.213432] Buffer I/O error on dev sdr, logical block 211291777, lost async page write

Signed-off-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-10-21 13:55:11 -06:00
Robert Elliott
b744c2ac4b fs: merge I/O error prints into one line
buffer.c uses two printk calls to print these messages:
[67353.422338] Buffer I/O error on device sdr, logical block 212868488
[67353.422338] lost page write due to I/O error on sdr

In a busy system, they may be interleaved with other prints,
losing the context for the second message.  Merge them into
one line with one printk call so the prints are atomic.

Also, differentiate between async page writes, sync page writes, and
async page reads.

Also, shorten "device" to "dev" to match the block layer prints:
[67353.467906] blk_update_request: critical target error, dev sdr, sector 1707107328

Also, use %llu rather than %Lu.

Resulting prints look like:
[ 1356.437006] blk_update_request: critical target error, dev sdr, sector 1719693992
[ 1361.383522] quiet_error: 659876 callbacks suppressed
[ 1361.385816] Buffer I/O error on dev sdr, logical block 256902912, lost async page write
[ 1361.385819] Buffer I/O error on dev sdr, logical block 256903644, lost async page write

Signed-off-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-10-21 13:55:09 -06:00
Linus Torvalds
c2661b8060 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "A large number of cleanups and bug fixes, with some (minor) journal
  optimizations"

[ This got sent to me before -rc1, but was stuck in my spam folder.   - Linus ]

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (67 commits)
  ext4: check s_chksum_driver when looking for bg csum presence
  ext4: move error report out of atomic context in ext4_init_block_bitmap()
  ext4: Replace open coded mdata csum feature to helper function
  ext4: delete useless comments about ext4_move_extents
  ext4: fix reservation overflow in ext4_da_write_begin
  ext4: add ext4_iget_normal() which is to be used for dir tree lookups
  ext4: don't orphan or truncate the boot loader inode
  ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
  ext4: optimize block allocation on grow indepth
  ext4: get rid of code duplication
  ext4: fix over-defensive complaint after journal abort
  ext4: fix return value of ext4_do_update_inode
  ext4: fix mmap data corruption when blocksize < pagesize
  vfs: fix data corruption when blocksize < pagesize for mmaped data
  ext4: fold ext4_nojournal_sops into ext4_sops
  ext4: support freezing ext2 (nojournal) file systems
  ext4: fold ext4_sync_fs_nojournal() into ext4_sync_fs()
  ext4: don't check quota format when there are no quota files
  jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list
  jbd2: avoid pointless scanning of checkpoint lists
  ...
2014-10-20 09:50:11 -07:00
Zach Brown
9470dd5d35 fs: check bh blocknr earlier when searching lru
It's very common for the buffer heads in the lru to have different block
numbers.  By comparing the blocknr before the bdev and size we can
reduce the cost of searching in the very common case where all the
entries have the same bdev and size.
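
A minimal sketch of the reordered comparison, assuming an 8-slot per-CPU
array.  The struct, field and function names here are hypothetical
stand-ins and do not reproduce the kernel's buffer_head layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LRU_SLOTS_SKETCH 8	/* assumed slot count for this sketch */

/* Hypothetical stand-in for the fields compared during lookup. */
struct bh_sketch {
	uint64_t blocknr;
	const void *bdev;
	size_t size;
};

/*
 * Scan the LRU and compare the block number first.  When every entry
 * shares the same bdev and size, a non-matching entry is rejected after
 * a single comparison instead of two or three.
 */
static struct bh_sketch *lru_lookup_sketch(struct bh_sketch **lru,
					   const void *bdev,
					   uint64_t block, size_t size)
{
	size_t i;

	assert(lru != NULL);

	for (i = 0; i < LRU_SLOTS_SKETCH; i++) {
		struct bh_sketch *bh = lru[i];

		if (bh && bh->blocknr == block &&
		    bh->bdev == bdev && bh->size == size)
			return bh;
	}
	return NULL;
}
```

The result is the same whichever field is compared first; only the cost
of rejecting a non-matching entry changes.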

In quick hot cache cycle counting tests on a single fs workstation this
cut the cost of a miss by about 20%.

A diff of the disassembly shows the reordering of the bdev and blocknr
comparisons.  This is in such a tiny loop that skipping one comparison
is a meaningful portion of the total work being done:

     1628:      83 c1 01                add    $0x1,%ecx
     162b:      83 f9 08                cmp    $0x8,%ecx
     162e:      74 60                   je     1690 <__find_get_block+0xa0>
     1630:      89 c8                   mov    %ecx,%eax
     1632:      65 4c 8b 04 c5 00 00    mov    %gs:0x0(,%rax,8),%r8
     1639:      00 00
     163b:      4d 85 c0                test   %r8,%r8
     163e:      4c 89 c3                mov    %r8,%rbx
     1641:      74 e5                   je     1628 <__find_get_block+0x38>
-    1643:      4d 3b 68 30             cmp    0x30(%r8),%r13
+    1643:      4d 3b 68 18             cmp    0x18(%r8),%r13
     1647:      75 df                   jne    1628 <__find_get_block+0x38>
-    1649:      4d 3b 60 18             cmp    0x18(%r8),%r12
+    1649:      4d 3b 60 30             cmp    0x30(%r8),%r12
     164d:      75 d9                   jne    1628 <__find_get_block+0x38>
     164f:      49 39 50 20             cmp    %rdx,0x20(%r8)
     1653:      75 d3                   jne    1628 <__find_get_block+0x38>

Signed-off-by: Zach Brown <zab@zabbo.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:26 +02:00
Linus Torvalds
77c688ac87 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "The big thing in this pile is Eric's unmount-on-rmdir series; we
  finally have everything we need for that.  The final piece of prereqs
  is delayed mntput() - now filesystem shutdown always happens on
  shallow stack.

  Other than that, we have several new primitives for iov_iter (Matt
  Wilcox, culled from his XIP-related series) pushing the conversion to
  ->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c
  cleanups and fixes (including the external name refcounting, which
  gives consistent behaviour of d_move() wrt procfs symlinks for long
  and short names alike) and assorted cleanups and fixes all over the
  place.

  This is just the first pile; there's a lot of stuff from various
  people that ought to go in this window.  Starting with
  unionmount/overlayfs mess...  ;-/"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits)
  fs/file_table.c: Update alloc_file() comment
  vfs: Deduplicate code shared by xattr system calls operating on paths
  reiserfs: remove pointless forward declaration of struct nameidata
  don't need that forward declaration of struct nameidata in dcache.h anymore
  take dname_external() into fs/dcache.c
  let path_init() failures treated the same way as subsequent link_path_walk()
  fix misuses of f_count() in ppp and netlink
  ncpfs: use list_for_each_entry() for d_subdirs walk
  vfs: move getname() from callers to do_mount()
  gfs2_atomic_open(): skip lookups on hashed dentry
  [infiniband] remove pointless assignments
  gadgetfs: saner API for gadgetfs_create_file()
  f_fs: saner API for ffs_sb_create_file()
  jfs: don't hash direct inode
  [s390] remove pointless assignment of ->f_op in vmlogrdr ->open()
  ecryptfs: ->f_op is never NULL
  android: ->f_op is never NULL
  nouveau: __iomem misannotations
  missing annotation in fs/file.c
  fs: namespace: suppress 'may be used uninitialized' warnings
  ...
2014-10-13 11:28:42 +02:00
Sebastien Buisson
86cf78d73d fs/buffer.c: increase the buffer-head per-CPU LRU size
Increase the buffer-head per-CPU LRU size to allow efficient filesystem
operations that access many blocks for each transaction.  For example,
creating a file in a large ext4 directory with quota enabled will access
multiple buffer heads and will overflow the LRU at the default 8-block LRU
size:

* parent directory inode table block (ctime, nlinks for subdirs)
* new inode bitmap
* inode table block
* 2 quota blocks
* directory leaf block (not reused, but pollutes one cache entry)
* 2 levels htree blocks (only one is reused, other pollutes cache)
* 2 levels indirect/index blocks (only one is reused)

The buffer-head per-CPU LRU size is raised to 16, as it shows in metadata
performance benchmarks up to 10% gain for create, 4% for lookup and 7% for
destroy.

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:26:02 -04:00
Akinobu Mita
4db96b71e3 vfs: guard end of device for mpage interface
Add guard_bio_eod() check for mpage code in order to allow us to do IO
even on the odd last sectors of a device, even if the block size is some
multiple of the physical sector size.

Using mpage_readpages() for block device requires this guard check.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:53 -04:00
Akinobu Mita
59d43914ed vfs: make guard_bh_eod() more generic
This patchset implements readpages() operation for block device by using
mpage_readpages() which can create multipage BIOs instead of BIOs for each
page and reduce system CPU time consumption.

This patch (of 3):

guard_bh_eod() is used in submit_bh() to allow us to do IO even on the odd
last sectors of a device, even if the block size is some multiple of the
physical sector size.  This makes guard_bh_eod() more generic and renames
it guard_bio_eod() so that we can use it without struct buffer_head
argument.
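
One plausible reading of such a guard, sketched without any buffer_head or
bio argument: an I/O that starts inside the device but would run past its
last sector is trimmed to the sectors that remain.  The function name, the
512-byte sector size and the exact policy are assumptions made for
illustration, not the kernel's guard_bio_eod() itself.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT_SKETCH 9	/* assumes 512-byte sectors */

/*
 * Return the number of bytes that may be transferred for an I/O of
 * io_bytes starting at start_sector on a device of dev_sectors sectors.
 */
static uint32_t clamp_io_to_device_sketch(uint64_t dev_sectors,
					  uint64_t start_sector,
					  uint32_t io_bytes)
{
	uint64_t remaining;

	/* This sketch only handles whole-sector transfers. */
	assert((io_bytes & ((1u << SECTOR_SHIFT_SKETCH) - 1)) == 0);

	/* Entirely past the end: leave it for the normal error path. */
	if (start_sector >= dev_sectors)
		return io_bytes;

	remaining = dev_sectors - start_sector;
	if ((io_bytes >> SECTOR_SHIFT_SKETCH) <= remaining)
		return io_bytes;

	/* Straddles the end of the device: trim to what is left. */
	return (uint32_t)(remaining << SECTOR_SHIFT_SKETCH);
}
```

For a device with an odd sector count, such as 1001 sectors, a 4096-byte
request at sector 1000 is trimmed to the single remaining sector.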

The reason for this change is that using mpage_readpages() for block
device requires adding this guard check to the mpage code.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:53 -04:00
Mikulas Patocka
c2ca0fcd20 fs: make cont_expand_zero interruptible
This patch makes it possible to kill a process looping in
cont_expand_zero. A process may spend a lot of time in this function, so
it is desirable to be able to kill it.

It happened to me that I wanted to copy a piece of data from the disk to a
file. By mistake, I used the "seek" parameter to dd instead of "skip". Due
to the "seek" parameter, dd attempted to extend the file and became stuck
doing so - the only possibility was to reset the machine or wait many
hours until the filesystem runs out of space and cont_expand_zero fails.
We need this patch to be able to terminate the process.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:03 -04:00
Jan Kara
90a8020278 vfs: fix data corruption when blocksize < pagesize for mmaped data
->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently discarding data later when writepage is called.

However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
  ftruncate(fd, 0);
  pwrite(fd, buf, 1024, 0);
  map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
  map[0] = 'a';       ----> page_mkwrite() for index 0 is called
  ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
  mremap(map, 1024, 10000, 0);
  map[4095] = 'a';    ----> no page_mkwrite() called

At the moment ->page_mkwrite() is called, filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.

This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-10-01 21:49:18 -04:00
Anton Altaparmakov
f2d5a94436 Fix nasty 32-bit overflow bug in buffer i/o code.
On 32-bit architectures, the legacy buffer_head functions are not always
handling the sector number with the proper 64-bit types, and will thus
fail on 4TB+ disks.

Any code that uses __getblk() (and thus bread(), breadahead(),
sb_bread(), sb_breadahead(), sb_getblk()), and calls it using a 64-bit
block on a 32-bit arch (where "long" is 32-bit) causes an infinite loop
in __getblk_slow() with an infinite stream of errors logged to dmesg
like this:

  __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
  b_state=0x00000020, b_size=512
  device sda1 blocksize: 512

Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the
top 32-bits are missing (in this case the 0x1 at the top).

This is because grow_dev_page() is broken and has a 32-bit overflow: it
computes the block number by left-shifting the page index value (a
pgoff_t, which is just 32 bits on 32-bit architectures).  The top bits
get lost because the pgoff_t is not type cast to sector_t / 64-bit
before the shift.

This patch fixes this issue by type casting "index" to sector_t before
doing the left shift.
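
The effect of the missing cast can be reproduced in user space.  This is a
sketch under stated assumptions: the typedefs and function names are
hypothetical stand-ins for pgoff_t and sector_t on a 32-bit architecture,
and the shift of 3 used in the note below assumes 4096-byte pages with
512-byte blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types on a 32-bit arch. */
typedef uint32_t pgoff32_sketch_t;	/* page index, 32 bits wide */
typedef uint64_t sector64_sketch_t;	/* block number, 64 bits wide */

/* Buggy form: the shift is evaluated in 32 bits and only then widened. */
static sector64_sketch_t block_from_index_buggy(pgoff32_sketch_t index,
						unsigned int sizebits)
{
	assert(sizebits < 32);
	return (pgoff32_sketch_t)(index << sizebits);
}

/* Fixed form: widen to 64 bits first, then shift. */
static sector64_sketch_t block_from_index_fixed(pgoff32_sketch_t index,
						unsigned int sizebits)
{
	assert(sizebits < 32);
	return (sector64_sketch_t)index << sizebits;
}
```

With index 842546993 and a shift of 3, the fixed form yields 6740375944
while the buggy form yields 2445408648, the same pair of values seen in
the error message quoted above.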

Note this is not a theoretical bug but has been seen in the field on a
4TiB hard drive with logical sector size 512 bytes.

This patch has been verified to fix the infinite loop problem on 3.17-rc5
kernel using a 4TB disk image mounted using "-o loop".  Without this patch
doing a "find /nt" where /nt is an NTFS volume causes the infinite loop
100% reproducibly whilst with the patch it works fine as expected.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-22 08:41:16 -07:00
Gioh Kim
3b5e6454aa fs/buffer.c: support buffer cache allocations with gfp modifiers
Buffer cache pages are allocated from the movable area because they are
usually referenced only briefly and released soon.  But some filesystems
hold buffer cache pages for a long time, which can disturb page migration.

New APIs are introduced to allocate buffer cache with a user-specified
flag.  The *_gfp APIs are for users who want to set the page allocation
flag for page cache allocation.  The *_unmovable APIs are for users who
want to allocate page cache from the non-movable area.

Signed-off-by: Gioh Kim <gioh.kim@lge.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2014-09-04 22:04:42 -04:00
NeilBrown
743162013d sched: Remove proliferation of wait_on_bit() action functions
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
 Rename wait_on_bit and        wait_on_bit_lock to
        wait_on_bit_action and wait_on_bit_lock_action
 to make it explicit that they need an action function.

 Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
 which are *not* given an action function but implicitly use
 a standard one.
 The decision to error-out if a signal is pending is now made
 based on the 'mode' argument rather than being encoded in the action
 function.

 All instances of the old wait_on_bit and wait_on_bit_lock which
 can use the new version have been changed accordingly and their
 action functions have been discarded.
 wait_on_bit{_lock} does not return any specific error code in the
 event of a signal so the caller must check for non-zero and
 interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack.  So
the distinction will still be visible, only with different
function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since the first version of this patch (against 3.15) two new action
functions appeared, one in NFS and one in CIFS.  CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 15:10:39 +02:00
Mel Gorman
2457aec637 mm: non-atomically mark page accessed during page cache allocation where possible
aops->write_begin may allocate a new page and make it visible only to have
mark_page_accessed called almost immediately after.  Once the page is
visible the atomic operations are necessary which is noticeable overhead
when writing to an in-memory filesystem like tmpfs but should also be
noticeable with fast storage.  The objective of the patch is to initialise
the accessed information with non-atomic operations before the page is
visible.

The bulk of filesystems directly or indirectly use
grab_cache_page_write_begin or find_or_create_page for the initial
allocation of a page cache page.  This patch adds an init_page_accessed()
helper which behaves like the first call to mark_page_accessed() but may
be called before the page is visible and can be done non-atomically.

The primary APIs of concern in this case are the following and are used
by most filesystems.

	find_get_page
	find_lock_page
	find_or_create_page
	grab_cache_page_nowait
	grab_cache_page_write_begin

All of them are very similar in detail, so the patch creates a core helper
pagecache_get_page() which takes a flags parameter that affects its
behavior such as whether the page should be marked accessed or not.  The
old API is preserved but is basically a thin wrapper around this core
function.
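
The shape of that refactoring, one flag-driven core lookup with the older
entry points reduced to thin wrappers, can be sketched with a toy cache.
Everything below is hypothetical (the flag names, the fixed-size array
used as a cache, and the wrapper names); it shows the structure only and
is not the kernel's pagecache_get_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS_SKETCH 16

/* Hypothetical behaviour flags for the core lookup. */
enum {
	GET_ACCESSED_SKETCH = 1 << 0,	/* mark the page referenced */
	GET_CREAT_SKETCH    = 1 << 1,	/* create the page if missing */
};

struct cached_page_sketch {
	bool present;
	bool referenced;
};

struct cache_sketch {
	struct cached_page_sketch slots[CACHE_SLOTS_SKETCH];
};

/* Core helper: every lookup variant funnels through here. */
static struct cached_page_sketch *cache_get_sketch(struct cache_sketch *cache,
						   size_t index, int flags)
{
	struct cached_page_sketch *page;

	assert(cache != NULL && index < CACHE_SLOTS_SKETCH);
	page = &cache->slots[index];

	if (!page->present) {
		if (!(flags & GET_CREAT_SKETCH))
			return NULL;
		page->present = true;
		page->referenced = false;
	}
	if (flags & GET_ACCESSED_SKETCH)
		page->referenced = true;
	return page;
}

/* Old-style entry points become thin wrappers around the core helper. */
static struct cached_page_sketch *find_page_sketch(struct cache_sketch *cache,
						   size_t index)
{
	return cache_get_sketch(cache, index, 0);
}

static struct cached_page_sketch *
find_or_create_page_sketch(struct cache_sketch *cache, size_t index)
{
	return cache_get_sketch(cache, index,
				GET_CREAT_SKETCH | GET_ACCESSED_SKETCH);
}
```

Keeping the policy in flags means a caller that knows the page has already
been marked accessed can simply omit that flag rather than use a separate
code path.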

Each of the filesystems are then updated to avoid calling
mark_page_accessed when it is known that the VM interfaces have already
done the job.  There is a slight snag in that the timing of the
mark_page_accessed() has now changed so in rare cases it's possible a page
gets to the end of the LRU as PageReferenced whereas previously it might
have been repromoted.  This is expected to be rare but it's worth the
filesystem people thinking about it in case they see a problem with the
timing change.  It is also the case that some filesystems may be marking
pages accessed that previously did not but it makes sense that filesystems
have consistent behaviour in this regard.

The test case used to evaluate this is a simple dd of a large file done
multiple times with the file deleted on each iteration.  The size of the
file is 1/10th physical memory to avoid dirty page balancing.  In the
async case it will be possible that the workload completes without even
hitting the disk and will have variable results but highlight the impact
of mark_page_accessed for async IO.  The sync results are expected to be
more stable.  The exception is tmpfs where the normal case is for the "IO"
to not hit the disk.

The test machine was single socket and UMA to avoid any scheduling or NUMA
artifacts.  Throughput and wall times are presented for sync IO, only wall
times are shown for async as the granularity reported by dd and the
variability is unsuitable for comparison.  As async results were variable
due to writeback timings, I'm only reporting the maximum figures.  The sync
results were stable enough to make the mean and stddev uninteresting.

The performance results are reported based on a run with no profiling.
Profile data is based on a separate run with oprofile running.

async dd
                                    3.15.0-rc3            3.15.0-rc3
                                       vanilla           accessed-v2
ext3    Max      elapsed     13.9900 (  0.00%)     11.5900 ( 17.16%)
tmpfs   Max      elapsed      0.5100 (  0.00%)      0.4900 (  3.92%)
btrfs   Max      elapsed     12.8100 (  0.00%)     12.7800 (  0.23%)
ext4    Max      elapsed     18.6000 (  0.00%)     13.3400 ( 28.28%)
xfs     Max      elapsed     12.5600 (  0.00%)      2.0900 ( 83.36%)

The XFS figure is a bit strange as it managed to avoid a worst case by
sheer luck but the average figures looked reasonable.

        samples percentage
ext3       86107    0.9783  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext3       23833    0.2710  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext3        5036    0.0573  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
ext4       64566    0.8961  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext4        5322    0.0713  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext4        2869    0.0384  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs        62126    1.7675  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
xfs         1904    0.0554  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs          103    0.0030  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
btrfs      10655    0.1338  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
btrfs       2020    0.0273  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
btrfs        587    0.0079  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
tmpfs      59562    3.2628  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
tmpfs       1210    0.0696  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
tmpfs         94    0.0054  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed

[akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:10 -07:00
Mel Gorman
e7470ee89f fs: buffer: do not use unnecessary atomic operations when discarding buffers
Discarding buffers uses a bunch of atomic operations because ......  I
can't think of a reason.  Use a cmpxchg loop to clear all the necessary
flags.  In most (all?) cases this will be a single atomic operation.
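
A minimal sketch of the cmpxchg-loop idea, written as userspace C with C11 atomics rather than the kernel's own primitives. The flag names and the `discard_flags()` helper are hypothetical and only illustrate clearing several bits with one successful compare-and-exchange:

```c
#include <stdatomic.h>

/* Hypothetical flag bits, named for illustration only. */
#define FLAG_MAPPED (1UL << 0)
#define FLAG_NEW    (1UL << 1)
#define FLAG_DELAY  (1UL << 2)
#define FLAG_LOCKED (1UL << 3)   /* deliberately not in the discard mask */

#define FLAGS_DISCARD (FLAG_MAPPED | FLAG_NEW | FLAG_DELAY)

/*
 * Clear every bit in FLAGS_DISCARD in one atomic update.
 * Returns the state word as it was just before the update.
 */
static unsigned long discard_flags(_Atomic unsigned long *state)
{
    unsigned long old = atomic_load(state);

    /*
     * On failure, atomic_compare_exchange_weak() stores the current
     * value into 'old', so each retry recomputes the new value from
     * fresh state. Without contention the loop body never runs.
     */
    while (!atomic_compare_exchange_weak(state, &old, old & ~FLAGS_DISCARD))
        ;
    return old;
}
```

Bits outside the mask (here `FLAG_LOCKED`) are left untouched, and an uncontended call costs a single atomic read-modify-write instead of one per flag.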

[akpm@linux-foundation.org: move BUFFER_FLAGS_DISCARD into the .c file]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:10 -07:00
Matthew Wilcox
1b938c0827 fs/buffer.c: remove block_write_full_page_endio()
The last in-tree caller of block_write_full_page_endio() was removed in
January 2013.  It's time to remove the EXPORT_SYMBOL, which leaves
block_write_full_page() as the only caller of
block_write_full_page_endio(), so inline block_write_full_page_endio()
into block_write_full_page().

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dheeraj Reddy <dheeraj.reddy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:02 -07:00
Peter Zijlstra
4e857c58ef arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 14:20:48 +02:00
Linus Torvalds
5166701b36 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "The first vfs pile, with deep apologies for being very late in this
  window.

  Assorted cleanups and fixes, plus a large preparatory part of iov_iter
  work.  There's a lot more of that, but it'll probably go into the next
  merge window - it *does* shape up nicely, removes a lot of
  boilerplate, gets rid of locking inconsistencies between aio_write and
  splice_write and I hope to get Kent's direct-io rewrite merged into
  the same queue, but some of the stuff after this point is having
  (mostly trivial) conflicts with the things already merged into
  mainline and with some I want more testing.

  This one passes LTP and xfstests without regressions, in addition to
  usual beating.  BTW, readahead02 in ltp syscalls testsuite has started
  giving failures since "mm/readahead.c: fix readahead failure for
  memoryless NUMA nodes and limit readahead pages" - might be a false
  positive, might be a real regression..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  missing bits of "splice: fix racy pipe->buffers uses"
  cifs: fix the race in cifs_writev()
  ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
  kill generic_file_buffered_write()
  ocfs2_file_aio_write(): switch to generic_perform_write()
  ceph_aio_write(): switch to generic_perform_write()
  xfs_file_buffered_aio_write(): switch to generic_perform_write()
  export generic_perform_write(), start getting rid of generic_file_buffer_write()
  generic_file_direct_write(): get rid of ppos argument
  btrfs_file_aio_write(): get rid of ppos
  kill the 5th argument of generic_file_buffered_write()
  kill the 4th argument of __generic_file_aio_write()
  lustre: don't open-code kernel_recvmsg()
  ocfs2: don't open-code kernel_recvmsg()
  drbd: don't open-code kernel_recvmsg()
  constify blk_rq_map_user_iov() and friends
  lustre: switch to kernel_sendmsg()
  ocfs2: don't open-code kernel_sendmsg()
  take iov_iter stuff to mm/iov_iter.c
  process_vm_access: tidy up a bit
  ...
2014-04-12 14:49:50 -07:00
Al Viro
c186afb4db switch ->is_partially_uptodate() to saner arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:19 -04:00
Jiri Kosina
d4263348f7 Merge branch 'master' into for-next
2014-02-20 14:54:28 +01:00
Masanari Iida
e227867f12 treewide: Fix typo in Documentation/DocBook
This patch fixes spelling typos in Documentation/DocBook.
Because the .html and .xml files are generated by make htmldocs,
the typos have to be fixed in the source files.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-02-19 14:58:17 +01:00
KOSAKI Motohiro
227d53b397 mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq
Using spin_{un}lock_irq is dangerous if the caller has disabled interrupts.
During aio buffer migration, it is possible to see the following
call stack.

aio_migratepage  [disable interrupt]
  migrate_page_copy
    clear_page_dirty_for_io
      set_page_dirty
        __set_page_dirty_buffers
          __set_page_dirty
            spin_lock_irq

This means the current aio migration can deadlock.  spin_lock_irqsave
is a safer alternative and we should use it.
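
The difference can be shown with a toy userspace model, assuming a single boolean stands in for the CPU's interrupt-enable state. The `model_*` and `callee_*` names are hypothetical; they only mimic the save/restore behaviour of the `_irq` and `_irqsave` lock variants and take no real lock:

```c
#include <stdbool.h>

/* Toy model: one flag stands in for the CPU's interrupt-enable state. */
static bool irqs_enabled = true;

/* _irq style: disable on lock, unconditionally enable on unlock. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* _irqsave style: remember the previous state and put it back. */
static void model_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

static void model_unlock_irqrestore(bool flags)
{
    irqs_enabled = flags;
}

/* Each callee returns the interrupt state its caller observes afterwards. */
static bool callee_irq(void)
{
    model_lock_irq();
    model_unlock_irq();
    return irqs_enabled;
}

static bool callee_irqsave(void)
{
    bool flags;

    model_lock_irqsave(&flags);
    model_unlock_irqrestore(flags);
    return irqs_enabled;
}
```

If the caller enters with interrupts disabled, `callee_irq()` hands control back with them enabled, which is the hazard in the call stack above; `callee_irqsave()` returns with the caller's state intact.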

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06 13:48:51 -08:00
Christoph Lameter
ca6673b02e block: Replace __this_cpu_ptr with raw_cpu_ptr
__this_cpu_ptr is being phased out.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-12-03 19:19:41 -07:00
Kent Overstreet
4f024f3797 block: Abstract out bvec iterator
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>
2013-11-23 22:33:47 -08:00
Johannes Weiner
84235de394 fs: buffer: move allocation failure loop into the allocator
Buffer allocation has a very crude indefinite loop around waking the
flusher threads and performing global NOFS direct reclaim because it can
not handle allocation failures.

The most immediate problem with this is that the allocation may fail due
to a memory cgroup limit, where flushers + direct reclaim might not make
any progress towards resolving the situation at all.  Because unlike the
global case, a memory cgroup may not have any cache at all, only
anonymous pages but no swap.  This situation will lead to a reclaim
livelock with insane IO from waking the flushers and thrashing unrelated
filesystem cache in a tight loop.

Use __GFP_NOFAIL allocations for buffers for now.  This makes sure that
any looping happens in the page allocator, which knows how to
orchestrate kswapd, direct reclaim, and the flushers sensibly.  It also
allows memory cgroups to detect allocations that can't handle failure
and will allow them to ultimately bypass the limit if reclaim can not
make progress.

Reported-by: azurIt <azurit@pobox.sk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16 21:35:53 -07:00
Mel Gorman
b45972265f mm: vmscan: take page buffers dirty and locked state into account
Page reclaim keeps track of dirty and under writeback pages and uses it
to determine if wait_iff_congested() should stall or if kswapd should
begin writing back pages.  This fails to account for buffer pages that
can be under writeback but not PageWriteback which is the case for
filesystems like ext3 ordered mode.  Furthermore, PageDirty buffer pages
can have all the buffers clean and writepage does no IO so it should not
be accounted as congested.

This patch adds an address_space operation that filesystems may
optionally use to check if a page is really dirty or really under
writeback.  An implementation for buffer_heads is added
and used for block operations and ext3 in ordered mode.  By default the
page flags are obeyed.
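
A rough sketch of the "optional operation with a page-flag default" pattern described here, in plain userspace C. All struct, field, and function names are hypothetical, and treating a locked buffer as "under writeback" is an assumption of this sketch rather than something the text above spells out:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page with attached buffers. */
struct page_model {
    bool flag_dirty;          /* page-level dirty flag */
    bool flag_writeback;      /* page-level writeback flag */
    int  nr_dirty_buffers;    /* buffers that still hold dirty data */
    int  nr_locked_buffers;   /* buffers currently locked for IO */
};

/* Hypothetical operations table; the callback may be left NULL. */
struct ops_model {
    void (*is_dirty_writeback)(const struct page_model *page,
                               bool *dirty, bool *writeback);
};

/* Buffer-based check: look at the buffers, not the page flags. */
static void buffer_is_dirty_writeback(const struct page_model *page,
                                      bool *dirty, bool *writeback)
{
    *dirty = page->nr_dirty_buffers > 0;
    *writeback = page->nr_locked_buffers > 0;
}

/* Start from the page flags, then let the optional callback refine them. */
static void check_page(const struct ops_model *ops,
                       const struct page_model *page,
                       bool *dirty, bool *writeback)
{
    *dirty = page->flag_dirty;
    *writeback = page->flag_writeback;

    if (ops && ops->is_dirty_writeback)
        ops->is_dirty_writeback(page, dirty, writeback);
}
```

With no callback installed the page flags are reported unchanged; with the buffer-based callback, a page flagged dirty whose buffers are all clean is no longer counted as dirty.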

Credit goes to Jan Kara for identifying that the page flags alone are
not sufficient for ext3 and sanity checking a number of ideas on how the
problem could be addressed.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:29 -07:00
Lukas Czerner
d47992f86b mm: change invalidatepage prototype to accept length
Currently there is no way to truncate a partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for the file system truncate
operation to work properly. However, more file systems now support the
punch hole feature, and they can benefit from mm supporting truncating a
page just up to a certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating a partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept the range to be invalidated and updates all
instances of it.

We also change block_invalidatepage() in the same way and actually
make use of the new length argument by implementing range invalidation.
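
As an illustration of what range invalidation with an offset and a length can look like, here is a small userspace sketch. The page size, the `invalidate_range()` helper, and the `discarded` array are all hypothetical; the sketch assumes that only blocks lying entirely inside the range are discarded:

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/*
 * Mark discarded[i] for every block of 'block_size' bytes that lies
 * fully inside [offset, offset + length) within one page.
 * Returns the number of blocks discarded, or 0 for an invalid range.
 */
static unsigned int invalidate_range(unsigned int offset, unsigned int length,
                                     unsigned int block_size, bool *discarded)
{
    unsigned int stop, nr = 0;

    /* Reject ranges that would run past the end of the page. */
    if (block_size == 0 || length > MODEL_PAGE_SIZE ||
        offset > MODEL_PAGE_SIZE - length)
        return 0;

    stop = offset + length;

    for (unsigned int start = 0, i = 0; start < MODEL_PAGE_SIZE;
         start += block_size, i++) {
        unsigned int end = start + block_size;

        /* This block extends beyond the range: nothing further qualifies. */
        if (end > stop)
            break;

        /* Block starts at or after the range start: fully covered. */
        if (start >= offset) {
            discarded[i] = true;
            nr++;
        }
    }
    return nr;
}
```

For example, with 1024-byte blocks, `offset = 1024` and `length = 2048` discard the second and third blocks only, while a range that merely overlaps part of a block leaves that block alone.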

Actual file system implementations will follow, except for the file
systems where the changes are really simple and should not change the
behaviour in any way. An implementation of truncate_page_range(), which
will be able to accept page-unaligned ranges, will follow as well.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
2013-05-21 23:17:23 -04:00
Linus Torvalds
4de13d7aa8 Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block
Pull block core updates from Jens Axboe:

 - Major bit is Kent's prep work for immutable bio vecs.

 - Stable candidate fix for a scheduling-while-atomic in the queue
   bypass operation.

 - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
   discard bios.

 - Tejun's changes to convert the writeback thread pool to the generic
   workqueue mechanism.

 - Runtime PM framework, SCSI patches exist on top of these in James'
   tree.

 - A few random fixes.

* 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
  relay: move remove_buf_file inside relay_close_buf
  partitions/efi.c: replace useless kzalloc's by kmalloc's
  fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
  block: fix max discard sectors limit
  blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
  Documentation: cfq-iosched: update documentation help for cfq tunables
  writeback: expose the bdi_wq workqueue
  writeback: replace custom worker pool implementation with unbound workqueue
  writeback: remove unused bdi_pending_list
  aoe: Fix unitialized var usage
  bio-integrity: Add explicit field for owner of bip_buf
  block: Add an explicit bio flag for bios that own their bvec
  block: Add bio_alloc_pages()
  block: Convert some code to bio_for_each_segment_all()
  block: Add bio_for_each_segment_all()
  bounce: Refactor __blk_queue_bounce to not use bi_io_vec
  raid1: use bio_copy_data()
  pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
  pktcdvd: use bio_copy_data()
  block: Add bio_copy_data()
  ...
2013-05-08 10:13:35 -07:00
Linus Torvalds
149b306089 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Mostly performance and bug fixes, plus some cleanups.  The one new
  feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
  allows installation of a hidden inode designed for boot loaders."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
  ext4: fix type-widening bug in inode table readahead code
  ext4: add check for inodes_count overflow in new resize ioctl
  ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG
  ext4: fix online resizing for ext3-compat file systems
  jbd2: trace when lock_buffer in do_get_write_access takes a long time
  ext4: mark metadata blocks using bh flags
  buffer: add BH_Prio and BH_Meta flags
  ext4: mark all metadata I/O with REQ_META
  ext4: fix readdir error in case inline_data+^dir_index.
  ext4: fix readdir error in the case of inline_data+dir_index
  jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
  ext4: mext_insert_extents should update extent block checksum
  ext4: move quota initialization out of inode allocation transaction
  ext4: reserve xattr index for Rich ACL support
  jbd2: reduce journal_head size
  ext4: clear buffer_uninit flag when submitting IO
  ext4: use io_end for multiple bios
  ext4: make ext4_bio_write_page() use BH_Async_Write flags
  ext4: Use kstrtoul() instead of parse_strtoul()
  ext4: defragmentation code cleanup
  ...
2013-05-01 08:04:12 -07:00