udc
305 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
79a097e001 |
Merge 4.4.179 into android-msm-wahoo-4.4
Changes in 4.4.179: (170 commits)
arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
arm64: debug: Ensure debug handlers check triggering exception level
ext4: cleanup bh release code in ext4_ind_remove_space()
lib/int_sqrt: optimize initial value compute
tty/serial: atmel: Add is_half_duplex helper
mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
i2c: core-smbus: prevent stack corruption on read I2C_BLOCK_DATA
Bluetooth: Fix decrementing reference count twice in releasing socket
tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped
CIFS: fix POSIX lock leak and invalid ptr deref
h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
tracing: kdb: Fix ftdump to not sleep
gpio: gpio-omap: fix level interrupt idling
sysctl: handle overflow for file-max
enic: fix build warning without CONFIG_CPUMASK_OFFSTACK
mm/cma.c: cma_declare_contiguous: correct err handling
mm/page_ext.c: fix an imbalance with kmemleak
mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512!
mm/slab.c: kmemleak no scan alien caches
ocfs2: fix a panic problem caused by o2cb_ctl
f2fs: do not use mutex lock in atomic context
fs/file.c: initialize init_files.resize_wait
cifs: use correct format characters
dm thin: add sanity checks to thin-pool and external snapshot creation
cifs: Fix NULL pointer dereference of devname
fs: fix guard_bio_eod to check for real EOD errors
tools lib traceevent: Fix buffer overflow in arg_eval
usb: chipidea: Grab the (legacy) USB PHY by phandle first
scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
coresight: etm4x: Add support to enable ETMv4.2
ARM: 8840/1: use a raw_spinlock_t in unwind
mmc: omap: fix the maximum timeout setting
e1000e: Fix -Wformat-truncation warnings
IB/mlx4: Increase the timeout for CM cache
scsi: megaraid_sas: return error when create DMA pool failed
perf test: Fix failure of 'evsel-tp-sched' test on s390
SoC: imx-sgtl5000: add missing put_device()
media: sh_veu: Correct return type for mem2mem buffer helpers
media: s5p-jpeg: Correct return type for mem2mem buffer helpers
media: s5p-g2d: Correct return type for mem2mem buffer helpers
media: mx2_emmaprp: Correct return type for mem2mem buffer helpers
leds: lp55xx: fix null deref on firmware load failure
kprobes: Prohibit probing on bsearch()
ARM: 8833/1: Ensure that NEON code always compiles with Clang
ALSA: PCM: check if ops are defined before suspending PCM
bcache: fix input overflow to cache set sysfs file io_error_halflife
bcache: fix input overflow to sequential_cutoff
bcache: improve sysfs_strtoul_clamp()
fbdev: fbmem: fix memory access if logo is bigger than the screen
cdrom: Fix race condition in cdrom_sysctl_register
ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe
soc: qcom: gsbi: Fix error handling in gsbi_probe()
mt7601u: bump supported EEPROM version
ARM: avoid Cortex-A9 livelock on tight dmb loops
tty: increase the default flip buffer limit to 2*640K
media: mt9m111: set initial frame size other than 0x0
hwrng: virtio - Avoid repeated init of completion
soc/tegra: fuse: Fix illegal free of IO base address
hpet: Fix missing '=' character in the __setup() code of hpet_mmap_enable
dmaengine: imx-dma: fix warning comparison of distinct pointer types
netfilter: physdev: relax br_netfilter dependency
media: s5p-jpeg: Check for fmt_ver_flag when doing fmt enumeration
regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting
wlcore: Fix memory leak in case wl12xx_fetch_firmware failure
x86/build: Mark per-CPU symbols as absolute explicitly for LLD
dmaengine: tegra: avoid overflow of byte tracking
drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers
binfmt_elf: switch to new creds when switching to new mm
kbuild: clang: choose GCC_TOOLCHAIN_DIR not on LD
x86/build: Specify elf_i386 linker emulation explicitly for i386 objects
x86: vdso: Use $LD instead of $CC to link
x86/vdso: Drop implicit common-page-size linker flag
lib/string.c: implement a basic bcmp
tty: mark Siemens R3964 line discipline as BROKEN
tty: ldisc: add sysctl to prevent autoloading of ldiscs
ipv6: Fix dangling pointer when ipv6 fragment
ipv6: sit: reset ip header pointer in ipip6_rcv
net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock().
openvswitch: fix flow actions reallocation
qmi_wwan: add Olicard 600
sctp: initialize _pad of sockaddr_in before copying to user memory
tcp: Ensure DCTCP reacts to losses
netns: provide pure entropy for net_hash_mix()
net: ethtool: not call vzalloc for zero sized memory request
ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type
ALSA: seq: Fix OOB-reads from strlcpy
include/linux/bitrev.h: fix constant bitrev
ASoC: fsl_esai: fix channel swap issue when stream starts
block: do not leak memory in bio_copy_user_iov()
genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
ARM: dts: at91: Fix typo in ISC_D0 on PC9
arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
xen: Prevent buffer overflow in privcmd ioctl
sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
xtensa: fix return_address
PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
perf/core: Restore mmap record type correctly
ext4: add missing brelse() in add_new_gdb_meta_bg()
ext4: report real fs size after failed resize
ALSA: echoaudio: add a check for ioremap_nocache
ALSA: sb8: add a check for request_region
IB/mlx4: Fix race condition between catas error reset and aliasguid flows
mmc: davinci: remove extraneous __init annotation
ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration
thermal/int340x_thermal: Add additional UUIDs
thermal/int340x_thermal: fix mode setting
tools/power turbostat: return the exit status of a command
perf top: Fix error handling in cmd_top()
perf evsel: Free evsel->counts in perf_evsel__exit()
perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test
perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
x86/hpet: Prevent potential NULL pointer dereference
x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
iommu/vt-d: Check capability before disabling protected memory
x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
fix incorrect error code mapping for OBJECTID_NOT_FOUND
ext4: prohibit fstrim in norecovery mode
rsi: improve kernel thread handling to fix kernel panic
9p: do not trust pdu content for stat item size
9p locks: add mount option for lock retry interval
f2fs: fix to do sanity check with current segment number
serial: uartps: console_setup() can't be placed to init section
ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms
ACPI / SBS: Fix GPE storm on recent MacBookPro's
cifs: fallback to older infolevels on findfirst queryinfo retry
crypto: sha256/arm - fix crash bug in Thumb2 build
crypto: sha512/arm - fix crash bug in Thumb2 build
iommu/dmar: Fix buffer overflow during PCI bus notification
ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
appletalk: Fix use-after-free in atalk_proc_exit
lib/div64.c: off by one in shift
include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
tpm/tpm_crb: Avoid unaligned reads in crb_recv()
ovl: fix uid/gid when creating over whiteout
appletalk: Fix compile regression
bonding: fix event handling for stacked bonds
net: atm: Fix potential Spectre v1 vulnerabilities
net: bridge: multicast: use rcu to access port list from br_multicast_start_querier
net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv
tcp: tcp_grow_window() needs to respect tcp_space()
ipv4: recompile ip options in ipv4_link_failure
ipv4: ensure rcu_read_lock() in ipv4_link_failure()
crypto: crypto4xx - properly set IV after de- and encrypt
modpost: file2alias: go back to simple devtable lookup
modpost: file2alias: check prototype of handler
tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete
KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
iio/gyro/bmg160: Use millidegrees for temperature scale
iio: ad_sigma_delta: select channel when reading register
iio: adc: at91: disable adc channel interrupt in timeout case
io: accel: kxcjk1013: restore the range after resume.
staging: comedi: vmk80xx: Fix use of uninitialized semaphore
staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf
staging: comedi: ni_usb6501: Fix use of uninitialized mutex
staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf
ALSA: core: Fix card races between register and disconnect
crypto: x86/poly1305 - fix overflow during partial reduction
arm64: futex: Restore oldval initialization to work around buggy compilers
x86/kprobes: Verify stack frame on kretprobe
kprobes: Mark ftrace mcount handler functions nokprobe
kprobes: Fix error check when reusing optimized probes
mac80211: do not call driver wake_tx_queue op during reconfig
Revert "kbuild: use -Oz instead of -Os when using clang"
sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
device_cgroup: fix RCU imbalance in error case
mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
ALSA: info: Fix racy addition/deletion of nodes
Revert "locking/lockdep: Add debug_locks check in __lock_downgrade()"
kernel/sysctl.c: fix out-of-bounds access when setting file-max
Linux 4.4.179
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Conflicts:
Makefile
fs/ext4/ioctl.c
|
||
|
|
2e5086f3ac |
fs: fix guard_bio_eod to check for real EOD errors
[ Upstream commit dce30ca9e3b676fb288c33c1f4725a0621361185 ]
guard_bio_eod() can truncate a segment in bio to allow it to do IO on
odd last sectors of a device.
It already checks if the IO starts past EOD, but it does not consider
the possibility of an IO request starting within device boundaries can
contain more than one segment past EOD.
In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
underflow bvec->bv_len.
Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
This situation has been found on filesystems such as isofs and vfat,
which doesn't check the device size before mount, if the device is
smaller than the filesystem itself, a readahead on such filesystem,
which spans EOD, can trigger this situation, leading a call to
zero_user() with a wrong size possibly corrupting memory.
I didn't see any crash, or didn't let the system run long enough to
check if memory corruption will be hit somewhere, but adding
instrumentation to guard_bio_end() to check truncated_bytes size, was
enough to see the error.
The following script can trigger the error.
MNT=/mnt
IMG=./DISK.img
DEV=/dev/loop0
mkfs.vfat $IMG
mount $IMG $MNT
cp -R /etc $MNT &> /dev/null
umount $MNT
losetup -D
losetup --find --show --sizelimit 16247280 $IMG
mount $DEV $MNT
find $MNT -type f -exec cat {} + >/dev/null
Kudos to Eric Sandeen for coming up with the reproducer above
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
75c8bc7183 |
Merged linux-4.4.80 into android-msm-wahoo-4.4
Linux 4.4.80
ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
scsi: snic: Return error code on memory allocation failure
scsi: fnic: Avoid sending reset to firmware when another reset is in progress
HID: ignore Petzl USB headlamp
ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion
sh_eth: enable RX descriptor word 0 shift on SH7734
nvmem: imx-ocotp: Fix wrong register size
arm64: mm: fix show_pte KERN_CONT fallout
vfio-pci: Handle error from pci_iomap
video: fbdev: cobalt_lcdfb: Handle return NULL error from devm_ioremap
perf symbols: Robustify reading of build-id from sysfs
perf tools: Install tools/lib/traceevent plugins with install-bin
xfrm: Don't use sk_family for socket policy lookups
tools lib traceevent: Fix prev/next_prio for deadline tasks
Btrfs: adjust outstanding_extents counter properly when dio write is split
usb: gadget: Fix copy/pasted error message
ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*
ARM64: zynqmp: Fix i2c node's compatible string
ARM64: zynqmp: Fix W=1 dtc 1.4 warnings
dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
dmaengine: ioatdma: workaround SKX ioatdma version
dmaengine: ioatdma: Add Skylake PCI Dev ID
openrisc: Add _text symbol to fix ksym build error
irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
spi: dw: Make debugfs name unique between instances
ASoC: tlv320aic3x: Mark the RESET register as volatile
irqchip/keystone: Fix "scheduling while atomic" on rt
vfio-pci: use 32-bit comparisons for register address for gcc-4.5
drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set
drm/msm: Ensure that the hardware write pointer is valid
net/mlx4: Remove BUG_ON from ICM allocation routine
ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output
ARM: dts: n900: Mark eMMC slot with no-sdio and no-sd flags
r8169: add support for RTL8168 series add-on card.
x86/mce/AMD: Make the init code more robust
tpm: Replace device number bitmap with IDR
tpm: fix a kernel memory leak in tpm-sysfs.c
xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
xen/blkback: don't free be structure too early
sched/cputime: Fix prev steal time accouting during CPU hotplug
net: skb_needs_check() accepts CHECKSUM_NONE for tx
pstore: Use dynamic spinlock initializer
pstore: Correctly initialize spinlock and flags
pstore: Allow prz to control need for locking
vlan: Propagate MAC address to VLANs
/proc/iomem: only expose physical resource addresses to privileged users
Make file credentials available to the seqfile interfaces
v4l: s5c73m3: fix negation operator
dentry name snapshots
ipmi/watchdog: fix watchdog timeout set on reboot
libnvdimm, btt: fix btt_rw_page not returning errors
RDMA/uverbs: Fix the check for port number
PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
sched/cgroup: Move sched_online_group() back into css_online() to fix crash
kaweth: fix oops upon failed memory allocation
kaweth: fix firmware download
mpt3sas: Don't overreach ioc->reply_post[] during initialization
mailbox: handle empty message in tx_tick
mailbox: skip complete wait event if timer expired
mailbox: always wait in mbox_send_message for blocking Tx mode
wil6210: fix deadlock when using fw_no_recovery option
ath10k: fix null deref on wmi-tlv when trying spectral scan
isdn/i4l: fix buffer overflow
isdn: Fix a sleep-in-atomic bug
net: phy: Do not perform software reset for Generic PHY
nfc: fdp: fix NULL pointer dereference
xfs: don't BUG() on mixed direct and mapped I/O
perf intel-pt: Ensure never to set 'last_ip' when packet 'count' is zero
perf intel-pt: Use FUP always when scanning for an IP
perf intel-pt: Fix last_ip usage
perf intel-pt: Fix ip compression
drm: rcar-du: Simplify and fix probe error handling
drm: rcar-du: Perform initialization/cleanup at probe/remove time
drm/rcar: Nuke preclose hook
Staging: comedi: comedi_fops: Avoid orphaned proc entry
Revert "powerpc/numa: Fix percpu allocations to be NUMA aware"
KVM: PPC: Book3S HV: Save/restore host values of debug registers
KVM: PPC: Book3S HV: Reload HTM registers explicitly
KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
KVM: PPC: Book3S HV: Context-switch EBB registers properly
drm/nouveau/bar/gf100: fix access to upper half of BAR2
drm/vmwgfx: Fix gcc-7.1.1 warning
md/raid5: add thread_group worker async_tx_issue_pending_all
crypto: authencesn - Fix digest_null crash
powerpc/pseries: Fix of_node_put() underflow during reconfig remove
net: reduce skb_warn_bad_offload() noise
pstore: Make spinlock per zone instead of global
af_key: Add lock to key dump
Linux 4.4.79
alarmtimer: don't rate limit one-shot timers
tracing: Fix kmemleak in instance_rmdir
spmi: Include OF based modalias in device uevent
of: device: Export of_device_{get_modalias, uvent_modalias} to modules
drm/mst: Avoid processing partially received up/down message transactions
drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req()
drm/mst: Fix error handling during MST sideband message reception
RDMA/core: Initialize port_num in qp_attr
ceph: fix race in concurrent readdir
staging: rtl8188eu: add TL-WN722N v2 support
Revert "perf/core: Drop kernel samples even though :u is specified"
perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
target: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce
udf: Fix deadlock between writeback and udf_setsize()
NFS: only invalidate dentrys that are clearly invalid.
Input: i8042 - fix crash at boot time
MIPS: Fix a typo: s/preset/present/ in r2-to-r6 emulation error message
MIPS: Send SIGILL for linked branches in `__compute_return_epc_for_insn'
MIPS: Rename `sigill_r6' to `sigill_r2r6' in `__compute_return_epc_for_insn'
MIPS: Send SIGILL for BPOSGE32 in `__compute_return_epc_for_insn'
MIPS: math-emu: Prevent wrong ISA mode instruction emulation
MIPS: Fix unaligned PC interpretation in `compute_return_epc'
MIPS: Actually decode JALX in `__compute_return_epc_for_insn'
MIPS: Save static registers before sysmips
MIPS: Fix MIPS I ISA /proc/cpuinfo reporting
x86/ioapic: Pass the correct data to unmask_ioapic_irq()
x86/acpi: Prevent out of bound access caused by broken ACPI tables
MIPS: Negate error syscall return in trace
MIPS: Fix mips_atomic_set() with EVA
MIPS: Fix mips_atomic_set() retry condition
ftrace: Fix uninitialized variable in match_records()
vfio: New external user group/file match
vfio: Fix group release deadlock
f2fs: Don't clear SGID when inheriting ACLs
ipmi:ssif: Add missing unlock in error branch
ipmi: use rcu lock around call to intf->handlers->sender()
drm/radeon: Fix eDP for single-display iMac10,1 (v2)
drm/radeon/ci: disable mclk switching for high refresh rates (v2)
drm/amd/amdgpu: Return error if initiating read out of range on vram
s390/syscalls: Fix out of bounds arguments access
Raid5 should update rdev->sectors after reshape
cx88: Fix regression in initial video standard setting
x86/xen: allow userspace access during hypercalls
md: don't use flush_signals in userspace processes
usb: renesas_usbhs: gadget: disable all eps when the driver stops
usb: renesas_usbhs: fix usbhsc_resume() for !USBHSF_RUNTIME_PWCTRL
USB: cdc-acm: add device-id for quirky printer
usb: storage: return on error to avoid a null pointer dereference
xhci: Fix NULL pointer dereference when cleaning up streams for removed host
xhci: fix 20000ms port resume timeout
ipvs: SNAT packet replies only for NATed connections
PCI/PM: Restore the status of PCI devices across hibernation
af_key: Fix sadb_x_ipsecrequest parsing
powerpc/asm: Mark cr0 as clobbered in mftb()
powerpc: Fix emulation of mfocrf in emulate_step()
powerpc: Fix emulation of mcrf in emulate_step()
powerpc/64: Fix atomic64_inc_not_zero() to return an int
iscsi-target: Add login_keys_workaround attribute for non RFC initiators
scsi: ses: do not add a device to an enclosure if enclosure_add_links() fails.
PM / Domains: Fix unsafe iteration over modified list of domain providers
PM / Domains: Fix unsafe iteration over modified list of device links
ASoC: compress: Derive substream from stream based on direction
wlcore: fix 64K page support
Bluetooth: use constant time memory comparison for secret values
perf intel-pt: Clear FUP flag on error
perf intel-pt: Ensure IP is zero when state is INTEL_PT_STATE_NO_IP
perf intel-pt: Fix missing stack clear
perf intel-pt: Improve sample timestamp
perf intel-pt: Move decoder error setting into one condition
NFC: Add sockaddr length checks before accessing sa_family in bind handlers
nfc: Fix the sockaddr length sanitization in llcp_sock_connect
nfc: Ensure presence of required attributes in the activate_target handler
NFC: nfcmrvl: fix firmware-management initialisation
NFC: nfcmrvl: use nfc-device for firmware download
NFC: nfcmrvl: do not use device-managed resources
NFC: nfcmrvl_uart: add missing tty-device sanity check
NFC: fix broken device allocation
ath9k: fix tx99 bus error
ath9k: fix tx99 use after free
thermal: cpu_cooling: Avoid accessing potentially freed structures
s5p-jpeg: don't return a random width/height
ir-core: fix gcc-7 warning on bool arithmetic
disable new gcc-7.1.1 warnings for now
Linux 4.4.78
kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
kvm: vmx: Check value written to IA32_BNDCFGS
kvm: x86: Guest BNDCFGS requires guest MPX support
kvm: vmx: Do not disable intercepts for BNDCFGS
KVM: x86: disable MPX if host did not enable MPX XSAVE features
tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
PM / QoS: return -EINVAL for bogus strings
PM / wakeirq: Convert to SRCU
sched/topology: Optimize build_group_mask()
sched/topology: Fix overlapping sched_group_mask
crypto: caam - fix signals handling
crypto: sha1-ssse3 - Disable avx2
crypto: atmel - only treat EBUSY as transient if backlog
crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD
mm: fix overflow check in expand_upwards()
tpm: Issue a TPM2_Shutdown for TPM2 devices.
Add "shutdown" to "struct class".
tpm: Provide strong locking for device removal
tpm: Get rid of chip->pdev
selftests/capabilities: Fix the test_execve test
mnt: Make propagate_umount less slow for overlapping mount propagation trees
mnt: In propgate_umount handle visiting mounts in any order
mnt: In umount propagation reparent in a separate pass
vt: fix unchecked __put_user() in tioclinux ioctls
exec: Limit arg stack to at most 75% of _STK_LIM
s390: reduce ELF_ET_DYN_BASE
powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
arm: move ELF_ET_DYN_BASE to 4MB
binfmt_elf: use ELF_ET_DYN_BASE only for PIE
checkpatch: silence perl 5.26.0 unescaped left brace warnings
fs/dcache.c: fix spin lockup issue on nlru->lock
mm/list_lru.c: fix list_lru_count_node() to be race free
kernel/extable.c: mark core_kernel_text notrace
tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth
parisc/mm: Ensure IRQs are off in switch_mm()
parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs
parisc: use compat_sys_keyctl()
parisc: Report SIGSEGV instead of SIGBUS when running out of stack
irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
cfg80211: Check if PMKID attribute is of expected size
cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
rds: tcp: use sock_create_lite() to create the accept socket
vrf: fix bug_on triggered by rx when destroying a vrf
net: ipv6: Compare lwstate in detecting duplicate nexthops
ipv6: dad: don't remove dynamic addresses if link is down
net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
bpf: prevent leaking pointer via xadd on unpriviledged
net: prevent sign extension in dev_get_stats()
tcp: reset sk_rx_dst in tcp_disconnect()
net: dp83640: Avoid NULL pointer dereference.
ipv6: avoid unregistering inet6_dev for loopback
net/phy: micrel: configure intterupts after autoneg workaround
net: sched: Fix one possible panic when no destroy callback
net_sched: fix error recovery at qdisc creation
Linux 4.4.77
saa7134: fix warm Medion 7134 EEPROM read
x86/mm/pat: Don't report PAT on CPUs that don't support it
ext4: check return value of kstrtoull correctly in reserved_clusters_store
staging: comedi: fix clean-up of comedi_class in comedi_init()
staging: vt6556: vnt_start Fix missing call to vnt_key_init_table.
tcp: fix tcp_mark_head_lost to check skb len before fragmenting
md: fix super_offset endianness in super_1_rdev_size_change
md: fix incorrect use of lexx_to_cpu in does_sb_need_changing
perf tools: Use readdir() instead of deprecated readdir_r() again
perf tests: Remove wrong semicolon in while loop in CQM test
perf trace: Do not process PERF_RECORD_LOST twice
perf dwarf: Guard !x86_64 definitions under #ifdef else clause
perf pmu: Fix misleadingly indented assignment (whitespace)
perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
perf tools: Remove duplicate const qualifier
perf script: Use readdir() instead of deprecated readdir_r()
perf thread_map: Use readdir() instead of deprecated readdir_r()
perf tools: Use readdir() instead of deprecated readdir_r()
perf bench numa: Avoid possible truncation when using snprintf()
perf tests: Avoid possible truncation with dirent->d_name + snprintf
perf scripting perl: Fix compile error with some perl5 versions
perf thread_map: Correctly size buffer used with dirent->dt_name
perf intel-pt: Use __fallthrough
perf top: Use __fallthrough
tools strfilter: Use __fallthrough
tools string: Use __fallthrough in perf_atoll()
tools include: Add a __fallthrough statement
mqueue: fix a use-after-free in sys_mq_notify()
RDMA/uverbs: Check port number supplied by user verbs cmds
KEYS: Fix an error code in request_master_key()
ath10k: override CE5 config for QCA9377
x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings
x86/tools: Fix gcc-7 warning in relocs.c
gfs2: Fix glock rhashtable rcu bug
USB: serial: qcserial: new Sierra Wireless EM7305 device ID
USB: serial: option: add two Longcheer device ids
pinctrl: sh-pfc: Update info pointer after SoC-specific init
pinctrl: mxs: atomically switch mux and drive strength config
pinctrl: sunxi: Fix SPDIF function name for A83T
pinctrl: meson: meson8b: fix the NAND DQS pins
pinctrl: sh-pfc: r8a7791: Fix SCIF2 pinmux data
sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec
sysctl: don't print negative flag for proc_douintvec
mac80211_hwsim: Replace bogus hrtimer clockid
usb: Fix typo in the definition of Endpoint[out]Request
usb: usbip: set buffer pointers to NULL after free
Add USB quirk for HVR-950q to avoid intermittent device resets
USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
usb: dwc3: replace %p with %pK
drm/virtio: don't leak bo on drm_gem_object_init failure
tracing/kprobes: Allow to create probe with a module name starting with a digit
mm: fix classzone_idx underflow in shrink_zones()
bgmac: reset & enable Ethernet core before using it
driver core: platform: fix race condition with driver_override
fs: completely ignore unknown open flags
fs: add a VALID_OPEN_FLAGS
Linux 4.4.76
KVM: nVMX: Fix exception injection
KVM: x86: zero base3 of unusable segments
KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh()
KVM: x86: fix emulation of RSM and IRET instructions
cpufreq: s3c2416: double free on driver init error path
iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
iommu: Handle default domain attach failure
iommu/vt-d: Don't over-free page table directories
ocfs2: o2hb: revert hb threshold to keep compatible
x86/mm: Fix flush_tlb_page() on Xen
x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
ARM: 8685/1: ensure memblock-limit is pmd-aligned
ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
watchdog: bcm281xx: Fix use of uninitialized spinlock.
xfrm: Oops on error in pfkey_msg2xfrm_state()
xfrm: NULL dereference on allocation failure
xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
jump label: fix passing kbuild_cflags when checking for asm goto support
ravb: Fix use-after-free on `ifconfig eth0 down`
sctp: check af before verify address in sctp_addr_id2transport
net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
perf probe: Fix to show correct locations for events on modules
be2net: fix status check in be_cmd_pmac_add()
s390/ctl_reg: make __ctl_load a full memory barrier
swiotlb: ensure that page-sized mappings are page-aligned
coredump: Ensure proper size of sparse core files
x86/mpx: Use compatible types in comparison to fix sparse error
mac80211: initialize SMPS field in HT capabilities
spi: davinci: use dma_mapping_error()
scsi: lpfc: avoid double free of resource identifiers
HID: i2c-hid: Add sleep between POWER ON and RESET
kernel/panic.c: add missing \n
ibmveth: Add a proper check for the availability of the checksum features
vxlan: do not age static remote mac entries
virtio_net: fix PAGE_SIZE > 64k
vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null
drm/amdgpu: check ring being ready before using
net: dsa: Check return value of phy_connect_direct()
amd-xgbe: Check xgbe_init() return code
platform/x86: ideapad-laptop: handle ACPI event 1
scsi: virtio_scsi: Reject commands when virtqueue is broken
xen-netfront: Fix Rx stall during network stress and OOM
swiotlb-xen: update dev_addr after swapping pages
virtio_console: fix a crash in config_work_handler
Btrfs: fix truncate down when no_holes feature is enabled
gianfar: Do not reuse pages from emergency reserve
powerpc/eeh: Enable IO path on permanent error
net: bgmac: Remove superflous netif_carrier_on()
net: bgmac: Start transmit queue in bgmac_open
net: bgmac: Fix SOF bit checking
bgmac: Fix reversed test of build_skb() return value.
mtd: bcm47xxpart: don't fail because of bit-flips
bgmac: fix a missing check for build_skb
mtd: bcm47xxpart: limit scanned flash area on BCM47XX (MIPS) only
MIPS: ralink: fix MT7628 wled_an pinmux gpio
MIPS: ralink: fix MT7628 pinmux typos
MIPS: ralink: Fix invalid assignment of SoC type
MIPS: ralink: fix USB frequency scaling
MIPS: ralink: MT7688 pinmux fixes
net: korina: Fix NAPI versus resources freeing
MIPS: ath79: fix regression in PCI window initialization
net: mvneta: Fix for_each_present_cpu usage
ARM: dts: BCM5301X: Correct GIC_PPI interrupt flags
qla2xxx: Fix erroneous invalid handle message
scsi: lpfc: Set elsiocb contexts to NULL after freeing it
scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
KVM: x86: fix fixing of hypercalls
mm: numa: avoid waiting on freed migrated pages
block: fix module reference leak on put_disk() call for cgroups throttle
sysctl: enable strict writes
usb: gadget: f_fs: Fix possibe deadlock
drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr
ALSA: hda - set input_path bitmap to zero after moving it to new place
ALSA: hda - Fix endless loop of codec configure
MIPS: Fix IRQ tracing & lockdep when rescheduling
MIPS: pm-cps: Drop manual cache-line alignment of ready_count
MIPS: Avoid accidental raw backtrace
mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
drm/ast: Handle configuration without P2A bridge
NFSv4: fix a reference leak caused WARNING messages
netfilter: synproxy: fix conntrackd interaction
netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
rtnetlink: add IFLA_GROUP to ifla_policy
ipv6: Do not leak throw route references
sfc: provide dummy definitions of vswitch functions
net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev
decnet: always not take dst->__refcnt when inserting dst into hash table
net/mlx5: Wait for FW readiness before initializing command interface
ipv6: fix calling in6_ifa_hold incorrectly for dad work
igmp: add a missing spin_lock_init()
igmp: acquire pmc lock for ip_mc_clear_src()
net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx
Fix an intermittent pr_emerg warning about lo becoming free.
af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers
net: Zero ifla_vf_info in rtnl_fill_vfinfo()
decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
net: don't call strlen on non-terminated string in dev_set_alias()
ipv6: release dst on error in ip6_dst_lookup_tail
Linux 4.4.75
nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too
nvme/quirk: Add a delay before checking for adapter readiness
net: phy: fix marvell phy status reading
net: phy: Initialize mdio clock at probe function
usb: gadget: f_fs: avoid out of bounds access on comp_desc
powerpc/slb: Force a full SLB flush when we insert for a bad EA
mtd: spi-nor: fix spansion quad enable
of: Add check to of_scan_flat_dt() before accessing initial_boot_params
rxrpc: Fix several cases where a padded len isn't checked in ticket decode
USB: usbip: fix nonconforming hub descriptor
drm/amdgpu: adjust default display clock
drm/amdgpu/atom: fix ps allocation size for EnableDispPowerGating
drm/radeon: add a quirk for Toshiba Satellite L20-183
drm/radeon: add a PX quirk for another K53TK variant
iscsi-target: Reject immediate data underflow larger than SCSI transfer length
target: Fix kref->refcount underflow in transport_cmd_finish_abort
time: Fix clock->read(clock) race around clocksource changes
Input: i8042 - add Fujitsu Lifebook AH544 to notimeout list
powerpc/kprobes: Pause function_graph tracing during jprobes handling
signal: Only reschedule timers on signals timers have sent
HID: Add quirk for Dell PIXART OEM mouse
CIFS: Improve readdir verbosity
KVM: PPC: Book3S HV: Preserve userspace HTM state properly
lib/cmdline.c: fix get_options() overflow while parsing ranges
autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL
fs/exec.c: account for argv/envp pointers
Linux 4.4.74
mm: fix new crash in unmapped_area_topdown()
Allow stack to grow up to address space limit
mm: larger stack guard gap, between vmas
alarmtimer: Rate limit periodic intervals
MIPS: Fix bnezc/jialc return address calculation
usb: dwc3: exynos fix axius clock error path to do cleanup
alarmtimer: Prevent overflow of relative timers
genirq: Release resources in __setup_irq() error path
swap: cond_resched in swap_cgroup_prepare()
mm/memory-failure.c: use compound_head() flags for huge pages
USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
usb: r8a66597-hcd: decrease timeout
usb: r8a66597-hcd: select a different endpoint on timeout
USB: gadget: dummy_hcd: fix hub-descriptor removable fields
pvrusb2: reduce stack usage pvr2_eeprom_analyze()
usb: core: fix potential memory leak in error path during hcd creation
USB: hub: fix SS max number of ports
iio: proximity: as3935: recalibrate RCO after resume
staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data()
mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode
x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
mac80211: fix IBSS presp allocation size
mac80211: fix CSA in IBSS mode
mac80211/wpa: use constant time memory comparison for MACs
mac80211: don't look at the PM bit of BAR frames
vb2: Fix an off by one error in 'vb2_plane_vaddr'
cpufreq: conservative: Allow down_threshold to take values from 1 to 10
can: gs_usb: fix memory leak in gs_cmd_reset()
configfs: Fix race between create_link and configfs_rmdir
Linux 4.4.73
sparc64: make string buffers large enough
s390/kvm: do not rely on the ILC on kvm host protection fauls
xtensa: don't use linux IRQ #0
tipc: ignore requests when the connection state is not CONNECTED
proc: add a schedule point in proc_pid_readdir()
romfs: use different way to generate fsid for BLOCK or MTD
sctp: sctp_addr_id2transport should verify the addr before looking up assoc
r8152: avoid start_xmit to schedule napi when napi is disabled
r8152: fix rtl8152_post_reset function
r8152: re-schedule napi for tx
nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
ravb: unmap descriptors when freeing rings
drm/ast: Fixed system hanged if disable P2A
drm/nouveau: Don't enabling polling twice on runtime resume
parisc, parport_gsc: Fixes for printk continuation lines
net: adaptec: starfire: add checks for dma mapping errors
pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES
gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page
net/mlx4_core: Avoid command timeouts during VF driver device shutdown
drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers
drm/nouveau: prevent userspace from deleting client object
ipv6: fix flow labels when the traffic class is non-0
FS-Cache: Initialise stores_lock in netfs cookie
fscache: Clear outstanding writes when disabling a cookie
fscache: Fix dead object requeue
ethtool: do not vzalloc(0) on registers dump
log2: make order_base_2() behave correctly on const input value zero
kasan: respect /proc/sys/kernel/traceoff_on_warning
jump label: pass kbuild_cflags when checking for asm goto support
PM / runtime: Avoid false-positive warnings from might_sleep_if()
ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
i2c: piix4: Fix request_region size
sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
sierra_net: Skip validating irrelevant fields for IDLE LSIs
net: hns: Fix the device being used for dma mapping during TX
NET: mkiss: Fix panic
NET: Fix /proc/net/arp for AX.25
ipv6: Inhibit IPv4-mapped src address on the wire.
ipv6: Handle IPv4-mapped src to in6addr_any dst.
net: xilinx_emaclite: fix receive buffer overflow
net: xilinx_emaclite: fix freezes due to unordered I/O
Call echo service immediately after socket reconnect
staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
ARM: dts: imx6dl: Fix the VDD_ARM_CAP voltage for 396MHz operation
partitions/msdos: FreeBSD UFS2 file systems are not recognized
s390/vmem: fix identity mapping
Linux 4.4.72
arm64: ensure extension of smp_store_release value
arm64: armv8_deprecated: ensure extension of addr
usercopy: Adjust tests to deal with SMAP/PAN
RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
arm64: entry: improve data abort handling of tagged pointers
arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
Make __xfs_xattr_put_listen preperly report errors.
NFSv4: Don't perform cached access checks before we've OPENed the file
NFS: Ensure we revalidate attributes before using execute_ok()
mm: consider memblock reservations for deferred memory initialization sizing
net: better skb->sender_cpu and skb->napi_id cohabitation
serial: sh-sci: Fix panic when serial console and DMA are enabled
tty: Drop krefs for interrupted tty lock
drivers: char: mem: Fix wraparound check to allow mappings up to the end
ASoC: Fix use-after-free at card unregistration
ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT
ALSA: timer: Fix race between read and ioctl
drm/nouveau/tmr: fully separate alarm execution/pending lists
drm/vmwgfx: Make sure backup_handle is always valid
drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()
drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()
perf/core: Drop kernel samples even though :u is specified
powerpc/hotplug-mem: Fix missing endian conversion of aa_index
powerpc/numa: Fix percpu allocations to be NUMA aware
powerpc/eeh: Avoid use after free in eeh_handle_special_event()
scsi: qla2xxx: don't disable a not previously enabled PCI device
KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
btrfs: fix memory leak in update_space_info failure path
btrfs: use correct types for page indices in btrfs_page_exists_in_range
cxl: Fix error path on bad ioctl
ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
ufs: set correct ->s_maxsize
ufs: restore maintaining ->i_blocks
fix ufs_isblockset()
ufs: restore proper tail allocation
fs: add i_blocksize()
cpuset: consider dying css as offline
Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled
drm/msm: Expose our reservation object when exporting a dmabuf.
target: Re-add check to reject control WRITEs with overflow data
cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
random: properly align get_random_int_hash
drivers: char: random: add get_random_long()
iio: proximity: as3935: fix AS3935_INT mask
iio: light: ltr501 Fix interchanged als/ps register field
staging/lustre/lov: remove set_fs() call from lov_getstripe()
usb: chipidea: debug: check before accessing ci_role
usb: chipidea: udc: fix NULL pointer dereference if udc_start failed
usb: gadget: f_mass_storage: Serialize wake and sleep execution
ext4: fix fdatasync(2) after extent manipulation operations
ext4: keep existing extra fields when inode expands
ext4: fix SEEK_HOLE
xen-netfront: cast grant table reference first to type int
xen-netfront: do not cast grant table reference to signed short
xen/privcmd: Support correctly 64KB page granularity when mapping memory
dmaengine: ep93xx: Always start from BASE0
dmaengine: usb-dmac: Fix DMAOR AE bit definition
KVM: async_pf: avoid async pf injection when in guest mode
arm: KVM: Allow unaligned accesses at HYP
KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
kvm: async_pf: fix rcu_irq_enter() with irqs enabled
nfsd: Fix up the "supattr_exclcreat" attributes
nfsd4: fix null dereference on replay
drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
crypto: gcm - wait for crypto op not signal safe
KEYS: fix freeing uninitialized memory in key_update()
KEYS: fix dereferencing NULL payload with nonzero length
ptrace: Properly initialize ptracer_cred on fork
serial: ifx6x60: fix use-after-free on module unload
arch/sparc: support NR_CPUS = 4096
sparc64: delete old wrap code
sparc64: new context wrap
sparc64: add per-cpu mm of secondary contexts
sparc64: redefine first version
sparc64: combine activate_mm and switch_mm
sparc64: reset mm cpumask after wrap
sparc: Machine description indices can vary
sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
net: bridge: start hello timer only if device is up
net: ethoc: enable NAPI before poll may be scheduled
net: ping: do not abuse udp_poll()
ipv6: Fix leak in ipv6_gso_segment().
vxlan: fix use-after-free on deletion
tcp: disallow cwnd undo when switching congestion control
cxgb4: avoid enabling napi twice to the same queue
ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
bnx2x: Fix Multi-Cos
Linux 4.4.71
xfs: only return -errno or success from attr ->put_listent
xfs: in _attrlist_by_handle, copy the cursor back to userspace
xfs: fix unaligned access in xfs_btree_visit_blocks
xfs: bad assertion for delalloc an extent that start at i_size
xfs: fix indlen accounting error on partial delalloc conversion
xfs: wait on new inodes during quotaoff dquot release
xfs: update ag iterator to support wait on new inodes
xfs: support ability to wait on new inodes
xfs: fix up quotacheck buffer list error handling
xfs: prevent multi-fsb dir readahead from reading random blocks
xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
xfs: fix over-copying of getbmap parameters from userspace
xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
xfs: Fix missed holes in SEEK_HOLE implementation
mlock: fix mlock count can not decrease in race condition
mm/migrate: fix refcount handling when !hugepage_migration_supported()
drm/gma500/psb: Actually use VBT mode when it is found
slub/memcg: cure the brainless abuse of sysfs attributes
ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
pcmcia: remove left-over %Z format
drm/radeon: Unbreak HPD handling for r600+
drm/radeon/ci: disable mclk switching for high refresh rates (v2)
scsi: mpt3sas: Force request partial completion alignment
HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read
i2c: i2c-tiny-usb: fix buffer not being DMA capable
vlan: Fix tcp checksum offloads in Q-in-Q vlans
net: phy: marvell: Limit errata to 88m1101
netem: fix skb_orphan_partial()
ipv4: add reference counting to metrics
sctp: fix ICMP processing if skb is non-linear
tcp: avoid fastopen API to be used on AF_UNSPEC
virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
be2net: Fix offload features for Q-in-Q packets
ipv6: fix out of bound writes in __ip6_append_data()
bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
qmi_wwan: add another Lenovo EM74xx device ID
bridge: netlink: check vlan_default_pvid range
ipv6: Check ip6_find_1stfragopt() return value properly.
ipv6: Prevent overrun when parsing v6 header options
net: Improve handling of failures on link and route dumps
tcp: eliminate negative reordering in tcp_clean_rtx_queue
sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
sctp: fix src address selection if using secondary addresses for ipv6
tcp: avoid fragmenting peculiar skbs in SACK
s390/qeth: avoid null pointer dereference on OSN
s390/qeth: unbreak OSM and OSN support
s390/qeth: handle sysfs error during initialization
ipv6/dccp: do not inherit ipv6_mc_list from parent
dccp/tcp: do not inherit mc_list from parent
sparc: Fix -Wstringop-overflow warning
Bug: 62730977
Change-Id: Ifca755d82f9e4b11016f6660298c2c1b073bfb3a
Signed-off-by: Thierry Strudel <tstrudel@google.com>
|
||
|
|
044470266a |
fs: add i_blocksize()
commit 93407472a21b82f39c955ea7787e5bc7da100642 upstream. Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs branch. This patch also fixes multiple checkpatch warnings: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Thanks to Andrew Morton for suggesting more appropriate function instead of macro. [geliangtang@gmail.com: truncate: use i_blocksize()] Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
a2ac6d40d3 |
fs/buffer.c: Revoke LRU when trying to drop buffers
When a buffer is added to the LRU list, a reference is taken which is not dropped until the buffer is evicted from the LRU list. This is the correct behavior, however this LRU reference will prevent the buffer from being dropped. This means that the buffer can't actually be dropped until it is selected for eviction. There's no bound on the time spent on the LRU list, which means that the buffer may be undroppable for very long periods of time. Given that migration involves dropping buffers, the associated page is now unmigratible for long periods of time as well. CMA relies on being able to migrate a specific range of pages, so these these types of failures make CMA significantly less reliable, especially under high filesystem usage. Rather than waiting for the LRU algorithm to eventually kick out the buffer, explicitly remove the buffer from the LRU list when trying to drop it. There is still the possibility that the buffer could be added back on the list, but that indicates the buffer is still in use and would probably have other 'in use' indicates to prevent dropping. Change-Id: I253f4ee2069e190c1115afc421dadd27a7fa87dc Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
|
|
a627c73479 |
block/fs: make tracking dirty task debug only
Adding a new element "tsk_dirty" to struct page increases the size of mem_map/vmemmap, restrict this to a debug only functionality to save few MB of memory. Considering a system with 1G of RAM, there will be nearly 262144 pages and thus that many number of page structures in mem_map/vmemmap. With pointer size of 8 bytes on a 64 bit system, adding this pointer to "struct page" means an increase of "2MB" for mem_map. CRs-Fixed: 738692 Change-Id: Idf3217dcbe17cf1ab4d462d2aa8d39da1ffd8b13 Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org> [venkatg@codeaurora.org: Fixed trivial merge conflict] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org> |
||
|
|
014929f975 |
block/fs: keep track of the task that dirtied the page
Background writes happen in the context of a background thread. It is very useful to identify the actual task that generated the request instead of background task that submited the request. Hence keep track of the task when a page gets dirtied and dump this task info while tracing. Not all the pages in the bio are dirtied by the same task but most likely it will be, since the sectors accessed on the device must be adjacent. Change-Id: I6afba85a2063dd3350a0141ba87cf8440ce9f777 Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org> [venkatg@codeaurora.org: Fixed trivial merge conflicts] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org> |
||
|
|
5c50002963 |
vfs: remove unused wrapper block_page_mkwrite()
The function currently called "__block_page_mkwrite()" used to be called
"block_page_mkwrite()" until a wrapper for this function was added by:
commit
|
||
|
|
c62d25556b |
mm, fs: introduce mapping_gfp_constraint()
There are many places which use mapping_gfp_mask to restrict a more generic gfp mask which would be used for allocations which are not directly related to the page cache but they are performed in the same context. Let's introduce a helper function which makes the restriction explicit and easier to track. This patch doesn't introduce any functional changes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
6cf66b4caf |
fs: use helper bio_add_page() instead of open coding on bi_io_vec
Call pre-defined helper bio_add_page() instead of open coding for iterating through bi_io_vec[]. Doing that, it's possible to make some parts in filesystems and mm/page_io.c simpler than before. Acked-by: Dave Kleikamp <shaggy@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: add more description in commit message] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
|
b7c44ed9d2 |
block: manipulate bio->bi_flags through helpers
Some places use helpers now, others don't. We only have the 'is set' helper, add helpers for setting and clearing flags too. It was a bit of a mess of atomic vs non-atomic access. With BIO_UPTODATE gone, we don't have any risk of concurrent access to the flags. So relax the restriction and don't make any of them atomic. The flags that do have serialization issues (reffed and chained), we already handle those separately. Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
|
4246a0b63b |
block: add a bi_error field to struct bio
Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
|
d2e73fcceb |
buffer: remove unusued 'ret' variable
Merge hickup on my part, due to a clash between the writeback changes and the EOPNOTSUPP removal in _submit_bh(). Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
|
2a81490811 |
writeback: implement foreign cgroup inode detection
As concurrent write sharing of an inode is expected to be very rare
and memcg only tracks page ownership on first-use basis severely
confining the usefulness of such sharing, cgroup writeback tracks
ownership per-inode. While the support for concurrent write sharing
of an inode is deemed unnecessary, an inode being written to by
different cgroups at different points in time is a lot more common,
and, more importantly, charging only by first-use can too readily lead
to grossly incorrect behaviors (single foreign page can lead to
gigabytes of writeback to be incorrectly attributed).
To resolve this issue, cgroup writeback detects the majority dirtier
of an inode and will transfer the ownership to it. To avoid
unnnecessary oscillation, the detection mechanism keeps track of
history and gives out the switch verdict only if the foreign usage
pattern is stable over a certain amount of time and/or writeback
attempts.
The detection mechanism has fairly low space and computation overhead.
It adds 8 bytes to struct inode (one int and two u16's) and minimal
amount of calculation per IO. The detection mechanism converges to
the correct answer usually in several seconds of IO time when there's
a clear majority dirtier. Even when there isn't, it can reach an
acceptable answer fairly quickly under most circumstances.
Please see wb_detach_inode() for more details.
This patch only implements detection. Following patches will
implement actual switching.
v2: wbc_account_io() now checks whether the wbc is associated with a
wb before dereferencing it. This can happen when pageout() is
writing pages directly without going through the usual writeback
path. As pageout() path is single-threaded, we don't want it to
be blocked behind a slow cgroup and ultimately want it to delegate
actual writing to the usual writeback path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
||
|
|
b16b1deb55 |
writeback: make writeback_control track the inode being written back
Currently, for cgroup writeback, the IO submission paths directly
associate the bio's with the blkcg from inode_to_wb_blkcg_css();
however, it'd be necessary to keep more writeback context to implement
foreign inode writeback detection. wbc (writeback_control) is the
natural fit for the extra context - it persists throughout the
writeback of each inode and is passed all the way down to IO
submission paths.
This patch adds wbc_attach_and_unlock_inode(), wbc_detach_inode(), and
wbc_attach_fdatawrite_inode() which are used to associate wbc with the
inode being written back. IO submission paths now use wbc_init_bio()
instead of directly associating bio's with blkcg themselves. This
leaves inode_to_wb_blkcg_css() w/o any user. The function is removed.
wbc currently only tracks the associated wb (bdi_writeback). Future
patches will add more for foreign inode detection. The association is
established under i_lock which will be depended upon when migrating
foreign inodes to other wb's.
As currently, once established, inode to wb association never changes,
going through wbc when initializing bio's doesn't cause any behavior
changes.
v2: submit_blk_blkcg() now checks whether the wbc is associated with a
wb before dereferencing it. This can happen when pageout() is
writing pages directly without going through the usual writeback
path. As pageout() path is single-threaded, we don't want it to
be blocked behind a slow cgroup and ultimately want it to delegate
actual writing to the usual writeback path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
||
|
|
bafc0dba1e |
buffer, writeback: make __block_write_full_page() honor cgroup writeback
[__]block_write_full_page() is used to implement ->writepage in various filesystems. All writeback logic is now updated to handle cgroup writeback and the block cgroup to issue IOs for is encoded in writeback_control and can be retrieved from the inode; however, [__]block_write_full_page() currently ignores the blkcg indicated by inode and issues all bio's without explicit blkcg association. This patch adds submit_bh_blkcg() which associates the bio with the specified blkio cgroup before issuing and uses it in __block_write_full_page() so that the issued bio's are associated with inode_to_wb_blkcg_css(inode). v2: Updated for per-inode wb association. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
|
c4843a7593 |
memcg: add per cgroup dirty page accounting
When modifying PG_Dirty on cached file pages, update the new
MEM_CGROUP_STAT_DIRTY counter. This is done in the same places where
global NR_FILE_DIRTY is managed. The new memcg stat is visible in the
per memcg memory.stat cgroupfs file. The most recent past attempt at
this was http://thread.gmane.org/gmane.linux.kernel.cgroups/8632
The new accounting supports future efforts to add per cgroup dirty
page throttling and writeback. It also helps an administrator break
down a container's memory usage and provides evidence to understand
memcg oom kills (the new dirty count is included in memcg oom kill
messages).
The ability to move page accounting between memcg
(memory.move_charge_at_immigrate) makes this accounting more
complicated than the global counter. The existing
mem_cgroup_{begin,end}_page_stat() lock is used to serialize move
accounting with stat updates.
Typical update operation:
memcg = mem_cgroup_begin_page_stat(page)
if (TestSetPageDirty()) {
[...]
mem_cgroup_update_page_stat(memcg)
}
mem_cgroup_end_page_stat(memcg)
Summary of mem_cgroup_end_page_stat() overhead:
- Without CONFIG_MEMCG it's a no-op
- With CONFIG_MEMCG and no inter memcg task movement, it's just
rcu_read_lock()
- With CONFIG_MEMCG and inter memcg task movement, it's
rcu_read_lock() + spin_lock_irqsave()
A memcg parameter is added to several routines because their callers
now grab mem_cgroup_begin_page_stat() which returns the memcg later
needed by for mem_cgroup_update_page_stat().
Because mem_cgroup_begin_page_stat() may disable interrupts, some
adjustments are needed:
- move __mark_inode_dirty() from __set_page_dirty() to its caller.
__mark_inode_dirty() locking does not want interrupts disabled.
- use spin_lock_irqsave(tree_lock) rather than spin_lock_irq() in
__delete_from_page_cache(), replace_page_cache_page(),
invalidate_complete_page2(), and __remove_mapping().
text data bss dec hex filename
8925147 1774832 1785856 12485835 be84cb vmlinux-!CONFIG_MEMCG-before
8925339 1774832 1785856 12486027 be858b vmlinux-!CONFIG_MEMCG-after
+192 text bytes
8965977 1784992 1785856 12536825 bf4bf9 vmlinux-CONFIG_MEMCG-before
|
||
|
|
11f81becca |
page_writeback: revive cancel_dirty_page() in a restricted form
cancel_dirty_page() had some issues and
|
||
|
|
f6454b049d |
block: fix returnvar.cocci warnings
Remove unneeded variable used to store return value. Generated by: scripts/coccinelle/misc/returnvar.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
|
b25de9d6da |
block: remove BIO_EOPNOTSUPP
Since the big barrier rewrite/removal in 2007 we never fail FLUSH or FUA requests, which means we can remove the magic BIO_EOPNOTSUPP flag to help propagating those to the buffer_head layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
|
b9ea25152e |
page_writeback: clean up mess around cancel_dirty_page()
This patch replaces cancel_dirty_page() with a helper function account_page_cleaned() which only updates counters. It's called from truncate_complete_page() and from try_to_free_buffers() (hack for ext3). Page is locked in both cases, page-lock protects against concurrent dirtiers: see commit |
||
|
|
432f16e64f |
fs: clarify rate limit suppressed buffer I/O errors
When quiet_error applies rate limiting to buffer_io_error calls, what the they apply to is unclear because the name is so generic, particularly if the messages are interleaved with others: [ 1936.063572] quiet_error: 664293 callbacks suppressed [ 1936.065297] Buffer I/O error on dev sdr, logical block 257429952, lost async page write [ 1936.067814] Buffer I/O error on dev sdr, logical block 257429953, lost async page write Also, the function uses printk_ratelimit(), although printk.h includes a comment advising "Please don't use... Instead use printk_ratelimited()." Change buffer_io_error to check the BH_Quiet bit itself, drop the printk_ratelimit call, and print using printk_ratelimited. This makes the messages look like: [ 387.208839] buffer_io_error: 676394 callbacks suppressed [ 387.210693] Buffer I/O error on dev sdr, logical block 211291776, lost async page write [ 387.213432] Buffer I/O error on dev sdr, logical block 211291777, lost async page write Signed-off-by: Robert Elliott <elliott@hp.com> Reviewed-by: Webb Scales <webbnh@hp.com> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
|
b744c2ac4b |
fs: merge I/O error prints into one line
buffer.c uses two printk calls to print these messages: [67353.422338] Buffer I/O error on device sdr, logical block 212868488 [67353.422338] lost page write due to I/O error on sdr In a busy system, they may be interleaved with other prints, losing the context for the second message. Merge them into one line with one printk call so the prints are atomic. Also, differentiate between async page writes, sync page writes, and async page reads. Also, shorten "device" to "dev" to match the block layer prints: [67353.467906] blk_update_request: critical target error, dev sdr, sector 1707107328 Also, use %llu rather than %Lu. Resulting prints look like: [ 1356.437006] blk_update_request: critical target error, dev sdr, sector 1719693992 [ 1361.383522] quiet_error: 659876 callbacks suppressed [ 1361.385816] Buffer I/O error on dev sdr, logical block 256902912, lost async page write [ 1361.385819] Buffer I/O error on dev sdr, logical block 256903644, lost async page write Signed-off-by: Robert Elliott <elliott@hp.com> Reviewed-by: Webb Scales <webbnh@hp.com> Signed-off-by: Jens Axboe <axboe@fb.com> |
||
|
|
c2661b8060 |
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o: "A large number of cleanups and bug fixes, with some (minor) journal optimizations" [ This got sent to me before -rc1, but was stuck in my spam folder. - Linus ] * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (67 commits) ext4: check s_chksum_driver when looking for bg csum presence ext4: move error report out of atomic context in ext4_init_block_bitmap() ext4: Replace open coded mdata csum feature to helper function ext4: delete useless comments about ext4_move_extents ext4: fix reservation overflow in ext4_da_write_begin ext4: add ext4_iget_normal() which is to be used for dir tree lookups ext4: don't orphan or truncate the boot loader inode ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT ext4: optimize block allocation on grow indepth ext4: get rid of code duplication ext4: fix over-defensive complaint after journal abort ext4: fix return value of ext4_do_update_inode ext4: fix mmap data corruption when blocksize < pagesize vfs: fix data corruption when blocksize < pagesize for mmaped data ext4: fold ext4_nojournal_sops into ext4_sops ext4: support freezing ext2 (nojournal) file systems ext4: fold ext4_sync_fs_nojournal() into ext4_sync_fs() ext4: don't check quota format when there are no quota files jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list jbd2: avoid pointless scanning of checkpoint lists ... |
||
|
|
9470dd5d35 |
fs: check bh blocknr earlier when searching lru
It's very common for the buffer heads in the lru to have different block
numbers. By comparing the blocknr before the bdev and size we can
reduce the cost of searching in the very common case where all the
entries have the same bdev and size.
In quick hot cache cycle counting tests on a single fs workstation this
cut the cost of a miss by about 20%.
A diff of the disassembly shows the reordering of the bdev and blocknr
comparisons. This is in such a tiny loop that skipping one comparison
is a meaningful portion of the total work being done:
1628: 83 c1 01 add $0x1,%ecx
162b: 83 f9 08 cmp $0x8,%ecx
162e: 74 60 je 1690 <__find_get_block+0xa0>
1630: 89 c8 mov %ecx,%eax
1632: 65 4c 8b 04 c5 00 00 mov %gs:0x0(,%rax,8),%r8
1639: 00 00
163b: 4d 85 c0 test %r8,%r8
163e: 4c 89 c3 mov %r8,%rbx
1641: 74 e5 je 1628 <__find_get_block+0x38>
- 1643: 4d 3b 68 30 cmp 0x30(%r8),%r13
+ 1643: 4d 3b 68 18 cmp 0x18(%r8),%r13
1647: 75 df jne 1628 <__find_get_block+0x38>
- 1649: 4d 3b 60 18 cmp 0x18(%r8),%r12
+ 1649: 4d 3b 60 30 cmp 0x30(%r8),%r12
164d: 75 d9 jne 1628 <__find_get_block+0x38>
164f: 49 39 50 20 cmp %rdx,0x20(%r8)
1653: 75 d3 jne 1628 <__find_get_block+0x38>
Signed-off-by: Zach Brown <zab@zabbo.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
77c688ac87 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro: "The big thing in this pile is Eric's unmount-on-rmdir series; we finally have everything we need for that. The final piece of prereqs is delayed mntput() - now filesystem shutdown always happens on shallow stack. Other than that, we have several new primitives for iov_iter (Matt Wilcox, culled from his XIP-related series) pushing the conversion to ->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c cleanups and fixes (including the external name refcounting, which gives consistent behaviour of d_move() wrt procfs symlinks for long and short names alike) and assorted cleanups and fixes all over the place. This is just the first pile; there's a lot of stuff from various people that ought to go in this window. Starting with unionmount/overlayfs mess... ;-/" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits) fs/file_table.c: Update alloc_file() comment vfs: Deduplicate code shared by xattr system calls operating on paths reiserfs: remove pointless forward declaration of struct nameidata don't need that forward declaration of struct nameidata in dcache.h anymore take dname_external() into fs/dcache.c let path_init() failures treated the same way as subsequent link_path_walk() fix misuses of f_count() in ppp and netlink ncpfs: use list_for_each_entry() for d_subdirs walk vfs: move getname() from callers to do_mount() gfs2_atomic_open(): skip lookups on hashed dentry [infiniband] remove pointless assignments gadgetfs: saner API for gadgetfs_create_file() f_fs: saner API for ffs_sb_create_file() jfs: don't hash direct inode [s390] remove pointless assignment of ->f_op in vmlogrdr ->open() ecryptfs: ->f_op is never NULL android: ->f_op is never NULL nouveau: __iomem misannotations missing annotation in fs/file.c fs: namespace: suppress 'may be used uninitialized' warnings ... |
||
|
|
86cf78d73d |
fs/buffer.c: increase the buffer-head per-CPU LRU size
Increase the buffer-head per-CPU LRU size to allow efficient filesystem operations that access many blocks for each transaction. For example, creating a file in a large ext4 directory with quota enabled will access multiple buffer heads and will overflow the LRU at the default 8-block LRU size: * parent directory inode table block (ctime, nlinks for subdirs) * new inode bitmap * inode table block * 2 quota blocks * directory leaf block (not reused, but pollutes one cache entry) * 2 levels htree blocks (only one is reused, other pollutes cache) * 2 levels indirect/index blocks (only one is reused) The buffer-head per-CPU LRU size is raised to 16, as it shows in metadata performance benchmarks up to 10% gain for create, 4% for lookup and 7% for destroy. Signed-off-by: Liang Zhen <liang.zhen@intel.com> Signed-off-by: Andreas Dilger <andreas.dilger@intel.com> Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
4db96b71e3 |
vfs: guard end of device for mpage interface
Add guard_bio_eod() check for mpage code in order to allow us to do IO even on the odd last sectors of a device, even if the block size is some multiple of the physical sector size. Using mpage_readpages() for block device requires this guard check. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
59d43914ed |
vfs: make guard_bh_eod() more generic
This patchset implements readpages() operation for block device by using mpage_readpages() which can create multipage BIOs instead of BIOs for each page and reduce system CPU time consumption. This patch (of 3): guard_bh_eod() is used in submit_bh() to allow us to do IO even on the odd last sectors of a device, even if the block size is some multiple of the physical sector size. This makes guard_bh_eod() more generic and renames it guard_bio_eod() so that we can use it without struct buffer_head argument. The reason for this change is that using mpage_readpages() for block device requires to add this guard check in mpage code. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c2ca0fcd20 |
fs: make cont_expand_zero interruptible
This patch makes it possible to kill a process looping in cont_expand_zero. A process may spend a lot of time in this function, so it is desirable to be able to kill it. It happened to me that I wanted to copy a piece data from the disk to a file. By mistake, I used the "seek" parameter to dd instead of "skip". Due to the "seek" parameter, dd attempted to extend the file and became stuck doing so - the only possibility was to reset the machine or wait many hours until the filesystem runs out of space and cont_expand_zero fails. We need this patch to be able to terminate the process. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
90a8020278 |
vfs: fix data corruption when blocksize < pagesize for mmaped data
->page_mkwrite() is used by filesystems to allocate blocks under a page which is becoming writeably mmapped in some process' address space. This allows a filesystem to return a page fault if there is not enough space available, user exceeds quota or similar problem happens, rather than silently discarding data later when writepage is called. However VFS fails to call ->page_mkwrite() in all the cases where filesystems need it when blocksize < pagesize. For example when blocksize = 1024, pagesize = 4096 the following is problematic: ftruncate(fd, 0); pwrite(fd, buf, 1024, 0); map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0); map[0] = 'a'; ----> page_mkwrite() for index 0 is called ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */ mremap(map, 1024, 10000, 0); map[4095] = 'a'; ----> no page_mkwrite() called At the moment ->page_mkwrite() is called, filesystem can allocate only one block for the page because i_size == 1024. Otherwise it would create blocks beyond i_size which is generally undesirable. But later at ->writepage() time, we also need to store data at offset 4095 but we don't have block allocated for it. This patch introduces a helper function filesystems can use to have ->page_mkwrite() called at all the necessary moments. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org |
||
|
|
f2d5a94436 |
Fix nasty 32-bit overflow bug in buffer i/o code.
On 32-bit architectures, the legacy buffer_head functions are not always handling the sector number with the proper 64-bit types, and will thus fail on 4TB+ disks. Any code that uses __getblk() (and thus bread(), breadahead(), sb_bread(), sb_breadahead(), sb_getblk()), and calls it using a 64-bit block on a 32-bit arch (where "long" is 32-bit) causes an inifinite loop in __getblk_slow() with an infinite stream of errors logged to dmesg like this: __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648 b_state=0x00000020, b_size=512 device sda1 blocksize: 512 Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the top 32-bits are missing (in this case the 0x1 at the top). This is because grow_dev_page() is broken and has a 32-bit overflow due to shifting the page index value (a pgoff_t - which is just 32 bits on 32-bit architectures) left-shifted as the block number. But the top bits to get lost as the pgoff_t is not type cast to sector_t / 64-bit before the shift. This patch fixes this issue by type casting "index" to sector_t before doing the left shift. Note this is not a theoretical bug but has been seen in the field on a 4TiB hard drive with logical sector size 512 bytes. This patch has been verified to fix the infinite loop problem on 3.17-rc5 kernel using a 4TB disk image mounted using "-o loop". Without this patch doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop 100% reproducibly whilst with the patch it works fine as expected. Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
3b5e6454aa |
fs/buffer.c: support buffer cache allocations with gfp modifiers
A buffer cache is allocated from movable area because it is referred for a while and released soon. But some filesystems are taking buffer cache for a long time and it can disturb page migration. New APIs are introduced to allocate buffer cache with user specific flag. *_gfp APIs are for user want to set page allocation flag for page cache allocation. And *_unmovable APIs are for the user wants to allocate page cache from non-movable area. Signed-off-by: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> |
||
|
|
743162013d |
sched: Remove proliferation of wait_on_bit() action functions
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().
So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.
Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.
All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{_lock} does not return any specific error code in the
event of a signal so the caller must check for non-zero and
interpolate their own error code as appropriate.
The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"
The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.
A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack. So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).
Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
||
|
|
2457aec637 |
mm: non-atomically mark page accessed during page cache allocation where possible
aops->write_begin may allocate a new page and make it visible only to have
mark_page_accessed called almost immediately after. Once the page is
visible the atomic operations are necessary which is noticable overhead
when writing to an in-memory filesystem like tmpfs but should also be
noticable with fast storage. The objective of the patch is to initialse
the accessed information with non-atomic operations before the page is
visible.
The bulk of filesystems directly or indirectly use
grab_cache_page_write_begin or find_or_create_page for the initial
allocation of a page cache page. This patch adds an init_page_accessed()
helper which behaves like the first call to mark_page_accessed() but may
called before the page is visible and can be done non-atomically.
The primary APIs of concern in this care are the following and are used
by most filesystems.
find_get_page
find_lock_page
find_or_create_page
grab_cache_page_nowait
grab_cache_page_write_begin
All of them are very similar in detail to the patch creates a core helper
pagecache_get_page() which takes a flags parameter that affects its
behavior such as whether the page should be marked accessed or not. Then
old API is preserved but is basically a thin wrapper around this core
function.
Each of the filesystems are then updated to avoid calling
mark_page_accessed when it is known that the VM interfaces have already
done the job. There is a slight snag in that the timing of the
mark_page_accessed() has now changed so in rare cases it's possible a page
gets to the end of the LRU as PageReferenced where as previously it might
have been repromoted. This is expected to be rare but it's worth the
filesystem people thinking about it in case they see a problem with the
timing change. It is also the case that some filesystems may be marking
pages accessed that previously did not but it makes sense that filesystems
have consistent behaviour in this regard.
The test case used to evaulate this is a simple dd of a large file done
multiple times with the file deleted on each iterations. The size of the
file is 1/10th physical memory to avoid dirty page balancing. In the
async case it will be possible that the workload completes without even
hitting the disk and will have variable results but highlight the impact
of mark_page_accessed for async IO. The sync results are expected to be
more stable. The exception is tmpfs where the normal case is for the "IO"
to not hit the disk.
The test machine was single socket and UMA to avoid any scheduling or NUMA
artifacts. Throughput and wall times are presented for sync IO, only wall
times are shown for async as the granularity reported by dd and the
variability is unsuitable for comparison. As async results were variable
do to writback timings, I'm only reporting the maximum figures. The sync
results were stable enough to make the mean and stddev uninteresting.
The performance results are reported based on a run with no profiling.
Profile data is based on a separate run with oprofile running.
async dd
3.15.0-rc3 3.15.0-rc3
vanilla accessed-v2
ext3 Max elapsed 13.9900 ( 0.00%) 11.5900 ( 17.16%)
tmpfs Max elapsed 0.5100 ( 0.00%) 0.4900 ( 3.92%)
btrfs Max elapsed 12.8100 ( 0.00%) 12.7800 ( 0.23%)
ext4 Max elapsed 18.6000 ( 0.00%) 13.3400 ( 28.28%)
xfs Max elapsed 12.5600 ( 0.00%) 2.0900 ( 83.36%)
The XFS figure is a bit strange as it managed to avoid a worst case by
sheer luck but the average figures looked reasonable.
samples percentage
ext3 86107 0.9783 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
ext3 23833 0.2710 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext3 5036 0.0573 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
ext4 64566 0.8961 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
ext4 5322 0.0713 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext4 2869 0.0384 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs 62126 1.7675 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
xfs 1904 0.0554 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs 103 0.0030 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
btrfs 10655 0.1338 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
btrfs 2020 0.0273 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
btrfs 587 0.0079 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
tmpfs 59562 3.2628 vmlinux-3.15.0-rc4-vanilla mark_page_accessed
tmpfs 1210 0.0696 vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
tmpfs 94 0.0054 vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
[akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
e7470ee89f |
fs: buffer: do not use unnecessary atomic operations when discarding buffers
Discarding buffers uses a bunch of atomic operations when discarding buffers because ...... I can't think of a reason. Use a cmpxchg loop to clear all the necessary flags. In most (all?) cases this will be a single atomic operations. [akpm@linux-foundation.org: move BUFFER_FLAGS_DISCARD into the .c file] Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
1b938c0827 |
fs/buffer.c: remove block_write_full_page_endio()
The last in-tree caller of block_write_full_page_endio() was removed in January 2013. It's time to remove the EXPORT_SYMBOL, which leaves block_write_full_page() as the only caller of block_write_full_page_endio(), so inline block_write_full_page_endio() into block_write_full_page(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dheeraj Reddy <dheeraj.reddy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
4e857c58ef |
arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
5166701b36 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"The first vfs pile, with deep apologies for being very late in this
window.
Assorted cleanups and fixes, plus a large preparatory part of iov_iter
work. There's a lot more of that, but it'll probably go into the next
merge window - it *does* shape up nicely, removes a lot of
boilerplate, gets rid of locking inconsistencie between aio_write and
splice_write and I hope to get Kent's direct-io rewrite merged into
the same queue, but some of the stuff after this point is having
(mostly trivial) conflicts with the things already merged into
mainline and with some I want more testing.
This one passes LTP and xfstests without regressions, in addition to
usual beating. BTW, readahead02 in ltp syscalls testsuite has started
giving failures since "mm/readahead.c: fix readahead failure for
memoryless NUMA nodes and limit readahead pages" - might be a false
positive, might be a real regression..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
missing bits of "splice: fix racy pipe->buffers uses"
cifs: fix the race in cifs_writev()
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
kill generic_file_buffered_write()
ocfs2_file_aio_write(): switch to generic_perform_write()
ceph_aio_write(): switch to generic_perform_write()
xfs_file_buffered_aio_write(): switch to generic_perform_write()
export generic_perform_write(), start getting rid of generic_file_buffer_write()
generic_file_direct_write(): get rid of ppos argument
btrfs_file_aio_write(): get rid of ppos
kill the 5th argument of generic_file_buffered_write()
kill the 4th argument of __generic_file_aio_write()
lustre: don't open-code kernel_recvmsg()
ocfs2: don't open-code kernel_recvmsg()
drbd: don't open-code kernel_recvmsg()
constify blk_rq_map_user_iov() and friends
lustre: switch to kernel_sendmsg()
ocfs2: don't open-code kernel_sendmsg()
take iov_iter stuff to mm/iov_iter.c
process_vm_access: tidy up a bit
...
|
||
|
|
c186afb4db |
switch ->is_partially_uptodate() to saner arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
|
|
d4263348f7 | Merge branch 'master' into for-next | ||
|
|
e227867f12 |
treewide: Fix typo in Documentation/DocBook
This patch fix spelling typo in Documentation/DocBook. It is because .html and .xml files are generated by make htmldocs, I have to fix a typo within the source files. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
|
|
227d53b397 |
mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq
To use spin_{un}lock_irq is dangerous if caller disabled interrupt.
During aio buffer migration, we have a possibility to see the following
call stack.
aio_migratepage [disable interrupt]
migrate_page_copy
clear_page_dirty_for_io
set_page_dirty
__set_page_dirty_buffers
__set_page_dirty
spin_lock_irq
This mean, current aio migration is a deadlockable. spin_lock_irqsave
is a safer alternative and we should use it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Rientjes rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
ca6673b02e |
block: Replace __this_cpu_ptr with raw_cpu_ptr
__this_cpu_ptr is being phased out. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
4f024f3797 |
block: Abstract out bvec iterator
Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6 |
||
|
|
84235de394 |
fs: buffer: move allocation failure loop into the allocator
Buffer allocation has a very crude indefinite loop around waking the flusher threads and performing global NOFS direct reclaim because it can not handle allocation failures. The most immediate problem with this is that the allocation may fail due to a memory cgroup limit, where flushers + direct reclaim might not make any progress towards resolving the situation at all. Because unlike the global case, a memory cgroup may not have any cache at all, only anonymous pages but no swap. This situation will lead to a reclaim livelock with insane IO from waking the flushers and thrashing unrelated filesystem cache in a tight loop. Use __GFP_NOFAIL allocations for buffers for now. This makes sure that any looping happens in the page allocator, which knows how to orchestrate kswapd, direct reclaim, and the flushers sensibly. It also allows memory cgroups to detect allocations that can't handle failure and will allow them to ultimately bypass the limit if reclaim can not make progress. Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
b45972265f |
mm: vmscan: take page buffers dirty and locked state into account
Page reclaim keeps track of dirty and under writeback pages and uses it to determine if wait_iff_congested() should stall or if kswapd should begin writing back pages. This fails to account for buffer pages that can be under writeback but not PageWriteback which is the case for filesystems like ext3 ordered mode. Furthermore, PageDirty buffer pages can have all the buffers clean and writepage does no IO so it should not be accounted as congested. This patch adds an address_space operation that filesystems may optionally use to check if a page is really dirty or really under writeback. An implementation is provided for for buffer_heads is added and used for block operations and ext3 in ordered mode. By default the page flags are obeyed. Credit goes to Jan Kara for identifying that the page flags alone are not sufficient for ext3 and sanity checking a number of ideas on how the problem could be addressed. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Zlatko Calusic <zcalusic@bitsync.net> Cc: dormando <dormando@rydia.net> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
d47992f86b |
mm: change invalidatepage prototype to accept length
Currently there is no way to truncate partial page where the end truncate point is not at the end of the page. This is because it was not needed and the functionality was enough for file system truncate operation to work properly. However more file systems now support punch hole feature and it can benefit from mm supporting truncating page just up to the certain point. Specifically, with this functionality truncate_inode_pages_range() can be changed so it supports truncating partial page at the end of the range (currently it will BUG_ON() if 'end' is not at the end of the page). This commit changes the invalidatepage() address space operation prototype to accept range to be invalidated and update all the instances for it. We also change the block_invalidatepage() in the same way and actually make a use of the new length argument implementing range invalidation. Actual file system implementations will follow except the file systems where the changes are really simple and should not change the behaviour in any way .Implementation for truncate_page_range() which will be able to accept page unaligned ranges will follow as well. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> |
||
|
|
4de13d7aa8 |
Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block
Pull block core updates from Jens Axboe: - Major bit is Kents prep work for immutable bio vecs. - Stable candidate fix for a scheduling-while-atomic in the queue bypass operation. - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging discard bios. - Tejuns changes to convert the writeback thread pool to the generic workqueue mechanism. - Runtime PM framework, SCSI patches exists on top of these in James' tree. - A few random fixes. * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits) relay: move remove_buf_file inside relay_close_buf partitions/efi.c: replace useless kzalloc's by kmalloc's fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read() block: fix max discard sectors limit blkcg: fix "scheduling while atomic" in blk_queue_bypass_start Documentation: cfq-iosched: update documentation help for cfq tunables writeback: expose the bdi_wq workqueue writeback: replace custom worker pool implementation with unbound workqueue writeback: remove unused bdi_pending_list aoe: Fix unitialized var usage bio-integrity: Add explicit field for owner of bip_buf block: Add an explicit bio flag for bios that own their bvec block: Add bio_alloc_pages() block: Convert some code to bio_for_each_segment_all() block: Add bio_for_each_segment_all() bounce: Refactor __blk_queue_bounce to not use bi_io_vec raid1: use bio_copy_data() pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage pktcdvd: use bio_copy_data() block: Add bio_copy_data() ... |
||
|
|
149b306089 |
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o: "Mostly performance and bug fixes, plus some cleanups. The one new feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which allows installation of a hidden inode designed for boot loaders." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits) ext4: fix type-widening bug in inode table readahead code ext4: add check for inodes_count overflow in new resize ioctl ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG ext4: fix online resizing for ext3-compat file systems jbd2: trace when lock_buffer in do_get_write_access takes a long time ext4: mark metadata blocks using bh flags buffer: add BH_Prio and BH_Meta flags ext4: mark all metadata I/O with REQ_META ext4: fix readdir error in case inline_data+^dir_index. ext4: fix readdir error in the case of inline_data+dir_index jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset ext4: mext_insert_extents should update extent block checksum ext4: move quota initialization out of inode allocation transaction ext4: reserve xattr index for Rich ACL support jbd2: reduce journal_head size ext4: clear buffer_uninit flag when submitting IO ext4: use io_end for multiple bios ext4: make ext4_bio_write_page() use BH_Async_Write flags ext4: Use kstrtoul() instead of parse_strtoul() ext4: defragmentation code cleanup ... |