kernel_google_wahoo

Author	SHA1	Message	Date
Nathan Chancellor	83a3e3539c	Merge 4.4.155 into android-msm-wahoo-4.4 Changes in 4.4.155: (48 commits) net: 6lowpan: fix reserved space for single frames net: mac802154: tx: expand tailroom if necessary 9p/net: Fix zero-copy path in the 9p virtio transport net: lan78xx: Fix misplaced tasklet_schedule() call spi: davinci: fix a NULL pointer dereference drm/i915/userptr: reject zero user_size powerpc/fadump: handle crash memory ranges array index overflow powerpc/pseries: Fix endianness while restoring of r3 in MCE handler. fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed 9p/virtio: fix off-by-one error in sg list bounds check net/9p/client.c: version pointer uninitialized net/9p/trans_fd.c: fix race-condition by flushing workqueue before the kfree() x86/mm/pat: Fix L1TF stable backport for CPA, 2nd call dm cache metadata: save in-core policy_hint_size to on-disk superblock iio: ad9523: Fix displayed phase iio: ad9523: Fix return value for ad952x_store() vmw_balloon: fix inflation of 64-bit GFNs vmw_balloon: do not use 2MB without batching vmw_balloon: VMCI_DOORBELL_SET does not check status vmw_balloon: fix VMCI use when balloon built into kernel tracing: Do not call start/stop() functions when tracing_on does not change tracing/blktrace: Fix to allow setting same value kthread, tracing: Don't expose half-written comm when creating kthreads uprobes: Use synchronize_rcu() not synchronize_sched() 9p: fix multiple NULL-pointer-dereferences PM / sleep: wakeup: Fix build error caused by missing SRCU support pnfs/blocklayout: off by one in bl_map_stripe() ARM: tegra: Fix Tegra30 Cardhu PCA954x reset mm/tlb: Remove tlb_remove_table() non-concurrent condition iommu/vt-d: Add definitions for PFSID iommu/vt-d: Fix dev iotlb pfsid use osf_getdomainname(): use copy_to_user() sys: don't hold uts_sem while accessing userspace memory userns: move user access out of the mutex ubifs: Fix memory leak in lprobs self-check Revert "UBIFS: Fix potential integer overflow in allocation" ubifs: Check data node size before truncate ubifs: Fix synced_i_size calculation for xattr inodes pwm: tiehrpwm: Fix disabling of output of PWMs fb: fix lost console when the user unplugs a USB adapter udlfb: set optimal write delay getxattr: use correct xattr length bcache: release dc->writeback_lock properly in bch_writeback_thread() perf auxtrace: Fix queue resize fs/quota: Fix spectre gadget in do_quotactl x86/io: add interface to reserve io memtype for a resource range. (v1.1) drm/drivers: add support for using the arch wc mapping API. Linux 4.4.155 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>	2018-09-09 11:22:27 -07:00
Snild Dolkow	f6db350c9a	kthread, tracing: Don't expose half-written comm when creating kthreads commit 3e536e222f2930534c252c1cc7ae799c725c5ff9 upstream. There is a window for racing when printing directly to task->comm, allowing other threads to see a non-terminated string. The vsnprintf function fills the buffer, counts the truncated chars, then finally writes the \0 at the end. creator other vsnprintf: fill (not terminated) count the rest trace_sched_waking(p): ... memcpy(comm, p->comm, TASK_COMM_LEN) write \0 The consequences depend on how 'other' uses the string. In our case, it was copied into the tracing system's saved cmdlines, a buffer of adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be): crash-arm64> x/1024s savedcmd->saved_cmdlines \| grep 'evenk' 0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12" ...and a strcpy out of there would cause stack corruption: [224761.522292] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffff9bf9783c78 crash-arm64> kbt \| grep 'comm\\|trace_print_context' #6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396) comm (char [16]) = "irq/497-pwr_even" crash-arm64> rd 0xffffffd4d0e17d14 8 ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_ ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16: ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`.. ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`.......... The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was likely needed because of this same bug. Solved by vsnprintf:ing to a local buffer, then using set_task_comm(). This way, there won't be a window where comm is not terminated. Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com Cc: stable@vger.kernel.org Fixes: `bc0c38d139` ("ftrace: latency tracer infrastructure") Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Snild Dolkow <snild@sony.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> [backported to 3.18 / 4.4 by Snild] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-09-09 20:04:34 +02:00
Thierry Strudel	b11ab24fe6	Merged linux-4.4.70 into android-msm-wahoo-4.4 Linux 4.4.70 drivers: char: mem: Check for address space wraparound with mmap() nfsd: encoders mustn't use unitialized values in error cases drm/edid: Add 10 bpc quirk for LGD 764 panel in HP zBook 17 G2 PCI: Freeze PME scan before suspending devices PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms tracing/kprobes: Enforce kprobes teardown after testing osf_wait4(): fix infoleak genirq: Fix chained interrupt data ordering uwb: fix device quirk on big-endian hosts metag/uaccess: Check access_ok in strncpy_from_user metag/uaccess: Fix access_ok() iommu/vt-d: Flush the IOTLB to get rid of the initial kdump mappings staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD. staging: rtl8192e: fix 2 byte alignment of register BSSIDR. mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp xc2028: Fix use-after-free bug properly arm64: documentation: document tagged pointer stack constraints arm64: uaccess: ensure extension of access_ok() addr arm64: xchg: hazard against entire exchange variable ARM: dts: at91: sama5d3_xplained: not all ADC channels are available ARM: dts: at91: sama5d3_xplained: fix ADC vref powerpc/64e: Fix hang when debugging programs with relocated kernel powerpc/pseries: Fix of_node_put() underflow during DLPAR remove powerpc/book3s/mce: Move add_taint() later in virtual mode cx231xx-cards: fix NULL-deref at probe cx231xx-audio: fix NULL-deref at probe cx231xx-audio: fix init error path dvb-frontends/cxd2841er: define symbol_rate_min/max in T/C fe-ops zr364xx: enforce minimum size when reading header dib0700: fix NULL-deref at probe s5p-mfc: Fix unbalanced call to clock management gspca: konica: add missing endpoint sanity check ceph: fix recursion between ceph_set_acl() and __ceph_setattr() iio: proximity: as3935: fix as3935_write ipx: call ipxitf_put() in ioctl error path USB: hub: fix non-SS hub-descriptor handling USB: hub: fix SS hub-descriptor handling USB: serial: io_ti: fix div-by-zero in set_termios USB: serial: mct_u232: fix big-endian baud-rate handling USB: serial: qcserial: add more Lenovo EM74xx device IDs usb: serial: option: add Telit ME910 support USB: iowarrior: fix info ioctl on big-endian hosts usb: musb: tusb6010_omap: Do not reset the other direction's packet size ttusb2: limit messages to buffer size mceusb: fix NULL-deref at probe usbvision: fix NULL-deref at probe net: irda: irda-usb: fix firmware name on big-endian hosts usb: host: xhci-mem: allocate zeroed Scratchpad Buffer xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton usb: host: xhci-plat: propagate return value of platform_get_irq() sched/fair: Initialize throttle_count for new task-groups lazily sched/fair: Do not announce throttled next buddy in dequeue_task_fair() fscrypt: avoid collisions when presenting long encrypted filenames f2fs: check entire encrypted bigname when finding a dentry fscrypt: fix context consistency check when key(s) unavailable net: qmi_wwan: Add SIMCom 7230E ext4 crypto: fix some error handling ext4 crypto: don't let data integrity writebacks fail with ENOMEM USB: serial: ftdi_sio: add Olimex ARM-USB-TINY(H) PIDs USB: serial: ftdi_sio: fix setting latency for unprivileged users pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes() pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes iio: dac: ad7303: fix channel description of: fix sparse warning in of_pci_range_parser_one proc: Fix unbalanced hard link numbers cdc-acm: fix possible invalid access when processing notification drm/nouveau/tmr: handle races with hw when updating the next alarm time drm/nouveau/tmr: avoid processing completed alarms when adding a new one drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm drm/nouveau/tmr: ack interrupt before processing alarms drm/nouveau/therm: remove ineffective workarounds for alarm bugs drm/amdgpu: Make display watermark calculations more accurate drm/amdgpu: Avoid overflows/divide-by-zero in latency_watermark calculations. ath9k_htc: fix NULL-deref at probe ath9k_htc: Add support of AirTies 1eda:2315 AR9271 device s390/cputime: fix incorrect system time s390/kdump: Add final note regulator: tps65023: Fix inverted core enable logic. KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation KVM: x86: Fix load damaged SSEx MXCSR register ima: accept previously set IMA_NEW_FILE mwifiex: pcie: fix cmd_buf use-after-free in remove/reset rtlwifi: rtl8821ae: setup 8812ae RFE according to device type md: update slab_cache before releasing new stripes when stripes resizing dm space map disk: fix some book keeping in the disk space map dm thin metadata: call precommit before saving the roots dm bufio: make the parameter "retain_bytes" unsigned long dm cache metadata: fail operations if fail_io mode has been established dm bufio: check new buffer allocation watermark every 30 seconds dm bufio: avoid a possible ABBA deadlock dm raid: select the Kconfig option CONFIG_MD_RAID0 dm btree: fix for dm_btree_find_lowest_key() infiniband: call ipv6 route lookup via the stub interface tpm_crb: check for bad response size ARM: tegra: paz00: Mark panel regulator as enabled on boot USB: core: replace %p with %pK char: lp: fix possible integer overflow in lp_setup() watchdog: pcwd_usb: fix NULL-deref at probe USB: ene_usb6250: fix DMA to the stack usb: misc: legousbtower: Fix memory leak usb: misc: legousbtower: Fix buffers on stack Linux 4.4.69 ipmi: Fix kernel panic at ipmi_ssif_thread() wlcore: Add RX_BA_WIN_SIZE_CHANGE_EVENT event wlcore: Pass win_size taken from ieee80211_sta to FW mac80211: RX BA support for sta max_rx_aggregation_subframes mac80211: pass block ack session timeout to to driver mac80211: pass RX aggregation window size to driver Bluetooth: hci_intel: add missing tty-device sanity check Bluetooth: hci_bcm: add missing tty-device sanity check Bluetooth: Fix user channel for 32bit userspace on 64bit kernel tty: pty: Fix ldisc flush after userspace become aware of the data already serial: omap: suspend device on probe errors serial: omap: fix runtime-pm handling on unbind serial: samsung: Use right device for DMA-mapping calls arm64: KVM: Fix decoding of Rt/Rt2 when trapping AArch32 CP accesses padata: free correct variable CIFS: add misssing SFM mapping for doublequote cifs: fix CIFS_IOC_GET_MNT_INFO oops CIFS: fix mapping of SFM_SPACE and SFM_PERIOD SMB3: Work around mount failure when using SMB3 dialect to Macs Set unicode flag on cifs echo request to avoid Mac error fs/block_dev: always invalidate cleancache in invalidate_bdev() ceph: fix memory leak in __ceph_setxattr() fs/xattr.c: zero out memory copied to userspace in getxattr ext4: evict inline data when writing to memory map IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level IB/mlx4: Fix ib device initialization error flow IB/IPoIB: ibX: failed to create mcg debug file IB/core: Fix sysfs registration error flow vfio/type1: Remove locked page accounting workqueue dm era: save spacemap metadata root after the pre-commit crypto: algif_aead - Require setkey before accept(2) block: fix blk_integrity_register to use template's interval_exp if not 0 KVM: arm/arm64: fix races in kvm_psci_vcpu_on KVM: x86: fix user triggerable warning in kvm_apic_accept_events() um: Fix PTRACE_POKEUSER on x86_64 x86, pmem: Fix cache flushing for iovec write < 8 bytes selftests/x86/ldt_gdt_32: Work around a glibc sigaction() bug x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup usb: hub: Do not attempt to autosuspend disconnected devices usb: hub: Fix error loop seen after hub communication errors usb: Make sure usb/phy/of gets built-in usb: misc: add missing continue in switch staging: comedi: jr3_pci: cope with jiffies wraparound staging: comedi: jr3_pci: fix possible null pointer dereference staging: gdm724x: gdm_mux: fix use-after-free on module unload staging: vt6656: use off stack for out buffer USB transfers. staging: vt6656: use off stack for in buffer USB transfers. USB: Proper handling of Race Condition when two USB class drivers try to call init_usb_class simultaneously USB: serial: ftdi_sio: add device ID for Microsemi/Arrow SF2PLUS Dev Kit usb: host: xhci: print correct command ring address iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement target: Convert ACL change queue_depth se_session reference usage target/fileio: Fix zero-length READ and WRITE handling target: Fix compare_and_write_callback handling for non GOOD status xen: adjust early dom0 p2m handling to xen hypervisor behavior Linux 4.4.68 block: get rid of blk_integrity_revalidate() drm/ttm: fix use-after-free races in vm fault handling f2fs: sanity check segment count bnxt_en: allocate enough space for ->ntp_fltr_bmap ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf ipv6: initialize route null entry in addrconf_init() rtnetlink: NUL-terminate IFLA_PHYS_PORT_NAME string ipv4, ipv6: ensure raw socket message is big enough to hold an IP header tcp: do not inherit fastopen_req from parent tcp: fix wraparound issue in tcp_lp bpf, arm64: fix jit branch offset related to ldimm64 tcp: do not underestimate skb->truesize in tcp_trim_head() ALSA: hda - Fix deadlock of controller device lock at unbinding staging: emxx_udc: remove incorrect __init annotations staging: wlan-ng: add missing byte order conversion brcmfmac: Make skb header writable before use brcmfmac: Ensure pointer correctly set if skb data location changes MIPS: R2-on-R6 MULTU/MADDU/MSUBU emulation bugfix scsi: mac_scsi: Fix MAC_SCSI=m option when SCSI=m serial: 8250_omap: Fix probe and remove for PM runtime phy: qcom-usb-hs: Add depends on EXTCON USB: serial: io_edgeport: fix descriptor error handling USB: serial: mct_u232: fix modem-status error handling USB: serial: quatech2: fix control-message error handling USB: serial: ftdi_sio: fix latency-timer error handling USB: serial: ark3116: fix open error handling USB: serial: ti_usb_3410_5052: fix control-message error handling USB: serial: io_edgeport: fix epic-descriptor handling USB: serial: ssu100: fix control-message error handling USB: serial: digi_acceleport: fix incomplete rx sanity check USB: serial: keyspan_pda: fix receive sanity checks usb: chipidea: Handle extcon events properly usb: chipidea: Only read/write OTGSC from one place usb: host: ohci-exynos: Decrese node refcount on exynos_ehci_get_phy() error paths usb: host: ehci-exynos: Decrese node refcount on exynos_ehci_get_phy() error paths KVM: nVMX: do not leak PML full vmexit to L1 KVM: nVMX: initialize PML fields in vmcs02 Revert "KVM: nested VMX: disable perf cpuid reporting" x86/platform/intel-mid: Correct MSI IRQ line for watchdog device kprobes/x86: Fix kernel panic when certain exception-handling addresses are probed clk: Make x86/ conditional on CONFIG_COMMON_CLK x86/pci-calgary: Fix iommu_free() comparison of unsigned expression >= 0 x86/ioapic: Restore IO-APIC irq_chip retrigger callback mwifiex: Avoid skipping WEP key deletion for AP mwifiex: remove redundant dma padding in AMSDU mwifiex: debugfs: Fix (sometimes) off-by-1 SSID print ARM: OMAP5 / DRA7: Fix HYP mode boot for thumb2 build leds: ktd2692: avoid harmless maybe-uninitialized warning power: supply: bq24190_charger: Handle fault before status on interrupt power: supply: bq24190_charger: Don't read fault register outside irq_handle_thread() power: supply: bq24190_charger: Call power_supply_changed() for relevant component power: supply: bq24190_charger: Install irq_handler_thread() at end of probe() power: supply: bq24190_charger: Call set_mode_host() on pm_resume() power: supply: bq24190_charger: Fix irq trigger to IRQF_TRIGGER_FALLING powerpc/powernv: Fix opal_exit tracepoint opcode cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores ARM: 8452/3: PJ4: make coprocessor access sequences buildable in Thumb2 mode 9p: fix a potential acl leak Linux 4.4.67 dm ioctl: prevent stack leak in dm ioctl call nfsd: stricter decoding of write-like NFSv2/v3 ops nfsd4: minor NFSv2/v3 write decoding cleanup ext4/fscrypto: avoid RCU lookup in d_revalidate ext4 crypto: use dget_parent() in ext4_d_revalidate() ext4 crypto: revalidate dentry after adding or removing the key ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY IB/ehca: fix maybe-uninitialized warnings IB/qib: rename BITS_PER_PAGE to RVT_BITS_PER_PAGE netlink: Allow direct reclaim for fallback allocation 8250_pci: Fix potential use-after-free in error path scsi: cxlflash: Improve EEH recovery time scsi: cxlflash: Fix to avoid EEH and host reset collisions scsi: cxlflash: Scan host only after the port is ready for I/O net: tg3: avoid uninitialized variable warning mtd: avoid stack overflow in MTD CFI code drbd: avoid redefinition of BITS_PER_PAGE ALSA: ppc/awacs: shut up maybe-uninitialized warning ASoC: intel: Fix PM and non-atomic crash in bytcr drivers Handle mismatched open calls timerfd: Protect the might cancel mechanism proper Linux 4.4.66 ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram ARCv2: save r30 on kernel entry as gcc uses it for code-gen nfsd: check for oversized NFSv2/v3 arguments Input: i8042 - add Clevo P650RS to the i8042 reset list p9_client_readdir() fix MIPS: Avoid BUG warning in arch_check_elf MIPS: KGDB: Use kernel context for sleeping threads ALSA: seq: Don't break snd_use_lock_sync() loop by timeout ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type ipv6: check raw payload size correctly in ioctl ipv6: check skb->protocol before lookup for nexthop macvlan: Fix device ref leak when purging bc_queue ip6mr: fix notification device destruction netpoll: Check for skb->queue_mapping net: ipv6: RTF_PCPU should not be settable from userspace dp83640: don't recieve time stamps twice tcp: clear saved_syn in tcp_disconnect() sctp: listen on the sock only when it's state is listening or closed net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given l2tp: fix PPP pseudo-wire auto-loading l2tp: take reference on sessions being dumped net/packet: fix overflow in check for tp_reserve net/packet: fix overflow in check for tp_frame_nr l2tp: purge socket queues in the .destruct() callback net: phy: handle state correctly in phy_stop_machine net: neigh: guard against NULL solicit() method sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write() sparc64: kern_addr_valid regression xen/x86: don't lose event interrupts usb: gadget: f_midi: Fixed a bug when buflen was smaller than wMaxPacketSize regulator: core: Clear the supply pointer if enabling fails RDS: Fix the atomicity for congestion map update net_sched: close another race condition in tcf_mirred_release() net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata MIPS: Fix crash registers on non-crashing CPUs md:raid1: fix a dead loop when read from a WriteMostly disk ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea() drm/amdgpu: fix array out of bounds crypto: testmgr - fix out of bound read in __test_aead() clk: sunxi: Add apb0 gates for H3 ARM: OMAP2+: timer: add probe for clocksources xc2028: unlock on error in xc2028_set_config() f2fs: do more integrity verification for superblock Linux 4.4.65 perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race ping: implement proper locking staging/android/ion : fix a race condition in the ion driver vfio/pci: Fix integer overflows, bitmask check tipc: check minimum bearer MTU netfilter: nfnetlink: correctly validate length of batch messages xc2028: avoid use after free mnt: Add a per mount namespace limit on the number of mounts tipc: fix socket timer deadlock tipc: fix random link resets while adding a second bearer gfs2: avoid uninitialized variable warning hostap: avoid uninitialized variable use in hfa384x_get_rid tty: nozomi: avoid a harmless gcc warning tipc: correct error in node fsm tipc: re-enable compensation for socket receive buffer double counting tipc: make dist queue pernet tipc: make sure IPv6 header fits in skb headroom Linux 4.4.64 tipc: fix crash during node removal block: fix del_gendisk() vs blkdev_ioctl crash x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions hv: don't reset hv_context.tsc_page on crash Drivers: hv: balloon: account for gaps in hot add regions Drivers: hv: balloon: keep track of where ha_region starts Tools: hv: kvp: ensure kvp device fd is closed on exec kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction ubi/upd: Always flush after prepared for an update mac80211: reject ToDS broadcast data frames mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card ACPI / power: Avoid maybe-uninitialized warning Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled VSOCK: Detach QP check should filter out non matching QPs. Drivers: hv: vmbus: Reduce the delay between retries in vmbus_post_msg() Drivers: hv: get rid of timeout in vmbus_open() Drivers: hv: don't leak memory in vmbus_establish_gpadl() s390/mm: fix CMMA vs KSM vs others CIFS: remove bad_network_name flag cifs: Do not send echoes before Negotiate is complete ring-buffer: Have ring_buffer_iter_empty() return true when empty tracing: Allocate the snapshot buffer before enabling probe KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings KEYS: Change the name of the dead type to ".dead" to prevent user access KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings Linux 4.4.63 MIPS: fix Select HAVE_IRQ_EXIT_ON_IRQ_STACK patch. sctp: deny peeloff operation on asocs with threads sleeping on it net: ipv6: check route protocol when deleting routes tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done SUNRPC: fix refcounting problems with auth_gss messages. ibmveth: calculate gso_segs for large packets catc: Use heap buffer for memory size test catc: Combine failure cleanup code in catc_probe() rtl8150: Use heap buffers for all register access pegasus: Use heap buffers for all register access virtio-console: avoid DMA from stack dvb-usb-firmware: don't do DMA on stack dvb-usb: don't use stack for firmware load mm: Tighten x86 /dev/mem with zeroing reads rtc: tegra: Implement clock handling platform/x86: acer-wmi: setup accelerometer when machine has appropriate notify event ext4: fix inode checksum calculation problem if i_extra_size is small dvb-usb-v2: avoid use-after-free ath9k: fix NULL pointer dereference crypto: ahash - Fix EINPROGRESS notification callback powerpc: Disable HFSCR[TM] if TM is not supported zram: do not use copy_page with non-page aligned address kvm: fix page struct leak in handle_vmon Revert "MIPS: Lantiq: Fix cascaded IRQ setup" char: lack of bool string made CONFIG_DEVPORT always on char: Drop bogus dependency of DEVPORT on !M68K ftrace: Fix removing of second function probe irqchip/irq-imx-gpcv2: Fix spinlock initialization libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat xen, fbfront: fix connecting to backend scsi: sd: Fix capacity calculation with 32-bit sector_t scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable scsi: sr: Sanity check returned mode data iscsi-target: Drop work-around for legacy GlobalSAN initiator iscsi-target: Fix TMR reference leak during session shutdown acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison) x86/vdso: Plug race between mapping and ELF header setup x86/vdso: Ensure vdso32_enabled gets set to valid values only perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32() Input: xpad - add support for Razer Wildcat gamepad CIFS: store results of cifs_reopen_file to avoid infinite wait drm/nouveau/mmu/nv4a: use nv04 mmu rather than the nv44 one drm/nouveau/mpeg: mthd returns true on success now thp: fix MADV_DONTNEED vs clear soft dirty race cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups Linux 4.4.62 ibmveth: set correct gso_size and gso_type net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions net/mlx4_core: Fix racy CQ (Completion Queue) free net/mlx4_en: Fix bad WQE issue usb: hub: Wait for connection to be reestablished after port reset blk-mq: Avoid memory reclaim when remapping queues net/packet: fix overflow in check for priv area size crypto: caam - fix RNG deinstantiation error checking MIPS: IRQ Stack: Fix erroneous jal to plat_irq_dispatch MIPS: Select HAVE_IRQ_EXIT_ON_IRQ_STACK MIPS: Switch to the irq_stack in interrupts MIPS: Only change $28 to thread_info if coming from user mode MIPS: Stack unwinding while on IRQ stack MIPS: Introduce irq_stack mtd: bcm47xxpart: fix parsing first block after aligned TRX usb: dwc3: gadget: delay unmap of bounced requests drm/i915: Stop using RP_DOWN_EI on Baytrail drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3 Linux 4.4.61 mm/mempolicy.c: fix error handling in set_mempolicy and mbind. MIPS: Flush wrong invalid FTLB entry for huge page MIPS: Lantiq: fix missing xbar kernel panic MIPS: End spinlocks with .insn MIPS: ralink: Fix typos in rt3883 pinctrl MIPS: Force o32 fp64 support on 32bit MIPS64r6 kernels s390/uaccess: get_user() should zero on failure (again) s390/decompressor: fix initrd corruption caused by bss clear nios2: reserve boot memory for device tree powerpc: Don't try to fix up misaligned load-with-reservation instructions powerpc/mm: Add missing global TLB invalidate if cxl is active metag/usercopy: Add missing fixups metag/usercopy: Fix src fixup in from user rapf loops metag/usercopy: Set flags before ADDZ metag/usercopy: Zero rest of buffer from copy_from_user metag/usercopy: Add early abort to copy_to_user metag/usercopy: Fix alignment error checking metag/usercopy: Drop unused macros ring-buffer: Fix return value check in test_ringbuffer() ptrace: fix PTRACE_LISTEN race corrupting task->state Reset TreeId to zero on SMB2 TREE_CONNECT iio: bmg160: reset chip when probing arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm staging: android: ashmem: lseek failed due to no FMODE_LSEEK. sysfs: be careful of error returns from ops->show() drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl() drm/vmwgfx: Remove getparam error message drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl() drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl() drm/vmwgfx: Type-check lookups of fence objects Bug: 62730977 Change-Id: I4458200bbc977cf55a134fd9fd08627604e36d95 Signed-off-by: Thierry Strudel <tstrudel@google.com>	2017-09-20 15:50:18 -07:00
Oleg Nesterov	717b77c07f	UPSTREAM: kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function get_task_struct(tsk) no longer pins tsk->stack so all users of to_live_kthread() should do try_get_task_stack/put_task_stack to protect "struct kthread" which lives on kthread's stack. TODO: Kill to_live_kthread(), perhaps we can even kill "struct kthread" too, and rework kthread_stop(), it can use task_work_add() to sync with the exiting kernel thread. Message-Id: <20160629180357.GA7178@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jann Horn <jann@thejh.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/cb9b16bbc19d4aea4507ab0552e4644c1211d130.1474003868.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Bug: 64672701 Change-Id: I2872658e56dcb1ab4173c490ef8f52affa54a404 (cherry picked from commit 23196f2e5f5d810578a772785807dcdc2b9fdce9) Signed-off-by: Zubin Mithra <zsm@google.com>	2017-08-21 16:04:11 +01:00
Petr Mladek	0025d3d13e	BACKPORT: kthread: allow to cancel kthread work We are going to use kthread workers more widely and sometimes we will need to make sure that the work is neither pending nor running. This patch implements cancel_*_sync() operations as inspired by workqueues. Well, we are synchronized against the other operations via the worker lock, we use del_timer_sync() and a counter to count parallel cancel operations. Therefore the implementation might be easier. First, we check if a worker is assigned. If not, the work has newer been queued after it was initialized. Second, we take the worker lock. It must be the right one. The work must not be assigned to another worker unless it is initialized in between. Third, we try to cancel the timer when it exists. The timer is deleted synchronously to make sure that the timer call back is not running. We need to temporary release the worker->lock to avoid a possible deadlock with the callback. In the meantime, we set work->canceling counter to avoid any queuing. Fourth, we try to remove the work from a worker list. It might be the list of either normal or delayed works. Fifth, if the work is running, we call kthread_flush_work(). It might take an arbitrary time. We need to release the worker-lock again. In the meantime, we again block any queuing by the canceling counter. As already mentioned, the check for a pending kthread work is done under a lock. In compare with workqueues, we do not need to fight for a single PENDING bit to block other operations. Therefore we do not suffer from the thundering storm problem and all parallel canceling jobs might use kthread_flush_work(). Any queuing is blocked until the counter gets zero. Change-Id: I8a8ece0f93c828f311d0ad5c88d80db2388e4808 Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry-picked from 37be45d49dec2a411e29d50c9597cfe8184b5645) [major changes to the original patch while cherry-picking; only rebased the sync variant] Signed-off-by: Juri Lelli <juri.lelli@arm.com>	2017-05-13 13:02:56 -07:00
Tejun Heo	3144d81a77	cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups commit 77f88796cee819b9c4562b0b6b44691b3b7755b1 upstream. Creation of a kthread goes through a couple interlocked stages between the kthread itself and its creator. Once the new kthread starts running, it initializes itself and wakes up the creator. The creator then can further configure the kthread and then let it start doing its job by waking it up. In this configuration-by-creator stage, the creator is the only one that can wake it up but the kthread is visible to userland. When altering the kthread's attributes from userland is allowed, this is fine; however, for cases where CPU affinity is critical, kthread_bind() is used to first disable affinity changes from userland and then set the affinity. This also prevents the kthread from being migrated into non-root cgroups as that can affect the CPU affinity and many other things. Unfortunately, the cgroup side of protection is racy. While the PF_NO_SETAFFINITY flag prevents further migrations, userland can win the race before the creator sets the flag with kthread_bind() and put the kthread in a non-root cgroup, which can lead to all sorts of problems including incorrect CPU affinity and starvation. This bug got triggered by userland which periodically tries to migrate all processes in the root cpuset cgroup to a non-root one. Per-cpu workqueue workers got caught while being created and ended up with incorrected CPU affinity breaking concurrency management and sometimes stalling workqueue execution. This patch adds task->no_cgroup_migration which disallows the task to be migrated by userland. kthreadd starts with the flag set making every child kthread start in the root cgroup with migration disallowed. The flag is cleared after the kthread finishes initialization by which time PF_NO_SETAFFINITY is set if the kthread should stay in the root cgroup. It'd be better to wait for the initialization instead of failing but I couldn't think of a way of implementing that without adding either a new PF flag, or sleeping and retrying from waiting side. Even if userland depends on changing cgroup membership of a kthread, it either has to be synchronized with kthread_create() or periodically repeat, so it's unlikely that this would break anything. v2: Switch to a simpler implementation using a new task_struct bit field suggested by Oleg. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-and-debugged-by: Chris Mason <clm@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-04-21 09:30:04 +02:00
Andrew Morton	e9f069868d	kernel/kthread.c:kthread_create_on_node(): clarify documentation - Make it clear that the `node' arg refers to memory allocations only: kthread_create_on_node() does not pin the new thread to that node's CPUs. - Encourage the use of NUMA_NO_NODE. [nzimmer@sgi.com: use NUMA_NO_NODE in kthread_create() also] Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Tejun Heo <tj@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Linus Torvalds	a1d8561172	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The biggest change in this cycle is the rewrite of the main SMP load balancing metric: the CPU load/utilization. The main goal was to make the metric more precise and more representative - see the changelog of this commit for the gory details: `9d89c257df` ("sched/fair: Rewrite runnable load and utilization average tracking") It is done in a way that significantly reduces complexity of the code: 5 files changed, 249 insertions(+), 494 deletions(-) and the performance testing results are encouraging. Nevertheless we need to keep an eye on potential regressions, since this potentially affects every SMP workload in existence. This work comes from Yuyang Du. Other changes: - SCHED_DL updates. (Andrea Parri) - Simplify architecture callbacks by removing finish_arch_switch(). (Peter Zijlstra et al) - cputime accounting: guarantee stime + utime == rtime. (Peter Zijlstra) - optimize idle CPU wakeups some more - inspired by Facebook server loads. (Mike Galbraith) - stop_machine fixes and updates. (Oleg Nesterov) - Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra) - sched/numa tweaks. (Srikar Dronamraju) - misc fixes and small cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) sched/deadline: Fix comment in enqueue_task_dl() sched/deadline: Fix comment in push_dl_tasks() sched: Change the sched_class::set_cpus_allowed() calling context sched: Make sched_class::set_cpus_allowed() unconditional sched: Fix a race between __kthread_bind() and sched_setaffinity() sched: Ensure a task has a non-normalized vruntime when returning back to CFS sched/numa: Fix NUMA_DIRECT topology identification tile: Reorganize _switch_to() sched, sparc32: Update scheduler comments in copy_thread() sched: Remove finish_arch_switch() sched, tile: Remove finish_arch_switch sched, sh: Fold finish_arch_switch() into switch_to() sched, score: Remove finish_arch_switch() sched, avr32: Remove finish_arch_switch() sched, MIPS: Get rid of finish_arch_switch() sched, arm: Remove finish_arch_switch() sched/fair: Clean up load average references sched/fair: Provide runnable_load_avg back to cfs_rq sched/fair: Remove task and group entity load when they are dead sched/fair: Init cfs_rq's sched_entity load average ...	2015-08-31 20:26:22 -07:00
Peter Zijlstra	25834c73f9	sched: Fix a race between __kthread_bind() and sched_setaffinity() Because sched_setscheduler() checks p->flags & PF_NO_SETAFFINITY without locks, a caller might observe an old value and race with the set_cpus_allowed_ptr() call from __kthread_bind() and effectively undo it: __kthread_bind() do_set_cpus_allowed() <SYSCALL> sched_setaffinity() if (p->flags & PF_NO_SETAFFINITIY) set_cpus_allowed_ptr() p->flags \|= PF_NO_SETAFFINITY Fix the bug by putting everything under the regular scheduler locks. This also closes a hole in the serialization of task_struct::{nr_,}cpus_allowed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dedekind1@gmail.com Cc: juri.lelli@arm.com Cc: mgorman@suse.de Cc: riel@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20150515154833.545640346@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-12 12:06:09 +02:00
David Kershner	18896451ea	kthread: export kthread functions The s-Par visornic driver, currently in staging, processes a queue being serviced by the an s-Par service partition. We can get a message that something has happened with the Service Partition, when that happens, we must not access the channel until we get a message that the service partition is back again. The visornic driver has a thread for processing the channel, when we get the message, we need to be able to park the thread and then resume it when the problem clears. We can do this with kthread_park and unpark but they are not exported from the kernel, this patch exports the needed functions. Signed-off-by: David Kershner <david.kershner@unisys.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-08-07 04:39:41 +03:00
Nishanth Aravamudan	109228389a	kernel/kthread.c: partial revert of `81c98869fa` ("kthread: ensure locality of task_struct allocations") After discussions with Tejun, we don't want to spread the use of cpu_to_mem() (and thus knowledge of allocators/NUMA topology details) into callers, but would rather ensure the callees correctly handle memoryless nodes. With the previous patches ("topology: add support for node_to_mem_node() to determine the fallback node" and "slub: fallback to node_to_mem_node() node if allocating on memoryless node") adding and using node_to_mem_node(), we can safely undo part of the change to the kthread logic from `81c98869fa`. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Han Pingtian <hanpt@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Anton Blanchard <anton@samba.org> Cc: Christoph Lameter <cl@linux.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-09 22:25:51 -04:00
Lai Jiangshan	ed1403ec2b	kthread_work: wake up worker only when the worker is idle If the worker is already executing a work item when another is queued, we can safely skip wakeup without worrying about stalling queue thus avoiding waking up the busy worker spuriously. Spurious wakeups should be fine but still isn't nice and avoiding it is trivial here. tj: Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-07-28 14:07:52 -04:00
Tetsuo Handa	8fe6929cfd	kthread: fix return value of kthread_create() upon SIGKILL. Commit `786235eeba` ("kthread: make kthread_create() killable") meant for allowing kthread_create() to abort as soon as killed by the OOM-killer. But returning -ENOMEM is wrong if killed by SIGKILL from userspace. Change kthread_create() to return -EINTR upon SIGKILL. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-04 16:53:51 -07:00
Nishanth Aravamudan	81c98869fa	kthread: ensure locality of task_struct allocations In the presence of memoryless nodes, numa_node_id() will return the current CPU's NUMA node, but that may not be where we expect to allocate from memory from. Instead, we should rely on the fallback code in the memory allocator itself, by using NUMA_NO_NODE. Also, when calling kthread_create_on_node(), use the nearest node with memory to the cpu in question, rather than the node it is running on. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Reviewed-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Anton Blanchard <anton@samba.org> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Ben Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-03 16:20:49 -07:00
Tetsuo Handa	786235eeba	kthread: make kthread_create() killable Any user process callers of wait_for_completion() except global init process might be chosen by the OOM killer while waiting for completion() call by some other process which does memory allocation. See CVE-2012-4398 "kernel: request_module() OOM local DoS" can happen. When such users are chosen by the OOM killer when they are waiting for completion() in TASK_UNINTERRUPTIBLE, the system will be kept stressed due to memory starvation because the OOM killer cannot kill such users. kthread_create() is one of such users and this patch fixes the problem for kthreadd by making kthread_create() killable - the same approach used for fixing CVE-2012-4398. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:08:59 +09:00
Tejun Heo	cd42d559e4	kthread: implement probe_kthread_data() One of the problems that arise when converting dedicated custom threadpool to workqueue is that the shared worker pool used by workqueue anonimizes each worker making it more difficult to identify what the worker was doing on which target from the output of sysrq-t or debug dump from oops, BUG() and friends. For example, after writeback is converted to use workqueue instead of priviate thread pool, there's no easy to tell which backing device a writeback work item was working on at the time of task dump, which, according to our writeback brethren, is important in tracking down issues with a lot of mounted file systems on a lot of different devices. This patchset implements a way for a work function to mark its execution instance so that task dump of the worker task includes information to indicate what the work item was doing. An example WARN dump would look like the following. WARNING: at fs/fs-writeback.c:1015 bdi_writeback_workfn+0x2b4/0x3c0() Modules linked in: CPU: 0 Pid: 28 Comm: kworker/u18:0 Not tainted 3.9.0-rc1-work+ #24 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 Workqueue: writeback bdi_writeback_workfn (flush-8:16) ffffffff820a3a98 ffff88015b927cb8 ffffffff81c61855 ffff88015b927cf8 ffffffff8108f500 0000000000000000 ffff88007a171948 ffff88007a1716b0 ffff88015b49df00 ffff88015b8d3940 0000000000000000 ffff88015b927d08 Call Trace: [<ffffffff81c61855>] dump_stack+0x19/0x1b [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0 ... This patch: Implement probe_kthread_data() which returns kthread_data if accessible. The function is equivalent to kthread_data() except that the specified @task may not be a kthread or its vfork_done is already cleared rendering struct kthread inaccessible. In the former case, probe_kthread_data() may return any value. In the latter, NULL. This will be used to safely print debug information without affecting synchronization in the normal paths. Workqueue debug info printing on dump_stack() and friends will make use of it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:02 -07:00
Linus Torvalds	46d9be3e5e	Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "A lot of activities on workqueue side this time. The changes achieve the followings. - WQ_UNBOUND workqueues - the workqueues which are per-cpu - are updated to be able to interface with multiple backend worker pools. This involved a lot of churning but the end result seems actually neater as unbound workqueues are now a lot closer to per-cpu ones. - The ability to interface with multiple backend worker pools are used to implement unbound workqueues with custom attributes. Currently the supported attributes are the nice level and CPU affinity. It may be expanded to include cgroup association in future. The attributes can be specified either by calling apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if the workqueue in question is exported through sysfs. The backend worker pools are keyed by the actual attributes and shared by any workqueues which share the same attributes. When attributes of a workqueue are changed, the workqueue binds to the worker pool with the specified attributes while leaving the work items which are already executing in its previous worker pools alone. This allows converting custom worker pool implementations which want worker attribute tuning to use workqueues. The writeback pool is already converted in block tree and there are a couple others are likely to follow including btrfs io workers. - WQ_UNBOUND's ability to bind to multiple worker pools is also used to make it NUMA-aware. Because there's no association between work item issuer and the specific worker assigned to execute it, before this change, using unbound workqueue led to unnecessary cross-node bouncing and it couldn't be helped by autonuma as it requires tasks to have implicit node affinity and workers are assigned randomly. After these changes, an unbound workqueue now binds to multiple NUMA-affine worker pools so that queued work items are executed in the same node. This is turned on by default but can be disabled system-wide or for individual workqueues. Crypto was requesting NUMA affinity as encrypting data across different nodes can contribute noticeable overhead and doing it per-cpu was too limiting for certain cases and IO throughput could be bottlenecked by one CPU being fully occupied while others have idle cycles. While the new features required a lot of changes including restructuring locking, it didn't complicate the execution paths much. The unbound workqueue handling is now closer to per-cpu ones and the new features are implemented by simply associating a workqueue with different sets of backend worker pools without changing queue, execution or flush paths. As such, even though the amount of change is very high, I feel relatively safe in that it isn't likely to cause subtle issues with basic correctness of work item execution and handling. If something is wrong, it's likely to show up as being associated with worker pools with the wrong attributes or OOPS while workqueue attributes are being changed or during CPU hotplug. While this creates more backend worker pools, it doesn't add too many more workers unless, of course, there are many workqueues with unique combinations of attributes. Assuming everything else is the same, NUMA awareness costs an extra worker pool per NUMA node with online CPUs. There are also a couple things which are being routed outside the workqueue tree. - block tree pulled in workqueue for-3.10 so that writeback worker pool can be converted to unbound workqueue with sysfs control exposed. This simplifies the code, makes writeback workers NUMA-aware and allows tuning nice level and CPU affinity via sysfs. - The conversion to workqueue means that there's no 1:1 association between a specific worker, which makes writeback folks unhappy as they want to be able to tell which filesystem caused a problem from backtrace on systems with many filesystems mounted. This is resolved by allowing work items to set debug info string which is printed when the task is dumped. As this change involves unifying implementations of dump_stack() and friends in arch codes, it's being routed through Andrew's -mm tree." * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits) workqueue: use kmem_cache_free() instead of kfree() workqueue: avoid false negative WARN_ON() in destroy_workqueue() workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity workqueue: implement NUMA affinity for unbound workqueues workqueue: introduce put_pwq_unlocked() workqueue: introduce numa_pwq_tbl_install() workqueue: use NUMA-aware allocation for pool_workqueues workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq() workqueue: map an unbound workqueues to multiple per-node pool_workqueues workqueue: move hot fields of workqueue_struct to the end workqueue: make workqueue->name[] fixed len workqueue: add workqueue->unbound_attrs workqueue: determine NUMA node of workers accourding to the allowed cpumask workqueue: drop 'H' from kworker names of unbound worker pools workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[] workqueue: move pwq_pool_locking outside of get/put_unbound_pool() workqueue: fix memory leak in apply_workqueue_attrs() workqueue: fix unbound workqueue attrs hashing / comparison workqueue: fix race condition in unbound workqueue free path workqueue: remove pwq_lock which is no longer used ...	2013-04-29 19:07:40 -07:00
Oleg Nesterov	b5c5442bb6	kthread: kill task_get_live_kthread() task_get_live_kthread() looks confusing and unneeded. It does get_task_struct() but only kthread_stop() needs this, it can be called even if the calller doesn't have a reference when we know that this kthread can't exit until we do kthread_stop(). kthread_park() and kthread_unpark() do not need get_task_struct(), the callers already have the reference. And it can not help if we can race with the exiting kthread anyway, kthread_park() can hang forever in this case. Change kthread_park() and kthread_unpark() to use to_live_kthread(), change kthread_stop() to do get_task_struct() by hand and remove task_get_live_kthread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:25 -07:00
Oleg Nesterov	4ecdafc808	kthread: introduce to_live_kthread() "k->vfork_done != NULL" with a barrier() after to_kthread(k) in task_get_live_kthread(k) looks unclear, and sub-optimal because we load ->vfork_done twice. All we need is to ensure that we do not return to_kthread(NULL). Add a new trivial helper which loads/checks ->vfork_done once, this also looks more understandable. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:25 -07:00
Thomas Gleixner	f2530dc71c	kthread: Prevent unpark race which puts threads on the wrong cpu The smpboot threads rely on the park/unpark mechanism which binds per cpu threads on a particular core. Though the functionality is racy: CPU0 CPU1 CPU2 unpark(T) wake_up_process(T) clear(SHOULD_PARK) T runs leave parkme() due to !SHOULD_PARK bind_to(CPU2) BUG_ON(wrong CPU) We cannot let the tasks move themself to the target CPU as one of those tasks is actually the migration thread itself, which requires that it starts running on the target cpu right away. The solution to this problem is to prevent wakeups in park mode which are not from unpark(). That way we can guarantee that the association of the task to the target cpu is working correctly. Add a new task state (TASK_PARKED) which prevents other wakeups and use this state explicitly for the unpark wakeup. Peter noticed: Also, since the task state is visible to userspace and all the parked tasks are still in the PID space, its a good hint in ps and friends that these tasks aren't really there for the moment. The migration thread has another related issue. CPU0 CPU1 Bring up CPU2 create_thread(T) park(T) wait_for_completion() parkme() complete() sched_set_stop_task() schedule(TASK_PARKED) The sched_set_stop_task() call is issued while the task is on the runqueue of CPU1 and that confuses the hell out of the stop_task class on that cpu. So we need the same synchronizaion before sched_set_stop_task(). Reported-by: Dave Jones <davej@redhat.com> Reported-and-tested-by: Dave Hansen <dave@sr71.net> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Acked-by: Peter Ziljstra <peterz@infradead.org> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: dhillf@gmail.com Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-04-12 14:18:43 +02:00
Tejun Heo	14a40ffccd	sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY PF_THREAD_BOUND was originally used to mark kernel threads which were bound to a specific CPU using kthread_bind() and a task with the flag set allows cpus_allowed modifications only to itself. Workqueue is currently abusing it to prevent userland from meddling with cpus_allowed of workqueue workers. What we need is a flag to prevent userland from messing with cpus_allowed of certain kernel tasks. In kernel, anyone can (incorrectly) squash the flag, and, for worker-type usages, restricting cpus_allowed modification to the task itself doesn't provide meaningful extra proection as other tasks can inject work items to the task anyway. This patch replaces PF_THREAD_BOUND with PF_NO_SETAFFINITY. sched_setaffinity() checks the flag and return -EINVAL if set. set_cpus_allowed_ptr() is no longer affected by the flag. This will allow simplifying workqueue worker CPU affinity management. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>	2013-03-19 13:45:20 -07:00
Lai Jiangshan	aee4faa499	kthread: use N_MEMORY instead N_HIGH_MEMORY N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-12 17:38:33 -08:00
Linus Torvalds	4e21fc138b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull third pile of kernel_execve() patches from Al Viro: "The last bits of infrastructure for kernel_thread() et.al., with alpha/arm/x86 use of those. Plus sanitizing the asm glue and do_notify_resume() on alpha, fixing the "disabled irq while running task_work stuff" breakage there. At that point the rest of kernel_thread/kernel_execve/sys_execve work can be done independently for different architectures. The only pending bits that do depend on having all architectures converted are restrictred to fs/* and kernel/* - that'll obviously have to wait for the next cycle. I thought we'd have to wait for all of them done before we start eliminating the longjump-style insanity in kernel_execve(), but it turned out there's a very simple way to do that without flagday-style changes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to saner kernel_execve() semantics arm: switch to saner kernel_execve() semantics x86, um: convert to saner kernel_execve() semantics infrastructure for saner ret_from_kernel_thread semantics make sure that kernel_thread() callbacks call do_exit() themselves make sure that we always have a return path from kernel_execve() ppc: eeh_event should just use kthread_run() don't bother with kernel_thread/kernel_execve for launching linuxrc alpha: get rid of switch_stack argument of do_work_pending() alpha: don't bother passing switch_stack separately from regs alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c alpha: simplify TIF_NEED_RESCHED handling	2012-10-13 10:05:52 +09:00
Al Viro	a74fb73c12	infrastructure for saner ret_from_kernel_thread semantics * allow kernel_execve() leave the actual return to userland to caller (selected by CONFIG_GENERIC_KERNEL_EXECVE). Callers updated accordingly. * architecture that does select GENERIC_KERNEL_EXECVE in its Kconfig should have its ret_from_kernel_thread() do this: call schedule_tail call the callback left for it by copy_thread(); if it ever returns, that's because it has just done successful kernel_execve() jump to return from syscall IOW, its only difference from ret_from_fork() is that it does call the callback. * such an architecture should also get rid of ret_from_kernel_execve() and __ARCH_WANT_KERNEL_EXECVE This is the last part of infrastructure patches in that area - from that point on work on different architectures can live independently. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-10-12 13:35:07 -04:00
Thomas Gleixner	2a1d446019	kthread: Implement park/unpark facility To avoid the full teardown/setup of per cpu kthreads in the case of cpu hot(un)plug, provide a facility which allows to put the kthread into a park position and unpark it when the cpu comes online again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120716103948.236618824@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2012-08-13 17:01:06 +02:00
Tejun Heo	46f3d97621	kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed kthread_worker provides minimalistic workqueue-like interface for users which need a dedicated worker thread (e.g. for realtime priority). It has basic queue, flush_work, flush_worker operations which mostly match the workqueue counterparts; however, due to the way flush_work() is implemented, it has a noticeable difference of not allowing work items to be freed while being executed. While the current users of kthread_worker are okay with the current behavior, the restriction does impede some valid use cases. Also, removing this difference isn't difficult and actually makes the code easier to understand. This patch reimplements flush_kthread_work() such that it uses a flush_work item instead of queue/done sequence numbers. Signed-off-by: Tejun Heo <tj@kernel.org>	2012-07-22 10:15:28 -07:00
Tejun Heo	9a2e03d8ed	kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation Make the following two non-functional changes. * Separate out insert_kthread_work() from queue_kthread_work(). * Relocate struct kthread_flush_work and kthread_flush_work_fn() definitions above flush_kthread_work(). v2: Added lockdep_assert_held() in insert_kthread_work() as suggested by Andy Walls. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andy Walls <awalls@md.metrocast.net>	2012-07-22 10:11:01 -07:00
Tejun Heo	34b087e483	freezer: kill unused set_freezable_with_signal() There's no in-kernel user of set_freezable_with_signal() left. Mixing TIF_SIGPENDING with kernel threads can lead to nasty corner cases as kernel threads never travel signal delivery path on their own. e.g. the current implementation is buggy in the cancelation path of __thaw_task(). It calls recalc_sigpending_and_wake() in an attempt to clear TIF_SIGPENDING but the function never clears it regardless of sigpending state. This means that signallable freezable kthreads may continue executing with !freezing() && stuck TIF_SIGPENDING, which can be troublesome. This patch removes set_freezable_with_signal() along with PF_FREEZER_NOSIG and recalc_sigpending*() calls in freezer. User tasks get TIF_SIGPENDING, kernel tasks get woken up and the spurious sigpending is dealt with in the usual signal delivery path. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com>	2011-11-23 09:28:17 -08:00
Tejun Heo	8a32c441c1	freezer: implement and use kthread_freezable_should_stop() Writeback and thinkpad_acpi have been using thaw_process() to prevent deadlock between the freezer and kthread_stop(); unfortunately, this is inherently racy - nothing prevents freezing from happening between thaw_process() and kthread_stop(). This patch implements kthread_freezable_should_stop() which enters refrigerator if necessary but is guaranteed to return if kthread_stop() is invoked. Both thaw_process() users are converted to use the new function. Note that this deadlock condition exists for many of freezable kthreads. They need to be converted to use the new should_stop or freezable workqueue. Tested with synthetic test case. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br> Cc: Jens Axboe <axboe@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com>	2011-11-21 12:32:23 -08:00
Paul Gortmaker	9984de1a5a	kernel: Map most files to use export.h instead of module.h The changed files were only including linux/module.h for the EXPORT_SYMBOL infrastructure, and nothing else. Revector them onto the isolated export header for faster compile times. Nothing to see here but a whole lot of instances of: -#include <linux/module.h> +#include <linux/export.h> This commit is only changing the kernel dir; next targets will probably be mm, fs, the arch dirs, etc. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 09:20:12 -04:00
KOSAKI Motohiro	1e1b6c511d	cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed The rule is, we have to update tsk->rt.nr_cpus_allowed if we change tsk->cpus_allowed. Otherwise RT scheduler may confuse. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4DD4B3FA.5060901@jp.fujitsu.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-28 17:02:57 +02:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Eric Dumazet	207205a2ba	kthread: NUMA aware kthread_create_on_node() All kthreads being created from a single helper task, they all use memory from a single node for their kernel stack and task struct. This patch suite creates kthread_create_on_node(), adding a 'cpu' parameter to parameters already used by kthread_create(). This parameter serves in allocating memory for the new kthread on its memory node if possible. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-22 17:44:01 -07:00
Peter Zijlstra	c9b5f501ef	sched: Constify function scope static struct sched_param usage Function-scope statics are discouraged because they are easily overlooked and can cause subtle bugs/races due to their global (non-SMP safe) nature. Linus noticed that we did this for sched_param - at minimum make the const. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: Message-ID: <AANLkTinotRxScOHEb0HgFgSpGPkq_6jKTv5CfvnQM=ee@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-07 15:55:45 +01:00
Ingo Molnar	27066fd484	Merge commit 'v2.6.37' into sched/core Merge reason: Merge the final .37 tree. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-05 14:14:46 +01:00
Yong Zhang	4f32e9b1f8	kthread_work: make lockdep happy spinlock in kthread_worker and wait_queue_head in kthread_work both should be lockdep sensible, so change the interface to make it suiltable for CONFIG_LOCKDEP. tj: comment update Reported-by: Nicolas <nicolas.mailhot@laposte.net> Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Andy Walls <awalls@md.metrocast.net> Tested-by: Andy Walls <awalls@md.metrocast.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-12-22 10:27:53 +01:00
KOSAKI Motohiro	fe7de49f9d	sched: Make sched_param argument static in sched_setscheduler() callers Andrew Morton pointed out almost all sched_setscheduler() callers are using fixed parameters and can be converted to static. It reduces runtime memory use a little. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: James Morris <jmorris@namei.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-23 17:56:48 +02:00
Tejun Heo	82805ab77d	kthread: implement kthread_data() Implement kthread_data() which takes @task pointing to a kthread and returns @data specified when creating the kthread. The caller is responsible for ensuring the validity of @task when calling this function. Signed-off-by: Tejun Heo <tj@kernel.org>	2010-06-29 10:07:09 +02:00
Tejun Heo	b56c0d8937	kthread: implement kthread_worker Implement simple work processor for kthread. This is to ease using kthread. Single thread workqueue used to be used for things like this but workqueue won't guarantee fixed kthread association anymore to enable worker sharing. This can be used in cases where specific kthread association is necessary, for example, when it should have RT priority or be assigned to certain cgroup. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org>	2010-06-29 10:07:09 +02:00
Miao Xie	5ab116c934	cpuset: fix the problem that cpuset_mem_spread_node() returns an offline node cpuset_mem_spread_node() returns an offline node, and causes an oops. This patch fixes it by initializing task->mems_allowed to node_states[N_HIGH_MEMORY], and updating task->mems_allowed when doing memory hotplug. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Reported-by: Nick Piggin <npiggin@suse.de> Tested-by: Nick Piggin <npiggin@suse.de> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-24 16:31:21 -07:00
Anton Blanchard	301ba0457f	kthread, sched: Remove reference to kthread_create_on_cpu kthread_create_on_cpu doesn't exist so update a comment in kthread.c to reflect this. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100209040740.GB3702@kryten> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-02-09 11:47:39 +01:00
Peter Zijlstra	881232b70b	sched: Move kthread_bind() back to kthread.c Since kthread_bind() lost its dependencies on sched.c, move it back where it came from. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <20091216170518.039524041@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-16 19:01:57 +01:00
Mike Galbraith	b84ff7d6f1	sched: Fix kthread_bind() by moving the body of kthread_bind() to sched.c Eric Paris reported that commit `f685ceacab` causes boot time PREEMPT_DEBUG complaints. [ 4.590699] BUG: using smp_processor_id() in preemptible [00000000] code: rmmod/1314 [ 4.593043] caller is task_hot+0x86/0xd0 Since kthread_bind() messes with scheduler internals, move the body to sched.c, and lock the runqueue. Reported-by: Eric Paris <eparis@redhat.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Tested-by: Eric Paris <eparis@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1256813310.7574.3.camel@marge.simson.net> [ v2: fix !SMP build and clean up ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-03 07:25:00 +01:00
Mike Galbraith	61cbe54d94	sched: Keep kthreads at default priority Removes kthread/workqueue priority boost, they increase worst-case desktop latencies. Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-09 17:30:06 +02:00
Oleg Nesterov	9ae260270c	update the comment in kthread_stop() Commit `63706172f3` ("kthreads: rework kthread_stop()") removed the limitation that the thread function mysr not call do_exit() itself, but forgot to update the comment. Since that commit it is OK to use kthread_stop() even if kthread can exit itself. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-27 12:15:46 -07:00
Oleg Nesterov	63706172f3	kthreads: rework kthread_stop() Based on Eric's patch which in turn was based on my patch. kthread_stop() has the nasty problems: - it runs unpredictably long with the global semaphore held. - it deadlocks if kthread itself does kthread_stop() before it obeys the kthread_should_stop() request. - it is not useable if kthread exits on its own, see for example the ugly "wait_to_die:" hack in migration_thread() - it is not possible to just tell kthread it should stop, we must always wait for its exit. With this patch kthread() allocates all neccesary data (struct kthread) on its own stack, globals kthread_stop_xxx are deleted. ->vfork_done is used as a pointer into "struct kthread", this means kthread_stop() can easily wait for kthread's exit. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:54 -07:00
Oleg Nesterov	cdd140bdd6	kthreads: simplify the startup synchronization We use two completions two create the kernel thread, this is a bit ugly. kthread() wakes up create_kthread() via ->started, then create_kthread() wakes up the caller kthread_create() via ->done. But kthread() does not need to wait for kthread(), it can just return. Instead kthread() itself can wake up the caller of kthread_create(). Kill kthread_create_info->started, ->done is enough. This improves the scalability a bit and sijmplifies the code. The only problem if kernel_thread() fails, in that case create_kthread() must do complete(&create->done). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:53 -07:00
Miao Xie	58568d2a82	cpuset,mm: update tasks' mems_allowed in time Fix allocating page cache/slab object on the unallowed node when memory spread is set by updating tasks' mems_allowed after its cpuset's mems is changed. In order to update tasks' mems_allowed in time, we must modify the code of memory policy. Because the memory policy is applied in the process's context originally. After applying this patch, one task directly manipulates anothers mems_allowed, and we use alloc_lock in the task_struct to protect mems_allowed and memory policy of the task. But in the fast path, we didn't use lock to protect them, because adding a lock may lead to performance regression. But if we don't add a lock,the task might see no nodes when changing cpuset's mems_allowed to some non-overlapping set. In order to avoid it, we set all new allowed nodes, then clear newly disallowed ones. [lee.schermerhorn@hp.com: The rework of mpol_new() to extract the adjusting of the node mask to apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind() with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local allocation. Fix this by adding the check for MPOL_PREFERRED and empty node mask to mpol_new_mpolicy(). Remove the now unneeded 'nodes = NULL' from mpol_new(). Note that mpol_new_mempolicy() is always called with a non-NULL 'nodes' parameter now that it has been removed from mpol_new(). Therefore, we don't need to test nodes for NULL before testing it for 'empty'. However, just to be extra paranoid, add a VM_BUG_ON() to verify this assumption.] [lee.schermerhorn@hp.com: I don't think the function name 'mpol_new_mempolicy' is descriptive enough to differentiate it from mpol_new(). This function applies cpuset set context, usually constraining nodes to those allowed by the cpuset. However, when the 'RELATIVE_NODES flag is set, it also translates the nodes. So I settled on 'mpol_set_nodemask()', because the comment block for mpol_new() mentions that we need to call this function to "set nodes". Some additional minor line length, whitespace and typo cleanup.] Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Paul Menage <menage@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:31 -07:00
Steven Rostedt	ad8d75fff8	tracing/events: move trace point headers into include/trace/events Impact: clean up Create a sub directory in include/trace called events to keep the trace point headers in their own separate directory. Only headers that declare trace points should be defined in this directory. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-04-14 22:05:43 -04:00
Steven Rostedt	a8d154b009	tracing: create automated trace defines This patch lowers the number of places a developer must modify to add new tracepoints. The current method to add a new tracepoint into an existing system is to write the trace point macro in the trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or DECLARE_TRACE, then they must add the same named item into the C file with the macro DEFINE_TRACE(name) and then add the trace point. This change cuts out the needing to add the DEFINE_TRACE(name). Every file that uses the tracepoint must still include the trace/<type>.h file, but the one C file must also add a define before the including of that file. #define CREATE_TRACE_POINTS #include <trace/mytrace.h> This will cause the trace/mytrace.h file to also produce the C code necessary to implement the trace point. Note, if more than one trace/<type>.h is used to create the C code it is best to list them all together. #define CREATE_TRACE_POINTS #include <trace/foo.h> #include <trace/bar.h> #include <trace/fido.h> Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with the cleaner solution of the define above the includes over my first design to have the C code include a "special" header. This patch converts sched, irq and lockdep and skb to use this new method. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-04-14 12:57:28 -04:00

1 2

81 Commits