Commit Graph

48624 Commits

Author SHA1 Message Date
Ben Hutchings
7bbf3cae65 ipv4: Remove inet_lro library
There are no longer any in-tree drivers that use it.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 16:15:46 -05:00
Yuval Mintz
fc48b7a614 qed/qede: use 8.7.3.0 FW.
This patch moves the qed* driver into utilizing the 8.7.3.0 FW.
This new FW is required for a lot of new SW features, including:
  - Vlan filtering offload
  - Encapsulation offload support
  - HW ingress aggregations
As well as paving the way for the possibility of adding storage protocols
in the future.

V2:
 - Fix kbuild test robot error/warnings.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 16:04:42 -05:00
John Fastabend
1c78c64e9c net: add tc offload feature flag
Its useful to turn off the qdisc offload feature at a per device
level. This gives us a big hammer to enable/disable offloading.
More fine grained control (i.e. per rule) may be supported later.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:36 -05:00
John Fastabend
a1b7c5fd7f net: sched: add cls_u32 offload hooks for netdevs
This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.

This works aligns with how network drivers have been doing qdisc
offloads for mqprio.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:36 -05:00
John Fastabend
16e5cc6471 net: rework setup_tc ndo op to consume general tc operand
This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this we provide structured union
and type flag.

This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:35 -05:00
John Fastabend
e4c6734eaa net: rework ndo tc op to consume additional qdisc handle parameter
The ndo_setup_tc() op was added to support drivers offloading tx
qdiscs however only support for mqprio was ever added. So we
only ever added support for passing the number of traffic classes
to the driver.

This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate if the offload is for ingress or egress
or potentially even child qdiscs.

CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:35 -05:00
David S. Miller
6cd21d7941 Merge tag 'wireless-drivers-next-for-davem-2016-02-12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:

====================
Major changes:

wl12xx

* add device tree support for SPI

mwifiex

* add debugfs file to read chip information
* add MSIx support for newer pcie chipsets (8997 onwards)
* add schedule scan support
* add WoWLAN net-detect support
* firmware dump support for w8997 chipset

iwlwifi

* continue the work on multiple Rx queues
* add support for beacon storing used in low power states
* use the regular firmware image of WoWLAN
* fix 8000 devices for Big Endian machines
* more firmware debug hooks
* add support for P2P Client snoozing
* make the beacon filtering for AP mode configurable
* fix transmit queues overflow with LSO

libertas

* add support for setting power save via cfg80211
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:38:29 -05:00
Keller, Jacob E
d4ab428627 ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}
Ethernet drivers implementing both {GS}RXFH and {GS}CHANNELS ethtool ops
incorrectly allow SCHANNELS when it would conflict with the settings
from SRXFH. This occurs because it is not possible for drivers to
understand whether their Rx flow indirection table has been configured
or is in the default state. In addition, drivers currently behave in
various ways when increasing the number of Rx channels.

Some drivers will always destroy the Rx flow indirection table when this
occurs, whether it has been set by the user or not. Other drivers will
attempt to preserve the table even if the user has never modified it
from the default driver settings. Neither of these situation is
desirable because it leads to unexpected behavior or loss of user
configuration.

The correct behavior is to simply return -EINVAL when SCHANNELS would
conflict with the current Rx flow table settings. However, it should
only do so if the current settings were modified by the user. If we
required that the new settings never conflict with the current (default)
Rx flow settings, we would force users to first reduce their Rx flow
settings and then reduce the number of Rx channels.

This patch proposes a solution implemented in net/core/ethtool.c which
ensures that all drivers behave correctly. It checks whether the RXFH
table has been configured to non-default settings, and stores this
information in a private netdev flag. When the number of channels is
requested to change, it first ensures that the current Rx flow table is
not going to assign flows to now disabled channels.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 15:19:49 -05:00
Edward Cree
e8ae7b000e Documentation/networking: add checksum-offloads.txt to explain LCO
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-12 05:52:16 -05:00
Edward Cree
179bc67f69 net: local checksum offload for encapsulation
The arithmetic properties of the ones-complement checksum mean that a
 correctly checksummed inner packet, including its checksum, has a ones
 complement sum depending only on whatever value was used to initialise
 the checksum field before checksumming (in the case of TCP and UDP,
 this is the ones complement sum of the pseudo header, complemented).
Consequently, if we are going to offload the inner checksum with
 CHECKSUM_PARTIAL, we can compute the outer checksum based only on the
 packed data not covered by the inner checksum, and the initial value of
 the inner checksum field.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-12 05:52:15 -05:00
Jesper Dangaard Brouer
15fad714be net: bulk free SKBs that were delay free'ed due to IRQ context
The network stack defers SKBs free, in-case free happens in IRQ or
when IRQs are disabled. This happens in __dev_kfree_skb_irq() that
writes SKBs that were free'ed during IRQ to the softirq completion
queue (softnet_data.completion_queue).

These SKBs are naturally delayed, and cleaned up during NET_TX_SOFTIRQ
in function net_tx_action().  Take advantage of this a use the skb
defer and flush API, as we are already in softirq context.

For modern drivers this rarely happens. Although most drivers do call
dev_kfree_skb_any(), which detects the situation and calls
__dev_kfree_skb_irq() when needed.  This due to netpoll can call from
IRQ context.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 11:59:09 -05:00
Jesper Dangaard Brouer
795bb1c00d net: bulk free infrastructure for NAPI context, use napi_consume_skb
Discovered that network stack were hitting the kmem_cache/SLUB
slowpath when freeing SKBs.  Doing bulk free with kmem_cache_free_bulk
can speedup this slowpath.

NAPI context is a bit special, lets take advantage of that for bulk
free'ing SKBs.

In NAPI context we are running in softirq, which gives us certain
protection.  A softirq can run on several CPUs at once.  BUT the
important part is a softirq will never preempt another softirq running
on the same CPU.  This gives us the opportunity to access per-cpu
variables in softirq context.

Extend napi_alloc_cache (before only contained page_frag_cache) to be
a struct with a small array based stack for holding SKBs.  Introduce a
SKB defer and flush API for accessing this.

Introduce napi_consume_skb() as replacement for e.g. dev_consume_skb_any()
when running in NAPI context.  A small trick to handle/detect if we
are called from netpoll is to see if budget is 0.  In that case, we
need to invoke dev_consume_skb_irq().

Joint work with Alexander Duyck.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 11:59:09 -05:00
Nikolay Borisov
165094afce igmp: Namespacify igmp_qrv sysctl knob
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 09:59:22 -05:00
Nikolay Borisov
87a8a2ae65 igmp: Namespaceify igmp_llm_reports sysctl knob
This was initially introduced in df2cf4a78e ("IGMP: Inhibit
reports for local multicast groups") by defining the sysctl in the
ipv4_net_table array, however it was never implemented to be
namespace aware. Fix this by changing the code accordingly.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 09:59:22 -05:00
Nikolay Borisov
166b6b2d6f igmp: Namespaceify igmp_max_msf sysctl knob
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 09:59:22 -05:00
Nikolay Borisov
815c527007 igmp: Namespaceify igmp_max_memberships sysctl knob
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 09:59:22 -05:00
Alexander Duyck
08b64fcca9 net: Store checksum result for offloaded GSO checksums
This patch makes it so that we can offload the checksums for a packet up
to a certain point and then begin computing the checksums via software.
Setting this up is fairly straight forward as all we need to do is reset
the values stored in csum and csum_start for the GSO context block.

One complication for this is remote checksum offload.  In order to allow
the inner checksums to be offloaded while computing the outer checksum
manually we needed to have some way of indicating that the offload wasn't
real.  In order to do that I replaced CHECKSUM_PARTIAL with
CHECKSUM_UNNECESSARY in the case of us computing checksums for the outer
header while skipping computing checksums for the inner headers.  We clean
up the ip_summed flag and set it to either CHECKSUM_PARTIAL or
CHECKSUM_NONE once we hand the packet off to the next lower level.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 08:55:33 -05:00
Alexander Duyck
7644345622 net: Move GSO csum into SKB_GSO_CB
This patch moves the checksum maintained by GSO out of skb->csum and into
the GSO context block in order to allow for us to work on outer checksums
while maintaining the inner checksum offsets in the case of the inner
checksum being offloaded, while the outer checksums will be computed.

While updating the code I also did a minor cleanu-up on gso_make_checksum.
The change is mostly to make it so that we store the values and compute the
checksum instead of computing the checksum and then storing the values we
needed to update.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 08:55:33 -05:00
Johannes Berg
7a02bf892d ipv6: add option to drop unsolicited neighbor advertisements
In certain 802.11 wireless deployments, there will be NA proxies
that use knowledge of the network to correctly answer requests.
To prevent unsolicitd advertisements on the shared medium from
being a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_unsolicited_na".

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 04:27:36 -05:00
Johannes Berg
abbc30436d ipv6: add option to drop unicast encapsulated in L2 multicast
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv6 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.

Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 04:27:36 -05:00
Craig Gallek
d9b3fca273 tcp: __tcp_hdrlen() helper
tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr.
This splits the size calculation into a helper function that can be
used if a struct tcphdr is already available.

Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 03:54:14 -05:00
Yuchung Cheng
ddf1af6fa0 tcp: new delivery accounting
This patch changes the accounting of how many packets are
newly acked or sacked when the sender receives an ACK.

The current approach basically computes

   newly_acked_sacked = (prior_packets - prior_sacked) -
                        (tp->packets_out - tp->sacked_out)

   where prior_packets and prior_sacked out are snapshot
   at the beginning of the ACK processing.

The new approach tracks the delivery information via a new
TCP state variable "delivered" which monotically increases
as new packets are delivered in order or out-of-order.

The reason for this change is that the current approach is
brittle that produces negative or inaccurate estimate.

   1) For non-SACK connections, an ACK that advances the SND.UNA
   could reset the DUPACK counters (tp->sacked_out) in
   tcp_process_loss() or tcp_fastretrans_alert(). This inflates
   the inflight suddenly and causes under-estimate or even
   negative estimate. Here is a real example:

                   before   after (processing ACK)
   packets_out     75       73
   sacked_out      23        0
   ca state        Loss     Open

   The old approach computes (75-23) - (73 - 0) = -21 delivered
   while the new approach computes 1 delivered since it
   considers the 2nd-24th packets are delivered OOO.

   2) MSS change would re-count packets_out and sacked_out so
   the estimate is in-accurate and can even become negative.
   E.g., the inflight is doubled when MSS is halved.

   3) Spurious retransmission signaled by DSACK is not accounted

The new approach is simpler and more robust. For SACK connections,
tp->delivered increments as packets are being acked or sacked in
SACK and ACK processing.

For non-sack connections, it's done in tcp_remove_reno_sacks() and
tcp_add_reno_sack(). When an ACK advances the SND.UNA, tp->delivered
is incremented by the number of packets ACKed (less the current
number of DUPACKs received plus one packet hole).  Upon receiving
a DUPACK, tp->delivered is incremented assuming one out-of-order
packet is delivered.

Upon receiving a DSACK, tp->delivered is incremtened assuming one
retransmission is delivered in tcp_sacktag_write_queue().

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07 14:09:51 -05:00
Rafał Miłecki
61dba73cdb bcma: add support for BCM47094
It's another SoC with 32 GPIOs and simplified watchdog handling. It was
tested on D-Link DIR-885L.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-02-06 13:46:59 +02:00
Rafał Miłecki
b3c47afbf5 bcma: support PMU present as separated bus core
On recent Broadcom chipsets PMU is present as separated core and it
can't be accessed using ChipCommon anymore as it fails with e.g.:
[    0.000577] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf1000604

Solve it by using a new (PMU) core pointer set to ChipCommon or PMU
depending on the hardware capabilities.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-02-06 13:36:11 +02:00
Rafał Miłecki
67edf354fa bcma: use _PMU_ in all names of PMU registers
PMU (Power Management Unit) seems to be a separated piece of hardware,
just accessed using ChipCommon core registers. In recent Broadcom
chipsets PMU is not bounded to CC but available as separated core.

To make code cleaner & easier to review (for a correct R/W access) use
clearer names.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-02-06 13:36:11 +02:00
Rafał Miłecki
7267bcda33 bcma: identify bus cores (devices) found on BCM47189
Add missing defines and print proper names.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-02-06 13:34:50 +02:00
Alexei Starovoitov
15a07b3381 bpf: add lookup/update support for per-cpu hash and array maps
The functions bpf_map_lookup_elem(map, key, value) and
bpf_map_update_elem(map, key, value, flags) need to get/set
values from all-cpus for per-cpu hash and array maps,
so that user space can aggregate/update them as necessary.

Example of single counter aggregation in user space:
  unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
  long values[nr_cpus];
  long value = 0;

  bpf_lookup_elem(fd, key, values);
  for (i = 0; i < nr_cpus; i++)
    value += values[i];

The user space must provide round_up(value_size, 8) * nr_cpus
array to get/set values, since kernel will use 'long' copy
of per-cpu values to try to copy good counters atomically.
It's a best-effort, since bpf programs and user space are racing
to access the same memory.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06 03:34:36 -05:00
Alexei Starovoitov
a10423b87a bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map
Primary use case is a histogram array of latency
where bpf program computes the latency of block requests or other
events and stores histogram of latency into array of 64 elements.
All cpus are constantly running, so normal increment is not accurate,
bpf_xadd causes cache ping-pong and this per-cpu approach allows
fastest collision-free counters.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06 03:34:36 -05:00
Kim Jones
ba905f5e2f ethtool: Declare netdev_rss_key as __read_mostly.
netdev_rss_key is written to once and thereafter is read by
drivers when they are initialising. The fact that it is mostly
read and not written to makes it a candidate for a __read_mostly
declaration.

Signed-off-by: Kim Jones <kim-marie.jones@intel.com>
Signed-off-by: Alan Carey <alan.carey@intel.com>
Acked-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06 03:13:49 -05:00
Jarod Wilson
bb63daf9ef team: track sum of rx_nohandler for all slaves
CC: Jiri Pirko <jiri@resnulli.us>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06 02:59:51 -05:00
Jarod Wilson
6e7333d315 net: add rx_nohandler stat counter
This adds an rx_nohandler stat counter, along with a sysfs statistics
node, and copies the counter out via netlink as well.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Tom Herbert <tom@herbertland.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06 02:59:51 -05:00
Linus Torvalds
34229b2774 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "This looks like a lot but it's a mixture of regression fixes as well
  as fixes for longer standing issues.

   1) Fix on-channel cancellation in mac80211, from Johannes Berg.

   2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
      module, from Eric Dumazet.

   3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
      Dumazet.

   4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
      bound, from Craig Gallek.

   5) GRO key comparisons don't take lightweight tunnels into account,
      from Jesse Gross.

   6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
      Dumazet.

   7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
      register them, otherwise the NEWLINK netlink message is missing
      the proper attributes.  From Thadeu Lima de Souza Cascardo.

   8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
      Schimmel

   9) Handle fragments properly in ipv4 easly socket demux, from Eric
      Dumazet.

  10) Don't ignore the ifindex key specifier on ipv6 output route
      lookups, from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
  tcp: avoid cwnd undo after receiving ECN
  irda: fix a potential use-after-free in ircomm_param_request
  net: tg3: avoid uninitialized variable warning
  net: nb8800: avoid uninitialized variable warning
  net: vxge: avoid unused function warnings
  net: bgmac: clarify CONFIG_BCMA dependency
  net: hp100: remove unnecessary #ifdefs
  net: davinci_cpdma: use dma_addr_t for DMA address
  ipv6/udp: use sticky pktinfo egress ifindex on connect()
  ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
  netlink: not trim skb for mmaped socket when dump
  vxlan: fix a out of bounds access in __vxlan_find_mac
  net: dsa: mv88e6xxx: fix port VLAN maps
  fib_trie: Fix shift by 32 in fib_table_lookup
  net: moxart: use correct accessors for DMA memory
  ipv4: ipconfig: avoid unused ic_proto_used symbol
  bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
  bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
  bnxt_en: Ring free response from close path should use completion ring
  net_sched: drr: check for NULL pointer in drr_dequeue
  ...
2016-02-01 15:56:08 -08:00
Linus Torvalds
29a8ea4fbe Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "1/ Fixes to the libnvdimm 'pfn' device that establishes a reserved
     area for storing a struct page array.

  2/ Fixes for dax operations on a raw block device to prevent pagecache
     collisions with dax mappings.

  3/ A fix for pfn_t usage in vm_insert_mixed that lead to a null
     pointer de-reference.

  These have received build success notification from the kbuild robot
  across 153 configs and pass the latest ndctl tests"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  phys_to_pfn_t: use phys_addr_t
  mm: fix pfn_t to page conversion in vm_insert_mixed
  block: use DAX for partition table reads
  block: revert runtime dax control of the raw block device
  fs, block: force direct-I/O for dax-enabled block devices
  devm_memremap_pages: fix vmem_altmap lifetime + alignment handling
  libnvdimm, pfn: fix restoring memmap location
  libnvdimm: fix mode determination for e820 devices
2016-02-01 15:21:20 -08:00
Linus Torvalds
54e3f3e302 Merge tag 'tty-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
 "Here are some small tty/serial driver fixes for 4.5-rc2.

  They resolve a number of reported problems (the ioctl one specifically
  has been pointed out by numerous people) and one patch adds some new
  device ids for the 8250_pci driver.  All have been in linux-next
  successfully"

* tag 'tty-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250_pci: Add Intel Broadwell ports
  staging/speakup: Use tty_ldisc_ref() for paste kworker
  n_tty: Fix unsafe reference to "other" ldisc
  tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
  tty: Retry failed reopen if tty teardown in-progress
  tty: Wait interruptibly for tty lock on reopen
2016-01-31 17:09:39 -08:00
Linus Torvalds
dc799d0179 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "The timer departement delivers:

   - a regression fix for the NTP code along with a proper selftest
   - prevent a spurious timer interrupt in the NOHZ lowres code
   - a fix for user space interfaces returning the remaining time on
     architectures with CONFIG_TIME_LOW_RES=y
   - a few patches to fix COMPILE_TEST fallout"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/nohz: Set the correct expiry when switching to nohz/lowres mode
  clocksource: Fix dependencies for archs w/o HAS_IOMEM
  clocksource: Select CLKSRC_MMIO where needed
  tick/sched: Hide unused oneshot timer code
  kselftests: timers: Add adjtimex SETOFFSET validity tests
  ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO
  itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
  posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
  timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
  hrtimer: Handle remaining time proper for TIME_LOW_RES
  clockevents/tcb_clksrc: Prevent disabling an already disabled clock
2016-01-31 15:49:06 -08:00
Linus Torvalds
29d14f0835 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "This is much bigger than typical fixes, but Peter found a category of
  races that spurred more fixes and more debugging enhancements.  Work
  started before the merge window, but got finished only now.

  Aside of that this contains the usual small fixes to perf and tools.
  Nothing particular exciting"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  perf: Remove/simplify lockdep annotation
  perf: Synchronously clean up child events
  perf: Untangle 'owner' confusion
  perf: Add flags argument to perf_remove_from_context()
  perf: Clean up sync_child_event()
  perf: Robustify event->owner usage and SMP ordering
  perf: Fix STATE_EXIT usage
  perf: Update locking order
  perf: Remove __free_event()
  perf/bpf: Convert perf_event_array to use struct file
  perf: Fix NULL deref
  perf/x86: De-obfuscate code
  perf/x86: Fix uninitialized value usage
  perf: Fix race in perf_event_exit_task_context()
  perf: Fix orphan hole
  perf stat: Do not clean event's private stats
  perf hists: Fix HISTC_MEM_DCACHELINE width setting
  perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
  perf tests: Remove wrong semicolon in while loop in CQM test
  perf: Synchronously free aux pages in case of allocation failure
  ...
2016-01-31 15:38:27 -08:00
Linus Torvalds
30e4c9ad04 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull IRQ fixes from Ingo Molnar:
 "Mostly irqchip driver fixes, but also an irq core crash fix and a
  build fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mxs: Add missing set_handle_irq()
  irqchip/atmel-aic: Fix wrong bit operation for IRQ priority
  irqchip/gic-v3-its: Recompute the number of pages on page size change
  base: Export platform_msi_domain_[alloc,free]_irqs
  of: MSI: Simplify irqdomain lookup
  irqdomain: Allow domain lookup with DOMAIN_BUS_WIRED token
  irqchip: Fix dependencies for archs w/o HAS_IOMEM
  irqchip/s3c24xx: Mark init_eint as __maybe_unused
  genirq: Validate action before dereferencing it in handle_irq_event_percpu()
2016-01-31 14:48:58 -08:00
Dan Williams
76e9f0ee52 phys_to_pfn_t: use phys_addr_t
A dma_addr_t is potentially smaller than a phys_addr_t on some archs.
Don't truncate the address when doing the pfn conversion.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Matthew Wilcox <willy@linux.intel.com>
[willy: fix pfn_t_to_phys as well]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-31 09:10:19 -08:00
Dan Williams
d1a5f2b4d8 block: use DAX for partition table reads
Avoid populating pagecache when the block device is in DAX mode.
Otherwise these page cache entries collide with the fsync/msync
implementation and break data durability guarantees.

Cc: Jan Kara <jack@suse.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-30 13:35:32 -08:00
Dan Williams
9f4736fe7c block: revert runtime dax control of the raw block device
Dynamically enabling DAX requires that the page cache first be flushed
and invalidated.  This must occur atomically with the change of DAX mode
otherwise we confuse the fsync/msync tracking and violate data
durability guarantees.  Eliminate the possibilty of DAX-disabled to
DAX-enabled transitions for now and revisit this for the next cycle.

Cc: Jan Kara <jack@suse.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-30 13:35:31 -08:00
Dan Williams
65f87ee718 fs, block: force direct-I/O for dax-enabled block devices
Similar to the file I/O path, re-direct all I/O to the DAX path for I/O
to a block-device special file.  Both regular files and device special
files can use the common filp->f_mapping->host lookup to determing is
DAX is enabled.

Otherwise, we confuse the DAX code that does not expect to find live
data in the page cache:

    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
    __delete_from_page_cache+0x9f6/0xb60()
    Modules linked in:
    CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
     00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
     ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
     ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
    Call Trace:
     [<     inline     >] __dump_stack lib/dump_stack.c:15
     [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
     [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
     [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
     [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
     [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
     [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
     [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
     [<     inline     >] wp_pfn_shared mm/memory.c:2208
     [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
     [<     inline     >] handle_pte_fault mm/memory.c:3323
     [<     inline     >] __handle_mm_fault mm/memory.c:3417
     [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
     [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
     [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
     [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
     [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
     [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
    arch/x86/entry/entry_64.S:185
    ---[ end trace dae21e0f85f1f98c ]---

Fixes: 5a023cdba5 ("block: enable dax for raw block devices")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-30 13:33:59 -08:00
Linus Torvalds
ad233aca21 Merge branch 'stable/for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb patchlet from Konrad Rzeszutek Wilk:
 "One trivial patch.

  Another patch (from Fengguang) is already in your tree courtesy of
  Andrew Morton - but I would prefer not to rebase my tree.  Hence the
  diff is very small"

* 'stable/for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: Make linux/swiotlb.h standalone includible
  MAINTAINERS: add git URL for swiotlb
2016-01-29 15:19:42 -08:00
Linus Torvalds
2973737012 Merge branch 'stable/for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm
Pull cleancache cleanups from Konrad Rzeszutek Wilk:
 "Simple cleanups"

* 'stable/for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
  include/linux/cleancache.h: Clean up code
  cleancache: constify cleancache_ops structure
2016-01-29 15:13:48 -08:00
Linus Torvalds
f6a239a927 Merge tag 'iommu-fixes-v4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
 "Five patches queued up:

   - Two patches for the AMD and Intel IOMMU drivers to fix alias
     handling and ATS handling.

   - Fix build error with arm io-pgtable code

   - Two documentation fixes"

* tag 'iommu-fixes-v4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Update struct iommu_ops comments
  iommu/vt-d: Fix link to Intel IOMMU Specification
  iommu/amd: Correct the wrong setting of alias DTE in do_attach
  iommu/vt-d: Don't skip PCI devices when disabling IOTLB
  iommu/io-pgtable-arm: Fix io-pgtable-arm build failure
2016-01-29 15:05:49 -08:00
Magnus Damm
0d9bacb6c8 iommu: Update struct iommu_ops comments
Update the comments around struct iommu_ops to match
current state and fix a few typos while at it.

Signed-off-by: Magnus Damm <damm+renesas@opensource.se>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-01-29 12:34:29 +01:00
Peter Zijlstra
c6e5b73242 perf: Synchronously clean up child events
The orphan cleanup workqueue doesn't always catch orphans, for example,
if they never schedule after they are orphaned. IOW, the event leak is
still very real. It also wouldn't work for kernel counters.

Doing it synchonously is a little hairy due to lock inversion issues,
but is made to work.

Patch based on work by Alexander Shishkin.

Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:35 +01:00
Alexei Starovoitov
e03e7ee34f perf/bpf: Convert perf_event_array to use struct file
Robustify refcounting.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160126045947.GA40151@ast-mbp.thefacebook.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 08:35:25 +01:00
Linus Torvalds
26cd83670f Merge tag 'trace-v4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull minor tracing fixes from Steven Rostedt:
 "This includes three minor fixes, mostly due to cut-and-paste issues.

  The first is a cut and paste issue that changed the amount of stack to
  skip when tracing a stack dump from 0 to 6, which basically made the
  stack disappear for small stack traces.

  The second fix is just removing an unused field in a struct that is no
  longer used, and currently just wastes space.

  The third is another cut-and-paste fix that had a tracepoint recording
  the wrong field (it was recording the previous field a second time)"

* tag 'trace-v4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/dma-buf/fence: Fix timeline str value on fence_annotate_wait_on
  ftrace: Remove unused nr_trampolines var
  tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()
2016-01-28 17:00:50 -08:00
Chen Gang
a39bb9a055 include/linux/cleancache.h: Clean up code
Let cleancache_fs_enabled() call cleancache_fs_enabled_mapping()
directly.

Remove redundant variable ret in cleancache_get_page().

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-27 09:10:29 -05:00
Julia Lawall
b3c6de492b cleancache: constify cleancache_ops structure
The cleancache_ops structure is never modified, so declare it as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-27 09:09:57 -05:00