Commit Graph

49637 Commits

Author SHA1 Message Date
Maor Gottlieb
d63cd28608 net/mlx5: Add user chosen levels when allocating flow tables
Currently, consumers of the flow steering infrastructure can't
choose their own flow table levels and are limited to one
flow table per level. This just waste levels.
Instead, we introduce here the possibility to use multiple
flow tables in a level. The user is free to connect these
flow tables, while following the rule (FTEs in FT of level x
could only point to FTs of level y where y > x).

In addition this patch switch the order of the create/destroy
flow tables of the NIC(vlan and main).

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:09 -04:00
Maor Gottlieb
d745098ced net/mlx5: Introduce modify flow rule destination
This API is used for modifying the flow rule destination.
This is needed for modifying the pointed flow table by the
traffic type classifier rules to point on the aRFS tables.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:08 -04:00
Nikolay Aleksandrov
f4b05d27ec net: constify is_skb_forwardable's arguments
is_skb_forwardable is not supposed to change anything so constify its
arguments

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:13:36 -04:00
Soheil Hassas Yeganeh
0a2cf20c3f tcp: remove SKBTX_ACK_TSTAMP since it is redundant
The SKBTX_ACK_TSTAMP flag is set in skb_shinfo->tx_flags when
the timestamp of the TCP acknowledgement should be reported on
error queue. Since accessing skb_shinfo is likely to incur a
cache-line miss at the time of receiving the ack, the
txstamp_ack bit was added in tcp_skb_cb, which is set iff
the SKBTX_ACK_TSTAMP flag is set for an skb. This makes
SKBTX_ACK_TSTAMP flag redundant.

Remove the SKBTX_ACK_TSTAMP and instead use the txstamp_ack bit
everywhere.

Note that this frees one bit in shinfo->tx_flags.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:06:10 -04:00
Eric Dumazet
501e7ef569 net-rfs: fix false sharing accessing sd->input_queue_head
sd->input_queue_head is incremented for each processed packet
in process_backlog(), and read from other cpus performing
Out Of Order avoidance in get_rps_cpu()

Moving this field in a separate cache line keeps it mostly
hot for the cpu in process_backlog(), as other cpus will
only read it.

In a stress test, process_backlog() was consuming 6.80 % of cpu cycles,
and the patch reduced the cost to 0.65 %

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 21:55:45 -04:00
David S. Miller
c0cc53162a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor overlapping changes in the conflicts.

In the macsec case, the change of the default ID macro
name overlapped with the 64-bit netlink attribute alignment
fixes in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 15:43:10 -04:00
Linus Torvalds
f28f20da70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Handle v4/v6 mixed sockets properly in soreuseport, from Craig
    Gallak.

 2) Bug fixes for the new macsec facility (missing kmalloc NULL checks,
    missing locking around netdev list traversal, etc.) from Sabrina
    Dubroca.

 3) Fix handling of host routes on ifdown in ipv6, from David Ahern.

 4) Fix double-fdput in bpf verifier.  From Jann Horn.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
  bpf: fix double-fdput in replace_map_fd_with_map_ptr()
  net: ipv6: Delete host routes on an ifdown
  Revert "ipv6: Revert optional address flusing on ifdown."
  net/mlx4_en: fix spurious timestamping callbacks
  net: dummy: remove note about being Y by default
  cxgbi: fix uninitialized flowi6
  ipv6: Revert optional address flusing on ifdown.
  ipv4/fib: don't warn when primary address is missing if in_dev is dead
  net/mlx5: Add pci shutdown callback
  net/mlx5_core: Remove static from local variable
  net/mlx5e: Use vport MTU rather than physical port MTU
  net/mlx5e: Fix minimum MTU
  net/mlx5e: Device's mtu field is u16 and not int
  net/mlx5_core: Add ConnectX-5 to list of supported devices
  net/mlx5e: Fix MLX5E_100BASE_T define
  net/mlx5_core: Fix soft lockup in steering error flow
  qlcnic: Update version to 5.3.64
  net: stmmac: socfpga: Remove re-registration of reset controller
  macsec: fix netlink attribute validation
  macsec: add missing macsec prefix in uapi
  ...
2016-04-26 16:25:51 -07:00
Linus Torvalds
8ead9dd547 devpts: more pty driver interface cleanups
This is more prep-work for the upcoming pty changes.  Still just code
cleanup with no actual semantic changes.

This removes a bunch pointless complexity by just having the slave pty
side remember the dentry associated with the devpts slave rather than
the inode.  That allows us to remove all the "look up the dentry" code
for when we want to remove it again.

Together with moving the tty pointer from "inode->i_private" to
"dentry->d_fsdata" and getting rid of pointless inode locking, this
removes about 30 lines of code.  Not only is the end result smaller,
it's simpler and easier to understand.

The old code, for example, depended on the d_find_alias() to not just
find the dentry, but also to check that it is still hashed, which in
turn validated the tty pointer in the inode.

That is a _very_ roundabout way to say "invalidate the cached tty
pointer when the dentry is removed".

The new code just does

	dentry->d_fsdata = NULL;

in devpts_pty_kill() instead, invalidating the tty pointer rather more
directly and obviously.  Don't do something complex and subtle when the
obvious straightforward approach will do.

The rest of the patch (ie apart from code deletion and the above tty
pointer clearing) is just switching the calling convention to pass the
dentry or file pointer around instead of the inode.

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jann Horn <jann@thejh.net>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-26 15:47:32 -07:00
Saeed Mahameed
1b223dd391 net/mlx5e: Fix checksum handling for non-stripped vlan packets
Now as rx-vlan offload can be disabled, packets can be received
with vlan tag not stripped, which means is_first_ethertype_ip will
return false, for that we need to check if the hardware reported
csum OK so we will report CHECKSUM_UNNECESSARY for those packets.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:03 -04:00
Gal Pressman
363501145e net/mlx5e: Add ethtool support for rxvlan-offload (vlan stripping)
Use ethtool -K <interface> rxvlan <on/off> to enable/disable
C-TAG vlan stripping by hardware.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:02 -04:00
Gal Pressman
bb64143eee net/mlx5e: Add ethtool support for dump module EEPROM
Add query MCIA, PMLP registers infrastructure and commands.
Add ethtool support for get_module_info() and get_module_eeprom()
callbacks.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:02 -04:00
Gal Pressman
da54d24ec3 net/mlx5e: Add ethtool support for interface identify (LED blinking)
Add the needed hardware command and mlx5_ifc structs for managing LED
control.
Add set_phys_id ethtool callback to support ethtool -p flag.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:02 -04:00
Eran Ben Elisha
94cb1ebbaf net/mlx5e: Add support for RXALL netdev feature
Introduce new access register named Ports Check Mask Register (PCMR) to
control all HW checks on port. With this register, the driver can
enable/disable Hardware FCS validation.

When RXALL is enabled/disabled using ndo_set_features, enable/disable
fcs check at HW.
User can change HW configuration using rx-all flag at ethtool.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:02 -04:00
Gal Pressman
121fcdc84d net/mlx5e: Add link down events counter
Expose link_down_events counter through ethtool -S.
This counter is read from PPort statistics, then proccessed and stored as
a special handling software counter.
This counter is stored along software counters since it is the only PPort
counter that it's size is not 64 bits.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:01 -04:00
Gal Pressman
9218b44dcc net/mlx5e: Statistics handling refactoring
Redesign ethtool statistics handling and reporting in the driver:
1. Move counters to a separate file (en_stats.h).
2. Remove unnecessary dependencies between stats and strings.
3. Use counter descriptors which hold a name and offset for each counter,
   and will be used to decide which counters will be exposed.

For example when adding a new software counter to ethtool, instead of:
1. Add to stats struct.
2. Add to strings struct in the same order.
3. Change macro defining number of software counters.
The only thing needed is to link the new counter to a counter descriptor.

VPort counters are a set of hardware traffic counters created automatically
for each virtual port opened.
PPort counters are a set of counters describing per physical port
performance statistics.
These counters are gathered from hardware register and divided to groups
according to different protocols.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:01 -04:00
Florian Westphal
f0cdf76c10 net: remove NETDEV_TX_LOCKED support
No more users in the tree, remove NETDEV_TX_LOCKED support.
Adds another hole in softnet_stats struct, but better than keeping
the unused collision counter around.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:53:05 -04:00
David S. Miller
c0b0479307 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2016-04-26

Here's another set of Bluetooth & 802.15.4 patches for the 4.7 kernel:

 - Cleanups & refactoring of ieee802154 & 6lowpan code
 - Security related additions to ieee802154 and mrf24j40 driver
 - Memory corruption fix to Bluetooth 6lowpan code
 - Race condition fix in vhci driver
 - Enhancements to the atusb 802.15.4 driver

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 13:15:56 -04:00
David S. Miller
6a923934c3 Revert "ipv6: Revert optional address flusing on ifdown."
This reverts commit 841645b5f2.

Ok, this puts the feature back.  I've decided to apply David A.'s
bug fix and run with that rather than make everyone wait another
whole release for this feature.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 11:47:41 -04:00
Sowmini Varadhan
6fa01ccd88 skbuff: Add pskb_extract() helper function
A pattern of skb usage seen in modules such as RDS-TCP is to
extract `to_copy' bytes from the received TCP segment, starting
at some offset `off' into a new skb `clone'. This is done in
the ->data_ready callback, where the clone skb is queued up for rx on
the PF_RDS socket, while the parent TCP segment is returned unchanged
back to the TCP engine.

The existing code uses the sequence
	clone = skb_clone(..);
	pskb_pull(clone, off, ..);
	pskb_trim(clone, to_copy, ..);
with the intention of discarding the first `off' bytes. However,
skb_clone() + pskb_pull() implies pksb_expand_head(), which ends
up doing a redundant memcpy of bytes that will then get discarded
in __pskb_pull_tail().

To avoid this inefficiency, this commit adds pskb_extract() that
creates the clone, and memcpy's only the relevant header/frag/frag_list
to the start of `clone'. pskb_trim() is then invoked to trim clone
down to the requested to_copy bytes.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25 16:54:14 -04:00
Yuval Mintz
fe7cd2bfda qed*: Conditions for changing link
There's some inconsistency in current logic determining whether the
link settings of a given interface can be changed; I.e., in all modes
other than the so-called `deault' mode the interfaces are forbidden from
changing the configuration - but even this rule is not applied to all
user APIs that may change the configuration.

Instead, let the core-module [qed] decide whether an interface can change
the configuration by supporting a new API function. We also revise the
current rule, allowing all interfaces to change their configurations while
laying the infrastructure for future modes where an interface would be
blocked from making such a configuration.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25 15:59:16 -04:00
Yuval Mintz
d4967cf38f qed*: Align statistics names
There's a difference in statsitics' names starting at qed and
propagating to qede, where egress counters indicate ranges while ingress
counters indiciate high-end.
Align all statistcs to follow the same conventions - name indicates range.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25 15:59:15 -04:00
David S. Miller
841645b5f2 ipv6: Revert optional address flusing on ifdown.
This reverts the following three commits:

70af921db6
799977d9aa
f1705ec197

The feature was ill conceived, has terrible semantics, and has added
nothing but regressions to the already fragile ipv6 stack.

Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25 15:33:55 -04:00
Nicolas Dichtel
a558da0916 ieee802154: use nla_put_u64_64bit()
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-25 15:09:11 -04:00
Majd Dibbiny
5fc7197d3a net/mlx5: Add pci shutdown callback
This patch introduces kexec support for mlx5.
When switching kernels, kexec() calls shutdown, which unloads
the driver and cleans its resources.

In addition, remove unregister netdev from shutdown flow. This will
allow a clean shutdown, even if some netdev clients did not release their
reference from this netdev. Releasing The HW resources only is enough as
the kernel is shutting down

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:39 -04:00
Saeed Mahameed
cd255efff9 net/mlx5e: Use vport MTU rather than physical port MTU
Set and report vport MTU rather than physical MTU,
Driver will set both vport and physical port mtu and will
rely on the query of vport mtu.

SRIOV VFs have to report their MTU to their vport manager (PF),
and this will allow them to work with any MTU they need
without failing the request.

Also for some cases where the PF is not a port owner, PF can
work with MTU less than the physical port mtu if set physical
port mtu didn't take effect.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:39 -04:00
Saeed Mahameed
046339eaab net/mlx5e: Device's mtu field is u16 and not int
For set/query MTU port firmware commands the MTU field
is 16 bits, here I changed all the "int mtu" parameters
of the functions wrapping those firmware commands to be u16.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:38 -04:00
David S. Miller
11afbff861 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, mostly from Florian Westphal to sort out the lack of sufficient
validation in x_tables and connlabel preparation patches to add
nf_tables support. They are:

1) Ensure we don't go over the ruleset blob boundaries in
   mark_source_chains().

2) Validate that target jumps land on an existing xt_entry. This extra
   sanitization comes with a performance penalty when loading the ruleset.

3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables.

4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables.

5) Make sure the minimal possible target size in x_tables.

6) Similar to #3, add xt_compat_check_entry_offsets() for compat code.

7) Check that standard target size is valid.

8) More sanitization to ensure that the target_offset field is correct.

9) Add xt_check_entry_match() to validate that matches are well-formed.

10-12) Three patch to reduce the number of parameters in
    translate_compat_table() for {arp,ip,ip6}tables by using a container
    structure.

13) No need to return value from xt_compat_match_from_user(), so make
    it void.

14) Consolidate translate_table() so it can be used by compat code too.

15) Remove obsolete check for compat code, so we keep consistent with
    what was already removed in the native layout code (back in 2007).

16) Get rid of target jump validation from mark_source_chains(),
    obsoleted by #2.

17) Introduce xt_copy_counters_from_user() to consolidate counter
    copying, and use it from {arp,ip,ip6}tables.

18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump
    functions.

19) Move nf_connlabel_match() to xt_connlabel.

20) Skip event notification if connlabel did not change.

21) Update of nf_connlabels_get() to make the upcoming nft connlabel
    support easier.

23) Remove spinlock to read protocol state field in conntrack.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 00:12:08 -04:00
Linus Torvalds
913f201083 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal fixes from Eduardo Valentin:
 "Specifics in this pull request:

   - Fixes in mediatek and OF thermal drivers

   - Fixes in power_allocator governor

   - More fixes of unsigned to int type change in thermal_core.c.

  These change have been CI tested using KernelCI bot. \o/"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  thermal: fix Mediatek thermal controller build
  thermal: consistently use int for trip temp
  thermal: fix mtk_thermal build dependency
  thermal: minor mtk_thermal.c cleanups
  thermal: power_allocator: req_range multiplication should be a 64 bit type
  thermal: of: add __init attribute
2016-04-23 17:15:39 -07:00
Nicolas Dichtel
e9bbe898cb libnl: nla_put_net64(): align on a 64-bit area
nla_data() is now aligned on a 64-bit area.

The temporary function nla_put_be64_32bit() is removed in this patch.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23 20:13:24 -04:00
David S. Miller
1602f49b58 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts were two cases of simple overlapping changes,
nothing serious.

In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23 18:51:33 -04:00
Linus Torvalds
c5edde3a81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix memory leak in iwlwifi, from Matti Gottlieb.

 2) Add missing registration of netfilter arp_tables into initial
    namespace, from Florian Westphal.

 3) Fix potential NULL deref in DecNET routing code.

 4) Restrict NETLINK_URELEASE to truly bound sockets only, from Dmitry
    Ivanov.

 5) Fix dst ref counting in VRF, from David Ahern.

 6) Fix TSO segmenting limits in i40e driver, from Alexander Duyck.

 7) Fix heap leak in PACKET_DIAG_MCLIST, from Mathias Krause.

 8) Ravalidate IPV6 datagram socket cached routes properly, particularly
    with UDP, from Martin KaFai Lau.

 9) Fix endian bug in RDS dp_ack_seq handling, from Qing Huang.

10) Fix stats typing in bcmgenet driver, from Eric Dumazet.

11) Openvswitch needs to orphan SKBs before ipv6 fragmentation handing,
    from Joe Stringer.

12) SPI device reference leak in spi_ks8895 PHY driver, from Mark Brown.

13) atl2 doesn't actually support scatter-gather, so don't advertise the
    feature.  From Ben Hucthings.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
  openvswitch: use flow protocol when recalculating ipv6 checksums
  Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets
  atl2: Disable unimplemented scatter/gather feature
  net/mlx4_en: Split SW RX dropped counter per RX ring
  net/mlx4_core: Don't allow to VF change global pause settings
  net/mlx4_core: Avoid repeated calls to pci enable/disable
  net/mlx4_core: Implement pci_resume callback
  net: phy: spi_ks8895: Don't leak references to SPI devices
  net: ethernet: davinci_emac: Fix platform_data overwrite
  net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable
  qede: Fix single MTU sized packet from firmware GRO flow
  qede: Fix setting Skb network header
  qede: Fix various memory allocation error flows for fastpath
  tcp: Merge tx_flags and tskey in tcp_shifted_skb
  tcp: Merge tx_flags and tskey in tcp_collapse_retrans
  drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open
  tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks
  openvswitch: Orphan skbs before IPv6 defrag
  Revert "Prevent NUll pointer dereference with two PHYs on cpsw"
  VSOCK: Only check error on skb_recv_datagram when skb is NULL
  ...
2016-04-21 12:57:34 -07:00
Hannes Frederic Sowa
681e683ff3 geneve: break dependency with netdev drivers
Equivalent to "vxlan: break dependency with netdev drivers", don't
autoload geneve module in case the driver is loaded. Instead make the
coupling weaker by using netdevice notifiers as proxy.

Cc: Jesse Gross <jesse@kernel.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:35:44 -04:00
Hannes Frederic Sowa
b7aade1548 vxlan: break dependency with netdev drivers
Currently all drivers depend and autoload the vxlan module because how
vxlan_get_rx_port is linked into them. Remove this dependency:

By using a new event type in the netdevice notifier call chain we proxy
the request from the drivers to flush and resetup the vxlan ports not
directly via function call but by the already existing netdevice
notifier call chain.

I added a separate new event type, NETDEV_OFFLOAD_PUSH_VXLAN, to do so.
We don't need to save those ids, as the event type field is an unsigned
long and using specialized event types for this purpose seemed to be a
more elegant way. This also comes in beneficial if in future we want to
add offloading knobs for vxlan.

Cc: Jesse Gross <jesse@kernel.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:35:44 -04:00
Tariq Toukan
461017cb00 net/mlx5e: Support RX multi-packet WQE (Striding RQ)
Introduce the feature of multi-packet WQE (RX Work Queue Element)
referred to as (MPWQE or Striding RQ), in which WQEs are larger
and serve multiple packets each.

Every WQE consists of many strides of the same size, every received
packet is aligned to a beginning of a stride and is written to
consecutive strides within a WQE.

In the regular approach, each regular WQE is big enough to be capable
of serving one received packet of any size up to MTU or 64K in case of
device LRO is enabled, making it very wasteful when dealing with
small packets or device LRO is enabled.

For its flexibility, MPWQE allows a better memory utilization
(implying improvements in CPU utilization and packet rate) as packets
consume strides according to their size, preserving the rest of
the WQE to be available for other packets.

MPWQE default configuration:
	Num of WQEs	= 16
	Strides Per WQE = 2048
	Stride Size	= 64 byte

The default WQEs memory footprint went from 1024*mtu (~1.5MB) to
16 * 2048 * 64 = 2MB per ring.
However, HW LRO can now be supported at no additional cost in memory
footprint, and hence we turn it on by default and get an even better
performance.

Performance tested on ConnectX4-Lx 50G.
To isolate the feature under test, the numbers below were measured with
HW LRO turned off. We verified that the performance just improves when
LRO is turned back on.

* Netperf single TCP stream:
- BW raised by 10-15% for representative packet sizes:
  default, 64B, 1024B, 1478B, 65536B.

* Netperf multi TCP stream:
- No degradation, line rate reached.

* Pktgen: packet rate raised by 2-10% for traffic of different message
sizes: 64B, 128B, 256B, 1024B, and 1500B.

* Pktgen: packet loss in bursts of small messages (64byte),
single stream:
- | num packets | packets loss before | packets loss after
  |     2K      |       ~ 1K          |       0
  |     8K      |       ~ 6K          |       0
  |     16K     |       ~13K          |       0
  |     32K     |       ~28K          |       0
  |     64K     |       ~57K          |     ~24K

As expected as the driver can receive as many small packets (<=64B) as
the number of total strides in the ring (default = 2048 * 16) vs. 1024
(default ring size regardless of packets size) before this feature.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:05 -04:00
Tariq Toukan
237cd21809 net/mlx5: Introduce device queue counters
A queue counter can collect several statistics for one or more
hardware queues (QPs, RQs, etc ..) that the counter is attached to.

For Ethernet it will provide an "out of buffer" counter which
collects the number of all packets that are dropped due to lack
of software buffers.

Here we add device commands to alloc/query/dealloc queue counters.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:04 -04:00
Daniel Jurgens
4bfd2e6e53 net/mlx4_core: Avoid repeated calls to pci enable/disable
Maintain the PCI status and provide wrappers for enabling and disabling
the PCI device.  Performing the actions more than once without doing
its opposite results in warning logs.

This occurred when EEH hotplugged the device causing a warning for
disabling an already disabled device.

Fixes: 2ba5fbd62b ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:02:40 -04:00
Alexander Duyck
b1c20f0b97 netdev_features: Fold NETIF_F_ALL_TSO into NETIF_F_GSO_SOFTWARE
This patch folds NETIF_F_ALL_TSO into the bitmask for NETIF_F_GSO_SOFTWARE.
The idea is to avoid duplication of defines since the only difference
between the two was the GSO_UDP bit.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:14:58 -04:00
Alexei Starovoitov
85b67bcb7e perf, bpf: minimize the size of perf_trace_() tracepoint handler
move trace_call_bpf() into helper function to minimize the size
of perf_trace_*() tracepoint handlers.
    text	   data	    bss	    dec	 	   hex	filename
10541679	5526646	2945024	19013349	1221ee5	vmlinux_before
10509422	5526646	2945024	18981092	121a0e4	vmlinux_after

It may seem that perf_fetch_caller_regs() can also be moved,
but that is incorrect, since ip/sp will be wrong.

bpf+tracepoint performance is not affected, since
perf_swevent_put_recursion_context() is now inlined.
export_symbol_gpl can also be dropped.

No measurable change in normal perf tracepoints.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 13:48:20 -04:00
Wei Ni
1d0fd42fa3 thermal: consistently use int for trip temp
The commit 17e8351a77 consistently use int for temperature,
however it missed a few in trip temperature and thermal_core.

In current codes, the trip->temperature used "unsigned long"
and zone->temperature used"int", if the temperature is negative
value, it will get wrong result when compare temperature with
trip temperature.

This patch can fix it.

Signed-off-by: Wei Ni <wni@nvidia.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-04-20 20:31:14 -07:00
Daniel Borkmann
bd570ff970 bpf: add event output helper for notifications/sampling/logging
This patch adds a new helper for cls/act programs that can push events
to user space applications. For networking, this can be f.e. for sampling,
debugging, logging purposes or pushing of arbitrary wake-up events. The
idea is similar to a43eec3042 ("bpf: introduce bpf_perf_event_output()
helper") and 39111695b1 ("samples: bpf: add bpf_perf_event_output example").

The eBPF program utilizes a perf event array map that user space populates
with fds from perf_event_open(), the eBPF program calls into the helper
f.e. as skb_event_output(skb, &my_map, BPF_F_CURRENT_CPU, raw, sizeof(raw))
so that the raw data is pushed into the fd f.e. at the map index of the
current CPU.

User space can poll/mmap/etc on this and has a data channel for receiving
events that can be post-processed. The nice thing is that since the eBPF
program and user space application making use of it are tightly coupled,
they can define their own arbitrary raw data format and what/when they
want to push.

While f.e. packet headers could be one part of the meta data that is being
pushed, this is not a substitute for things like packet sockets as whole
packet is not being pushed and push is only done in a single direction.
Intention is more of a generically usable, efficient event pipe to applications.
Workflow is that tc can pin the map and applications can attach themselves
e.g. after cls/act setup to one or multiple map slots, demuxing is done by
the eBPF program.

Adding this facility is with minimal effort, it reuses the helper
introduced in a43eec3042 ("bpf: introduce bpf_perf_event_output() helper")
and we get its functionality for free by overloading its BPF_FUNC_ identifier
for cls/act programs, ctx is currently unused, but will be made use of in
future. Example will be added to iproute2's BPF example files.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19 20:26:11 -04:00
Konstantin Khlebnikov
607ea7cda6 net/ipv6/addrconf: simplify sysctl registration
Struct ctl_table_header holds pointer to sysctl table which could be used
for freeing it after unregistration. IPv4 sysctls already use that.
Remove redundant NULL assignment: ndev allocated using kzalloc.

This also saves some bytes: sysctl table could be shorter than
DEVCONF_MAX+1 if some options are disable in config.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19 20:13:19 -04:00
Linus Torvalds
9a0e3eea25 Merge branch 'ptmx-cleanup'
Merge the ptmx internal interface cleanup branch.

This doesn't change semantics, but it should be a sane basis for
eventually getting the multi-instance devpts code into some sane shape
where we can get rid of the kernel config option.  Which we can
hopefully get done next merge window..

* ptmx-cleanup:
  devpts: clean up interface to pty drivers
2016-04-19 16:36:18 -07:00
Linus Torvalds
12566cc35d Merge tag 'pci-v4.6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
 "These are fixes for two issues:

   - The VPD parsing code we added for v4.6 keeps some devices from
     crashing, but also keeps cxgb4 from reading non-standard extra VPD
     data that is relies on.  Hariprasad added a way for the driver to
     specify how much VPD is valid.

   - The i.MX6 active-low reset GPIO support we added in v4.5 caused
     regressions on some boards, so we're reverting that.

  VPD:
    Add pci_set_vpd_size() (Hariprasad Shenai)
    cxgb4: Set VPD size so we can read both VPD structures (Hariprasad Shenai)

  Freescale i.MX6 host bridge driver:
    Revert "PCI: imx6: Add support for active-low reset GPIO" (Fabio Estevam)"

* tag 'pci-v4.6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  cxgb4: Set VPD size so we can read both VPD structures
  PCI: Add pci_set_vpd_size() to set VPD size
  Revert "PCI: imx6: Add support for active-low reset GPIO"
2016-04-18 19:52:47 -07:00
Linus Torvalds
67245ff332 devpts: clean up interface to pty drivers
This gets rid of the horrible notion of having that

    struct inode *ptmx_inode

be the linchpin of the interface between the pty code and devpts.

By de-emphasizing the ptmx inode, a lot of things actually get cleaner,
and we will have a much saner way forward.  In particular, this will
allow us to associate with any particular devpts instance at open-time,
and not be artificially tied to one particular ptmx inode.

The patch itself is actually fairly straightforward, and apart from some
locking and return path cleanups it's pretty mechanical:

 - the interfaces that devpts exposes all take "struct pts_fs_info *"
   instead of "struct inode *ptmx_inode" now.

   NOTE! The "struct pts_fs_info" thing is a completely opaque structure
   as far as the pty driver is concerned: it's still declared entirely
   internally to devpts. So the pty code can't actually access it in any
   way, just pass it as a "cookie" to the devpts code.

 - the "look up the pts fs info" is now a single clear operation, that
   also does the reference count increment on the pts superblock.

   So "devpts_add/del_ref()" is gone, and replaced by a "lookup and get
   ref" operation (devpts_get_ref(inode)), along with a "put ref" op
   (devpts_put_ref()).

 - the pty master "tty->driver_data" field now contains the pts_fs_info,
   not the ptmx inode.

 - because we don't care about the ptmx inode any more as some kind of
   base index, the ref counting can now drop the inode games - it just
   gets the ref on the superblock.

 - the pts_fs_info now has a back-pointer to the super_block. That's so
   that we can easily look up the information we actually need. Although
   quite often, the pts fs info was actually all we wanted, and not having
   to look it up based on some magical inode makes things more
   straightforward.

In particular, now that "devpts_get_ref(inode)" operation should really
be the *only* place we need to look up what devpts instance we're
associated with, and we do it exactly once, at ptmx_open() time.

The other side of this is that one ptmx node could now be associated
with multiple different devpts instances - you could have a single
/dev/ptmx node, and then have multiple mount namespaces with their own
instances of devpts mounted on /dev/pts/.  And that's all perfectly sane
in a model where we just look up the pts instance at open time.

This will eventually allow us to get rid of our odd single-vs-multiple
pts instance model, but this patch in itself changes no semantics, only
an internal binding model.

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jann Horn <jann@thejh.net>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-18 13:43:02 -07:00
Philippe Reynes
2d55173e71 phy: add generic function to support ksetting support
The old ethtool api (get_setting and set_setting) has
generic phy functions phy_ethtool_sset and phy_ethtool_gset.
To supprt the new ethtool api (get_link_ksettings and
set_link_ksettings), we add generic phy function
phy_ethtool_ksettings_get and phy_ethtool_ksettings_set.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-18 14:45:08 -04:00
Philippe Reynes
6d62b4d5fa net: ethtool: export conversion function between u32 and link mode
The function convert_legacy_u32_to_link_mode and
convert_link_mode_to_legacy_u32 may be used outside
of ethtool.c. We rename them to ethtool_convert_...
and export them, so we could use them in others
drivers and modules.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-18 14:45:08 -04:00
Linus Torvalds
b62dabfb4b Merge tag 'usb-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
 "Here are some small USB fixes for 4.6-rc4.

  Mostly xhci fixes for reported issues, a UAS bug that has hit a number
  of people, including stable tree users, and a few other minor things.

  All have been in linux-next for a while with no reported issues"

* tag 'usb-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: hcd: out of bounds access in for_each_companion
  USB: uas: Add a new NO_REPORT_LUNS quirk
  USB: uas: Limit qdepth at the scsi-host level
  doc: usb: Fix typo in gadget_multi documentation
  usb: host: xhci-plat: Make enum xhci_plat_type start at a non zero value
  xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers
  usb: xhci: fix wild pointers in xhci_mem_cleanup
  usb: host: xhci-plat: fix cannot work if R-Car Gen2/3 run on above 4GB phys
  usb: host: xhci: add a new quirk XHCI_NO_64BIT_SUPPORT
  xhci: resume USB 3 roothub first
  usb: xhci: applying XHCI_PME_STUCK_QUIRK to Intel BXT B0 host
  cdc-acm: fix crash if flushed with nothing buffered
2016-04-16 20:48:14 -07:00
Alexander Duyck
756ca87441 netdev_features: Add NETIF_F_TSO_MANGLEID to NETIF_F_ALL_TSO
I realized that when I added NETIF_F_TSO_MANGLEID as a TSO type I forgot to
add it to NETIF_F_ALL_TSO.  This patch corrects that so the flag will be
included correctly.

The result should be minor as it was only used by a few drivers and in a
few specific cases such as when NETIF_F_SG was not supported on a device so
the TSO flags were cleared.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16 19:14:56 -04:00
Linus Torvalds
f3c9a1abbe Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Ross Zwisler:
 "Two fixes:

   - Fix memcpy_from_pmem() to fallback to memcpy() for architectures
     where CONFIG_ARCH_HAS_PMEM_API=n.

   - Add a comment explaining why we write data twice when clearing
     poison in pmem_do_bvec().

  This has passed a boot test on an X86_32 config, which was the
  architecture where issue #1 above was first noticed"

Dan Williams adds:
 "We're giving this multi-maintainer setup a shot, so expect libnvdimm
  pull requests from either Ross or I going forward"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm, pmem: clarify the write+clear_poison+write flow
  pmem: fix BUG() error in pmem.h:48 on X86_32
2016-04-15 15:34:27 -07:00
Xin Long
52c52a61a3 sctp: add sctp_info dump api for sctp_diag
sctp_diag will dump some important details of sctp's assoc or ep, we use
sctp_info to describe them,  sctp_get_sctp_info to get them, and export
it to sctp_diag.ko.

v2->v3:
- we will not use list_for_each_safe in sctp_get_sctp_info, cause
  all the callers of it will use lock_sock.

- fix the holes in struct sctp_info with __reserved* field.
  because sctp_diag is a new feature, and sctp_info is just for now,
  it may be changed in the future.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15 17:29:35 -04:00