Commit Graph

41 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
ace7d3bc60 Merge 5.15.59 into android13-5.15-lts
Changes in 5.15.59
	Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
	Revert "ocfs2: mount shared volume without ha stack"
	ntfs: fix use-after-free in ntfs_ucsncmp()
	fs: sendfile handles O_NONBLOCK of out_fd
	secretmem: fix unhandled fault in truncate
	mm: fix page leak with multiple threads mapping the same page
	hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte
	asm-generic: remove a broken and needless ifdef conditional
	s390/archrandom: prevent CPACF trng invocations in interrupt context
	nouveau/svm: Fix to migrate all requested pages
	drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
	watch_queue: Fix missing rcu annotation
	watch_queue: Fix missing locking in add_watch_to_object()
	tcp: Fix data-races around sysctl_tcp_dsack.
	tcp: Fix a data-race around sysctl_tcp_app_win.
	tcp: Fix a data-race around sysctl_tcp_adv_win_scale.
	tcp: Fix a data-race around sysctl_tcp_frto.
	tcp: Fix a data-race around sysctl_tcp_nometrics_save.
	tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.
	ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
	ice: do not setup vlan for loopback VSI
	scsi: ufs: host: Hold reference returned by of_parse_phandle()
	Revert "tcp: change pingpong threshold to 3"
	octeontx2-pf: Fix UDP/TCP src and dst port tc filters
	tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.
	tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.
	tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.
	scsi: core: Fix warning in scsi_alloc_sgtables()
	scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
	net: ping6: Fix memleak in ipv6_renew_options().
	ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
	net/tls: Remove the context from the list in tls_device_down
	igmp: Fix data-races around sysctl_igmp_qrv.
	net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii
	net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()
	tcp: Fix a data-race around sysctl_tcp_min_tso_segs.
	tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.
	tcp: Fix a data-race around sysctl_tcp_autocorking.
	tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.
	Documentation: fix sctp_wmem in ip-sysctl.rst
	macsec: fix NULL deref in macsec_add_rxsa
	macsec: fix error message in macsec_add_rxsa and _txsa
	macsec: limit replay window size with XPN
	macsec: always read MACSEC_SA_ATTR_PN as a u64
	net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa()
	net: mld: fix reference count leak in mld_{query | report}_work()
	tcp: Fix data-races around sk_pacing_rate.
	net: Fix data-races around sysctl_[rw]mem(_offset)?.
	tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.
	tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.
	tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.
	tcp: Fix data-races around sysctl_tcp_reflect_tos.
	ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.
	i40e: Fix interface init with MSI interrupts (no MSI-X)
	sctp: fix sleep in atomic context bug in timer handlers
	octeontx2-pf: cn10k: Fix egress ratelimit configuration
	netfilter: nf_queue: do not allow packet truncation below transport header offset
	virtio-net: fix the race between refill work and close
	perf symbol: Correct address for bss symbols
	sfc: disable softirqs for ptp TX
	sctp: leave the err path free in sctp_stream_init to sctp_stream_free
	ARM: crypto: comment out gcc warning that breaks clang builds
	mm/hmm: fault non-owner device private entries
	page_alloc: fix invalid watermark check on a negative value
	ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow
	EDAC/ghes: Set the DIMM label unconditionally
	docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
	locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
	x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
	Linux 5.15.59

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I00a5e80067e6edfcce68cc0dd54d9bf54c9dbfab
2022-08-15 18:28:55 +02:00
Greg Kroah-Hartman
817780c598 Merge 5.15.56 into android13-5.15-lts
Changes in 5.15.56
	ALSA: hda - Add fixup for Dell Latitidue E5430
	ALSA: hda/conexant: Apply quirk for another HP ProDesk 600 G3 model
	ALSA: hda/realtek: Fix headset mic for Acer SF313-51
	ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671
	ALSA: hda/realtek: fix mute/micmute LEDs for HP machines
	ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc221
	ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop
	xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue
	fix race between exit_itimers() and /proc/pid/timers
	mm: userfaultfd: fix UFFDIO_CONTINUE on fallocated shmem pages
	mm: split huge PUD on wp_huge_pud fallback
	tracing/histograms: Fix memory leak problem
	net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer
	ip: fix dflt addr selection for connected nexthop
	ARM: 9213/1: Print message about disabled Spectre workarounds only once
	ARM: 9214/1: alignment: advance IT state after emulating Thumb instruction
	wifi: mac80211: fix queue selection for mesh/OCB interfaces
	cgroup: Use separate src/dst nodes when preloading css_sets for migration
	btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents
	drm/panfrost: Put mapping instead of shmem obj on panfrost_mmu_map_fault_addr() error
	drm/panfrost: Fix shrinker list corruption by madvise IOCTL
	fs/remap: constrain dedupe of EOF blocks
	nilfs2: fix incorrect masking of permission flags for symlinks
	sh: convert nommu io{re,un}map() to static inline functions
	Revert "evm: Fix memleak in init_desc"
	xfs: only run COW extent recovery when there are no live extents
	xfs: don't include bnobt blocks when reserving free block pool
	xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks
	xfs: drop async cache flushes from CIL commits.
	reset: Fix devm bulk optional exclusive control getter
	ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count
	spi: amd: Limit max transfer and message size
	ARM: 9209/1: Spectre-BHB: avoid pr_info() every time a CPU comes out of idle
	ARM: 9210/1: Mark the FDT_FIXED sections as shareable
	net/mlx5e: kTLS, Fix build time constant test in TX
	net/mlx5e: kTLS, Fix build time constant test in RX
	net/mlx5e: Fix enabling sriov while tc nic rules are offloaded
	net/mlx5e: Fix capability check for updating vnic env counters
	net/mlx5e: Ring the TX doorbell on DMA errors
	drm/i915: fix a possible refcount leak in intel_dp_add_mst_connector()
	ima: Fix a potential integer overflow in ima_appraise_measurement
	ASoC: sgtl5000: Fix noise on shutdown/remove
	ASoC: tas2764: Add post reset delays
	ASoC: tas2764: Fix and extend FSYNC polarity handling
	ASoC: tas2764: Correct playback volume range
	ASoC: tas2764: Fix amp gain register offset & default
	ASoC: Intel: Skylake: Correct the ssp rate discovery in skl_get_ssp_clks()
	ASoC: Intel: Skylake: Correct the handling of fmt_config flexible array
	net: stmmac: dwc-qos: Disable split header for Tegra194
	net: ethernet: ti: am65-cpsw: Fix devlink port register sequence
	sysctl: Fix data races in proc_dointvec().
	sysctl: Fix data races in proc_douintvec().
	sysctl: Fix data races in proc_dointvec_minmax().
	sysctl: Fix data races in proc_douintvec_minmax().
	sysctl: Fix data races in proc_doulongvec_minmax().
	sysctl: Fix data races in proc_dointvec_jiffies().
	tcp: Fix a data-race around sysctl_tcp_max_orphans.
	inetpeer: Fix data-races around sysctl.
	net: Fix data-races around sysctl_mem.
	cipso: Fix data-races around sysctl.
	icmp: Fix data-races around sysctl.
	ipv4: Fix a data-race around sysctl_fib_sync_mem.
	ARM: dts: at91: sama5d2: Fix typo in i2s1 node
	ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero
	arm64: dts: broadcom: bcm4908: Fix timer node for BCM4906 SoC
	arm64: dts: broadcom: bcm4908: Fix cpu node for smp boot
	netfilter: nf_log: incorrect offset to network header
	netfilter: nf_tables: replace BUG_ON by element length check
	drm/i915/gvt: IS_ERR() vs NULL bug in intel_gvt_update_reg_whitelist()
	xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE
	lockd: set fl_owner when unlocking files
	lockd: fix nlm_close_files
	tracing: Fix sleeping while atomic in kdb ftdump
	drm/i915/selftests: fix a couple IS_ERR() vs NULL tests
	drm/i915/dg2: Add Wa_22011100796
	drm/i915/gt: Serialize GRDOM access between multiple engine resets
	drm/i915/gt: Serialize TLB invalidates with GT resets
	drm/i915/uc: correctly track uc_fw init failure
	drm/i915: Require the vm mutex for i915_vma_bind()
	bnxt_en: Fix bnxt_reinit_after_abort() code path
	bnxt_en: Fix bnxt_refclk_read()
	sysctl: Fix data-races in proc_dou8vec_minmax().
	sysctl: Fix data-races in proc_dointvec_ms_jiffies().
	icmp: Fix data-races around sysctl_icmp_echo_enable_probe.
	icmp: Fix a data-race around sysctl_icmp_ignore_bogus_error_responses.
	icmp: Fix a data-race around sysctl_icmp_errors_use_inbound_ifaddr.
	icmp: Fix a data-race around sysctl_icmp_ratelimit.
	icmp: Fix a data-race around sysctl_icmp_ratemask.
	raw: Fix a data-race around sysctl_raw_l3mdev_accept.
	tcp: Fix a data-race around sysctl_tcp_ecn_fallback.
	ipv4: Fix data-races around sysctl_ip_dynaddr.
	nexthop: Fix data-races around nexthop_compat_mode.
	net: ftgmac100: Hold reference returned by of_get_child_by_name()
	net: stmmac: fix leaks in probe
	ima: force signature verification when CONFIG_KEXEC_SIG is configured
	ima: Fix potential memory leak in ima_init_crypto()
	drm/amd/display: Only use depth 36 bpp linebuffers on DCN display engines.
	drm/amd/pm: Prevent divide by zero
	sfc: fix use after free when disabling sriov
	ceph: switch netfs read ops to use rreq->inode instead of rreq->mapping->host
	seg6: fix skb checksum evaluation in SRH encapsulation/insertion
	seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
	seg6: bpf: fix skb checksum in bpf_push_seg6_encap()
	sfc: fix kernel panic when creating VF
	net: atlantic: remove deep parameter on suspend/resume functions
	net: atlantic: remove aq_nic_deinit() when resume
	KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
	net/tls: Check for errors in tls_device_init
	ACPI: video: Fix acpi_video_handles_brightness_key_presses()
	mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE
	btrfs: rename btrfs_bio to btrfs_io_context
	btrfs: zoned: fix a leaked bioc in read_zone_info
	ksmbd: use SOCK_NONBLOCK type for kernel_accept()
	powerpc/xive/spapr: correct bitmap allocation size
	vdpa/mlx5: Initialize CVQ vringh only once
	vduse: Tie vduse mgmtdev and its device
	virtio_mmio: Add missing PM calls to freeze/restore
	virtio_mmio: Restore guest page size on resume
	netfilter: br_netfilter: do not skip all hooks with 0 priority
	scsi: hisi_sas: Limit max hw sectors for v3 HW
	cpufreq: pmac32-cpufreq: Fix refcount leak bug
	platform/x86: hp-wmi: Ignore Sanitization Mode event
	firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer
	firmware: sysfb: Add sysfb_disable() helper function
	fbdev: Disable sysfb device registration when removing conflicting FBs
	net: tipc: fix possible refcount leak in tipc_sk_create()
	NFC: nxp-nci: don't print header length mismatch on i2c error
	nvme-tcp: always fail a request when sending it failed
	nvme: fix regression when disconnect a recovering ctrl
	net: sfp: fix memory leak in sfp_probe()
	ASoC: ops: Fix off by one in range control validation
	pinctrl: aspeed: Fix potential NULL dereference in aspeed_pinmux_set_mux()
	ASoC: Realtek/Maxim SoundWire codecs: disable pm_runtime on remove
	ASoC: rt711-sdca-sdw: fix calibrate mutex initialization
	ASoC: Intel: sof_sdw: handle errors on card registration
	ASoC: rt711: fix calibrate mutex initialization
	ASoC: rt7*-sdw: harden jack_detect_handler
	ASoC: codecs: rt700/rt711/rt711-sdca: initialize workqueues in probe
	ASoC: SOF: Intel: hda-loader: Clarify the cl_dsp_init() flow
	ASoC: wcd938x: Fix event generation for some controls
	ASoC: Intel: bytcr_wm5102: Fix GPIO related probe-ordering problem
	ASoC: wm5110: Fix DRE control
	ASoC: rt711-sdca: fix kernel NULL pointer dereference when IO error
	ASoC: dapm: Initialise kcontrol data for mux/demux controls
	ASoC: cs47l15: Fix event generation for low power mux control
	ASoC: madera: Fix event generation for OUT1 demux
	ASoC: madera: Fix event generation for rate controls
	irqchip: or1k-pic: Undefine mask_ack for level triggered hardware
	x86: Clear .brk area at early boot
	soc: ixp4xx/npe: Fix unused match warning
	ARM: dts: stm32: use the correct clock source for CEC on stm32mp151
	Revert "can: xilinx_can: Limit CANFD brp to 2"
	ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices
	ALSA: usb-audio: Add quirk for Fiero SC-01
	ALSA: usb-audio: Add quirk for Fiero SC-01 (fw v1.0.0)
	nvme-pci: phison e16 has bogus namespace ids
	signal handling: don't use BUG_ON() for debugging
	USB: serial: ftdi_sio: add Belimo device ids
	usb: typec: add missing uevent when partner support PD
	usb: dwc3: gadget: Fix event pending check
	tty: serial: samsung_tty: set dma burst_size to 1
	vt: fix memory overlapping when deleting chars in the buffer
	serial: 8250: fix return error code in serial8250_request_std_resource()
	serial: stm32: Clear prev values before setting RTS delays
	serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle
	serial: 8250: Fix PM usage_count for console handover
	x86/pat: Fix x86_has_pat_wp()
	drm/aperture: Run fbdev removal before internal helpers
	Linux 5.15.56

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib44efbedc8fea205005bd3ec2806ebb8cb19710d
2022-08-10 16:54:10 +02:00
Xin Long
830582c16b Documentation: fix sctp_wmem in ip-sysctl.rst
[ Upstream commit aa709da0e032cee7c202047ecd75f437bb0126ed ]

Since commit 1033990ac5 ("sctp: implement memory accounting on tx path"),
SCTP has supported memory accounting on tx path where 'sctp_wmem' is used
by sk_wmem_schedule(). So we should fix the description for this option in
ip-sysctl.rst accordingly.

v1->v2:
  - Improve the description as Marcelo suggested.

Fixes: 1033990ac5 ("sctp: implement memory accounting on tx path")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03 12:03:49 +02:00
Kuniyuki Iwashima
6b26fb2fe2 ipv4: Fix data-races around sysctl_ip_dynaddr.
[ Upstream commit e49e4aff7ec19b2d0d0957ee30e93dade57dab9e ]

While reading sysctl_ip_dynaddr, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-21 21:24:28 +02:00
Kuniyuki Iwashima
07b0caf8ae cipso: Fix data-races around sysctl.
[ Upstream commit dd44f04b9214adb68ef5684ae87a81ba03632250 ]

While reading cipso sysctl variables, they can be changed concurrently.
So, we need to add READ_ONCE() to avoid data-races.

Fixes: 446fda4f26 ("[NetLabel]: CIPSOv4 engine")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-21 21:24:21 +02:00
Maciej Żenczykowski
23a0beaf3c ANDROID: net: introduce ip_local_unbindable_ports sysctl
and associated inet_is_local_unbindable_port() helper function:
use it to make explicitly binding to an unbindable port return
-EPERM 'Operation not permitted'.

Autobind doesn't honour this new sysctl since:
  (a) you can simply set both if that's the behaviour you desire
  (b) there could be a use for preventing explicit while allowing auto
  (c) it's faster in the relatively critical path of doing port selection
      during connect() to only check one bitmap instead of both

Various ports may have special use cases which are not suitable for
use by general userspace applications. Currently, ports specified in
ip_local_reserved_ports sysctl will not be returned only in case of
automatic port assignment, but nothing prevents you from explicitly
binding to them - even from an entirely unprivileged process.

In certain cases it is desirable to prevent the host from assigning the
ports even in case of explicit binds, even from superuser processes.

Example use cases might be:
 - a port being stolen by the nic for remote serial console, remote
   power management or some other sort of debugging functionality
   (crash collection, gdb, direct access to some other microcontroller
   on the nic or motherboard, remote management of the nic itself).
 - a transparent proxy where packets are being redirected: in case
   a socket matches this connection, packets from this application
   would be incorrectly sent to one of the endpoints.

Initially I wanted to solve this problem via the simple one line:

static inline bool inet_port_requires_bind_service(struct net *net, unsigned short port) {
-       return port < net->ipv4.sysctl_ip_prot_sock;
+       return port < net->ipv4.sysctl_ip_prot_sock || inet_is_local_reserved_port(net, port);
}

However, this doesn't work for two reasons:
  (a) it changes userspace visible behaviour of the existing local
      reserved ports sysctl, and there appears to be enough documentation
      on the internet talking about setting it to make this a bad idea
  (b) it doesn't prevent privileged apps from using these ports,
      CAP_BIND_SERVICE is relatively likely to be available to, for example,
      a recursive DNS server so it can listed on port 53, which also needs
      to do src port randomization for outgoing queries due to security
      reasons (and it thus does manual port binding).

If we *know* that certain ports are simply unusable, then it's better
nothing even gets the opportunity to try to use them.  This way we at
least get a quick failure, instead of some sort of timeout (or possibly
even corruption of the data stream of the non-kernel based use case).

Test:
  vm:~# cat /proc/sys/net/ipv4/ip_local_unbindable_ports

  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0); s.bind(("::", 3967))'
  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0); s.bind(("::", 3967))'
  vm:~# echo 3967 > /proc/sys/net/ipv4/ip_local_unbindable_ports
  vm:~# cat /proc/sys/net/ipv4/ip_local_unbindable_ports
  3967
  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0); s.bind(("::", 3967))'
  socket.error: (1, 'Operation not permitted')
  vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0); s.bind(("::", 3967))'
  socket.error: (1, 'Operation not permitted')

Cc: Sean Tranchetti <stranche@codeaurora.org>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Linux SCTP <linux-sctp@vger.kernel.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Bug: 140404597
Change-Id: Ie96207bea90ae1345adf7b45724d0caf4d6e52c2
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
(cherry picked from commit 8a4b8ea595a515e6360dfccdb99cec2a1fed08c0)
2022-05-04 13:39:15 -07:00
David S. Miller
5af84df962 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts are simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23 16:13:06 +01:00
Wei Wang
213ad73d06 tcp: disable TFO blackhole logic by default
Multiple complaints have been raised from the TFO users on the internet
stating that the TFO blackhole logic is too aggressive and gets falsely
triggered too often.
(e.g. https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/)
Considering that most middleboxes no longer drop TFO packets, we decide
to disable the blackhole logic by setting
/proc/sys/net/ipv4/tcp_fastopen_blackhole_timeout_set to 0 by default.

Fixes: cf1ef3f071 ("net/tcp_fastopen: Disable active side TFO in certain scenarios")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 22:50:31 -07:00
Justin Iurman
de8e80a54c ipv6: ioam: Documentation for new IOAM sysctls
Add documentation for new IOAM sysctls:
 - ioam6_id and ioam6_id_wide: two per-namespace sysctls
 - ioam6_enabled, ioam6_id and ioam6_id_wide: three per-interface sysctls

Example of IOAM configuration based on the following simple topology:

 _____              _____              _____
|     | eth0  eth0 |     | eth1  eth0 |     |
|  A  |.----------.|  B  |.----------.|  C  |
|_____|            |_____|            |_____|

1) Node and interface IDs can be configured for IOAM:

  # IOAM ID of A = 1, IOAM ID of A.eth0 = 11
  (A) sysctl -w net.ipv6.ioam6_id=1
  (A) sysctl -w net.ipv6.conf.eth0.ioam6_id=11

  # IOAM ID of B = 2, IOAM ID of B.eth0 = 21, IOAM ID of B.eth1 = 22
  (B) sysctl -w net.ipv6.ioam6_id=2
  (B) sysctl -w net.ipv6.conf.eth0.ioam6_id=21
  (B) sysctl -w net.ipv6.conf.eth1.ioam6_id=22

  # IOAM ID of C = 3, IOAM ID of C.eth0 = 31
  (C) sysctl -w net.ipv6.ioam6_id=3
  (C) sysctl -w net.ipv6.conf.eth0.ioam6_id=31

  Note that "_wide" IDs equivalents can be configured the same way.

2) Each node can be configured to form an IOAM domain. For instance,
   we allow IOAM from A to C only (not the reverse path), i.e. enable
   IOAM on ingress for B.eth0 and C.eth0:

  (B) sysctl -w net.ipv6.conf.eth0.ioam6_enabled=1
  (C) sysctl -w net.ipv6.conf.eth0.ioam6_enabled=1

3) An IOAM domain (e.g. ID=123) is defined and made known to each node:

  (A) ip ioam namespace add 123
  (B) ip ioam namespace add 123
  (C) ip ioam namespace add 123

4) Finally, an IOAM Pre-allocated Trace can be inserted in traffic sent
   by A when C (e.g. db02::2) is the destination:

  (A) ip -6 route add db02::2/128 encap ioam6 trace type 0x800000 ns 123
      size 12 dev eth0

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-21 08:14:33 -07:00
Xin Long
fea1d5b17f sctp: send the next probe immediately once the last one is acked
These is no need to wait for 'interval' period for the next probe
if the last probe is already acked in search state. The 'interval'
period waiting should be only for probe failure timeout and the
current pmtu check when it's in search complete state.

This change will shorten the probe time a lot in search state, and
also fix the document accordingly.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:58:03 -07:00
Xin Long
d1e462a7a5 sctp: add probe_interval in sysctl and sock/asoc/transport
PLPMTUD can be enabled by doing 'sysctl -w net.sctp.probe_interval=n'.
'n' is the interval for PLPMTUD probe timer in milliseconds, and it
can't be less than 5000 if it's not 0.

All asoc/transport's PLPMTUD in a new socket will be enabled by default.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 11:28:51 -07:00
Kuniyuki Iwashima
f9ac779f88 net: Introduce net.ipv4.tcp_migrate_req.
This commit adds a new sysctl option: net.ipv4.tcp_migrate_req. If this
option is enabled or eBPF program is attached, we will be able to migrate
child sockets from a listener to another in the same reuseport group after
close() or shutdown() syscalls.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210612123224.12525-2-kuniyu@amazon.co.jp
2021-06-15 18:01:05 +02:00
Ido Schimmel
73c2c5cbb1 ipv6: Add custom multipath hash policy
Add a new multipath hash policy where the packet fields used for hash
calculation are determined by user space via the
fib_multipath_hash_fields sysctl that was introduced in the previous
patch.

The current set of available packet fields includes both outer and inner
fields, which requires two invocations of the flow dissector. Avoid
unnecessary dissection of the outer or inner flows by skipping
dissection if none of the outer or inner fields are required.

In accordance with the existing policies, when an skb is not available,
packet fields are extracted from the provided flow key. In which case,
only outer fields are considered.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18 13:27:32 -07:00
Ido Schimmel
ed13923f98 ipv6: Add a sysctl to control multipath hash fields
A subsequent patch will add a new multipath hash policy where the packet
fields used for multipath hash calculation are determined by user space.
This patch adds a sysctl that allows user space to set these fields.

The packet fields are represented using a bitmask and are common between
IPv4 and IPv6 to allow user space to use the same numbering across both
protocols. For example, to hash based on standard 5-tuple:

 # sysctl -w net.ipv6.fib_multipath_hash_fields=0x0037
 net.ipv6.fib_multipath_hash_fields = 0x0037

To avoid introducing holes in 'struct netns_sysctl_ipv6', move the
'bindv6only' field after the multipath hash fields.

The kernel rejects unknown fields, for example:

 # sysctl -w net.ipv6.fib_multipath_hash_fields=0x1000
 sysctl: setting key "net.ipv6.fib_multipath_hash_fields": Invalid argument

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18 13:27:32 -07:00
Ido Schimmel
4253b4986f ipv4: Add custom multipath hash policy
Add a new multipath hash policy where the packet fields used for hash
calculation are determined by user space via the
fib_multipath_hash_fields sysctl that was introduced in the previous
patch.

The current set of available packet fields includes both outer and inner
fields, which requires two invocations of the flow dissector. Avoid
unnecessary dissection of the outer or inner flows by skipping
dissection if none of the outer or inner fields are required.

In accordance with the existing policies, when an skb is not available,
packet fields are extracted from the provided flow key. In which case,
only outer fields are considered.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18 13:27:32 -07:00
Ido Schimmel
ce5c9c20d3 ipv4: Add a sysctl to control multipath hash fields
A subsequent patch will add a new multipath hash policy where the packet
fields used for multipath hash calculation are determined by user space.
This patch adds a sysctl that allows user space to set these fields.

The packet fields are represented using a bitmask and are common between
IPv4 and IPv6 to allow user space to use the same numbering across both
protocols. For example, to hash based on standard 5-tuple:

 # sysctl -w net.ipv4.fib_multipath_hash_fields=0x0037
 net.ipv4.fib_multipath_hash_fields = 0x0037

The kernel rejects unknown fields, for example:

 # sysctl -w net.ipv4.fib_multipath_hash_fields=0x1000
 sysctl: setting key "net.ipv4.fib_multipath_hash_fields": Invalid argument

More fields can be added in the future, if needed.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18 13:27:32 -07:00
Jakub Kicinski
8203c7ce4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
 - keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
 - fix build after move to net_generic

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-17 11:08:07 -07:00
Nicolas Dichtel
292ecd9f5a doc: move seg6_flowlabel to seg6-sysctl.rst
Let's have all seg6 sysctl at the same place.

Fixes: a6dc6670cd ("ipv6: sr: Add documentation for seg_flowlabel sysctl")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14 13:13:15 -07:00
Otto Hollmann
a7a80b17c7 net: document a side effect of ip_local_reserved_ports
If there is overlapp between ip_local_port_range and ip_local_reserved_ports with a huge reserved block, it will affect probability of selecting ephemeral ports, see file net/ipv4/inet_hashtables.c:723

    int __inet_hash_connect(
    ...
            for (i = 0; i < remaining; i += 2, port += 2) {
                    if (unlikely(port >= high))
                            port -= remaining;
                    if (inet_is_local_reserved_port(net, port))
                            continue;

    E.g. if there is reserved block of 10000 ports, two ports right after this block will be 5000 more likely selected than others.
    If this was intended, we can/should add note into documentation as proposed in this commit, otherwise we should think about different solution. One option could be mapping table of continuous port ranges. Second option could be letting user to modify step (port+=2) in above loop, e.g. using new sysctl parameter.

Signed-off-by: Otto Hollmann <otto.hollmann@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-01 15:58:36 -07:00
Andreas Roeseler
f1b8fa9fa5 net: add sysctl for enabling RFC 8335 PROBE messages
Section 8 of RFC 8335 specifies potential security concerns of
responding to PROBE requests, and states that nodes that support PROBE
functionality MUST be able to enable/disable responses and that
responses MUST be disabled by default

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-30 13:29:39 -07:00
David S. Miller
d489ded1a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-02-16 17:51:13 -08:00
Eric Dumazet
1d1be91254 tcp: fix tcp_rmem documentation
tcp_rmem[1] has been changed to 131072, we should update the documentation
to reflect this.

Fixes: a337531b94 ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Zhibin Liu <zhibinliu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 14:15:47 -08:00
Jay Vosburgh
8cf5d8cc3e Documentation: networking: ip-sysctl: Document src_valid_mark sysctl
Provide documentation for src_valid_mark sysctl, which was added
in commit 28f6aeea3f ("net: restore ip source validation").

Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09 11:16:03 -08:00
Amit Cohen
6fad361ae9 IPv6: Extend 'fib_notify_on_flag_change' sysctl
Add the value '2' to 'fib_notify_on_flag_change' to allow sending
notifications only for failed route installation.

Separate value is added for such notifications because there are less of
them, so they do not impact performance and some users will find them more
important.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 16:47:03 -08:00
Amit Cohen
648106c30a IPv4: Extend 'fib_notify_on_flag_change' sysctl
Add the value '2' to 'fib_notify_on_flag_change' to allow sending
notifications only for failed route installation.

Separate value is added for such notifications because there are less of
them, so they do not impact performance and some users will find them more
important.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 16:47:03 -08:00
Amit Cohen
907eea4868 net: ipv6: Emit notification when fib hardware flags are changed
After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel,
but not necessarily in hardware.

The asynchronous nature of route installation in hardware can lead
to a routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

It is also possible for a route already installed in hardware to change
its action and therefore its flags. For example, a host route that is
trapping packets can be "promoted" to perform decapsulation following
the installation of an IPinIP/VXLAN tunnel.

Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed. The aim is to provide an indication to user-space
(e.g., routing daemons) about the state of the route in hardware.

Introduce a sysctl that controls this behavior.

Keep the default value at 0 (i.e., do not emit notifications) for several
reasons:
- Multiple RTM_NEWROUTE notification per-route might confuse existing
  routing daemons.
- Convergence reasons in routing daemons.
- The extra notifications will negatively impact the insertion rate.
- Not all users are interested in these notifications.

Move fib6_info_hw_flags_set() to C file because it is no longer a short
function.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 17:45:59 -08:00
Amit Cohen
680aea08e7 net: ipv4: Emit notification when fib hardware flags are changed
After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel,
but not necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

It is also possible for a route already installed in hardware to change
its action and therefore its flags. For example, a host route that is
trapping packets can be "promoted" to perform decapsulation following
the installation of an IPinIP/VXLAN tunnel.

Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed. The aim is to provide an indication to user-space
(e.g., routing daemons) about the state of the route in hardware.

Introduce a sysctl that controls this behavior.

Keep the default value at 0 (i.e., do not emit notifications) for several
reasons:
- Multiple RTM_NEWROUTE notification per-route might confuse existing
  routing daemons.
- Convergence reasons in routing daemons.
- The extra notifications will negatively impact the insertion rate.
- Not all users are interested in these notifications.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Acked-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 17:45:59 -08:00
Jakub Kicinski
d1e1355aef Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 14:21:31 -08:00
Vincent Bernat
3162820154 docs: networking: swap words in icmp_errors_use_inbound_ifaddr doc
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Link: https://lore.kernel.org/r/20210130190518.854806-1-vincent@bernat.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01 20:03:51 -08:00
Jakub Kicinski
c358f95205 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/can/dev.c
  b552766c87 ("can: dev: prevent potential information leak in can_fill_info()")
  3e77f70e73 ("can: dev: move driver related infrastructure into separate subdir")
  0a042c6ec9 ("can: dev: move netlink related code into seperate file")

  Code move.

drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
  57ac4a31c4 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down")
  214baf2287 ("net/mlx5e: Support HTB offload")

  Adjacent code changes

net/switchdev/switchdev.c
  20776b465c ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP")
  ffb68fc58e ("net: switchdev: remove the transaction structure from port object notifiers")
  bae33f2b5a ("net: switchdev: remove the transaction structure from port attributes")

  Transaction parameter gets dropped otherwise keep the fix.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-28 17:09:31 -08:00
Praveen Chaudhary
6b2e04bc24 net: allow user to set metric on default route learned via Router Advertisement
For IPv4, default route is learned via DHCPv4 and user is allowed to change
metric using config etc/network/interfaces. But for IPv6, default route can
be learned via RA, for which, currently a fixed metric value 1024 is used.

Ideally, user should be able to configure metric on default route for IPv6
similar to IPv4. This patch adds sysctl for the same.

Logs:

For IPv4:

Config in etc/network/interfaces:
auto eth0
iface eth0 inet dhcp
    metric 4261413864

IPv4 Kernel Route Table:
$ ip route list
default via 172.21.47.1 dev eth0 metric 4261413864

FRR Table, if a static route is configured:
[In real scenario, it is useful to prefer BGP learned default route over DHCPv4 default route.]
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       > - selected route, * - FIB route

S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03
K   0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m

i.e. User can prefer Default Router learned via Routing Protocol in IPv4.
Similar behavior is not possible for IPv6, without this fix.

After fix [for IPv6]:
sudo sysctl -w net.ipv6.conf.eth0.net.ipv6.conf.eth0.ra_defrtr_metric=1996489705

IP monitor: [When IPv6 RA is received]
default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705  pref high

Kernel IPv6 routing table
$ ip -6 route list
default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 21sec hoplimit 64 pref high

FRR Table, if a static route is configured:
[In real scenario, it is useful to prefer BGP learned default route over IPv6 RA default route.]
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       > - selected route, * - FIB route

S>* ::/0 [20/0] is directly connected, eth0, 00:00:06
K   ::/0 [119/1001] via fe80::xx16:xxxx:feb3:ce8e, eth0, 6d07h43m

If the metric is changed later, the effect will be seen only when next IPv6
RA is received, because the default route must be fully controlled by RA msg.
Below metric is changed from 1996489705 to 1996489704.

$ sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489704
net.ipv6.conf.eth0.ra_defrtr_metric = 1996489704

IP monitor:
[On next IPv6 RA msg, Kernel deletes prev route and installs new route with updated metric]

Deleted default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 3sec hoplimit 64 pref high
default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489704 pref high

Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com>
Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210125214430.24079-1-pchaudhary@linkedin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-26 18:39:45 -08:00
Pali Rohár
fc024c5c07 doc: networking: ip-sysctl: Document conf/all/disable_ipv6 and conf/default/disable_ipv6
This patch adds documentation for sysctl conf/all/disable_ipv6 and
conf/default/disable_ipv6 settings which is currently missing.

Signed-off-by: Pali Rohár <pali@kernel.org>
Link: https://lore.kernel.org/r/20210121150244.20483-1-pali@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-23 13:33:12 -08:00
Vincent Bernat
c0c5a60f0f net: evaluate net.ipvX.conf.all.ignore_routes_with_linkdown
Introduced in 0eeb075fad, the "ignore_routes_with_linkdown" sysctl
ignores a route whose interface is down. It is provided as a
per-interface sysctl. However, while a "all" variant is exposed, it
was a noop since it was never evaluated. We use the usual "or" logic
for this kind of sysctls.

Tested with:

    ip link add type veth # veth0 + veth1
    ip link add type veth # veth1 + veth2
    ip link set up dev veth0
    ip link set up dev veth1 # link-status paired with veth0
    ip link set up dev veth2
    ip link set up dev veth3 # link-status paired with veth2

    # First available path
    ip -4 addr add 203.0.113.${uts#H}/24 dev veth0
    ip -6 addr add 2001:db8:1::${uts#H}/64 dev veth0

    # Second available path
    ip -4 addr add 192.0.2.${uts#H}/24 dev veth2
    ip -6 addr add 2001:db8:2::${uts#H}/64 dev veth2

    # More specific route through first path
    ip -4 route add 198.51.100.0/25 via 203.0.113.254 # via veth0
    ip -6 route add 2001:db8:3::/56 via 2001:db8:1::ff # via veth0

    # Less specific route through second path
    ip -4 route add 198.51.100.0/24 via 192.0.2.254 # via veth2
    ip -6 route add 2001:db8:3::/48 via 2001:db8:2::ff # via veth2

    # H1: enable on "all"
    # H2: enable on "veth0"
    for v in ipv4 ipv6; do
      case $uts in
        H1)
          sysctl -qw net.${v}.conf.all.ignore_routes_with_linkdown=1
          ;;
        H2)
          sysctl -qw net.${v}.conf.veth0.ignore_routes_with_linkdown=1
          ;;
      esac
    done

    set -xe
    # When veth0 is up, best route is through veth0
    ip -o route get 198.51.100.1 | grep -Fw veth0
    ip -o route get 2001:db8:3::1 | grep -Fw veth0

    # When veth0 is down, best route should be through veth2 on H1/H2,
    # but on veth0 on H2
    ip link set down dev veth1 # down veth0
    ip route show
    [ $uts != H3 ] || ip -o route get 198.51.100.1 | grep -Fw veth0
    [ $uts != H3 ] || ip -o route get 2001:db8:3::1 | grep -Fw veth0
    [ $uts = H3 ] || ip -o route get 198.51.100.1 | grep -Fw veth2
    [ $uts = H3 ] || ip -o route get 2001:db8:3::1 | grep -Fw veth2

Without this patch, the two last lines would fail on H1 (the one using
the "all" sysctl). With the patch, everything succeeds as expected.

Also document the sysctl in `ip-sysctl.rst`.

Fixes: 0eeb075fad ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-11 16:41:30 -08:00
Xin Long
046c052b47 sctp: enable udp tunneling socks
This patch is to enable udp tunneling socks by calling
sctp_udp_sock_start() in sctp_ctrlsock_init(), and
sctp_udp_sock_stop() in sctp_ctrlsock_exit().

Also add sysctl udp_port to allow changing the listening
sock's port by users.

Wit this patch, the whole sctp over udp feature can be
enabled and used.

v1->v2:
  - Also update ctl_sock udp_port in proc_sctp_do_udp_port()
    where netns udp_port gets changed.
v2->v3:
  - Call htons() when setting sk udp_port from netns udp_port.
v3->v4:
  - Not call sctp_udp_sock_start() when new_value is 0.
  - Add udp_port entry in ip-sysctl.rst.
v4->v5:
  - Not call sctp_udp_sock_start/stop() in sctp_ctrlsock_init/exit().
  - Improve the description of udp_port in ip-sysctl.rst.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 15:24:49 -07:00
Xin Long
e8a3001c21 sctp: add encap_port for netns sock asoc and transport
encap_port is added as per netns/sock/assoc/transport, and the
latter one's encap_port inherits the former one's by default.
The transport's encap_port value would mostly decide if one
packet should go out with udp encapsulated or not.

This patch also allows users to set netns' encap_port by sysctl.

v1->v2:
  - Change to define encap_port as __be16 for sctp_sock, asoc and
    transport.
v2->v3:
  - No change.
v3->v4:
  - Add 'encap_port' entry in ip-sysctl.rst.
v4->v5:
  - Improve the description of encap_port in ip-sysctl.rst.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-30 15:24:06 -07:00
Eric Dumazet
b38e7819ca icmp: randomize the global rate limiter
Keyu Man reported that the ICMP rate limiter could be used
by attackers to get useful signal. Details will be provided
in an upcoming academic publication.

Our solution is to add some noise, so that the attackers
no longer can get help from the predictable token bucket limiter.

Fixes: 4cdf507d54 ("icmp: add a global rate limitation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Keyu Man <kman001@ucr.edu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-16 16:47:09 -07:00
Randy Dunlap
a7db3c7669 Documentation: networking: ip-sysctl: drop doubled word
Drop the doubled word "that".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-04 17:46:21 -07:00
Fernando Gont
969c54646a ipv6: Implement draft-ietf-6man-rfc4941bis
Implement the upcoming rev of RFC4941 (IPv6 temporary addresses):
https://tools.ietf.org/html/draft-ietf-6man-rfc4941bis-09

* Reduces the default Valid Lifetime to 2 days
  The number of extra addresses employed when Valid Lifetime was
  7 days exacerbated the stress caused on network
  elements/devices. Additionally, the motivation for temporary
  addresses is indeed privacy and reduced exposure. With a
  default Valid Lifetime of 7 days, an address that becomes
  revealed by active communication is reachable and exposed for
  one whole week. The only use case for a Valid Lifetime of 7
  days could be some application that is expecting to have long
  lived connections. But if you want to have a long lived
  connections, you shouldn't be using a temporary address in the
  first place. Additionally, in the era of mobile devices, general
  applications should nevertheless be prepared and robust to
  address changes (e.g. nodes swap wifi <-> 4G, etc.)

* Employs different IIDs for different prefixes
  To avoid network activity correlation among addresses configured
  for different prefixes

* Uses a simpler algorithm for IID generation
  No need to store "history" anywhere

Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 17:00:02 -07:00
Eric Dumazet
a70437cc09 tcp: add hrtimer slack to sack compression
Add a sysctl to control hrtimer slack, default of 100 usec.

This gives the opportunity to reduce system overhead,
and help very short RTT flows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 13:24:01 -07:00
Mauro Carvalho Chehab
ff159f4f11 docs: networking: convert tcp-thin.txt to ReST
Not much to be done here:

- add SPDX header;
- adjust identation, whitespaces and blank lines where needed;
- add to networking/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:56:38 -07:00
Mauro Carvalho Chehab
1cec2cacaa docs: networking: convert ip-sysctl.txt to ReST
- add SPDX header;
- adjust titles and chapters, adding proper markups;
- mark code blocks and literals as such;
- mark lists as such;
- mark tables as such;
- use footnote markup;
- adjust identation, whitespaces and blank lines;
- add to networking/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 14:40:18 -07:00