linux - Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

Age	Commit message (Collapse)	Author	Files	Lines
2023-02-22	bpf: add missing header file include	Linus Torvalds	1	-0/+1
	Commit 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()") built fine on x86-64 and arm64, and that's the extent of my local build testing. It turns out those got the <linux/nospec.h> include incidentally through other header files (<linux/kvm_host.h> in particular), but that was not true of other architectures, resulting in build errors kernel/bpf/core.c: In function ‘___bpf_prog_run’: kernel/bpf/core.c:1913:3: error: implicit declaration of function ‘barrier_nospec’ so just make sure to explicitly include the proper <linux/nospec.h> header file to make everybody see it. Fixes: 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Reported-by: Huacai Chen <chenhuacai@loongson.cn> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-22	gfs2: Convert gfs2_page_add_databufs to folios	Andreas Gruenbacher	3	-8/+8
	Convert gfs2_page_add_databufs() to folios and rename it to gfs2_trans_add_databufs(). Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-02-22	gfs2: jdata writepage fix	Andreas Gruenbacher	1	-2/+1
	The ->writepage() and ->writepages() operations are supposed to write entire pages. However, on filesystems with a block size smaller than PAGE_SIZE, __gfs2_jdata_writepage() only adds the first block to the current transaction instead of adding the entire page. Fix that. Fixes: 18ec7d5c3f43 ("[GFS2] Make journaled data files identical to normal files on disk") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-02-21	uaccess: Add speculation barrier to copy_from_user()	Dave Hansen	3	-2/+11
	The results of "access_ok()" can be mis-speculated. The result is that you can end speculatively: if (access_ok(from, size)) // Right here even for bad from/size combinations. On first glance, it would be ideal to just add a speculation barrier to "access_ok()" so that its results can never be mis-speculated. But there are lots of system calls just doing access_ok() via "copy_to_user()" and friends (example: fstat() and friends). Those are generally not problematic because they do not _consume_ data from userspace other than the pointer. They are also very quick and common system calls that should not be needlessly slowed down. "copy_from_user()" on the other hand uses a user-controller pointer and is frequently followed up with code that might affect caches. Take something like this: if (!copy_from_user(&kernelvar, uptr, size)) do_something_with(kernelvar); If userspace passes in an evil 'uptr' that actually points to a kernel addresses, and then do_something_with() has cache (or other) side-effects, it could allow userspace to infer kernel data values. Add a barrier to the common copy_from_user() code to prevent mis-speculated values which happen after the copy. Also add a stub for architectures that do not define barrier_nospec(). This makes the macro usable in generic code. Since the barrier is now usable in generic code, the x86 #ifdef in the BPF code can also go away. Reported-by: Jordy Zomer <jordyzomer@google.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Borkmann <daniel@iogearbox.net> # BPF bits Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-21	smackfs: Added check catlen	Denis Arefev	1	-3/+14
	If the catlen is 0, the memory for the netlbl_lsm_catmap structure must be allocated anyway, otherwise the check of such rules is not completed correctly. Signed-off-by: Denis Arefev <arefev@swemel.ru> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2023-02-21	page_pool: add a comment explaining the fragment counter usage	Ilias Apalodimas	1	-0/+10
	When reading the page_pool code the first impression is that keeping two separate counters, one being the page refcnt and the other being fragment pp_frag_count, is counter-intuitive. However without that fragment counter we don't know when to reliably destroy or sync the outstanding DMA mappings. So let's add a comment explaining this part. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/20230217222130.85205-1-ilias.apalodimas@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21	net: ethtool: fix __ethtool_dev_mm_supported() implementation	Vladimir Oltean	1	-1/+1
	The MAC Merge layer is supported when ops->get_mm() returns 0. The implementation was changed during review, and in this process, a bug was introduced. Link: https://lore.kernel.org/netdev/20230111161706.1465242-5-vladimir.oltean@nxp.com/ Fixes: 04692c9020b7 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Link: https://lore.kernel.org/all/20230220122343.1156614-2-vladimir.oltean@nxp.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21	ethtool: pse-pd: Fix double word in comments	Bo Liu	1	-1/+1
	Remove the repeated word "for" in comments. Signed-off-by: Bo Liu <liubo03@inspur.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20230221083036.2414-1-liubo03@inspur.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21	xsk: add linux/vmalloc.h to xsk.c	Xuan Zhuo	1	-0/+1
	Fix the failure of the compilation under the sh4. Because we introduced remap_vmalloc_range() earlier, this has caused the compilation failure on the sh4 platform. So this introduction of the header file of linux/vmalloc.h. config: sh-allmodconfig (https://download.01.org/0day-ci/archive/20230221/202302210041.kpPQLlNQ-lkp@intel.com/config) compiler: sh4-linux-gcc (GCC) 12.1.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=9f78bf330a66cd400b3e00f370f597e9fa939207 git remote add net-next https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git git fetch --no-tags net-next master git checkout 9f78bf330a66cd400b3e00f370f597e9fa939207 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh olddefconfig COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh SHELL=/bin/bash net/ Fixes: 9f78bf330a66 ("xsk: support use vaddr as ring") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202302210041.kpPQLlNQ-lkp@intel.com/ Link: https://lore.kernel.org/r/20230221075140.46988-1-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21	drm/fb-helper: Remove drm_fb_helper_unprepare() from drm_fb_helper_fini()	Thomas Zimmermann	10	-3/+17
	Move drm_fb_helper_unprepare() from drm_fb_helper_fini() into the calling fbdev implementation. Avoids a possible stale mutex with generic fbdev code. As indicated by its name, drm_fb_helper_prepare() prepares struct drm_fb_helper before setting up the fbdev support with a call to drm_fb_helper_init(). In legacy fbdev emulation, this happens next to each other. If successful, drm_fb_helper_fini() later tear down the fbdev device and also unprepare via drm_fb_helper_unprepare(). Generic fbdev emulation prepares struct drm_fb_helper immediately after allocating the instance. It only calls drm_fb_helper_init() as part of processing a hotplug event. If the hotplug-handling fails, it runs drm_fb_helper_fini(). This unprepares the fb-helper instance and the next hotplug event runs on stale data. Solve this by moving drm_fb_helper_unprepare() from drm_fb_helper_fini() into the fbdev implementations. Call it right before freeing the fb-helper instance. Fixes: 643231b28380 ("drm/fbdev-generic: Minimize hotplug error handling") Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230216140620.17699-1-tzimmermann@suse.de
2023-02-21	sefltests: netdevsim: wait for devlink instance after netns removal	Jiri Pirko	1	-0/+18
	When devlink instance is put into network namespace and that network namespace gets deleted, devlink instance is moved back into init_ns. This is done as a part of cleanup_net() routine. Since cleanup_net() is called asynchronously from workqueue, there is no guarantee that the devlink instance move is done after "ip netns del" returns. So fix this race by making sure that the devlink instance is present before any other operation. Reported-by: Amir Tzin <amirtz@nvidia.com> Fixes: b74c37fd35a2 ("selftests: netdevsim: add tests for devlink reload with resources") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230220132336.198597-1-jiri@resnulli.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-21	selftest: fib_tests: Always cleanup before exit	Roxana Nicolescu	1	-0/+2
	Usage of `set -e` before executing a command causes immediate exit on failure, without cleanup up the resources allocated at setup. This can affect the next tests that use the same resources, leading to a chain of failures. A simple fix is to always call cleanup function when the script exists. This approach is already used by other existing tests. Fixes: 1056691b2680 ("selftests: fib_tests: Make test results more verbose") Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com> Link: https://lore.kernel.org/r/20230220110400.26737-2-roxana.nicolescu@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-21	update internal module version number for cifs.ko	Steve French	1	-1/+1
	From 2.41 to 2.42 Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-21	cifs: update ip_addr for ses only for primary chan setup	Shyam Prasad N	1	-7/+11
	We update ses->ip_addr whenever we do a session setup. But this should happen only for primary channel in mchan scenario. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-21	cifs: use tcon allocation functions even for dummy tcon	Shyam Prasad N	1	-2/+2
	In smb2_reconnect_server, we allocate a dummy tcon for calling reconnect for just the session. This should be allocated using tconInfoAlloc, and not kmalloc. Fixes: 3663c9045f51 ("cifs: check reconnects for channels of active tcons too") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-21	cifs: use the least loaded channel for sending requests	Shyam Prasad N	1	-4/+29
	Till now, we've used a simple round robin approach to distribute the requests between the channels. This does not work well if the channels consume the requests at different speeds, even if the advertised speeds are the same. This change will allow the client to pick the channel with least number of requests currently in-flight. This will disregard the link speed, and select a channel based on the current load of the channels. For cases when all the channels are equally loaded, fall back to the old round robin method. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20	net/mlx5e: Align IPsec ASO result memory to be as required by hardware	Leon Romanovsky	1	-1/+1
	Hardware requires an alignment to 64 bytes to return ASO data. Missing this alignment caused to unpredictable results while ASO events were generated. Fixes: 8518d05b8f9a ("net/mlx5e: Create Advanced Steering Operation object for IPsec") Reported-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/de0302c572b90c9224a72868d4e0d657b6313c4b.1676797613.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	net/mlx5e: TC, Set CT miss to the specific ct action instance	Paul Blakey	5	-24/+82
	Currently, CT misses restore the missed chain on the tc skb extension so tc will continue from the relevant chain. Instead, restore the CT action's miss cookie on the extension, which will instruct tc to continue from the this specific CT action instance on the relevant filter's action list. Map the CT action's miss_cookie to a new miss object (ACT_MISS), and use this miss mapping instead of the current chain miss object (CHAIN_MISS) for CT action misses. To restore this new miss mapping value, add a RX restore rule for each such mapping value. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Sholmo <ozsh@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG	Paul Blakey	5	-14/+14
	This reg usage is always a mapped object, not necessarily containing chain info. Rename to properly convey what it stores. This patch doesn't change any functionality. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	net/mlx5: Refactor tc miss handling to a single function	Paul Blakey	4	-227/+231
	Move tc miss handling code to en_tc.c, and remove duplicate code. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	net/mlx5: Kconfig: Make tc offload depend on tc skb extension	Paul Blakey	5	-15/+2
	Tc skb extension is a basic requirement for using tc offload to support correct restoration on action miss. Depend on it. Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	net/sched: flower: Support hardware miss to tc action	Paul Blakey	1	-1/+12
	To support hardware miss to tc action in actions on the flower classifier, implement the required getting of filter actions, and setup filter exts (actions) miss by giving it the filter's handle and actions. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	net/sched: flower: Move filter handle initialization earlier	Paul Blakey	1	-27/+35
	To support miss to action during hardware offload the filter's handle is needed when setting up the actions (tcf_exts_init()), and before offloading. Move filter handle initialization earlier. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	net/sched: cls_api: Support hardware miss to tc action	Paul Blakey	7	-27/+236
	For drivers to support partial offload of a filter's action list, add support for action miss to specify an action instance to continue from in sw. CT action in particular can't be fully offloaded, as new connections need to be handled in software. This imposes other limitations on the actions that can be offloaded together with the CT action, such as packet modifications. Assign each action on a filter's action list a unique miss_cookie which drivers can then use to fill action_miss part of the tc skb extension. On getting back this miss_cookie, find the action instance with relevant cookie and continue classifying from there. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	net/sched: Rename user cookie and act cookie	Paul Blakey	6	-32/+32
	struct tc_action->act_cookie is a user defined cookie, and the related struct flow_action_entry->act_cookie is used as an handle similar to struct flow_cls_offload->cookie. Rename tc_action->act_cookie to user_cookie, and flow_action_entry->act_cookie to cookie so their names would better fit their usage. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	sfc: fix builds without CONFIG_RTC_LIB	Alejandro Lucero	1	-1/+1
	Add an embarrassingly missed semicolon plus and embarrassingly missed parenthesis breaking kernel building when CONFIG_RTC_LIB is not set like the one reported with ia64 config. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202302170047.EjCPizu3-lkp@intel.com/ Fixes: 14743ddd2495 ("sfc: add devlink info support for ef100") Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com> Link: https://lore.kernel.org/r/20230220110133.29645-1-alejandro.lucero-palau@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	sfc: clean up some inconsistent indentings	Yang Li	1	-2/+2
	Fix some indentngs and remove the warning below: drivers/net/ethernet/sfc/mae.c:657 efx_mae_enumerate_mports() warn: inconsistent indenting Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4117 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20230220065958.52941-1-yang.lee@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	net/mlx4_en: Introduce flexible array to silence overflow warning	Kees Cook	2	-11/+12
	The call "skb_copy_from_linear_data(skb, inl + 1, spc)" triggers a FORTIFY memcpy() warning on ppc64 platform: In function ‘fortify_memcpy_chk’, inlined from ‘skb_copy_from_linear_data’ at ./include/linux/skbuff.h:4029:2, inlined from ‘build_inline_wqe’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:722:4, inlined from ‘mlx4_en_xmit’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:1066:3: ./include/linux/fortify-string.h:513:25: error: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 513 \| __write_overflow_field(p_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Same behaviour on x86 you can get if you use "__always_inline" instead of "inline" for skb_copy_from_linear_data() in skbuff.h The call here copies data into inlined tx destricptor, which has 104 bytes (MAX_INLINE) space for data payload. In this case "spc" is known in compile-time but the destination is used with hidden knowledge (real structure of destination is different from that the compiler can see). That cause the fortify warning because compiler can check bounds, but the real bounds are different. "spc" can't be bigger than 64 bytes (MLX4_INLINE_ALIGN), so the data can always fit into inlined tx descriptor. The fact that "inl" points into inlined tx descriptor is determined earlier in mlx4_en_xmit(). Avoid confusing the compiler with "inl + 1" constructions to get to past the inl header by introducing a flexible array "data" to the struct so that the compiler can see that we are not dealing with an array of inl structs, but rather, arbitrary data following the structure. There are no changes to the structure layout reported by pahole, and the resulting machine code is actually smaller. Reported-by: Josef Oskera <joskera@redhat.com> Link: https://lore.kernel.org/lkml/20230217094541.2362873-1-joskera@redhat.com Fixes: f68f2ff91512 ("fortify: Detect struct member overflows in memcpy() at compile-time") Cc: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20230218183842.never.954-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	cifs: DIO to/from KVEC-type iterators should now work	David Howells	1	-20/+0
	DIO to/from KVEC-type iterators should now work as the iterator is passed down to the socket in non-RDMA/non-crypto mode and in RDMA or crypto mode care is taken to handle vmap/vmalloc correctly and not take page refs when building a scatterlist. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Tom Talpey <tom@talpey.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20	cifs: Remove unused code	David Howells	1	-606/+0
	Remove a bunch of functions that are no longer used and are commented out after the conversion to use iterators throughout the I/O path. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/164928621823.457102.8777804402615654773.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165211421039.3154751.15199634443157779005.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165348881165.2106726.2993852968344861224.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165364827876.3334034.9331465096417303889.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166126396915.708021.2010212654244139442.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/166697261080.61150.17513116912567922274.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732033255.3186319.5527423437137895940.stgit@warthog.procyon.org.uk/ # rfc Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20	cifs: Build the RDMA SGE list directly from an iterator	David Howells	2	-93/+63
	In the depths of the cifs RDMA code, extract part of an iov iterator directly into an SGE list without going through an intermediate scatterlist. Note that this doesn't support extraction from an IOBUF- or UBUF-type iterator (ie. user-supplied buffer). The assumption is that the higher layers will extract those to a BVEC-type iterator first and do whatever is required to stop the pages from going away. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Tom Talpey <tom@talpey.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-rdma@vger.kernel.org Link: https://lore.kernel.org/r/166697260361.61150.5064013393408112197.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732032518.3186319.1859601819981624629.stgit@warthog.procyon.org.uk/ # rfc Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20	cifs: Change the I/O paths to use an iterator rather than a page list	David Howells	14	-1091/+1133
	Currently, the cifs I/O paths hand lists of pages from the VM interface routines at the top all the way through the intervening layers to the socket interface at the bottom. This is a problem, however, for interfacing with netfslib which passes an iterator through to the ->issue_read() method (and will pass an iterator through to the ->issue_write() method in future). Netfslib takes over bounce buffering for direct I/O, async I/O and encrypted content, so cifs doesn't need to do that. Netfslib also converts IOVEC-type iterators into BVEC-type iterators if necessary. Further, cifs needs foliating - and folios may come in a variety of sizes, so a page list pointing to an array of heterogeneous pages may cause problems in places such as where crypto is done. Change the cifs I/O paths to hand iov_iter iterators all the way through instead. Notes: (1) Some old routines are #if'd out to be removed in a follow up patch so as to avoid confusing diff, thereby making the diff output easier to follow. I've removed functions that don't overlap with anything added. (2) struct smb_rqst loses rq_pages, rq_offset, rq_npages, rq_pagesz and rq_tailsz which describe the pages forming the buffer; instead there's an rq_iter describing the source buffer and an rq_buffer which is used to hold the buffer for encryption. (3) struct cifs_readdata and cifs_writedata are similarly modified to smb_rqst. The ->read_into_pages() and ->copy_into_pages() are then replaced with passing the iterator directly to the socket. The iterators are stored in these structs so that they are persistent and don't get deallocated when the function returns (unlike if they were stack variables). (4) Buffered writeback is overhauled, borrowing the code from the afs filesystem to gather up contiguous runs of folios. The XARRAY-type iterator is then used to refer directly to the pagecache and can be passed to the socket to transmit data directly from there. This includes: cifs_extend_writeback() cifs_write_back_from_locked_folio() cifs_writepages_region() cifs_writepages() (5) Pages are converted to folios. (6) Direct I/O uses netfs_extract_user_iter() to create a BVEC-type iterator from an IOBUF/UBUF-type source iterator. (7) smb2_get_aead_req() uses netfs_extract_iter_to_sg() to extract page fragments from the iterator into the scatterlists that the crypto layer prefers. (8) smb2_init_transform_rq() attached pages to smb_rqst::rq_buffer, an xarray, to use as a bounce buffer for encryption. An XARRAY-type iterator can then be used to pass the bounce buffer to lower layers. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Paulo Alcantara <pc@cjr.nz> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/164311907995.2806745.400147335497304099.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/164928620163.457102.11602306234438271112.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165211420279.3154751.15923591172438186144.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165348880385.2106726.3220789453472800240.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165364827111.3334034.934805882842932881.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166126396180.708021.271013668175370826.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/166697259595.61150.5982032408321852414.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732031756.3186319.12528413619888902872.stgit@warthog.procyon.org.uk/ # rfc Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20	net: lan966x: Fix possible deadlock inside PTP	Horatiu Vultur	1	-2/+2
	When doing timestamping in lan966x and having PROVE_LOCKING enabled the following warning is shown. ======================================================== WARNING: possible irq lock inversion dependency detected 6.2.0-rc7-01749-gc54e1f7f7e36 #2786 Tainted: G N -------------------------------------------------------- swapper/0/0 just changed the state of lock: c2609f50 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x16c/0x2e8 but this lock took another, SOFTIRQ-unsafe lock in the past: (&lan966x->ptp_ts_id_lock){+.+.}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&lan966x->ptp_ts_id_lock); local_irq_disable(); lock(_xmit_ETHER#2); lock(&lan966x->ptp_ts_id_lock); <Interrupt> lock(_xmit_ETHER#2); * DEADLOCK * 5 locks held by swapper/0/0: #0: c1001e18 ((&ndev->rs_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x33c #1: c105e7c4 (rcu_read_lock){....}-{1:2}, at: ndisc_send_skb+0x134/0x81c #2: c105e7d8 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x17c/0xc64 #3: c105e7d8 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x4c/0x1224 #4: c3056174 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x354/0x1224 the shortest dependencies between 2nd lock and 1st lock: -> (&lan966x->ptp_ts_id_lock){+.+.}-{2:2} { HARDIRQ-ON-W at: lock_acquire.part.0+0xb0/0x248 _raw_spin_lock+0x38/0x48 lan966x_ptp_irq_handler+0x164/0x2a8 irq_thread_fn+0x1c/0x78 irq_thread+0x130/0x278 kthread+0xec/0x110 ret_from_fork+0x14/0x28 SOFTIRQ-ON-W at: lock_acquire.part.0+0xb0/0x248 _raw_spin_lock+0x38/0x48 lan966x_ptp_irq_handler+0x164/0x2a8 irq_thread_fn+0x1c/0x78 irq_thread+0x130/0x278 kthread+0xec/0x110 ret_from_fork+0x14/0x28 INITIAL USE at: lock_acquire.part.0+0xb0/0x248 _raw_spin_lock_irqsave+0x4c/0x68 lan966x_ptp_txtstamp_request+0x128/0x1cc lan966x_port_xmit+0x224/0x43c dev_hard_start_xmit+0xa8/0x2f0 sch_direct_xmit+0x108/0x2e8 __dev_queue_xmit+0x41c/0x1224 packet_sendmsg+0xdb4/0x134c __sys_sendto+0xd0/0x154 sys_send+0x18/0x20 ret_fast_syscall+0x0/0x1c } ... key at: [<c174ba0c>] __key.2+0x0/0x8 ... acquired at: _raw_spin_lock_irqsave+0x4c/0x68 lan966x_ptp_txtstamp_request+0x128/0x1cc lan966x_port_xmit+0x224/0x43c dev_hard_start_xmit+0xa8/0x2f0 sch_direct_xmit+0x108/0x2e8 __dev_queue_xmit+0x41c/0x1224 packet_sendmsg+0xdb4/0x134c __sys_sendto+0xd0/0x154 sys_send+0x18/0x20 ret_fast_syscall+0x0/0x1c -> (_xmit_ETHER#2){+.-.}-{2:2} { HARDIRQ-ON-W at: lock_acquire.part.0+0xb0/0x248 _raw_spin_lock+0x38/0x48 netif_freeze_queues+0x38/0x68 dev_deactivate_many+0xac/0x388 dev_deactivate+0x38/0x6c linkwatch_do_dev+0x70/0x8c __linkwatch_run_queue+0xd4/0x1e8 linkwatch_event+0x24/0x34 process_one_work+0x284/0x744 worker_thread+0x28/0x4bc kthread+0xec/0x110 ret_from_fork+0x14/0x28 IN-SOFTIRQ-W at: lock_acquire.part.0+0xb0/0x248 _raw_spin_lock+0x38/0x48 sch_direct_xmit+0x16c/0x2e8 __dev_queue_xmit+0x41c/0x1224 ip6_finish_output2+0x5f4/0xc64 ndisc_send_skb+0x4cc/0x81c addrconf_rs_timer+0xb0/0x2f8 call_timer_fn+0xb4/0x33c expire_timers+0xb4/0x10c run_timer_softirq+0xf8/0x2a8 __do_softirq+0xd4/0x5fc __irq_exit_rcu+0x138/0x17c irq_exit+0x8/0x28 __irq_svc+0x90/0xbc arch_cpu_idle+0x30/0x3c default_idle_call+0x44/0xac do_idle+0xc8/0x138 cpu_startup_entry+0x18/0x1c rest_init+0xcc/0x168 arch_post_acpi_subsys_init+0x0/0x8 INITIAL USE at: lock_acquire.part.0+0xb0/0x248 _raw_spin_lock+0x38/0x48 netif_freeze_queues+0x38/0x68 dev_deactivate_many+0xac/0x388 dev_deactivate+0x38/0x6c linkwatch_do_dev+0x70/0x8c __linkwatch_run_queue+0xd4/0x1e8 linkwatch_event+0x24/0x34 process_one_work+0x284/0x744 worker_thread+0x28/0x4bc kthread+0xec/0x110 ret_from_fork+0x14/0x28 } ... key at: [<c175974c>] netdev_xmit_lock_key+0x8/0x1c8 ... acquired at: __lock_acquire+0x978/0x2978 lock_acquire.part.0+0xb0/0x248 _raw_spin_lock+0x38/0x48 sch_direct_xmit+0x16c/0x2e8 __dev_queue_xmit+0x41c/0x1224 ip6_finish_output2+0x5f4/0xc64 ndisc_send_skb+0x4cc/0x81c addrconf_rs_timer+0xb0/0x2f8 call_timer_fn+0xb4/0x33c expire_timers+0xb4/0x10c run_timer_softirq+0xf8/0x2a8 __do_softirq+0xd4/0x5fc __irq_exit_rcu+0x138/0x17c irq_exit+0x8/0x28 __irq_svc+0x90/0xbc arch_cpu_idle+0x30/0x3c default_idle_call+0x44/0xac do_idle+0xc8/0x138 cpu_startup_entry+0x18/0x1c rest_init+0xcc/0x168 arch_post_acpi_subsys_init+0x0/0x8 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G N 6.2.0-rc7-01749-gc54e1f7f7e36 #2786 Hardware name: Generic DT based system unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from mark_lock.part.0+0x59c/0x93c mark_lock.part.0 from __lock_acquire+0x978/0x2978 __lock_acquire from lock_acquire.part.0+0xb0/0x248 lock_acquire.part.0 from _raw_spin_lock+0x38/0x48 _raw_spin_lock from sch_direct_xmit+0x16c/0x2e8 sch_direct_xmit from __dev_queue_xmit+0x41c/0x1224 __dev_queue_xmit from ip6_finish_output2+0x5f4/0xc64 ip6_finish_output2 from ndisc_send_skb+0x4cc/0x81c ndisc_send_skb from addrconf_rs_timer+0xb0/0x2f8 addrconf_rs_timer from call_timer_fn+0xb4/0x33c call_timer_fn from expire_timers+0xb4/0x10c expire_timers from run_timer_softirq+0xf8/0x2a8 run_timer_softirq from __do_softirq+0xd4/0x5fc __do_softirq from __irq_exit_rcu+0x138/0x17c __irq_exit_rcu from irq_exit+0x8/0x28 irq_exit from __irq_svc+0x90/0xbc Exception stack(0xc1001f20 to 0xc1001f68) 1f20: ffffffff ffffffff 00000001 c011f840 c100e000 c100e000 c1009314 c1009370 1f40: c10f0c1a c0d5e564 c0f5da8c 00000000 00000000 c1001f70 c010f0bc c010f0c0 1f60: 600f0013 ffffffff __irq_svc from arch_cpu_idle+0x30/0x3c arch_cpu_idle from default_idle_call+0x44/0xac default_idle_call from do_idle+0xc8/0x138 do_idle from cpu_startup_entry+0x18/0x1c cpu_startup_entry from rest_init+0xcc/0x168 rest_init from arch_post_acpi_subsys_init+0x0/0x8 Fix this by using spin_lock_irqsave/spin_lock_irqrestore also inside lan966x_ptp_irq_handler. Fixes: e85a96e48e33 ("net: lan966x: Add support for ptp interrupts") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20230217210917.2649365-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	net/ulp: Remove redundant ->clone() test in inet_clone_ulp().	Kuniyuki Iwashima	1	-2/+1
	Commit 2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering the LISTEN status") guarantees that all ULP listeners have clone() op, so we no longer need to test it in inet_clone_ulp(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230217200920.85306-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	MAINTAINERS: Add Miquel Raynal as additional maintainer for ieee802154	Stefan Schmidt	1	-0/+1
	We are growing the maintainer team for ieee802154 to spread the load for review and general maintenance. Miquel has been driving the subsystem forward over the last year and we would like to welcome him as a maintainer. Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20230218211317.284889-4-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	MAINTAINERS: Switch maintenance for mrf24j40 driver over	Stefan Schmidt	1	-2/+2
	Alan Ott has not been actively working on the driver or reviewing patches for several years. I have been taking odd fixes in through the wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this reality. I wanted to thank Alan for his work on the driver. Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Link: https://lore.kernel.org/r/20230218211317.284889-3-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	MAINTAINERS: Switch maintenance for mcr20a driver over	Stefan Schmidt	1	-2/+2
	Xue Liu has not been actively working on the driver or reviewing patches for several years. I have been taking odd fixes in through the wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this reality. I wanted to thank Xue Liu for his work on the driver. Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Link: https://lore.kernel.org/r/20230218211317.284889-2-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20	MAINTAINERS: Switch maintenance for cc2520 driver over	Stefan Schmidt	1	-2/+2
	Varka Bhadram has not been actively working on the driver or reviewing patches for several years. I have been taking odd fixes in through the wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this reality. I wanted to thank Varka for his work on the driver. Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Link: https://lore.kernel.org/r/20230218211317.284889-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21	tracing/eprobe: no need to check for negative ret value for snprintf	Quanfa Fu	1	-8/+4
	No need to check for negative return value from snprintf() as the code does not return negative values. Link: https://lore.kernel.org/all/20230109040625.3259642-1-quanfafu@gmail.com/ Signed-off-by: Quanfa Fu <quanfafu@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21	test_kprobes: Add recursed kprobe test case	Masami Hiramatsu (Google)	1	-2/+37
	Add a recursed kprobe test case to the KUnit test module for kprobes. This will probe a function which is called from the pre_handler and post_handler itself. If the kprobe is correctly implemented, the recursed kprobe handlers will be skipped and the number of skipped kprobe will be counted on kprobe::nmissed. Link: https://lore.kernel.org/all/167414238758.2301956.258548940194352895.stgit@devnote3/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21	tracing/probe: add a char type to show the character value of traced arguments	Donglin Peng	5	-2/+53
	There are scenes that we want to show the character value of traced arguments other than a decimal or hexadecimal or string value for debug convinience. I add a new type named 'char' to do it and a new test case file named 'kprobe_args_char.tc' to do selftest for char type. For example: The to be traced function is 'void demo_func(char type, char *name);', we can add a kprobe event as follows to show argument values as we want: echo 'p:myprobe demo_func $arg1:char +0($arg2):char[5]' > kprobe_events we will get the following trace log: ... myprobe: (demo_func+0x0/0x29) arg1='A' arg2={'b','p','f','1',''} Link: https://lore.kernel.org/all/20221219110613.367098-1-dolinux.peng@gmail.com/ Signed-off-by: Donglin Peng <dolinux.peng@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21	selftests/ftrace: Fix probepoint testcase to ignore __pfx_* symbols	Masami Hiramatsu (Google)	1	-1/+1
	Fix kprobe probepoint testcase to ignore __pfx_* prefix symbols. Those are introduced by commit b341b20d648b ("x86: Add prefix symbols for function padding") for identifying PADDING_BYTES of NOPs. Since kprobe events can not probe these prefix symbols, this testcase has to skip those symbols. Link: https://lore.kernel.org/all/167309835609.640500.9664678940260305746.stgit@devnote3/ Fixes: b341b20d648b ("x86: Add prefix symbols for function padding") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Shuah Khan <skhan@linuxfoundation.org>
2023-02-21	selftests/ftrace: Fix eprobe syntax test case to check filter support	Masami Hiramatsu (Google)	1	-1/+3
	Fix eprobe syntax test case to check whether the kernel supports the filter on eprobe for filter syntax test command. Without this fix, this test case will fail if the kernel supports eprobe but doesn't support the filter on eprobe. Link: https://lore.kernel.org/all/167309834742.640500.379128668288448035.stgit@devnote3/ Fixes: 9e14bae7d049 ("selftests/ftrace: Add eprobe syntax error testcase") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Shuah Khan <skhan@linuxfoundation.org>
2023-02-21	tracing/eprobe: Fix to add filter on eprobe description in README file	Masami Hiramatsu (Google)	1	-1/+1
	Fix to add a description of the filter on eprobe in README file. This is required to identify the kernel supports the filter on eprobe or not. Link: https://lore.kernel.org/all/167309833728.640500.12232259238201433587.stgit@devnote3/ Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-21	x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range	Yang Jihong	3	-2/+3
	When arch_prepare_optimized_kprobe calculating jump destination address, it copies original instructions from jmp-optimized kprobe (see __recover_optprobed_insn), and calculated based on length of original instruction. arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMATED when checking whether jmp-optimized kprobe exists. As a result, setup_detour_execution may jump to a range that has been overwritten by jump destination address, resulting in an inval opcode error. For example, assume that register two kprobes whose addresses are <func+9> and <func+11> in "func" function. The original code of "func" function is as follows: 0xffffffff816cb5e9 <+9>: push %r12 0xffffffff816cb5eb <+11>: xor %r12d,%r12d 0xffffffff816cb5ee <+14>: test %rdi,%rdi 0xffffffff816cb5f1 <+17>: setne %r12b 0xffffffff816cb5f5 <+21>: push %rbp 1.Register the kprobe for <func+11>, assume that is kp1, corresponding optimized_kprobe is op1. After the optimization, "func" code changes to: 0xffffffff816cc079 <+9>: push %r12 0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000 0xffffffff816cc080 <+16>: incl 0xf(%rcx) 0xffffffff816cc083 <+19>: xchg %eax,%ebp 0xffffffff816cc084 <+20>: (bad) 0xffffffff816cc085 <+21>: push %rbp Now op1->flags == KPROBE_FLAG_OPTIMATED; 2. Register the kprobe for <func+9>, assume that is kp2, corresponding optimized_kprobe is op2. register_kprobe(kp2) register_aggr_kprobe alloc_aggr_kprobe __prepare_optimized_kprobe arch_prepare_optimized_kprobe __recover_optprobed_insn // copy original bytes from kp1->optinsn.copied_insn, // jump address = <func+14> 3. disable kp1: disable_kprobe(kp1) __disable_kprobe ... if (p == orig_p \|\| aggr_kprobe_disabled(orig_p)) { ret = disarm_kprobe(orig_p, true) // add op1 in unoptimizing_list, not unoptimized orig_p->flags \|= KPROBE_FLAG_DISABLED; // op1->flags == KPROBE_FLAG_OPTIMATED \| KPROBE_FLAG_DISABLED ... 4. unregister kp2 __unregister_kprobe_top ... if (!kprobe_disabled(ap) && !kprobes_all_disarmed) { optimize_kprobe(op) ... if (arch_check_optimized_kprobe(op) < 0) // because op1 has KPROBE_FLAG_DISABLED, here not return return; p->kp.flags \|= KPROBE_FLAG_OPTIMIZED; // now op2 has KPROBE_FLAG_OPTIMIZED } "func" code now is: 0xffffffff816cc079 <+9>: int3 0xffffffff816cc07a <+10>: push %rsp 0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000 0xffffffff816cc080 <+16>: incl 0xf(%rcx) 0xffffffff816cc083 <+19>: xchg %eax,%ebp 0xffffffff816cc084 <+20>: (bad) 0xffffffff816cc085 <+21>: push %rbp 5. if call "func", int3 handler call setup_detour_execution: if (p->flags & KPROBE_FLAG_OPTIMIZED) { ... regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX; ... } The code for the destination address is 0xffffffffa021072c: push %r12 0xffffffffa021072e: xor %r12d,%r12d 0xffffffffa0210731: jmp 0xffffffff816cb5ee <func+14> However, <func+14> is not a valid start instruction address. As a result, an error occurs. Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/ Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21	x86/kprobes: Fix __recover_optprobed_insn check optimizing logic	Yang Jihong	3	-3/+4
	Since the following commit: commit f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") modified the update timing of the KPROBE_FLAG_OPTIMIZED, a optimized_kprobe may be in the optimizing or unoptimizing state when op.kp->flags has KPROBE_FLAG_OPTIMIZED and op->list is not empty. The __recover_optprobed_insn check logic is incorrect, a kprobe in the unoptimizing state may be incorrectly determined as unoptimizing. As a result, incorrect instructions are copied. The optprobe_queued_unopt function needs to be exported for invoking in arch directory. Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/ Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") Cc: stable@vger.kernel.org Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21	kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list	Masami Hiramatsu (Google)	1	-13/+10
	Since forcibly unoptimized kprobes will be put on the freeing_list directly in the unoptimize_kprobe(), do_unoptimize_kprobes() must continue to check the freeing_list even if unoptimizing_list is empty. This bug can happen if a kprobe is put in an instruction which is in the middle of the jump-replaced instruction sequence of an optprobe, and the optprobe is recently unregistered and queued on unoptimizing_list. In this case, the optprobe will be unoptimized forcibly (means immediately) and put it into the freeing_list, expecting the optprobe will be handled in do_unoptimize_kprobe(). But if there is no other optprobes on the unoptimizing_list, current code returns from the do_unoptimize_kprobe() soon and does not handle the optprobe which is on the freeing_list. Then the optprobe will hit the WARN_ON_ONCE() in the do_free_cleaned_kprobes(), because it is not handled in the latter loop of the do_unoptimize_kprobe(). To solve this issue, do not return from do_unoptimize_kprobes() immediately even if unoptimizing_list is empty. Moreover, this change affects another case. kill_optimized_kprobes() expects kprobe_optimizer() will just free the optprobe on freeing_list. So I changed it to just do list_move() to freeing_list if optprobes are on unoptimizing list. And the do_unoptimize_kprobe() will skip arch_disarm_kprobe() if the probe on freeing_list has gone flag. Link: https://lore.kernel.org/all/Y8URdIfVr3pq2X8w@xpf.sh.intel.com/ Link: https://lore.kernel.org/all/167448024501.3253718.13037333683110512967.stgit@devnote3/ Fixes: e4add247789e ("kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-20	cifs: Add a function to read into an iter from a socket	David Howells	2	-0/+17
	Add a helper function to read data from a socket into the given iterator. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/164928617874.457102.10021662143234315566.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165211419563.3154751.18431990381145195050.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165348879662.2106726.16881134187242702351.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165364826398.3334034.12541600783145647319.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166126395495.708021.12328677373159554478.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/166697258876.61150.3530237818849429372.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732031039.3186319.10691316510079412635.stgit@warthog.procyon.org.uk/ # rfc Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20	cifs: Add some helper functions	David Howells	2	-0/+96
	Add some helper functions to manipulate the folio marks by iterating through a list of folios held in an xarray rather than using a page list. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/164928616583.457102.15157033997163988344.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165211418840.3154751.3090684430628501879.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165348878940.2106726.204291614267188735.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165364825674.3334034.3356201708659748648.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166126394799.708021.10637797063862600488.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/166697258147.61150.9940790486999562110.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732030314.3186319.9209944805565413627.stgit@warthog.procyon.org.uk/ # rfc Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20	cifs: Add a function to Hash the contents of an iterator	David Howells	1	-0/+144
	Add a function to push the contents of a BVEC-, KVEC- or XARRAY-type iterator into a synchronous hash algorithm. UBUF- and IOBUF-type iterators are not supported on the assumption that either we're doing buffered I/O, in which case we won't see them, or we're doing direct I/O, in which case the iterator will have been extracted into a BVEC-type iterator higher up. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-crypto@vger.kernel.org Link: https://lore.kernel.org/r/166697257423.61150.12070648579830206483.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/166732029577.3186319.17162612653237909961.stgit@warthog.procyon.org.uk/ # rfc Signed-off-by: Steve French <stfrench@microsoft.com>