From 6122f5c72e38a88eda13c7168e2ebbd3bd80b681 Mon Sep 17 00:00:00 2001 From: Trigger Huang Date: Mon, 19 Aug 2024 15:53:22 +0800 Subject: drm/amdgpu: skip printing vram_lost if needed The vm lost status can only be obtained after a GPU reset occurs, but sometimes a dev core dump can be happened before GPU reset. So a new argument is added to tell the dev core dump implementation whether to skip printing the vram_lost status in the dump. And this patch is also trying to decouple the core dump function from the GPU reset function, by replacing the argument amdgpu_reset_context with amdgpu_job to specify the context for core dump. V2: Inform user if VRAM lost check is skipped so users don't assume VRAM wasn't lost (Alex) Signed-off-by: Trigger Huang Suggested-by: Alex Deucher Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_device.c') diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 49ef22dcf7fb..45edf99ae7ec 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5489,7 +5489,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, vram_lost = amdgpu_device_check_vram_lost(tmp_adev); if (!test_bit(AMDGPU_SKIP_COREDUMP, &reset_context->flags)) - amdgpu_coredump(tmp_adev, vram_lost, reset_context); + amdgpu_coredump(tmp_adev, false, vram_lost, reset_context->job); if (vram_lost) { DRM_INFO("VRAM is lost due to GPU reset!\n"); -- cgit v1.2.3 From af76ca8e180f38a7d874c18cf810707762766627 Mon Sep 17 00:00:00 2001 From: Victor Zhao Date: Mon, 26 Aug 2024 00:14:26 +0800 Subject: drm/amd/amdgpu: move drain_workqueue before shutdown is set [background] when unloading amdgpu driver right after running a workload, drain_workqueue is causing "Fence fallback timer expired on ring sdma0.0". Under sriov, this issue will cause sriov full access timeout and a reset happening. move drain_workqueue before shutdown is set to allow ih process and before enter full access under sriov to avoid full access time cost. Signed-off-by: Victor Zhao Reviewed-by: Feifei Xu Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_device.c') diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 45edf99ae7ec..f4628412dac4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4531,6 +4531,9 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) { dev_info(adev->dev, "amdgpu: finishing device.\n"); flush_delayed_work(&adev->delayed_init_work); + + if (adev->mman.initialized) + drain_workqueue(adev->mman.bdev.wq); adev->shutdown = true; /* make sure IB test finished before entering exclusive mode @@ -4551,9 +4554,6 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) } amdgpu_fence_driver_hw_fini(adev); - if (adev->mman.initialized) - drain_workqueue(adev->mman.bdev.wq); - if (adev->pm.sysfs_initialized) amdgpu_pm_sysfs_fini(adev); if (adev->ucode_sysfs_en) -- cgit v1.2.3