[torch] Fixing torch._C._cuda_getDeviceCount() = 0

꾸준하게

Troubleshooting

yeonsikc 2025. 8. 22. 01:35

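# Check whether NVIDIA Fabric Manager is installed (Debian/Ubuntu first, then RPM-based systems).
dpkg -l | grep -i fabricmanager || rpm -qa | grep -i fabricmanager
# Install it; the version suffix must match the installed NVIDIA driver branch.
sudo apt-get install -y nvidia-fabricmanager-57
# Enable the service at boot and start it immediately.
sudo systemctl enable --now nvidia-fabricmanager
# Confirm that PyTorch can now see the GPUs.
python -c "import torch; print(torch.cuda.is_available())"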

Both nvidia-smi and nvcc -V report nothing wrong, yet torch says CUDA is unavailable, as the Python session below shows.

Many posts say a reboot fixes this, but in my case the fabricmanager service was in a dead state, and starting it again resolved the problem.
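To check the service state before deciding between a reboot and a service restart, a minimal sketch along these lines may help. It assumes a systemd host and the unit name `nvidia-fabricmanager` used in the commands in this post; the helper name `fabricmanager_state` is hypothetical.

```python
import shutil
import subprocess

# Unit name taken from the systemctl command in this post.
SERVICE = "nvidia-fabricmanager"


def fabricmanager_state(service: str = SERVICE) -> str:
    """Return systemd's reported state for the unit, or 'unknown' if it cannot be queried."""
    if shutil.which("systemctl") is None:
        return "unknown"
    try:
        result = subprocess.run(
            ["systemctl", "is-active", service],
            capture_output=True,
            text=True,
            check=False,  # is-active exits non-zero whenever the unit is not active
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
    return result.stdout.strip() or "unknown"


if __name__ == "__main__":
    state = fabricmanager_state()
    print(f"{SERVICE}: {state}")
    if state != "active":
        print(f"Try: sudo systemctl enable --now {SERVICE}")
```

Any state other than `active` (for example `inactive` or `failed`) suggests starting the service before trying a reboot.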

For reference, the machine has 8x A100 GPUs.

>>> import torch
>>> torch.cuda.is_available()
/home/genai06/miniforge3/envs/ys/lib/python3.12/site-packages/torch/cuda/__init__.py:174: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
False
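The key detail in this output is `Error 802`, which is CUDA's `cudaErrorSystemNotReady`: the driver is loaded, but a required daemon is not running. As a rough sketch, that code can be detected programmatically by capturing the warning that `torch.cuda.is_available()` emits. The helper names `is_system_not_ready` and `cuda_init_warnings` are hypothetical, and the sketch assumes the warning reaches Python's `warnings` machinery as it does in the session above.

```python
import warnings


def is_system_not_ready(message: str) -> bool:
    """True if a warning message carries CUDA error 802 (cudaErrorSystemNotReady)."""
    return "Error 802" in message


def cuda_init_warnings() -> list[str]:
    """Call torch.cuda.is_available() and return the text of any warnings it emitted."""
    import torch  # imported lazily so is_system_not_ready() works without torch

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        torch.cuda.is_available()
    return [str(w.message) for w in caught]


if __name__ == "__main__":
    try:
        messages = cuda_init_warnings()
    except ImportError:
        messages = []
        print("torch is not installed in this environment")
    for msg in messages:
        if is_system_not_ready(msg):
            print("CUDA error 802: check that nvidia-fabricmanager is running")
```

If the message matches, checking the fabricmanager service is a sensible first step on a multi-GPU machine like the one in this post.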
