T4 GPUs include Tensor Cores, which can accelerate certain deep learning operations, particularly when working with mixed-precision training
GPU P100
NVIDIA GPU model from the Tesla series.
Pascal
3,584 CUDA cores
16 GB or 12 GB of HBM2 memory.
In general, GPU P100 is considered more powerful than GPU T4, primarily due to the higher number of CUDA cores and better memory bandwidth. However, the actual performance depends on the specific workload you’re running.
GPU Model: T4 is an NVIDIA GPU model from the Tesla series. Architecture: T4 is based on the Turing architecture. CUDA Cores: T4 has 2,560 CUDA cores. Memory: T4 typically has 16 GB of GDDR6 memory. Performance: T4 offers good performance for a range of tasks, including machine learning, deep learning, and general GPU-accelerated computations. Tensor Cores: T4 GPUs include Tensor Cores, which can accelerate certain deep learning operations, particularly when working with mixed-precision training.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacty of 4.00 GiB of which 0 bytes is free. Of the allocated memory 3.35 GiB is allocated by PyTorch, and 144.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Model load has failed. Doesn't exist.
GPUのメモリが足りない・・・? 確認してみる
c:\Dev\StreamDiffusion>nvidia-smi
Wed Jan 17 17:13:45 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 522.06 Driver Version: 522.06 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:01:00.0 Off | N/A |
| N/A 58C P8 9W / N/A | 114MiB / 4096MiB | 22% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 5588 C+G ...ser\Application\brave.exe N/A |
| 0 N/A N/A 7000 C+G ...n64\EpicGamesLauncher.exe N/A |
+-----------------------------------------------------------------------------+
今はありそうな気もするけど実行時に足りなくなってしまったということなのかな。
そして結局
Found cached model: engines\KBlueLeaf/kohaku-v2.1--lcm_lora-True--tiny_vae-True--max_batch-2--min_batch-2--mode-img2img\unet.engine.onnx
Generating optimizing model: engines\KBlueLeaf/kohaku-v2.1--lcm_lora-True--tiny_vae-True--max_batch-2--min_batch-2--mode-img2img\unet.engine.opt.onnx
[W] Model does not contain ONNX domain opset information! Using default opset.
UNet: original .. 0 nodes, 0 tensors, 0 inputs, 0 outputs
UNet: cleanup .. 0 nodes, 0 tensors, 0 inputs, 0 outputs
[I] Folding Constants | Pass 1
[W] Model does not contain ONNX domain opset information! Using default opset.
Only support models of onnx opset 7 and above.
Traceback (most recent call last):
File "C:\Dev\StreamDiffusion\examples\screen\..\..\utils\wrapper.py", line 546, in _load_model
compile_unet(
File "C:\Dev\StreamDiffusion\.venv\lib\site-packages\streamdiffusion\acceleration\tensorrt\__init__.py", line 76, in compile_unet
builder.build(
File "C:\Dev\StreamDiffusion\.venv\lib\site-packages\streamdiffusion\acceleration\tensorrt\builder.py", line 70, in build
optimize_onnx(
File "C:\Dev\StreamDiffusion\.venv\lib\site-packages\streamdiffusion\acceleration\tensorrt\utilities.py", line 437, in optimize_onnx
onnx_opt_graph = model_data.optimize(onnx.load(onnx_path))
File "C:\Dev\StreamDiffusion\.venv\lib\site-packages\streamdiffusion\acceleration\tensorrt\models.py", line 118, in optimize
opt.fold_constants()
File "C:\Dev\StreamDiffusion\.venv\lib\site-packages\streamdiffusion\acceleration\tensorrt\models.py", line 49, in fold_constants
onnx_graph = fold_constants(gs.export_onnx(self.graph), allow_onnxruntime_shape_inference=True)
File "<string>", line 3, in fold_constants
File "C:\Dev\StreamDiffusion\.venv\lib\site-packages\polygraphy\backend\base\loader.py", line 40, in __call__
return self.call_impl(*args, **kwargs)
File "C:\Dev\StreamDiffusion\.venv\lib\site-packages\polygraphy\util\util.py", line 694, in wrapped
return func(*args, **kwargs)
File "C:\Dev\StreamDiffusion\.venv\lib\site-packages\polygraphy\backend\onnx\loader.py", line 424, in call_impl
postfold_num_nodes = onnx_util.get_num_nodes(model)
File "C:\Dev\StreamDiffusion\.venv\lib\site-packages\polygraphy\backend\onnx\util.py", line 41, in get_num_nodes
return _get_num_graph_nodes(model.graph)
AttributeError: 'NoneType' object has no attribute 'graph'
Acceleration has failed. Falling back to normal mode.
>pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu118
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch
Error invoking remote method ‘docker-start-container’: Error: (HTTP code 500) server error – Ports are not available: exposing port TCP 127.0.0.1:5432 -> 0.0.0.0:0: listen tcp 127.0.0.1:5432: bind: An attempt was made to access a socket in a way forbidden by its access permissions.
VSCode のコマンドパレットから「Remote-Containers: Add Development Container Configuration Files…」を選択し、作成したDockerfileを指定します。
devcontainer.jsonが作成されるので、編集します
// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
// https://github.com/microsoft/vscode-dev-containers/tree/v0.245.0/containers/docker-existing-dockerfile
{
"name": "Existing Dockerfile",
// Sets the run context to one level up instead of the .devcontainer folder.
"context": "..",
// Update the 'dockerFile' property if you aren't using the standard 'Dockerfile' filename.
"dockerFile": "../Dockerfile",
// 追記ここから。GPU利用可のコンテナ起動オプション
"runArgs":[
"--gpus",
"all"
],
// 追記ここまで。
"customizations": {
"vscode": {
"extensions": [
"ms-python.python"
]
}
}
// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],
// Uncomment the next line to run commands after the container is created - for example installing curl.
// "postCreateCommand": "apt-get update && apt-get install -y curl",
// Uncomment when using a ptrace-based debugger like C++, Go, and Rust
// "runArgs": [ "--cap-add=SYS_PTRACE", "--security-opt", "seccomp=unconfined" ],
// Uncomment to use the Docker CLI from inside the container. See https://aka.ms/vscode-remote/samples/docker-from-docker.
// "mounts": [ "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind" ],
// Uncomment to connect as a non-root user if you've added one. See https://aka.ms/vscode-remote/containers/non-root.
// "remoteUser": "vscode"
}
コマンドパレットから「Remote-Containers: Rebuild and Reopen in Container」を選択することで、コンテナ内でVSCodeが起動する形となります。 VSCode の左下がこの状態ですね
Modelの配置
前回取得した Hugging Face のモデルを、Stable Diffusionがモデルのデフォルトパスとして指定している”models/ldm/stable-diffusion-v1/model.ckpt”に配備します。 このあたりは、”txt2image.py”を読んでいくとなんのパラメータ指定が出来るのかがわ借ります。 別のパスを指定する場合は —ckpt オプションで指定可能となっています
実行!
参考にさせていただいたページに従って、test.pyを用意し、実行します
(ldm) root@21feb17171f4:/workspaces/StableDiffusion2/stable-diffusion# python test.py
Traceback (most recent call last):
File "test.py", line 6, in <module>
pipe = StableDiffusionPipeline.from_pretrained(
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/diffusers/pipeline_utils.py", line 154, in from_pretrained
cached_folder = snapshot_download(
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/huggingface_hub/utils/_deprecation.py", line 93, in inner_f
return f(*args, **kwargs)
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/huggingface_hub/_snapshot_download.py", line 168, in snapshot_download
repo_info = _api.repo_info(
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1454, in repo_info
return self.model_info(
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1276, in model_info
_raise_for_status(r)
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 169, in _raise_for_status
raise e
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 131, in _raise_for_status
response.raise_for_status()
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/CompVis/stable-diffusion-v1-4/revision/fp16 (Request ID: PQ6M8N2Lators-j6XdN6V)
Access to model CompVis/stable-diffusion-v1-4 is restricted and you are not in the authorized list. Visit https://huggingface.co/CompVis/stable-diffusion-v1-4 to ask for access.
(ldm) root@21feb17171f4:/workspaces/StableDiffusion2/stable-diffusion# python test.py
Downloading: 100%|████████████████████████████████████████████████████| 1.34k/1.34k [00:00<00:00, 1.13MB/s]
Downloading: 100%|████████████████████████████████████████████████████| 12.5k/12.5k [00:00<00:00, 11.2MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 342/342 [00:00<00:00, 337kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 543/543 [00:00<00:00, 557kB/s]
Downloading: 100%|████████████████████████████████████████████████████| 4.63k/4.63k [00:00<00:00, 4.02MB/s]
Downloading: 100%|██████████████████████████████████████████████████████| 608M/608M [00:20<00:00, 29.6MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 209/209 [00:00<00:00, 208kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 209/209 [00:00<00:00, 185kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 572/572 [00:00<00:00, 545kB/s]
Downloading: 100%|██████████████████████████████████████████████████████| 246M/246M [00:08<00:00, 29.6MB/s]
Downloading: 100%|███████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 583kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 472/472 [00:00<00:00, 476kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 788/788 [00:00<00:00, 864kB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 1.06M/1.06M [00:01<00:00, 936kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 772/772 [00:00<00:00, 780kB/s]
Downloading: 100%|████████████████████████████████████████████████████| 1.72G/1.72G [00:54<00:00, 31.7MB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 71.2k/71.2k [00:00<00:00, 188kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 550/550 [00:00<00:00, 532kB/s]
Downloading: 100%|██████████████████████████████████████████████████████| 167M/167M [00:05<00:00, 28.5MB/s]
0it [00:01, ?it/s]
Traceback (most recent call last):
File "test.py", line 15, in <module>
image = pipe(prompt)["sample"][0]
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 137, in __call__
noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings)["sample"]
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 150, in forward
sample, res_samples = downsample_block(
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/diffusers/models/unet_blocks.py", line 505, in forward
hidden_states = attn(hidden_states, context=encoder_hidden_states)
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/diffusers/models/attention.py", line 168, in forward
x = block(x, context=context)
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/diffusers/models/attention.py", line 196, in forward
x = self.attn1(self.norm1(x)) + x
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/diffusers/models/attention.py", line 245, in forward
sim = torch.einsum("b i d, b j d -> b i j", q, k) * self.scale
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 4.00 GiB total capacity; 3.13 GiB already allocated; 0 bytes free; 3.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(ldm) root@21feb17171f4:/workspaces/StableDiffusion2# mkdir basujindal
(ldm) root@21feb17171f4:/workspaces/StableDiffusion2# ls
Dockerfile basujindal stable-diffusion
(ldm) root@21feb17171f4:/workspaces/StableDiffusion2# cd basujindal/
(ldm) root@21feb17171f4:/workspaces/StableDiffusion2/basujindal# git clone https://github.com/basujindal/stable-diffusion.git
適当なフォルダを作って、リポジトリからclone
(ldm) root@21feb17171f4:/workspaces/StableDiffusion2/basujindal# cd stable-diffusion/
(ldm) root@21feb17171f4:/workspaces/StableDiffusion2/basujindal/stable-diffusion# python optimizedSD/optimized_txt2img.py --prompt "a photograph of an astronaut riding a horse" --H 512 --W 512 --seed 27 --n_iter 2 --n_samples 10 --ddim_steps 50
Global seed set to 27
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Traceback (most recent call last):
File "optimizedSD/optimized_txt2img.py", line 184, in <module>
sd = load_model_from_config(f"{ckpt}")
File "optimizedSD/optimized_txt2img.py", line 29, in load_model_from_config
pl_sd = torch.load(ckpt, map_location="cpu")
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/serialization.py", line 699, in load
with _open_file_like(f, 'rb') as opened_file:
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/serialization.py", line 231, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/serialization.py", line 212, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/ldm/stable-diffusion-v1/model.ckpt'
おっと、モデルを新たに置き直さないと行けない。 サクッとコピーして、再度実行する
(ldm) root@21feb17171f4:/workspaces/StableDiffusion2/basujindal/stable-diffusion# python optimizedSD/optimized_txt2img.py --prompt "a photograph of an astronaut riding a horse" --H 512 --W 512 --seed 27 --n_iter 2 --n_samples 10 --ddim_steps 50
Global seed set to 27
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 470000
Traceback (most recent call last):
File "optimizedSD/optimized_txt2img.py", line 204, in <module>
model = instantiate_from_config(config.modelUNet)
File "/stable-diffusion/ldm/util.py", line 85, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "/stable-diffusion/ldm/util.py", line 93, in get_obj_from_str
return getattr(importlib.import_module(module, package=None), cls)
File "/opt/miniconda3/envs/ldm/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'optimizedSD'