Tracking GPU Memory Usage
A short tutorial on checking GPU memory usage
The most amazing thing about Colaboratory (or Google's generosity) is that a GPU option is also available.
In this short notebook we look at how to track GPU memory usage.
This notebook has been divided into sections. Feel free to skip the sections which you are already familiar with.
Footnote: This notebook is a fork of a great Colab notebook and it is edited for my personal reference.
Enabling GPU (on Colab)
If you are using a Colab environment and have not tried switching to the GPU mode on the Colab notebook before, here's a quick refresher on that.
When using another notebook/environment you will need to find out
how to connect to a GPU runtime on your own.
Sorry, I haven't perfected my environment-sensing and mind-reading skills yet.
In the Colaboratory menu, go to "Runtime" => "Change runtime type".
Choosing Runtime type
Then you should see a pop-up where you can choose GPU.
After you change your runtime, it should automatically restart (which means information from executed cells disappears).
Checking Runtime type
A quick way to check your current runtime is to hover over the toolbar where it shows the RAM and Disk details. If it mentions "(GPU)", then the Colab notebook is connected to a GPU runtime; otherwise it is a standard CPU runtime.
Checking GPU availability
To find out if a GPU is available, we have two preferred ways:
- PyTorch / Tensorflow APIs (Framework interface): Every deep learning framework has an API to check the details of the available GPU devices.
- Nvidia SMI (Command line interface): Nvidia is the manufacturer of the GPUs currently used for Deep Learning. Nvidia provides a command line tool for their System Management Interface (nvidia-smi for short).
import torch
# Returns True if PyTorch can see a CUDA-capable GPU
torch.cuda.is_available()
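For completeness, the equivalent check in TensorFlow looks roughly like this (a minimal sketch, assuming TensorFlow 2.x is installed; this snippet is not part of the original notebook):
import tensorflow as tf
# Lists the GPU devices TensorFlow can see; an empty list means no GPU is available
print(tf.config.list_physical_devices('GPU'))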
!nvidia-smi
Voilà! I got a Tesla P100 GPU with 16 GB memory.
This could be different for you.
On Google Colab you might get a Tesla K80 GPU with 12 GB memory too.
Now if you want to acquire values from this summary text, you'll probably want something else, like GPUtil.
The next section demonstrates how.
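(As an aside, nvidia-smi itself can emit machine-readable values through its --query-gpu flags; a minimal sketch, not used further in this notebook:)
# Print only the memory fields as CSV, without the full summary table
!nvidia-smi --query-gpu=memory.used,memory.total --format=csv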
Fetching GPU usage stats in code
To fetch GPU usage stats in code, we again have multiple ways.
I have two preferred ways based on whether I'm working with a DL framework or writing things from scratch.
Here they are:
- PyTorch / Tensorflow APIs (Framework interface): Every deep learning framework has an API to monitor the stats of the GPU devices. This is the easier option if you are already working with a DL framework.
- GPUtil python package (Custom function): A few python packages like GPUtil provide an interface to fetch GPU usage statistics. This can be used if you are not working with any DL framework.
import torch

# Setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

# Additional info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3, 1), 'GB')
    # memory_cached() was renamed to memory_reserved() in newer PyTorch releases
    print('Reserved: ', round(torch.cuda.memory_reserved(0)/1024**3, 1), 'GB')
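If you are on TensorFlow instead, a roughly equivalent sketch would be the following (assuming TensorFlow 2.5+ and a visible GPU; this call lives under tf.config.experimental and is not part of the original notebook):
import tensorflow as tf

# Reports current and peak device memory use for the first GPU, in bytes
info = tf.config.experimental.get_memory_info('GPU:0')
print('Current:', round(info['current'] / 1024**3, 1), 'GB')
print('Peak:   ', round(info['peak'] / 1024**3, 1), 'GB')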
!pip install gputil
!pip install psutil
!pip install humanize
Now gputil, psutil, and humanize are all available.
# Import packages
import os, sys, humanize, psutil, GPUtil

# Define function
def mem_report():
    print("CPU RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available))
    GPUs = GPUtil.getGPUs()
    for i, gpu in enumerate(GPUs):
        print('GPU {:d} ... Mem Free: {:.0f}MB / {:.0f}MB | Utilization {:3.0f}%'.format(i, gpu.memoryFree, gpu.memoryTotal, gpu.memoryUtil*100))

# Execute function
mem_report()
import torchvision.models as models

# Load a pretrained model and move it to the GPU so it occupies GPU memory
wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)
if torch.cuda.is_available():
    wide_resnet50_2.cuda()
mem_report()
Closing words
Now you can use any of the above methods anywhere you want to read the GPU memory usage from.
I typically call it inside the training loop while training a Deep Learning model. This helps me get a sense of how much GPU memory is still available/unused. Based on that, I can increase or decrease the batch size to utilize the GPU resources efficiently.
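For example, here is a minimal sketch of how mem_report() might be dropped into a training loop (the model, data, and hyperparameters below are hypothetical placeholders, purely for illustration):
import torch
import torch.nn as nn

# Tiny synthetic setup; replace with your own model and dataloader
model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(300):
    # Fake batch of data, generated directly on the GPU
    inputs = torch.randn(64, 128, device='cuda')
    targets = torch.randint(0, 10, (64,), device='cuda')

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

    # Report CPU/GPU memory every 100 steps to see how much headroom is left
    if step % 100 == 0:
        mem_report()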