
It has become very popular for software developers to package applications in Docker images.  The idea is to have software applications run anywhere, cloud or on-premise, without worrying about missing libraries, modules, and other dependencies; any missing dependency renders the software unusable.  Putting everything the targeted software needs into a Docker image has proven to be a very useful way to deliver applications.  In that regard, building Docker images is a useful technique for providing applications to end users.  Once the Docker images are delivered, applications can run on end users' machines regardless of how the OSes are set up.

To build and run Docker images, you can use a special VCL image at the University of North Carolina – Chapel Hill maintained by the ITS Research Computing Center: TarHeel Linux, CentOS 7 (Full Blade with GPU).

VCL has long been a valuable teaching tool. Here we describe a way to use VCL for research. We have completely overhauled and updated the TarHeel Linux, CentOS 7 (Full Blade with GPU) VCL image to include SSHFS, Podman, and more. SSHFS allows us to mount remote filesystems in a secure way.  Podman is a tool for building and running Docker images; in our implementation, Podman is configured to run rootless Docker containers, and you can simply use any Docker command with Podman.  Singularity, another way of working with containers, is also installed.

For long-running jobs, you can submit the jobs to the DGX cluster directly from the VCL.  You can also check and monitor jobs running in the DGX cluster.

This page describes the basic usage of the VCL image.  If you have a research project in mind and would like to see whether VCL can help, please email research@unc.edu.  For other questions and comments about using the TarHeel Linux, CentOS 7 (Full Blade with GPU) VCL image, please email research@unc.edu as well.

Accessing VCL of TarHeel Linux, CentOS 7 (Full Blade with GPU)

Use your favorite web browser to go to https://vcl.unc.edu.  Select “Shibboleth (UNC-Chapel Hill)” in the pull-down menu and click “Proceed to Login” to continue.  Authenticate with your ONYEN and ONYEN password.  Off-campus access requires a VPN connection in advance.  In the menu on the left, click “Reservations”, then click “New Reservation”.  In the “New Reservation” window, use the pull-down menu to choose “TarHeel Linux, CentOS 7 (Full Blade with GPU)”.  You can also choose to start now or later and set the duration of the reservation.  Click “Create Reservation” to continue.  When the VCL is ready for you to use, it will present you with a “Connect” button.  Click “Connect” to see the IP address of the VCL, the username, and the password.  The username is normally your ONYEN.  For the password, you can use the provided temporary password or your ONYEN password.  Use your favorite terminal to ssh to the machine with the provided IP address.  The first command below assumes that the login ID on your local machine is also your ONYEN; otherwise, specify your ONYEN in the command, as in the second one.

ssh <ip_address>
ssh <onyen>@<ip_address>

If graphics need to be displayed from the VCL on your machine, add “-X” to the command to enable X11 forwarding.

ssh -X <ip_address>
ssh -X <onyen>@<ip_address>
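
A quick way to confirm that X11 forwarding is active after logging in is to check that the DISPLAY environment variable is set; this is just a sanity check, and if the variable is empty you can reconnect with “-X”.

echo $DISPLAY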

Once you are done with the VCL, you can exit from the terminal.  Then, click “Delete Reservation” to release the resources for others to use.

Accessing DGX Cluster with Kubernetes

A normal VCL session is limited to about 10 hours.  If your job needs more time to run, you can submit it to the DGX cluster.  The DGX cluster uses Kubernetes as its job manager.  If you have already created a Kubernetes token for accessing the DGX cluster on a Longleaf login node, you are ready to submit Kubernetes jobs from the VCL.  Run the following command to set up your Kubernetes environment for the DGX cluster by copying your Kubernetes token over from Longleaf.

setup-dgx-kube 

The above command checks whether you have a Kubernetes config file on Longleaf.  If you do, it copies the file over to the VCL; you will have to enter your ONYEN password twice to get the file.  After you have your DGX cluster Kubernetes config file, run the following command to check.

kubectl get node

If the above command returns a list of nodes in the DGX cluster, you are ready to submit jobs to the DGX cluster.
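
As a minimal sketch of a job submission (the job name, container image, command, and GPU request below are only illustrative assumptions; your project may require a specific namespace, image, or resource settings), you could create a file named “gpu-job.yaml” with a listing like the following and submit it with “kubectl apply”.

apiVersion: batch/v1
kind: Job
metadata:
  name: my-gpu-job                  # hypothetical job name
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: cuda-test
        image: docker.io/nvidia/cuda:9.0-devel-ubuntu16.04   # example image used elsewhere on this page
        command: ["nvidia-smi"]     # replace with your own workload
        resources:
          limits:
            nvidia.com/gpu: 1       # request one GPU (assumes the cluster exposes nvidia.com/gpu)

kubectl apply -f gpu-job.yaml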

To see all the jobs currently running in the DGX Kubernetes cluster, you can invoke the following command.

kubectl get job

If you have jobs that have completed or failed, please delete those jobs to clean up the job list.

kubectl delete job <job_name>

A Kubernetes job will spin up one or more pods; you can see the status of all pods with the first command below.  With the -owide option, as in the second command, you see more information about the pods.

kubectl get pod
kubectl get pod -owide
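
To look more closely at an individual pod, for example to read its output or see why it is not starting, the standard kubectl commands below can be used (replace <pod_name> with a name from the pod list).

kubectl logs <pod_name>
kubectl describe pod <pod_name>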

Please remove your jobs as soon as you are done using the DGX cluster resources.  Any idle CPU or GPU process in the DGX cluster is a waste of resources.

Mounting /proj

In research, we normally have research data and files saved in the /proj filesystem.  Storage space on the /proj filesystem is granted to projects and PI groups who request it.  If you already have space there, it is advantageous to be able to access /proj in the VCL.  Output files can also be saved in /proj, and once you exit the VCL, those output files remain intact in /proj.

At the prompt of a running TarHeel Linux, CentOS 7 (Full Blade with GPU) VCL instance, type this command to mount /proj.

setup-proj-access 
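
Once mounted, /proj behaves like a local directory; for example, you can list your project space (here <project_dir> is a placeholder for your project's directory name).

ls /proj/<project_dir>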

To unmount, use this command. 

fusermount -u /proj

Accessing Longleaf Home Directory

If you have files in your Longleaf home directory, you may want to access them from the VCL or save files there.  To mount your Longleaf home directory, use the following command.

setup-longleaf-home

To navigate to your Longleaf home directory, use the cd command, replacing <onyen> with your ONYEN.

cd /nas/longleaf/home/<onyen>

To unmount your Longleaf home directory, use this command.

fusermount -u /nas/longleaf/home/<onyen>

Checking Podman and Singularity

The TarHeel Linux, CentOS 7 (Full Blade with GPU) VCL image provides the basis for computing in research.  Software applications can be run as Docker and Singularity containers.  To check the availability of the Docker and Singularity commands, simply invoke the following.

docker version 
singularity version 

Using this VCL image, one can pull or build Docker and Singularity images and run Docker and Singularity containers.
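
As an example of the Singularity workflow (a sketch; the generated .sif filename may differ, and the “--nv” flag is used here on the assumption that GPU access is wanted), you could pull a Docker image from DockerHub into a Singularity image file and run a command in it.

singularity pull docker://nvidia/cuda:9.0-devel-ubuntu16.04
singularity exec --nv cuda_9.0-devel-ubuntu16.04.sif nvidia-smi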

Switching CUDA Version

In the VCL instance, we have the latest CUDA version, 11.2 as of February 2021, installed.  However, some applications need access to older versions of CUDA, so we have installed multiple versions of CUDA in the VCL instance.  To check which versions of CUDA are installed, invoke the following command.

source switch-cuda 

By default, CUDA version 11.2 is used when the VCL is started.  To switch to a different version of CUDA, use the following command, replacing <version> with the version number you want.

source switch-cuda <version>
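
For example, to switch to CUDA 10.2 (assuming that version appears in the installed list) and verify which CUDA compiler is now on your path:

source switch-cuda 10.2
nvcc --version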

If the CUDA version you would like is not there, please email research@unc.edu to express your need.

Pulling Docker Images from DockerHub

Once you log into the VCL instance, you can pull Docker images from Docker registries.  There are numerous Docker images one can get from various registries such as NVIDIA GPU Cloud (NGC) and DockerHub.  To pull Docker images from NGC, you will need an account there.  Here we demonstrate how to pull a Docker image from DockerHub and use it in the VCL.

Use the following command to pull a Docker image from the DockerHub registry with CUDA 9.0 and its development libraries on Ubuntu 16.04.

docker pull docker.io/nvidia/cuda:9.0-devel-ubuntu16.04

If the image has already been pulled before, it will not be pulled again.  You can check whether the image is already there with this command.

docker images

Now, you are ready to run the image and create a container.

docker run -ti --rm docker.io/nvidia/cuda:9.0-devel-ubuntu16.04

The prompt will change to something like the following, indicating that you are inside the container with container ID 07ac4ae5eea4.  The “#” at the end of the container prompt indicates root access.  In fact, you are NOT running as root on the host, since we have pre-configured Podman to run in rootless mode by default; root inside the container maps to your own unprivileged user on the host.

root@07ac4ae5eea4:/#

While you are in the container, you can run commands like you normally do in a Linux system.  In the container, you can check the status of the GPUs with this command.

root@07ac4ae5eea4:/# nvidia-smi

When you are done with this container, you can type exit.

root@07ac4ae5eea4:/# exit

Then, the container will be removed, since we have “--rm” in the “docker run” command.  However, the image will stay on the node until we remove it with this command.

docker rmi docker.io/nvidia/cuda:9.0-devel-ubuntu16.04

Building Docker Image with GPU Access

There are numerous Docker images one can get from various registries such as NVIDIA GPU Cloud (NGC) or DockerHub.  Sometimes, however, you want to build your own Docker images, in particular with access to a GPU in your own containers.  Here we demonstrate how to build Docker images in the TarHeel Linux, CentOS 7 (Full Blade with GPU) VCL.

Once you log in to the TarHeel Linux, CentOS 7 (Full Blade with GPU) VCL instance, change directory to wherever you like and create a file named “Dockerfile” with the following listing.

FROM docker.io/nvidia/cuda:9.2-devel-ubuntu18.04 

# Avoid user interaction with tzdata 
ARG DEBIAN_FRONTEND=noninteractive

# Update OS  
RUN apt-get -y update && apt-get -y upgrade 

# Set working directory 
WORKDIR /usr/local 

In this file, we ask to pull a base Docker image from DockerHub which has CUDA 9.2 with its development libraries on Ubuntu 18.04.  Then, we update the OS and set the working directory.

We use either one of the following commands to build the Docker image.  The “.” in the first command sets the build context to the current directory, where a file named “Dockerfile” is expected by default.  If the file is not named “Dockerfile”, we can use the “-f” option to give the filename, as in the second command below.

docker build . -t cuda:9.2-devel-ubuntu18.04
docker build -f Dockerfile -t cuda:9.2-devel-ubuntu18.04

It will take just a short while to build the Docker image.  Once it is done, type command “docker images” to check its existence. 

$ docker images 
REPOSITORY            TAG                     IMAGE ID       CREATED          SIZE 
localhost/cuda        9.2-devel-ubuntu18.04   e7796838f843   1 minutes ago    2.37 GB 
docker.io/nvidia/cuda 9.2-devel-ubuntu18.04   816085a0101a   8 weeks ago      2.21 GB 

The one labelled “localhost/cuda:9.2-devel-ubuntu18.04” is the one you have just created.  The other one, listed as “docker.io/nvidia/cuda:9.2-devel-ubuntu18.04”, is what we pulled from DockerHub to build the local one.  The new Docker image is saved locally.

We can then create a Docker container by running this image with the following command.

docker run -ti --rm localhost/cuda:9.2-devel-ubuntu18.04

This Docker container has access to the GPU too, since Podman is pre-configured to allow it.  In the container, invoke the “nvidia-smi” command to check the status of the GPU.  When you are done with this container, type “exit” to finish and you will be back in the VCL.  If you would like to list all the dead and/or active containers, invoke “docker ps -a” at the VCL prompt; the list includes the container IDs.  You can run “docker stop <container_id>” to stop a running container and “docker rm <container_id>” to remove a dead container.
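
For reference, these container management commands can be run at the VCL prompt (replace <container_id> with an ID from the “docker ps -a” output).

docker ps -a
docker stop <container_id>
docker rm <container_id>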

Running GUI Application in Docker Container

There are a lot of Docker images we can pull from various Docker registries, and each of those Docker images serves at least one application.  Here, we demonstrate how to create a Docker image with an application built in.  When the application has a GUI, we can display the graphics too.  Since we would like to display graphics on our screen, we need to allow X11 forwarding from the remote VCL machine.  To do that, use “ssh -X <ip_address>” to log into the VCL instance.  Once we are in the VCL, create a file named “Dockerfile” with the following listing.

FROM fedora

RUN yum -y update
RUN yum -y install xorg-x11-apps && yum clean all

CMD [ "/usr/bin/xclock" ]

This Docker image is Fedora-based: we update the OS and then install xorg-x11-apps, which includes the GUI application “xclock”.  The last line in the file says to run “xclock” when the Docker image is run.  We can build a Docker image from this file using the following command.

docker build . -t xclockimage

When the build is finished, we can check the existence of the image with the “docker images” command.

$ docker images
REPOSITORY                 TAG      IMAGE ID       CREATED          SIZE
localhost/xclockimage      latest   868d45fd0eb3   26 minutes ago   558 MB
docker.io/library/fedora   latest   33c4a622f37c   10 days ago      183 MB

You can see that the one labelled localhost/xclockimage is the one we have just created.  Then, we can use the following command to run “xclock” from the Docker image.  The “--net=host” option indicates that we are using the VCL host network for the Docker container.  In the command, we have not specified which executable to run; in this case, it runs what CMD indicates in the last line of the “Dockerfile”, which is “xclock”.

docker run -ti --rm -e DISPLAY --net=host -v ~/.Xauthority:/root/.Xauthority:Z xclockimage

A graphical clock should appear on your screen.

This demonstrates that one can easily build Docker images with any desired application built in, with or without a GUI.  Many users choose to keep just the “Dockerfile” and build the Docker image whenever they want to invoke the application.  If building the Docker image takes a long time, you may want to build the image once and push it to one of the Docker registries, such as DockerHub.  When you need to run the application, just pull the image from the Docker registry; there is no need to rebuild it.
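
As a sketch of that push-and-pull workflow (here <dockerhub_username> is a placeholder for your own DockerHub account, which you must already have), you could tag, push, and later pull the image like this.

docker login docker.io
docker tag localhost/xclockimage docker.io/<dockerhub_username>/xclockimage:latest
docker push docker.io/<dockerhub_username>/xclockimage:latest
docker pull docker.io/<dockerhub_username>/xclockimage:latest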