
By design, each Docker image has at least one application installed.  Behind the application, there is an operating system.  Sometimes it is useful to gain access to the Docker container and run the application interactively.  On the DGX cluster, we can use Kubernetes to allocate resources and launch a Docker image, gaining access to a Docker container in an interactive manner.  Starting May 2021, we have implemented a time limit of 10 hours for interactive sessions on the DGX cluster.

On a Longleaf login node, invoke the following command.  Kubernetes on the DGX cluster has a mechanism to figure out your UID and GID.

kubectl run cuda-shell --rm -ti --limits='cpu=4,memory=4G,nvidia.com/gpu=1' --restart=Never \
--image nvcr.io/nvidia/cuda:9.0-devel-ubuntu16.04 -- bash

In the above command, we allocate 4 CPU cores, 4 GB of memory, and one GPU; pull the Docker image nvcr.io/nvidia/cuda:9.0-devel-ubuntu16.04; name the pod “cuda-shell”; and run it as a Docker container with bash shell access.  The name “cuda-shell” must be unique within the DGX Kubernetes cluster.  If another job is already running under the pod name “cuda-shell”, you will need to choose a different name to avoid a conflict.  For this particular Docker image, you need to have authenticated to the NVIDIA GPU Cloud (NGC) in advance.  When the resources have been reserved for this job, you will see this message.

If you don't see a command prompt, try pressing enter.

Press enter and you will be running the container interactively.
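Once you are at the prompt, it is worth confirming that the session looks right.  The two quick checks below are a sketch; nvidia-smi is typically injected into the container by the NVIDIA container runtime, so its availability depends on how the cluster is configured.

# Confirm that Kubernetes mapped your UID and GID into the container
id

# Verify that the one GPU allocated to this pod is visible
nvidia-smi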

Once you are done with the container, type “exit” at the prompt.  The corresponding Kubernetes job should stop and be removed, and the resources will be released for other jobs to use.  Since computing hardware is a valuable resource, try not to ask for more resources than you need for the job, and remove the job as soon as you are done with it.
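If a session ends abnormally, a pod named “cuda-shell” may be left behind and the name stays taken.  The following is a quick check and cleanup, assuming your kubectl is already configured for the DGX cluster.

# A “NotFound” error means the name is free to reuse
kubectl get pod cuda-shell

# Remove a leftover pod so the name can be reused
kubectl delete pod cuda-shell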

With the kubectl run command above, you gain access to a container with CUDA installed.  If you would like to use this container with access to the /proj file system, copy and paste the following command and save it in a file called, for example, cuda-shell.sh.

kubectl run cuda-shell --rm -ti --limits='cpu=4,memory=4G,nvidia.com/gpu=1' --restart=Never \
--image=nvcr.io/nvidia/cuda:9.0-devel-ubuntu16.04 --overrides='
{
   "kind": "Pod",
   "apiVersion": "v1",
   "metadata": {
      "name": "cuda-shell",
      "creationTimestamp": null,
      "labels": {
         "run": "cuda-shell"
      }
   },
   "spec": {
      "containers": [{
         "name": "cuda-shell",
         "image": "nvcr.io/nvidia/cuda:9.0-devel-ubuntu16.04",
         "args": ["sh"],
         "resources": {},
         "terminationMessagePath": "/dev/termination-log",
         "terminationMessagePolicy": "File",
         "imagePullPolicy": "IfNotPresent",
         "stdin": true,
         "stdinOnce": true,
         "tty": true,
         "volumeMounts": [{
            "mountPath": "/proj",
            "name": "store"
         }]
      }],
      "volumes": [{
         "name":"store",
         "hostPath":{"path":"/proj"}
      }],
      "restartPolicy": "Never",
      "terminationGracePeriodSeconds": 30000,
      "dnsPolicy": "ClusterFirst",
      "securityContext": {},
      "schedulerName": "default-scheduler"
   },
   "status": {}
}
' -- bash
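In the overrides above, the hostPath volume named “store” exposes the host's /proj directory, and the matching volumeMounts entry attaches it at /proj inside the container.  This is what gives the session its /proj access.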

Run this script with the following command and you will have an interactive session with /proj access.

sh ./cuda-shell.sh
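Once inside the container, you can confirm that the mount is in place.  What you see under /proj depends on your project allocations.

# The host's /proj directory should now be visible inside the container
ls /proj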

When you are done with the container, type “exit” to leave it.  Resources will be released for other jobs to use.

exit