Gromacs is a versatile software package for molecular dynamics, that is, for simulating the Newtonian equations of motion for systems with hundreds to millions of particles. Gromacs delivers extremely high computational performance, and with GPU acceleration it runs extremely fast on the Longleaf and DGX clusters. In this tutorial, we discuss how to run Gromacs jobs at the UNC Research Computing Center.
Preparing Gromacs Job
First of all, we designate a directory for the Gromacs job. Whether it is a test or a production run, it is always a good idea to have a separate directory for each job. In this tutorial, we run the Gromacs job in the /proj file system.
mkdir -p /proj/its/cdpoon/project/adh
cd /proj/its/cdpoon/project/adh
Here we download the standard Gromacs benchmark data set, ADH.
wget ftp://ftp.gromacs.org/benchmarks/ADH_bench_systems.tar.gz
tar -zxvf ADH_bench_systems.tar.gz
We then have a directory named ADH that contains 4 data sets; we focus on one of them. In that directory, 2 Gromacs parameter files are provided. We create a link to one of them using the default Gromacs parameter filename.
ln -s pme_verlet.mdp grompp.mdp
Gromacs allows us to use default filenames to simplify commands. For example, the default Gromacs parameter filename is grompp.mdp, the default input atom coordinate filename is conf.gro, and the default topology filename is topol.top. If we do not use the default filenames, we have to specify each filename on the command line.
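As a sketch, if we kept the original filenames instead of creating the symlink, the same run input file could be generated by naming each file explicitly; the -f, -c, -p, and -o options select the parameter, coordinate, topology, and output files, respectively:

```shell
# Equivalent grompp invocation with explicit filenames instead of the defaults
gmx grompp -f pme_verlet.mdp -c conf.gro -p topol.top -o topol.tpr
```

With the symlink in place, the bare `gmx grompp` command below picks up the same files by their default names.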
Next, we create the Gromacs run input file, which is the file Gromacs needs for the computation. To do that, we first set up Gromacs in our computing environment. In this tutorial, we use Gromacs 2021.3.
module load gcc/9.1.0
module load cuda/11.4
source /nas/longleaf/apps/gromacs/2021.3/avx2_256-cuda11.4/bin/GMXRC.bash
gmx grompp
A new file named topol.tpr is created; that is the Gromacs run input file. We can then submit the job to run on either the Longleaf or DGX cluster.
Running Gromacs in Longleaf Cluster
To run a Gromacs job on the Longleaf cluster, it is easiest to use a Slurm job submission script. For the ADH benchmark job, we create something like the following. We name this script run.slurm.
#!/bin/bash

#SBATCH --job-name=adh_cubic
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=1G
#SBATCH --time=4:00:00
#SBATCH --partition=beta-gpu
#SBATCH --output=log.%x.%j
#SBATCH --gres=gpu:1
#SBATCH --qos=gpu_access

unset OMP_NUM_THREADS

module load gcc/9.1.0
module load cuda/11.4
source /nas/longleaf/apps/gromacs/2021.3/avx2_256-cuda11.4/bin/GMXRC.bash

# Change to working directory
cd /proj/its/cdpoon/project/ADH/adh_cubic

# Run Gromacs MD
gmx_gpu mdrun -ntmpi 1 -ntomp 10 -update gpu -nb gpu -bonded gpu -pme gpu
In this script, we ask to allocate 1 NVIDIA A100 GPU, 10 CPU cores, 1 GB of memory, and a maximum run time of 4 hours for this job, and we ask for the job to run in the beta-gpu partition. If you do not have permission to use any of the GPU partitions, email firstname.lastname@example.org and state your needs. To submit this job to Longleaf, we use the following command. Note the Slurm job ID from the output.

sbatch run.slurm
To monitor the progress of this job, we can use the following command, replacing <ONYEN> with your real ONYEN. The Slurm job ID also appears in the output.
squeue -u <ONYEN>
When the job is finished, it is a good idea to check the job efficiency to make sure that the resource allocation was not excessive, for example with the seff utility (available on most Slurm clusters, including Longleaf). Replace <JOB_ID> with the job's real ID; it can also be extracted from the Slurm job log filename.

seff <JOB_ID>
If the job requested far more memory than it used, cut the allocation down for the next run.
Gromacs keeps a log in a file named md.log. Read that file to see how the job ran. If the job finishes successfully, you should see the job performance at the end of the file.
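Since the performance summary (ns/day) is written at the end of the file, a quick way to see it is to print just the tail of the log:

```shell
# Print the last lines of the Gromacs log, which include the performance summary
tail -n 20 md.log
```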
Running Gromacs in DGX Cluster
To run Gromacs jobs on the DGX cluster, follow these directions.
To submit a Gromacs job to the DGX cluster, we need to create a YAML file that defines the job name, resource requirements, Docker image, location of the work directory, and so on. For the work directory, we use /proj, which is locally mounted on the DGX cluster. This example also uses a Docker image created at UNC by the Research Computing Center, which can be pulled from the NVIDIA GPU Cloud (NGC) registry. This requires that you have an NGC account and access to our UNC Research Computing Center private Docker registry.
In Spring 2021, we implemented Volcano for job scheduling. The following YAML file shows how to submit a Gromacs job to a Volcano scheduler GPU queue with 1 GPU, 8 CPUs, and 4 GB of memory.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job_name
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: queue_name
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: task_name
      policies:
        - event: TaskCompleted
          action: CompleteJob
      backoffLimit: 5
      activeDeadlineSeconds: time_limit
      template:
        metadata:
          name: volcano-job
          labels:
            environment: research
        spec:
          restartPolicy: Never
          imagePullSecrets:
            - name: your_secret
          volumes:
            - name: proj
              hostPath:
                path: /proj
                type: Directory
          containers:
            - name: gromacs
              image: nvcr.io/uncchrc/gromacs:2020.4-cuda9.2-ubuntu18.04
              resources:
                requests:
                  cpu: 8
                  memory: 4Gi
                  nvidia.com/gpu: 1
                limits:
                  cpu: 8
                  memory: 4Gi
                  nvidia.com/gpu: 1
              volumeMounts:
                - name: proj
                  mountPath: /proj
                  readOnly: false
              command:
                - "/bin/bash"
                - "-c"
                - >
                  cd work_directory &&
                  gmx_command
In the above YAML file, change the tags to what you desire.
job_name: Name of the Kubernetes job. It has to be unique; no other job in the cluster should have the same name.
queue_name: Name of the Volcano queue.
task_name: Name of the task; this name is used in creating the pod name.
time_limit: Time limit for the job in seconds. When the job exceeds this limit, it is terminated. For example, set it to 86400 for a one-day time limit.
your_secret: The Kubernetes secret, which holds the credential for the container to access your NGC registry. Follow the directions on the “Nvidia GPU Cloud” page to create your own Kubernetes secret.
work_directory: The work directory, which should be in /proj.
gmx_command: The Gromacs gmx command. To run molecular dynamics with GPU acceleration using 1 GPU and 8 CPUs, use the command “gmx_gpu mdrun -ntomp 8 -ntmpi 1”.
In this example, we allocate 8 CPU cores, 4 GB of memory, and 1 GPU to run the job. Change these numbers according to your job requirements. This YAML also asks to use the image nvcr.io/uncchrc/gromacs with tag 2020.4-cuda9.2-ubuntu18.04.
To find all the Volcano queues in the setup, use this command.
kubectl get queue
To submit your Gromacs job, run this command.
kubectl create -f name_yaml
Replace name_yaml with the real filename of your YAML file.
We have created a script named “kubelist” to list all your Volcano jobs in the DGX cluster.
When the job starts to run, Kubernetes creates a pod whose name is based on the job name and the task name. We can use the following commands to list all pods in the DGX cluster. Add the -owide option for a long listing.
kubectl get pod
kubectl get pod -owide
From the list, you can find the pod name, pod_name, of the job you submitted. When the pod is running, its status shows as “Running”. We can view the output of the pod with the following command.
kubectl logs pod_name
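To watch the output continuously while the job runs, kubectl's standard follow flag can be used (this is ordinary kubectl behavior, not specific to this cluster):

```shell
# Stream the pod's output until interrupted with Ctrl-C
kubectl logs -f pod_name
```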
Once the job is finished, delete the Volcano job from the job list with the following command. It is important to delete all completed Volcano jobs to keep the list clean.
kubectl delete vcjob job_name