This section of the guide aims to give you the easiest possible way to start using the DEI cluster. It should not be seen as a complete guide, but rather as a recap of, or a first glimpse into, topics that are covered much more thoroughly in the other sections.
Numbers and bold text highlight the relevant steps. A number followed by the letter "b" indicates an alternative to the solution proposed under the same number.
To run your software on the cluster you must (1) prepare a Slurm job file, that is, a file of instructions for our job scheduler, as detailed in Slurm basics. Inside the file you place the call that actually executes your software.
The following is an example with the mandatory options plus some others:
#!/bin/bash
#SBATCH --ntasks 4
#SBATCH --partition allgroups
#SBATCH --time 02:00:00
#SBATCH --mem 1G
#SBATCH --output output_%j.txt
#SBATCH --error errors_%j.txt
#SBATCH --mail-user my_email_address
#SBATCH --mail-type ALL

# move to your working directory, then launch your software
cd $WORKING_DIR
srun <your_software>
Here we've told the scheduler to reserve resources for 4 tasks running simultaneously, 1 GB of RAM and a 2-hour time limit, and we've instructed it to write the output and error files to a given location with a "dynamic" name (%j is replaced by the job id). We've also told Slurm to send an email to our address for all job events, including the "end" event, which carries useful accounting data.
If one or more GPUs are needed, check the chapter Working with GPUs.
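As a generic sketch only, a GPU is requested with an additional directive in the job file; the line below uses standard Slurm syntax, while the exact GPU types and names to use on the DEI cluster are covered in Working with GPUs:
#SBATCH --gres=gpu:1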
After preparing the job file, to submit a job to the cluster you must (2) connect via ssh to login.dei.unipd.it using your DEI credentials.
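For example, from a terminal (replace username with your DEI username):
ssh username@login.dei.unipd.it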
Finally, (3) you can submit the job file with the following command:
sbatch <job_file>
where <job_file> is the name you've chosen.
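If the submission succeeds, sbatch prints the id assigned to the job; the output looks like this (the number below is only illustrative):
Submitted batch job 1234567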
(4) Now your job is in the queue with the id you have just received, ready to be executed. The time it stays queued depends on the resources you've requested (the fewer you request, the sooner it starts) and on the current cluster occupation.
You can check the status of a job inside the queue by using the squeue command, as explained in Jobs management, for example:
squeue -j <job_id>
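squeue also accepts the standard --user option, so you can list all of your jobs at once with:
squeue --user <your_username>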
You can take control of a job, whatever its status, using the scontrol command; you can use scancel to remove it from the queue.
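For example, to inspect a job's details or to remove it from the queue (both are standard Slurm commands):
scontrol show job <job_id>
scancel <job_id>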
During the execution of the job you can check its resource usage with the nvtop command for GPUs, or with myjobinfo <job_id> for peak RAM usage. After the execution has ended you can obtain much more detailed information with the sacct and seff commands: more details in the How my jobs are performing chapter. At the end you will also receive an email with the output of the seff command, provided the email option has been set correctly.
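For example, after the job has finished (the --format fields below are standard sacct options, shown here just as a sketch):
seff <job_id>
sacct -j <job_id> --format=JobID,Elapsed,MaxRSS,State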
Alternatively, (3b) jobs can also be interactive, as shown in the chapter Interactive jobs. The available commands are interactive, for the command line, and sinteractive, for running software with a graphical interface. Your resources will be limited, but you can decide your allocation by providing options when launching the command, such as:
sinteractive --ntasks 4 --time=01:00:00 --mem 40G
In order to use a graphical interface, you must (2b) add the X forwarding option to the ssh command before calling sinteractive:
ssh -X username@login.dei.unipd.it
Important limitations
- At any moment each user can have at most 70 running jobs;
- each job has a duration limit of 35 days: it is strongly advisable to request much less, so that the job starts in a reasonable time;
- each user has a limit of 8 GPUs: it is strongly advisable to request the minimum number of GPUs the job needs, both so that it starts in a reasonable time and to avoid wasting resources;
- jobs requiring A40 GPUs are limited to 8 cores per GPU;
- jobs requiring RTX 3090 GPUs are limited to 4 cores per GPU.
The system is equipped with software managed by the administrators. The software catalog can be extended by:
- requesting the system-wide installation of some software, if it is considered useful for a large group of users, through the DEI Helpdesk System;
- installing your software inside a Singularity container, as sketched below; take a look at the dedicated chapters of this guide: Singularity basics and Singularity examples.
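As a minimal, hypothetical sketch (my_container.sif is a placeholder image name), a containerized program can then be launched from a job file with:
srun singularity exec my_container.sif <your_software>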