Where can I write my temporary data?
The nodes of the cluster have a scratch directory /ext. This is writable by everyone and is the ONLY place to save temporary files.
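As a minimal sketch (the per-user, per-job folder layout is only an illustrative convention, not a cluster requirement), a job script could create and clean up its own space under /ext like this:
#!/bin/bash
#SBATCH --job-name scratch_example
# create a private scratch directory under /ext
mkdir -p /ext/$USER/$SLURM_JOB_ID
# ... run the code, writing temporary files into that directory ...
# remove the temporary files when the job is done
rm -rf /ext/$USER/$SLURM_JOB_ID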
Where can I write my permanent data?
If you need to store a large amount of data permanently and this exceeds the quota of your personal home, you should contact the research group you work with to get access to their space on the NAS or, if this is not possible, open a ticket explaining why you need a larger quota on your home.
How can I select a specific node of the cluster?
If the selection of a GPU via the --gres option is ambiguous, or if you simply know that you want to work on a specific node of the cluster, you need to select that node with the --nodelist option. If this node were, say, the machine named gpu1, the option to add would be --nodelist=gpu1 or, as an alternative, -w gpu1. The downside of using this option is that you will probably have to wait some time before that particular machine becomes available to run your code.
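For example, a job script requesting that node might contain lines like these (the GPU request is only illustrative):
#!/bin/bash
# run only on the node gpu1
#SBATCH --nodelist=gpu1
# illustrative GPU request via --gres
#SBATCH --gres=gpu:1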
I'd like to run my jobs on a different machine than the one I've been assigned: how can it be excluded?
A specific machine can be excluded using the --exclude option. Assuming that the runner to avoid is number 01, the option becomes --exclude=runner-01 or, as an alternative, -x runner-01.
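For example, adding this line to the job file keeps the job away from that machine:
# do not schedule this job on runner-01
#SBATCH --exclude=runner-01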
What should I do if I realize that my jobs need more time than initially estimated or if I need to exceed the maximum duration allowed?
In these cases, a report must be made to the IT Services via the helpdesk portal, reachable at https://www.dei.unipd.it/helpdesk, providing an estimate of the time needed to complete the jobs, i.e. how many days they must be extended with respect to what was specified when they were launched.
How can I install a package that I need but that is not present in the MATLAB installation made available?
In this regard, there are two possibilities; one of them is to install the toolbox yourself in your home from its .mltbx file:
toolboxFile = '/home/user/MyToolbox.mltbx';
agreeToLicense = true;
matlab.addons.install(toolboxFile, agreeToLicense)
How can an environment like Anaconda be used in the execution of jobs?
You need to run the source command, followed by the path of the environment's activation script, before running the actual code. You can then experiment with an interactive job by connecting with ssh to login.dei.unipd.it and then giving the interactive command, which opens a shell on the first free runner.
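A minimal sketch of such a job script, assuming Anaconda is installed in your home under ~/anaconda3 and the environment is called myenv (both names are only illustrative):
#!/bin/bash
#SBATCH --job-name conda_example
# activate the Anaconda environment before running the code
source ~/anaconda3/bin/activate myenv
# run the actual code inside the environment (script name is a placeholder)
python my_script.py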
How can I persistently install missing packages on Singularity?
There are two ways to achieve this. One of them is to create a Python virtual environment inside your home and install the missing packages into it with the pip install command; with the source command it will then be possible to recall the /bin/activate of the environment.
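A minimal sketch, assuming the container provides python3 and the environment lives in your home in a folder called my_venv (the folder and package names are only illustrative):
# create the environment once, inside the container, in your home so that it persists
python3 -m venv ~/my_venv
# activate it and install what is missing
source ~/my_venv/bin/activate
pip install numpy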
Is it possible to map a directory outside my home into Singularity?
The home path is automatically mapped inside the container. If you want to map another directory, say /my_dir, you need to add the --bind /my_dir option: singularity exec --bind /my_dir
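For example (the image name my_image.sif and the command to run are only illustrative), a complete invocation could look like this:
singularity exec --bind /my_dir my_image.sif python3 my_script.py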
How can I run an externally compiled program on the compute cluster?
The binaries must be compiled with the libraries present on the cluster. So if some binaries are not recognized, you have to log in to login.dei.unipd.it and type the interactive command, which opens a shell on one of the servers of the cluster. At this point you need to recompile the programs.
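A minimal sketch of such a session, assuming a C program and that gcc is available on the runner (file names and flags are only illustrative):
ssh login.dei.unipd.it
interactive
# now on one of the runners: rebuild the binary against the cluster's libraries
gcc -O2 -o my_program my_program.c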
Is a GPU allocation exclusive, or can the GPU be shared with other jobs if enough memory is left over? And is it possible to specify the GPU memory needed?
Each allocation (CPU, RAM, GPU) is exclusive to the job, so not fully exploiting the internal memory of a GPU does not make it possible to share that GPU among multiple jobs.
I need parallel tasks in order to work with a parfor loop in MATLAB. How should I proceed?
Assuming you want to have 10 workers, you will need to ask for one task and 11 CPUs, so as to dedicate one CPU to the MATLAB process and 10 to the useful work of the parallel tasks:
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 11
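The job script then launches MATLAB with a pool whose size matches the 10 workers requested above; a minimal sketch (the script name my_parfor_script is a placeholder):
# open 10 workers and run the script containing the parfor loop
matlab -nodisplay -r "parpool(10); my_parfor_script; exit"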
Is it necessary to be user "root" to build the Singularity image?
With Singularity you can build the image without having to be a "root" user or in the "sudoer" list.
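A minimal sketch, assuming you build the image from a public Docker image (the image and file names are only illustrative):
# build a Singularity image from Docker Hub without root privileges
singularity build my_image.sif docker://ubuntu:22.04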
How can I receive accurate information on how my job performed?
There's a chapter entirely dedicated to statistics and accounting. Errors can sometimes be made when estimating the resources to allocate for a job. Try to be conservative, but not too much: too little RAM, for example, is expected to make the execution crash.
Efficiency, however, is fundamental, since you are working in an environment where a huge number of users need to run their jobs. In the past, at peak times, some incautious allocations have prevented people from using the system for quite a long time; if problems like this arise, we may terminate jobs that present an unwise resource allocation.
To get information about the resources used by your job, perhaps just a few lines that give you an idea of what you need before launching a longer execution, you can set up the job file so that you receive an email with useful statistics: consult the Slurm basics chapter, where the --mail-type option is explained.
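As an example, the following lines in the job file request an email when the job ends (the mail address is a placeholder):
# send an email with the final statistics when the job completes
#SBATCH --mail-type END
#SBATCH --mail-user your.name@dei.unipd.it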