Why is my job not running?
The answer depends on several factors, but in many cases the job is put in a PENDING state because there are not enough free resources for it.
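As a minimal sketch of how to check this, the standard `squeue` command can show the reason Slurm reports for each pending job. The function names `pending_reasons` and `estimated_start` are hypothetical helpers introduced here for illustration, and the column widths in the format string are an arbitrary choice.

```shell
# List your pending jobs with the reason Slurm gives for each one.
# %i = job ID, %j = job name, %T = state, %r = reason the job is waiting
pending_reasons() {
    squeue --user="$USER" --states=PENDING --format="%.12i %.20j %.10T %r"
}

# Ask Slurm for its current estimate of when your pending jobs will start.
# The estimate can change as other jobs finish early or new jobs arrive.
estimated_start() {
    squeue --user="$USER" --start
}
```

After defining these in your shell, run `pending_reasons`. A reason such as `Resources` typically means the job is waiting for resources to become free, while `Priority` means other jobs are ahead of it in the queue.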
How can I get my job to start earlier?
If there are enough free resources, your job will start in a few seconds. If the machine is heavily used by other users (or by your own jobs), Slurm will execute your job as soon as enough resources become free. Make sure you request only the resources you need: the more you ask for, the longer you and other users will wait.
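The sketch below illustrates what a modest, explicit resource request might look like in a job file. All values (30 minutes, 2 CPUs, 8 GB of RAM, 1 GPU) are assumptions chosen for illustration, not recommendations for this cluster, and the `--gres=gpu:1` syntax assumes GPUs are exposed as a generic resource named `gpu`. The `echo` line is a placeholder for your real workload.

```shell
#!/bin/bash
#SBATCH --job-name=example      # hypothetical job name
#SBATCH --time=00:30:00         # run-time limit (assumed value: 30 minutes)
#SBATCH --cpus-per-task=2       # CPU cores per task (assumed value)
#SBATCH --mem=8G                # RAM per node (assumed value)
#SBATCH --gres=gpu:1            # one GPU (assumes a GRES named "gpu")

# Run from the directory the job was submitted from
# (falls back to the current directory outside Slurm).
workdir="${SLURM_SUBMIT_DIR:-$PWD}"
cd "$workdir" || exit 1

# Placeholder workload: replace with your actual command.
echo "Job ${SLURM_JOB_ID:-unknown} running in $workdir"
```

Submit the file with `sbatch`. Smaller requests are generally easier for the scheduler to fit into gaps between larger jobs, so start from a conservative estimate and raise it only if the job needs more.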
Why was my job killed?
The answer depends on several factors, but in most cases the job is killed because it exceeds a resource limit specified in the job file (run time, memory, or both).
For instance, an error like this:
srun: error: gpu: task 0: Bus error (core dumped)
usually means that the job ran out of RAM during execution.
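One way to check which limit a finished job hit is the `sacct` command, which compares what the job used with what it requested. This is a sketch that assumes job accounting is enabled on the cluster. `job_summary` is a hypothetical helper name, and the field list is only one reasonable selection.

```shell
# Summarise how a finished job ended and what it used versus what it requested.
# Usage: job_summary <jobid>
job_summary() {
    sacct --jobs="$1" \
          --format=JobID,JobName,State,ExitCode,Elapsed,Timelimit,ReqMem,MaxRSS
}
```

In the output, a `State` of `TIMEOUT` indicates the run-time limit was reached, and `OUT_OF_MEMORY` indicates the memory limit was exceeded. Comparing `Elapsed` with `Timelimit` and `MaxRSS` with `ReqMem` can help you size the next request.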
A job can also be terminated if its resource allocation is incorrect and this prevents other users from getting their fair access to the cluster. This can be avoided by explicitly requesting an adequate amount of resources in terms of time, GPUs, CPUs and RAM. Please refer to the dedicated sections of this guide.