# Queues & Partitions
The Odin cluster has multiple SLURM partitions (queues) configured with dynamic scaling.
## Partition Summary
| Queue | Instance Type | vCPUs | Memory | GPUs | Max Nodes |
|---|---|---|---|---|---|
| cpu | c7i.8xlarge | 32 | ~61 GB | None | 10 |
| gpu-inferencing | g5.8xlarge | 32 | ~122 GB | 1× A10G | 5 |
| odin | p5.48xlarge | 192 | ~1.9 TB | 8× H100 | 2 |
| albus | p5.48xlarge | 192 | ~1.9 TB | 8× H100 | 2 |
| bali | p5.48xlarge | 192 | ~1.9 TB | 8× H100 | 2 |
| genius | p5.48xlarge | 192 | ~1.9 TB | 8× H100 | 2 |
## CPU Partition (Default)
Optimized for compute-intensive CPU workloads.
| Property | Value |
|---|---|
| Instance | c7i.8xlarge |
| vCPUs per node | 32 |
| Memory per node | ~61 GB |
| Max nodes | 10 (320 total vCPUs) |
| Use cases | Data preprocessing, CPU training, batch processing |
Submit a CPU job:
```bash
sbatch --partition=cpu --nodes=1 --ntasks=8 my-cpu-job.sh
```
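A minimal sketch of what `my-cpu-job.sh` might contain; the job name, time limit, and `preprocess.py` are illustrative placeholders, not cluster defaults:

```bash
#!/bin/bash
#SBATCH --job-name=cpu-preprocess   # illustrative name
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=02:00:00             # assumed limit; adjust to your workload

# srun launches one copy of the command per allocated task slot
srun python preprocess.py           # preprocess.py is a placeholder
```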
## GPU Inferencing Partition
Cost-effective GPU option for inference and smaller training jobs.
| Property | Value |
|---|---|
| Instance | g5.8xlarge |
| vCPUs per node | 32 |
| Memory per node | ~122 GB |
| GPU | 1× NVIDIA A10G (24GB VRAM) |
| Max nodes | 5 (5 GPUs total) |
| Use cases | Inference, small training jobs |
Submit a GPU inference job:
```bash
sbatch --partition=gpu-inferencing --nodes=1 --gres=gpu:1 my-inference-job.sh
```
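A minimal sketch of what `my-inference-job.sh` might contain; the job name, time limit, and `run_inference.py` are illustrative placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=inference        # illustrative name
#SBATCH --partition=gpu-inferencing
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00             # assumed limit; adjust to your workload

# Confirm the A10G is visible to the job before running inference
nvidia-smi
python run_inference.py             # run_inference.py is a placeholder
```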
## H100 Partitions (odin, albus, bali, genius)
High-performance partitions for large-scale and distributed model training.
| Property | Value |
|---|---|
| Instance | p5.48xlarge |
| vCPUs per node | 192 |
| Memory per node | ~1.9 TB |
| GPUs | 8× NVIDIA H100 (80GB VRAM each) |
| Interconnect | High-bandwidth NVLink |
| Max nodes | 2 per partition (16 H100s each) |
| Use cases | Large-scale training, distributed training |
Submit an H100 job:
```bash
sbatch --partition=odin --nodes=1 --gres=gpu:8 my-training-job.sh
```
Multi-node H100 job:
```bash
sbatch --partition=odin --nodes=2 --gres=gpu:8 --ntasks-per-node=8 distributed-training.sh
```
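A minimal sketch of what `distributed-training.sh` might contain, assuming one task per GPU (matching `--ntasks-per-node=8` above); `train.py` and the time limit are illustrative placeholders:

```bash
#!/bin/bash
#SBATCH --partition=odin
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8         # one task per H100
#SBATCH --gres=gpu:8
#SBATCH --time=12:00:00             # assumed limit; adjust to your workload

# srun launches 16 ranks (2 nodes x 8 tasks); train.py is a placeholder
# that should read SLURM_PROCID / SLURM_NTASKS to initialize its
# distributed backend (e.g. torch.distributed)
srun python train.py
```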
## Viewing Queue Status
View all partitions:
```bash
sinfo
```
Detailed partition info:
```bash
sinfo -o '%P %.5D %.6t %.10l %.6c %.6G %.8m %N'
```
View specific partition:
```bash
sinfo -p odin
sinfo -p gpu-inferencing
```
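To see queued and running jobs alongside node state, standard SLURM tools like `squeue` complement `sinfo`; the partition name below is just one of the cluster's queues:

```bash
# Jobs currently queued or running on a partition
squeue -p odin

# Refresh cluster-wide partition state every 5 seconds
watch -n 5 sinfo
```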
## Queue Selection Guide
| Workload Type | Recommended Partition |
|---|---|
| Data preprocessing | cpu |
| Small model training | cpu or gpu-inferencing |
| Model inference | gpu-inferencing |
| Large model training | odin, albus, bali, or genius |
| Multi-GPU distributed training | odin, albus, bali, or genius |
| Hyperparameter search | cpu (parallel runs) |
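For hyperparameter searches, the "parallel runs" in the table are typically expressed as a SLURM job array. A minimal sketch, assuming a sweep script that maps an index to a configuration; `sweep.py`, its flag, and the array size are illustrative:

```bash
#!/bin/bash
#SBATCH --partition=cpu
#SBATCH --array=0-9                 # 10 parallel runs, one per configuration
#SBATCH --ntasks=1
#SBATCH --time=04:00:00             # assumed limit; adjust to your workload

# SLURM_ARRAY_TASK_ID selects which hyperparameter setting this run uses
python sweep.py --config-index "$SLURM_ARRAY_TASK_ID"
```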
## Notes
- Default Queue: Jobs submitted without `--partition` use the `cpu` queue
- GPU Resources: Always request GPUs with `--gres=gpu:N`
- Job Priority: All partitions have equal priority (`PriorityJobFactor=1`)
- Time Limits: No explicit default is set, but it is good practice to specify one
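A wall-clock limit is set with sbatch's standard `--time` flag; the value below is illustrative:

```bash
# Set a 4-hour wall-clock limit at submission time
sbatch --partition=cpu --time=04:00:00 my-cpu-job.sh
```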