Queues & Partitions

The Odin cluster has multiple SLURM partitions (queues) configured with dynamic scaling: compute nodes are launched on demand when jobs are queued and released when idle.

Partition Summary

Queue            Instance Type  vCPUs  Memory   GPUs      Max Nodes
cpu              c7i.8xlarge    32     ~61 GB   None      10
gpu-inferencing  g5.8xlarge     32     ~122 GB  1× A10G   5
odin             p5.48xlarge    192    ~1.9 TB  8× H100   2
albus            p5.48xlarge    192    ~1.9 TB  8× H100   2
bali             p5.48xlarge    192    ~1.9 TB  8× H100   2
genius           p5.48xlarge    192    ~1.9 TB  8× H100   2

CPU Partition (Default)

Optimized for compute-intensive CPU workloads.

Property         Value
Instance         c7i.8xlarge
vCPUs per node   32
Memory per node  ~61 GB
Max nodes        10 (320 total vCPUs)
Use cases        Data preprocessing, CPU training, batch processing

Submit a CPU job:

sbatch --partition=cpu --nodes=1 --ntasks=8 my-cpu-job.sh
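
For reference, the batch script itself might look like the sketch below; my-cpu-job.sh is shown with placeholder contents (preprocess.py stands in for your actual workload):

#!/bin/bash
#SBATCH --job-name=cpu-preprocess   # name shown in squeue
#SBATCH --partition=cpu             # c7i.8xlarge CPU nodes
#SBATCH --nodes=1
#SBATCH --ntasks=8                  # 8 tasks on one 32-vCPU node
#SBATCH --time=02:00:00             # wall-clock limit; adjust as needed

# srun launches one copy of the command per task (8 copies here);
# replace preprocess.py with your own preprocessing script.
srun python preprocess.py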

GPU Inferencing Partition

Cost-effective GPU option for inference and smaller training jobs.

Property         Value
Instance         g5.8xlarge
vCPUs per node   32
Memory per node  ~122 GB
GPU              1× NVIDIA A10G (24 GB VRAM)
Max nodes        5 (5 GPUs total)
Use cases        Inference, small training jobs

Submit a GPU inference job:

sbatch --partition=gpu-inferencing --nodes=1 --gres=gpu:1 my-inference-job.sh
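
A minimal sketch of the corresponding batch script, assuming a placeholder infer.py as the inference workload:

#!/bin/bash
#SBATCH --job-name=inference
#SBATCH --partition=gpu-inferencing   # g5.8xlarge, one A10G per node
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1                  # request the node's single A10G
#SBATCH --time=01:00:00

# Confirm the GPU is visible inside the allocation, then run the
# placeholder inference command.
nvidia-smi
python infer.py --batch-size 32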

H100 Partitions (odin, albus, bali, genius)

High-performance partitions for large-model and multi-node distributed training.

Property         Value
Instance         p5.48xlarge
vCPUs per node   192
Memory per node  ~1.9 TB
GPUs             8× NVIDIA H100 (80 GB VRAM each)
Interconnect     High-bandwidth NVLink
Max nodes        2 per partition (16 H100s each)
Use cases        Large-scale training, distributed training

Submit an H100 job:

sbatch --partition=odin --nodes=1 --gres=gpu:8 my-training-job.sh

Multi-node H100 job:

sbatch --partition=odin --nodes=2 --gres=gpu:8 --ntasks-per-node=8 distributed-training.sh
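
As a sketch, the batch script behind that multi-node command could look like the following; train.py is a placeholder and is assumed to read SLURM environment variables (e.g. SLURM_PROCID and SLURM_LOCALID) to set up its distributed process group:

#!/bin/bash
#SBATCH --job-name=dist-train
#SBATCH --partition=odin            # or albus, bali, genius
#SBATCH --nodes=2
#SBATCH --gres=gpu:8                # all 8 H100s on each node
#SBATCH --ntasks-per-node=8         # one task per GPU
#SBATCH --cpus-per-task=24          # 192 vCPUs / 8 tasks per node
#SBATCH --time=12:00:00

# srun starts 16 tasks (2 nodes × 8 per node); each task should bind to
# one GPU, e.g. via SLURM_LOCALID inside the placeholder training script.
srun python train.py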

Viewing Queue Status

View all partitions:

sinfo

Detailed partition info (partition, node count, state, time limit, CPUs per node, GRES, memory, and node list):

sinfo -o '%P %.5D %.6t %.10l %.6c %.6G %.8m %N'

View specific partition:

sinfo -p odin
sinfo -p gpu-inferencing
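
To see the jobs running or waiting in a partition (as opposed to the node view from sinfo):

squeue -p odin
squeue -u $USER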

Queue Selection Guide

Workload Type                   Recommended Partition
Data preprocessing              cpu
Small model training            cpu or gpu-inferencing
Model inference                 gpu-inferencing
Large model training            odin, albus, bali, or genius
Multi-GPU distributed training  odin, albus, bali, or genius
Hyperparameter search           cpu (parallel runs; see the array example below)
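
The "parallel runs" for a hyperparameter search are typically expressed as a SLURM job array, sketched below; sweep.py, the 0-19 index range, and the resource numbers are placeholders:

#!/bin/bash
#SBATCH --job-name=hp-search
#SBATCH --partition=cpu
#SBATCH --array=0-19               # 20 independent trials, indexed 0..19
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=04:00:00

# Each array task runs one trial; sweep.py is a placeholder that maps
# SLURM_ARRAY_TASK_ID to a hyperparameter configuration.
python sweep.py --trial "$SLURM_ARRAY_TASK_ID"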

Notes

  • Default Queue: Jobs without --partition use the cpu queue
  • GPU Resources: Always request GPUs with --gres=gpu:N
  • Job Priority: All partitions have equal priority (PriorityJobFactor=1)
  • Time Limits: No explicit default is configured, so it is good practice to set one with --time (see the example below)
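
For example, to cap a CPU job at four hours (the time value is illustrative):

sbatch --partition=cpu --time=04:00:00 --ntasks=8 my-cpu-job.sh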