Odin HPC Cluster Documentation
Welcome to the documentation for the Odin HPC (High-Performance Computing) cluster. This guide covers getting access, connecting to the login nodes and data managers, working with the storage systems, and running SLURM jobs.
Quick Links
| Resource | Description |
|---|---|
| Getting Started | New user setup and SSH configuration |
| Cluster Access | Connect to login nodes and data managers |
| Storage Systems | FSx Lustre, S3 integration, and data management |
| SLURM Jobs | Submit and manage compute jobs |
Cluster Overview
The Odin HPC cluster provides:
- High-performance compute with CPU and GPU nodes
- NVIDIA H100 GPUs for AI/ML workloads (p5.48xlarge instances)
- NVIDIA A10G GPUs for inference (g5.8xlarge instances)
- FSx Lustre storage with automatic S3 synchronization
- GxP-compliant data management workflows
- SLURM scheduler for job management
Infrastructure Highlights
graph TB
User[User Workstation] -->|SSH via VPN| JH[Jump Host]
JH --> LN1[Login Node 1]
JH --> LN2[Login Node 2]
JH --> DML[Data Manager Linux]
JH --> DMW[Data Manager Windows]
LN1 --> SLURM[SLURM Controller]
LN2 --> SLURM
SLURM --> CPU[CPU Nodes<br/>c7i.8xlarge]
SLURM --> GPU[GPU Inference<br/>g5.8xlarge]
SLURM --> H100[H100 Nodes<br/>p5.48xlarge]
subgraph Storage
FSX1["/mnt/odin<br/>12TB"]
FSX2["/mnt/qcs<br/>1.2TB"]
FSX3["/mnt/gxp<br/>1.2TB"]
S3[(S3 Buckets)]
end
LN1 --> FSX1
LN1 --> FSX2
DML --> FSX3
FSX1 <-->|Auto Sync| S3
FSX2 <-->|Auto Sync| S3
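
The access path in the diagram (workstation, then jump host, then login node) maps onto OpenSSH's `ProxyJump` option. The following is a minimal sketch that renders an `~/.ssh/config` fragment for that path. Every hostname, host alias, and the username are hypothetical placeholders invented for illustration, not the cluster's real values; the Getting Started guide is the authority for those.

```python
# Hypothetical placeholders: replace with the values from the Getting Started guide.
JUMP_ALIAS = "odin-jump"
JUMP_HOST = "jump.odin.example.com"
LOGIN_NODES = {
    "odin-login1": "login1.odin.example.com",
    "odin-login2": "login2.odin.example.com",
}


def host_block(alias, hostname, user, proxy=None):
    """Build one 'Host' stanza; add ProxyJump when the host sits behind the jump host."""
    lines = [f"Host {alias}", f"    HostName {hostname}", f"    User {user}"]
    if proxy is not None:
        lines.append(f"    ProxyJump {proxy}")
    return "\n".join(lines)


def render_ssh_config(user):
    """Return a config fragment: the jump host first, then each login node routed through it."""
    blocks = [host_block(JUMP_ALIAS, JUMP_HOST, user)]
    for alias, hostname in LOGIN_NODES.items():
        blocks.append(host_block(alias, hostname, user, proxy=JUMP_ALIAS))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_ssh_config("your-username"))
```

With a fragment like this in place, `ssh odin-login1` would hop through the jump host in a single command. The data managers sit behind the same jump host in the diagram, so they could be added as further entries in the same way.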
Key Resources
| Component | Instance Type | Resources |
|---|---|---|
| Jump Host | c6i.xlarge | 4 vCPU, 8GB RAM |
| Login Nodes | c6i.2xlarge | 8 vCPU, 16GB RAM |
| CPU Compute | c7i.8xlarge | 32 vCPU, 61GB RAM |
| GPU Inference | g5.8xlarge | 32 vCPU, 122GB RAM, 1× A10G |
| H100 Training | p5.48xlarge | 192 vCPU, 1.9TB RAM, 8× H100 |
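
As a rough illustration of how the table can guide job sizing, the sketch below encodes the three compute node types and picks the first one that satisfies a request. The figures are copied from the table above (with 1.9TB written as 1900 GB); the `NodeType` class and `pick_node_type` function are hypothetical helpers, not part of any cluster tooling, and the actual SLURM partition names are documented elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeType:
    name: str
    instance_type: str
    vcpus: int
    ram_gb: int
    gpus: int
    gpu_model: Optional[str]


# Compute node types from the Key Resources table, ordered smallest to largest.
NODE_TYPES = [
    NodeType("CPU Compute", "c7i.8xlarge", 32, 61, 0, None),
    NodeType("GPU Inference", "g5.8xlarge", 32, 122, 1, "A10G"),
    NodeType("H100 Training", "p5.48xlarge", 192, 1900, 8, "H100"),
]


def pick_node_type(vcpus, ram_gb, gpus=0, gpu_model=None):
    """Return the first listed node type that fits the request, or None if none does."""
    for node in NODE_TYPES:
        if gpu_model is not None and node.gpu_model != gpu_model:
            continue  # caller asked for a specific GPU model
        if node.vcpus >= vcpus and node.ram_gb >= ram_gb and node.gpus >= gpus:
            return node
    return None


if __name__ == "__main__":
    choice = pick_node_type(vcpus=8, ram_gb=64, gpus=1)
    print(choice.instance_type if choice else "no single node fits this request")
```

Because the list is ordered from smallest to largest, a first-fit search favors the least capable node that still meets the request, which helps keep the H100 nodes free for jobs that need them.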
Getting Help
If you run into a problem, try the following in order:
- Search this documentation for the topic or error message
- Review your SSH and user configuration
- Contact the Odin infrastructure team on Slack: #qcs-infra-notification
New user? Start with the Getting Started guide to set up your access.