Odin HPC Cluster Documentation

Welcome to the Odin HPC (High Performance Computing) cluster documentation. This guide will help you get started with the cluster and make the most of its capabilities.

| Resource | Description |
|---|---|
| Getting Started | New user setup and SSH configuration |
| Cluster Access | Connect to login nodes and data managers |
| Storage Systems | FSx Lustre, S3 integration, and data management |
| SLURM Jobs | Submit and manage compute jobs |

Cluster Overview

The Odin HPC cluster provides:

  • High-performance compute with CPU and GPU nodes
  • NVIDIA H100 GPUs for AI/ML workloads (p5.48xlarge instances)
  • NVIDIA A10G GPUs for inference (g5.8xlarge instances)
  • FSx Lustre storage with automatic S3 synchronization
  • GxP-compliant data management workflows
  • SLURM scheduler for job management
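Jobs are submitted to the SLURM scheduler as batch scripts. A minimal sketch of a CPU job script follows; the partition name `cpu` is an assumption — check `sinfo` on a login node for the partitions actually configured on Odin:

```shell
#!/bin/bash
#SBATCH --job-name=hello-odin       # job name shown in squeue
#SBATCH --partition=cpu             # assumed partition name; verify with sinfo
#SBATCH --nodes=1                   # a single compute node
#SBATCH --cpus-per-task=8           # 8 of the node's vCPUs
#SBATCH --time=00:10:00             # wall-clock limit (HH:MM:SS)
#SBATCH --output=%x-%j.out          # log file: <job-name>-<job-id>.out

# The #SBATCH lines above are scheduler directives (comments to the shell);
# everything below runs as an ordinary script on the allocated node.
echo "Running on $(hostname) with ${SLURM_CPUS_PER_TASK:-8} CPUs"
```

Submit with `sbatch hello.sh` and monitor with `squeue -u $USER`.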

Infrastructure Highlights

```mermaid
graph TB
    User[User Workstation] -->|SSH via VPN| JH[Jump Host]
    JH --> LN1[Login Node 1]
    JH --> LN2[Login Node 2]
    JH --> DML[Data Manager Linux]
    JH --> DMW[Data Manager Windows]
    LN1 --> SLURM[SLURM Controller]
    LN2 --> SLURM
    SLURM --> CPU[CPU Nodes<br/>c7i.8xlarge]
    SLURM --> GPU[GPU Inference<br/>g5.8xlarge]
    SLURM --> H100[H100 Nodes<br/>p5.48xlarge]
    subgraph Storage
        FSX1["/mnt/odin<br/>12TB"]
        FSX2["/mnt/qcs<br/>1.2TB"]
        FSX3["/mnt/gxp<br/>1.2TB"]
        S3[(S3 Buckets)]
    end
    LN1 --> FSX1
    LN1 --> FSX2
    DML --> FSX3
    FSX1 <-->|Auto Sync| S3
    FSX2 <-->|Auto Sync| S3
```
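All access flows through the jump host, so an SSH `ProxyJump` entry lets you reach a login node in one command. A sketch of a `~/.ssh/config` — the hostnames below are placeholders, not the real addresses; substitute the values from your onboarding details:

```
# ~/.ssh/config — illustrative hostnames only; replace with the real ones
Host odin-jump
    HostName jump.odin.example.com    # placeholder jump-host address
    User <your-username>

Host odin-login1
    HostName login1.odin.internal     # placeholder login-node address
    User <your-username>
    ProxyJump odin-jump               # tunnel through the jump host
```

With this in place, `ssh odin-login1` handles the two-hop connection transparently.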

Key Resources

| Component | Instance Type | Resources |
|---|---|---|
| Jump Host | c6i.xlarge | 4 vCPU, 8 GB RAM |
| Login Nodes | c6i.2xlarge | 8 vCPU, 16 GB RAM |
| CPU Compute | c7i.8xlarge | 32 vCPU, 61 GB RAM |
| GPU Inference | g5.8xlarge | 32 vCPU, 122 GB RAM, 1× A10G |
| H100 Training | p5.48xlarge | 192 vCPU, 1.9 TB RAM, 8× H100 |
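GPUs are typically requested through SLURM's generic-resource (`--gres`) mechanism. A sketch for a single-A10G inference job; the partition name `gpu` and the gres label are assumptions — confirm them with `sinfo` and `scontrol show node`:

```shell
#!/bin/bash
#SBATCH --job-name=infer-test
#SBATCH --partition=gpu             # assumed partition for g5.8xlarge nodes
#SBATCH --gres=gpu:1                # one A10G; H100 nodes expose up to gpu:8
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --time=00:30:00

# Print the GPU the scheduler assigned; fall back gracefully on hosts
# without the NVIDIA driver (e.g. when testing this script locally).
nvidia-smi --query-gpu=name,memory.total --format=csv 2>/dev/null \
    || echo "no GPU visible on this host"
```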

Getting Help

  1. Check this documentation
  2. Review your SSH and user configuration
  3. Contact the Odin infrastructure team on Slack: #qcs-infra-notification

New User? Start with the Getting Started Guide to set up your access.