Odin HPC Cluster Documentation

Welcome to the Odin HPC (High Performance Computing) cluster documentation. This guide will help you get started with the cluster and make the most of its capabilities.

| Resource | Description |
|---|---|
| Getting Started | New user setup and SSH configuration |
| Cluster Access | Connect to login nodes and data managers |
| Storage Systems | FSx Lustre, S3 integration, and data management |
| SLURM Jobs | Submit and manage compute jobs |

Cluster Overview

The Odin HPC cluster provides:

  • High-performance compute with CPU and GPU nodes
  • NVIDIA H100 GPUs for AI/ML workloads (p5.48xlarge instances)
  • NVIDIA A10G GPUs for inference (g5.8xlarge instances)
  • FSx Lustre storage with automatic S3 synchronization
  • GxP-compliant data management workflows
  • SLURM scheduler for job management
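Jobs are submitted to the SLURM scheduler as batch scripts. A minimal sketch of a CPU job script follows; the partition name `cpu` is an assumption — check `sinfo` on a login node for the partitions actually configured on Odin:

```shell
#!/bin/bash
#SBATCH --job-name=hello-odin       # job name shown in squeue
#SBATCH --partition=cpu             # assumed partition name; verify with sinfo
#SBATCH --nodes=1                   # a single compute node
#SBATCH --cpus-per-task=8           # 8 of the node's vCPUs
#SBATCH --time=00:10:00             # wall-clock limit (HH:MM:SS)
#SBATCH --output=%x-%j.out          # log file: <job-name>-<job-id>.out

# The #SBATCH lines above are scheduler directives (comments to the shell);
# everything below runs as an ordinary script on the allocated node.
echo "Running on $(hostname) with ${SLURM_CPUS_PER_TASK:-8} CPUs"
```

Submit with `sbatch hello.sh` and monitor with `squeue -u $USER`.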

Infrastructure Highlights

```mermaid
graph TB
    User[User Workstation] -->|SSH via VPN| JH[Jump Host]
    JH --> LN1[Login Node 1]
    JH --> LN2[Login Node 2]
    JH --> DML[Data Manager Linux]
    JH --> DMW[Data Manager Windows]
    LN1 --> SLURM[SLURM Controller]
    LN2 --> SLURM
    SLURM --> CPU[CPU Nodes<br/>c7i.8xlarge]
    SLURM --> GPU[GPU Inference<br/>g5.8xlarge]
    SLURM --> H100[H100 Nodes<br/>p5.48xlarge]
    subgraph Storage
        FSX1["/mnt/odin<br/>12TB"]
        FSX2["/mnt/qcs<br/>1.2TB"]
        FSX3["/mnt/gxp<br/>1.2TB"]
        S3[(S3 Buckets)]
    end
    LN1 --> FSX1
    LN1 --> FSX2
    DML --> FSX3
    FSX1 <-->|Auto Sync| S3
    FSX2 <-->|Auto Sync| S3
```
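All access flows through the jump host, so an SSH `ProxyJump` entry lets you reach a login node in one command. A sketch of a `~/.ssh/config` — the hostnames below are placeholders, not the real addresses; substitute the values from your onboarding details:

```
# ~/.ssh/config — illustrative hostnames only; replace with the real ones
Host odin-jump
    HostName jump.odin.example.com    # placeholder jump-host address
    User <your-username>

Host odin-login1
    HostName login1.odin.internal     # placeholder login-node address
    User <your-username>
    ProxyJump odin-jump               # tunnel through the jump host
```

With this in place, `ssh odin-login1` handles the two-hop connection transparently.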

Key Resources

| Component | Instance Type | Resources |
|---|---|---|
| Jump Host | c6i.xlarge | 4 vCPU, 8 GB RAM |
| Login Nodes | c6i.2xlarge | 8 vCPU, 16 GB RAM |
| CPU Compute | c7i.8xlarge | 32 vCPU, 61 GB RAM |
| GPU Inference | g5.8xlarge | 32 vCPU, 122 GB RAM, 1× A10G |
| H100 Training | p5.48xlarge | 192 vCPU, 1.9 TB RAM, 8× H100 |
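GPUs are typically requested through SLURM's generic-resource (`--gres`) mechanism. A sketch for a single-A10G inference job; the partition name `gpu` and the gres label are assumptions — confirm them with `sinfo` and `scontrol show node`:

```shell
#!/bin/bash
#SBATCH --job-name=infer-test
#SBATCH --partition=gpu             # assumed partition for g5.8xlarge nodes
#SBATCH --gres=gpu:1                # one A10G; H100 nodes expose up to gpu:8
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --time=00:30:00

# Print the GPU the scheduler assigned; fall back gracefully on hosts
# without the NVIDIA driver (e.g. when testing this script locally).
nvidia-smi --query-gpu=name,memory.total --format=csv 2>/dev/null \
    || echo "no GPU visible on this host"
```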

Getting Help

  1. Check this documentation
  2. Review your SSH and user configuration
  3. Contact the Odin infrastructure team on Slack: #qcs-infra-notification

New User? Start with the Getting Started Guide to set up your access.