Cluster Access

This guide explains how to connect to the components of the Odin HPC cluster: the jump host, the login nodes, and the data managers.

Quick Connection Reference

# Jump host (gateway)
ssh jump-host

# Login nodes (recommended for interactive work)
ssh login1
ssh login2

# Data managers
ssh data-manager-linux
ssh data-manager-windows
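The short names above (jump-host, login1, and so on) are not DNS names, so they presumably come from Host entries in your local ~/.ssh/config. The following is a minimal sketch of a function that prints such entries, assuming the internal hosts are reached through the jump host with ProxyJump. The function name odin_ssh_config is hypothetical, and the host names are the ones listed elsewhere in this guide.

```shell
#!/usr/bin/env bash
# Sketch: print SSH client config entries for the aliases used in this guide.
# Assumption: internal hosts are reached via the jump host (ProxyJump).
# odin_ssh_config is a hypothetical helper name, not part of any Odin tooling.
odin_ssh_config() {
    local user="$1"
    cat <<EOF
Host jump-host
    HostName jump.odin.navify.com
    User ${user}

EOF
    # Linux hosts behind the jump host share the same pattern.
    local alias
    for alias in login1 login2 data-manager-linux; do
        cat <<EOF
Host ${alias}
    HostName ${alias}.odin.cluster.local
    User ${user}
    ProxyJump jump-host

EOF
    done
    # The Windows data manager is accessed as Administrator in this guide.
    cat <<EOF
Host data-manager-windows
    HostName data-manager-windows.odin.cluster.local
    User Administrator
    ProxyJump jump-host
EOF
}
```

Running odin_ssh_config YOUR_USERNAME prints the entries to standard output. Review them before adding anything to ~/.ssh/config, since your existing configuration may already define these hosts.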

⚠️ Important: Use the login nodes for all interactive SSH sessions, job submission, and VS Code Remote connections. The head node is reserved for SLURM scheduler operations and has limited memory (4 GB). VS Code Remote sessions on the head node can cause cluster-wide outages.

Infrastructure Details

Jump Host

Property       Value
DNS            jump.odin.navify.com
Instance Type  c6i.xlarge
vCPUs          4
Memory         8 GB
Network        12.5 Gbps
Purpose        Secure gateway for SSH access

Login Nodes (login1, login2)

Property       Value
DNS            login1.odin.cluster.local, login2.odin.cluster.local
Instance Type  c6i.2xlarge
vCPUs          8
Memory         16 GB
Network        12.5 Gbps
Purpose        User SSH access, job submission, VS Code Remote
FSx Mounts     /mnt/odin, /mnt/qcs, /mnt/gxp
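
After logging in to a login node, it can be useful to confirm that the FSx file systems listed above are actually mounted. A minimal sketch, assuming the mountpoint utility from util-linux is available on the login nodes (check_fsx_mounts is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Sketch: report which of the FSx mount points listed above are mounted.
# Run on a login node. check_fsx_mounts is a hypothetical helper name.
check_fsx_mounts() {
    local missing=0 mnt
    for mnt in /mnt/odin /mnt/qcs /mnt/gxp; do
        # mountpoint -q exits 0 only if the path is a mount point.
        if mountpoint -q "$mnt"; then
            echo "OK      $mnt"
        else
            echo "MISSING $mnt"
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns non-zero if any of the three paths is not a mount point, so it can be used as a quick pre-flight check before starting work.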

Head Node (SLURM Controller)

Property       Value
Instance Type  c6i.xlarge
vCPUs          4
Memory         4 GB
Purpose        SLURM scheduler and job management only

⚠️ Do not SSH directly to the head node. Use the login nodes instead.

Direct Connection from Jump Host

If you’re already on the jump host, connect directly using stable hostnames:

# Login nodes (recommended)
ssh YOUR_USERNAME@login1.odin.cluster.local
ssh YOUR_USERNAME@login2.odin.cluster.local

# Data managers
ssh YOUR_USERNAME@data-manager-linux.odin.cluster.local
ssh Administrator@data-manager-windows.odin.cluster.local

These *.odin.cluster.local hostnames provide stable addresses that won’t change when instances are rebuilt.
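
Because either login node can be used, a small helper can pick whichever one is reachable. The sketch below tries login1 and then login2 and prints the first that accepts a non-interactive SSH connection. It assumes it runs on the jump host and that key-based authentication is set up; first_login_node is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the first login node that accepts an SSH connection.
# Assumptions: run on the jump host, key-based authentication is set up
# (BatchMode disables password prompts). first_login_node is hypothetical.
first_login_node() {
    local user="$1" host
    for host in login1.odin.cluster.local login2.odin.cluster.local; do
        # Run the no-op command "true" remotely as a cheap liveness check.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true 2>/dev/null; then
            echo "$host"
            return 0
        fi
    done
    echo "no login node reachable" >&2
    return 1
}
```

A possible use is ssh "YOUR_USERNAME@$(first_login_node YOUR_USERNAME)", which opens a session on whichever node answered first.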

DNS Resolution

The jump host is configured to use the VPC DNS resolver, which automatically resolves names in private Route 53 hosted zones. If you experience DNS issues, verify the configuration:

ssh jump-host "resolvectl status"
ssh jump-host "nslookup login1.odin.cluster.local"
ssh jump-host "nslookup login2.odin.cluster.local"
ssh jump-host "nslookup data-manager-linux.odin.cluster.local"
ssh jump-host "nslookup data-manager-windows.odin.cluster.local"

The resolvectl status output should show the VPC DNS resolver (10.0.0.2) as the jump host's primary DNS server.
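
The checks above can be combined into two small helpers that run on the jump host. This is a sketch under assumptions: the function names are hypothetical, the resolver check expects the "DNS Servers:" and "Current DNS Server:" lines that systemd-resolved normally prints, and name lookups go through getent rather than nslookup so that the system resolver configuration is used.

```shell
#!/usr/bin/env bash
# Sketch: helpers for the DNS checks described above. Run on the jump host.
# Function names are hypothetical.

# Succeeds if resolvectl-style output on stdin lists 10.0.0.2 as a DNS server.
uses_vpc_dns() {
    grep -Eq 'DNS Servers?:.*[[:space:]]10\.0\.0\.2([[:space:]]|$)'
}

# Prints one line per hostname; returns non-zero if any name fails to resolve.
check_odin_dns() {
    local failed=0 name
    for name in login1 login2 data-manager-linux data-manager-windows; do
        # getent hosts resolves through the system resolver configuration.
        if getent hosts "${name}.odin.cluster.local" >/dev/null; then
            echo "OK   ${name}.odin.cluster.local"
        else
            echo "FAIL ${name}.odin.cluster.local"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, resolvectl status | uses_vpc_dns && check_odin_dns first confirms the resolver and then tests each hostname.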

Troubleshooting

Check if Instances are Running

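# From your local machine: print the head node and data manager outputs from Terraform state.
# These outputs do not by themselves confirm that the instances are running.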
cd /path/to/odin-infra/odin/terraform
terraform output odin_pcluster_headnode
terraform output data_manager_linux_private_ip
terraform output data_manager_windows_private_ip
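
Terraform outputs come from the saved state, so they show what was provisioned and not the current instance state. One way to check the state is to pass a private IP output to the AWS CLI. The sketch below assumes the AWS CLI is installed with credentials for the account that hosts Odin, and that the named output holds a single private IP address; instance_state is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: look up the EC2 state of the instance behind a Terraform output.
# Assumptions: the AWS CLI is installed with suitable credentials, and the
# named output holds a single private IP address.
# instance_state is a hypothetical helper name.
instance_state() {
    local output_name="$1" ip
    # -raw prints the output value without quotes or formatting.
    ip="$(terraform output -raw "$output_name")" || return 1
    aws ec2 describe-instances \
        --filters "Name=private-ip-address,Values=${ip}" \
        --query 'Reservations[].Instances[].State.Name' \
        --output text
}
```

Run it from the Terraform directory, for example instance_state data_manager_linux_private_ip. It is only meant for the *_private_ip outputs, since the format of odin_pcluster_headnode is not described in this guide.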

Test Connectivity from Jump Host

# Test login nodes
ssh jump-host "ping -c 3 login1.odin.cluster.local"
ssh jump-host "ssh -o StrictHostKeyChecking=no YOUR_USERNAME@login1.odin.cluster.local 'hostname'"

# Test data managers
ssh jump-host "ping -c 3 data-manager-linux.odin.cluster.local"
ssh jump-host "ping -c 3 data-manager-windows.odin.cluster.local"
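
A failed ping does not always mean a host is down, because ICMP may be filtered while SSH is still allowed (whether Odin's security groups permit ICMP is not stated in this guide). As a complementary check, the sketch below probes TCP port 22 using bash's /dev/tcp redirection and the coreutils timeout command. It is meant to run on the jump host, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check TCP reachability of the SSH port, for hosts where ICMP (ping)
# may be filtered. Run on the jump host. Function names are hypothetical.

# Succeeds if a TCP connection to host:port opens within 3 seconds.
probe_tcp() {
    local host="$1" port="$2"
    # In the child shell, $0 is the host and $1 is the port.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Probes port 22 on each Odin host and prints one status line per host.
ssh_port_report() {
    local failed=0 name
    for name in login1 login2 data-manager-linux data-manager-windows; do
        if probe_tcp "${name}.odin.cluster.local" 22; then
            echo "OK   ${name}.odin.cluster.local:22"
        else
            echo "FAIL ${name}.odin.cluster.local:22"
            failed=1
        fi
    done
    return "$failed"
}
```

A host that fails ping but passes this probe is reachable for SSH, which points to ICMP filtering and not an outage.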

Verify User Account

ssh jump-host "id YOUR_USERNAME"
ssh login1 "id YOUR_USERNAME"
ssh data-manager-linux "id YOUR_USERNAME"
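
Beyond checking that the account exists, it can help to confirm that it maps to the same numeric UID on every Linux host, since file ownership on shared file systems is tracked by UID. A minimal sketch, assuming the SSH aliases used in this guide work from your local machine with key-based authentication; check_uid_consistency is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: confirm a user resolves to the same numeric UID on every Linux host.
# Assumption: the SSH aliases from this guide work from your local machine.
# check_uid_consistency is a hypothetical helper name.
check_uid_consistency() {
    local user="$1" host uid first="" status=0
    for host in jump-host login1 login2 data-manager-linux; do
        # id -u prints the numeric UID, or fails if the account does not exist.
        if ! uid="$(ssh -o BatchMode=yes "$host" id -u "$user" 2>/dev/null)"; then
            echo "MISSING  $host"
            status=1
            continue
        fi
        echo "uid=$uid  $host"
        # Remember the first UID seen and compare later hosts against it.
        : "${first:=$uid}"
        [ "$uid" = "$first" ] || status=1
    done
    return "$status"
}
```

The function returns non-zero if the account is missing on any host or if the UIDs differ between hosts.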

Next Steps