SLURM Accounting Overview

The ODIN cluster uses SLURM’s built-in accounting system to track job execution, resource usage, and project-level billing across multiple accounts.

Key Concepts

Accounts vs Projects

  • Account: A billing/tracking unit (qcs, albus, bali, genius, odin)
  • User: Individual user assigned to one or more accounts
  • Project: Internal grouping within ODIN for resource tracking

Accounts Configured

Account  Purpose                   Instance Type  Nodes
qcs      General purpose CPU work  C7i.8xlarge    10 nodes (32 CPUs, 62 GB RAM each)
albus    GPU research              P5.48xlarge    2 nodes (192 CPUs, 8x H100, 2 TB RAM each)
bali     GPU research              P5.48xlarge    2 nodes (192 CPUs, 8x H100, 2 TB RAM each)
genius   GPU research              P5.48xlarge    2 nodes (192 CPUs, 8x H100, 2 TB RAM each)
odin     ODIN project              P5.48xlarge    2 nodes (192 CPUs, 8x H100, 2 TB RAM each)

Accounting Architecture

┌─────────────────────────────────────────────────┐
│        SLURM Controller (Headnode)              │
│  - Job scheduling                               │
│  - Resource allocation                          │
│  - Accounting log file (accounting.log)         │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│    SLURM Database Daemon (slurmdbd)             │
│  - Port: 6819 (localhost)                       │
│  - Syncs accounting logs to MariaDB             │
│  - Sync interval: ~5 minutes                    │
└─────────────────┬───────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────┐
│    MariaDB Accounting Database                  │
│  - Database: slurm_acct_db                      │
│  - Tables: jobs_table, accounts_table, etc.     │
│  - Query interface: sreport                     │
└─────────────────────────────────────────────────┘
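
To confirm that the controller is actually pointed at slurmdbd as the diagram describes, the accounting storage settings can be pulled out of the running configuration. The following is a minimal sketch: the helper name accounting_settings is hypothetical, and it assumes the usual "Key = value" layout of `scontrol show config` output.

```shell
# Print the accounting storage type, host, and port as Key=value lines.
# Input (stdin): the output of `scontrol show config`.
# The function name is illustrative and not part of SLURM.
accounting_settings() {
  awk -F' *= *' '/^AccountingStorage(Type|Host|Port)/ { print $1 "=" $2 }'
}

# Typical use on the headnode:
#   scontrol show config | accounting_settings
```

On a cluster laid out as above, the port would be expected to be 6819 and the host localhost.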

What Is Tracked

Allocated Resources (At Job Submission)

  • CPUs: Number of CPUs allocated to the job
  • Memory: Amount of memory requested
  • GPUs: Number and type of GPUs requested (if any)
  • Walltime: Maximum allowed job duration

Execution Information

  • Start Time: When job actually started
  • End Time: When job completed
  • Elapsed Time: Actual runtime
  • State: Final job status (COMPLETED, FAILED, TIMEOUT, CANCELLED, etc.)
  • Exit Code: Job’s exit code

NOT Directly Tracked

  • Actual CPU %: Real CPU utilization percentage
  • Actual Memory %: Real memory utilization
  • Actual GPU %: Real GPU utilization percentage

Note: For actual utilization metrics, use CloudWatch, NVIDIA GPU monitoring tools, or job profiling tools.
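
The allocation and execution items listed above correspond to sacct format fields such as AllocCPUS, ReqMem, Timelimit, Start, End, Elapsed, State, and ExitCode. As a small illustration, the sketch below tallies final job states from sacct output; the helper name count_states is hypothetical, and it assumes pipe-delimited input with State in the second column.

```shell
# Tally final job states.
# Input (stdin): "JobID|State|ExitCode" lines, e.g. from
#   sacct -X --noheader --parsable2 --format=JobID,State,ExitCode
# Output: "STATE count", sorted by state name.
count_states() {
  awk -F'|' '
    NF >= 2 {
      # Cancelled jobs are reported as "CANCELLED by <uid>"; keep the first word only.
      split($2, words, " ")
      n[words[1]]++
    }
    END { for (s in n) printf "%s %d\n", s, n[s] }
  ' | sort
}
```

ExitCode is reported as "exitcode:signal", so a value such as 0:0 indicates a clean exit.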

Tools for Accounting

sacct - Direct Access

  • Reads accounting logs directly
  • Immediate availability for recently completed jobs
  • Per-job detailed information
  • Command: sacct [options]

sreport - Aggregated Reports

  • Queries the accounting database
  • Provides summary/aggregate views
  • ~5 minute sync delay from job completion
  • Better for historical analysis and trends
  • Command: sreport [report-type] [options]
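
As an illustration of an aggregated report, the sketch below wraps the sreport AccountUtilizationByUser cluster report for a date range and a set of accounts. The function name usage_report and the SREPORT override variable are assumptions introduced here for illustration (the override makes a dry run possible); the dates in the usage line are placeholders.

```shell
# Run an account utilization report, with usage shown in hours.
#   $1 = start date (YYYY-MM-DD)
#   $2 = end date   (YYYY-MM-DD)
#   $3 = comma-separated account list
# SREPORT is a hypothetical override; set SREPORT=echo to print the
# arguments instead of running the report.
usage_report() {
  "${SREPORT:-sreport}" -t hours cluster AccountUtilizationByUser \
    start="$1" end="$2" accounts="$3"
}

# Example (placeholder dates):
#   usage_report 2024-01-01 2024-02-01 qcs,albus,bali,genius,odin
```

Because of the sync delay noted above, jobs that finished in the last few minutes may not yet appear in the report.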

Common Use Cases

Check Current Usage by Account

# One row per job allocation, for all users in the listed accounts
sacct --allusers --allocations \
      --starttime "$(date -d '30 days ago' '+%Y-%m-%d')" \
      --accounts=qcs,albus,bali,genius,odin \
      --format="Account,User,JobID,CPUTime,AllocCPUS"
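
sacct reports one row per job rather than per-account totals, so totals have to be computed from its output. A minimal sketch, assuming pipe-delimited input with the CPUTimeRAW field (CPU time in seconds); the helper name summarize_by_account is hypothetical.

```shell
# Sum per-job CPU time into per-account totals.
# Input (stdin): "Account|User|CPUTimeRAW" lines, e.g. from
#   sacct --allusers -X --noheader --parsable2 \
#         --starttime "$(date -d '30 days ago' '+%Y-%m-%d')" \
#         --accounts=qcs,albus,bali,genius,odin \
#         --format=Account,User,CPUTimeRAW
# Output: "account job_count cpu_hours", sorted by account name.
summarize_by_account() {
  awk -F'|' '
    NF >= 3 && $1 != "" {
      jobs[$1]++        # one input line per job allocation (-X)
      secs[$1] += $3    # CPUTimeRAW is in seconds
    }
    END {
      for (a in jobs) printf "%s %d %.1f\n", a, jobs[a], secs[a] / 3600
    }
  ' | sort
}
```

CPU time here is allocated CPUs multiplied by elapsed time, so it reflects what was reserved, not what was actually used.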

Track GPU Usage

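# AllocTRES lists allocated GPUs as gres/gpu=N (newer SLURM releases no longer provide AllocGRES)
sacct --allusers --allocations \
      --starttime "$(date -d '7 days ago' '+%Y-%m-%d')" \
      --accounts=albus,bali,genius,odin \
      --format="Account,User,JobName,AllocTRES,CPUTime,State"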
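
To turn per-job GPU allocations into GPU-hours per account, the GPU count can be multiplied by the elapsed time. The sketch below assumes pipe-delimited input with the AllocTRES and ElapsedRaw (seconds) fields, and that GPUs appear in AllocTRES as a generic gres/gpu=N entry; the helper name gpu_hours_by_account is hypothetical.

```shell
# Compute allocated GPU-hours per account.
# Input (stdin): "Account|AllocTRES|ElapsedRaw" lines, e.g. from
#   sacct --allusers -X --noheader --parsable2 \
#         --accounts=albus,bali,genius,odin \
#         --format=Account,AllocTRES,ElapsedRaw
# Output: "account gpu_hours", sorted by account name.
gpu_hours_by_account() {
  awk -F'|' '
    {
      gpus = 0
      n = split($2, tres, ",")
      for (i = 1; i <= n; i++) {
        # Use only the generic gres/gpu=N entry; typed entries such as
        # gres/gpu:h100=N are ignored so GPUs are not counted twice.
        if (tres[i] ~ /^gres\/gpu=/) {
          split(tres[i], kv, "=")
          gpus = kv[2] + 0
        }
      }
      if (gpus > 0) hours[$1] += gpus * $3 / 3600
    }
    END { for (a in hours) printf "%s %.1f\n", a, hours[a] }
  ' | sort
}
```

Jobs without a GPU entry in AllocTRES contribute nothing, so CPU-only accounts do not appear in the output.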

Estimate Costs by Account

# See Usage Tracking section for cost estimation formulas

Getting Help

See the related sections: