SLURM Accounting Overview
The ODIN cluster uses SLURM’s built-in accounting system to track job execution, resource usage, and project-level billing across multiple accounts.
Key Concepts
Accounts vs Projects
- Account: A billing/tracking unit (qcs, albus, bali, genius, odin)
- User: Individual user assigned to one or more accounts
- Project: Internal grouping within ODIN for resource tracking
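SLURM stores the account/user relationship described above as associations. The following is a minimal sketch for listing the accounts a given user belongs to, assuming `sacctmgr` is available on the headnode. The helper name `accounts_for_user` and the user `alice` are illustrative and not part of SLURM or ODIN.

```shell
# List the accounts a user is associated with.
# Expects "account|user" lines on stdin, as produced by sacctmgr's
# parsable output. Account-level rows have an empty user field and are skipped.
accounts_for_user() {
  awk -F'|' -v u="$1" '$2 == u { print $1 }' | sort -u
}

# On a live cluster (hypothetical user name):
#   sacctmgr -nP show associations format=account,user | accounts_for_user alice
```

Filtering in a small helper rather than with `sacctmgr` arguments keeps the parsing step easy to check against saved output.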
Accounts Configured
| Account | Purpose | Instance Type | Nodes |
|---|---|---|---|
| qcs | General purpose CPU work | C7i.8xlarge | 10 nodes (32 CPUs, 62GB RAM each) |
| albus | GPU research | P5.48xlarge | 2 nodes (192 CPUs, 8x H100, 2TB RAM each) |
| bali | GPU research | P5.48xlarge | 2 nodes (192 CPUs, 8x H100, 2TB RAM each) |
| genius | GPU research | P5.48xlarge | 2 nodes (192 CPUs, 8x H100, 2TB RAM each) |
| odin | ODIN project | P5.48xlarge | 2 nodes (192 CPUs, 8x H100, 2TB RAM each) |
Accounting Architecture
┌─────────────────────────────────────────────────┐
│ SLURM Controller (Headnode) │
│ - Job scheduling │
│ - Resource allocation │
│ - Accounting log file (accounting.log) │
└─────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ SLURM Database Daemon (slurmdbd) │
│ - Port: 6819 (localhost) │
│ - Syncs accounting logs to MariaDB │
│ - Sync interval: ~5 minutes │
└─────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ MariaDB Accounting Database │
│ - Database: slurm_acct_db │
│ - Tables: <cluster>_job_table, acct_table, etc. │
│ - Query interface: sreport │
└─────────────────────────────────────────────────┘
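One way to confirm that the controller is wired to slurmdbd as drawn above is to inspect the running configuration. This sketch assumes `scontrol show config` prints `Key = Value` lines; the helper name `slurm_setting` is illustrative.

```shell
# Print the value of a single setting from `scontrol show config` output
# supplied on stdin. Lines look like "AccountingStoragePort   = 6819".
slurm_setting() {
  awk -F' *= *' -v key="$1" '$1 == key { print $2; exit }'
}

# On the headnode:
#   scontrol show config | slurm_setting AccountingStorageType
#   scontrol show config | slurm_setting AccountingStoragePort
```

If the architecture matches the diagram, the port should come back as 6819 and the storage type should name the slurmdbd plugin.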
What Is Tracked
Allocated Resources (At Job Submission)
- CPUs: Number of CPUs allocated to the job
- Memory: Amount of memory requested
- GPUs: Number and type of GPUs requested (if any)
- Walltime: Maximum allowed job duration
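The allocated resources listed above are exposed by `sacct` as a comma-separated TRES string (for example in the `AllocTRES` field). The sketch below pulls one value out of such a string. It assumes GPUs are recorded under the `gres/gpu` key, which depends on the cluster's `AccountingStorageTRES` setting; `tres_value` is an illustrative helper name.

```shell
# Extract one resource count from a TRES string such as
# "billing=192,cpu=192,gres/gpu=8,mem=2000G,node=1".
# Prints 0 when the requested key is absent.
tres_value() {
  printf '%s\n' "$1" | tr ',' '\n' |
    awk -F'=' -v k="$2" '$1 == k { print $2; found = 1 }
                         END { if (!found) print 0 }'
}

# On a live cluster, with <jobid> replaced by a real job ID:
#   sacct -X -nP -j <jobid> --format=AllocTRES
```

For example, `tres_value "cpu=192,gres/gpu=8,node=1" gres/gpu` prints the GPU count for that allocation.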
Execution Information
- Start Time: When job actually started
- End Time: When job completed
- Elapsed Time: Actual runtime
- State: Final job status (COMPLETED, FAILED, TIMEOUT, CANCELLED, etc.)
- Exit Code: Job’s exit code
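`sacct` prints elapsed time as `[DD-]HH:MM:SS`, which is awkward to sum. The following sketch converts that format to seconds; `elapsed_to_seconds` is an illustrative helper name, not a SLURM command.

```shell
# Convert sacct's Elapsed format ([DD-]HH:MM:SS) to a number of seconds.
elapsed_to_seconds() {
  printf '%s\n' "$1" | awk '{
    days = 0; t = $0
    # An optional "DD-" prefix carries the number of whole days.
    if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
    n = split(t, p, ":")
    if (n == 3)      secs = p[1] * 3600 + p[2] * 60 + p[3]
    else if (n == 2) secs = p[1] * 60 + p[2]
    else             secs = p[1]
    print days * 86400 + secs
  }'
}
```

When only the number is needed, requesting the `ElapsedRaw` field from `sacct` avoids the conversion entirely.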
NOT Directly Tracked
- Actual CPU %: Real CPU utilization percentage
- Actual Memory %: Real memory utilization
- Actual GPU %: Real GPU utilization percentage
Note: For actual utilization metrics, use CloudWatch, NVIDIA GPU monitoring tools, or job profiling tools.
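As one example of an NVIDIA monitoring tool, `nvidia-smi` can report per-GPU utilization from inside a running job. This sketch averages those readings. It assumes `nvidia-smi` is installed on the GPU nodes and is run on the node where the job executes; `avg_gpu_util` is an illustrative helper name.

```shell
# Average GPU utilization (percent) from one-number-per-line input,
# as produced by:
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
avg_gpu_util() {
  awk '{ sum += $1; n++ }
       END { if (n) printf "%.1f\n", sum / n; else print "n/a" }'
}

# Inside a job on a GPU node:
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | avg_gpu_util
```

This is a point-in-time sample, so it complements rather than replaces the allocation data SLURM records.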
Tools for Accounting
sacct - Direct Access
- Reads accounting logs directly
- Immediate availability for recently completed jobs
- Per-job detailed information
- Command:
sacct [options]
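A typical per-job lookup might look like the sketch below. The wrapper name `job_summary` and the choice of fields are illustrative; the flags (`-j`, `--allocations`, `--noheader`, `--parsable2`, `--format`) are standard `sacct` options.

```shell
# Print one pipe-delimited line of accounting data for a job ID.
# --allocations limits output to the job itself, omitting individual steps.
job_summary() {
  sacct -j "$1" --allocations --noheader --parsable2 \
    --format=JobID,Account,User,State,ExitCode,Elapsed,AllocCPUS
}

# Usage on the cluster (hypothetical job ID):
#   job_summary 12345
```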
sreport - Aggregated Reports
- Queries the accounting database
- Provides summary/aggregate views
- ~5 minute sync delay from job completion
- Better for historical analysis and trends
- Command:
sreport [report-type] [options]
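For aggregated views, a per-account utilization report is a common starting point. The sketch below wraps `sreport cluster AccountUtilizationByUser`; the wrapper name `account_report` and the dates in the usage line are illustrative.

```shell
# Report usage per account and user between two dates, in hours.
#   $1 = start date (YYYY-MM-DD), $2 = end date, $3 = comma-separated accounts
account_report() {
  sreport -t Hours cluster AccountUtilizationByUser \
    start="$1" end="$2" Accounts="$3"
}

# Usage on the cluster (example dates):
#   account_report 2024-01-01 2024-02-01 qcs,albus,bali,genius,odin
```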
Common Use Cases
Check Current Usage by Account
sacct --allusers --allocations \
  --starttime "$(date -d '30 days ago' '+%Y-%m-%d')" \
  --accounts=qcs,albus,bali,genius,odin \
  --format="Account,User,JobID,CPUTime,AllocCPUS"
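`sacct` prints one row per job, so account-level totals need a small aggregation step. This sketch sums CPU-hours per account from parsable output. It assumes the `CPUTimeRAW` field (CPU time in seconds) is requested; `cpu_hours_by_account` is an illustrative helper name.

```shell
# Sum CPU-hours per account from "Account|CPUTimeRAW" lines on stdin.
# Rows with an empty account field are ignored.
cpu_hours_by_account() {
  awk -F'|' '$1 != "" { secs[$1] += $2 }
             END { for (a in secs) printf "%s %.1f\n", a, secs[a] / 3600 }' |
    sort
}

# On a live cluster:
#   sacct --allusers --allocations --noheader --parsable2 \
#     --starttime "$(date -d '30 days ago' '+%Y-%m-%d')" \
#     --format=Account,CPUTimeRAW | cpu_hours_by_account
```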
Track GPU Usage
sacct --allusers --allocations \
  --starttime "$(date -d '7 days ago' '+%Y-%m-%d')" \
  --accounts=albus,bali,genius,odin \
  --format="Account,User,JobName,AllocTRES%40,CPUTime,State"
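GPU usage per account can be approximated as allocated GPUs multiplied by elapsed time. The sketch below computes GPU-hours from parsable `sacct` output. It assumes GPUs appear in the TRES string as `gres/gpu=N`, which depends on the cluster's `AccountingStorageTRES` configuration; `gpu_hours_by_account` is an illustrative helper name.

```shell
# Sum GPU-hours per account from "Account|AllocTRES|ElapsedRaw" lines on stdin.
# Jobs without a gres/gpu entry contribute nothing.
gpu_hours_by_account() {
  awk -F'|' '{
      gpus = 0
      n = split($2, tres, ",")
      for (i = 1; i <= n; i++)
        if (tres[i] ~ /^gres\/gpu=/) { split(tres[i], kv, "="); gpus = kv[2] }
      if (gpus > 0) hours[$1] += gpus * $3 / 3600
    }
    END { for (a in hours) printf "%s %.1f\n", a, hours[a] }' | sort
}

# On a live cluster:
#   sacct --allusers --allocations --noheader --parsable2 \
#     --starttime "$(date -d '7 days ago' '+%Y-%m-%d')" \
#     --format=Account,AllocTRES,ElapsedRaw | gpu_hours_by_account
```

These are allocated GPU-hours, not a measure of how busy the GPUs actually were.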
Estimate Costs by Account
# See Usage Tracking section for cost estimation formulas
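The actual cost formulas live in the Usage Tracking section. Purely as an illustration of the general shape, the sketch below multiplies resource-hours by an hourly rate supplied by the caller. Both the formula and the helper name `estimate_cost` are assumptions, and no ODIN rates are implied.

```shell
# Hypothetical estimate: cost = resource-hours x rate per resource-hour.
#   $1 = resource-hours (e.g. CPU-hours or GPU-hours), $2 = hourly rate
estimate_cost() {
  awk -v hours="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", hours * rate }'
}

# Example with placeholder numbers only:
#   estimate_cost 9 2.5
```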
Getting Help
See the related sections:
- Setup & Architecture - Infrastructure setup and configuration
- Account Management - Create and manage accounts
- User Management - Manage user access to accounts
- Usage Tracking - Query and analyze usage patterns