SLURM Account Management

Creating a New Account

Edit odin/terraform/main.tf and add an entry to the slurm_accounts map:

slurm_accounts = {
  # ... existing accounts ...
  
  my_project = {
    name              = "my_project"
    description       = "My research project"
    max_cpus_per_user = 192  # Max CPUs per user in this account
    max_jobs_per_user = 50   # Max concurrent jobs per user
  }
}

# Also add the account to the slurm_projects list so that its partition is created
slurm_projects = ["qcs", "albus", "bali", "genius", "odin", "my_project"]

Then deploy:

cd odin/terraform
terraform plan
terraform apply
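
After terraform apply completes, it can be worth confirming that the account actually reached the accounting database. The following is a minimal sketch, assuming sacctmgr lives at /opt/slurm/bin as elsewhere in this guide. account_exists is a hypothetical helper name, not part of SLURM or the Terraform configuration.

```shell
# account_exists NAME
# Reads one account name per line on stdin and succeeds if NAME is listed.
# -F: fixed string, -x: whole-line match, -q: quiet (exit status only).
account_exists() {
  grep -Fxq -- "$1"
}

# Usage on the headnode (requires sacctmgr):
#   /opt/slurm/bin/sacctmgr --noheader --parsable2 show account format=Account \
#     | account_exists my_project && echo "my_project is present"
```

Matching the whole line avoids a false positive when one account name is a prefix of another (for example my_project and my_project_old).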

Manual Creation on Cluster (Temporary)

SSH to the headnode and use sacctmgr:

ssh headnode

# Create new account
sudo /opt/slurm/bin/sacctmgr add account my_project description="My research project"

# Verify creation
/opt/slurm/bin/sacctmgr show account my_project

# Set account limits (optional); sacctmgr options are separated by spaces, not commas
sudo /opt/slurm/bin/sacctmgr modify account my_project \
  set MaxCpus=192 MaxJobs=50

Modifying Account Settings

CPU Limits

sudo /opt/slurm/bin/sacctmgr modify account my_project \
  set MaxCpus=256

Job Limits

sudo /opt/slurm/bin/sacctmgr modify account my_project \
  set MaxJobs=75

Memory Limits

sudo /opt/slurm/bin/sacctmgr modify account my_project \
  set MaxTRES=mem=1000000  # In MB; memory is limited through the mem TRES

Listing Accounts

List all accounts

/opt/slurm/bin/sacctmgr show account

Example output:

Account    Descr      Org
---------- ---------- ----------
qcs        General c  CPU
albus      GPU res
bali       GPU res
genius     GPU res
odin       ODIN pr

List with resource limits

/opt/slurm/bin/sacctmgr show account WithAssoc format=Account,Description,MaxCpus,MaxJobs

List account usage

sacct --allusers --allocations \
      --starttime "$(date -d '30 days ago' '+%Y-%m-%d')" \
      --accounts=my_project \
      --format="Account,User,CPUTimeRAW" \
      --noheader --parsable2 | awk -F'|' '
      {
        # One line per job (--allocations hides job steps).
        # CPUTimeRAW is already in seconds, so no HH:MM:SS parsing is needed.
        total_jobs[$1]++;
        total_cpu_hours[$1] += $3 / 3600;
      }
      END {
        for (a in total_jobs)
          printf "%s: %d jobs, %.1f CPU-hours\n", a, total_jobs[a], total_cpu_hours[a];
      }'
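
sacct prints formatted durations such as CPUTime as [DD-[HH:]]MM:SS, so a job that accumulates more than a day of CPU time carries a day prefix (for example 1-02:03:04). The following is a minimal sketch of converting such a value to seconds. slurm_time_to_seconds is a hypothetical helper name introduced here for illustration.

```shell
# slurm_time_to_seconds DURATION
# Converts [DD-[HH:]]MM:SS (as printed by sacct) to a number of seconds.
slurm_time_to_seconds() {
  awk -v t="$1" 'BEGIN {
    days = 0
    # Split off an optional "DD-" day prefix.
    if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
    n = split(t, p, ":")
    if (n == 3)      secs = p[1] * 3600 + p[2] * 60 + p[3]
    else if (n == 2) secs = p[1] * 60 + p[2]
    else             secs = p[1]
    printf "%d\n", days * 86400 + secs
  }'
}
```

Where the raw field is available, requesting CPUTimeRAW from sacct sidesteps this parsing entirely, since it is reported directly in seconds.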

Deleting an Account

Via Terraform

Remove the entry from the slurm_accounts map in odin/terraform/main.tf and from the slurm_projects list, then run terraform apply.

Manual Deletion

ssh headnode

# Delete account (also removes its user associations; the users themselves remain)
sudo /opt/slurm/bin/sacctmgr delete account my_project

# Confirm deletion
/opt/slurm/bin/sacctmgr show account my_project

Cost Estimation

CPU-Hours Calculation

sacct --allusers --allocations \
      --starttime "$(date -d '30 days ago' '+%Y-%m-%d')" \
      --accounts=my_project \
      --format="Account,CPUTimeRAW" \
      --noheader --parsable2 | awk -F'|' '
      {
        # CPUTimeRAW is reported in seconds
        cpu_hours += $2 / 3600;
      }
      END {
        printf "Total CPU-hours: %.1f\n", cpu_hours;
        printf "Est. cost @ $0.40/CPU-hour: $%.2f\n", cpu_hours * 0.40;
      }'
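
The estimate itself is plain arithmetic: CPU-seconds divided by 3600, multiplied by the hourly rate. For example, a job holding 192 CPUs for 10 hours uses 1920 CPU-hours, which at $0.40 per CPU-hour comes to $768.00. The following is a small sketch of that calculation. estimate_cost is a hypothetical helper, and the rate is passed in rather than hard-coded.

```shell
# estimate_cost CPU_SECONDS RATE_PER_CPU_HOUR
# Prints the estimated cost with two decimal places.
estimate_cost() {
  awk -v secs="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", secs / 3600 * rate }'
}

# 192 CPUs * 10 hours * 3600 seconds = 6912000 CPU-seconds
estimate_cost 6912000 0.40
```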

GPU Usage Estimation

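# Check GPU allocations by account
# (AllocGRES was removed from recent sacct releases; AllocTRES lists GPUs as gres/gpu=N)
sacct --allusers --allocations \
      --starttime "$(date -d '30 days ago' '+%Y-%m-%d')" \
      --accounts=my_project \
      --format="Account,AllocTRES,CPUTime" \
      --parsable2 | grep gpu | head -10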
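
To turn those rows into GPU-hours, the GPU count has to be multiplied by the job's elapsed time. The following is a rough sketch, assuming a TRES-style allocation string such as billing=8,cpu=8,gres/gpu=2,mem=64G,node=1 and an elapsed time in seconds (for instance from sacct's ElapsedRaw field). gpu_count and gpu_hours are hypothetical helper names.

```shell
# gpu_count TRES_STRING
# Prints the value of the untyped gres/gpu=N entry, or 0 if there is none.
# Typed entries such as gres/gpu:a100=N are deliberately ignored here.
gpu_count() {
  printf '%s\n' "$1" | awk -F',' '{
    n = 0
    for (i = 1; i <= NF; i++)
      if ($i ~ /^gres\/gpu=/) { split($i, kv, "="); n = kv[2] }
    print n + 0
  }'
}

# gpu_hours TRES_STRING ELAPSED_SECONDS
# GPU-hours = number of GPUs * elapsed seconds / 3600
gpu_hours() {
  awk -v g="$(gpu_count "$1")" -v s="$2" 'BEGIN { printf "%.1f\n", g * s / 3600 }'
}
```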

Account Hierarchy

SLURM accounts support parent-child relationships:

# Create parent account
sudo /opt/slurm/bin/sacctmgr add account parent_account

# Create child account
sudo /opt/slurm/bin/sacctmgr add account child_account parent=parent_account

# Set a limit on the parent; child accounts inherit it unless they set their own
sudo /opt/slurm/bin/sacctmgr modify account parent_account \
  set MaxCpus=500
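
When checking which limits a child account inherits, it helps to see its chain of parents. The following is a minimal sketch, assuming pipe-delimited Account|ParentName lines on stdin (for example from sacctmgr --noheader --parsable2 show assoc format=Account,ParentName). account_ancestors is a hypothetical helper name.

```shell
# account_ancestors ACCOUNT
# Reads "account|parent" lines on stdin and prints ACCOUNT followed by its parents.
account_ancestors() {
  awk -F'|' -v acct="$1" '
    $2 != "" { parent[$1] = $2 }   # rows with an empty parent are skipped
    END {
      chain = acct
      # seen[] guards against looping forever on malformed (cyclic) input
      while ((acct in parent) && !(acct in seen)) {
        seen[acct] = 1
        acct = parent[acct]
        chain = chain " <- " acct
      }
      print chain
    }'
}
```

Fed the parent_account and child_account pair created above, account_ancestors child_account would print the child followed by parent_account and then whatever account sits above it.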

Troubleshooting

Cannot create account

# Verify slurmdbd is running
systemctl is-active slurmdbd

# Check database connectivity
mysql -u root slurm_acct_db -e "SELECT 1;" 2>&1

Account created but users can’t submit jobs

  • Verify that the users are associated with the account (see User Management)
  • Check the account's MaxCpus limit: /opt/slurm/bin/sacctmgr show account WithAssoc
  • Check that the partition is accepting jobs: sinfo
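
The first check above can be scripted. The following is a minimal sketch, assuming pipe-delimited Account|User lines on stdin (for example from sacctmgr --noheader --parsable2 show assoc format=Account,User). user_in_account is a hypothetical helper name.

```shell
# user_in_account USER ACCOUNT
# Succeeds if stdin contains an "ACCOUNT|USER" association line.
user_in_account() {
  awk -F'|' -v u="$1" -v a="$2" '
    $1 == a && $2 == u { found = 1 }
    END { exit !found }'
}

# Usage on the headnode (requires sacctmgr):
#   /opt/slurm/bin/sacctmgr --noheader --parsable2 show assoc format=Account,User \
#     | user_in_account alice my_project || echo "alice is not in my_project"
```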

Next Steps