SLURM User Management

Adding Users to Accounts

Users are defined in odin/terraform/users.yaml and synchronized automatically to AWS Secrets Manager. The cluster reads those secrets and provisions the users.

1. Edit users.yaml

# odin/terraform/users.yaml
users:
  john_doe:
    name: "John Doe"
    email: "john.doe@roche.com"
    project: "odin"  # Project/account assignment
    
  jane_smith:
    name: "Jane Smith"
    email: "jane.smith@roche.com"
    project: "albus"  # Another project
    
  bob_jones:
    name: "Bob Jones"
    email: "bob.jones@roche.com"
    project: "qcs"  # CPU project

2. Commit and Push

cd odin/terraform
git add users.yaml
git commit -m "Add new users to cluster"
git push origin main

3. GitHub Action Triggers Automatically

The .github/workflows/update-users.yml workflow:

  1. Detects changes to odin/terraform/users.yaml
  2. Extracts user information and project assignments
  3. Creates/updates AWS Secrets Manager secrets with user config
  4. Cluster reads secrets and provisions users with correct account assignment
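
Step 2 of the workflow (extracting users and their project assignments) can be sketched as follows. This is a minimal illustration that assumes the flat users.yaml schema shown in step 1; the function name list_user_projects and the awk-based parsing are hypothetical and not taken from the actual workflow, which may well use a proper YAML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<username> <project>" for each entry in users.yaml.
# Assumes usernames are indented by two spaces under "users:" and that each
# user has a double-quoted "project:" value indented by four spaces.
list_user_projects() {
  awk '
    /^  [A-Za-z_][A-Za-z0-9_-]*:[ \t]*$/ {
      # A username line such as "  john_doe:"
      user = $1
      sub(/:$/, "", user)
      next
    }
    /^    project:/ && user != "" {
      # The project value sits between the first pair of double quotes
      split($0, parts, "\"")
      print user, parts[2]
    }
  ' "$1"
}
```

Running list_user_projects odin/terraform/users.yaml on the example file above would print one line per user, such as "john_doe odin".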

4. Verify User Creation

ssh headnode

# List all users
getent passwd | grep -v "root\|slurm\|daemon" | head -20

# Check user's account association
/opt/slurm/bin/sacctmgr show user john_doe

Manual User Creation (Temporary)

To provision a user immediately (before the next Terraform apply):

ssh headnode

# Add user to system
sudo useradd -m -s /bin/bash john_doe
sudo passwd john_doe

# Add user to SLURM account
sudo /opt/slurm/bin/sacctmgr add user john_doe account=odin

# Set as default account for user
sudo /opt/slurm/bin/sacctmgr modify user john_doe set DefaultAccount=odin

Managing User Account Assignments

List User Account Memberships

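/opt/slurm/bin/sacctmgr show user john_doe WithAssoc format=User,DefaultAccount,Account,Admin

Example output (illustrative):

      User   Def Acct    Account     Admin
---------- ---------- ---------- ---------
  john_doe      albus      albus      None
  john_doe      albus     genius      None
  john_doe      albus       odin      None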

Add User to Additional Account

sudo /opt/slurm/bin/sacctmgr add user john_doe account=bali

Remove User from Account

sudo /opt/slurm/bin/sacctmgr delete user john_doe account=qcs

Change Default Account

# The user must already belong to the account
sudo /opt/slurm/bin/sacctmgr modify user john_doe set DefaultAccount=odin

User Resource Limits

Set per-user limits within an account:

CPU Limits

# Per-job CPU cap; use GrpTRES=cpu=N to cap the total across the user's running jobs
sudo /opt/slurm/bin/sacctmgr modify user john_doe account=odin \
  set MaxTRES=cpu=96

Job Limits

sudo /opt/slurm/bin/sacctmgr modify user john_doe account=odin \
  set MaxJobs=20

Memory Limits

sudo /opt/slurm/bin/sacctmgr modify user john_doe account=odin \
  set MaxTRES=mem=500000  # In MB, per job

Viewing Limits

/opt/slurm/bin/sacctmgr show user john_doe WithAssoc format=User,Account,MaxTRES,MaxJobs
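
For scripting, the same limits can be read in machine-readable form. The sketch below assumes sacctmgr's --noheader and --parsable2 options and the TRES-based MaxTRES field; the function name show_limits and the SACCTMGR override variable are hypothetical conveniences, not part of the cluster setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one readable line per account association of a user.
# SACCTMGR may be overridden if sacctmgr lives elsewhere.
SACCTMGR="${SACCTMGR:-/opt/slurm/bin/sacctmgr}"

show_limits() {
  # --parsable2 yields pipe-separated fields without a trailing delimiter
  "$SACCTMGR" --noheader --parsable2 show assoc user="$1" \
              format=Account,MaxTRES,MaxJobs \
    | awk -F'|' '{
        tres = ($2 == "" ? "unset" : $2)
        jobs = ($3 == "" ? "unset" : $3)
        printf "%s: MaxTRES=%s MaxJobs=%s\n", $1, tres, jobs
      }'
}
```

Calling show_limits john_doe prints one line per account, with "unset" wherever no limit is configured on that association.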

User Job Submission

Check Available Accounts

# As the user:
/opt/slurm/bin/sacctmgr show assoc user=$(whoami) format=User,Account
# or, including the default account
sacctmgr show user $(whoami) WithAssoc format=User,DefaultAccount,Account

Submit Job to Specific Account

# Use default account
sbatch my_script.sh

# Use specific account
sbatch --account=odin my_script.sh

# Verify the job was submitted to the correct account (%a is the account column)
squeue -u $(whoami) -o "%.10i %.20j %.10a %.10T"
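
The account can also be set inside the batch script with an #SBATCH directive, so it does not have to be passed on every submission. A minimal sketch of what my_script.sh might contain; the job name and time limit are placeholder assumptions:

```shell
#!/usr/bin/env bash
# Charge this job to the odin account
#SBATCH --account=odin
# Placeholder job name and time limit
#SBATCH --job-name=account-test
#SBATCH --time=00:05:00

# Slurm sets SLURM_JOB_ACCOUNT inside a job; outside a job it is unset.
msg="host=${HOSTNAME:-unknown} account=${SLURM_JOB_ACCOUNT:-none}"
echo "$msg"
```

Options given on the sbatch command line take precedence over #SBATCH directives, so sbatch --account=albus my_script.sh would still submit to albus.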

Listing Users by Account

All Users in Account

/opt/slurm/bin/sacctmgr show assoc account=odin format=Account,User

Users with Job History

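# Job count per user over an example 30-day window. sacct has no per-user
# aggregate field, and without --starttime it only reports jobs since midnight.
sacct --allusers --accounts=odin \
      --starttime "$(date -d '30 days ago' '+%Y-%m-%d')" \
      --allocations --noheader --parsable2 --format=User \
  | sort | uniq -c | sort -rn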

Active Users (With Recent Jobs)

sacct --allusers --allocations --noheader --parsable2 \
      --starttime "$(date -d '7 days ago' '+%Y-%m-%d')" \
      --accounts=odin \
      --format="User,Account" | sort -u
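
The active-user query can be wrapped in a small function for reuse. This is a sketch under a few assumptions: GNU date is available, the caller is allowed to see other users' jobs, and the names active_users and SACCT are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct users who ran jobs in an account
# during the last N days (default 7). SACCT may be overridden if sacct is
# not on PATH.
SACCT="${SACCT:-sacct}"

active_users() {
  local account="$1" days="${2:-7}" since
  since="$(date -d "${days} days ago" '+%Y-%m-%d')"   # GNU date syntax
  "$SACCT" --allusers --allocations --noheader --parsable2 \
           --starttime "$since" --accounts="$account" --format=User \
    | awk 'NF' | sort -u    # drop blank lines, then de-duplicate
}
```

For example, active_users odin 30 lists everyone who ran a job under odin in the last 30 days.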

User Onboarding Checklist

  • User account created in AWS Secrets Manager (via users.yaml)
  • User SSH key uploaded to headnode
  • User added to SLURM account
  • User’s default account set correctly
  • User resource limits configured (if needed)
  • User can SSH to headnode
  • User can submit test job
  • User can view job in squeue
  • Job completes and appears in sacct
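
Several checklist items can be verified from the headnode without submitting a job. The sketch below checks the system account, the SLURM association, and the default account; the function name check_onboarding and the SACCTMGR override variable are hypothetical, and SSH access and the test job still need to be confirmed separately.

```shell
#!/usr/bin/env bash
# Hypothetical onboarding check for the items that do not require a job.
SACCTMGR="${SACCTMGR:-/opt/slurm/bin/sacctmgr}"

check_onboarding() {
  local user="$1" account="$2" status=0 assoc default

  # Is the system account provisioned?
  if getent passwd "$user" >/dev/null; then
    echo "ok: system account exists"
  else
    echo "missing: system account"; status=1
  fi

  # Is the user associated with the account?
  assoc="$("$SACCTMGR" --noheader --parsable2 show assoc \
             user="$user" account="$account" format=User)"
  if [ -n "$assoc" ]; then
    echo "ok: user is in account $account"
  else
    echo "missing: association with $account"; status=1
  fi

  # Is the default account the expected one? Reported only, not a failure.
  default="$("$SACCTMGR" --noheader --parsable2 show user "$user" format=DefaultAccount)"
  if [ "$default" = "$account" ]; then
    echo "ok: default account is $account"
  else
    echo "note: default account is '${default:-unset}'"
  fi

  return "$status"
}
```

check_onboarding john_doe odin returns a non-zero status if the system account or the association is missing.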

User Offboarding

Remove User from All Accounts

ssh headnode

# Delete from SLURM (removes from all accounts)
sudo /opt/slurm/bin/sacctmgr delete user john_doe

# Remove from system (optional, may want to keep for historical job records)
# sudo userdel -r john_doe
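
Before deleting a user it can help to confirm that they have no pending or running jobs. A minimal pre-check sketch; the function names and the SQUEUE override variable are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: refuse to offboard a user who still has queued jobs.
SQUEUE="${SQUEUE:-squeue}"

# Print how many queue entries (pending or running) the user has.
queued_job_count() {
  "$SQUEUE" --noheader --user="$1" --format=%i \
    | awk 'NF { n++ } END { print n + 0 }'
}

# Succeed only when the user has nothing left in the queue.
safe_to_offboard() {
  [ "$(queued_job_count "$1")" -eq 0 ]
}
```

A typical use would be: safe_to_offboard john_doe && sudo /opt/slurm/bin/sacctmgr delete user john_doe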

Remove from Specific Account

sudo /opt/slurm/bin/sacctmgr delete user john_doe account=odin

Delegated Account Administration

Make a user a coordinator of an account so they can manage it without full sudo access:

# Make the user a coordinator of a specific account
sudo /opt/slurm/bin/sacctmgr add coordinator account=odin names=john_doe

# Verify
/opt/slurm/bin/sacctmgr show account odin WithCoord

Coordinators can then:

# Add users to their account
sacctmgr add user new_user account=odin

# View account users
sacctmgr show assoc account=odin format=Account,User

# Modify user limits
sacctmgr modify user john_doe account=odin set MaxTRES=cpu=256

Troubleshooting

User cannot submit jobs

# Check if user exists in SLURM
sacctmgr show user john_doe

# Check user's accounts
sacctmgr show user john_doe WithAssoc

# Check the limits configured on accounts and their users
sacctmgr show assoc format=Account,User,GrpTRES,MaxTRES,MaxJobs

Job rejected - account not found

# Verify account exists
sacctmgr show account my_account

# Verify user is in account
sacctmgr show user john_doe WithAssoc

SSH key not working

See SSH Setup documentation.

Next Steps