Notifications Overview
The QCS HPC cluster provides automated notifications through Slack to keep you informed about important cluster events. All notifications are sent to the #qcs-infra-notification channel.
Channel Access
The #qcs-infra-notification Slack channel is private. To join the channel and receive notifications:
- Contact the QCS HPC Admin or QCS HPC Owner
- Request access to the #qcs-infra-notification channel
- Once added, you’ll receive all cluster notifications
Notification Types
1. Node Creation Events
Notifications are sent when compute nodes are created as part of auto-scaling operations.
Trigger: ParallelCluster auto-scaling creates new compute nodes
Information included:
- Node name and type (CPU, GPU, etc.)
- Instance ID
- IP address
- Partition assigned
- Status (CONFIGURING, IDLE, etc.)
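The fields listed above map naturally onto a small record type. The following is a minimal Python sketch, not the cluster's actual implementation: the `NodeEvent` class and the `format_node_created` function are hypothetical names introduced for illustration, and the layout follows the Node Creation example in the Notification Examples section (without the leading emoji).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeEvent:
    """Hypothetical record holding the fields of a node creation notification."""
    cluster: str
    node: str
    instance_type: str
    instance_id: str
    ip_address: str
    partition: str
    status: str


def format_node_created(event: NodeEvent) -> str:
    """Render the event as plain text, one field per line."""
    lines = [
        "Node Created",
        f"Cluster: {event.cluster}",
        f"Node: {event.node}",
        f"Type: {event.instance_type}",
        f"Instance ID: {event.instance_id}",
        f"IP Address: {event.ip_address}",
        f"Partition: {event.partition}",
        f"Status: {event.status}",
    ]
    return "\n".join(lines)
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch: an event describes something that already happened, so it should not change after it is created.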
2. Job Submission and Completion Events
Notifications track the lifecycle of SLURM jobs submitted to the cluster.
Job Submission:
- Job ID
- Job name
- Submitting user
- Partition
- Resource requirements (nodes, CPUs, GPUs)
- Status: SUBMITTED
Job Completion:
- Job ID
- Completion status (COMPLETED, FAILED, CANCELLED, TIMEOUT)
- Execution time
- Exit code
- Associated compute nodes
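To cross-check a completion notification against Slurm's own accounting records, you can query `sacct` from a login node. The sketch below is one possible approach, assuming job accounting is enabled on the cluster (the source does not say whether it is); the function names are hypothetical. Note that `sacct` reports the exit code as `code:signal`, for example `0:0`.

```python
import subprocess

# sacct fields corresponding to the completion details listed above.
SACCT_FIELDS = ["JobID", "JobName", "State", "Elapsed", "ExitCode", "NodeList"]


def build_sacct_command(job_id: str) -> list[str]:
    """Build a sacct command that prints pipe-delimited rows without a header."""
    return [
        "sacct",
        "-j", job_id,
        "--format=" + ",".join(SACCT_FIELDS),
        "--parsable2",
        "--noheader",
    ]


def parse_sacct_output(output: str) -> list[dict[str, str]]:
    """Turn pipe-delimited sacct rows into one dict per job or job step."""
    records = []
    for line in output.splitlines():
        if not line.strip():
            continue
        records.append(dict(zip(SACCT_FIELDS, line.split("|"))))
    return records


def job_accounting(job_id: str) -> list[dict[str, str]]:
    """Run sacct for one job; raises CalledProcessError if sacct fails."""
    result = subprocess.run(
        build_sacct_command(job_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sacct_output(result.stdout)
```

For example, `job_accounting("125")` would return one record for the job itself plus one per job step (such as `125.batch`), so the first record is usually the one to compare with the notification.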
3. CloudWatch Alarms
Automated alarms monitor the health and performance of the cluster infrastructure.
EC2 Instance Alarms:
- Instance status checks
- CPU utilization thresholds
- Network connectivity issues
- Storage capacity warnings
ParallelCluster Alarms:
- Cluster creation/deletion events
- Auto-scaling activities
- Node initialization failures
- Configuration errors
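As an illustration of what an instance status check alarm could look like, the sketch below builds the keyword arguments for the boto3 CloudWatch `put_metric_alarm` call. It is not the cluster's actual configuration: the alarm name, period, evaluation count, and the use of an SNS topic as the alarm action are all assumptions, and the source does not describe how alarms are routed to Slack.

```python
def status_check_alarm_params(instance_id: str, sns_topic_arn: str) -> dict:
    """Build put_metric_alarm arguments for an EC2 status check alarm.

    All values here are illustrative assumptions, not the cluster's settings.
    """
    return {
        "AlarmName": f"{instance_id}-status-check-failed",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,              # seconds per evaluation window (assumed)
        "EvaluationPeriods": 2,    # consecutive failing windows before alarming (assumed)
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # assumed delivery path
    }


# Usage (requires boto3 and AWS credentials, so it is left as a comment):
#   import boto3
#   params = status_check_alarm_params(instance_id, sns_topic_arn)
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Separating the parameter construction from the API call keeps the alarm definition easy to review and test without touching AWS.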
Notification Examples
Node Creation
🔔 Node Created
Cluster: odin-rnd-us
Node: cpu-dy-cpu-compute-5
Type: t3.xlarge
Instance ID: i-0123456789abcdef0
IP Address: 10.0.50.25
Partition: cpu
Status: CONFIGURING
Job Submitted
📋 Job Submitted
Job ID: 125
Job Name: my-analysis
User: username
Partition: gpu-inferencing
CPUs: 8
GPUs: 1
Nodes: 1
Status: SUBMITTED
Job Completed
✅ Job Completed
Job ID: 125
Job Name: my-analysis
Status: COMPLETED
Run Time: 45 minutes
Exit Code: 0
Nodes: gpu-inferencing-dy-gpu-inferencing-compute-2
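If you want to process these messages in a script (for example, to log the run time of your own jobs), the `Key: Value` layout shown above is straightforward to parse. The following is a minimal sketch, assuming messages keep exactly this layout; `parse_notification` is a hypothetical helper, not part of any cluster tooling.

```python
def parse_notification(text: str) -> dict[str, str]:
    """Parse a notification into a dict; the first line is stored under 'title'."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        return {}
    fields = {"title": lines[0]}
    for line in lines[1:]:
        # Split on the first colon only, so values may themselves contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```

Applied to the Job Completed example, `parse_notification(message)["Exit Code"]` yields the string `"0"`; convert it with `int()` if you need a number.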
Notification Settings
Notifications are automatically configured and cannot be disabled at the cluster level. To reduce or stop the notifications you receive, use one of the following options:
- Mute the #qcs-infra-notification channel in Slack
- Adjust your Slack notification preferences for that channel
- Contact the QCS HPC Admin if you need to be removed from the channel
Support
For questions about notifications or to report issues:
- QCS HPC Admin: [contact information]
- Slack: Mention @qcs-hpc-admins in #qcs-infra-notification
- Email: [support email if available]
Related Topics
- SLURM Jobs - Job submission and management
- Cluster Access - Accessing the cluster
- Troubleshooting - Common issues and solutions