S3 Integration

FSx for Lustre filesystems are integrated with S3 buckets via Data Repository Associations (DRAs). This enables seamless, bidirectional data flow between S3 object storage and the high-performance filesystem seen by compute nodes.
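
For context, this is roughly how a DRA is created with the AWS CLI; a minimal sketch in which the filesystem ID, paths, and policies are illustrative placeholders, not the values used on this cluster:

# Associate an S3 bucket with a directory on the FSx for Lustre filesystem (illustrative values)
aws fsx create-data-repository-association \
  --file-system-id fs-0123456789abcdef0 \
  --file-system-path /qcs-training-data \
  --data-repository-path s3://qcs-training-data \
  --s3 'AutoImportPolicy={Events=[NEW,CHANGED,DELETED]},AutoExportPolicy={Events=[NEW,CHANGED,DELETED]}'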

How It Works

sequenceDiagram
    participant User
    participant S3 as S3 Bucket
    participant FSx as FSx Lustre
    participant Compute as Compute Node

    Note over S3,FSx: Auto-Import
    S3->>FSx: New file uploaded to S3
    FSx->>FSx: Creates metadata stub

    Note over User,Compute: Data Access
    User->>Compute: Read file from /mnt/odin
    Compute->>FSx: Request file data
    FSx->>S3: Fetch data on-demand
    S3->>FSx: Return file data
    FSx->>Compute: Return data
    Compute->>User: Display/process file

    Note over FSx,S3: Auto-Export
    Compute->>FSx: Write new file
    FSx->>S3: Auto-export to S3
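
Because auto-import creates only metadata stubs, a file's contents are pulled from S3 the first time it is read, which adds latency on first access. If a job needs the data warm up front, files can be preloaded with Lustre's HSM restore; a minimal sketch, assuming the lfs client tooling is available on the node:

# Preload all files under a directory so first reads do not block on S3
find /mnt/odin/qcs-training-data -type f -print0 | xargs -0 -n 1 sudo lfs hsm_restore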

Copying Data via SSH

Copy files from your local machine to FSx mounts:

With SSH config:

# Copy to /mnt/odin
scp <local-file> login1:/mnt/odin/qcs-training-data/

# Copy to /mnt/qcs
scp <local-file> login1:/mnt/qcs/qcs-bali-dev-ingest/

Without SSH config:

scp -o ProxyJump=YOUR_USERNAME@44.248.2.14 \
  <local-file> \
  YOUR_USERNAME@login1.odin.cluster.local:/mnt/odin/qcs-training-data/
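
For bulk or interruptible transfers, rsync through the same jump host can resume where scp cannot; a sketch assuming rsync on both ends and an OpenSSH recent enough for the -J jump flag:

# Resumable, recursive copy through the bastion (-a archive, -v verbose, -P progress/partial)
rsync -avP -e "ssh -J YOUR_USERNAME@44.248.2.14" \
  <local-dir>/ \
  YOUR_USERNAME@login1.odin.cluster.local:/mnt/odin/qcs-training-data/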

Uploading Directly to S3

Upload files directly to the S3 buckets using the AWS CLI or the AWS Console:

Legacy Data Buckets

aws s3 cp <local-file> s3://qcs-training-data/
aws s3 cp <local-file> s3://produsw2-bora-dev-data-kamino/

Dev Ingest and Output Buckets

# Ingest buckets
aws s3 cp <local-file> s3://qcs-bali-dev-ingest/
aws s3 cp <local-file> s3://qcs-odin-dev-ingest/

# Output buckets
aws s3 cp <local-file> s3://qcs-bali-dev-output/
aws s3 cp <local-file> s3://qcs-odin-dev-output/

Kamino Output Buckets

aws s3 cp <local-file> s3://prod-usw2-redcap-balid1755804778494-supportive-data-s3/redcap/balid1755804778494/output/
aws s3 cp <local-file> s3://prod-usw2-redcap-odinq1728587884189-supportive-data-s3/redcap/odinq1728587884189/output/

Files uploaded to S3 automatically appear in the corresponding FSx directory (as metadata stubs; file contents are fetched from S3 on first read).

S3 to FSx Bucket Mapping

/mnt/odin (12 TB)

S3 Bucket                        FSx Path
qcs-training-data                /mnt/odin/qcs-training-data/
produsw2-bora-dev-data-kamino    /mnt/odin/produsw2-bora-dev-data-kamino/
az-new-legacy-data               /mnt/odin/az-new-legacy-data/
az-bora-legacy-data-transfer     /mnt/odin/az-bora-legacy-data-transfer/

/mnt/qcs (1.2 TB)

S3 Bucket                  FSx Path
qcs-bali-dev-ingest        /mnt/qcs/qcs-bali-dev-ingest/
qcs-bali-dev-output        /mnt/qcs/qcs-bali-dev-output/
qcs-cldn-gea-dev-ingest    /mnt/qcs/qcs-cldn-gea-dev-ingest/
qcs-cldn-gea-dev-output    /mnt/qcs/qcs-cldn-gea-dev-output/
qcs-folr1-ov-dev-ingest    /mnt/qcs/qcs-folr1-ov-dev-ingest/
qcs-folr1-ov-dev-output    /mnt/qcs/qcs-folr1-ov-dev-output/
qcs-odin-dev-ingest        /mnt/qcs/qcs-odin-dev-ingest/
qcs-odin-dev-output        /mnt/qcs/qcs-odin-dev-output/
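
To confirm both filesystems are mounted with the capacities listed above:

# Show mount status and size of each FSx filesystem
df -h /mnt/odin /mnt/qcs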

Best Practices

  1. Large file uploads: Upload through S3 directly; single-part uploads cap at 5 GB, and the AWS CLI switches to multipart transfers automatically for larger files
  2. Bulk transfers: Use aws s3 sync for directory synchronization (see the sketch after this list)
  3. Compute jobs: Read and write via the FSx mounts for best performance
  4. External sharing: Use S3 presigned URLs or bucket policies (example below)
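
Sketches of the sync and sharing patterns referenced above; the local directory, object key, and expiry are illustrative:

# Mirror a local directory into an ingest bucket (only new/changed files transfer)
aws s3 sync <local-dir> s3://qcs-bali-dev-ingest/

# Generate a time-limited URL (1 hour) for sharing an object externally
aws s3 presign s3://qcs-bali-dev-output/report.pdf --expires-in 3600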

Verifying Sync Status

Check if a file has synced to S3:

# Check HSM state
lfs hsm_state /mnt/odin/qcs-training-data/myfile.txt

# Verify in S3
aws s3 ls s3://qcs-training-data/myfile.txt
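
The expected hsm_state output below is a sketch; exact flag values can vary with the Lustre client version. A file that has exported reports exists archived, while an additional released flag means the local copy was evicted and the data will be re-fetched from S3 on the next read:

# Illustrative output for a file that has synced to S3
/mnt/odin/qcs-training-data/myfile.txt: (0x00000009) exists archived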