| Location | Size |
|---|---|
| /opt/tivoli | ~30GB |
| /sp/inst1 | ~5GB |
| /sp/db/db00{1..4} | 4 × 1TB |
| /sp/actlog | 256GB |
| /sp/archlog | 1TB |
| DCPOOL_PRIMARY dirs | ~270TB |
| CCPOOL_ARCHIVE bucket | scalable |

Why DRAID6 over classic RAID6: on all-flash V7000, distributed RAID rebuilds spread the rebuild I/O across every drive in the array rather than hammering a single hot spare. Rebuild time on a failed 20TB SSD drops from 20+ hours to a couple of hours. Performance during rebuild stays close to baseline. The two distributed spare drives mean zero "swap a drive" panic — failed drives can be replaced at the next maintenance window.
Why 16 DCPOOL directories: directory-container pools parallelise I/O across their directories. 16 dirs give 16 concurrent ingest streams without contention. A smaller filesystem per directory also keeps individual fsck/scan times manageable. Directories should be on separate filesystems if you want true parallelism — or at minimum on the same filesystem with separate top-level paths if the underlying array can handle the parallelism (DRAID6 on SSD can).
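As a sketch of how the pool and its directories could be defined from the admin CLI (the /sp/dc01 through /sp/dc16 paths are illustrative assumptions, not mandated names):

```
/* Directory-container pool: inline dedup is always on; compression and
   encryption are explicit (see the feature table below). */
DEFINE STGPOOL DCPOOL_PRIMARY STGTYPE=DIRECTORY COMPRESSION=YES ENCRYPT=YES

/* One directory per filesystem, 16 in total; repeat through /sp/dc16. */
DEFINE STGPOOLDIRECTORY DCPOOL_PRIMARY /sp/dc01
DEFINE STGPOOLDIRECTORY DCPOOL_PRIMARY /sp/dc02
DEFINE STGPOOLDIRECTORY DCPOOL_PRIMARY /sp/dc16
```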
noatime: always set on SP filesystems. Without it, every read updates the access timestamp, generating unnecessary write I/O on filesystems that may have hundreds of millions of files. noatime can be a 10-20% performance gain on busy DCPOOL directories.
Why XFS over ext4 for SP: XFS scales much better to hundreds of millions of files in a single filesystem (DCPOOL directories), handles large metadata operations more efficiently, and is the IBM-tested filesystem for both SP DB and container pool directories. Ext4 works but isn't the IBM-blueprint choice.
Do NOT mount DCPOOL directories on the same filesystem as the SP DB. If the DB filesystem fills, the SP server halts. Keeping DCPOOL on separate filesystems (or at minimum on a separate set of LUNs from the DB) prevents one capacity event from taking down both data and metadata simultaneously.
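For illustration, /etc/fstab entries in that spirit; the device-mapper names and mount points are assumptions, the load-bearing parts are XFS, noatime, and one filesystem per role:

```
# SP DB and logs: dedicated filesystems, never shared with DCPOOL
/dev/mapper/sp_db001   /sp/db/db001  xfs  defaults,noatime  0 0
/dev/mapper/sp_actlog  /sp/actlog    xfs  defaults,noatime  0 0
/dev/mapper/sp_archlog /sp/archlog   xfs  defaults,noatime  0 0
# DCPOOL directories: one filesystem each, /sp/dc01 through /sp/dc16
/dev/mapper/sp_dc01    /sp/dc01      xfs  defaults,noatime  0 0
/dev/mapper/sp_dc02    /sp/dc02      xfs  defaults,noatime  0 0
```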
| Feature | Effect |
|---|---|
| Inline dedup | Variable-block dedup at ingest — typical 2-4× reduction across mixed AIX workloads |
| Inline compression | Stacks on top of dedup — adds another 1.5-2× for compressible data |
| ENCRYPT=YES | AES-256 at rest. Server-managed keys; KMIP-managed available later |
| No reclamation | Containers manage their own space; no nightly RECLAIM jobs to schedule |
| No volume defs | No DEFINE VOLUME / no scratch counts — capacity = sum of directory FS sizes |
Once ENCRYPT=YES is set, it cannot be turned off without rebuilding the pool. All data ingested is encrypted with the server's master key. Confirm you have a master key backup procedure (BACKUP KEYS command) before enabling, or you risk locking yourself out of your own data after a server rebuild.
Why HMAC credentials and not IAM tokens: SP authenticates to COS via the S3 protocol, which requires HMAC access/secret pairs. IAM bearer tokens (the COS default for newer credential types) are not supported by the S3 protocol path. When generating service credentials in the COS portal, tick "Include HMAC Credential" — without it, the secret key field will be missing and the stgpool will refuse to come online.
ENCRYPTIONTYPE=AES on top of bucket-side encryption means the data is encrypted twice — that's fine and recommended. SP encrypts at the object level before upload (server-side master key), and COS additionally encrypts at rest. Two layers protect against both transport-level interception and a leaked COS service credential being usable to read clear data.
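A hedged sketch of the cloud-container pool definition: the endpoint URL, bucket name, and credential placeholders are assumptions to fill in from the COS service credential, and encryption parameters are omitted here (see the layering note above).

```
/* Cloud-container pool on IBM COS via S3. IDENTITY/PASSWORD take the
   HMAC access key ID and secret access key from the service credential. */
DEFINE STGPOOL CCPOOL_ARCHIVE STGTYPE=CLOUD CLOUDTYPE=S3 -
  CLOUDURL=https://s3.eu-de.cloud-object-storage.appdomain.cloud -
  IDENTITY=<hmac_access_key_id> PASSWORD=<hmac_secret_access_key> -
  BUCKETNAME=sp-ccpool-archive
```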
Network bandwidth note: sustained tier traffic to COS is bound by the SP server's outbound link. Plan for 100-300 Mbit/s sustained during heavy tiering windows (e.g. month-end when monthly fulls are 30 days old and start tiering en masse). If the SP server is on a 1Gbit link shared with backup ingest, schedule tiering windows outside backup windows — see Phase 5.
| Concept | Behaviour |
|---|---|
| DAYS=30 | Data not accessed/written for 30+ days qualifies for tiering |
| Inactive only | Active backups never tier — only inactive versions move |
| Restore behaviour | Tiered data is fully online — restores fetch on demand from COS, transparent to client |
| No re-hydration | Once moved, data does not auto-promote back to SSD on access |
Why 30 days specifically: matches the most aggressive customer retention (Bluechip's MC_FILE_30D file-level incremental). Their daily incrementals expire around the time they'd otherwise tier — almost no daily data ever reaches COS, which is the right outcome (small, frequently-accessed files stay on fast tier). Monthly and yearly fulls all tier. If you have a customer with sub-30-day fulls (e.g. weekly fulls kept 2 weeks), drop the rule to 14 days for that customer pattern.
RUN STGRULE PREVIEW=YES is the safe first move. It lists candidate objects without moving any data. Always run it on the first activation against a populated DCPOOL — gives you a count of GB-to-be-moved so you can size the COS bucket and outbound bandwidth window appropriately.
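A sketch of the preview pass and two follow-up checks; TIER_TO_COS is a hypothetical rule name, substitute whatever the tiering rule was actually defined as:

```
/* Dry run: lists tiering candidates without moving any data */
RUN STGRULE TIER_TO_COS PREVIEW=YES

/* Watch the preview process and pull its messages from the activity log */
QUERY PROCESS
QUERY ACTLOG SEARCH=TIER_TO_COS BEGINTIME=-01:00
```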
Two DB backup paths is non-negotiable. The SP DB backup is the only thing that can recover the server after total loss. Local-only backups die with the server during a site disaster. Cloud-only backups extend RTO significantly (have to download ~3-5TB before you can start a restore). Both targets give you fast local recovery for the common case (DB corruption) and survivable off-site copies for the catastrophic case (site loss).
DRMDBBACKUPEXPIREDAYS=7: keeps 7 daily fulls. SP automatically expires older DB backups from QUERY VOLHISTORY at this threshold. Set higher (14-30) if you want longer recovery point options, lower (3-5) if DB backup capacity is tight.
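Illustrative commands for the two-target DB backup and the expiry setting; DBB_LOCAL and DBB_CLOUD are hypothetical device-class names, and how the off-site target is actually implemented (a FILE device class on replicated storage, an object-storage device class, etc.) depends on the SP level and your infrastructure:

```
/* Daily full DB backups to two targets (device classes defined elsewhere) */
BACKUP DB DEVCLASS=DBB_LOCAL TYPE=FULL
BACKUP DB DEVCLASS=DBB_CLOUD TYPE=FULL

/* Expire DB backups older than 7 days from the volume history */
SET DRMDBBACKUPEXPIREDAYS 7

/* Verify which DB backups are currently available for recovery */
QUERY VOLHISTORY TYPE=DBBACKUP
```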
| Layer | Per-customer | Shared |
|---|---|---|
| Domain | BLUECHIP_DOM, ACME_DOM, … | — |
| Mgmt classes | MC_FILE_30D, MC_MFULL_3M, … | — |
| Cloptsets | COPT_FILE, COPT_MFULL, … | — |
| Schedules | SCH_DAILY_INCR, SCH_MONTHLY_FULL, … | — |
| Nodes | BC0X_FILE, AC0X_FILE, … | — |
| Storage pool | — | DCPOOL_PRIMARY |
| Tiering / archive | — | CCPOOL_ARCHIVE (via STGRULE) |
Per-customer reporting still works: use QUERY OCCUPANCY BC0* or SELECT … FROM occupancy WHERE node_name LIKE 'BC0%' to get per-customer space attribution. The shared pool stores the bytes once (deduped), but SP tracks logical occupancy per node, so chargeback and capacity reports remain customer-specific.
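For example, a per-customer rollup from the admin CLI (the BC0% prefix follows the node-naming convention in the table above):

```
/* Logical occupancy per node for one customer prefix */
SELECT node_name, SUM(logical_mb) FROM occupancy -
  WHERE node_name LIKE 'BC0%' GROUP BY node_name
```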
For new customer onboarding going forward: the Bluechip-style runbook still applies, but Phase 2 (storage foundation) becomes a no-op — you skip DEFINE DEVCLASS and DEFINE STGPOOL entirely. Every new customer's copy groups go straight to DESTINATION=DCPOOL_PRIMARY and inherit the existing tiering rule automatically.
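A sketch of the per-customer policy wiring for a new customer; ACME_DOM appears in the table above, while ACME_PS and the retention parameters (not shown) are assumptions to be set per the customer's contract:

```
/* New customer's backup copy group points straight at the shared pool */
DEFINE COPYGROUP ACME_DOM ACME_PS MC_FILE_30D STANDARD TYPE=BACKUP -
  DESTINATION=DCPOOL_PRIMARY
ACTIVATE POLICYSET ACME_DOM ACME_PS
```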
The 4 numbers to watch daily: (1) DCPOOL_PRIMARY pct_utilized — should hover steady-state ~50-70% if tiering is keeping up. Climbing past 80% means tiering is falling behind ingest. (2) CCPOOL_ARCHIVE pct_utilized — informational only (no real ceiling), but tracks growth. (3) Last successful DB backup to both targets — must be within last 24h. (4) DCPOOL directory filesystem usage — any single directory above 90% will start rejecting writes for that dir while others still accept.
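One way to pull the first three numbers from the admin CLI; the per-directory filesystem usage in (4) is checked OS-side (df on the 16 mount points), with QUERY STGPOOLDIRECTORY showing the server's view of each directory:

```
/* (1) and (2): pool utilisation */
SELECT stgpool_name, pct_utilized FROM stgpools -
  WHERE stgpool_name IN ('DCPOOL_PRIMARY', 'CCPOOL_ARCHIVE')

/* (3): last successful DB backups, both targets should show within 24h */
QUERY VOLHISTORY TYPE=DBBACKUP

/* (4): server-side view of the DCPOOL directories (pair with df -h) */
QUERY STGPOOLDIRECTORY
```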
DISCARDDATA=YES is irreversible. All backup versions stored in the pool are deleted with no recovery. Use only when you've already migrated customer data elsewhere or accept the data loss. For a real rollback after pools have customer data, the path is: stand up replacement pools, use MOVE NODEDATA per customer to migrate, then delete the empty originals — that's a multi-day operation, not a quick rollback.
ENCRYPT=YES on container pools requires the SP master key to be backed up via BACKUP KEYS. Lose the master key, lose access to all encrypted pool data, even with the data files intact. Back up the key immediately after enabling encryption and before first ingest.

STGRULE syntax has shifted between SP releases (the source-pool parameter has appeared as SOURCESTGPOOL / SRCPOOLNAME / SRCSTG). Always run HELP DEFINE STGRULE on the deployed server before defining — the runbook syntax targets 8.1.13+.

The COS credentials supplied on DEFINE STGPOOL are stored in the SP DB. If those credentials are rotated or compromised, run UPDATE STGPOOL CCPOOL_ARCHIVE IDENTITY=… PASSWORD=… to update — don't delete and re-create, you'll orphan the data references.

Per-customer chargeback data comes from the occupancy table — not the stgpool table. The shared pool reports total bytes once; per-node logical bytes are tracked separately and survive dedup correctly for billing.

Track tiering throughput with SUM(bytes_processed) from the processes table during the first month (a minimal query is sketched below) — if tiering can't keep up with monthly-full aging, either widen the tiering window or split DAYS by management class.

New customers get DESTINATION=DCPOOL_PRIMARY on their copy groups. No new pools, no new device classes per customer.
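A minimal version of that throughput check; the processes table only reflects running processes, so it has to be sampled while the tiering window is active:

```
/* Bytes moved by currently running processes, sampled during the tiering window */
SELECT process_num, process, bytes_processed FROM processes
```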