| Location | Size |
|---|---|
| /opt/tivoli | ~30GB |
| /sp/inst1 | ~5GB |
| /sp/db/db00{1..4} | 4 × 1TB |
| /sp/actlog | 256GB |
| /sp/archlog | 1TB |
| DCPOOL_PRIMARY dirs | ~270TB |
| CCPOOL_ARCHIVE bucket | scalable |

Why DRAID6 over classic RAID6: on all-flash V7000, distributed RAID rebuilds spread the rebuild I/O across every drive in the array rather than hammering a single hot spare. Rebuild time on a failed 20TB SSD drops from 20+ hours to a couple of hours. Performance during rebuild stays close to baseline. The two distributed spare drives mean zero "swap a drive" panic — failed drives can be replaced at the next maintenance window.
Why 16 DCPOOL directories: directory-container pools parallelise I/O across their directories. 16 dirs give 16 concurrent ingest streams without contention. A smaller filesystem per directory also keeps individual fsck/scan times manageable. Directories should be on separate filesystems if you want true parallelism — or at minimum on the same filesystem with separate top-level paths if the underlying array can handle the parallelism (DRAID6 on SSD can).
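As a sketch of how the pool and its directories could be defined from the admin CLI (the /sp/dc01 through /sp/dc16 paths are illustrative assumptions, not mandated names):

```
/* Directory-container pool: inline dedup is always on; compression and
   encryption are explicit (see the feature table below). */
DEFINE STGPOOL DCPOOL_PRIMARY STGTYPE=DIRECTORY COMPRESSION=YES ENCRYPT=YES

/* One directory per filesystem, 16 in total; repeat through /sp/dc16. */
DEFINE STGPOOLDIRECTORY DCPOOL_PRIMARY /sp/dc01
DEFINE STGPOOLDIRECTORY DCPOOL_PRIMARY /sp/dc02
DEFINE STGPOOLDIRECTORY DCPOOL_PRIMARY /sp/dc16
```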
noatime: always set on SP filesystems. Without it, every read updates the access timestamp, generating unnecessary write I/O on filesystems that may have hundreds of millions of files. noatime can be a 10-20% performance gain on busy DCPOOL directories.
Why XFS over ext4 for SP: XFS scales much better to hundreds of millions of files in a single filesystem (DCPOOL directories), handles large metadata operations more efficiently, and is the IBM-tested filesystem for both SP DB and container pool directories. Ext4 works but isn't the IBM-blueprint choice.
Do NOT mount DCPOOL directories on the same filesystem as the SP DB. If the DB filesystem fills, the SP server halts. Keeping DCPOOL on separate filesystems (or at minimum on a separate set of LUNs from the DB) prevents one capacity event from taking down both data and metadata simultaneously.
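For illustration, /etc/fstab entries in that spirit; the device-mapper names and mount points are assumptions, the load-bearing parts are XFS, noatime, and one filesystem per role:

```
# SP DB and logs: dedicated filesystems, never shared with DCPOOL
/dev/mapper/sp_db001   /sp/db/db001  xfs  defaults,noatime  0 0
/dev/mapper/sp_actlog  /sp/actlog    xfs  defaults,noatime  0 0
/dev/mapper/sp_archlog /sp/archlog   xfs  defaults,noatime  0 0
# DCPOOL directories: one filesystem each, /sp/dc01 through /sp/dc16
/dev/mapper/sp_dc01    /sp/dc01      xfs  defaults,noatime  0 0
/dev/mapper/sp_dc02    /sp/dc02      xfs  defaults,noatime  0 0
```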
| Feature | Effect |
|---|---|
| Inline dedup | Variable-block dedup at ingest — typical 2-4× reduction across mixed AIX workloads |
| Inline compression | Stacks on top of dedup — adds another 1.5-2× for compressible data |
| ENCRYPT=YES | AES-256 at rest. Server-managed keys; KMIP-managed available later |
| No reclamation | Containers manage their own space; no nightly RECLAIM jobs to schedule |
| No volume defs | No DEFINE VOLUME / no scratch counts — capacity = sum of directory FS sizes |
Once ENCRYPT=YES is set, it cannot be turned off without rebuilding the pool. All data ingested is encrypted with the server's master key. Confirm you have a master key backup procedure (BACKUP KEYS command) before enabling, or you risk locking yourself out of your own data after a server rebuild.
Why HMAC credentials and not IAM tokens: SP authenticates to COS via the S3 protocol, which requires HMAC access/secret pairs. IAM bearer tokens (the COS default for newer credential types) are not supported by the S3 protocol path. When generating service credentials in the COS portal, tick "Include HMAC Credential" — without it, the secret key field will be missing and the stgpool will refuse to come online.
ENCRYPTIONTYPE=AES on top of bucket-side encryption means the data is encrypted twice — that's fine and recommended. SP encrypts at the object level before upload (server-side master key), and COS additionally encrypts at rest. Two layers protect against both transport-level interception and a leaked COS service credential being usable to read clear data.
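A hedged sketch of the cloud-container pool definition: the endpoint URL, bucket name, and credential placeholders are assumptions to fill in from the COS service credential, and encryption parameters are omitted here (see the layering note above).

```
/* Cloud-container pool on IBM COS via S3. IDENTITY/PASSWORD take the
   HMAC access key ID and secret access key from the service credential. */
DEFINE STGPOOL CCPOOL_ARCHIVE STGTYPE=CLOUD CLOUDTYPE=S3 -
  CLOUDURL=https://s3.eu-de.cloud-object-storage.appdomain.cloud -
  IDENTITY=<hmac_access_key_id> PASSWORD=<hmac_secret_access_key> -
  BUCKETNAME=sp-ccpool-archive
```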
Network bandwidth note: sustained tier traffic to COS is bound by the SP server's outbound link. Plan for 100-300 Mbit/s sustained during heavy tiering windows (e.g. month-end when monthly fulls are 30 days old and start tiering en masse). If the SP server is on a 1Gbit link shared with backup ingest, schedule tiering windows outside backup windows — see Phase 5.
| Concept | Behaviour |
|---|---|
| DAYS=30 | Data not accessed/written for 30+ days qualifies for tiering |
| Inactive only | Active backups never tier — only inactive versions move |
| Restore behaviour | Tiered data is fully online — restores fetch on demand from COS, transparent to client |
| No re-hydration | Once moved, data does not auto-promote back to SSD on access |
Why 30 days specifically: matches the most aggressive customer retention (Bluechip's MC_FILE_30D file-level incremental). Their daily incrementals expire around the time they'd otherwise tier — almost no daily data ever reaches COS, which is the right outcome (small, frequently-accessed files stay on fast tier). Monthly and yearly fulls all tier. If you have a customer with sub-30-day fulls (e.g. weekly fulls kept 2 weeks), drop the rule to 14 days for that customer pattern.
RUN STGRULE PREVIEW=YES is the safe first move. It lists candidate objects without moving any data. Always run it on the first activation against a populated DCPOOL — gives you a count of GB-to-be-moved so you can size the COS bucket and outbound bandwidth window appropriately.
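A sketch of the preview pass and two follow-up checks; TIER_TO_COS is a hypothetical rule name, substitute whatever the tiering rule was actually defined as:

```
/* Dry run: lists tiering candidates without moving any data */
RUN STGRULE TIER_TO_COS PREVIEW=YES

/* Watch the preview process and pull its messages from the activity log */
QUERY PROCESS
QUERY ACTLOG SEARCH=TIER_TO_COS BEGINTIME=-01:00
```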
Two DB backup paths is non-negotiable. The SP DB backup is the only thing that can recover the server after total loss. Local-only backups die with the server during a site disaster. Cloud-only backups extend RTO significantly (have to download ~3-5TB before you can start a restore). Both targets give you fast local recovery for the common case (DB corruption) and survivable off-site copies for the catastrophic case (site loss).
DRMDBBACKUPEXPIREDAYS=7: keeps 7 daily fulls. SP automatically expires older DB backups from QUERY VOLHISTORY at this threshold. Set higher (14-30) if you want longer recovery point options, lower (3-5) if DB backup capacity is tight.
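Illustrative commands for the two-target DB backup and the expiry setting; DBB_LOCAL and DBB_CLOUD are hypothetical device-class names, and how the off-site target is actually implemented (a FILE device class on replicated storage, an object-storage device class, etc.) depends on the SP level and your infrastructure:

```
/* Daily full DB backups to two targets (device classes defined elsewhere) */
BACKUP DB DEVCLASS=DBB_LOCAL TYPE=FULL
BACKUP DB DEVCLASS=DBB_CLOUD TYPE=FULL

/* Expire DB backups older than 7 days from the volume history */
SET DRMDBBACKUPEXPIREDAYS 7

/* Verify which DB backups are currently available for recovery */
QUERY VOLHISTORY TYPE=DBBACKUP
```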
| Layer | Per-customer | Shared |
|---|---|---|
| Domain | BLUECHIP_DOM, ACME_DOM, … | — |
| Mgmt classes | MC_FILE_30D, MC_MFULL_3M, … | — |
| Cloptsets | COPT_FILE, COPT_MFULL, … | — |
| Schedules | SCH_DAILY_INCR, SCH_MONTHLY_FULL, … | — |
| Nodes | BC0X_FILE, AC0X_FILE, … | — |
| Storage pool | — | DCPOOL_PRIMARY |
| Tiering / archive | — | CCPOOL_ARCHIVE (via STGRULE) |
Per-customer reporting still works: use QUERY OCCUPANCY BC0* or SELECT … FROM occupancy WHERE node_name LIKE 'BC0%' to get per-customer space attribution. The shared pool stores the bytes once (deduped), but SP tracks logical occupancy per node, so chargeback and capacity reports remain customer-specific.
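For example, a per-customer rollup from the admin CLI (the BC0% prefix follows the node-naming convention in the table above):

```
/* Logical occupancy per node for one customer prefix */
SELECT node_name, SUM(logical_mb) FROM occupancy -
  WHERE node_name LIKE 'BC0%' GROUP BY node_name
```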
For new customer onboarding going forward: the Bluechip-style runbook still applies, but Phase 2 (storage foundation) becomes a no-op — you skip DEFINE DEVCLASS and DEFINE STGPOOL entirely. Every new customer's copy groups go straight to DESTINATION=DCPOOL_PRIMARY and inherit the existing tiering rule automatically.
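A sketch of the per-customer policy wiring for a new customer; ACME_DOM appears in the table above, while ACME_PS and the retention parameters (not shown) are assumptions to be set per the customer's contract:

```
/* New customer's backup copy group points straight at the shared pool */
DEFINE COPYGROUP ACME_DOM ACME_PS MC_FILE_30D STANDARD TYPE=BACKUP -
  DESTINATION=DCPOOL_PRIMARY
ACTIVATE POLICYSET ACME_DOM ACME_PS
```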
The 4 numbers to watch daily: (1) DCPOOL_PRIMARY pct_utilized — should hover steady-state ~50-70% if tiering is keeping up. Climbing past 80% means tiering is falling behind ingest. (2) CCPOOL_ARCHIVE pct_utilized — informational only (no real ceiling), but tracks growth. (3) Last successful DB backup to both targets — must be within last 24h. (4) DCPOOL directory filesystem usage — any single directory above 90% will start rejecting writes for that dir while others still accept.
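One way to pull the first three numbers from the admin CLI; the per-directory filesystem usage in (4) is checked OS-side (df on the 16 mount points), with QUERY STGPOOLDIRECTORY showing the server's view of each directory:

```
/* (1) and (2): pool utilisation */
SELECT stgpool_name, pct_utilized FROM stgpools -
  WHERE stgpool_name IN ('DCPOOL_PRIMARY', 'CCPOOL_ARCHIVE')

/* (3): last successful DB backups, both targets should show within 24h */
QUERY VOLHISTORY TYPE=DBBACKUP

/* (4): server-side view of the DCPOOL directories (pair with df -h) */
QUERY STGPOOLDIRECTORY
```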
DISCARDDATA=YES is irreversible. All backup versions stored in the pool are deleted with no recovery. Use only when you've already migrated customer data elsewhere or accept the data loss. For a real rollback after pools have customer data, the path is: stand up replacement pools, use MOVE NODEDATA per customer to migrate, then delete the empty originals — that's a multi-day operation, not a quick rollback.
ENCRYPT=YES on container pools requires the SP master key to be backed up via BACKUP KEYS. Lose the master key, lose access to all encrypted pool data, even with the data files intact. Back up the key immediately after enabling encryption and before first ingest.

STGRULE syntax has shifted between SP releases (the source-pool parameter has appeared as SOURCESTGPOOL / SRCPOOLNAME / SRCSTG). Always run HELP DEFINE STGRULE on the deployed server before defining — the runbook syntax targets 8.1.13+.

The COS credentials supplied on DEFINE STGPOOL are stored in the SP DB. If those credentials are rotated or compromised, run UPDATE STGPOOL CCPOOL_ARCHIVE IDENTITY=… PASSWORD=… to update — don't delete and re-create, you'll orphan the data references.

Per-customer chargeback data comes from the occupancy table — not the stgpool table. The shared pool reports total bytes once; per-node logical bytes are tracked separately and survive dedup correctly for billing.

Track tiering throughput with SUM(bytes_processed) from the processes table during the first month (a minimal query is sketched below) — if tiering can't keep up with monthly-full aging, either widen the tiering window or split DAYS by management class.

New customers get DESTINATION=DCPOOL_PRIMARY on their copy groups. No new pools, no new device classes per customer.
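A minimal version of that throughput check; the processes table only reflects running processes, so it has to be sampled while the tiering window is active:

```
/* Bytes moved by currently running processes, sampled during the tiering window */
SELECT process_num, process, bytes_processed FROM processes
```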