OCI Full Stack Disaster Recovery (FSDR): A Practitioner's Complete Guide

Here's an uncomfortable truth most cloud architects quietly acknowledge: a DR plan that lives in a Word document is not a DR plan. It's a wish list. When a region goes dark at 2 AM, nobody is calmly following a 47-step runbook — they're improvising, making mistakes, and hoping. Oracle's Full Stack Disaster Recovery (FSDR) is built on the premise that hope is not an operations strategy.

What Exactly is OCI FSDR?

Think of FSDR as a conductor for your disaster recovery orchestra. Your databases, compute instances, load balancers, storage volumes, and middleware are all musicians — each knowing their individual part. But without someone coordinating them, you get noise, not music. FSDR is that conductor: it knows the sequence, enforces the timing, and makes sure the database switches roles before the application servers come back up, not after.

More precisely, FSDR is OCI's native, fully managed DR orchestration service. It moves your entire application stack — not just the database, not just the compute — from a primary OCI region to a standby region, in a single orchestrated workflow you can trigger with one click: no improvisation, just a pretested, automated plan that executes the same way every single time.

It became generally available at Oracle CloudWorld 2022 and has been expanding steadily — Singapore West, Riyadh, Chile West, Paris, Milan, Newport — with more regions added through 2024 and 2025.

The DR Lie Most Teams Are Living

Ask any team if they have a DR plan. They'll say yes. Ask them when they last tested it end-to-end. Watch the room go quiet. The dirty secret of enterprise DR is that most plans are theoretical — written for the architecture that existed two years ago, by someone who has since left the company, covering an application that has since grown three new dependencies nobody documented.

The problem isn't intent. It's that traditional DR is genuinely hard to get right, for a few structural reasons:

  • Tools don't talk to each other. Your database failover tool knows nothing about your compute layer. Your compute snapshots know nothing about your middleware config. You end up with a relay race where nobody's sure who's holding the baton.
  • Runbooks rot the moment you write them. Your application changes every sprint. Your DR runbook gets updated never. By the time you need it, it's archaeology, not operations.
  • Recovery at scale is a different problem entirely. Bringing one app back up is stressful but doable. Bringing ten back simultaneously, in the right order, with the right dependencies? That's where teams discover their limits.
  • No test, no trust. DR drills are expensive, risky-feeling, and easy to deprioritize. So they don't happen. And the first real test is a real disaster.

Learn the Language Before You Touch the Console

FSDR has a vocabulary. Some terms look deceptively familiar — RTO, failover, standby — but carry specific weight inside FSDR that matters when you're configuring real protection groups, not just reading about them. Get these wrong in your head and you'll wire things up wrong in the console. 

Here's what they actually mean:

RTO (Recovery Time Objective) => How long your business can survive the app being down. Not a technical target — a business commitment. Everything else is shaped around this number.

RPO (Recovery Point Objective) => How much data your business can afford to lose, expressed in time. "One hour RPO" means you're okay losing up to 60 minutes of transactions if the worst happens.

Primary => Your production region. Where real users are hitting real workloads right now.

Standby => Your waiting region. Quiet until it isn't. Everything is pre-positioned here so it can take over without scrambling.

Protection Group => The core FSDR concept. Think of it as a named boundary around everything that belongs to one application — compute, DB, load balancer, storage. If it needs to move together, it lives in the same Protection Group.

DR Plan => The actual playbook — an ordered sequence of plan groups and steps that FSDR executes during a transition. This is what runs when things go wrong (or when you're drilling).

Switchover => The planned version. You choose to move. Primary shuts down gracefully, standby comes up clean. Zero data loss. Use this for maintenance windows and drills.

Failover => The unplanned version. Something broke and you can't wait. Standby starts immediately without waiting for primary to acknowledge. Some data loss may occur depending on your replication lag.

Warm Standby => Resources are already provisioned and running in the standby region. Costs more, recovers faster. Right choice for anything customer-facing.

Cold Standby => Standby region has minimal pre-deployment — resources get provisioned during the DR transition itself. Cheaper to run day-to-day, but your RTO takes the hit. Fine for non-critical internal systems.

Prechecks => FSDR's built-in sanity check. Runs before any DR operation to confirm the standby is actually ready. Your early-warning system for configuration drift. Run these regularly, not just before a disaster.
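To make RPO concrete: it is simply an upper bound on standby replication lag. Here's a toy Python check (the lag numbers are made up and this is not an FSDR feature, just the arithmetic behind the definition):

```python
def meets_rpo(replication_lag_minutes: float, rpo_minutes: float) -> bool:
    """A one-hour RPO is met as long as standby replication lag stays at or
    below 60 minutes: that lag is the data you would lose on a failover."""
    return replication_lag_minutes <= rpo_minutes

# Hypothetical monitoring readings against a one-hour RPO.
print(meets_rpo(45, 60))   # standby is 45 minutes behind: within budget
print(meets_rpo(90, 60))   # 90 minutes behind: alert before a disaster makes it obvious
```

Wiring a check like this into monitoring is how the RPO stops being a number in a document and becomes something you can alarm on.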

How FSDR Thinks About Your Infrastructure

Everything revolves around one idea: your application has a primary home and a standby home. FSDR's entire job is to move it from one to the other — cleanly, completely, and in the right order — whenever you tell it to.

The two regions are represented by DR Protection Groups. Think of a Protection Group as a container that says "these resources belong to the same application and need to move together." Every compute instance, every database, every load balancer that's part of your app gets added as a member. Once that's done, FSDR understands your topology and can reason about it automatically.

  • Primary Region => Active production workloads. Compute, databases, load balancers, storage all running live traffic.
  • Standby Region => Reserved or warm-standby infrastructure. DR plans execute here during failover or switchover.
  • Protection Groups => Paired consistency groups (one per region) representing the full application system.
  • DR Plans => Automated workflows (Switchover or Failover) executed from the standby protection group.

FSDR can work cross-region (different OCI regions entirely) or intra-region (across Availability Domains within one region). For anything mission-critical, cross-region is the right choice — the whole point is surviving an event big enough to take out a data center, and for that you need geographic distance.




What FSDR Actually Does Well

Auto-generates your DR plan — from scratch

Add your resources to a Protection Group and FSDR introspects the topology, figures out the dependencies, and hands you a working DR plan. You don't write the plan; you review and extend it. For large application stacks, this alone saves days of manual sequencing work.

Prechecks — so you're not surprised when it actually counts

Before running any DR operation, FSDR runs a full set of validation checks: is the standby database replicating? Are all the right IAM policies in place? Is the standby compute configured correctly? If anything would cause the recovery to fail, you find out now — not at 3 AM when the primary region is on fire.


You can bolt in your own logic anywhere

FSDR handles the infrastructure transitions, but your app almost certainly needs additional steps that no generic tool can anticipate — flushing a Redis cache, updating a DNS record, hitting a webhook, sending a Slack alert. You can inject shell scripts or OCI Functions as custom plan steps at any point in the sequence.
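For example, a local-script custom step might look like the following sketch. The marker-file task, its path, and the region argument are all invented for illustration; the relevant convention is that the script's exit code (zero for success, nonzero for failure) tells FSDR whether the step passed:

```python
"""Hypothetical FSDR custom plan step: after failover, write a small JSON
marker that downstream jobs on this instance check before resuming work.
The path, file format, and task are made up for this example."""
import json
import time

def record_transition(region: str, marker_path: str) -> dict:
    """Write a marker recording which region is now active and when."""
    state = {"active_region": region, "transitioned_at": int(time.time())}
    with open(marker_path, "w") as f:
        json.dump(state, f)
    return state

# When wired in as a plan step, a thin wrapper would call record_transition()
# and translate success or an exception into exit code 0 or 1 for FSDR.
print(record_transition("us-phoenix-1", "/tmp/dr_state.json"))
```

The same shape works for any post-transition chore: the step does one small, idempotent thing and reports success or failure unambiguously through its exit code.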

You can watch it happen, step by step

The OCI Console shows you a live, step-by-step view of every DR execution as it runs. Each plan step shows its status, duration, and any errors — in real time. No more SSH-ing into servers trying to figure out where the script died. You know exactly where you are in the recovery at every moment.


Setting It Up — What You Actually Do

The setup process is more logical than it looks at first glance. Here's the honest sequence:


Map your stack honestly
Before touching the console, sit down and list every resource your application actually needs to function — not what the architecture diagram says, but what it actually uses.
Compute instances, databases, load balancers, file storage, any middleware. Miss something here and your DR plan will have a gap you'll only discover at the worst possible moment.

Create Protection Groups in both regions
One Protection Group in your primary region, one in the standby. Assign their roles, point them at an Object Storage bucket for logs, then link them as peers. 
This paired relationship is the foundation everything else builds on.

Add members to each group
This is where you tell FSDR which resources belong to this application. Add your compute instances, databases, OKE clusters — whatever the stack needs. 
FSDR introspects what you add and uses that to auto-generate the recovery plan. The quality of what you put in directly determines the quality of the plan that comes out.



Generate plans — then actually look at them
Create your Switchover and Failover plans. FSDR will build them for you, but don't just accept them blindly. 
Read through every plan group, understand the sequence, and add any custom steps your application actually needs. This is the moment to think, not during a real outage.

Run prechecks. Then run them again next month.
Prechecks aren't a one-time gate — they're a health signal. Run them regularly. They catch configuration drift: IAM policies that were quietly changed, standby databases that stopped replicating, storage buckets that were deleted. The cost of catching these early is near zero. The cost of catching them during a failover is enormous.

When disaster strikes — one click from the standby side
This is intentional design: failover always executes from the standby region, never the primary. That means even if your primary region is completely unreachable, 
you can still trigger recovery. Go to the standby Protection Group, execute the Failover plan, and watch FSDR do in minutes what a manual recovery team would spend hours on.

What Can FSDR Actually Protect?

This is the question that catches people off guard. FSDR isn't a narrow database tool wearing a DR hat — it has genuine breadth across the OCI service catalogue, and Oracle has been expanding it steadily. Here's what's in scope today:
  • Compute: VM instances, Dedicated VM Hosts, Boot Volumes, Block Volumes — the bread and butter of most workloads, fully covered.
  • Database: Autonomous Database Serverless, Base Database Service, Exadata Database Service, and MySQL HeatWave. If you're running Oracle databases, there's an FSDR integration for it.
  • Containers: Oracle Kubernetes Engine (OKE) clusters — added in 2024 and a genuine game-changer for teams running cloud-native workloads alongside traditional infrastructure.
  • Storage: File Storage, Block Volume groups, Object Storage via replication policies. Your data moves with your application, not separately.
  • Networking: Load Balancers and Network Load Balancers — so traffic routes correctly the moment your standby comes live, without manual DNS surgery.
  • Integration: Oracle Integration Cloud (OIC) and Oracle GoldenGate — for teams running Oracle middleware and real-time data replication as part of their stack.

How to Not Waste the Tool You Just Set Up

FSDR gives you real power. These are the habits that determine whether you use it well or spend months convincing yourself you're protected when you're not.
  • Schedule prechecks like you schedule backups
  • Drill with Switchover, not just tabletop exercises
  • One app per Protection Group — resist the urge to bundle
  • Your custom steps are part of the plan — treat them that way
  • Cross-region over intra-region — unless you have a specific reason

The Bottom Line

DR has always been one of those things organizations know they need and consistently under-invest in — because the cost of doing it well is visible today, and the cost of doing it badly only shows up when everything is already on fire. FSDR doesn't eliminate that tension, but it does shift the calculus. Setting up protection groups, running prechecks, and drilling with real switchovers is no longer a six-month infrastructure project. It's a few days of focused work and an ongoing operational habit.
What FSDR gets right that most DR tools miss is scope. It doesn't protect your database and leave your compute to fend for itself. It doesn't protect your compute and forget about your load balancers. It thinks about your application as a stack, moves it as a stack, and recovers it as a stack. 
If you're running production workloads on OCI and your DR strategy is still a runbook in a shared drive — this is the moment to fix that. Not because something bad is about to happen. Because the whole point is that you genuinely don't know when it will.

Start here
Open the OCI Console, go to Migration & Disaster Recovery -> DR Protection Groups,
and create your first Protection Group.
The official docs live at docs.oracle.com/en-us/iaas/disaster-recovery — they're well-maintained and worth reading alongside this post.

#OracleCloud #OCI #DisasterRecovery #FSDR #FullStackDisasterRecovery #CloudInfrastructure #BusinessContinuity #CloudArchitecture #OracleCloudInfrastructure #SRE #DevOps #CloudEngineering #InfrastructureAsCode #Terraform #DataProtection #CloudResilience #OracleDatabase #Kubernetes #OKE #ITResilience

OCI IAM Policies Explained with Real Examples

OCI IAM Policies

In Oracle Cloud Infrastructure (OCI), policies define who (groups) can access what resources (compartments/services) and how (permissions/verbs). They are written in a human-readable format like: “Allow group X to manage Y in compartment Z.”

Policies are attached at the tenancy or compartment level and follow a least-privilege model, ensuring users get only the access required to perform their tasks.

Policy syntax anatomy

OCI uses human-readable policy statements attached to compartments. Every OCI policy statement follows a strict grammar with five (sometimes six) components:

  • Allow => the keyword every statement begins with (OCI policies are allow-only)
  • Subject => the group or dynamic group being granted access
  • Verb => inspect, read, use, or manage
  • Resource-type => what is being accessed, e.g. virtual-network-family or instance-family
  • Location => the scope: a named compartment or the whole tenancy
  • Conditions (optional) => a where clause that filters when the statement applies

Example — basic policy => Allow group NetworkAdmins to manage virtual-network-family in compartment production
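To make the grammar concrete, here's a minimal illustrative parser (not an official OCI tool; the regex covers only the basic Allow form) that splits a statement into its components:

```python
import re

# Matches the basic form:
#   Allow group <name> to <verb> <resource-type> in (compartment <name> | tenancy) [where <conditions>]
POLICY_RE = re.compile(
    r"Allow (?:group|dynamic-group) (?P<subject>\S+) "
    r"to (?P<verb>inspect|read|use|manage) (?P<resource>\S+) "
    r"in (?P<location>tenancy|compartment \S+)"
    r"(?: where (?P<conditions>.+))?$"
)

def parse_policy(statement: str) -> dict:
    """Split an OCI-style policy statement into its named components."""
    m = POLICY_RE.match(statement.strip())
    if not m:
        raise ValueError(f"not a recognized policy statement: {statement!r}")
    return {k: v for k, v in m.groupdict().items() if v is not None}

print(parse_policy(
    "Allow group NetworkAdmins to manage virtual-network-family in compartment production"
))
```

Reading statements this way — subject, verb, resource-type, location, then conditions — is exactly the order in which you should design them.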


In the console navigation, Policies sits outside the identity domains. This ensures centralized access control, so a single policy can govern users across multiple domains and resources.
Identity domains manage users and groups, while policies control what those groups can do—keeping authentication (who you are) separate from authorization (what you can access). This separation improves scalability, consistency, and security across the environment.

The verb hierarchy

Verbs are cumulative — each level includes all permissions from the levels below it:

Verb hierarchy (least → most permissive)

  • inspect  →  list resources, view metadata
  • read     →  inspect + get resource contents
  • use      →  read + work with existing resources
  • manage   →  use + create, update, delete resources
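The cumulative relationship can be expressed as a simple containment check (an illustrative sketch, not part of any official SDK):

```python
# Verbs ordered from least to most permissive; each grants everything before it.
VERBS = ["inspect", "read", "use", "manage"]

def verb_covers(granted: str, required: str) -> bool:
    """True if a policy granting `granted` also satisfies `required`."""
    return VERBS.index(granted) >= VERBS.index(required)

print(verb_covers("manage", "read"))   # manage includes read
print(verb_covers("inspect", "use"))   # inspect does not include use
```

This is why you grant the lowest verb that works: asking for manage when read suffices silently grants create, update, and delete as well.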

Dynamic Groups — The Real Power of OCI IAM

Dynamic groups allow you to grant IAM policies to OCI resources (like compute instances, functions, or data science notebooks) rather than to human users or service accounts. 
This is how you avoid hardcoding credentials. A dynamic group is defined by a matching rule — an expression that determines which resources belong to the group based on their 
attributes.

Dynamic group matching rule syntax

Match all instances in a specific compartment 
All {instance.compartment.id = 'ocid1.compartment.oc1..aaaabbbbcccc'}

Match a specific instance by OCID
Any {instance.id = 'ocid1.instance.oc1.ap-mumbai-1..aaaabbbb'}

Match OCI Functions in a compartment (serverless)
All {resource.type = 'fnfunc', resource.compartment.id = 'ocid1.compartment.oc1..xxxxxx'}
 
Match Data Science notebook sessions
All {resource.type = 'datasciencenotebooksession', resource.compartment.id = 'ocid1.compartment.oc1..xxxxxx'}
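For intuition, here's an illustrative (unofficial) evaluator for the All/Any semantics of the matching rules above, using plain dicts to stand in for resource attributes:

```python
def matches(rule_type: str, conditions: dict, resource: dict) -> bool:
    """Evaluate a simplified dynamic-group matching rule.

    rule_type: 'All' (every condition must hold) or 'Any' (at least one must hold).
    conditions: attribute -> expected value, mirroring the rule text.
    resource: the resource's actual attributes.
    """
    checks = (resource.get(attr) == want for attr, want in conditions.items())
    return all(checks) if rule_type == "All" else any(checks)

fn = {"resource.type": "fnfunc",
      "resource.compartment.id": "ocid1.compartment.oc1..xxxxxx"}

# Mirrors: All {resource.type = 'fnfunc', resource.compartment.id = '...'}
print(matches("All", {"resource.type": "fnfunc",
                      "resource.compartment.id": "ocid1.compartment.oc1..xxxxxx"}, fn))
```

The practical takeaway: All rules tighten membership (every clause must match), while Any rules widen it — so compartment-plus-type rules are almost always written with All.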



Example:- Compute Instance → Object Storage Access

This is the most common pattern in OCI projects. Instead of baking access keys into your application code or instance user data, you use Instance Principal authentication.

  1. Create the dynamic group and matching rule
  2. Create the policy
  3. Use instance principal in application access code
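Step 3 can be sketched with the OCI Python SDK: code on a matching instance authenticates with an instance-principal signer instead of embedded API keys. The bucket and object names below are placeholders, and this only works when run on an OCI instance that is actually covered by the dynamic group and policy:

```python
import oci

# Instance-principal auth: credentials come from the instance's identity,
# not from an API key file baked into the image or user data.
signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()

# Pass an empty config dict; the signer supplies everything needed.
object_storage = oci.object_storage.ObjectStorageClient(config={}, signer=signer)

namespace = object_storage.get_namespace().data

# 'my-bucket' and 'report.csv' are placeholder names for illustration.
obj = object_storage.get_object(namespace, "my-bucket", "report.csv")
print(len(obj.data.content))
```

If this call fails with a 404 "NotAuthorizedOrNotFound", the usual culprits are the matching rule not covering the instance or the policy living in the wrong compartment — see the troubleshooting list below.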

Policy placement for cross-compartment access:-

Cross-compartment access is where many developers hit a wall. The key insight: the policy must be written in the compartment that contains the resource being accessed, 
OR in a parent compartment that has authority over both.

-- This policy is attached at the ROOT TENANCY (so it has authority over all compartments) --

Allow group Developers to read secret-family in compartment security-compartment

-- OR: attach policy to security-compartment itself --

Allow group Developers to read secret-family in compartment id ocid1.compartment.oc1..security

Using Conditions (Where Clauses):-

Conditions let you add fine-grained filters to any policy statement. Combine multiple conditions with all { } or any { }, and compare values with operators like =, != and in.

Condition examples

-- Restrict to specific resource --
Allow group Auditors to read buckets in tenancy where target.bucket.name = 'audit-logs-bucket'

-- Require MFA for admin actions --
Allow group Admins to manage all-resources in tenancy where request.user.mfaTotpVerified = 'true'

-- Multiple conditions with AND, block deletes --
Allow group DataEngineers to manage objects in compartment analytics where all {target.bucket.name = 'raw-data', request.operation != 'DeleteObject'}


Refer to the table below for the common request and target variables used in conditions.


Troubleshooting: Why Your Policy Isn't Working

  • Verify the policy is attached to the correct compartment (not just the user's compartment)
  • Check that the group name in the policy exactly matches the IAM group name (case-sensitive)
  • For dynamic groups, confirm the resource's OCID or compartment OCID matches the matching rule
  • Check OCI Audit logs for the actual error message and resource OCID that was denied
  • Wait 60 seconds after any policy change before retesting
  • Use the Policy Simulator (Identity → Policy → Simulate) to test before applying
  • Remember that OCI IAM has no DENY statements: policies are purely additive, so an access failure always means a missing or mis-scoped ALLOW, never an overriding deny

Best Practices Checklist

  • Always follow least-privilege — grant only the verb and resource-type you actually need
  • Use dynamic groups + instance principals instead of API keys for OCI-to-OCI communication
  • Scope object storage access to specific buckets using where target.bucket.name
  • Require MFA for admin policies using where request.user.mfaTotpVerified = 'true'
  • Attach root-level policies only for cross-compartment access; keep other policies compartment-local
  • Use compartment OCID (not name) in policies for portability and to avoid name-collision bugs
  • Document every policy with a description explaining the business reason it exists
  • Audit policy list quarterly — remove access that is no longer needed
  • Never use manage all-resources in tenancy for anything other than the OCI administrator group

Summary:

OCI IAM policies become intuitive once you internalize three things: the policy must live near the resource (not the requester), dynamic groups need an accompanying policy to do anything, and the verb hierarchy saves you from over-granting permissions.

Mounting NFS File Systems on Oracle ATP in OCI | DBMS_CLOUD_ADMIN Guide

 A step-by-step guide to attaching external NFS volumes to Oracle Autonomous Transaction Processing (ATP) using DBMS_CLOUD_ADMIN, creating DBA directories, and performing file I/O with UTL_FILE.

Purpose

Oracle Autonomous Transaction Processing (ATP) is a fully managed, cloud-native database that abstracts away infrastructure concerns. However, many real-world enterprise workloads require the database to interact directly with files — reading flat files, writing exports, or integrating with shared network storage.

This blog walks you through how to mount an NFS (Network File System) volume from an OCI Compute VM onto an ATP database instance, create a DBA_DIRECTORY pointing to that mount, and use Oracle's UTL_FILE package to read and write files — all without leaving the SQL/PL-SQL environment.

📌 Note: This guide assumes you have an existing ATP instance (Dedicated or Serverless) and an OCI Compute VM that is already exporting an NFS share. The VM and the ATP instance must reside within the same VCN or have appropriate network peering and security list rules in place.

Architecture Overview

The solution bridges a Compute VM's NFS export with ATP's internal directory system: the VM exports a directory over NFSv4, ATP mounts it with DBMS_CLOUD_ADMIN.ATTACH_FILE_SYSTEM, and a DBA directory object (MY_DIR) exposes the mount to PL/SQL.



When Do You Need to Mount a File System on ATP?

ATP's managed nature means direct OS-level access is not available. Yet many integration patterns depend on file-based exchange. Here are common real-world scenarios. 

Where mounting an NFS volume becomes essential:

  • Bulk Data Ingestion :- Legacy systems drop flat files (CSV, fixed-width) on a shared NFS volume. ATP must read and load them via external tables or UTL_FILE.
  • Report Exports :- Regulatory or BI reports generated by PL/SQL procedures must be written to a shared drive for downstream consumption by other services.
  • ETL Staging Area :- ETL pipelines stage intermediate files on an NFS mount that ATP reads before transforming and loading into final tables.
  • Document Storage :- CMS or content-driven applications store documents on NFS. ATP needs to read metadata or contents directly from the file system.
  • Audit & Compliance Logs :- PL/SQL audit routines write log files to a secured NFS location for external SIEM or archival tools to consume.
  • Cross-System Integration :- Middleware or on-premises systems communicate with ATP via file drops on a shared volume — a common brownfield integration pattern.

Prerequisites:

  • ATP Instance Running => An active Oracle ATP instance (Serverless or Dedicated) in OCI. You need ADMIN or DBA privileges.
  • OCI Compute VM as NFS Server => A Compute VM with an NFS server configured, exporting the directory /mydocs over NFS v4.
  • Network Connectivity => ATP's private endpoint subnet can reach the VM on TCP port 2049. Security lists and NSGs must allow this traffic.
  • NFS Export Permissions => The VM's /etc/exports file must allow the ATP instance's private IP with appropriate read/write permissions.


Step 1 — Create the DBA Directory

A DBA Directory is an Oracle database object that maps a logical name (used inside PL/SQL) to a physical path on the server's file system. Once the NFS share is mounted, this directory object tells ATP where to find it.

Run the following as ADMIN or a user with the CREATE ANY DIRECTORY privilege:

-- Creates (or replaces) a directory object named MY_DIR
-- pointing to the physical path 'mydocs' on the ATP server.
-- This path will resolve once the NFS mount is attached in Step 2.

CREATE OR REPLACE DIRECTORY MY_DIR AS 'mydocs';

⚠️ Important
The path 'mydocs' is a relative path that ATP resolves internally after the NFS mount is attached. Do not use an absolute OS path here — DBMS_CLOUD_ADMIN.ATTACH_FILE_SYSTEM handles the binding. The directory name MY_DIR must match exactly in all subsequent steps.

Step 2 — Attach the NFS File System to ATP


With the directory object in place, use DBMS_CLOUD_ADMIN.ATTACH_FILE_SYSTEM to mount the NFS export from your Compute VM. This procedure instructs ATP to establish an NFS mount and link it to the MY_DIR directory object.

Replace <VM_NAME_OR_IP> with the hostname or private IP address of your Compute VM.


PL/SQL — Attach File System via DBMS_CLOUD_ADMIN

BEGIN
   DBMS_CLOUD_ADMIN.ATTACH_FILE_SYSTEM(
      file_system_name      => 'mydocsdocs',         -- Unique name for this mount
      file_system_location  => '<VM_NAME_OR_IP>:/mydocs', -- NFS server export path
      directory_name        => 'MY_DIR',             -- Must match Step 1 directory
      description           => 'Source NFS for data',
      params                => JSON_OBJECT('nfs_version' value 4)
   );
END;
/

Parameter Reference =>

  • file_system_name => A unique name identifying this mount inside ATP; DETACH_FILE_SYSTEM must later use the same name.
  • file_system_location => The NFS server and export path, in the form <host_or_ip>:/<export>.
  • directory_name => The directory object from Step 1 (MY_DIR); must match exactly.
  • description => Free-text description of the mount.
  • params => Optional JSON of mount options, e.g. nfs_version => 4.
Step 3 — Write a File to the Mounted Directory


With the NFS volume attached and mapped to MY_DIR, you can now use Oracle's UTL_FILE package to write files directly to the NFS mount. The following block creates a text file named cms2.txt in the mounted directory:

PL/SQL — Write File to MY_DIR using UTL_FILE

DECLARE
  l_file         UTL_FILE.FILE_TYPE;
  l_location     VARCHAR2(100) := 'MY_DIR';   -- DBA Directory name
  l_filename     VARCHAR2(100) := 'cms2.txt'; -- File to create
BEGIN
  -- Open the file in write mode ('w' creates or overwrites)
  l_file := UTL_FILE.FOPEN(l_location, l_filename, 'w');

  -- Write a line of data to the file
  UTL_FILE.PUT(l_file, 'Chetan1, male, 1002');

  -- Close the file handle (important: always close to flush buffers)
  UTL_FILE.FCLOSE(l_file);
END;
/

What Happens Here
UTL_FILE.FOPEN opens a file handle at MY_DIR/cms2.txt in write mode. The PUT call writes the string without a newline (use PUT_LINE if you want a newline appended). FCLOSE flushes and closes the file — always close to prevent handle leaks. The file will physically appear on the VM at /mydocs/cms2.txt.

Step 4 — Read the File from the Mounted Directory


Now verify the write succeeded by reading the same file back from ATP using UTL_FILE.GET_LINE. This also demonstrates that ATP can read files placed on the NFS share by external systems.

PL/SQL — Read File from MY_DIR using UTL_FILE

SET SERVEROUTPUT ON;

DECLARE
  l_file         UTL_FILE.FILE_TYPE;
  l_location     VARCHAR2(100) := 'MY_DIR';    -- DBA Directory name
  l_filename     VARCHAR2(100) := 'cms2.txt';  -- File to read
  l_text         VARCHAR2(32767);              -- Buffer for file content
BEGIN
  -- Open the file in read mode ('r')
  l_file := UTL_FILE.FOPEN(l_location, l_filename, 'r');

  -- Read one line from the file into the buffer
  UTL_FILE.GET_LINE(l_file, l_text, 32767);

  -- Output the read content to the console
  DBMS_OUTPUT.PUT_LINE('File content: ' || l_text);

  -- Close the file
  UTL_FILE.FCLOSE(l_file);
END;
/

💡 Note
To read multi-line files, wrap UTL_FILE.GET_LINE in a loop and catch the NO_DATA_FOUND exception to detect end-of-file. The buffer size of 32767 is the maximum per-line limit for UTL_FILE.

Step 5 — Detach the File System

When the NFS mount is no longer required — for maintenance, decommissioning, or to switch to a different volume — use DBMS_CLOUD_ADMIN.DETACH_FILE_SYSTEM to cleanly unmount it from ATP. Always detach before deleting the directory object.

PL/SQL — Detach File System

BEGIN
   DBMS_CLOUD_ADMIN.DETACH_FILE_SYSTEM(
      file_system_name  => 'mydocsdocs'  -- Must match the name used in ATTACH
   );
END;
/


Before You Detach
Ensure all open file handles using UTL_FILE or External Tables pointing to MY_DIR are closed. Detaching an active mount may cause I/O errors in running sessions. 
Also note that the DBA_DIRECTORY object (MY_DIR) remains after detach — drop it separately with DROP DIRECTORY MY_DIR if no longer needed.

Troubleshooting Common Issues


ORA-29283: Invalid File Operation
This usually means the directory path is not yet mounted or the NFS server is unreachable. Verify network connectivity from ATP's subnet to the VM on port 2049 and 
confirm ATTACH_FILE_SYSTEM completed without errors.

ORA-29284: File Read Error
The file does not exist at the expected path, or the NFS mount permissions don't allow the Oracle process to read it. Check NFS export options 
(rw, no_root_squash as needed) in /etc/exports on the VM.

ATTACH_FILE_SYSTEM Hangs or Times Out
The most common cause is a blocked port. Ensure OCI Security Lists allow TCP/UDP on port 2049 between the ATP private endpoint subnet and the Compute VM subnet. 
Also confirm the VM's OS-level firewall (firewalld / iptables) allows NFS traffic.

NFS Version Mismatch
ATP's ATTACH_FILE_SYSTEM with nfs_version => 4 requires the VM to export via NFSv4. Verify with nfsstat -s or check /proc/fs/nfsd/versions on the VM.

Summary

Mounting an NFS file system on Oracle ATP bridges the gap between the managed database world and file-based integration patterns. The workflow is clean and entirely SQL/PL-SQL-driven:

  • Create the Directory:- CREATE OR REPLACE DIRECTORY MY_DIR AS 'mydocs' — registers the logical path in Oracle's data dictionary.
  • Attach the NFS Mount:- DBMS_CLOUD_ADMIN.ATTACH_FILE_SYSTEM — mounts the NFS export from the Compute VM and binds it to MY_DIR.
  • Write Files:- UTL_FILE.FOPEN / PUT / FCLOSE — creates and writes files directly on the NFS share from PL/SQL.
  • Read Files:- UTL_FILE.FOPEN / GET_LINE / FCLOSE — reads files from the NFS share into PL/SQL variables.
  • Detach When Done:- DBMS_CLOUD_ADMIN.DETACH_FILE_SYSTEM — cleanly unmounts the NFS share.

🎉 Key Takeaway
This approach lets you leverage ATP's enterprise-grade managed database capabilities while still integrating with the file-based workflows that many enterprise architectures depend on — with no OS access required and full auditability through Oracle's data dictionary.


How to Swap Your Oracle Cloud Boot Volume Without Rebuilding Your Server

 If you're running servers on Oracle Cloud Infrastructure, there might come a time when you need to change the boot volume disk that contains your operating system. Maybe something went wrong, or perhaps you want to upgrade to a newer version. The good news? You don't have to tear everything down and start from scratch.

What This Feature Does


Think of your boot volume as the hard drive where your server's operating system lives. Traditionally, swapping it out meant shutting down your entire instance, deleting it, and creating a brand new one. That's time-consuming and risky.

Oracle Cloud now lets you hot-swap this boot volume. Your instance pauses momentarily, switches to the new boot volume, and comes back up - almost like restarting your computer with a different hard drive. Everything else about your server stays the same.

Before You Start: What Works and What Doesn't

Not every server and operating system can use this feature. Here's what you need to know:


Operating System Limitations:

This feature only works with Linux-based systems. If you're running Windows servers or using marketplace images (pre-configured software packages from Oracle's marketplace), you'll need to stick with the traditional rebuild method.

Additionally, you can't switch between different Linux flavors. Running Oracle Linux? You must replace it with another Oracle Linux boot volume. You can't suddenly switch to Ubuntu or any other distribution during the swap.

What Your Instance Needs:

You'll need either a virtual machine or a bare metal instance. Beyond that, you need one of two things: a properly formatted block volume that already has a compatible operating system installed, or a backup image that matches your current setup.

The technical bit here involves launch options - basically, the way your boot volume connects to your instance needs to match between the old and new volumes. If they don't align, Oracle won't let you proceed.

Getting Permission to Make Changes 


Oracle Cloud takes security seriously, which means you can't just make system-level changes without proper authorization. Your cloud administrator needs to grant you specific permissions through something called IAM policies.

The Permission Structure


There are different levels of permission you might receive. Some administrators prefer giving broad access across the entire cloud account (called a tenancy), while others limit permissions to specific compartments (think of these as folders that organize your cloud resources).

At minimum, you need permission to manage instances or specifically to replace boot volumes. Your administrator will add you to a group with these rights. Without this setup, you'll get an "unauthorized" error when attempting the replacement.

Understanding the Safety Net


What happens if something goes wrong during the swap? Oracle has built in a rollback mechanism to protect you.

How Rollback Works:


If the replacement process hits a snag, the system automatically tries to restore everything to how it was before. It brings back your original settings, reconnects your volumes, and restarts your instance with the old boot volume. This safety feature works well in most situations, though there are rare edge cases where a complete restoration might not be possible. It's still good practice to have backups before making major changes.

One important detail: if you were replacing your boot volume with a newly created image and the rollback kicks in, that new boot volume gets deleted automatically. However, if you specified an existing volume by its ID, rollback keeps that volume around; it just doesn't use it.

Replacing Your Boot Volume

You can replace a boot volume using several methods: the web interface (Console), the CLI, or the API. The steps below walk through the Console method.

  • Log into the Oracle Cloud console and navigate to the Compute section. Find the Instances page and click on the specific server you want to modify. 
  • Once you're viewing its details, look for the "More Actions" dropdown menu and select "Replace Boot Volume."

First, decide what happens to your current boot volume. You'll see an option called "Preserve Boot Volume." If you enable this, your old boot volume sticks around after the replacement succeeds, which is useful if you want to keep it as a backup. If you disable it, the old volume gets deleted, freeing up storage space and reducing costs.

Selecting Your Replacement

You have two main approaches here: using an existing boot volume or creating one from an image.

In the replacement dialog, you'll see several options:

  • Preserve boot volume toggle: A switch at the top that controls whether your old boot volume is kept or deleted after successful replacement
  • Replace by section: Two radio buttons letting you choose between "Boot volume" or "Image" as your replacement source
  • Apply boot volume by: Two methods to specify your volume:
                >  "Select from a list" - Browse available volumes from a dropdown menu
                >  "Input OCID" - Directly paste the unique identifier if you already know it  
  • Boot volume compartment: A dropdown to select which compartment to search in for available volumes
  • Boot volume: The main dropdown where you pick your specific volume from the filtered list




If you're going with an existing boot volume, you can either pick it from the compartment-filtered dropdown list or enter its OCID - a unique identifier string that Oracle assigns to every resource. The OCID method is faster if you already know the specific volume you want. Similarly, if you're using an image to create a new boot volume, you can browse through available images or paste in an image OCID directly.

Advanced Configuration Options

Beyond the basic replacement, Oracle provides advanced options to customize your new boot volume:

  • Metadata section: Add custom key/value pairs such as SSH public keys needed to connect to the instance after replacement. This is particularly useful when you're switching to a fresh operating system installation and need to ensure you can access it
                      - Click "Add item" to insert new metadata pairs
                      - Each pair has a Name field and Value field
                      - Use the X button to remove unwanted entries

  • Extended metadata section: Provide additional metadata pairs that serve the same purpose as the standard metadata. This gives you extra flexibility for complex configurations where you need to pass multiple custom values to your instance

These advanced options ensure that when your instance boots up with the new volume, it has all the configuration details it needs to function properly in your environment.
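As a concrete illustration of the metadata section, here is a minimal sketch of the key/value pairs you might supply. The `ssh_authorized_keys` key is the standard key cloud-init reads for SSH access; the key material and the `app_config` values below are placeholders, not real values.

```python
# Metadata pairs supplied during boot volume replacement.
# "ssh_authorized_keys" is the conventional key cloud-init consumes;
# the public key string here is a placeholder.
metadata = {
    "ssh_authorized_keys": "ssh-rsa AAAA...example user@host",
}

# Extended metadata allows nested, free-form values that your own
# boot-time scripts might consume (names here are hypothetical).
extended_metadata = {
    "app_config": {"environment": "staging", "log_level": "debug"},
}

print(sorted(metadata), sorted(extended_metadata))
```

In the Console these map to the Metadata and Extended metadata sections of the replacement dialog; with the CLI or SDK they correspond to the instance's metadata fields.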

Once you've made all your selections, hit the Replace button and let Oracle handle the rest. Your instance will stop, perform the swap, and restart automatically.

Working From the Command Line


If you prefer automation or working in scripts, Oracle's Command Line Interface (CLI) lets you perform boot volume replacements programmatically. You'll use the instance update command with a JSON file that describes the changes you want to make. This JSON file contains all the specifications for your instance, including the new boot volume details. The CLI approach is particularly useful when managing multiple instances or integrating this process into larger automation workflows. You can generate example JSON files to see the correct format, then modify them for your needs.
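A small sketch of what preparing that JSON input might look like. The field names below are illustrative assumptions, not the authoritative schema; the OCI CLI can generate the exact template for you with `oci compute instance update --generate-full-command-json-input`, and all OCIDs here are placeholders.

```python
import json

# Build a JSON input file for "oci compute instance update".
# Field names are illustrative -- generate the real template with:
#   oci compute instance update --generate-full-command-json-input > update.json
payload = {
    "instanceId": "ocid1.instance.oc1..exampleuniqueID",      # placeholder OCID
    "sourceDetails": {
        "sourceType": "bootVolume",                           # or "image"
        "bootVolumeId": "ocid1.bootvolume.oc1..exampleuniqueID",
    },
}

with open("update.json", "w") as f:
    json.dump(payload, f, indent=2)

# You would then run (sketch):
#   oci compute instance update --from-json file://update.json
print(json.dumps(payload["sourceDetails"], sort_keys=True))
```

Generating the template first and editing it is the safest path, since the CLI validates the structure against the current API version.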


Using Oracle's REST API


For developers building applications or custom tools, Oracle provides a REST API that exposes the boot volume replacement functionality. You can integrate this directly into your code using one of Oracle's SDKs or by making raw HTTP requests.

The key API operation is UpdateInstance, which handles boot volume replacement along with other instance modifications. You'll need to properly authenticate your API requests using Oracle's security credentials system.
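To make the shape of the call concrete, here is a sketch of the endpoint an UpdateInstance request targets. The Core Services API version is 20160918; request signing (which an SDK normally handles) is omitted, and the region and OCID are placeholders.

```python
# Construct the UpdateInstance endpoint for the Core Services REST API.
# Authentication/signing is deliberately omitted -- an OCI SDK handles it.
region = "us-ashburn-1"                                   # placeholder region
instance_id = "ocid1.instance.oc1..exampleuniqueID"       # placeholder OCID

endpoint = f"https://iaas.{region}.oraclecloud.com/20160918/instances/{instance_id}"
method = "PUT"   # UpdateInstance is a PUT on the instance resource

print(method, endpoint)
```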


When to Use This Feature


Boot volume replacement shines in several scenarios. Maybe you discovered your current operating system installation has issues and you have a clean backup image. Rather than manually fixing problems, you can swap to the backup and get running quickly. Perhaps you're standardizing your server configurations across a fleet of instances. You can create one perfect boot volume, then replicate it across multiple servers without rebuilding each one from scratch.

This feature also helps with disaster recovery. Keep recent boot volume backups, and if something catastrophic happens, you can roll back to a known-good state in minutes rather than hours.

Final Thoughts


Oracle Cloud's boot volume replacement feature removes a lot of friction from server maintenance. Instead of the old approach of destroying and recreating instances, you can now swap boot volumes like changing batteries - the device stays the same, you're just swapping out one component. Just remember the limitations: Linux only, matching distributions required, and you need proper permissions. Within those boundaries, this tool gives you flexibility to maintain, upgrade, and recover your cloud infrastructure with minimal downtime.

Whether you're a system administrator managing a handful of servers or a DevOps engineer orchestrating hundreds, having the ability to replace boot volumes without rebuilding instances makes life considerably easier.

OCI Networking Series – Part 7: Monitoring & Troubleshooting OCI Networks

Designing a secure and scalable OCI network is only half the job. In real-world enterprise environments, networks evolve continuously—routes change, security rules get updated, hybrid links fluctuate, and workloads scale dynamically. Without strong observability and troubleshooting capabilities, even the best-designed architectures can fail operationally.

Oracle Cloud Infrastructure (OCI) provides native tools that give deep visibility into network behavior, traffic flows, routing decisions, and connectivity paths. 

This blog focuses on how to monitor, analyze, and troubleshoot OCI networks effectively using Flow Logs, Logging Analytics, Path Analyzer, and operational diagnostic tools.

1. VCN Flow Logs – Foundational Network Visibility


What VCN Flow Logs Are

VCN Flow Logs capture metadata about IP traffic flowing to and from a VNIC or subnet within a VCN. They do not capture packet payloads; instead, they record key attributes such as source and destination IP, ports, protocol, action (ALLOW/DENY), bytes transferred, and timestamps.

Flow Logs provide a factual record of how network security and routing decisions are applied in practice—not how they are intended to work.

Why Flow Logs Are Critical

Flow Logs form the ground truth for network troubleshooting. When traffic fails, they help answer questions like:

  • Did the packet reach the subnet or VNIC?
  • Was it allowed or denied?
  • Which port or protocol was involved?
  • Was the issue security-related or routing-related?

Without Flow Logs, architects often rely on assumptions.
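Once the records are in hand, these questions can be answered mechanically. A minimal sketch, using hand-made sample records shaped like flow log data (real records carry additional fields):

```python
# Sample records shaped like VCN Flow Log entries: source/destination IP and
# port, IP protocol number, action, and bytes transferred. These are made up.
records = [
    {"srcaddr": "10.0.1.5",    "dstaddr": "10.0.2.8", "srcport": 51234,
     "dstport": 22,   "protocol": 6, "action": "ACCEPT", "bytes": 4200},
    {"srcaddr": "203.0.113.9", "dstaddr": "10.0.2.8", "srcport": 44321,
     "dstport": 3389, "protocol": 6, "action": "REJECT", "bytes": 0},
]

# Filter down to denied flows -- the first thing to check when an
# "instance not reachable" ticket comes in.
denied = [r for r in records if r["action"] == "REJECT"]
for r in denied:
    print(f'{r["srcaddr"]}:{r["srcport"]} -> {r["dstaddr"]}:{r["dstport"]} denied')
```

A single REJECT on the expected port immediately tells you the problem is a security rule, not routing.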

How to Enable VCN Flow Logs

  • Navigate to Networking → Virtual Cloud Networks.
  • Select the subnet and open the Monitoring tab.
  • Enable Flow Logs.
  • Select or create a Log Group.
  • Save the configuration.

Once Flow Logs are enabled for a subnet, they record the traffic reaching it, showing whether each flow was accepted or rejected along with the source and destination IP details.

You can also set the log retention period, since stored logs consume Object Storage space and incur cost. The default retention is 30 days.

Common Use Cases

  • Troubleshooting “instance not reachable” issues.
  • Verifying NSG or Security List behavior.
  • Auditing hybrid traffic entering via DRG.
  • Identifying suspicious inbound connection attempts.

2. Logging Analytics – Making Sense of Flow Logs at Scale


What Logging Analytics Is

Logging Analytics is OCI’s centralized log analytics and observability platform. It ingests logs (including VCN Flow Logs), indexes them, applies machine learning, and presents insights through dashboards, queries, and alerts.

While Flow Logs capture raw data, Logging Analytics turns data into insights.

Why Logging Analytics Matters

In large environments, Flow Logs can generate millions of records. Logging Analytics enables:

  • Pattern detection (normal vs abnormal traffic)

  • Traffic trend analysis over time

  • Rapid filtering of denied or suspicious flows

  • Correlation across network, compute, and application logs

It significantly reduces mean time to resolution (MTTR).
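The kind of aggregation Logging Analytics performs at scale can be illustrated with a tiny example: counting denied flows per minute to spot a spike after a rule change. The timestamps and records below are made-up samples.

```python
from collections import Counter

# Made-up (minute, action) pairs standing in for parsed flow log records.
flows = [
    ("2024-05-01T10:00", "REJECT"),
    ("2024-05-01T10:00", "ACCEPT"),
    ("2024-05-01T10:01", "REJECT"),
    ("2024-05-01T10:01", "REJECT"),
]

# Count denies per minute; a sudden jump after a security-rule change is
# exactly the pattern an alert would fire on.
denies_per_minute = Counter(minute for minute, action in flows if action == "REJECT")
print(denies_per_minute.most_common())
```

In practice you would express this as a Logging Analytics query with an alert threshold rather than a script, but the underlying aggregation is the same.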


How to Enable Logging Analytics

  1. Navigate to Observability & Management → Logging Analytics.

  2. Enable the Logging Analytics service for the tenancy.

  3. Grant required IAM policies for log ingestion.

  4. In OCI Logging, configure Flow Logs to be forwarded to Logging Analytics.

  5. Use built-in parsers and dashboards to visualize traffic.



You can enable Logging Analytics from its home screen in the Console.

Common Use Cases

  • Identifying spikes in denied traffic after rule changes.

  • Monitoring hybrid connectivity stability.

  • Investigating security incidents using historical logs.

  • Creating alerts for repeated denied connections.


3. Path Analyzer – Understanding Routing and Security Decisions


What Path Analyzer Does

Path Analyzer simulates the network path traffic would take between two endpoints in OCI. It evaluates route tables, gateways, NSGs, security lists, and DRG attachments to determine whether traffic is allowed and how it is routed.

This is not packet capture—it is deterministic path simulation based on configuration.


Why Path Analyzer Is Essential

In complex environments—hub-spoke VCNs, hybrid connectivity, multi-region architectures—routing behavior can be difficult to predict. Path Analyzer eliminates guesswork by showing:

  • Which route table is selected
  • Which gateway is used
  • Whether security rules allow traffic
  • Where traffic would be dropped
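The deterministic evaluation behind this can be sketched in a few lines: pick the most specific matching route (longest prefix match), then check whether a security rule admits the traffic. The route table and rules below are toy examples, not real Path Analyzer internals.

```python
import ipaddress

# Toy route table: CIDR -> next hop target.
routes = {
    "0.0.0.0/0":   "internet-gateway",
    "10.1.0.0/16": "drg",
    "10.0.0.0/16": "local",
}
# Toy ingress rules: (allowed source CIDR, allowed destination port).
ingress_rules = [("10.0.0.0/8", 22), ("0.0.0.0/0", 443)]

def next_hop(dst_ip):
    """Return the target of the most specific matching route."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [ipaddress.ip_network(c) for c in routes
               if dst in ipaddress.ip_network(c)]
    best = max(matches, key=lambda n: n.prefixlen)   # longest prefix wins
    return routes[str(best)]

def allowed(src_ip, dst_port):
    """Return True if any ingress rule admits this source/port pair."""
    src = ipaddress.ip_address(src_ip)
    return any(src in ipaddress.ip_network(cidr) and port == dst_port
               for cidr, port in ingress_rules)

print(next_hop("10.1.4.7"), allowed("10.0.3.3", 22))
```

The real service evaluates far more (DRG route distributions, NSG membership, stateful vs stateless rules), but the "configuration in, verdict out" model is the same: no packets are sent.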

How to Use Path Analyzer

  • Navigate to Networking → Network Path Analyzer.
  • Define the Source (instance, VNIC, or IP).
  • Define the Destination (instance, CIDR, on-prem IP).
  • Select protocol and port.
  • Run the analysis to view the simulated path and results.

You can run the analysis bidirectionally or unidirectionally, depending on the issue you are investigating.

Once you have provided the source, destination, port, protocol, and direction details, run the analysis.


Common Use Cases

  • Debugging hub-spoke connectivity failures.
  • Validating DRG routing logic.
  • Troubleshooting asymmetric routing.
  • Confirming hybrid traffic paths.
  • Diagnosing backend reachability behind load balancers.

4. Common OCI Networking Troubleshooting Scenarios


Instance Not Reachable

This typically results from:
  • Missing or incorrect route table entries.
  • NSG or Security List blocking traffic.
  • Incorrect gateway (IGW vs NAT).
  • OS-level firewall restrictions.
Flow Logs identify denies, while Path Analyzer reveals routing issues.


Hybrid Connectivity Issues

Common causes include:
  • IPSec tunnel down or BGP not established.
  • Overlapping CIDR blocks.
  • Incorrect DRG route table association.
  • MTU mismatches causing silent packet drops.
  • Missing return routes on on-prem routers.
Hybrid issues often require validating both OCI and customer-edge configuration.
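Of these, overlapping CIDR blocks are among the easiest to catch programmatically before they break connectivity. A minimal sketch, with example address ranges:

```python
import ipaddress

# Example ranges -- substitute your real on-prem and VCN CIDRs.
on_prem = ["10.0.0.0/16", "192.168.10.0/24"]
vcn     = ["10.0.4.0/24", "172.16.0.0/20"]

# Flag every on-prem/VCN pair whose address ranges overlap; any hit here
# will cause routing ambiguity over the IPSec/FastConnect path.
overlaps = [(a, b)
            for a in on_prem for b in vcn
            if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))]
print(overlaps)
```

Running a check like this whenever a new VCN or on-prem prefix is added is far cheaper than diagnosing the resulting asymmetric routing after the fact.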


Routing Conflicts

These occur due to:
  • Overlapping VCN CIDRs.
  • Incorrect DRG attachment route priorities.
  • Multiple gateways competing for the same prefix.
Path Analyzer is the fastest way to isolate such issues.

Conclusion


Monitoring and troubleshooting are not optional in cloud networking—they are foundational. OCI provides a robust set of native tools that allow architects to observe real traffic behavior, simulate routing decisions, and diagnose issues across VCNs, DRGs, and hybrid environments. By combining Flow Logs, Logging Analytics, Path Analyzer, and operational tools like CLI and traceroute, teams can significantly reduce outages, improve reliability, and maintain confidence in complex OCI network architectures.

This concludes the OCI Networking Series, where we explored networking from fundamentals to advanced monitoring and troubleshooting. Thank you for following along, and see you in the next one!

Happy Networking. :)