Replicating and Recovering an Application to a Different K8s Cluster
In this section we will snapshot the application's components, replicate the snapshot to a second K8s cluster, and recover the application there.
Design
In this setup there are two PCs, each with its own PE registered to it. Each PE hosts an NKP cluster.
We will replicate an application from the NKP cluster under one PC to the NKP cluster under the other PC.
Role | PC | PE | NKP Cluster |
---|---|---|---|
Source | PC-1 | PE-1 | nkpprimary |
Destination | PC-2 | PE-2 | nkpsecondary |
The following diagram shows the flow of the application recovery.
```mermaid
stateDiagram-v2
    direction LR
    state PC1 {
        [*] --> PE1
        PE1 --> nkpprimary
        nkpprimary --> SourceApp
    }
    state PC2 {
        [*] --> PE2
        PE2 --> nkpsecondary
        nkpsecondary --> DestinationApp
    }
    [*] --> PC1
    PC1 --> PC2
    PC2 --> [*]
```
Setup Destination NKP in Secondary PC/PE/K8s
Make sure to name your NKP cluster appropriately so it is easy to identify.
For the purposes of this lab, we will call the destination NKP cluster nkpsecondary.
Follow the instructions in NKP Deployment to set up the destination/secondary NKP K8s cluster.
Replication Custom Resources
The steps include configuring the following NDK custom resources:
Custom Resource | Purpose |
---|---|
StorageCluster | Defines the Nutanix storage fabric and UUIDs for the secondary NKP cluster |
Remote | Defines the target Kubernetes cluster for replication; created on the source NKP cluster |
ReplicationTarget | Specifies where to replicate an application snapshot |
ApplicationSnapshotReplication | Triggers snapshot replication to another cluster |
ApplicationSnapshotRestore | Restores an application snapshot |
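Once NDK is installed (the installation steps are below), you can confirm that these custom resource definitions are present on the cluster. This is a minimal sketch; the dataservices.nutanix.com API group name is an assumption and may differ between NDK versions:

```bash
# List the NDK-related CRDs registered on the cluster (API group name assumed)
kubectl get crds | grep -i nutanix
kubectl api-resources --api-group=dataservices.nutanix.com
```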
Configure Availability Zones on PCs
To enable replication between the two PCs and their underlying PEs, we need to configure Availability Zones bi-directionally.

1. Logon to the Primary PC and go to Administration > Availability Zones
2. Click on Connect to Availability Zone
3. Choose Physical Location and enter the Secondary PC details
    - IP Address for Remote PC - 10.x.x.x
    - Username - admin
    - Password - xxxxxxxxxxx
4. Click on Connect
5. Confirm addition of the remote PC
6. Repeat steps 1 - 5 on the remote (Secondary) PC to configure access to the Primary PC
Install NDK on Secondary NKP Cluster
- Login to the VSCode terminal
- Set your NKP cluster KUBECONFIG (a sample export is shown after this list)
- Test the connection to the nkpsecondary cluster

    ```
    $ kubectl get nodes
    NAME                            STATUS   ROLES           AGE   VERSION
    nkpsec-md-0-fdrzg-clvf9-2gnqc   Ready    <none>          24h   v1.32.3
    nkpsec-md-0-fdrzg-clvf9-9msmd   Ready    <none>          24h   v1.32.3
    nkpsec-md-0-fdrzg-clvf9-hnjlm   Ready    <none>          24h   v1.32.3
    nkpsec-md-0-fdrzg-clvf9-t8t4l   Ready    <none>          24h   v1.32.3
    nkpsec-rhdh8-2xs7z              Ready    control-plane   24h   v1.32.3
    nkpsec-rhdh8-srm6h              Ready    control-plane   24h   v1.32.3
    nkpsec-rhdh8-wxbd9              Ready    control-plane   24h   v1.32.3
    ```

- Install NDK

    ```bash
    helm upgrade -n ntnx-system --install ndk chart/ \
      --set manager.repository="$IMAGE_REGISTRY/ndk/manager" \
      --set manager.tag=${NDK_VERSION} \
      --set infraManager.repository="$IMAGE_REGISTRY/ndk/infra-manager" \
      --set infraManager.tag=${NDK_VERSION} \
      --set kubeRbacProxy.repository="$IMAGE_REGISTRY/ndk/kube-rbac-proxy" \
      --set kubeRbacProxy.tag=${KUBE_RBAC_PROXY_VERSION} \
      --set bitnamiKubectl.repository="$IMAGE_REGISTRY/ndk/bitnami-kubectl" \
      --set bitnamiKubectl.tag=${KUBECTL_VERSION} \
      --set jobScheduler.repository="$IMAGE_REGISTRY/ndk/job-scheduler" \
      --set jobScheduler.tag=${NDK_VERSION} \
      --set config.secret.name=nutanix-csi-credentials \
      --set tls.server.enable=false
    ```

    For example, with the variables substituted:

    ```bash
    helm upgrade -n ntnx-system --install ndk chart/ \
      --set manager.repository="harbor.example.com/nkp/ndk/manager" \
      --set manager.tag=1.2.0 \
      --set infraManager.repository="harbor.example.com/nkp/ndk/infra-manager" \
      --set infraManager.tag=1.2.0 \
      --set kubeRbacProxy.repository="harbor.example.com/nkp/ndk/kube-rbac-proxy" \
      --set kubeRbacProxy.tag=v0.17.0 \
      --set bitnamiKubectl.repository="harbor.example.com/nkp/ndk/bitnami-kubectl" \
      --set bitnamiKubectl.tag=1.30.3 \
      --set jobScheduler.repository="harbor.example.com/nkp/ndk/job-scheduler" \
      --set jobScheduler.tag=1.2.0 \
      --set config.secret.name=nutanix-csi-credentials \
      --set tls.server.enable=false
    ```

- Check that all NDK components are running (4 of 4 containers should be running inside the ndk-controller-manager pod)

    ```
    Active namespace is "ntnx-system".
    $ k get all -l app.kubernetes.io/name=ndk
    NAME                                          READY   STATUS    RESTARTS   AGE
    pod/ndk-controller-manager-57fd7fc56b-gg5nl   4/4     Running   0          19m

    NAME                                             TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)          AGE
    service/ndk-controller-manager-metrics-service   ClusterIP      10.109.134.126   <none>         8443/TCP         19m
    service/ndk-intercom-service                     LoadBalancer   10.99.216.62     10.122.7.212   2021:30258/TCP   19m
    service/ndk-scheduler-webhook-service            ClusterIP      10.96.174.148    <none>         9444/TCP         19m
    service/ndk-webhook-service                      ClusterIP      10.107.189.171   <none>         443/TCP          19m

    NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/ndk-controller-manager   1/1     1            1           19m

    NAME                                                DESIRED   CURRENT   READY   AGE
    replicaset.apps/ndk-controller-manager-57fd7fc56b   1         1         1       19m
    ```
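The KUBECONFIG export for step 2 is not reproduced here. A minimal sketch, assuming the secondary cluster's kubeconfig was written to $HOME/nkp/ during NKP deployment and the file name follows the cluster name (adjust both to your environment):

```bash
# Point kubectl at the secondary NKP cluster and confirm the active context
export KUBECONFIG=$HOME/nkp/nkpsecondary.conf   # assumed file name
kubectl config current-context
```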
Configure NDK
The first component we will configure in NDK is StorageCluster. This custom resource represents the Nutanix cluster components, including the following:

- Prism Central (PC)
- Prism Element (PE)

By configuring the StorageCluster custom resource, we provide the Nutanix infrastructure information to NDK.
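The exact fields the StorageCluster manifest expects vary by NDK version, so rather than guessing at the schema, it can be inspected directly from the CRD once NDK is installed; a minimal sketch:

```bash
# Show the documented fields of the StorageCluster custom resource installed by NDK
kubectl explain storagecluster
kubectl explain storagecluster.spec
```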
- Logon to the Jumphost VM terminal in VSCode
- Get the uuid of the secondary PC and PE (one way to query these is sketched after this list)
- Add (append) the following environment variables and save the file

    ```bash
    export SECONDARY_PRISM_CENTRAL_UUID=_pc_uuid_from_previous_commands
    export SECONDARY_PRISM_ELEMENT_UUID=_pe_uuid_from_previous_commands
    export SECONDARY_SC_NAME=_storage_cluster_name
    export NDK_REPLICATION_CLUSTER_NAME=_secondary_cluster_name
    export KUBECONFIG=$HOME/nkp/_nkp_secondary_cluster_name.conf
    ```

- Note and export the external IP assigned to the NDK intercom service on the Primary Cluster
- Add (append) the following environment variables to the file $HOME/ndk/.env and save it
- Source the .env file
- Create the StorageCluster custom resource
- Find and configure the secondary NDK IP and port number
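The lab's original UUID lookup command is not reproduced here. As a hypothetical alternative for step 2, the Prism Central v3 clusters/list API returns the registered clusters with their UUIDs (the PE appears by name, and the PC itself is typically listed as well); the IP address and credentials below are placeholders:

```bash
# Query the secondary PC for cluster names and UUIDs (placeholders: IP, password)
curl -ks -u admin:'<password>' -X POST \
  "https://<secondary-pc-ip>:9440/api/nutanix/v3/clusters/list" \
  -H 'Content-Type: application/json' \
  -d '{"kind":"cluster"}' | jq -r '.entities[] | "\(.spec.name)\t\(.metadata.uuid)"'
```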
NDK Recover to the Secondary NKP Cluster
Since we have a sample workload configured on the primary NKP cluster, we will:
- Configure the remote NKP cluster on the primary NKP cluster (using the Remote and ReplicationTarget custom resources)
- Replicate the snapshot of the sample workload from the primary NKP cluster to the secondary NKP cluster (using the ApplicationSnapshotReplication custom resource)
- Restore the replicated snapshot on the secondary NKP cluster to recover the workloads (using the ApplicationSnapshotRestore custom resource)
Create Remote Cluster on Primary NKP Cluster
- Switch the context to the primary NKP cluster nkpprimary
- Create the Remote resource on the primary NKP cluster
- Make sure the Remote cluster is healthy (a sample check is sketched after this list)
- Create the ReplicationTarget on the primary NKP cluster
- Make sure the ReplicationTarget is healthy
- Replicate the snapshot to the replication cluster
- Monitor the progress of the replication and make sure it completes

    ```
    Status:
      Conditions:
        Last Transition Time:          2025-07-16T21:51:32Z
        Message:
        Observed Generation:           1
        Reason:                        ReplicationComplete
        Status:                        True
        Type:                          Available
        Last Transition Time:          2025-07-16T21:51:32Z
        Message:
        Observed Generation:           1
        Reason:                        ReplicationComplete
        Status:                        False
        Type:                          Progressing
      Replication Completion Percent:  100
    ```
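The manifests for the Remote, ReplicationTarget, and ApplicationSnapshotReplication resources are environment specific and are not reproduced here. As a rough sketch of the health and progress checks in steps 3, 5, and 7, assuming the resource kinds listed in the table earlier in this section (names and namespaces are placeholders):

```bash
# Check the replication resources on the primary cluster and follow the replication progress
kubectl get remote,replicationtarget -A
kubectl describe remote <remote-name> -n <namespace>
kubectl describe applicationsnapshotreplication <replication-name> -n <namespace>
```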
Recover Application in Remote NKP Cluster
- Switch the context to the secondary NKP cluster nkpsecondary
- Confirm that the ApplicationSnapshot has been replicated
- Restore the replicated ApplicationSnapshot
- Monitor the restore
- Monitor the restore steps to understand the flow

    ```
    NAME   SNAPSHOT-NAME   COMPLETED
    status:
      completed: true
      conditions:
      - lastTransitionTime: "2025-07-16T21:57:53Z"
        message: All prechecks passed and finalizers on dependent resources set
        observedGeneration: 1
        reason: PrechecksPassed
        status: "True"
        type: PrechecksPassed
      - lastTransitionTime: "2025-07-16T21:59:00Z"
        message: All eligible application configs restored
        observedGeneration: 1
        reason: ApplicationConfigRestored
        status: "True"
        type: ApplicationConfigRestored
      - lastTransitionTime: "2025-07-16T21:59:15Z"
        message: All eligible volumes restored
        observedGeneration: 1
        reason: VolumesRestored
        status: "True"
        type: VolumesRestored
      - lastTransitionTime: "2025-07-16T21:59:15Z"
        message: Application restore successfully finalised
        observedGeneration: 1
        reason: ApplicationRestoreFinalised
        status: "True"
        type: ApplicationRestoreFinalised
      finishTime: "2025-07-16 21:59:15"
      startTime: "2025-07-16 21:57:52"
    ```

- Verify that the app1 pvc and pod are restored (a sample check is sketched after this list)
- Check that data is present within the data mount /data inside the pod
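A minimal sketch of the final verification, assuming the sample workload was restored into an app1 namespace; the pod name is a placeholder:

```bash
# Verify the restored PVC and pod, then list the restored data under /data
kubectl get pvc,pods -n app1
kubectl exec -n app1 <app1-pod-name> -- ls -l /data
```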
We have successfully replicated application data to a secondary NKP cluster and recovered it using NDK.