Replicating and Recovering an Application to a Different K8s Cluster
In this section we will snapshot the application's components, replicate the snapshot to a second K8s cluster, and recover the application there.
Design
In this setup there are two PCs, each with its own PE registered to it. Each PE hosts an NKP cluster.
We will replicate an application from the NKP cluster under one PC to the NKP cluster under the other PC.
Role | PC | PE | NKP Cluster |
---|---|---|---|
Source | PC-1 | PE-1 | nkpprimary |
Destination | PC-2 | PE-2 | nkpsecondary |
The following diagram shows the flow of the application recovery.
```mermaid
stateDiagram-v2
    direction LR
    state PC1 {
        [*] --> PE1
        PE1 --> nkpprimary
        nkpprimary --> SourceApp
    }
    state PC2 {
        [*] --> PE2
        PE2 --> nkpsecondary
        nkpsecondary --> DestinationApp
    }
    [*] --> PC1
    PC1 --> PC2
    PC2 --> [*]
```
Setup Destination NKP in Secondary PC/PE/K8s
Make sure to name your NKP cluster appropriately so it is easy to identify.
For the purposes of this lab, we will call the destination NKP cluster nkpsecondary.
Follow the instructions in NKP Deployment to set up the destination/secondary NKP K8s cluster.
Replication Custom Resources
The steps include configuring the following NDK custom resources:
Custom Resource | Purpose |
---|---|
StorageCluster | Defines the Nutanix storage fabric and UUIDs for the secondary NKP cluster |
Remote | Defines the target Kubernetes cluster for replication; created on the source NKP cluster |
ReplicationTarget | Specifies where to replicate an application snapshot |
ApplicationSnapshotReplication | Triggers snapshot replication to another cluster |
ApplicationSnapshotRestore | Restores an application snapshot |
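Once NDK is installed (the installation steps are below), you can confirm that these custom resource definitions are present on the cluster. This is a minimal sketch; the dataservices.nutanix.com API group name is an assumption and may differ between NDK versions:

```bash
# List the NDK-related CRDs registered on the cluster (API group name assumed)
kubectl get crds | grep -i nutanix
kubectl api-resources --api-group=dataservices.nutanix.com
```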
Configure Availability Zones on PCs
To enable replication between the two PCs and their underlying PEs, we need to configure Availability Zones bi-directionally.

1. Logon to the Primary PC and go to Administration > Availability Zones
2. Click on Connect to Availability Zone
3. Choose Physical Location and enter the Secondary PC details
    - IP Address for Remote PC - 10.x.x.x
    - Username - admin
    - Password - xxxxxxxxxxx
4. Click on Connect
5. Confirm addition of the remote PC
6. Repeat steps 1 - 5 on the remote (Secondary) PC to configure access to the Primary PC
Install NDK on Secondary NKP Cluster
- Login to the VSCode terminal
- Set your NKP cluster KUBECONFIG (a sample export is shown after this list)
- Test the connection to the nkpsecondary cluster

    ```
    $ kubectl get nodes
    NAME                            STATUS   ROLES           AGE   VERSION
    nkpsec-md-0-fdrzg-clvf9-2gnqc   Ready    <none>          24h   v1.32.3
    nkpsec-md-0-fdrzg-clvf9-9msmd   Ready    <none>          24h   v1.32.3
    nkpsec-md-0-fdrzg-clvf9-hnjlm   Ready    <none>          24h   v1.32.3
    nkpsec-md-0-fdrzg-clvf9-t8t4l   Ready    <none>          24h   v1.32.3
    nkpsec-rhdh8-2xs7z              Ready    control-plane   24h   v1.32.3
    nkpsec-rhdh8-srm6h              Ready    control-plane   24h   v1.32.3
    nkpsec-rhdh8-wxbd9              Ready    control-plane   24h   v1.32.3
    ```

- Install NDK

    ```bash
    helm upgrade -n ntnx-system --install ndk chart/ \
      --set manager.repository="$IMAGE_REGISTRY/ndk/manager" \
      --set manager.tag=${NDK_VERSION} \
      --set infraManager.repository="$IMAGE_REGISTRY/ndk/infra-manager" \
      --set infraManager.tag=${NDK_VERSION} \
      --set kubeRbacProxy.repository="$IMAGE_REGISTRY/ndk/kube-rbac-proxy" \
      --set kubeRbacProxy.tag=${KUBE_RBAC_PROXY_VERSION} \
      --set bitnamiKubectl.repository="$IMAGE_REGISTRY/ndk/bitnami-kubectl" \
      --set bitnamiKubectl.tag=${KUBECTL_VERSION} \
      --set jobScheduler.repository="$IMAGE_REGISTRY/ndk/job-scheduler" \
      --set jobScheduler.tag=${NDK_VERSION} \
      --set config.secret.name=nutanix-csi-credentials \
      --set tls.server.enable=false
    ```

    For example, with the variables substituted:

    ```bash
    helm upgrade -n ntnx-system --install ndk chart/ \
      --set manager.repository="harbor.example.com/nkp/ndk/manager" \
      --set manager.tag=1.2.0 \
      --set infraManager.repository="harbor.example.com/nkp/ndk/infra-manager" \
      --set infraManager.tag=1.2.0 \
      --set kubeRbacProxy.repository="harbor.example.com/nkp/ndk/kube-rbac-proxy" \
      --set kubeRbacProxy.tag=v0.17.0 \
      --set bitnamiKubectl.repository="harbor.example.com/nkp/ndk/bitnami-kubectl" \
      --set bitnamiKubectl.tag=1.30.3 \
      --set jobScheduler.repository="harbor.example.com/nkp/ndk/job-scheduler" \
      --set jobScheduler.tag=1.2.0 \
      --set config.secret.name=nutanix-csi-credentials \
      --set tls.server.enable=false
    ```

- Check that all NDK components are running (4 of 4 containers should be running inside the ndk-controller-manager pod)

    ```
    Active namespace is "ntnx-system".
    $ k get all -l app.kubernetes.io/name=ndk
    NAME                                          READY   STATUS    RESTARTS   AGE
    pod/ndk-controller-manager-57fd7fc56b-gg5nl   4/4     Running   0          19m

    NAME                                             TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)          AGE
    service/ndk-controller-manager-metrics-service   ClusterIP      10.109.134.126   <none>         8443/TCP         19m
    service/ndk-intercom-service                     LoadBalancer   10.99.216.62     10.122.7.212   2021:30258/TCP   19m
    service/ndk-scheduler-webhook-service            ClusterIP      10.96.174.148    <none>         9444/TCP         19m
    service/ndk-webhook-service                      ClusterIP      10.107.189.171   <none>         443/TCP          19m

    NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/ndk-controller-manager   1/1     1            1           19m

    NAME                                                DESIRED   CURRENT   READY   AGE
    replicaset.apps/ndk-controller-manager-57fd7fc56b   1         1         1       19m
    ```
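The KUBECONFIG export for step 2 is not reproduced here. A minimal sketch, assuming the secondary cluster's kubeconfig was written to $HOME/nkp/ during NKP deployment and the file name follows the cluster name (adjust both to your environment):

```bash
# Point kubectl at the secondary NKP cluster and confirm the active context
export KUBECONFIG=$HOME/nkp/nkpsecondary.conf   # assumed file name
kubectl config current-context
```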
Configure NDK
The first component we will configure in NDK is StorageCluster. This custom resource represents the Nutanix cluster components, including the following:

- Prism Central (PC)
- Prism Element (PE)

By configuring the StorageCluster custom resource, we provide the Nutanix infrastructure information to NDK.
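The exact fields the StorageCluster manifest expects vary by NDK version, so rather than guessing at the schema, it can be inspected directly from the CRD once NDK is installed; a minimal sketch:

```bash
# Show the documented fields of the StorageCluster custom resource installed by NDK
kubectl explain storagecluster
kubectl explain storagecluster.spec
```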
- Logon to the Jumphost VM terminal in VSCode
- Get the uuid of the secondary PC and PE (one way to query these is sketched after this list)
- Add (append) the following environment variables and save the file

    ```bash
    export SECONDARY_PRISM_CENTRAL_UUID=_pc_uuid_from_previous_commands
    export SECONDARY_PRISM_ELEMENT_UUID=_pe_uuid_from_previous_commands
    export SECONDARY_SC_NAME=_storage_cluster_name
    export NDK_REPLICATION_CLUSTER_NAME=_secondary_cluster_name
    export KUBECONFIG=$HOME/nkp/_nkp_secondary_cluster_name.conf
    ```

- Note and export the external IP assigned to the NDK intercom service on the Primary Cluster
- Add (append) the following environment variables to the file $HOME/ndk/.env and save it
- Source the .env file
- Create the StorageCluster custom resource
- Find and configure the secondary NDK IP and port number
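The lab's original UUID lookup command is not reproduced here. As a hypothetical alternative for step 2, the Prism Central v3 clusters/list API returns the registered clusters with their UUIDs (the PE appears by name, and the PC itself is typically listed as well); the IP address and credentials below are placeholders:

```bash
# Query the secondary PC for cluster names and UUIDs (placeholders: IP, password)
curl -ks -u admin:'<password>' -X POST \
  "https://<secondary-pc-ip>:9440/api/nutanix/v3/clusters/list" \
  -H 'Content-Type: application/json' \
  -d '{"kind":"cluster"}' | jq -r '.entities[] | "\(.spec.name)\t\(.metadata.uuid)"'
```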
NDK Recover to the Secondary NKP Cluster
Since we have a sample workload configured on the primary NKP cluster, we will:
- Configure the remote NKP cluster on the primary NKP cluster (using the Remote and ReplicationTarget custom resources)
- Replicate the snapshot of the sample workload from the primary NKP cluster to the secondary NKP cluster (using the ApplicationSnapshotReplication custom resource)
- Restore the replicated snapshot on the secondary NKP cluster to recover the workloads (using the ApplicationSnapshotRestore custom resource)
Create Remote Cluster on Primary NKP Cluster
- Switch the context to the primary NKP cluster nkpprimary
- Create the Remote resource on the primary NKP cluster
- Make sure the Remote cluster is healthy (a sample check is sketched after this list)
- Create the ReplicationTarget on the primary NKP cluster
- Make sure the ReplicationTarget is healthy
- Replicate the snapshot to the replication cluster
- Monitor the progress of the replication and make sure it completes

    ```
    Status:
      Conditions:
        Last Transition Time:          2025-07-16T21:51:32Z
        Message:
        Observed Generation:           1
        Reason:                        ReplicationComplete
        Status:                        True
        Type:                          Available
        Last Transition Time:          2025-07-16T21:51:32Z
        Message:
        Observed Generation:           1
        Reason:                        ReplicationComplete
        Status:                        False
        Type:                          Progressing
      Replication Completion Percent:  100
    ```
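The manifests for the Remote, ReplicationTarget, and ApplicationSnapshotReplication resources are environment specific and are not reproduced here. As a rough sketch of the health and progress checks in steps 3, 5, and 7, assuming the resource kinds listed in the table earlier in this section (names and namespaces are placeholders):

```bash
# Check the replication resources on the primary cluster and follow the replication progress
kubectl get remote,replicationtarget -A
kubectl describe remote <remote-name> -n <namespace>
kubectl describe applicationsnapshotreplication <replication-name> -n <namespace>
```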
Recover Application in Remote NKP Cluster
- Switch the context to the secondary NKP cluster nkpsecondary
- Confirm that the ApplicationSnapshot has been replicated
- Restore the replicated ApplicationSnapshot
- Monitor the restore
- Monitor the restore steps to understand the flow

    ```
    NAME   SNAPSHOT-NAME   COMPLETED
    status:
      completed: true
      conditions:
      - lastTransitionTime: "2025-07-16T21:57:53Z"
        message: All prechecks passed and finalizers on dependent resources set
        observedGeneration: 1
        reason: PrechecksPassed
        status: "True"
        type: PrechecksPassed
      - lastTransitionTime: "2025-07-16T21:59:00Z"
        message: All eligible application configs restored
        observedGeneration: 1
        reason: ApplicationConfigRestored
        status: "True"
        type: ApplicationConfigRestored
      - lastTransitionTime: "2025-07-16T21:59:15Z"
        message: All eligible volumes restored
        observedGeneration: 1
        reason: VolumesRestored
        status: "True"
        type: VolumesRestored
      - lastTransitionTime: "2025-07-16T21:59:15Z"
        message: Application restore successfully finalised
        observedGeneration: 1
        reason: ApplicationRestoreFinalised
        status: "True"
        type: ApplicationRestoreFinalised
      finishTime: "2025-07-16 21:59:15"
      startTime: "2025-07-16 21:57:52"
    ```

- Verify that the app1 pvc and pod are restored (a sample check is sketched after this list)
- Check that data is present within the data mount /data inside the pod
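A minimal sketch of the final verification, assuming the sample workload was restored into an app1 namespace; the pod name is a placeholder:

```bash
# Verify the restored PVC and pod, then list the restored data under /data
kubectl get pvc,pods -n app1
kubectl exec -n app1 <app1-pod-name> -- ls -l /data
```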
We have successfully replicated application data to a secondary NKP cluster and recovered it using NDK.