
Deploying Nutanix Enterprise AI (NAI) NVD Reference Application

Version 2.6.0

This version of the NAI deployment is based on the Nutanix Enterprise AI (NAI) v2.6.0 release.

stateDiagram-v2
    direction LR

    state DeployNAI {
        [*] --> DeployNAIAdmin
        DeployNAIAdmin -->  InstallSSLCert
        InstallSSLCert --> DownloadModel
        DownloadModel --> CreateNAI
        CreateNAI --> [*]
    }

    [*] --> PreRequisites
    PreRequisites --> DeployNAI 
    DeployNAI --> TestNAI : next section
    TestNAI --> [*]

Prepare for NAI Deployment

Changes in NAI v2.6.0

  • KServe must be at least v0.15.0
  • cert-manager must be at least v1.17.2
  • OpenTelemetry Operator must be at least v0.102.0
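
These minimums can also be checked from a script. A minimal sketch using GNU `sort -V` for version ordering (the `version_ge` helper name is our own, not part of NKP or NAI tooling):

```shell
# version_ge A B -> succeeds if version A >= version B (relies on sort -V)
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | tail -n1)" = "$1" ]
}

# Example checks against the minimums listed above
version_ge "0.15.0" "0.15.0" && echo "kserve ok"
version_ge "1.17.2" "1.17.2" && echo "cert-manager ok"
version_ge "0.102.0" "0.102.0" && echo "opentelemetry ok"
```

Feed it the versions reported by `kubectl` or `helm list` to confirm the cluster meets the prerequisites.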

Enable NKP Applications through NKP GUI

Enable these NKP operators from the NKP GUI.

Note

In this lab, we will use the Management Cluster Workspace to deploy Nutanix Enterprise AI (NAI).

However, in a customer environment, it is recommended to use a separate NKP workload cluster.

  1. In the NKP GUI, Go to Clusters
  2. Click on Management Cluster Workspace
  3. Go to Applications
  4. Search for and enable the following applications; follow this order so that NAI's dependencies are installed first:

    • Kube-prometheus-stack: version 71.0.0 or later (pre-installed on NKP cluster)

Enable Pre-requisite Applications

Early Access (EA) / Technical Preview (TP) Software with NAI v2.6.0

In this lab, we will deploy EA and TP versions of the following software for testing:

  • Envoy Gateway in AI Gateway mode
  • Nutanix Enterprise AI

    • Unified Endpoints - multiple endpoints for HA and token-based rate limiting
    • Providers - Add remote endpoints from providers to utilize their models in Nutanix Enterprise AI workloads.

We will enable the following pre-requisite applications through command line:

  • Envoy Gateway v1.6.3
  • KServe v0.15.0 in raw deployment mode

Note

The following applications are pre-installed on NKP clusters with a Pro license:

  • Cert Manager v1.17.2 or higher

Check if Cert Manager is installed (pre-installed on NKP cluster)

kubectl get deploy -n cert-manager
$ kubectl get deploy -n cert-manager

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
cert-manager              1/1     1            1           145m
cert-manager-cainjector   1/1     1            1           145m
cert-manager-webhook      1/1     1            1           145m

If not installed, use the following command to install it

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.17.2/cert-manager.yaml

  1. Open Terminal in VSCode

  2. Run the command to load the environment variables

    source $HOME/.env
    
  3. Install Envoy Gateway CRDs v1.6.3

    helm template eg oci://docker.io/envoyproxy/gateway-crds-helm \
    --version v1.6.3 \
    --set crds.gatewayAPI.enabled=true \
    --set crds.envoyGateway.enabled=true \
    | kubectl apply --server-side --force-conflicts -f -
    
    Pulled: docker.io/envoyproxy/gateway-crds-helm:v1.6.3
    Digest: sha256:e94d3fdf5d4cb08e2c8efa8c1da133b9804c2e88a3acb4d0e20adb8755a60174
    customresourcedefinition.apiextensions.k8s.io/backendtlspolicies.gateway.networking.k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/grpcroutes.gateway.networking.k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/httproutes.gateway.networking.k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/referencegrants.gateway.networking.k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/tcproutes.gateway.networking.k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/tlsroutes.gateway.networking.k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/udproutes.gateway.networking.k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/xbackendtrafficpolicies.gateway.networking.x-k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/xlistenersets.gateway.networking.x-k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/xmeshes.gateway.networking.x-k8s.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/backends.gateway.envoyproxy.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/backendtrafficpolicies.gateway.envoyproxy.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/clienttrafficpolicies.gateway.envoyproxy.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/envoyextensionpolicies.gateway.envoyproxy.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/envoypatchpolicies.gateway.envoyproxy.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/envoyproxies.gateway.envoyproxy.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/httproutefilters.gateway.envoyproxy.io serverside-applied
    customresourcedefinition.apiextensions.k8s.io/securitypolicies.gateway.envoyproxy.io serverside-applied
    
  4. Create the AI Gateway mode configuration template file eg-config-for-gateway-mode.yaml to enable advanced features

    config:
      envoyGateway:
        gateway:
          controllerName: "gateway.envoyproxy.io/gatewayclass-controller"
        logging:
          level:
            default: "info"
        provider:
          kubernetes:
            rateLimitDeployment:
              container:
                image: "docker.io/envoyproxy/ratelimit:99d85510"
              patch:
                type: "StrategicMerge"
                value:
                  spec:
                    template:
                      spec:
                        containers:
                          - imagePullPolicy: "IfNotPresent"
                            name: "envoy-ratelimit"
                            image: "docker.io/envoyproxy/ratelimit:99d85510"
          type: "Kubernetes"
        extensionApis:
          enableEnvoyPatchPolicy: true
          enableBackend: true
        extensionManager:
          maxMessageSize: 11Mi
          backendResources:
            - group: inference.networking.k8s.io
              kind: InferencePool
              version: v1
          hooks:
            xdsTranslator:
              translation:
                listener:
                  includeAll: true
                route:
                  includeAll: true
                cluster:
                  includeAll: true
                secret:
                  includeAll: true
              post:
                - "Translation"
                - "Cluster"
                - "Route"
          service:
            fqdn:
              hostname: "ai-gateway-controller.nai-system.svc.cluster.local"
              port: 1063
        rateLimit:
          backend:
            type: "Redis"
            redis:
              url: "redis-sentinel.nai-system.svc.cluster.local:6379"
    
  5. Install Envoy Gateway v1.6.3

    helm upgrade --install eg oci://docker.io/envoyproxy/gateway-helm --version v1.6.3 \
    -n envoy-gateway-system --create-namespace --skip-crds \
    -f ./eg-config-for-gateway-mode.yaml
    
    Pulled: docker.io/envoyproxy/gateway-helm:v1.6.3
    Digest: sha256:6dca101fdc0d41c702c1070eb42db119a2768a33388ba28041ae615cbe262aaf
    Release "eg" has been upgraded. Happy Helming!
    NAME: eg
    LAST DEPLOYED: Sat Apr  4 08:34:21 2026
    NAMESPACE: envoy-gateway-system
    STATUS: deployed
    REVISION: 2
    TEST SUITE: None
    NOTES:
    **************************************************************************
    *** PLEASE BE PATIENT: Envoy Gateway may take a few minutes to install ***
    **************************************************************************
    
    Envoy Gateway is an open source project for managing Envoy Proxy as a standalone or Kubernetes-based application gateway.
    
    Thank you for installing Envoy Gateway! 🎉
    
    Your release is named: eg. 🎉
    
    Your release is in namespace: envoy-gateway-system. 🎉
    
    To learn more about the release, try:
    
      $ helm status eg -n envoy-gateway-system
      $ helm get all eg -n envoy-gateway-system
    
    To have a quickstart of Envoy Gateway, please refer to https://gateway.envoyproxy.io/latest/tasks/quickstart.
    
    To get more details, please visit https://gateway.envoyproxy.io and https://github.com/envoyproxy/gateway.
    
  6. Check if Envoy Gateway resources are ready

    kubectl wait --timeout=5m -n envoy-gateway-system deployment/envoy-gateway \
    --for=condition=Available
    
    kubectl wait --timeout=5m -n envoy-gateway-system deployment/envoy-gateway --for=condition=Available
    #
    deployment.apps/envoy-gateway condition met
    
  7. Open $HOME/.env file in VSCode

  8. Add (append) the following line and save it

    export KSERVE_VERSION=v0.15.0
    
  9. Load the .env variables

    source $HOME/.env
    
  10. Install kserve using the following commands

    helm upgrade --install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd \
    --version ${KSERVE_VERSION} -n kserve --create-namespace --wait
    
    helm upgrade --install kserve oci://ghcr.io/kserve/charts/kserve \
    --version ${KSERVE_VERSION} --namespace kserve --create-namespace --wait \
    --set kserve.controller.deploymentMode=RawDeployment \
    --set kserve.controller.gateway.disableIngressCreation=true
    

    Pulled: ghcr.io/kserve/charts/kserve-crd:v0.15.0
    Digest: sha256:57ad1a5475fd625cb558214ba711752aa77b7d91686a391a5f5320cfa72f3fa8
    Release "kserve-crd" has been upgraded. Happy Helming!
    NAME: kserve-crd
    LAST DEPLOYED: Sat Apr  4 08:40:05 2026
    NAMESPACE: kserve
    STATUS: deployed
    REVISION: 2
    TEST SUITE: None    
    
    Pulled: ghcr.io/kserve/charts/kserve:v0.15.0
    Digest: sha256:905abce80e975c53b40fba7a12b0b9a1e24bdf65cceebb88fba4ef62bba01406
    Release "kserve" has been upgraded. Happy Helming!
    NAME: kserve
    LAST DEPLOYED: Sat Apr  4 08:41:02 2026
    NAMESPACE: kserve
    STATUS: deployed
    REVISION: 2
    TEST SUITE: None
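
The two `--set` overrides in the KServe install above can equally be kept in a values file passed with `-f` (a sketch; key paths are taken directly from the flags above):

```yaml
# kserve-values.yaml - equivalent to the --set flags used in the install command
kserve:
  controller:
    deploymentMode: RawDeployment
    gateway:
      disableIngressCreation: true
```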
    

  11. Check if kserve pods are running

    kubens kserve
    kubectl get pods    # (1)!
    
    1. Make sure both containers of the kserve-controller-manager pod are running
    NAME                                         READY   STATUS    RESTARTS   AGE
    kserve-controller-manager-69b6dbf9cf-ft55b   2/2     Running   0          4d23h
    
  12. Install OpenTelemetry Operator:

    helm upgrade --install opentelemetry-operator opentelemetry-operator \
    --repo https://open-telemetry.github.io/opentelemetry-helm-charts \
    --version=0.102.0 -n opentelemetry \
    --create-namespace --wait
    
    Release "opentelemetry-operator" has been upgraded. Happy Helming!
    NAME: opentelemetry-operator
    LAST DEPLOYED: Sat Apr  4 08:42:43 2026
    NAMESPACE: opentelemetry
    STATUS: deployed
    REVISION: 2
    NOTES:
    [WARNING] No resource limits or requests were set. Consider setter resource requests and limits via the `resources` field.
    
    
    opentelemetry-operator has been installed. Check its status by running:
      kubectl --namespace opentelemetry get pods -l "app.kubernetes.io/instance=opentelemetry-operator"
    
    Visit https://github.com/open-telemetry/opentelemetry-operator for instructions on how to create & configure OpenTelemetryCollector and Instrumentation custom resources by using the Operator.
    

Note

It may take a few minutes for each application to be up and running. Monitor the deployment to make sure that these applications are running before moving on to the next section.

Deploy NAI

We will use the Docker login credentials we created in the previous section to download the NAI Docker images.

Change the Docker login credentials

Set the following Docker environment variables to the credentials downloaded from the Nutanix Portal, not your personal Docker credentials:

  • $DOCKER_NAI_USERNAME
  • $DOCKER_NAI_PASSWORD
  • $DOCKER_NAI_EMAIL
  1. Open $HOME/.env file in VSCode

  2. Add (append) the following environment variables and save it

    export REGISTRY_SECRET_NAME=_k8s_secret_for_nai
    export DOCKER_SERVER=https://index.docker.io/v1/
    export DOCKER_NAI_USERNAME=_GA_release_docker_username
    export DOCKER_NAI_PASSWORD=_GA_release_docker_password
    export DOCKER_NAI_EMAIL=_GA_release_docker_email
    export NAI_CORE_VERSION=_GA_release_nai_core_version
    export NAI_API_RWX_STORAGECLASS=_nkp_rwx_storage_class
    export NAI_DEFAULT_RWO_STORAGECLASS=_nkp_rwo_storage_class
    export NKP_WORKSPACE_NAMESPACE=_nkp_workspace_name
    
    export REGISTRY_SECRET_NAME=nai-regcred
    export DOCKER_SERVER=https://index.docker.io/v1/
    export DOCKER_NAI_USERNAME=ntnxsvcgpt
    export DOCKER_NAI_PASSWORD=dckr_pat_XXXXXXXXXXXXXXXXXXXXXXXXX
    export DOCKER_NAI_EMAIL=ntnxsvcgpt
    export NAI_CORE_VERSION=2.6.0
    export NAI_API_RWX_STORAGECLASS=nai-nfs-storage
    export NAI_DEFAULT_RWO_STORAGECLASS=nutanix-volume
    export NKP_WORKSPACE_NAMESPACE=kommander
    
  3. Source the environment variables

    source $HOME/.env
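
After sourcing, you can fail fast if anything is still unset before running the helm and kubectl steps. A small hypothetical guard (the `check_env` helper is our own; variable names match the .env entries above):

```shell
# check_env VAR... : print each named variable that is unset or empty
check_env() {
  for v in "$@"; do
    eval "val=\${$v:-}"
    [ -n "$val" ] || echo "missing: $v"
  done
}

check_env REGISTRY_SECRET_NAME DOCKER_SERVER DOCKER_NAI_USERNAME \
          DOCKER_NAI_PASSWORD DOCKER_NAI_EMAIL NAI_CORE_VERSION \
          NAI_API_RWX_STORAGECLASS NAI_DEFAULT_RWO_STORAGECLASS \
          NKP_WORKSPACE_NAMESPACE
```

If the function prints nothing, all variables are set and you can continue.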
    
  4. Create the nai-system namespace to install Nutanix Enterprise AI

    kubectl create namespace nai-system --dry-run=client -o yaml | kubectl apply -f -
    
    kubectl create namespace nai-system --dry-run=client -o yaml | kubectl apply -f -
    #
    namespace/nai-system created     
    
  5. Create docker registry Secrets in both the nai-system and envoy-gateway-system namespaces.

    kubectl -n nai-system create secret docker-registry ${REGISTRY_SECRET_NAME} \
    --docker-server=${DOCKER_SERVER} \
    --docker-username=${DOCKER_NAI_USERNAME} \
    --docker-password=${DOCKER_NAI_PASSWORD} \
    --docker-email=${DOCKER_NAI_EMAIL} \
    --dry-run=client -o yaml | kubectl apply -f -
    
    kubectl -n envoy-gateway-system create secret docker-registry ${REGISTRY_SECRET_NAME} \
     --docker-server=${DOCKER_SERVER} \
     --docker-username=${DOCKER_NAI_USERNAME} \
     --docker-password=${DOCKER_NAI_PASSWORD} \
     --docker-email=${DOCKER_NAI_EMAIL} \
     --dry-run=client -o yaml | kubectl apply -f -
    

    secret/nai-regcred configured
    
    secret/nai-regcred configured
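
For reference, `kubectl create secret docker-registry` stores the credentials as a base64-encoded `.dockerconfigjson`. A sketch of what that JSON looks like, using example values rather than real credentials:

```shell
# Reconstruct the .dockerconfigjson payload kubectl generates (example values)
DOCKER_SERVER="https://index.docker.io/v1/"
USER="example-user"
PASS="example-pass"
# The "auth" field is base64 of "username:password"
AUTH=$(printf '%s:%s' "$USER" "$PASS" | base64)
printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
  "$DOCKER_SERVER" "$USER" "$PASS" "$AUTH"
```

This can help when debugging image-pull failures: decode the secret with `kubectl get secret ${REGISTRY_SECRET_NAME} -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d` and compare against the expected shape.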
    

  6. Add NAI helm charts

    helm repo add ntnx-charts https://nutanix.github.io/helm-releases
    helm repo update ntnx-charts
    
  7. Install NAI operator

    helm upgrade --install nai-operators ntnx-charts/nai-operators --version=$NAI_CORE_VERSION  -n nai-system --create-namespace --wait \
    --set "naiAIGateway.enabled=true" \
    --set "global.imagePullSecrets[0].name=${REGISTRY_SECRET_NAME}" \
    --insecure-skip-tls-verify
    
    Release "nai-operators" has been upgraded. Happy Helming!
    NAME: nai-operators
    LAST DEPLOYED: Sat Apr  4 08:58:31 2026
    NAMESPACE: nai-system
    STATUS: deployed
    REVISION: 3
    TEST SUITE: None
    
  8. Check if all NAI operator pods are running

    kubens nai-system
    kubectl get pods
    
    ai-gateway-controller-6b786974b5-sqvft                  1/1     Running       0          3m5s
    nai-operators-nai-clickhouse-operator-f8f666db9-xzjj8   2/2     Running       0          3m5s
    redis-standalone-684f6dd8f7-hlwpl                       2/2     Running       0          3m5s
    
  9. Install NAI core in AI gateway mode

    Info

    This installation takes about 5-10 minutes depending on the available resources

    helm upgrade --install nai-core ntnx-charts/nai-core --version=$NAI_CORE_VERSION  -n nai-system --create-namespace --wait \
    --set "global.imagePullSecrets[0].name=${REGISTRY_SECRET_NAME}" \
    --set "naiAIGateway.enabled=true" \
    --set "naiApi.storageClassName=${NAI_API_RWX_STORAGECLASS}" \
    --set "defaultStorageClassName=${NAI_DEFAULT_RWO_STORAGECLASS}" \
    --set "naiMonitoring.opentelemetry.storageClassName=${NAI_API_RWX_STORAGECLASS}" \
    --set "nai-clickhouse-keeper.clickhouseKeeper.storage.storageClass=${NAI_DEFAULT_RWO_STORAGECLASS}" \
    --set "nai-clickhouse-server.clickhouse.storage.storageClass=${NAI_DEFAULT_RWO_STORAGECLASS}" \
    --set "naiMonitoring.nodeExporter.serviceMonitor.namespaceSelector.matchNames[0]=${NKP_WORKSPACE_NAMESPACE}" \
    --set "naiMonitoring.dcgmExporter.serviceMonitor.namespaceSelector.matchNames[0]=${NKP_WORKSPACE_NAMESPACE}" \
    --insecure-skip-tls-verify
    
    helm upgrade --install nai-core ntnx-charts/nai-core --version=2.6.0  -n nai-system --create-namespace --wait \
    --set "global.imagePullSecrets[0].name=nai-regcred" \
    --set "naiAIGateway.enabled=true" \
    --set "naiApi.storageClassName=nai-nfs-storage" \
    --set "defaultStorageClassName=nutanix-volume" \
    --set "naiMonitoring.opentelemetry.storageClassName=nai-nfs-storage" \
    --set "nai-clickhouse-keeper.clickhouseKeeper.storage.storageClass=nutanix-volume" \
    --set "nai-clickhouse-server.clickhouse.storage.storageClass=nutanix-volume" \
    --set "naiMonitoring.nodeExporter.serviceMonitor.namespaceSelector.matchNames[0]=kommander" \
    --set "naiMonitoring.dcgmExporter.serviceMonitor.namespaceSelector.matchNames[0]=kommander" \
    --insecure-skip-tls-verify
    
    NAME: nai-core
    LAST DEPLOYED: Sat Apr  4 11:03:09 2026
    NAMESPACE: nai-system
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    
  10. Verify that the NAI Core pods are running and healthy; additional jobs will complete and more pods will come up as NAI functionality is established

    kubens nai-system
    kubectl get po,deploy
    
    Active namespace is "nai-system".
    
    NAME                                                    READY   STATUS      RESTARTS   AGE
    ai-gateway-controller-6b786974b5-6gqt6                  1/1     Running     0          13m
    chi-nai-clickhouse-server-chcluster1-0-0-0              1/1     Running     0          2m1s
    chk-nai-clickhouse-keeper-chkeeper-0-0-0                1/1     Running     0          105s
    iam-database-bootstrap-vaszw-s7mpb                      0/1     Completed   0          2m16s
    iam-proxy-686fff8f6d-cbgjs                              1/1     Running     0          2m16s
    iam-proxy-control-plane-854c76b8cc-tblt8                1/1     Running     0          2m16s
    iam-themis-757776777b-mq9pz                             1/1     Running     0          2m15s
    iam-themis-bootstrap-oeiws-7vmcr                        0/1     Completed   0          2m16s
    iam-ui-7f6bb5b477-cb9tz                                 1/1     Running     0          2m16s
    iam-user-authn-78d6b7d8df-tsgs4                         1/1     Running     0          2m16s
    nai-api-557d94c66f-hxx57                                1/1     Running     0          2m16s
    nai-api-db-migrate-g2bni-mbdt4                          0/1     Completed   0          2m16s
    nai-clickhouse-schema-job-1775300590-4gfnh              0/1     Completed   0          2m16s
    nai-db-0                                                1/1     Running     0          2m16s
    nai-iep-model-controller-77f44f88c-ndpp2                1/1     Running     0          2m16s
    nai-labs-86cb964886-xhdxk                               1/1     Running     0          2m16s
    nai-oauth2-proxy-bdf7f85cf-2nxp5                        1/1     Running     0          2m15s
    nai-oidc-client-registration-fbb1j-jdwt5                0/1     Completed   0          2m16s
    nai-operators-nai-clickhouse-operator-f8f666db9-z8vfc   2/2     Running     0          13m
    nai-otel-collector-collector-94mm9                      1/1     Running     0          2m14s
    nai-otel-collector-collector-cpdx6                      1/1     Running     0          2m14s
    nai-otel-collector-collector-pqk5q                      1/1     Running     0          2m14s
    nai-otel-collector-collector-qxg5r                      1/1     Running     0          2m14s
    nai-otel-collector-collector-rpl4f                      1/1     Running     0          2m14s
    nai-otel-collector-collector-wf2fl                      1/1     Running     0          2m14s
    nai-otel-collector-collector-wz5qz                      1/1     Running     0          2m14s
    nai-otel-collector-targetallocator-fbc8688d7-c9vm5      1/1     Running     0          2m14s
    nai-ui-8648bd7dbc-mgf5z                                 1/1     Running     0          2m16s
    redis-standalone-684f6dd8f7-7rz2v                       2/2     Running     0          13m
    
  11. The Prometheus monitoring stack from the NKP catalog has specific RBAC rules applied. Patch the nai-otel-role ClusterRole and the node-exporter ServiceMonitor so that Nutanix Enterprise AI can fetch metrics

    kubectl patch clusterrole nai-otel-role --type='json' -p='[
      {
        "op": "add",
        "path": "/rules/-",
        "value": {
          "apiGroups": [""],
          "resources": ["services/kube-prometheus-stack-prometheus-node-exporter"],
          "verbs": ["get"]
        }
      }
    ]'
    
    kubectl patch servicemonitor nai-node-exporter-monitor -n nai-system --type='json' -p='[
      {"op": "add", "path": "/spec/endpoints/0/bearerTokenFile", "value": "/var/run/secrets/kubernetes.io/serviceaccount/token"},
      {"op": "replace", "path": "/spec/endpoints/0/scheme", "value": "https"},
      {"op": "add", "path": "/spec/endpoints/0/tlsConfig", "value": {"insecureSkipVerify": true}}
    ]'
    

    clusterrole.rbac.authorization.k8s.io/nai-otel-role patched
    
    servicemonitor.monitoring.coreos.com/nai-node-exporter-monitor patched
    

Install SSL Certificate and Gateway Elements

In this section we will install an SSL certificate to access the NAI UI. This is required because the endpoint only works over SSL with a valid certificate.

NAI UI is accessible using the Ingress Gateway.

The following steps show how cert-manager can be used to generate a self-signed certificate using the default selfsigned-issuer present in the cluster.

If you are using Public Certificate Authority (CA) for NAI SSL Certificate

If your organization generates certificates using a different mechanism, obtain the certificate and key and create a Kubernetes TLS secret manually using the following command:

kubectl -n nai-system create secret tls nai-cert --cert=path/to/nai.crt --key=path/to/nai.key

Skip the steps in this section to create a self-signed certificate resource.

  1. Get the NAI UI ingress gateway host using the following command:

    NAI_UI_ENDPOINT=$(kubectl get svc -n envoy-gateway-system \
      -l "gateway.envoyproxy.io/owning-gateway-name=nai-ingress-gateway,gateway.envoyproxy.io/owning-gateway-namespace=nai-system" \
      -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}' | grep -v '^$' \
      || kubectl get svc -n envoy-gateway-system \
      -l "gateway.envoyproxy.io/owning-gateway-name=nai-ingress-gateway,gateway.envoyproxy.io/owning-gateway-namespace=nai-system" \
      -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')
    
  2. Get the value of NAI_UI_ENDPOINT environment variable

    echo $NAI_UI_ENDPOINT
    
    10.x.x.216
    
  3. We will use the command output (e.g., 10.x.x.216) as the reserved IP address for NAI in this section

  4. Construct the FQDN of the NAI UI using nip.io; we will use this FQDN as the certificate's Common Name (CN).

    nai.${NAI_UI_ENDPOINT}.nip.io
    
    nai.10.x.x.216.nip.io
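
The FQDN construction can be scripted; a sketch using an example address (nip.io resolves any `<name>.<ip>.nip.io` back to `<ip>`, so no DNS record is needed):

```shell
# Build the NAI UI FQDN from the gateway address (example IP; in the lab,
# NAI_UI_ENDPOINT is already set by the earlier kubectl command)
NAI_UI_ENDPOINT="10.0.0.216"
NAI_FQDN="nai.${NAI_UI_ENDPOINT}.nip.io"
echo "$NAI_FQDN"   # prints nai.10.0.0.216.nip.io
```

Exporting `NAI_FQDN` in `$HOME/.env` avoids retyping the hostname in the later certificate and browser steps.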
    
  5. Create the ingress resource certificate using the following command:

    cat << EOF | kubectl apply -f -
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: nai-cert
      namespace: nai-system
    spec:
      issuerRef:
        name: selfsigned-issuer
        kind: ClusterIssuer
      secretName: nai-cert
      commonName: nai.${NAI_UI_ENDPOINT}.nip.io
      dnsNames:
      - nai.${NAI_UI_ENDPOINT}.nip.io
      ipAddresses:
      - ${NAI_UI_ENDPOINT}
    EOF
    
    certificate.cert-manager.io/nai-cert created
    
  6. Patch the Envoy gateway with the nai-cert certificate details

    kubectl patch gateway nai-ingress-gateway -n nai-system --type='json' -p='[{"op": "replace", "path": "/spec/listeners/1/tls/certificateRefs/0/name", "value": "nai-cert"}]'
    
    gateway.gateway.networking.k8s.io/nai-ingress-gateway patched
    
  7. Create EnvoyProxy

    kubectl apply -f - <<EOF
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyProxy
    metadata:
      name: envoy-service-config
      namespace: nai-system
    spec:
      provider:
        type: Kubernetes
        kubernetes:
          envoyService:
            type: LoadBalancer
    EOF
    
    envoyproxy.gateway.envoyproxy.io/envoy-service-config created
    
  8. Patch the nai-ingress-gateway resource with the new EnvoyProxy details

    kubectl patch gateway nai-ingress-gateway -n nai-system --type=merge \
    -p '{
        "spec": {
            "infrastructure": {
                "parametersRef": {
                    "group": "gateway.envoyproxy.io",
                    "kind": "EnvoyProxy",
                    "name": "envoy-service-config"
                }
            }
        }
    }'
    
    gateway.gateway.networking.k8s.io/nai-ingress-gateway patched
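
After this merge patch, the relevant part of the Gateway spec should look roughly like the sketch below (field paths taken from the patch command above; the rest of the spec is unchanged):

```yaml
spec:
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: envoy-service-config
```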
    

Accessing the UI

  1. In a browser, open the following URL to connect to the NAI UI

    https://nai.10.x.x.216.nip.io
    
  2. Change the password for the admin user when prompted

  3. Log in using the admin user and the new password.

Download Model

We will download and use the Llama 3.1 8B model, which we sized for in the previous section.

  1. In the NAI GUI, go to Models
  2. Click on Import Model from Hugging Face
  3. Choose the meta-llama/Meta-Llama-3.1-8B-Instruct model
  4. Input your Hugging Face token that was created in the previous section and click Import

  5. Provide the Model Instance Name as Meta-Llama-3.1-8B-Instruct and click Import

  6. Go to VSC Terminal to monitor the download

    # Get jobs in nai-admin namespace
    kubens nai-admin

    kubectl get jobs

    # Validate creation of pods and PVC
    kubectl get po,pvc

    # Verify download of model using pod logs
    kubectl logs -f _pod_associated_with_job
    

    # Get jobs in nai-admin namespace
    kubens nai-admin
    
    ✔ Active namespace is "nai-admin"
    
    kubectl get jobs
    
    NAME                                       COMPLETIONS   DURATION   AGE
    nai-c0d6ca61-1629-43d2-b57a-9f-model-job   0/1           4m56s      4m56
    
    # Validate creation of pods and PVC
    kubectl get po,pvc
    
    NAME                                             READY   STATUS    RESTARTS   AGE
    nai-c0d6ca61-1629-43d2-b57a-9f-model-job-9nmff   1/1     Running   0          4m49s
    
    NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
    nai-c0d6ca61-1629-43d2-b57a-9f-pvc-claim   Bound    pvc-a63d27a4-2541-4293-b680-514b8b890fe0   28Gi       RWX            nai-nfs-storage   <unset>                 2d
    
    # Verify download of model using pod logs
    kubectl logs -f nai-c0d6ca61-1629-43d2-b57a-9f-model-job-9nmff 
    
    /venv/lib/python3.9/site-packages/huggingface_hub/file_download.py:983: UserWarning: Not enough free disk space to download the file. The expected file size is: 0.05 MB. The target location /data/model-files only has 0.00 MB free disk space.
    warnings.warn(
    tokenizer_config.json: 100%|██████████| 51.0k/51.0k [00:00<00:00, 3.26MB/s]
    tokenizer.json: 100%|██████████| 9.09M/9.09M [00:00<00:00, 35.0MB/s]<00:30, 150MB/s]
    model-00004-of-00004.safetensors: 100%|██████████| 1.17G/1.17G [00:12<00:00, 94.1MB/s]
    model-00001-of-00004.safetensors: 100%|██████████| 4.98G/4.98G [04:23<00:00, 18.9MB/s]
    model-00003-of-00004.safetensors: 100%|██████████| 4.92G/4.92G [04:33<00:00, 18.0MB/s]
    model-00002-of-00004.safetensors: 100%|██████████| 5.00G/5.00G [04:47<00:00, 17.4MB/s]
    Fetching 16 files: 100%|██████████| 16/16 [05:42<00:00, 21.43s/it]:33<00:52, 9.33MB/s]
    ## Successfully downloaded model_files|██████████| 5.00G/5.00G [04:47<00:00, 110MB/s] 
    
    Deleting directory : /data/hf_cache
    

  7. Optional - verify the events in the namespace for the pvc creation

    kubectl get events | awk '{print $1, $3}'
    
    $ kubectl get events | awk '{print $1, $3}'
    
    3m43s Scheduled
    3m43s SuccessfulAttachVolume
    3m36s Pulling
    3m29s Pulled
    3m29s Created
    3m29s Started
    3m43s SuccessfulCreate
    90s   Completed
    3m53s Provisioning
    3m53s ExternalProvisioning
    3m45s ProvisioningSucceeded
    3m53s PvcCreateSuccessful
    3m48s PvcNotBound
    3m43s ModelProcessorJobActive
    90s   ModelProcessorJobComplete
    

The model is downloaded to the Nutanix Files PVC volume.

After a successful model import, the model appears with Active status in the NAI UI under the Models menu.

Create and Test Inference Endpoint

In this section we will create an inference endpoint using the downloaded model.

  1. Navigate to Inference Endpoints menu and click on Create Endpoint button
  2. Fill the following details based on GPU or CPU availability:

    Tip

    From v2.3 onward, NAI can host models of up to 7 billion parameters on CPU-only Nutanix nodes

    With GPU:

    • Endpoint Name: llama-8b
    • Model Instance Name: Meta-LLaMA-8B-Instruct
    • Use GPUs for running the models: Checked
    • No of GPUs (per instance):
    • GPU Card: NVIDIA-L40S (or other available GPU)
    • No of Instances: 1
    • API Keys: Create a new API key or use an existing one

    CPU only:

    • Endpoint Name: llama-8b
    • Model Instance Name: Meta-LLaMA-8B-Instruct
    • Use GPUs for running the models: leave unchecked
    • No of Instances: 1
    • API Keys: Create a new API key or use an existing one
  3. Click on Create

  4. Monitor the nai-admin namespace to check if the services are coming up

    kubens nai-admin
    kubectl get po,deploy
    
    kubens nai-admin
    kubectl get po,deploy
    NAME                                                     READY   STATUS        RESTARTS   AGE
    pod/llama8b-predictor-00001-deployment-9ffd786db-6wkzt   2/2     Running       0          71m
    
    NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/llama8b-predictor-00001-deployment   1/1     1            0           3d17h
    
  5. Check the events in the nai-admin namespace for resource usage to make sure there are no errors

    kubectl get events -n nai-admin --sort-by='.lastTimestamp' | awk '{print $1, $3, $5}'
    
    $ kubectl get events -n nai-admin --sort-by='.lastTimestamp' | awk '{print $1, $3, $5}'
    
    110s FinalizerUpdate Updated
    110s FinalizerUpdate Updated
    110s RevisionReady Revision
    110s ConfigurationReady Configuration
    110s LatestReadyUpdate LatestReadyRevisionName
    110s Created Created
    110s Created Created
    110s Created Created
    110s InferenceServiceReady InferenceService
    110s Created Created
    
  6. Once the services are running, check the status of the inference service

    kubectl get isvc
    
    kubectl get isvc
    
    NAME      URL                                          READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION       AGE
    llama8b   http://llama8b.nai-admin.svc.cluster.local   True           100                              llama8b-predictor-00001   3d17h
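
Once the InferenceService reports READY, the endpoint can be exercised with a chat-completion request; actual testing is covered in the next section. A hypothetical request body, assuming the endpoint name from step 2 and an OpenAI-compatible schema:

```json
{
  "model": "llama-8b",
  "messages": [
    {"role": "user", "content": "Say hello in one sentence."}
  ],
  "max_tokens": 64
}
```

Send it with `curl` against the endpoint URL shown in the NAI UI, passing the API key created in step 2 as a bearer token.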