# Deploying GPT-in-a-Box NVD Reference Application using GitOps (FluxCD)
```mermaid
stateDiagram-v2
    direction LR
    state DeployLLMV1 {
        [*] --> BootStrapMgmtCluster
        BootStrapMgmtCluster --> BootStrapDevCluster
        BootStrapDevCluster --> MonitorResourcesDeployment
        MonitorResourcesDeployment --> [*]
    }
    [*] --> PreRequisites
    PreRequisites --> DeployLLMV1
    DeployLLMV1 --> TestLLMApp : next section
    TestLLMApp --> [*]
```
## Bootstrap Management Cluster
A `.env` file is provided in the `$HOME/nai-llm-fleet-infra` folder for ease of configuration. We need to make copies of it for the `mgmt-cluster` and `dev-cluster` Kubernetes clusters that you deployed in the previous sections.
- Set the `K8S_CLUSTER_NAME` environment variable and make a copy of `./.env.sample.yaml` for the `mgmt-cluster` Kubernetes cluster (see the sketch after this list)
- Open the `.env.mgmt-cluster.yaml` file in VSC
- Change the highlighted fields to match your information (see the example file below)
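A minimal sketch of the first step, assuming the repository has been cloned to `$HOME/nai-llm-fleet-infra` and that the copy is named after the target cluster:

```bash
# run from the root of the nai-llm-fleet-infra repo
export K8S_CLUSTER_NAME=mgmt-cluster

# copy the provided sample into a cluster-specific env file
cp ./.env.sample.yaml ./.env.${K8S_CLUSTER_NAME}.yaml
```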
Note

There are a few YAML key-value pair blocks of configuration to be updated in the `.env.mgmt-cluster.yaml` file. Remember to use your own information for the following:

- GitHub repo and API token
- Docker registry information - for container downloads without rate limiting
- Prism Central/Element details
- Nutanix Objects store and bucket details (for Milvus)
- Two IPs for KubeVIP to assign to Ingress and Istio
- Nutanix NFS share to store the `llama-2-13b-chat` model
`.env.sample.yaml`

```yaml
k8s_cluster:
  ## kubernetes distribution - supported "nke" "kind"
  distribution: nke
  ## kubernetes cluster name
  name: _required
  ## cluster_profile_type - anything under clusters/_profiles (e.g., llm-management, llm-workloads, etc.)
  profile: _required
  ## environment name - based on profile selected under clusters/_profiles/<profile>/<environment> (e.g., prod, non-prod, etc.)
  environment: _required
  ## docker hub registry configs
  registry:
    docker_hub:
      user: _required
      password: _required
  ## nvidia gpu specific configs
  gpu_operator:
    enabled: false
    version: v23.9.0
    cuda_toolkit_version: v1.14.3-centos7
    ## time slicing typically only configured on dev scenarios.
    ## ideal for jupyter notebooks
    time_slicing:
      enabled: false
      replica_count: 2

flux:
  ## flux specific configs for github repo
  github:
    repo_url: _required
    repo_user: _required
    repo_api_token: _required

infra:
  ## Global nutanix configs
  nutanix:
    ## Nutanix Prism Creds, required to download NKE creds
    prism_central:
      enabled: false
      # endpoint: _required_if_enabled
      # user: _required_if_enabled
      # password: _required_if_enabled
    ## Nutanix Objects Store Configs
    objects:
      enabled: false
      # host: _required_if_enabled
      # port: _required_if_enabled
      # region: _required_if_enabled
      # use_ssl: _required_if_enabled
      # access_key: _required_if_enabled
      # secret_key: _required_if_enabled

services:
  #####################################################
  ## Required variables for kube-vip and depedent services
  ## kube-vip specific configs required for any services needing to be configured with LoadBalancer Virtual IP Addresses
  kube_vip:
    enabled: false
    ## Used to configure default global IPAM pool. A minimum of 2 ips should be provide in a range
    ## For Example: ipam_range: 172.20.0.22-172.20.0.23
    #ipam_range: _required_if_enabled

  ## required for all platform services that are leveraging nginx-ingress
  nginx_ingress:
    enabled: false
    version: 4.8.3
    ## Virtual IP Address (VIP) dedicated for nginx-ingress controller.
    ## This will be used to configure kube-vip IPAM pool to provide Services of Type: LoadBalancer
    ## Example: vip: 172.20.0.20
    #vip: _required_if_enabled
    ## NGINX Wildcard Ingress Subdomain used for all default ingress objects created within cluster
    ## For DEMO purposes, it is common to prefix subdomain with cluster-name as each cluster would require dedicated wildcard domain.
    ## EXISTING A Host DNS Records are pre-requisites. Example: If DNS is equal to *.example.com, then value is example.com
    ## For DEMO purposes, you can leverage the NIP.IO with the nginx_ingress vip and self-signed certificates.
    ## For Example: wildcard_ingress_subdomain: flux-kind-local.172.20.0.20.nip.io
    #wildcard_ingress_subdomain: _required_if_enabled
    ## Wildcard Ingress Subdomain for management cluster.
    ## For DEMO purposes, you can leverage the NIP.IO with the nginx_ingress vip and self-signed certificates
    #management_cluster_ingress_subdomain: _required_if_enabled

  istio:
    enabled: false
    version: 1.17.2
    ## Virtual IP Address (VIP) dedicated for istio ingress gateway.
    ## This will be used to configure kube-vip IPAM pool to provide Services of Type: LoadBalancer
    ## This address should be mapped to wildcard_ingress_subdomain defined below. For Example: vip: 172.20.0.21
    #vip: _required_if_enabled
    ## Istio Ingress Gateway - Wildcard Subdomain used for all knative/kserve llm inference endpoints.
    ## EXISTING A Host DNS Records are pre-requisites. Example: If DNS is equal to *.llm.example.com, then value is llm.example.com
    ## If leveraging AWS Route 53 DNS with Let's Encrypt (below), make sure to enable/configure AWS credentials needed to
    ## support CertificateSigningRequests using ACME DNS Challenges.
    ## For DEMO purposes, you can leverage the NIP.IO with the nginx_ingress vip and self-signed certificates.
    ## For Example: llm.flux-kind-local.172.20.0.21.nip.io
    #wildcard_ingress_subdomain: _required_if_enabled

  cert_manager:
    ## if enabled - cluster issuer will be self-signed-issuer
    enabled: false
    version: v1.13.5
    ## if aws_route53_acme_dns.enabled - the cluster issuer across all services will be set to "letsencrypt-issuer"
    ## Following AWS Route53 Access Creds required for Lets Encrypt ACME DNS Challenge
    ## For additional details, https://cert-manager.io/docs/configuration/acme/dns01/route53/
    ## minimum supported cert-manager version is v1.9.1 https://cert-manager.io/docs/releases/release-notes/release-notes-1.9/#v191
    aws_route53_acme_dns:
      enabled: false
      # email: _required_if_enabled
      # zone: _required_if_enabled
      # hosted_zone_id: _required_if_enabled
      # region: _required_if_enabled
      # key_id: _required_if_enabled
      # key_secret: _required_if_enabled

  ## do not disable kyverno unless you know what you're doing
  ## this is needed to keep docker hub creds synchronized between namespaces.
  kyverno:
    enabled: true
    version: 3.1.4

  ## the following versions and dependencies kserve are aligned with GPT In A Box Opendocs
  ## the only exception is with cert-manager due to usage of aws route 53
  ## https://opendocs.nutanix.com/gpt-in-a-box/kubernetes/v0.2/getting_started/
  kserve:
    enabled: false
    version: v0.11.2
  knative_serving:
    enabled: false
    version: knative-v1.10.1
  knative_istio:
    enabled: false
    version: knative-v1.10.0

  ## The following components are leveraged to support Nutanix Validated Designs
  ## The NVD for GPT in a Box leverages a RAG Pipeline with Serverless Functions
  ## to demonstrate end to end workflow with Nutanix Integration

  ## Milvus is vector database
  milvus:
    enabled: false
    version: 4.1.13
    milvus_bucket_name: milvus

  ## Knative Eventing used to receive Event notifications from Nutanix Objects Document Bucket
  knative_eventing:
    enabled: false
    version: knative-v1.10.1

  ## Kafka is messaging broker used by both knative eventing Document Ingestion serverless function
  ## and integrates with Nutanix Objects Events Notification Kafka Endpoints
  ## Kafka is also leveraged by Milvus as a Messaging Broker for Milvus related events, as opposed to the default Apache Pulsar
  kafka:
    enabled: false
    version: 26.8.5

  ## OpenTelemetry Collector version is used for both the Deployment and Daemon is used to collect data for monitoring
  opentelemetry_collector:
    enabled: false
    version: 0.80.1
  ## OpenTelemetry Operator is used to deploy opentelemetry components
  opentelemetry_operator:
    enabled: false
    version: 0.47.0
  ## Uptrace is Observability / Monitoring UI
  uptrace:
    enabled: false
    version: 1.5.7

  ## Jupyterhub is deployed on non-prod workload clusters in NVD Reference
  jupyterhub:
    enabled: false
    version: 3.1.0
  redis:
    enabled: false
    version: 18.1.6
  elasticsearch:
    enabled: false
    version: 19.13.10
  kubernetes_dashboard:
    enabled: false
    version: 7.3.2
  weave_gitops:
    enabled: true
    version: 4.0.36

apps:
  ## Required GPT NVD Reference Application Helm Chart Configs
  gptnvd_reference_app:
    enabled: false
    version: 0.2.7
    #documents_bucket_name: documents01

  ## Required NAI LLM Helm Chart Configs
  ### huggingFaceToken required when useExistingNFS. This will download model when llm is initialized
  nai_helm:
    enabled: false
    version: 0.1.1
    #model: llama2_7b_chat
    #revision: 94b07a6e30c3292b8265ed32ffdeccfdadf434a8
    #maxTokens: 4000
    #repPenalty: 1.2
    #temperature: 0.2
    #topP: 0.9
    #useExistingNFS: false
    #nfs_export: /llm-model-store
    #nfs_server: _required
    #huggingFaceToken: _required
```
`.env.mgmt-cluster.yaml`

```yaml
k8s_cluster:
  ## kubernetes distribution - supported "nke" "kind"
  distribution: nke
  ## kubernetes cluster name
  name: mgmt-cluster
  ## cluster_profile_type - anything under clusters/_profiles (e.g., llm-management, llm-workloads, etc.)
  profile: llm-management
  ## environment name - based on profile selected under clusters/_profiles/<profile>/<environment> (e.g., prod, non-prod, etc.)
  environment: non-prod
  ## docker hub registry configs
  registry:
    docker_hub:
      user: your_docker_username
      password: your_docker_password
  ## nvidia gpu specific configs
  gpu_operator:
    enabled: false
    version: v23.9.0
    cuda_toolkit_version: v1.14.3-centos7
    ## time slicing typically only configured on dev scenarios.
    ## ideal for jupyter notebooks
    time_slicing:
      enabled: false
      replica_count: 2

flux:
  ## flux specific configs for github repo
  github:
    repo_url: https://github.com/<your_github_org>/nai-llm-fleet-infra.git
    repo_user: your_github_username
    repo_api_token: your_github_api_token

infra:
  ## Global nutanix configs
  nutanix:
    ## Nutanix Prism Creds, required to download NKE creds
    prism_central:
      enabled: true
      endpoint: <PC FQDN>
      user: <PC user>
      password: <PC password>
    ## Nutanix Objects Store Configs
    objects:
      enabled: true
      host: objects.example.com
      port: 80
      region: us-east-1
      use_ssl: false
      access_key: your_bucket_access_key
      secret_key: your_bucket_secret_key

services:
  #####################################################
  ## Required variables for kube-vip and depedent services
  ## kube-vip specific configs required for any services needing to be configured with LoadBalancer Virtual IP Addresses
  kube_vip:
    enabled: true
    ## Used to configure default global IPAM pool. A minimum of 2 ips should be provide in a range
    ## For Example: ipam_range: 172.20.0.22-172.20.0.23
    ipam_range: 10.x.x.214-10.x.x.215

  ## required for all platform services that are leveraging nginx-ingress
  nginx_ingress:
    enabled: true
    version: 4.8.3
    ## Virtual IP Address (VIP) dedicated for nginx-ingress controller.
    ## This will be used to configure kube-vip IPAM pool to provide Services of Type: LoadBalancer
    ## Example: vip: 172.20.0.20
    vip: 10.x.x.214
    ## NGINX Wildcard Ingress Subdomain used for all default ingress objects created within cluster
    ## For DEMO purposes, it is common to prefix subdomain with cluster-name as each cluster would require dedicated wildcard domain.
    ## EXISTING A Host DNS Records are pre-requisites. Example: If DNS is equal to *.example.com, then value is example.com
    ## For DEMO purposes, you can leverage the NIP.IO with the nginx_ingress vip and self-signed certificates.
    ## For Example: wildcard_ingress_subdomain: flux-kind-local.172.20.0.20.nip.io
    wildcard_ingress_subdomain: mgmt-cluster.10.x.x.214.nip.io
    ## Wildcard Ingress Subdomain for management cluster.
    ## For DEMO purposes, you can leverage the NIP.IO with the nginx_ingress vip and self-signed certificates
    management_cluster_ingress_subdomain: mgmt-cluster.10.x.x.214.nip.io

  istio:
    enabled: false
    version: 1.17.2
    ## Virtual IP Address (VIP) dedicated for istio ingress gateway.
    ## This will be used to configure kube-vip IPAM pool to provide Services of Type: LoadBalancer
    ## This address should be mapped to wildcard_ingress_subdomain defined below. For Example: vip: 172.20.0.21
    #vip: _required_if_enabled
    ## Istio Ingress Gateway - Wildcard Subdomain used for all knative/kserve llm inference endpoints.
    ## EXISTING A Host DNS Records are pre-requisites. Example: If DNS is equal to *.llm.example.com, then value is llm.example.com
    ## If leveraging AWS Route 53 DNS with Let's Encrypt (below), make sure to enable/configure AWS credentials needed to
    ## support CertificateSigningRequests using ACME DNS Challenges.
    ## For DEMO purposes, you can leverage the NIP.IO with the nginx_ingress vip and self-signed certificates.
    ## For Example: llm.flux-kind-local.172.20.0.21.nip.io
    #wildcard_ingress_subdomain: _required_if_enabled

  cert_manager:
    ## if enabled - cluster issuer will be self-signed-issuer
    enabled: false
    version: v1.13.5
    ## if aws_route53_acme_dns.enabled - the cluster issuer across all services will be set to "letsencrypt-issuer"
    ## Following AWS Route53 Access Creds required for Lets Encrypt ACME DNS Challenge
    ## For additional details, https://cert-manager.io/docs/configuration/acme/dns01/route53/
    ## minimum supported cert-manager version is v1.9.1 https://cert-manager.io/docs/releases/release-notes/release-notes-1.9/#v191
    aws_route53_acme_dns:
      enabled: false
      # email: _required_if_enabled
      # zone: _required_if_enabled
      # hosted_zone_id: _required_if_enabled
      # region: _required_if_enabled
      # key_id: _required_if_enabled
      # key_secret: _required_if_enabled

  ## do not disable kyverno unless you know what you're doing
  ## this is needed to keep docker hub creds synchronized between namespaces.
  kyverno:
    enabled: true
    version: 3.1.4

  ## the following versions and dependencies kserve are aligned with GPT In A Box Opendocs
  ## the only exception is with cert-manager due to usage of aws route 53
  ## https://opendocs.nutanix.com/gpt-in-a-box/kubernetes/v0.2/getting_started/
  kserve:
    enabled: false
    version: v0.11.2
  knative_serving:
    enabled: false
    version: knative-v1.10.1
  knative_istio:
    enabled: false
    version: knative-v1.10.0

  ## The following components are leveraged to support Nutanix Validated Designs
  ## The NVD for GPT in a Box leverages a RAG Pipeline with Serverless Functions
  ## to demonstrate end to end workflow with Nutanix Integration

  ## Milvus is vector database
  milvus:
    enabled: true
    version: 4.1.13
    milvus_bucket_name: mgmt-cluster-milvus

  ## Knative Eventing used to receive Event notifications from Nutanix Objects Document Bucket
  knative_eventing:
    enabled: false
    version: knative-v1.10.1

  ## Kafka is messaging broker used by both knative eventing Document Ingestion serverless function
  ## and integrates with Nutanix Objects Events Notification Kafka Endpoints
  ## Kafka is also leveraged by Milvus as a Messaging Broker for Milvus related events, as opposed to the default Apache Pulsar
  kafka:
    enabled: true
    version: 26.8.5

  ## OpenTelemetry Collector version is used for both the Deployment and Daemon is used to collect data for monitoring
  opentelemetry_collector:
    enabled: true
    version: 0.80.1
  ## OpenTelemetry Operator is used to deploy opentelemetry components
  opentelemetry_operator:
    enabled: true
    version: 0.47.0
  ## Uptrace is Observability / Monitoring UI
  uptrace:
    enabled: true
    version: 1.5.7

  ## Jupyterhub is deployed on non-prod workload clusters in NVD Reference
  jupyterhub:
    enabled: false
    version: 3.1.0
  redis:
    enabled: false
    version: 18.1.6
  elasticsearch:
    enabled: false
    version: 19.13.10
  kubernetes_dashboard:
    enabled: false
    version: 7.3.2
  weave_gitops:
    enabled: true
    version: 4.0.36

apps:
  ## Required GPT NVD Reference Application Helm Chart Configs
  gptnvd_reference_app:
    enabled: false
    version: 0.2.7
    #documents_bucket_name: documents01

  ## Required NAI LLM Helm Chart Configs
  ### huggingFaceToken required when useExistingNFS. This will download model when llm is initialized
  nai_helm:
    enabled: false
    version: 0.1.1
    #model: llama2_7b_chat
    #revision: 94b07a6e30c3292b8265ed32ffdeccfdadf434a8
    #maxTokens: 4000
    #repPenalty: 1.2
    #temperature: 0.2
    #topP: 0.9
    #useExistingNFS: false
    #nfs_export: /llm-model-store
    #nfs_server: _required
    #huggingFaceToken: _required
```
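Before moving on, a quick sanity check of the copied file can help. This is a sketch assuming the `yq` and `grep` CLIs are available on the jumpbox:

```bash
# print the cluster block to confirm name, profile, and environment were updated
yq '.k8s_cluster' .env.mgmt-cluster.yaml

# list any _required placeholders that still need values
grep -n "_required" .env.mgmt-cluster.yaml
```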
- Install workstation packages and export the `krew` path (see the sketch after this list)
- Generate and Validate Configurations, and verify the generated cluster configs
- Validate Encrypted Secrets and make sure the values match what you entered in the `.env.mgmt-cluster.yaml` file
- Select New (or Switching to Existing) Cluster and download the NKE creds for `mgmt-cluster`
- Run Flux Bootstrapping:

    ```bash
    task bootstrap:silent
    ```

    Note

    This may take up to 10 minutes. If there are any issues, update the local git repo, push up the changes, and run `task flux:reconcile`.

- Monitor on a new terminal to make sure the `READY` status is `TRUE` for all resources (a generic readiness check is sketched after this list).

    Note

    If there are any issues, update the local git repo, push up the changes, and run `task flux:reconcile`.
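A couple of the steps above refer to commands that are not reproduced here. The following is a minimal sketch under stated assumptions: the standard krew PATH export from the krew documentation, plus a generic Flux readiness check using the `flux` CLI and `kubectl` (the repository's own task targets may wrap these differently):

```bash
# standard krew PATH export (from the krew installation docs)
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"

# check that Flux-managed resources report READY as True
flux get kustomizations --all-namespaces
flux get helmreleases --all-namespaces

# or keep a live view until everything reconciles
kubectl get kustomizations,helmreleases -A -w
```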
## Set Kafka Endpoint in Nutanix Objects
After a successful bootstrap of the `mgmt-cluster`, get the Kafka ingress endpoint and set its value in the Nutanix Objects store. The Objects store will send a message to the Kafka endpoint whenever an object is stored in the bucket.
1. On the VSC terminal on the jumpbox VM, get the ingress endpoints (see the sketch after this list)
2. Copy the URL value in the HOSTS column (note this will be different for you) and append the port number `9096` to it
3. Check if the Kafka endpoint is alive and well
4. Log in to Prism Central, go to Objects, and choose the `ntnx-objects` store (the Objects store name could be different for you)
5. Go to Settings > Notification Endpoints
6. Choose the Kafka tab
7. Toggle the Enable button to enabled
8. Paste the ingress endpoint of your Kafka instance
9. Click on Save
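A sketch of steps 1 through 3, assuming `kubectl` access to the `mgmt-cluster` and using `nc` for a basic TCP reachability check (the hostname below is illustrative; use the value from your own HOSTS column):

```bash
# list ingress endpoints; the Kafka host appears in the HOSTS column
kubectl get ingress -A

# example only: the Kafka host from the HOSTS column with port 9096 appended
KAFKA_ENDPOINT="kafka.mgmt-cluster.10.x.x.214.nip.io:9096"

# basic reachability check against the Kafka listener
nc -zv "${KAFKA_ENDPOINT%:*}" "${KAFKA_ENDPOINT#*:}"
```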
## Configure documents01 Bucket to send Messages to Kafka Endpoint
- Go to Buckets
- Click on the `documents01` bucket and choose Data Event Notification from the top menu
- Click on Add Rule
- Choose the following:
    - Endpoint - Kafka
    - Scope - All Objects
    - Data Events - All Events
- Click on Save
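One way to verify the notification rule end to end is to upload a test object to the `documents01` bucket with any S3-compatible client. This is a sketch assuming the AWS CLI and the Objects store endpoint and credentials configured in `.env.mgmt-cluster.yaml` (all values illustrative):

```bash
# illustrative credentials; use your own Objects store access/secret keys
export AWS_ACCESS_KEY_ID=your_bucket_access_key
export AWS_SECRET_ACCESS_KEY=your_bucket_secret_key

# upload a small test document; a data event should then be published to the Kafka endpoint
echo "hello" > test.txt
aws s3 cp test.txt s3://documents01/test.txt --endpoint-url http://objects.example.com
```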
## Check Milvus Database Status
To make sure the Milvus database and its associated components are running:

- On the VSC terminal, check that the Milvus components are alive and well (see the sketch after this list)
- Get the Milvus ingress endpoint
- Copy the URL value in the HOSTS column (note this will be different for you)
- Paste the URL in a browser and you should see the Milvus database management page
- There is no username and password for the Milvus database as this is a test environment. Feel free to update the password for the `root` user in the user settings.
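A minimal sketch of the first two checks, assuming Milvus was deployed into a namespace named `milvus` (adjust the namespace to match your cluster):

```bash
# confirm the Milvus pods and their dependencies are Running and Ready
kubectl get pods -n milvus

# find the Milvus ingress endpoint; the UI URL is shown in the HOSTS column
kubectl get ingress -n milvus
```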