Pre-requisites for MGMT and DEV Cluster

In this part of the lab, we will prepare the pre-requisites for running the LLM application on GPU nodes.

The following diagram shows the flow of the application lab:

```mermaid
stateDiagram-v2
    direction LR

    state PreRequisites {
        [*] --> ReserveIPs
        ReserveIPs --> CreateBuckets
        CreateBuckets --> CreateFilesShare
        CreateFilesShare --> [*]
    }

    [*] --> PreRequisites
    PreRequisites --> DeployLLMV1 : next section
    DeployLLMV1 --> TestLLMApp
    TestLLMApp --> [*]
```

Prepare the following pre-requisites for the mgmt-cluster and dev-cluster Kubernetes clusters.

Fork and Clone GiaB NVD GitOps Repository

Warning

The following steps are only required if you are deploying GPT-in-a-Box v1 using the NVD GitOps workflow.

1. Open the following URL and fork the repo to your GitHub org:
```
https://github.com/jesse-gonzalez/nai-llm-fleet-infra.git
```
2. From VS Code, log on to your jumphost VM (if not already done).
3. Open Terminal.

From the $HOME directory, clone your fork of the nai-llm-fleet-infra Git repo and change to its working directory.

Template command:

```
git clone https://github.com/<your_github_org>/nai-llm-fleet-infra.git
cd $HOME/nai-llm-fleet-infra/
```

Sample command:

```
git clone https://github.com/rahuman/nai-llm-fleet-infra.git
cd $HOME/nai-llm-fleet-infra/
```

Finally, set your Git identity for this repository:

```
git config user.email "your_github_email"
git config user.name "your_github_username"
```

In the VS Code terminal, log in to your GitHub account using the following command:

```
gh auth login
```

If you do not have the gh client installed, see the GitHub CLI installation docs.

Execution example:

```
❯ gh auth login
? What account do you want to log into? GitHub.com
? What is your preferred protocol for Git operations on this host? HTTPS
? Authenticate Git with your GitHub credentials? Yes
? How would you like to authenticate GitHub CLI?  [Use arrows to move, type to filter]
    Login with a web browser
>   Paste an authentication token
```

You should now be logged in to GitHub.

Now the jumphost VM is ready for deploying our app. We will do this in the next section.

Reserve Ingress and Istio Endpoint IPs

A Nutanix AHV IPAM network allows you to blacklist IPs that need to be reserved for specific application endpoints, so that IPAM does not assign them to VMs. We will use this feature to find and reserve four IPs.

We will need a total of four IPs for the following:

| Cluster Role | Cluster Name | Ingress IP | Istio IP |
|---|---|---|---|
| Management | `mgmt-cluster` | 1 | 1 |
| Dev | `dev-cluster` | 1 | 1 |

Get the CIDR range of the AHV network (subnet) where the application will be deployed.

CIDR example for your Nutanix cluster:
```
10.x.x.0/24
```
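The `nmap -sn` scan in the next steps walks every address in this CIDR. To see which host range that covers, and to double-check later that the IPs you pick really sit inside the AHV subnet, here is a minimal sketch using Python's standard `ipaddress` module. The CIDR `10.0.0.0/24` and the function names are hypothetical stand-ins for your `10.x.x.0/24` network.

```python
import ipaddress


def usable_hosts(cidr: str) -> list[str]:
    """Return the usable host addresses in an IPv4 CIDR (network and broadcast excluded)."""
    network = ipaddress.ip_network(cidr, strict=True)
    return [str(host) for host in network.hosts()]


def in_subnet(ip: str, cidr: str) -> bool:
    """Return True if the candidate IP belongs to the AHV subnet."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=True)


if __name__ == "__main__":
    cidr = "10.0.0.0/24"  # hypothetical example; substitute your AHV subnet
    hosts = usable_hosts(cidr)
    print(len(hosts), hosts[0], hosts[-1])  # 254 10.0.0.1 10.0.0.254
    print(in_subnet("10.0.0.214", cidr))    # True
```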
From VS Code, log on to your jumphost VM (if not already done).

Open Terminal.

Install the nmap tool (if not already done):

```
cd $HOME/nai-llm-fleet-infra
devbox add nmap
```

Find four unused static IP addresses in the subnet.

Template command:

```
nmap -v -sn <your CIDR>
```

Sample command:

```
nmap -v -sn 10.x.x.0/24
```

Sample output (choose the first four consecutive IPs reported as `[host down]`):

```
Nmap scan report for 10.x.x.214 [host down]
Nmap scan report for 10.x.x.215 [host down]
Nmap scan report for 10.x.x.216 [host down]
Nmap scan report for 10.x.x.217 [host down]
Nmap scan report for 10.x.x.218
Host is up (-0.098s latency).
```
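Picking four consecutive down hosts by eye is easy to get wrong on a busy subnet. The following is a minimal sketch that does it programmatically, assuming the output format shown in the sample above (lines of the form `Nmap scan report for <ip> [host down]`); lines that include a reverse-DNS name are simply skipped. `first_free_run` is a hypothetical helper, not part of nmap. Keep in mind that a host that does not answer nmap's discovery probes is not guaranteed to be unused, since some hosts drop those probes.

```python
import ipaddress
import re

# Matches the "-v -sn" lines that report a non-responding address, e.g.
# "Nmap scan report for 10.0.0.214 [host down]"
DOWN_RE = re.compile(r"^Nmap scan report for (\S+) \[host down\]$")


def first_free_run(nmap_output: str, count: int = 4) -> list[str]:
    """Return the first `count` consecutive addresses reported as [host down]."""
    down = sorted(
        ipaddress.ip_address(m.group(1))
        for line in nmap_output.splitlines()
        if (m := DOWN_RE.match(line.strip()))
    )
    run: list = []
    for ip in down:
        # Start a new run whenever the next down address is not adjacent
        if run and int(ip) != int(run[-1]) + 1:
            run = []
        run.append(ip)
        if len(run) == count:
            return [str(addr) for addr in run]
    return []  # no run of the requested length was found
```

Pipe the saved scan output into this function and use the returned list for the blacklist step that follows.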

Log on to any CVM in your Nutanix cluster and run the following command to add the chosen static IPs to the AHV IPAM network's IP blacklist:

Username: nutanix
Password: your Prism Element password

Template command:

```
acli net.add_to_ip_blacklist <your-ipam-ahv-network> \
ip_list=10.x.x.214,10.x.x.215,10.x.x.216,10.x.x.217
```

Sample command:

```
acli net.add_to_ip_blacklist User1 \
ip_list=10.x.x.214,10.x.x.215,10.x.x.216,10.x.x.217
```
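A typo in `ip_list` would reserve the wrong address. As a small sketch, the command line can be assembled from the validated list of chosen IPs before pasting it into the CVM shell. `blacklist_command` is a hypothetical helper; it only builds the same `acli net.add_to_ip_blacklist <network> ip_list=...` string shown above and does not call acli.

```python
import ipaddress


def blacklist_command(network: str, ips: list[str]) -> str:
    """Build the acli command that adds the given IPs to an AHV IPAM network's blacklist."""
    for ip in ips:
        # Raises ValueError on a typo such as "10.0.0.2144"
        ipaddress.IPv4Address(ip)
    return f"acli net.add_to_ip_blacklist {network} ip_list={','.join(ips)}"


if __name__ == "__main__":
    # "User1" and the 10.0.0.x addresses are example values only
    print(blacklist_command("User1", ["10.0.0.214", "10.0.0.215", "10.0.0.216", "10.0.0.217"]))
```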

Create Nginx Ingress and Istio VIP/FQDN

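We will use nip.io addresses to assign FQDNs to the Nginx Ingress and Istio endpoints, based on the four IPs reserved in the previous section; these FQDNs are used in the next section. nip.io is a wildcard DNS service that resolves a name such as `mgmt-cluster.10.x.x.214.nip.io` to the IP embedded in it (`10.x.x.214`), so no DNS records need to be created. We will combine these nip.io FQDNs with the VIPs and self-signed certificates.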
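To make the nip.io mapping concrete, the sketch below extracts the IPv4 address that a dot-notation nip.io name points to. It only handles the `<anything>.<a>.<b>.<c>.<d>.nip.io` form used in this lab (nip.io also supports other notations, which are deliberately not covered), and `nip_io_target` is a hypothetical helper, not a nip.io API.

```python
import ipaddress
import re

# Dot notation only: "<anything>.<a>.<b>.<c>.<d>.nip.io" resolves to <a>.<b>.<c>.<d>
NIP_RE = re.compile(r"(?:^|\.)(\d{1,3}(?:\.\d{1,3}){3})\.nip\.io\.?$", re.IGNORECASE)


def nip_io_target(fqdn: str) -> str:
    """Return the IPv4 address that nip.io would resolve `fqdn` to (dot notation only)."""
    match = NIP_RE.search(fqdn)
    if match is None:
        raise ValueError(f"not a dot-notation nip.io name: {fqdn}")
    # Validates each octet (raises ValueError for values such as 256)
    return str(ipaddress.IPv4Address(match.group(1)))
```

Because every subdomain under `dev-cluster.10.x.x.216.nip.io` resolves to the same VIP, a single wildcard name is enough for all Ingress hosts on that cluster.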

We will assign the four reserved IPs as follows:

Management Cluster

Assign the first two reserved IPs to the management cluster.

| Component | Sub-component | IP/FQDN |
|---|---|---|
| Ingress | `vip` | `10.x.x.214` |
| Ingress Wildcard | `wildcard_ingress_subdomain` | `mgmt-cluster.10.x.x.214.nip.io` |
| Ingress Subdomain | `management_cluster_ingress_subdomain` | `mgmt-cluster.10.x.x.214.nip.io` |
| Reserved for future | `troubleshooting or debugging` | `10.x.x.215` |

Note

We only need one IP for the management cluster. However, KubeVIP needs a range of at least two IPs, so we reserve the second IP for future use and/or troubleshooting.

Dev Cluster

Assign the next two reserved IPs to the dev cluster.

Note

The management_cluster_ingress_subdomain entry appears in this table again; it is only a reference from dev-cluster to mgmt-cluster. This entry will be used in the .env.dev-cluster.yaml file during the Deploy Dev Cluster section.

| Component | Sub-component | IP/FQDN |
|---|---|---|
| Nginx Ingress | `vip` | `10.x.x.216` |
| Nginx Ingress | `wildcard_ingress_subdomain` | `dev-cluster.10.x.x.216.nip.io` |
| Nginx Ingress | `management_cluster_ingress_subdomain` | `mgmt-cluster.10.x.x.214.nip.io` |
| Istio | `vip` | `10.x.x.217` |
| Istio | `wildcard_ingress_subdomain` | `dev-cluster.10.x.x.217.nip.io` |
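The assignment in the two tables above follows a fixed pattern, so it can be derived from the four reserved IPs in order. The sketch below does that; the dictionary keys such as `ingress_vip` and `istio_wildcard_ingress_subdomain` are hypothetical labels for this illustration, not the schema of the `.env.dev-cluster.yaml` file.

```python
def plan_endpoints(reserved: list[str]) -> dict[str, dict[str, str]]:
    """Map four reserved IPs (in order) to the management/dev VIP and FQDN assignment."""
    if len(reserved) != 4:
        raise ValueError("expected exactly four reserved IPs")
    mgmt_vip, mgmt_spare, dev_ingress_vip, dev_istio_vip = reserved
    mgmt_subdomain = f"mgmt-cluster.{mgmt_vip}.nip.io"
    return {
        "mgmt-cluster": {
            "ingress_vip": mgmt_vip,
            "wildcard_ingress_subdomain": mgmt_subdomain,
            "management_cluster_ingress_subdomain": mgmt_subdomain,
            "reserved_ip": mgmt_spare,  # kept for troubleshooting or debugging
        },
        "dev-cluster": {
            "ingress_vip": dev_ingress_vip,
            "wildcard_ingress_subdomain": f"dev-cluster.{dev_ingress_vip}.nip.io",
            # Reference back to the management cluster's ingress subdomain
            "management_cluster_ingress_subdomain": mgmt_subdomain,
            "istio_vip": dev_istio_vip,
            "istio_wildcard_ingress_subdomain": f"dev-cluster.{dev_istio_vip}.nip.io",
        },
    }
```

Generating the values once from the reserved list avoids mismatches between the management cluster's subdomain and the copy referenced by the dev cluster.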

Create Buckets in Nutanix Objects

We will create access keys for the buckets that we will use in this project.

Generating Access Keys for Buckets

Note

Follow the instructions here to create a Nutanix Objects store (if you do not have one).

We assume that the name of the Objects store is ntnx-objects.

Go to Prism Central > Objects > ntnx-objects
On the right-hand pane, click on Access Keys
Click on + Add people
Select Add people not in a directory service
Enter an email llm-admin@example.com and name llm-admin
Click on Next
Click on Generate Keys
Once generated, click on Download Keys
Once downloaded, click on Close

Open the downloaded file to verify its contents:

```
Username: llm-admin@example.com
Access Key: 1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Secret Key: gxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Display Name: llm-admin
```

Store the access key and secret key in a safe place for later use.
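If you later want to load these keys into a script (for example, to configure an S3 client against the Objects store), a minimal stdlib parser is sketched below. It assumes the downloaded file is plain text with one `Label: value` pair per line, exactly as shown above; `parse_keys_file` is a hypothetical helper, and the file name is up to you.

```python
def parse_keys_file(text: str) -> dict[str, str]:
    """Parse 'Label: value' lines from the downloaded Objects access-key file."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may themselves contain colons
        label, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[label.strip()] = value.strip()
    missing = {"Access Key", "Secret Key"} - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return fields
```

Avoid printing the returned secret key to the terminal or committing it to a repository.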

Create Buckets

We will create two buckets: one for the Milvus database store and one for the document store, where uploaded files for querying will be stored.

On the top menu, click on Object Stores
Click on ntnx-objects
Click on Create Bucket
Enter mgmt-cluster-milvus as the bucket name
Click on Create
Follow the same steps to create another bucket called documents01

Provide Access to Buckets

In the list of buckets, click on the mgmt-cluster-milvus bucket
Click on the User Access menu and then on Edit User Access
In the mgmt-cluster-milvus window, type in the llm-admin@example.com email that you configured in the Generating Access Keys for Buckets section
Give Full Access permissions
Click on Save
Follow the same steps to give Full Access to the llm-admin@example.com email for documents01 bucket

Create Nutanix Files Share

We will create an NFS share to host the LLM model files (llama2_7b_chat) and the model archive file.

Note

Follow the instructions here to create a Nutanix Files cluster (if you do not have one).

We assume that the name of the Files cluster is ntnx-files.

Go to Prism Central > Files > ntnx-files
Click on Shares & Exports
Click on + New Share or Export
Enter the following details:
    - Name - llm-model-store
    - Enable compression - checked
    - Authentication - system
    - Default Access - Read-Write
    - Squash - Root Squash
Click on Create
Copy the Share/Export Path from the list of shares and note it down for later use (e.g., /llm-model-store)

Extract the Model Archive file to Files Share

The LLM application will use the model archive (MAR) file stored in the file share. The following commands download the model files from Hugging Face and generate the MAR file on the Files share.

Note

The following steps are taken directly from the GPT-in-a-Box documentation on opendocs.nutanix.com.

Log on to the jumphost VM you created in the previous section:
```
ssh -l ubuntu <jumphost vm IP>
```

Download the Nutanix nai-llm-k8s package (v0.2.2) and extract it:

```
curl -LO https://github.com/nutanix/nai-llm-k8s/archive/refs/tags/v0.2.2.tar.gz
tar xvf v0.2.2.tar.gz --strip-components=1
```

Install pip:
```
sudo apt-get install python3-pip
```
Install the Python library requirements:
```
cd llm
pip install -r requirements.txt
```

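Mount the file share created earlier in this section. Create the local mount point first; on Ubuntu, install the nfs-common package if it is missing, since it provides the NFS mount helper.

Template command:

```
sudo mkdir -p <NFS_LOCAL_MOUNT_LOCATION>
sudo mount -t nfs <files server fqdn>:<share path> <NFS_LOCAL_MOUNT_LOCATION>
```

Example command:

```
sudo mkdir -p /mnt/llm-model-store
sudo mount -t nfs ntnx-files.pe.example.com:/llm-model-store /mnt/llm-model-store
```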
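Because generate.py is run as your regular user (without sudo), it is worth confirming that the share is really mounted and writable before starting a long download. A minimal sketch using only the standard library; `check_model_store` is a hypothetical helper and `/mnt/llm-model-store` is the example mount point from above.

```python
import os


def check_model_store(path: str) -> list[str]:
    """Return a list of problems with the local NFS mount point (empty means it looks usable)."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    if not os.path.ismount(path):
        problems.append(f"{path} is not a mount point (is the share mounted?)")
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by the current user")
    return problems


if __name__ == "__main__":
    problems = check_model_store("/mnt/llm-model-store")
    print("\n".join(problems) if problems else "mount point looks usable")
```

If the path exists but is not a mount point, files would silently land on the jumphost's local disk instead of the Files share.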

Download the model files and generate the MAR file on the local mount of the file share:

Template command:

```
python3 generate.py [--hf_token <HUGGINGFACE_HUB_TOKEN> \
--repo_version <REPO_COMMIT_ID>] --model_name <MODEL_NAME> \
--output <NFS_LOCAL_MOUNT_LOCATION>
```

Example command:

```
python3 generate.py --model_name llama2_7b_chat \
--output /mnt/llm-model-store \
--hf_token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Sample output:

```
## Starting model files download

Deleted all contents from '/mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/download'

The new directory is created! - /mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/download

The new directory is created! - /mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/download/tmp_hf_cache

generation_config.json: 100%|██████████████████████████████████████████████████████████████████████| 188/188 [00:00<00:00, 1.15MB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████| 614/614 [00:00<00:00, 7.51MB/s]
LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████| 7.02k/7.02k [00:00<00:00, 81.3MB/s]
USE_POLICY.md: 100%|███████████████████████████████████████████████████████████████████████████| 4.77k/4.77k [00:00<00:00, 11.2MB/s]
.gitattributes: 100%|██████████████████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 10.1MB/s]
README.md: 100%|████████████████████████████████████████████████████████████████████████████████| 10.4k/10.4k [00:00<00:00, 113MB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████| 1.62k/1.62k [00:00<00:00, 13.5MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 1.22MB/s]
model.safetensors.index.json: 100%|████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 13.5MB/s]
pytorch_model.bin.index.json: 100%|████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 12.4MB/s]
tokenizer.model: 100%|███████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 6.12MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 7.42MB/s]
model-00002-of-00002.safetensors: 100%|█████████████████████████████████████████████████████████| 3.50G/3.50G [00:23<00:00, 149MB/s]
model-00001-of-00002.safetensors: 100%|█████████████████████████████████████████████████████████| 9.98G/9.98G [01:01<00:00, 163MB/s]
Fetching 14 files: 100%|████████████████████████████████████████████████████████████████████████████| 14/14 [01:02<00:00,  4.47s/it]
Deleted all contents from '/mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/download/tmp_hf_cache'

## Successfully downloaded model_files


## Generating MAR file for custom model files: llama2_7b_chat

The new directory is created! - /mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/model-store

## Generating MAR file, will take few mins.
## Successfully generated MAR files

## Generating MAR file, will take few mins.

Model Archive File is Generating...

Creating Model Archive: 100%|██████████████████████████████████████████████████████████████████| 11.7G/11.7G [21:01<00:00, 9.29MB/s]

Model Archive file size: 9.66 GB

## llama2_7b_chat.mar is generated.

The new directory is created! - /mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/config
```
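The sample output suggests a directory layout of `<output>/<model_name>/<commit>/model-store`, and the archive is named `<model_name>.mar`. Assuming the MAR file lands inside that `model-store` directory (the output does not print its full path, so verify on your share), this sketch locates it; `find_mar_files` is a hypothetical helper, and the demo builds a throwaway directory tree rather than touching the real share.

```python
import tempfile
from pathlib import Path


def find_mar_files(output_dir: str, model_name: str) -> list[Path]:
    """Return <output>/<model_name>/<commit>/model-store/<model_name>.mar files (assumed layout)."""
    root = Path(output_dir) / model_name
    if not root.is_dir():
        return []
    return sorted(root.glob(f"*/model-store/{model_name}.mar"))


if __name__ == "__main__":
    # Build a temporary tree that mimics the layout from the sample output
    with tempfile.TemporaryDirectory() as tmp:
        commit = "94b07a6e30c3292b8265ed32ffdeccfdadf434a8"
        store = Path(tmp) / "llama2_7b_chat" / commit / "model-store"
        store.mkdir(parents=True)
        (store / "llama2_7b_chat.mar").touch()
        for mar in find_mar_files(tmp, "llama2_7b_chat"):
            print(mar.relative_to(tmp))
```

On the jumphost, call `find_mar_files("/mnt/llm-model-store", "llama2_7b_chat")` to check that the archive exists before moving on.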

Prepare GitHub Repository and API Token

We need to fork this project's GitHub repository to your GitHub organization (handle). This repository will hold the Flux files for the GitOps sections.

A repo_api_token needs to be created to allow Git changes to the repository.

Log in to GitHub: go to GitHub and log in to your account.

Fork the following source repository:

```
https://github.com/jesse-gonzalez/nai-llm-fleet-infra
```

Access Settings: click on your profile picture in the top-right corner and select Settings from the dropdown menu.
Developer Settings: scroll down in the left sidebar and click on Developer settings.
Personal Access Tokens: in the left sidebar, click on Personal access tokens and then Tokens (classic).
Generate New Token: click on the Generate new token button. You might need to re-enter your GitHub password.
Configure Token:
    - Give your token a descriptive name, for example: for nai-llm-fleet-infra actions
    - Expiration: choose an expiration period of 7 days
    - Scopes:
        - Select repo (and every option under it)
        - Select write:packages and read:packages
        - Select admin:org and read:org under it
        - Select gist
Generate Token: after selecting the scopes, click on the Generate token button at the bottom of the page.
Copy Token: GitHub will generate and display the token. Copy it and store it securely; you will not be able to see it again.
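Since this lab expects a classic token, a quick sanity check before pasting it into later configuration can catch a mix-up with a fine-grained token or an unrelated token (such as the Hugging Face `hf_` token used earlier). The sketch relies on GitHub's token prefixes (`ghp_` for classic personal access tokens, `github_pat_` for fine-grained ones); `token_kind` is a hypothetical helper and does not contact GitHub or validate the token.

```python
def token_kind(token: str) -> str:
    """Classify a GitHub token by its prefix (no network call, no validity check)."""
    token = token.strip()
    if token.startswith("ghp_"):
        return "classic personal access token"
    if token.startswith("github_pat_"):
        return "fine-grained personal access token"
    return "unknown"
```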

Single repository access?

Restricting Token to a Single Repository

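Classic PATs apply to every repository your account can access and cannot be restricted to a single repository. Fine-grained PATs can be limited to selected repositories, but this lab uses a classic token. You can still control access using repository permissions and organizational policies. One option is: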

Create a Dedicated GitHub User:

Create a new GitHub user specifically for accessing the repository. Invite this user to your repository with the necessary permissions, then generate a PAT for this user.

We will use this token in the next section.

Prepare Docker Hub Credentials

Docker Hub credentials are required to avoid the pull rate limits that Docker Hub applies to anonymous image downloads.

If you do not have a Docker Hub account, please create one here.

Store the docker username and password securely for use in the next section.

We are now ready to deploy the LLM app.
