Pre-requisites for MGMT and DEV Cluster
In this part of the lab, we will prepare the pre-requisites for the LLM application on GPU nodes.
The following is the flow of the applications lab:
```mermaid
stateDiagram-v2
    direction LR
    state PreRequisites {
        [*] --> ReserveIPs
        ReserveIPs --> CreateBuckets
        CreateBuckets --> CreateFilesShare
        CreateFilesShare --> [*]
    }
    [*] --> PreRequisites
    PreRequisites --> DeployLLMV1 : next section
    DeployLLMV1 --> TestLLMApp
    TestLLMApp --> [*]
```
Prepare the following pre-requisites for the mgmt-cluster and dev-cluster Kubernetes clusters.
Fork and Clone GiaB NVD GitOps Repository
Warning
The following steps are only required if deploying GPT-in-a-Box v1 using the NVD GitOps workflow.
1. Open the following URL and fork the repo to your Github org
2. From VSC, logon to your jumphost VM (if not already done)
3. Open Terminal
4. From the `$HOME` directory, clone the fork of your `sol-cnai-infra` git repo and change the working directory (see the sketch after these steps)
5. Finally, set your github config (also sketched below)
6. In VSCode > Terminal, login to your Github account using the following command. If you do not have the `gh` client installed, see the Github CLI Installation Docs.
```
# Execution example
❯ gh auth login
? What account do you want to log into? GitHub.com
? What is your preferred protocol for Git operations on this host? HTTPS
? Authenticate Git with your GitHub credentials? Yes
? How would you like to authenticate GitHub CLI? [Use arrows to move, type to filter]
  Login with a web browser
> Paste an authentication token

Successfully logged in to Github.
```
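For steps 4 and 5, a minimal sketch (assuming your fork lives under your own Github org/handle, shown here as `<your-org>`; substitute your own values):

```bash
# Step 4: clone your fork of the sol-cnai-infra repo and enter it
cd $HOME
git clone https://github.com/<your-org>/sol-cnai-infra.git
cd sol-cnai-infra

# Step 5: set your git identity for commits made during the lab
git config --global user.name "<your name>"
git config --global user.email "<your email>"
```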
Now the jumphost VM is ready for deploying our app. We will do this in the next section.
Reserve Ingress and Istio Endpoint IPs
The Nutanix AHV IPAM network allows you to blacklist IPs that need to be reserved for specific application endpoints. We will use this feature to find and reserve four IPs.
We will need a total of four IPs for the following:
Cluster Role | Cluster Name | Ingress IP | Istio IP |
---|---|---|---|
Management | mgmt-cluster | 1 | 1 |
Dev | dev-cluster | 1 | 1 |
1. Get the CIDR range for the AHV network (subnet) where the application will be deployed
2. From VSC, logon to your jumphost VM (if not already done)
3. Open Terminal
4. Install the `nmap` tool (if not already done)
5. Find four unused static IP addresses in the subnet (see the sketch after these steps)
6. Logon to any CVM in your Nutanix cluster and execute the following to add the chosen static IPs to the AHV IPAM network blacklist
    - Username: nutanix
    - Password: your Prism Element password
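A minimal sketch of steps 4-6, assuming an Ubuntu-based jumphost and the `acli net.add_to_ip_blacklist` subcommand on the CVM; the subnet CIDR, AHV network name, and IPs are placeholders for your own values:

```bash
# On the jumphost: install nmap and ping-scan the subnet
sudo apt update && sudo apt install -y nmap
nmap -v -sn 10.x.x.0/24   # IPs that do not respond are candidates for reservation

# On any CVM (ssh nutanix@<cvm-ip>): blacklist the four chosen IPs
# so AHV IPAM never hands them out to other VMs
acli net.add_to_ip_blacklist <ahv-network-name> \
    ip_list=10.x.x.214,10.x.x.215,10.x.x.216,10.x.x.217
```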
Create Nginx Ingress and Istio VIP/FQDN
We will use nip.io addresses to assign FQDNs to our Nginx Ingress and Istio endpoints, using the four IPs we just reserved in the previous section. These nip.io FQDNs, together with the VIPs and self-signed certificates, will be used in the next section.
The four reserved IPs will be assigned as follows:
Management Cluster
Assign the first two reserved IPs to Management cluster.
Component | Sub-component | IP/FQDN |
---|---|---|
Ingress | vip | 10.x.x.214 |
Ingress Wildcard | wildcard_ingress_subdomain | mgmt-cluster.10.x.x.214.nip.io |
Ingress Subdomain | management_cluster_ingress_subdomain | mgmt-cluster.10.x.x.214.nip.io |
Reserved for future | troubleshooting or debugging | 10.x.x.215 |
Note
We only need one IP for the Management cluster. However, KubeVIP needs a range of at least two IPs. We will reserve the second IP for future use and/or troubleshooting purposes.
Dev Cluster
Assign the next two reserved IPs to Dev cluster.
Note
The `management_cluster_ingress_subdomain` appears in this table once again; it is just a reference from `dev-cluster` to `mgmt-cluster`. This entry will be used in the `.env.dev-cluster.yaml` file during the Deploy Dev Cluster section.
Component | Sub-component | IP/FQDN |
---|---|---|
Nginx Ingress | vip | 10.x.x.216 |
Nginx Ingress | wildcard_ingress_subdomain | dev-cluster.10.x.x.216.nip.io |
Nginx Ingress | management_cluster_ingress_subdomain | mgmt-cluster.10.x.x.214.nip.io |
Istio | vip | 10.x.x.217 |
Istio | wildcard_ingress_subdomain | dev-cluster.10.x.x.217.nip.io |
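No DNS records need to be created for these FQDNs: nip.io is a public wildcard DNS service that resolves any hostname embedding an IP address back to that IP. A quick sanity check from the jumphost (the IP below is purely illustrative):

```bash
# Any <name>.<ip>.nip.io hostname resolves to <ip>
dig +short dev-cluster.10.20.30.40.nip.io
# 10.20.30.40
```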
Create Buckets in Nutanix Objects
We will create the buckets that we will be using in the project, along with access keys for them.
Generating Access Keys for Buckets
Note
Follow the instructions here to create a Nutanix Objects store (if you do not have one). We are assuming that the name of the Objects store is `ntnx-objects`.
1. Go to Prism Central > Objects > ntnx-objects
2. On the right-hand pane, click on Access Keys
3. Click on + Add people
4. Select Add people not in a directory service
5. Enter the email `llm-admin@example.com` and the name `llm-admin`
6. Click on Next
7. Click on Generate Keys
8. Once generated, click on Download Keys
9. Once downloaded, click on Close
10. Open the downloaded file to verify its contents
11. Store the access key and secret key in a safe place for later use (one option is sketched below)
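One convenient option is to export the downloaded keys as environment variables on the jumphost so later CLI steps can pick them up (the variable names follow the AWS CLI convention, since Nutanix Objects is S3-compatible):

```bash
# Values come from the key file downloaded in the steps above
export AWS_ACCESS_KEY_ID=<llm-admin access key>
export AWS_SECRET_ACCESS_KEY=<llm-admin secret key>
```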
Create Buckets
We will create buckets for the Milvus database store and for the document store where uploaded files will be stored for querying.
1. On the top menu, click on Object Stores
2. Click on ntnx-objects
3. Click on Create Bucket
4. Enter mgmt-cluster-milvus as the bucket name
5. Click on Create
6. Follow the same steps to create another bucket called documents01
Provide Access to Buckets
1. In the list of buckets, click on the mgmt-cluster-milvus bucket
2. Click on the User Access menu and then Edit User Access
3. In the mgmt-cluster-milvus window, type in the `llm-admin@example.com` email that you configured in the Generating Access Keys for Buckets section
4. Give Full Access permissions
5. Click on Save
6. Follow the same steps to give Full Access to the `llm-admin@example.com` email for the documents01 bucket (a verification sketch follows this list)
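Because Nutanix Objects exposes an S3-compatible API, you can verify the keys and permissions with any S3 client. A sketch using the AWS CLI and the environment variables exported earlier; the endpoint URL is a placeholder for the one Prism Central shows for `ntnx-objects`:

```bash
# List the buckets visible to llm-admin
aws s3 ls --endpoint-url https://<ntnx-objects-endpoint>

# Round-trip a test object to confirm Full Access on documents01
echo probe > /tmp/probe.txt
aws s3 cp /tmp/probe.txt s3://documents01/probe.txt --endpoint-url https://<ntnx-objects-endpoint>
aws s3 rm s3://documents01/probe.txt --endpoint-url https://<ntnx-objects-endpoint>
```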
Create Nutanix Files Share
Create an NFS share for hosting the LLM model file `llama-2-13b-chat` and the model archive file.
Note
Follow the instructions here to create a Nutanix Files cluster (if you do not have one). We are assuming that the name of the Files cluster is `ntnx-files`.
1. Go to Prism Central > Files > ntnx-files
2. Click on Shares & Exports
3. Click on + New Share or Export
4. Enter the following details:
    - Name: llm-model-store
    - Enable compression: checked
    - Authentication: system
    - Default Access: Read-Write
    - Squash: Root Squash
5. Click on Create
6. Copy the Share/Export Path from the list of shares and note it down for later use (e.g., `/llm-model-store`)
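You can confirm the export is visible from the jumphost before mounting it in the next section (a sketch assuming standard NFS client tools; `<ntnx-files-server>` is a placeholder for your Files server FQDN or IP):

```bash
# Install NFS client tools and list the exports published by the Files cluster
sudo apt install -y nfs-common
showmount -e <ntnx-files-server>
# The export list should include /llm-model-store
```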
Extract the Model Archive file to Files Share
The LLM application will use the model archive (MAR) file stored in the file share. A few commands need to be executed to download the model file from Hugging Face and extract it to the Files share.
Note
The following steps are taken directly from the opendocs.nutanix.com GPT-in-a-Box documentation.
1. Logon to the jumphost VM you created in the previous section
2. Download the nutanix package and extract it
3. Install pip
4. Install the python library requirements
5. Mount the file share created in the previous section
6. Download and extract the model file to the local mount of the file share (a consolidated sketch of steps 2-6 follows, then a sample output)
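A consolidated sketch of steps 2-6, loosely following the opendocs GPT-in-a-Box v1 flow; the release tag, script path, and flags are assumptions based on the nutanix/nai-llm-k8s repository, so check the official docs for the exact values:

```bash
# Step 2: download the nutanix package and extract it (release tag assumed)
wget https://github.com/nutanix/nai-llm-k8s/archive/refs/tags/v0.2.2.tar.gz
tar -xvf v0.2.2.tar.gz && cd nai-llm-k8s-0.2.2

# Steps 3-4: install pip and the python library requirements
sudo apt install -y python3-pip
pip install -r llm/requirements.txt

# Step 5: mount the file share created in the previous section
sudo mkdir -p /mnt/llm-model-store
sudo mount -t nfs <ntnx-files-server>:/llm-model-store /mnt/llm-model-store

# Step 6: download the model from Hugging Face and generate the MAR file
# (requires a Hugging Face token with access to the Llama 2 repo)
python3 llm/generate.py --model_name llama2_7b_chat --output /mnt/llm-model-store
```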
```
# Sample output
## Starting model files download
Deleted all contents from '/mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/download'
The new directory is created! - /mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/download
The new directory is created! - /mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/download/tmp_hf_cache
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████| 188/188 [00:00<00:00, 1.15MB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████| 614/614 [00:00<00:00, 7.51MB/s]
LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████| 7.02k/7.02k [00:00<00:00, 81.3MB/s]
USE_POLICY.md: 100%|███████████████████████████████████████████████████████████████████████████| 4.77k/4.77k [00:00<00:00, 11.2MB/s]
.gitattributes: 100%|██████████████████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 10.1MB/s]
README.md: 100%|████████████████████████████████████████████████████████████████████████████████| 10.4k/10.4k [00:00<00:00, 113MB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████| 1.62k/1.62k [00:00<00:00, 13.5MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 1.22MB/s]
model.safetensors.index.json: 100%|████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 13.5MB/s]
pytorch_model.bin.index.json: 100%|████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 12.4MB/s]
tokenizer.model: 100%|███████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 6.12MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 7.42MB/s]
model-00002-of-00002.safetensors: 100%|█████████████████████████████████████████████████████████| 3.50G/3.50G [00:23<00:00, 149MB/s]
model-00001-of-00002.safetensors: 100%|█████████████████████████████████████████████████████████| 9.98G/9.98G [01:01<00:00, 163MB/s]
Fetching 14 files: 100%|████████████████████████████████████████████████████████████████████████████| 14/14 [01:02<00:00, 4.47s/it]
Deleted all contents from '/mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/download/tmp_hf_cache'
## Successfully downloaded model_files
## Generating MAR file for custom model files: llama2_7b_chat
The new directory is created! - /mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/model-store
## Generating MAR file, will take few mins.
Model Archive File is Generating...
Creating Model Archive: 42%|███████████████████████████▉ | 4.97G/11.7G [10:40<12:56, 8.69MB/s]
Creating Model Archive: 100%|██████████████████████████████████████████████████████████████████| 11.7G/11.7G [21:01<00:00, 9.29MB/s]
Model Archive file size: 9.66 GB
## llama2_7b_chat.mar is generated.
The new directory is created! - /mnt/llm-model-store/llama2_7b_chat/94b07a6e30c3292b8265ed32ffdeccfdadf434a8/config
## Successfully generated MAR files
```
Prepare Github Repository and API Token
We need to fork this project's Github repository to your Github organization (handle). This repository will be used to hold the flux files for the GitOps sections.
A `repo_api_token` needs to be created to allow for git changes.
1. Log in to GitHub: go to GitHub and log in to your account.
2. Fork the following source repository
3. Access Settings: click on your profile picture in the top-right corner and select Settings from the dropdown menu.
4. Developer Settings: scroll down in the left sidebar and click on Developer settings.
5. Personal Access Tokens: in the left sidebar, click on Personal access tokens and then Tokens (classic).
6. Generate New Token: click on the Generate new token button. You might need to re-enter your GitHub password.
7. Configure Token:
    - Give your token a descriptive name: `for nai-llm-fleet-infra actions`
    - Expiration: choose an expiration period of `7 days`
    - Scopes:
        - Select `repo` (and every option under it)
        - Select `write:packages` and `read:packages`
        - Select `admin:org` and `read:org` under it
        - Select `gist`
8. Generate Token: after selecting the scopes, click on the Generate token button at the bottom of the page.
9. Copy Token: GitHub will generate the token and display it. Copy this token and store it securely. It can't be seen again.
Single repository access?
Restricting Token to a Single Repository
GitHub's PATs are account-wide and cannot be restricted to a single repository directly. However, you can control access using repository permissions and organizational policies. Here are some options:
1. Create a Dedicated GitHub User: create a new GitHub user specifically for accessing the repository. Invite this user to your repository with the necessary permissions, then generate a PAT for this user.
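Whichever route you take, you can confirm the token works and carries the intended scopes before moving on; for classic PATs, the GitHub API reports the granted scopes in the `X-OAuth-Scopes` response header:

```bash
# Inspect the scopes granted to the token
curl -sI -H "Authorization: token <repo_api_token>" https://api.github.com/user \
    | grep -i x-oauth-scopes
# Expected: x-oauth-scopes: admin:org, gist, repo, write:packages
```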
We will use this token in the next section.
Prepare Docker Hub Credentials
Docker Hub credentials are required to prevent rate limits on image downloads.
If you do not have a Docker account, please create one here.
Store the Docker username and password securely for use in the next section.
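You can validate the credentials from the jumphost ahead of time (a sketch; assumes `docker` is already installed):

```bash
# Log in to Docker Hub; enter the password or an access token when prompted
docker login -u <docker-username>
# Look for "Login Succeeded" in the output
```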
We are now ready to deploy the LLM app.