Visualize how LLM deployments are scheduled across Kubernetes node pools based on their compute and GPU requirements.
GPU Deployment:
CPU: 16 cores
Memory: 64 GB
GPU: 2 × NVIDIA H100-80G
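Expressed as a Kubernetes manifest, these requirements map to container resource requests and limits. A minimal sketch, assuming a vLLM serving image; the pod name, container name, and image are illustrative placeholders, not part of the tool:

```yaml
# Sketch of a pod requesting the resources listed above.
apiVersion: v1
kind: Pod
metadata:
  name: llm-gpu-example        # placeholder name
spec:
  containers:
    - name: llm-server
      image: vllm/vllm-openai:latest   # placeholder serving image
      resources:
        requests:
          cpu: "16"
          memory: 64Gi
        limits:
          cpu: "16"
          memory: 64Gi
          nvidia.com/gpu: 2    # 2 × NVIDIA H100-80G
```

GPUs are an extended resource, so they are declared under `limits`; the request defaults to the same value.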
CPU-Only Deployment (no GPU required):
Storage: 290 GB
Engine: vLLM
Context Length: 128K
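The CPU-only parameters translate similarly: the pod requests ephemeral storage for model weights and passes vLLM's `--max-model-len` flag for the 128K context window. A sketch under assumptions — the image, model choice, and a CPU-capable vLLM build are not specified by the tool:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-cpu-example        # placeholder name
spec:
  containers:
    - name: vllm-cpu
      image: vllm/vllm-openai:latest   # placeholder; assumes a CPU build of vLLM
      args:
        - --model=google/gemma-2b      # illustrative model choice
        - --max-model-len=131072       # 128K context length
      resources:
        requests:
          ephemeral-storage: 290Gi     # model weights and cache
        limits:
          ephemeral-storage: 290Gi
```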
Node pools:
Default compute node pool
gpu-pool: General GPU nodes
cpu-pool: CPU-only nodes
high-memory: Memory-optimized nodes
a100-pool: For a specific GPU type (A100)

How a deployment targets one of these pools is sketched after the scenario list below.

Example scenarios:
Gemma 2B on CPU nodes
Llama 70B on H100 GPUs
Hardware compatibility testing
Mixtral 8x7B with advanced settings
Multiple model types and node pools
Testing scheduler with limited resources
Pre-validated NIM 70B RAG stack
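Which pool a pod lands on is typically driven by a node selector on the pod template, plus a toleration if the GPU nodes are tainted. A minimal pod-template fragment, assuming a hypothetical `pool` node label (the actual label key is cluster-specific):

```yaml
# Fragment of a Deployment's pod template; the `pool` label key is hypothetical.
spec:
  nodeSelector:
    pool: gpu-pool            # target the general GPU nodes
  tolerations:
    - key: nvidia.com/gpu     # a common taint on GPU nodes; cluster-specific
      operator: Exists
      effect: NoSchedule
```

Swapping `gpu-pool` for `cpu-pool`, `high-memory`, or `a100-pool` steers the same workload to a different pool.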
Llama 70B on H100 GPUs: tests deployment of a large model (Meta Llama 3.1 70B) that requires multiple H100 GPUs per pod. Each pod requests 2 H100 GPUs, 16 CPU cores, and 64 GB of memory. With the recommended node pool, each node can host two such pods, so both replicas can be scheduled.
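A deployment along these lines would exercise that scenario. This is a sketch under the stated numbers; the names, image, and `pool` label are illustrative, and the per-node capacity of two pods implies nodes with at least 4 H100s:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-3-1-70b          # placeholder name
spec:
  replicas: 2                  # both replicas fit: each node holds up to 2 such pods
  selector:
    matchLabels:
      app: llama-3-1-70b
  template:
    metadata:
      labels:
        app: llama-3-1-70b
    spec:
      nodeSelector:
        pool: gpu-pool         # hypothetical label for the H100 pool
      containers:
        - name: llm-server
          image: vllm/vllm-openai:latest   # placeholder image
          resources:
            requests:
              cpu: "16"
              memory: 64Gi
            limits:
              cpu: "16"
              memory: 64Gi
              nvidia.com/gpu: 2   # 2 × H100 per pod
```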