docs: update GPU specs (RTX 4000 Ada) and Vision Encoder performance with GPU acceleration

Apple
2025-11-17 11:39:31 -08:00
parent a945b42ba0
commit df1cd614cd


@@ -15,15 +15,16 @@
#### GPU Configuration
-**GPU Model:** NVIDIA GeForce RTX 3090 (estimated based on typical setup)
-**VRAM:** 24 GB GDDR6X
-**CUDA Cores:** 10,496
-**Tensor Cores:** 328 (3rd Gen)
-**Architecture:** Ampere
-**CUDA Version:** 12.1+
-**Driver Version:** 535.104.05+
+**GPU Model:** NVIDIA RTX 4000 SFF Ada Generation
+**VRAM:** 20 GB GDDR6
+**Architecture:** Ada Lovelace
+**CUDA Version:** 12.2
+**Driver Version:** 535.274.02
+**Note:** Actual GPU model to be confirmed with `nvidia-smi` on server.
+**Current VRAM Usage:**
+- Ollama (qwen3:8b): ~5.6 GB
+- Vision Encoder (ViT-L/14): ~1.9 GB
+- **Total:** ~7.5 GB / 20 GB (37.5% usage)
#### CPU & RAM (Typical GEX44)
- **CPU:** AMD Ryzen 9 5950X (16 cores, 32 threads) or similar
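The VRAM budget in the added lines can be sanity-checked with simple arithmetic. A quick sketch using the approximate figures from the docs (not live measurements):

```python
# Sanity check of the documented VRAM budget.
# Values are the approximate figures from the docs, not live nvidia-smi readings.
ollama_gb = 5.6        # qwen3:8b resident in Ollama
vision_gb = 1.9        # CLIP ViT-L/14 vision encoder
total_vram_gb = 20.0   # RTX 4000 SFF Ada Generation

used_gb = ollama_gb + vision_gb
pct = used_gb / total_vram_gb * 100
print(f"~{used_gb:.1f} GB / {total_vram_gb:.0f} GB ({pct:.1f}% usage)")
# → ~7.5 GB / 20 GB (37.5% usage)
```

This confirms the "~7.5 GB / 20 GB (37.5% usage)" total stated above.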
@@ -80,12 +81,13 @@ curl http://localhost:11434/api/generate -d '{
| **OpenAI CLIP** | CLIP (Contrastive Language-Image Pre-training) | - | 768 | - | Pretrained weights |
**Capabilities:**
-- ✅ Text → 768-dim embedding (10-20ms on GPU)
-- ✅ Image → 768-dim embedding (30-50ms on GPU)
-- ✅ Text-to-image search
-- ✅ Image-to-image similarity search
-- Zero-shot image classification (planned)
-- ✅ CLIP score calculation (planned)
+- ✅ Text → 768-dim embedding (0.1-0.5s on GPU, ~10-15s on CPU)
+- ✅ Image → 768-dim embedding (0.3-1s on GPU, ~15-20s on CPU)
+- ✅ Text-to-image search (via Qdrant)
+- ✅ Image-to-image similarity search (via Qdrant)
+- GPU acceleration: **~20-30x speedup** vs CPU
+- ⏳ Zero-shot image classification (planned)
+- ⏳ CLIP score calculation (planned)
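The search capabilities above boil down to one operation: the encoder maps text or images to 768-dim vectors, and the vector store ranks stored embeddings by similarity to the query. A minimal NumPy sketch of that ranking step, using random vectors as stand-ins for real CLIP embeddings (Qdrant performs the equivalent search server-side):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query, vectors, k=2):
    # Rank stored embeddings by similarity to the query — the core of
    # text-to-image and image-to-image search over CLIP embeddings.
    scores = [(i, cosine_sim(query, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]

rng = np.random.default_rng(0)
dim = 768  # CLIP ViT-L/14 embedding dimension
db = rng.normal(size=(5, dim))              # stand-in "stored" embeddings
query = db[3] + 0.01 * rng.normal(size=dim) # near-duplicate of item 3

best_index = top_k(query, db, k=1)[0][0]
print(best_index)  # → 3
```

The near-duplicate query ranks its source image first, which is exactly the image-to-image similarity behavior listed above.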
**API Endpoints:**
```bash