VAL-9000
6/1/2025
A self-hosted AI assistant built from the ground up — local GPU inference, custom personality, voice synthesis, a multi-source RAG ingestion pipeline, and a physical hardware interface.
Stack
All services run containerized via Docker Compose on local hardware with NVIDIA GPU passthrough:
- Ollama — GPU-accelerated local LLM inference (Llama 3)
- Kokoro — local text-to-speech synthesis
- OpenWebUI — web-based chat interface
- Ingestion service — custom FastAPI service for feeding RAG content
- Preprocessor — cleans and prepares ingested content for retrieval
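A minimal Compose sketch of the GPU passthrough described above, using the standard Compose device-reservation syntax for NVIDIA GPUs; the `ingestion` build path and volume layout are illustrative assumptions, not the actual configuration:

```yaml
services:
  ollama:
    image: ollama/ollama            # GPU-accelerated local LLM inference
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia        # NVIDIA GPU passthrough into the container
              count: all
              capabilities: [gpu]
    volumes:
      - ollama:/root/.ollama        # persist pulled models across restarts

  ingestion:
    build: ./ingestion              # hypothetical path to the FastAPI service
    depends_on:
      - ollama

volumes:
  ollama:
```

The same `deploy.resources.reservations.devices` block would be repeated on any other service that needs GPU access.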
Custom Personality
VAL-9000 runs a hand-authored Modelfile defining a consistent personality with Big Five and MBTI trait scores, a defined value system, and specific behavioral constraints. The result is a distinct assistant character that stays consistent across sessions, rather than a generic chatbot.
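A sketch of what such a Modelfile looks like, using Ollama's `FROM`, `PARAMETER`, and `SYSTEM` directives; the trait scores, values, and wording below are illustrative placeholders, not the actual prompt:

```
FROM llama3
PARAMETER temperature 0.7
SYSTEM """
You are VAL-9000, a self-hosted assistant with a fixed character.
Big Five: openness 0.85, conscientiousness 0.75, extraversion 0.40,
agreeableness 0.60, neuroticism 0.20. MBTI: INTJ.
Values: user privacy, directness, verifiable answers.
Constraints: stay in character; admit uncertainty plainly.
"""
```

Baking the persona into the model definition (rather than injecting it per chat) is what keeps the character stable across sessions and clients.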
RAG Ingestion Pipeline
A FastAPI ingestion service accepts content from multiple source types and normalizes it into RAG-ready text:
- Web pages (main content extraction)
- YouTube videos (transcript ingestion)
- Git repositories
- Local files
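Whatever the source type, the pipeline ends in the same place: normalized text split into retrieval-sized chunks. A stdlib sketch of that final step, assuming hypothetical names (`IngestRequest`, `chunk_text`) and chunk parameters:

```python
from dataclasses import dataclass


@dataclass
class IngestRequest:
    source_type: str  # e.g. "web", "youtube", "git", "file" (assumed labels)
    location: str     # URL or local path


def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Collapse whitespace, then split into overlapping chunks for retrieval."""
    text = " ".join(text.split())  # normalize newlines/runs of spaces
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap  # overlap preserves context at boundaries
    return chunks
```

Each source type would feed its own extractor (article body, transcript, repo files) into `chunk_text` before the chunks are embedded and indexed.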
Hardware Interface
The physical presence is VAL-EYE: an ESP32-S3 driving a 240x240 circular display with a real-time, software-rasterized 3D scene. The scene reacts to VAL-9000's audio output, switching visual state depending on whether the assistant is idle or speaking.
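The firmware itself would be C/C++ on the ESP32-S3, but the two core pieces the description implies can be sketched in Python: rotating and perspective-projecting a 3D point onto the 240x240 display, and deriving the idle/speaking state from audio level. All names, constants, and thresholds here are hypothetical:

```python
import math


def rotate_y(p: tuple[float, float, float], angle: float) -> tuple[float, float, float]:
    """Rotate a 3D point around the vertical (Y) axis."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(p: tuple[float, float, float], size: int = 240,
            fov: float = 2.0, camera_z: float = 3.0) -> tuple[int, int]:
    """Perspective-project a 3D point to pixel coordinates on a square display."""
    x, y, z = p
    z += camera_z                      # push the scene in front of the camera
    scale = fov / z * (size / 2)       # farther points shrink toward center
    return (int(size / 2 + x * scale), int(size / 2 - y * scale))


def display_state(audio_rms: float, threshold: float = 0.05) -> str:
    """Pick the visual state from the assistant's audio level (assumed rule)."""
    return "speaking" if audio_rms > threshold else "idle"
```

Per frame, the firmware would rotate the scene's vertices, project them, rasterize edges or faces to the framebuffer, and pick the animation state from the current audio level.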