← Portfolio

VAL-9000

6/1/2025

A self-hosted AI assistant built from the ground up — local GPU inference, custom personality, voice synthesis, a multi-source RAG ingestion pipeline, and a physical hardware interface.

Stack

All services run containerized via Docker Compose on local hardware with NVIDIA GPU passthrough:

  • Ollama — GPU-accelerated local LLM inference (Llama 3)
  • Kokoro — local text-to-speech synthesis
  • OpenWebUI — web-based chat interface
  • Ingestion service — custom FastAPI service for feeding RAG content
  • Preprocessor — cleans and prepares ingested content for retrieval
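The stack above might be wired together roughly like this (a minimal Compose sketch; image names, ports, and volume names are illustrative, not the project's actual configuration):

```yaml
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama-models:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            # NVIDIA GPU passthrough for local inference
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  kokoro:
    image: kokoro-tts        # placeholder image name
  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
  ingestion:
    build: ./ingestion       # custom FastAPI service
  preprocessor:
    build: ./preprocessor
volumes:
  ollama-models:
```

Keeping everything on one Compose network lets the services reach each other by name (e.g. `http://ollama:11434`) without exposing every port on the host.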

Custom Personality

VAL-9000 runs a hand-authored Modelfile defining a personality with Big Five and MBTI trait scores, a defined value system, and specific behavioral constraints. The result is a coherent assistant character that stays in persona across sessions, rather than a generic chatbot.
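An Ollama Modelfile of this shape could look like the following (a sketch only; the actual trait scores, values, and system prompt are the project's own and not reproduced here):

```
FROM llama3
PARAMETER temperature 0.7

SYSTEM """
You are VAL-9000, a self-hosted assistant with a fixed persona.
Big Five (illustrative scores): openness 0.8, conscientiousness 0.7,
extraversion 0.4, agreeableness 0.6, neuroticism 0.2.
MBTI (illustrative): INTJ.
Values and behavioral constraints would be spelled out here.
"""
```

Built once with `ollama create val-9000 -f Modelfile`, the persona is then baked into the model name, so every client (OpenWebUI included) gets the same character without repeating the prompt.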

RAG Ingestion Pipeline

A FastAPI ingestion service accepts content from multiple source types and normalizes it into RAG-ready text:

  • Web pages (main content extraction)
  • YouTube videos (transcript ingestion)
  • Git repositories
  • Local files
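The routing step of such a service can be sketched with plain URL inspection (hypothetical helper name and labels; the real service presumably does richer per-source extraction downstream):

```python
from urllib.parse import urlparse

def detect_source(ref: str) -> str:
    """Classify an ingestion reference into one of the supported source types."""
    parsed = urlparse(ref)
    host = parsed.netloc.lower()
    if "youtube.com" in host or "youtu.be" in host:
        return "youtube"   # transcript ingestion
    if ref.endswith(".git") or host in ("github.com", "gitlab.com"):
        return "git"       # clone and walk the repository
    if parsed.scheme in ("http", "https"):
        return "web"       # main-content extraction
    return "file"          # local path

print(detect_source("https://youtu.be/dQw4w9WgXcQ"))  # youtube
```

Each branch would hand off to a dedicated extractor, with the preprocessor normalizing all four outputs into the same RAG-ready text format.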

Hardware Interface

The physical presence is VAL-EYE — an ESP32-S3 with a 240x240 circular display running a real-time software-rasterized 3D scene that reacts to VAL-9000's audio output, changing visual state based on whether the assistant is idle or speaking.
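The idle/speaking switch described above can be sketched as an amplitude gate with hysteresis, so brief pauses between words don't flicker the display back to the idle animation (thresholds and frame cadence are illustrative; the real firmware runs on the ESP32-S3, not in Python):

```python
class SpeechGate:
    """Tracks whether the assistant is speaking, from per-frame audio amplitude.

    Uses two thresholds plus a hang time: a loud frame switches the state
    on immediately, but it only switches off after several quiet frames.
    """
    def __init__(self, on_threshold=0.10, off_threshold=0.05, hang_frames=15):
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.hang_frames = hang_frames
        self.speaking = False
        self._quiet = 0  # consecutive quiet frames while speaking

    def update(self, amplitude: float) -> str:
        if amplitude >= self.on_threshold:
            self.speaking = True
            self._quiet = 0
        elif self.speaking and amplitude < self.off_threshold:
            self._quiet += 1
            if self._quiet >= self.hang_frames:
                self.speaking = False
        return "speaking" if self.speaking else "idle"

gate = SpeechGate(hang_frames=3)
states = [gate.update(a) for a in [0.0, 0.2, 0.01, 0.01, 0.01, 0.0]]
print(states)  # ['idle', 'speaking', 'speaking', 'speaking', 'idle', 'idle']
```

The 3D scene then only has to read one boolean per frame to pick its visual state, keeping the rasterizer loop free of audio logic.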
