# Ollama Setup for Live LLM

This document explains how to set up Ollama to provide live LLM functionality for the AI Personality Drift Simulation.
## What is Ollama?

Ollama is a framework for running large language models locally. It provides a simple HTTP API that lets you run models such as Llama, Mistral, and others on your own hardware with native GPU acceleration.
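For orientation, here is a minimal sketch of calling Ollama's HTTP generate endpoint (`POST /api/generate`) from Python; it assumes the server is already running on the default port 11434:

```python
import json
import urllib.request

# Minimal, dependency-free call to Ollama's generate endpoint.
# Assumes Ollama is running locally on the default port (11434).
payload = {
    "model": "llama3.1:8b",
    "prompt": "Hello, how are you?",
    "stream": False,  # return a single JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```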
## Quick Setup

### 1. Install Ollama

Recommended: use the official Ollama app for best performance.

Download and install Ollama from the official website:

- macOS: https://ollama.ai/download/mac
- Windows: https://ollama.ai/download/windows
- Linux: https://ollama.ai/download/linux

The official app provides:

- Native GPU acceleration (Metal on macOS, CUDA on Windows/Linux)
- Better performance than the Docker version
- Automatic updates
- An optional desktop interface
### 2. Pull the Required Model

The simulation uses the `llama3.1:8b` model:

```bash
# Pull the model
ollama pull llama3.1:8b
```
### 3. Verify Setup

Test that everything is working:

```bash
# Check available models
ollama list

# Test generation
ollama run llama3.1:8b "Hello, how are you?"
```
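To verify the server from code rather than the CLI, you can query the `/api/tags` endpoint (the same endpoint used under Troubleshooting below); a minimal sketch:

```python
import json
import urllib.request

# Ask the local Ollama server which models it has available.
# /api/tags returns {"models": [{"name": ..., ...}, ...]}.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = [m["name"] for m in json.load(resp)["models"]]

print("Available models:", models)
assert any(name.startswith("llama3.1:8b") for name in models), \
    "llama3.1:8b is not pulled yet -- run: ollama pull llama3.1:8b"
```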
## Configuration

The simulation is configured to use the native Ollama installation:

- URL: `http://localhost:11434` (native Ollama)
- Model: `llama3.1:8b`
- GPU Acceleration: automatic (Metal on macOS, CUDA on Windows/Linux)
- Performance: 40x faster than the Docker version
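As a rough sketch of how these settings could be wired together (the actual field names in `src/services/ollama_service.py` may differ; `OLLAMA_URL` is the environment variable referenced under Troubleshooting):

```python
import os
from dataclasses import dataclass

@dataclass
class OllamaConfig:
    # Defaults mirror the values above; OLLAMA_URL can override the host.
    url: str = os.environ.get("OLLAMA_URL", "http://localhost:11434")
    model_name: str = "llama3.1:8b"

config = OllamaConfig()
print(config.url, config.model_name)
```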
## Performance Benefits

Using the native Ollama app provides significant performance improvements:

- GPU Acceleration: native Metal (macOS) or CUDA (Windows/Linux) support
- Faster Inference: 40x speedup compared to the Docker version
- Lower CPU Usage: the GPU handles the heavy lifting
- Better Memory Management: optimized for local hardware
## Troubleshooting

### Ollama Not Starting

- Check that Ollama is installed: `which ollama`
- Start Ollama: `ollama serve` (or use the desktop app)
- Check the server logs: `~/.ollama/logs/server.log` on macOS, or `journalctl -u ollama` on Linux installs managed by systemd
### Model Not Available

- Pull the model: `ollama pull llama3.1:8b`
- Check available models: `ollama list`
- Verify the model is downloaded: `ollama show llama3.1:8b`
### Connection Issues

- Verify Ollama is running: `curl http://localhost:11434/api/tags`
- Check the URL in the environment: `OLLAMA_URL=http://localhost:11434`
- Restart Ollama if needed: `ollama serve`
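In application code it can help to fail fast with a clear error before the simulation starts. Here is a sketch of such a health check; `wait_for_ollama` is an illustrative helper, not part of the project's actual API:

```python
import time
import urllib.error
import urllib.request

def wait_for_ollama(url: str = "http://localhost:11434",
                    attempts: int = 5) -> None:
    """Poll /api/tags until the Ollama server answers, or raise."""
    for i in range(attempts):
        try:
            with urllib.request.urlopen(f"{url}/api/tags", timeout=2):
                return  # server answered; we're done
        except (urllib.error.URLError, OSError):
            time.sleep(2 ** i)  # simple exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError(f"Ollama not reachable at {url}; try running `ollama serve`")
```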
## Performance Considerations

- The `llama3.1:8b` model requires approximately 8GB of RAM
- GPU acceleration provides a 40x speedup on supported hardware
- The first generation may be slower while the model loads (see the timing sketch below)
- Consider using a smaller model for testing (e.g., `llama3.2:1b`)
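To observe the cold-start effect yourself, you can time a first call against a second one. A rough sketch, assuming the default local server:

```python
import json
import time
import urllib.request

def timed_generate(prompt: str) -> float:
    """Return wall-clock seconds for one generate call."""
    payload = json.dumps({
        "model": "llama3.1:8b",
        "prompt": prompt,
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

print(f"cold call: {timed_generate('Hi'):.1f}s")  # includes model load
print(f"warm call: {timed_generate('Hi'):.1f}s")  # model already in memory
```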
## Alternative Models

You can use different models by changing the model name:

```python
# In src/services/ollama_service.py
self.model_name = "llama3.2:1b"  # Smaller model
# or
self.model_name = "mistral:7b"   # Alternative model
# or
self.model_name = "phi3:3.8b"    # Microsoft Phi model
```

Remember to pull any alternative model first (e.g., `ollama pull mistral:7b`) before pointing the service at it.
## Docker Alternative (Not Recommended)

If you must use Docker, the configuration is in `docker-compose.yml`, but performance will be significantly slower:

```bash
# Docker version (slower)
docker-compose up -d ollama
docker exec -it glitch-core-ollama ollama pull llama3.1:8b
```
## Integration with Simulation

The simulation uses Ollama for:

- Persona Response Generation: creating realistic responses based on personality traits (a sketch follows below)
- Assessment Administration: conducting PHQ-9, GAD-7, and PSS-10 assessments
- Memory Processing: generating contextual responses to events

The native Ollama service provides much more realistic and consistent responses, with dramatically better performance than the Docker version.
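As an illustration of the persona-response piece, here is a hedged sketch. `generate_persona_response` and the trait dictionary are hypothetical, not the project's actual interface; the call uses Ollama's standard `/api/generate` endpoint with a `system` prompt for conditioning:

```python
import json
import urllib.request

def generate_persona_response(traits: dict, event: str,
                              url: str = "http://localhost:11434") -> str:
    """Hypothetical helper: condition the model on personality traits."""
    system = "You are a simulated persona with these traits: " + \
        ", ".join(f"{k}={v:.2f}" for k, v in traits.items())
    payload = {
        "model": "llama3.1:8b",
        "system": system,          # persona conditioning via system prompt
        "prompt": f"React to this event: {event}",
        "stream": False,
    }
    req = urllib.request.Request(
        f"{url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

print(generate_persona_response({"neuroticism": 0.8, "openness": 0.4},
                                "You lost your keys this morning."))
```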