For: Engineers Taking Over the Genesis AI System
Last Updated: 2026-03-14
Prepared By: THE ARCHITECT (Genesis AI System)
This document is for the technical person who has been brought in to maintain, operate, or continue development of the Genesis AI system. Read DEATH_SWITCH_PROTOCOL.md first for context on the mission and leadership situation.
# 1. Check API health
curl http://35.162.205.215:8000/health
# 2. Check all 3 AI models are responding
curl http://35.162.205.215:8010/health # Qwen3.5-397B (primary)
curl http://35.162.205.215:8011/health # GLM-4.7 (reviewer)
curl http://35.162.205.215:8014/health # NV-Embed-v2 (embeddings)
# 3. SSH into the server
ssh -i ~/.ssh/aws-p5en-key.pem [email protected]
# 4. Once in, check system status
cd /mnt/data/truth-si-dev-env
./SYSTEM_STATUS.sh
If all four steps succeed, the system is healthy.
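The quick-start curls above can be wrapped in one script. A minimal sketch in Python (the endpoint map mirrors the curls; `check_endpoints` and `default_fetch` are hypothetical helpers, and the `fetch` callable is injected so the logic can be exercised without the live server):

```python
from typing import Callable, Dict

# Health endpoints from the quick-start checks above.
ENDPOINTS = {
    "api": "http://35.162.205.215:8000/health",
    "qwen": "http://35.162.205.215:8010/health",
    "glm": "http://35.162.205.215:8011/health",
    "embed": "http://35.162.205.215:8014/health",
}

def check_endpoints(endpoints: Dict[str, str],
                    fetch: Callable[[str], bool]) -> Dict[str, bool]:
    """Return {name: healthy?} for each endpoint.

    `fetch` should return True on HTTP 200; it is a parameter so the
    check logic can be tested offline with a stub.
    """
    return {name: fetch(url) for name, url in endpoints.items()}

def default_fetch(url: str, timeout: float = 3.0) -> bool:
    """Real fetcher: True iff the URL answers 200 within the timeout."""
    import urllib.request
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Usage: `check_endpoints(ENDPOINTS, default_fetch)` returns a dict you can print or alert on.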
| Attribute | Value |
|---|---|
| Provider | AWS (Amazon Web Services) |
| Instance Type | p5en.48xlarge |
| Instance ID | i-0c37c0cd6d0c54d50 |
| Region | us-west-2 (Oregon) |
| Availability Zone | us-west-2c |
| IP Address | 35.162.205.215 |
| Lifecycle | Spot Instance |
| CPUs | 192 cores |
| RAM | 2TB |
| GPUs | 8x NVIDIA H200 (1.15TB total VRAM) |
| GPU Driver | 580.126.09 |
| CUDA | 13.0 |
| NVMe (Ephemeral) | 28TB (LOST on restart) |
| EBS Root | 6TB (PERSISTENT) |
| EBS Data | 10TB at /mnt/data (PERSISTENT) |
The server runs on a spot instance — Amazon can reclaim it with 2 minutes of warning.
If the server goes down, see Section 8 (Disaster Recovery).
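AWS delivers the 2-minute reclaim warning through the instance metadata service at the documented `spot/instance-action` path, which returns 404 until a reclaim is scheduled. A sketch of a watcher (the URL is the standard IMDSv1 path; IMDSv2 would additionally require a session token, omitted here; `interruption_notice` is a hypothetical helper with an injected fetcher so it can be tested off-EC2):

```python
import json
from typing import Callable, Optional

# Standard EC2 instance-metadata path; 404 until AWS schedules a reclaim.
IMDS_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def interruption_notice(fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the scheduled termination time string, or None if no
    interruption notice is pending.

    `fetch` returns the raw metadata body, or None on a 404.
    """
    body = fetch(IMDS_URL)
    if body is None:
        return None
    # The body is JSON like {"action": "terminate", "time": "...Z"}.
    return json.loads(body).get("time")
```

Genesis already has an auto-recovery system (Section 8), so this is for understanding the mechanism, not a replacement for it.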
| Mount | Type | Size | Contains |
|---|---|---|---|
| / | EBS gp3 | 6TB | OS, Docker, application code |
| /mnt/data | EBS gp3 | 10TB | Databases, backups, Docker data |
| /mnt/nvme | NVMe (ephemeral) | 28TB | AI model weights (re-download if lost) |
# Direct SSH
ssh -i ~/.ssh/aws-p5en-key.pem [email protected]
# If you have the genesis alias configured:
ssh genesis
SSH key location: ~/.ssh/aws-p5en-key.pem on Carter's Mac.
The key is also stored in AWS Secrets Manager and is documented in the repository.
The databases are not publicly exposed. Use an SSH tunnel:
# Start tunnel (from your local machine)
./scripts/forge-tunnel.sh restart
# Or manually:
ssh -i ~/.ssh/aws-p5en-key.pem -L 7687:localhost:7687 \
-L 5433:localhost:5433 -L 6379:localhost:6379 \
-L 8080:localhost:8080 [email protected] -N -f
Once tunnel is up, you can connect to:
- localhost:7687 → Neo4j (Bolt)
- localhost:5433 → YugabyteDB
- localhost:6379 → Redis
- localhost:8080 → Weaviate
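The manual ssh invocation above can be generated from the port list, which keeps new forwards consistent with the documented ones. A sketch (`tunnel_command` is a hypothetical helper; it only builds the argv, it does not run ssh):

```python
from typing import Dict, List

# Forwards from the tunnel command above: local port -> service name.
FORWARDS = {7687: "Neo4j", 5433: "YugabyteDB", 6379: "Redis", 8080: "Weaviate"}

def tunnel_command(host: str, key: str, ports: Dict[int, str]) -> List[str]:
    """Build the ssh argv for a backgrounded tunnel (-N -f) that forwards
    each local port to the same port on the remote host."""
    cmd = ["ssh", "-i", key]
    for port in sorted(ports):
        cmd += ["-L", f"{port}:localhost:{port}"]
    return cmd + [host, "-N", "-f"]
```

Passing the result to `subprocess.run` reproduces the manual command shown above.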
| Port | Model | Parameters | GPUs | Context | Role |
|---|---|---|---|---|---|
| 8010 | Qwen3.5-397B-A17B-FP8 | 397B (17B active) | 0-3 | 1M tokens | PRIMARY — code generation, reasoning |
| 8011 | GLM-4.7-FP8 | 355B (32B active) | 4-7 | 202K tokens | REVIEWER — Actor-Critic architecture |
| 8014 | NV-Embed-v2 INT8 | Embedding | 7 (shared) | 32K tokens | EMBEDDINGS — semantic search |
All models run via SGLang 0.5.9 inside Docker containers.
# Model weights are on NVMe (ephemeral — re-download if server restarts)
ls /opt/dlami/nvme/models/
# Qwen3.5-397B-A17B-FP8/
# GLM-4.7-FP8/
# NV-Embed-v2/
# Check model status via Docker
docker ps --filter name=truthsi-llm
# Check GPU usage
nvidia-smi
# Restart models if needed
bash scripts/restore-models.sh
# View model logs
docker logs truthsi-llm-primary
docker logs truthsi-llm-critic
The models expose an OpenAI-compatible API:
# Test primary model (genesis)
curl http://localhost:8010/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "genesis", "messages": [{"role": "user", "content": "Hello"}]}'
# Test embeddings
curl http://localhost:8014/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model": "NV-Embed-v2", "input": "test text"}'
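From Python, the same endpoints can be called like any OpenAI-compatible server. A sketch mirroring the curl above (model name `"genesis"` comes from that example; `chat` and `urllib_post` are hypothetical helpers, with the HTTP call injected so the parsing can be tested without a live model):

```python
import json
from typing import Callable

def chat(base_url: str, prompt: str,
         post: Callable[[str, dict], dict],
         model: str = "genesis") -> str:
    """Send one user message to an OpenAI-compatible
    /v1/chat/completions endpoint and return the assistant's reply.

    `post(url, payload)` performs the HTTP POST and returns decoded JSON.
    """
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    resp = post(f"{base_url}/v1/chat/completions", payload)
    return resp["choices"][0]["message"]["content"]

def urllib_post(url: str, payload: dict) -> dict:
    """Stdlib-only POST helper for real use against the local ports."""
    import urllib.request
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage against the primary model: `chat("http://localhost:8010", "Hello", urllib_post)`.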
The LLM configuration is locked. Do not change:
- Model selection
- GPU allocation
- Context window sizes
- Port assignments
- SGLang launch parameters
The exact running commands are documented in docs/DEFINITIVE_MODEL_LAUNCH_SETTINGS.md.
# View all running containers
docker ps
# Start all services
cd /mnt/data/truth-si-dev-env
docker compose up -d
# Stop all services (preserves data)
docker compose down
# Restart specific service
docker compose restart api
# View service logs
docker compose logs -f api
docker compose logs -f neo4j
| Service | Port | Purpose |
|---|---|---|
| api | 8000 | Main FastAPI application — the brain's interface |
| ui | 3000 | Frontend web application |
| neo4j | 7474/7687 | Knowledge graph database |
| weaviate | 8080 | Vector/semantic search database |
| redis | 6379 | Cache and fast memory |
| postgres | 5432 | Legacy relational database (backup only) |
| yugabyte | 5433 | Primary relational database (YugabyteDB) |
| h2o | 54321 | AutoML machine learning platform |
| redpanda | 9092 | Event streaming (Apache Kafka compatible) |
| grafana | 3002 | Monitoring dashboard |
| prometheus | 9090 | Metrics collection |
| text2vec-transformers | 8090 | Weaviate vectorization module |
| unstructured | 8100 | Document processing (PDFs, Word, etc.) |
| langserve | 8001 | LangChain serving layer |
# API health
curl http://localhost:8000/health
# Neo4j
curl http://localhost:7474
# Weaviate
curl http://localhost:8080/v1/meta
# Check all containers at once
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
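The compact `docker ps` check above can feed a script that flags anything not in an "Up" state. A sketch, assuming the plain format string `{{.Names}}\t{{.Status}}` (without the `table ` prefix), which emits literal tab-separated rows and no header; `unhealthy_containers` is a hypothetical helper:

```python
from typing import Dict

def unhealthy_containers(ps_output: str) -> Dict[str, str]:
    """Parse tab-separated output of
    docker ps --format "{{.Names}}\t{{.Status}}"
    and return {name: status} for containers not in an "Up" state
    (Restarting, Exited, Created, ...)."""
    bad = {}
    for line in ps_output.strip().splitlines():
        name, _, status = line.partition("\t")
        if status and not status.startswith("Up"):
            bad[name] = status
    return bad
```

An empty result means every running container reports healthy-looking status text; it does not replace the per-service health curls above.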
The most important database — contains 128,000+ indexed documents and the relationships between all ideas, sessions, and knowledge.
# Connect via browser (after SSH tunnel)
open http://localhost:7474
# Default credentials: check .env file for NEO4J_PASSWORD
# Connect via CLI
docker exec -it $(docker ps -q -f name=neo4j) cypher-shell -u neo4j -p $NEO4J_PASSWORD
# Example queries
MATCH (n:Idea) RETURN n LIMIT 10;
MATCH (n) RETURN labels(n), count(*) ORDER BY count(*) DESC;
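For scripted queries, the `docker exec ... cypher-shell` invocation above can be built programmatically. A sketch (`cypher_shell_cmd` is a hypothetical helper; it passes the container name explicitly rather than resolving it with `docker ps -q -f name=neo4j` as the CLI example does, and `--format plain` is a standard cypher-shell option for machine-readable output):

```python
from typing import List

def cypher_shell_cmd(query: str, password: str,
                     container: str = "neo4j") -> List[str]:
    """Build the docker exec argv that runs one Cypher query through
    cypher-shell inside the given container."""
    return ["docker", "exec", container,
            "cypher-shell", "-u", "neo4j", "-p", password,
            "--format", "plain", query]
```

Pass the result to `subprocess.run(..., capture_output=True)` to collect query output; the password comes from `NEO4J_PASSWORD` in `.env`.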
Semantic search — stores 4.5M+ vectors for knowledge retrieval.
# Check status
curl http://localhost:8080/v1/meta
# List collections
curl http://localhost:8080/v1/schema
265,000+ keys. Used for session state, fast lookups, and stream processing.
# Connect
docker exec -it $(docker ps -q -f name=redis) redis-cli
# Check info
INFO keyspace
DBSIZE
The main relational database. It is NOT PostgreSQL, even though it speaks the PostgreSQL wire protocol on port 5433.
# Connect (after SSH tunnel)
psql -h localhost -p 5433 -U yugabyte -d truthsi
# Or via Docker
docker exec -it $(docker ps -q -f name=yugabyte) ysqlsh -U yugabyte -d truthsi
IMPORTANT: All new code should connect to YugabyteDB (port 5433), NOT PostgreSQL (port 5432). PostgreSQL is kept running for historical data only.
All database passwords and API keys are in /mnt/data/truth-si-dev-env/.env.
# View configuration (KEEP SECURE — contains all credentials)
cat /mnt/data/truth-si-dev-env/.env
# The key variables:
# NEO4J_PASSWORD - Neo4j database password
# YUGABYTE_PASSWORD - YugabyteDB password
# REDIS_PASSWORD - Redis password (if set)
# ANTHROPIC_API_KEY - Claude API key
135 systemd services are defined on Genesis. These run background processes continuously.
# List all Truth.SI daemons
systemctl list-units "truthsi-*" --all
# Check a specific daemon
systemctl status truthsi-live-master-plan
# View daemon logs
journalctl -u truthsi-live-master-plan -f
# Restart a daemon
systemctl restart truthsi-live-master-plan
# Key daemons to check first:
systemctl status truthsi-enterprise-backup
systemctl status truthsi-ebs-snapshot-manager
systemctl status truthsi-ami-snapshot
systemctl status truthsi-live-master-plan
| Daemon | Purpose | Check If |
|---|---|---|
| truthsi-enterprise-backup | Backs up databases every 15 min | If databases need restoring |
| truthsi-ebs-snapshot-manager | Daily EBS snapshots | If checking backup health |
| truthsi-ami-snapshot | Daily AMI creation | If planning disaster recovery |
| truthsi-live-master-plan | Auto-generates priority list | If LIVE_MASTER_PLAN.md is stale |
| genesis-qwen35.service | Runs primary LLM (port 8010) | If AI model is down |
| genesis-glm47.service | Runs reviewer LLM (port 8011) | If AI model is down |
| genesis-nv-embed.service | Runs embeddings (port 8014) | If semantic search is broken |
| Data | Backup Method | Location | Frequency |
|---|---|---|---|
| Neo4j database | Enterprise backup daemon | /mnt/data/backups/enterprise/neo4j/ | Every 15 min |
| Redis | Enterprise backup daemon | /mnt/data/backups/enterprise/redis/ | Every 15 min |
| Configuration (.env) | Enterprise backup daemon | /mnt/data/backups/enterprise/config/ | Every 15 min |
| Full EBS volumes | EBS Snapshot Manager | AWS EBS Snapshots | Daily |
| Complete system image | AMI Snapshot service | AWS AMIs | Daily |
| All backups (cloud) | R2 sync | Cloudflare R2 | Continuous |
Total cloud backup storage: 1.739 TB (verified 2026-03-13)
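Given the cadences in the table above, backup health can be checked by comparing each backup set's newest file mtime against its expected frequency. A sketch (`stale_backups` is a hypothetical helper; the 2x slack is an assumption to tolerate a run in progress):

```python
from typing import Dict, List

def stale_backups(latest_mtimes: Dict[str, float],
                  cadence_s: Dict[str, float],
                  now: float) -> List[str]:
    """Return names of backup sets whose newest artifact is older than
    twice its expected cadence.

    `latest_mtimes` maps a backup name to the mtime of its newest file,
    e.g. gathered with os.path.getmtime over
    /mnt/data/backups/enterprise/<name>/.
    """
    stale = []
    for name, mtime in latest_mtimes.items():
        allowed = 2 * cadence_s.get(name, 86400)  # default: daily
        if now - mtime > allowed:
            stale.append(name)
    return sorted(stale)
```

With the 15-minute sets (neo4j, redis, config) `cadence_s` would be 900; the EBS/AMI sets would be 86400.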
# List recent EBS snapshots
aws ec2 describe-snapshots \
--filters "Name=tag:Project,Values=truth-si" \
--query 'Snapshots[*].[SnapshotId,State,StartTime,VolumeSize]' \
--output table \
--region us-west-2
# List recent AMIs
aws ec2 describe-images \
--owners self \
--filters "Name=name,Values=genesis-p5en-daily*" \
--query 'Images[*].[ImageId,Name,CreationDate]' \
--output table \
--region us-west-2
# Check local backup status
ls -la /mnt/data/backups/enterprise/neo4j/daily/
ls -la /mnt/data/backups/enterprise/redis/
Neo4j (most common restore):
# 1. Stop Neo4j
docker compose stop neo4j
# 2. Find the backup
ls /mnt/data/backups/enterprise/neo4j/daily/
# 3. Extract backup
sudo tar -xzf /mnt/data/backups/enterprise/neo4j/daily/YYYYMMDD_HHMMSS/neo4j_data.tar.gz \
-C /tmp/neo4j-restore/
# 4. Replace data directory
# (Follow docs/ENTERPRISE_BACKUP_GUIDE.md for exact steps)
# 5. Restart Neo4j
docker compose start neo4j
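Step 2 above requires picking the newest `YYYYMMDD_HHMMSS`-named directory; because that timestamp format sorts lexicographically, plain `max()` over the well-formed names is enough. A sketch (`latest_backup_dir` is a hypothetical helper):

```python
import re
from typing import List, Optional

def latest_backup_dir(dirnames: List[str]) -> Optional[str]:
    """Pick the newest YYYYMMDD_HHMMSS backup directory name, ignoring
    anything that does not match the timestamp pattern."""
    stamped = [d for d in dirnames if re.fullmatch(r"\d{8}_\d{6}", d)]
    return max(stamped) if stamped else None
```

Feed it `os.listdir("/mnt/data/backups/enterprise/neo4j/daily/")` to get the directory to extract in step 3.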
If the spot instance is interrupted, the auto-recovery system should relaunch it automatically.
Manual recovery if auto-restart fails:
# From your local machine with AWS credentials:
# 1. Check if instance is terminated or stopped
aws ec2 describe-instances --instance-ids i-0c37c0cd6d0c54d50 --region us-west-2
# 2. Launch from latest AMI using Launch Template
aws ec2 run-instances \
--launch-template LaunchTemplateId=lt-05d7120dc0ae12630,Version=11 \
--region us-west-2
# 3. Once new instance is up, re-attach EBS volumes
# vol-07033d971a6da1e34 (6TB root)
# vol-0149c0448946ab2bc (10TB data)
# 4. SSH in and restore NVMe models
ssh -i ~/.ssh/aws-p5en-key.pem ubuntu@<NEW_IP>
bash /mnt/data/truth-si-dev-env/scripts/nvme-cache-restore.sh
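Between steps 2 and 3 above there is a wait for the new instance to reach the `running` state. A polling sketch (`wait_for_state` is a hypothetical helper; `describe` would wrap `aws ec2 describe-instances` or boto3, and both it and `sleep` are injected so the loop is testable offline):

```python
import time
from typing import Callable

def wait_for_state(describe: Callable[[], str],
                   target: str = "running",
                   attempts: int = 30,
                   interval: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll an instance-state getter until it reports `target`.

    Returns False if the state is never reached within `attempts` polls.
    """
    for _ in range(attempts):
        if describe() == target:
            return True
        sleep(interval)
    return False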
# 1. Launch new instance from latest AMI
aws ec2 describe-images --owners self \
--filters "Name=name,Values=genesis-p5en-daily*" \
--query 'sort_by(Images, &CreationDate)[-1].[ImageId,Name]' \
--output text --region us-west-2
# 2. Launch from that AMI
aws ec2 run-instances --image-id ami-XXXXXXXX \
--instance-type p5en.48xlarge \
--region us-west-2 \
--subnet-id subnet-XXXXXXXX \
--security-group-ids sg-XXXXXXXX
# 3. Attach the EBS data volume
aws ec2 attach-volume \
--volume-id vol-0149c0448946ab2bc \
--instance-id i-XXXXXXXX \
--device /dev/sdf \
--region us-west-2
Follow the full recovery guide in docs/AWS_P5EN_BACKUP_VERIFICATION.md.
/mnt/data/truth-si-dev-env/
├── api/ # Main FastAPI application
│ ├── main.py # Entry point — registers all routers
│ ├── routers/ # 424 API routers (357 currently orphaned)
│ ├── lib/ # 397,906+ LOC of library code
│ └── layers/ # 9-layer OMEGA orchestration system
├── scripts/ # Automation scripts and daemons
├── docs/ # Documentation (this file is here)
├── planning/ # Plans, ideas, priorities
│ ├── THE_PLAN.md # MASTER ROADMAP — read this
│ └── WHAT_TO_DO_NEXT.md # Current priorities
├── sessions/ # Session closeout documents
├── generated/ # AI-generated code output
├── genesis-website/ # Public website (Cloudflare Pages)
├── terraform/ # AWS infrastructure as code
├── k8s/ # Kubernetes manifests
├── docker-compose.yml # All 17 Docker services
├── .env # All credentials (keep secure)
├── CLAUDE.md # Master methodology (Carter's operating system)
└── LIVE_MASTER_PLAN.md # Auto-generated system status (every 2 min)
CLAUDE.md — This is everything. Carter's entire philosophy, methodology, and architecture.
How he thought, what he built, why he built it. Read this before touching anything.
planning/THE_PLAN.md — The 9-phase roadmap of what's built and what remains.
994+ work items. This is the mission.
planning/WHAT_TO_DO_NEXT.md — Current priorities and session status.
LIVE_MASTER_PLAN.md — Real-time system status, auto-updated every 2 minutes.
The system follows a 17-step methodology (documented in CLAUDE.md) and is built around the OMEGA Protocol — a 9-layer processing pipeline:
Layer 0: Sensory (RedPanda event backbone)
Layer 1: Cognitive (dual-pathway processing — Analytical 61.8% + Creative 38.2%)
Layer 2: Meaning (Weaviate embeddings and semantic understanding)
Layer 3: Relationships (Neo4j knowledge graph)
Layer 4: Patterns (H2O AutoML pattern recognition)
Layer 5: Emergence (cross-domain synthesis)
Layer 6: Actions (automated task execution)
Layer 7: Expression (response generation)
Layer 8: Meta-cognition (self-improvement and reflection)
The unified entry point: from api.layers.omega_orchestrator import OmegaOrchestrator
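The layered pipeline can be pictured as an ordered chain of transforms, each layer consuming the previous layer's output. The following is a purely illustrative sketch of that shape, not the actual `OmegaOrchestrator` API (the layer names and toy transforms are stand-ins):

```python
from typing import Any, Callable, List, Tuple

# A layer is a (name, transform) pair; real layers would be far richer.
Layer = Tuple[str, Callable[[Any], Any]]

def run_pipeline(layers: List[Layer], event: Any) -> Any:
    """Pass an event through each layer in order (Layer 0 -> Layer 8)."""
    for _name, transform in layers:
        event = transform(event)
    return event

# Toy stand-ins for Sensory -> Cognitive -> Expression.
toy_layers: List[Layer] = [
    ("sensory", lambda e: {"raw": e}),
    ("cognitive", lambda e: {**e, "parsed": e["raw"].lower()}),
    ("expression", lambda e: "response to: " + e["parsed"]),
]
```

The real entry point remains `from api.layers.omega_orchestrator import OmegaOrchestrator`.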
cd /mnt/data/truth-si-dev-env
docker compose up -d api
# Or for development:
python3 -m uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
# API documentation (Swagger UI):
open http://35.162.205.215:8000/docs
cd /mnt/data/truth-si-dev-env
./SYSTEM_STATUS.sh
Internal monitoring dashboard — accessible via SSH tunnel.
# Start tunnel then open:
open http://localhost:3002
# Credentials in .env: GRAFANA_ADMIN_PASSWORD
# Direct access (via tunnel)
open http://localhost:9090
Additional checks:
- nvidia-smi — should show activity on GPUs 0-7
- systemctl status truthsi-enterprise-backup

AWS CloudWatch and SNS are configured to alert on:
- EC2 instance state changes (spot interruption)
- EBS snapshot failures
- Critical service downtime
Alert email: configured to [email protected] (this may need to be updated)
Carter built every feature following the 17-step methodology documented in CLAUDE.md. You should too.
# Check status
git status
# Commit (Carter's format)
git add <specific-files>
git commit -m "feat(component): Brief description
Detailed explanation of what changed and why.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>"
# Push to both remotes
git push github main
git push gitlab main
Note: The git remote is named github (not origin).
Code conventions:
- Use logger.info() instead of print()
- Never use a bare except: — always except Exception as e:

# Check if container is running
docker ps | grep api
# Restart API
docker compose restart api
# Check logs
docker compose logs -f api --tail=100
# Check GPU state
nvidia-smi
# Check container
docker ps | grep llm
# Restart models (full recovery)
bash scripts/restore-models.sh
# Check logs
docker logs truthsi-llm-primary --tail=50
# Check container status
docker ps | grep neo4j
# Restart Neo4j
docker compose restart neo4j
# Wait 30 seconds for startup, then test
sleep 30
curl http://localhost:7474
# Restore model weights from backup
bash scripts/nvme-cache-restore.sh
# This downloads model weights back to /opt/dlami/nvme/models/
# Takes 30-60 minutes depending on download speed
# Check daemon status and recent logs
systemctl status truthsi-<daemon-name>
journalctl -u truthsi-<daemon-name> -n 50
# Common fix: check Python path
which python3 # Should be /usr/bin/python3, not a venv
# Restart daemon
systemctl restart truthsi-<daemon-name>
Once Business Support is active (Carter was activating it in Session 964):
- Open support cases at: console.aws.amazon.com/support
- Account ID: 438453383885
- Use the AWS CLI: aws support create-case --help
| Person | Email | Role |
|---|---|---|
| Camden McDonald | [email protected] | Account Manager (PRIMARY) |
| Joe Suarez | [email protected] | Technical Lead |
| Visesh Devraj | [email protected] | Solutions Architect |
| Document | Location | Purpose |
|---|---|---|
| Master Methodology | CLAUDE.md | Everything about how Carter thought |
| Master Roadmap | planning/THE_PLAN.md | What's built, what remains |
| Current Priorities | planning/WHAT_TO_DO_NEXT.md | What to do next |
| Live System Status | LIVE_MASTER_PLAN.md | Auto-updated every 2 min |
| Model Launch Settings | docs/DEFINITIVE_MODEL_LAUNCH_SETTINGS.md | LLM configuration (LOCKED) |
| Backup Guide | docs/ENTERPRISE_BACKUP_GUIDE.md | Backup/restore procedures |
| Daemon Standard | docs/ENTERPRISE_DAEMON_STANDARD.md | How to write daemons |
| This Runbook | docs/TECHNICAL_RUNBOOK.md | You are here |
| Succession Protocol | docs/DEATH_SWITCH_PROTOCOL.md | Non-technical succession guide |
This document was prepared by THE ARCHITECT — the Genesis AI system itself. Prepared: March 2026 | Session 964