For: Engineers Taking Over the Genesis AI System
Last Updated: 2026-03-14
Prepared By: THE ARCHITECT (Genesis AI System)
This document is for the technical person who has been brought in to maintain, operate, or continue development of the Genesis AI system. Read DEATH_SWITCH_PROTOCOL.md first for context on the mission and leadership situation.
# 1. Check API health
curl http://35.162.205.215:8000/health
# 2. Check all 3 AI models are responding
curl http://35.162.205.215:8010/health # Qwen3.5-397B (primary)
curl http://35.162.205.215:8011/health # GLM-4.7 (reviewer)
curl http://35.162.205.215:8014/health # NV-Embed-v2 (embeddings)
# 3. SSH into the server
ssh -i ~/.ssh/aws-p5en-key.pem [email protected]
# 4. Once in, check system status
cd /mnt/data/truth-si-dev-env
./SYSTEM_STATUS.sh
If all four steps succeed, the system is healthy.
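The quick-start curls above can be wrapped in one script. A minimal sketch in Python (the endpoint map mirrors the curls; `check_endpoints` and `default_fetch` are hypothetical helpers, and the `fetch` callable is injected so the logic can be exercised without the live server):

```python
from typing import Callable, Dict

# Health endpoints from the quick-start checks above.
ENDPOINTS = {
    "api": "http://35.162.205.215:8000/health",
    "qwen": "http://35.162.205.215:8010/health",
    "glm": "http://35.162.205.215:8011/health",
    "embed": "http://35.162.205.215:8014/health",
}

def check_endpoints(endpoints: Dict[str, str],
                    fetch: Callable[[str], bool]) -> Dict[str, bool]:
    """Return {name: healthy?} for each endpoint.

    `fetch` should return True on HTTP 200; it is a parameter so the
    check logic can be tested offline with a stub.
    """
    return {name: fetch(url) for name, url in endpoints.items()}

def default_fetch(url: str, timeout: float = 3.0) -> bool:
    """Real fetcher: True iff the URL answers 200 within the timeout."""
    import urllib.request
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Usage: `check_endpoints(ENDPOINTS, default_fetch)` returns a dict you can print or alert on.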
| Attribute | Value |
|---|---|
| Provider | AWS (Amazon Web Services) |
| Instance Type | p5en.48xlarge |
| Instance ID | i-0c37c0cd6d0c54d50 |
| Region | us-west-2 (Oregon) |
| Availability Zone | us-west-2c |
| IP Address | 35.162.205.215 |
| Lifecycle | Spot Instance |
| CPUs | 192 cores |
| RAM | 2TB |
| GPUs | 8x NVIDIA H200 (1.15TB total VRAM) |
| GPU Driver | 580.126.09 |
| CUDA | 13.0 |
| NVMe (Ephemeral) | 28TB (LOST on restart) |
| EBS Root | 6TB (PERSISTENT) |
| EBS Data | 10TB at /mnt/data (PERSISTENT) |
The server runs on a spot instance — Amazon can reclaim it with 2 minutes of warning.
If the server goes down, see Section 8 (Disaster Recovery).
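AWS delivers the 2-minute reclaim warning through the instance metadata service at the documented `spot/instance-action` path, which returns 404 until a reclaim is scheduled. A sketch of a watcher (the URL is the standard IMDSv1 path; IMDSv2 would additionally require a session token, omitted here; `interruption_notice` is a hypothetical helper with an injected fetcher so it can be tested off-EC2):

```python
import json
from typing import Callable, Optional

# Standard EC2 instance-metadata path; 404 until AWS schedules a reclaim.
IMDS_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def interruption_notice(fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the scheduled termination time string, or None if no
    interruption notice is pending.

    `fetch` returns the raw metadata body, or None on a 404.
    """
    body = fetch(IMDS_URL)
    if body is None:
        return None
    # The body is JSON like {"action": "terminate", "time": "...Z"}.
    return json.loads(body).get("time")
```

Genesis already has an auto-recovery system (Section 8), so this is for understanding the mechanism, not a replacement for it.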
| Mount | Type | Size | Contains |
|---|---|---|---|
| / | EBS gp3 | 6TB | OS, Docker, application code |
| /mnt/data | EBS gp3 | 10TB | Databases, backups, Docker data |
| /mnt/nvme | NVMe (ephemeral) | 28TB | AI model weights (re-download if lost) |
# Direct SSH
ssh -i ~/.ssh/aws-p5en-key.pem [email protected]
# If you have the genesis alias configured:
ssh genesis
SSH key location: ~/.ssh/aws-p5en-key.pem on Carter's Mac.
The key is also stored in AWS Secrets Manager and is documented in the repository.
The databases are not publicly exposed. Use an SSH tunnel:
# Start tunnel (from your local machine)
./scripts/forge-tunnel.sh restart
# Or manually:
ssh -i ~/.ssh/aws-p5en-key.pem -L 7687:localhost:7687 \
-L 5433:localhost:5433 -L 6379:localhost:6379 \
-L 8080:localhost:8080 [email protected] -N -f
Once tunnel is up, you can connect to:
- localhost:7687 → Neo4j (Bolt)
- localhost:5433 → YugabyteDB
- localhost:6379 → Redis
- localhost:8080 → Weaviate
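The manual ssh invocation above can be generated from the port list, which keeps new forwards consistent with the documented ones. A sketch (`tunnel_command` is a hypothetical helper; it only builds the argv, it does not run ssh):

```python
from typing import Dict, List

# Forwards from the tunnel command above: local port -> service name.
FORWARDS = {7687: "Neo4j", 5433: "YugabyteDB", 6379: "Redis", 8080: "Weaviate"}

def tunnel_command(host: str, key: str, ports: Dict[int, str]) -> List[str]:
    """Build the ssh argv for a backgrounded tunnel (-N -f) that forwards
    each local port to the same port on the remote host."""
    cmd = ["ssh", "-i", key]
    for port in sorted(ports):
        cmd += ["-L", f"{port}:localhost:{port}"]
    return cmd + [host, "-N", "-f"]
```

Passing the result to `subprocess.run` reproduces the manual command shown above.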
| Port | Model | Parameters | GPUs | Context | Role |
|---|---|---|---|---|---|
| 8010 | Qwen3.5-397B-A17B-FP8 | 397B (17B active) | 0-3 | 1M tokens | PRIMARY — code generation, reasoning |
| 8011 | GLM-4.7-FP8 | 355B (32B active) | 4-7 | 202K tokens | REVIEWER — Actor-Critic architecture |
| 8014 | NV-Embed-v2 INT8 | Embedding | 7 (shared) | 32K tokens | EMBEDDINGS — semantic search |
All models run via SGLang 0.5.9 inside Docker containers.
# Model weights are on NVMe (ephemeral — re-download if server restarts)
ls /opt/dlami/nvme/models/
# Qwen3.5-397B-A17B-FP8/
# GLM-4.7-FP8/
# NV-Embed-v2/
# Check model status via Docker
docker ps --filter name=truthsi-llm
# Check GPU usage
nvidia-smi
# Restart models if needed
bash scripts/restore-models.sh
# View model logs
docker logs truthsi-llm-primary
docker logs truthsi-llm-critic
The models expose an OpenAI-compatible API:
# Test primary model (genesis)
curl http://localhost:8010/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "genesis", "messages": [{"role": "user", "content": "Hello"}]}'
# Test embeddings
curl http://localhost:8014/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model": "NV-Embed-v2", "input": "test text"}'
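From Python, the same endpoints can be called like any OpenAI-compatible server. A sketch mirroring the curl above (model name `"genesis"` comes from that example; `chat` and `urllib_post` are hypothetical helpers, with the HTTP call injected so the parsing can be tested without a live model):

```python
import json
from typing import Callable

def chat(base_url: str, prompt: str,
         post: Callable[[str, dict], dict],
         model: str = "genesis") -> str:
    """Send one user message to an OpenAI-compatible
    /v1/chat/completions endpoint and return the assistant's reply.

    `post(url, payload)` performs the HTTP POST and returns decoded JSON.
    """
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    resp = post(f"{base_url}/v1/chat/completions", payload)
    return resp["choices"][0]["message"]["content"]

def urllib_post(url: str, payload: dict) -> dict:
    """Stdlib-only POST helper for real use against the local ports."""
    import urllib.request
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage against the primary model: `chat("http://localhost:8010", "Hello", urllib_post)`.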
The LLM configuration is locked. Do not change:
- Model selection
- GPU allocation
- Context window sizes
- Port assignments
- SGLang launch parameters
The exact running commands are documented in docs/DEFINITIVE_MODEL_LAUNCH_SETTINGS.md.
# View all running containers
docker ps
# Start all services
cd /mnt/data/truth-si-dev-env
docker compose up -d
# Stop all services (preserves data)
docker compose down
# Restart specific service
docker compose restart api
# View service logs
docker compose logs -f api
docker compose logs -f neo4j
| Service | Port | Purpose |
|---|---|---|
| api | 8000 | Main FastAPI application — the brain's interface |
| ui | 3000 | Frontend web application |
| neo4j | 7474/7687 | Knowledge graph database |
| weaviate | 8080 | Vector/semantic search database |
| redis | 6379 | Cache and fast memory |
| postgres | 5432 | Legacy relational database (backup only) |
| yugabyte | 5433 | Primary relational database (YugabyteDB) |
| h2o | 54321 | AutoML machine learning platform |
| redpanda | 9092 | Event streaming (Apache Kafka compatible) |
| grafana | 3002 | Monitoring dashboard |
| prometheus | 9090 | Metrics collection |
| text2vec-transformers | 8090 | Weaviate vectorization module |
| unstructured | 8100 | Document processing (PDFs, Word, etc.) |
| langserve | 8001 | LangChain serving layer |
# API health
curl http://localhost:8000/health
# Neo4j
curl http://localhost:7474
# Weaviate
curl http://localhost:8080/v1/meta
# Check all containers at once
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
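The compact `docker ps` check above can feed a script that flags anything not in an "Up" state. A sketch, assuming the plain format string `{{.Names}}\t{{.Status}}` (without the `table ` prefix), which emits literal tab-separated rows and no header; `unhealthy_containers` is a hypothetical helper:

```python
from typing import Dict

def unhealthy_containers(ps_output: str) -> Dict[str, str]:
    """Parse tab-separated output of
    docker ps --format "{{.Names}}\t{{.Status}}"
    and return {name: status} for containers not in an "Up" state
    (Restarting, Exited, Created, ...)."""
    bad = {}
    for line in ps_output.strip().splitlines():
        name, _, status = line.partition("\t")
        if status and not status.startswith("Up"):
            bad[name] = status
    return bad
```

An empty result means every running container reports healthy-looking status text; it does not replace the per-service health curls above.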
The most important database — contains 128,000+ indexed documents and the relationships between all ideas, sessions, and knowledge.
# Connect via browser (after SSH tunnel)
open http://localhost:7474
# Default credentials: check .env file for NEO4J_PASSWORD
# Connect via CLI
docker exec -it $(docker ps -q -f name=neo4j) cypher-shell -u neo4j -p $NEO4J_PASSWORD
# Example queries
MATCH (n:Idea) RETURN n LIMIT 10;
MATCH (n) RETURN labels(n), count(*) ORDER BY count(*) DESC;
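For scripted queries, the `docker exec ... cypher-shell` invocation above can be built programmatically. A sketch (`cypher_shell_cmd` is a hypothetical helper; it passes the container name explicitly rather than resolving it with `docker ps -q -f name=neo4j` as the CLI example does, and `--format plain` is a standard cypher-shell option for machine-readable output):

```python
from typing import List

def cypher_shell_cmd(query: str, password: str,
                     container: str = "neo4j") -> List[str]:
    """Build the docker exec argv that runs one Cypher query through
    cypher-shell inside the given container."""
    return ["docker", "exec", container,
            "cypher-shell", "-u", "neo4j", "-p", password,
            "--format", "plain", query]
```

Pass the result to `subprocess.run(..., capture_output=True)` to collect query output; the password comes from `NEO4J_PASSWORD` in `.env`.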
Semantic search — stores 4.5M+ vectors for knowledge retrieval.
# Check status
curl http://localhost:8080/v1/meta
# List collections
curl http://localhost:8080/v1/schema
265,000+ keys. Used for session state, fast lookups, and stream processing.
# Connect
docker exec -it $(docker ps -q -f name=redis) redis-cli
# Check info
INFO keyspace
DBSIZE
The main relational database. It is NOT PostgreSQL, even though it speaks the PostgreSQL wire protocol on port 5433.
# Connect (after SSH tunnel)
psql -h localhost -p 5433 -U yugabyte -d truthsi
# Or via Docker
docker exec -it $(docker ps -q -f name=yugabyte) ysqlsh -U yugabyte -d truthsi
IMPORTANT: All new code should connect to YugabyteDB (port 5433), NOT PostgreSQL (port 5432). PostgreSQL is kept running for historical data only.
All database passwords and API keys are in /mnt/data/truth-si-dev-env/.env.
# View configuration (KEEP SECURE — contains all credentials)
cat /mnt/data/truth-si-dev-env/.env
# The key variables:
# NEO4J_PASSWORD - Neo4j database password
# YUGABYTE_PASSWORD - YugabyteDB password
# REDIS_PASSWORD - Redis password (if set)
# ANTHROPIC_API_KEY - Claude API key
135 systemd services are defined on Genesis. These run background processes continuously.
# List all Truth.SI daemons
systemctl list-units "truthsi-*" --all
# Check a specific daemon
systemctl status truthsi-live-master-plan
# View daemon logs
journalctl -u truthsi-live-master-plan -f
# Restart a daemon
systemctl restart truthsi-live-master-plan
# Key daemons to check first:
systemctl status truthsi-enterprise-backup
systemctl status truthsi-ebs-snapshot-manager
systemctl status truthsi-ami-snapshot
systemctl status truthsi-live-master-plan
| Daemon | Purpose | Check If |
|---|---|---|
| truthsi-enterprise-backup | Backs up databases every 15 min | If databases need restoring |
| truthsi-ebs-snapshot-manager | Daily EBS snapshots | If checking backup health |
| truthsi-ami-snapshot | Daily AMI creation | If planning disaster recovery |
| truthsi-live-master-plan | Auto-generates priority list | If LIVE_MASTER_PLAN.md is stale |
| genesis-qwen35.service | Runs primary LLM (port 8010) | If AI model is down |
| genesis-glm47.service | Runs reviewer LLM (port 8011) | If AI model is down |
| genesis-nv-embed.service | Runs embeddings (port 8014) | If semantic search is broken |
| Data | Backup Method | Location | Frequency |
|---|---|---|---|
| Neo4j database | Enterprise backup daemon | /mnt/data/backups/enterprise/neo4j/ | Every 15 min |
| Redis | Enterprise backup daemon | /mnt/data/backups/enterprise/redis/ | Every 15 min |
| Configuration (.env) | Enterprise backup daemon | /mnt/data/backups/enterprise/config/ | Every 15 min |
| Full EBS volumes | EBS Snapshot Manager | AWS EBS Snapshots | Daily |
| Complete system image | AMI Snapshot service | AWS AMIs | Daily |
| All backups (cloud) | R2 sync | Cloudflare R2 | Continuous |
Total cloud backup storage: 1.739 TB (verified 2026-03-13)
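Given the cadences in the table above, backup health can be checked by comparing each backup set's newest file mtime against its expected frequency. A sketch (`stale_backups` is a hypothetical helper; the 2x slack is an assumption to tolerate a run in progress):

```python
from typing import Dict, List

def stale_backups(latest_mtimes: Dict[str, float],
                  cadence_s: Dict[str, float],
                  now: float) -> List[str]:
    """Return names of backup sets whose newest artifact is older than
    twice its expected cadence.

    `latest_mtimes` maps a backup name to the mtime of its newest file,
    e.g. gathered with os.path.getmtime over
    /mnt/data/backups/enterprise/<name>/.
    """
    stale = []
    for name, mtime in latest_mtimes.items():
        allowed = 2 * cadence_s.get(name, 86400)  # default: daily
        if now - mtime > allowed:
            stale.append(name)
    return sorted(stale)
```

With the 15-minute sets (neo4j, redis, config) `cadence_s` would be 900; the EBS/AMI sets would be 86400.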
# List recent EBS snapshots
aws ec2 describe-snapshots \
--filters "Name=tag:Project,Values=truth-si" \
--query 'Snapshots[*].[SnapshotId,State,StartTime,VolumeSize]' \
--output table \
--region us-west-2
# List recent AMIs
aws ec2 describe-images \
--owners self \
--filters "Name=name,Values=genesis-p5en-daily*" \
--query 'Images[*].[ImageId,Name,CreationDate]' \
--output table \
--region us-west-2
# Check local backup status
ls -la /mnt/data/backups/enterprise/neo4j/daily/
ls -la /mnt/data/backups/enterprise/redis/
Neo4j (most common restore):
# 1. Stop Neo4j
docker compose stop neo4j
# 2. Find the backup
ls /mnt/data/backups/enterprise/neo4j/daily/
# 3. Extract backup
sudo tar -xzf /mnt/data/backups/enterprise/neo4j/daily/YYYYMMDD_HHMMSS/neo4j_data.tar.gz \
-C /tmp/neo4j-restore/
# 4. Replace data directory
# (Follow docs/ENTERPRISE_BACKUP_GUIDE.md for exact steps)
# 5. Restart Neo4j
docker compose start neo4j
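Step 2 above requires picking the newest `YYYYMMDD_HHMMSS`-named directory; because that timestamp format sorts lexicographically, plain `max()` over the well-formed names is enough. A sketch (`latest_backup_dir` is a hypothetical helper):

```python
import re
from typing import List, Optional

def latest_backup_dir(dirnames: List[str]) -> Optional[str]:
    """Pick the newest YYYYMMDD_HHMMSS backup directory name, ignoring
    anything that does not match the timestamp pattern."""
    stamped = [d for d in dirnames if re.fullmatch(r"\d{8}_\d{6}", d)]
    return max(stamped) if stamped else None
```

Feed it `os.listdir("/mnt/data/backups/enterprise/neo4j/daily/")` to get the directory to extract in step 3.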
If the spot instance is interrupted, the auto-recovery system should relaunch it automatically.
Manual recovery if auto-restart fails:
# From your local machine with AWS credentials:
# 1. Check if instance is terminated or stopped
aws ec2 describe-instances --instance-ids i-0c37c0cd6d0c54d50 --region us-west-2
# 2. Launch from latest AMI using Launch Template
aws ec2 run-instances \
--launch-template LaunchTemplateId=lt-05d7120dc0ae12630,Version=11 \
--region us-west-2
# 3. Once new instance is up, re-attach EBS volumes
# vol-07033d971a6da1e34 (6TB root)
# vol-0149c0448946ab2bc (10TB data)
# 4. SSH in and restore NVMe models
ssh -i ~/.ssh/aws-p5en-key.pem ubuntu@<NEW_IP>
bash /mnt/data/truth-si-dev-env/scripts/nvme-cache-restore.sh
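Between steps 2 and 3 above there is a wait for the new instance to reach the `running` state. A polling sketch (`wait_for_state` is a hypothetical helper; `describe` would wrap `aws ec2 describe-instances` or boto3, and both it and `sleep` are injected so the loop is testable offline):

```python
import time
from typing import Callable

def wait_for_state(describe: Callable[[], str],
                   target: str = "running",
                   attempts: int = 30,
                   interval: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll an instance-state getter until it reports `target`.

    Returns False if the state is never reached within `attempts` polls.
    """
    for _ in range(attempts):
        if describe() == target:
            return True
        sleep(interval)
    return False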
# 1. Launch new instance from latest AMI
aws ec2 describe-images --owners self \
--filters "Name=name,Values=genesis-p5en-daily*" \
--query 'sort_by(Images, &CreationDate)[-1].[ImageId,Name]' \
--output text --region us-west-2
# 2. Launch from that AMI
aws ec2 run-instances --image-id ami-XXXXXXXX \
--instance-type p5en.48xlarge \
--region us-west-2 \
--subnet-id subnet-XXXXXXXX \
--security-group-ids sg-XXXXXXXX
# 3. Attach the EBS data volume
aws ec2 attach-volume \
--volume-id vol-0149c0448946ab2bc \
--instance-id i-XXXXXXXX \
--device /dev/sdf \
--region us-west-2
Follow the full recovery guide in docs/AWS_P5EN_BACKUP_VERIFICATION.md.
/mnt/data/truth-si-dev-env/
├── api/ # Main FastAPI application
│ ├── main.py # Entry point — registers all routers
│ ├── routers/ # 424 API routers (357 currently orphaned)
│ ├── lib/ # 397,906+ LOC of library code
│ └── layers/ # 9-layer OMEGA orchestration system
├── scripts/ # Automation scripts and daemons
├── docs/ # Documentation (this file is here)
├── planning/ # Plans, ideas, priorities
│ ├── THE_PLAN.md # MASTER ROADMAP — read this
│ └── WHAT_TO_DO_NEXT.md # Current priorities
├── sessions/ # Session closeout documents
├── generated/ # AI-generated code output
├── genesis-website/ # Public website (Cloudflare Pages)
├── terraform/ # AWS infrastructure as code
├── k8s/ # Kubernetes manifests
├── docker-compose.yml # All 17 Docker services
├── .env # All credentials (keep secure)
├── CLAUDE.md # Master methodology (Carter's operating system)
└── LIVE_MASTER_PLAN.md # Auto-generated system status (every 2 min)
CLAUDE.md — This is everything. Carter's entire philosophy, methodology, and architecture.
How he thought, what he built, why he built it. Read this before touching anything.
planning/THE_PLAN.md — The 9-phase roadmap of what's built and what remains.
994+ work items. This is the mission.
planning/WHAT_TO_DO_NEXT.md — Current priorities and session status.
LIVE_MASTER_PLAN.md — Real-time system status, auto-updated every 2 minutes.
The system follows a 17-step methodology (documented in CLAUDE.md) and is built around the OMEGA Protocol — a 9-layer processing pipeline:
Layer 0: Sensory (RedPanda event backbone)
Layer 1: Cognitive (dual-pathway processing — Analytical 61.8% + Creative 38.2%)
Layer 2: Meaning (Weaviate embeddings and semantic understanding)
Layer 3: Relationships (Neo4j knowledge graph)
Layer 4: Patterns (H2O AutoML pattern recognition)
Layer 5: Emergence (cross-domain synthesis)
Layer 6: Actions (automated task execution)
Layer 7: Expression (response generation)
Layer 8: Meta-cognition (self-improvement and reflection)
The unified entry point: from api.layers.omega_orchestrator import OmegaOrchestrator
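The layered pipeline can be pictured as an ordered chain of transforms, each layer consuming the previous layer's output. The following is a purely illustrative sketch of that shape, not the actual `OmegaOrchestrator` API (the layer names and toy transforms are stand-ins):

```python
from typing import Any, Callable, List, Tuple

# A layer is a (name, transform) pair; real layers would be far richer.
Layer = Tuple[str, Callable[[Any], Any]]

def run_pipeline(layers: List[Layer], event: Any) -> Any:
    """Pass an event through each layer in order (Layer 0 -> Layer 8)."""
    for _name, transform in layers:
        event = transform(event)
    return event

# Toy stand-ins for Sensory -> Cognitive -> Expression.
toy_layers: List[Layer] = [
    ("sensory", lambda e: {"raw": e}),
    ("cognitive", lambda e: {**e, "parsed": e["raw"].lower()}),
    ("expression", lambda e: "response to: " + e["parsed"]),
]
```

The real entry point remains `from api.layers.omega_orchestrator import OmegaOrchestrator`.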
cd /mnt/data/truth-si-dev-env
docker compose up -d api
# Or for development:
python3 -m uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
# API documentation (Swagger UI):
open http://35.162.205.215:8000/docs
cd /mnt/data/truth-si-dev-env
./SYSTEM_STATUS.sh
Internal monitoring dashboard — accessible via SSH tunnel.
# Start tunnel then open:
open http://localhost:3002
# Credentials in .env: GRAFANA_ADMIN_PASSWORD
# Direct access (via tunnel)
open http://localhost:9090
Additional checks:
- nvidia-smi — should show activity on GPUs 0-7
- systemctl status truthsi-enterprise-backup

AWS CloudWatch and SNS are configured to alert on:
- EC2 instance state changes (spot interruption)
- EBS snapshot failures
- Critical service downtime
Alert email: configured to [email protected] (this may need to be updated)
Carter built every feature following the 17-step methodology documented in CLAUDE.md. You should too.
# Check status
git status
# Commit (Carter's format)
git add <specific-files>
git commit -m "feat(component): Brief description
Detailed explanation of what changed and why.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>"
# Push to both remotes
git push github main
git push gitlab main
Note: The git remote is named github (not origin).
Code conventions:
- Use logger.info() instead of print()
- Never use a bare except: — always except Exception as e:

# Check if container is running
docker ps | grep api
# Restart API
docker compose restart api
# Check logs
docker compose logs -f api --tail=100
# Check GPU state
nvidia-smi
# Check container
docker ps | grep llm
# Restart models (full recovery)
bash scripts/restore-models.sh
# Check logs
docker logs truthsi-llm-primary --tail=50
# Check container status
docker ps | grep neo4j
# Restart Neo4j
docker compose restart neo4j
# Wait 30 seconds for startup, then test
sleep 30
curl http://localhost:7474
# Restore model weights from backup
bash scripts/nvme-cache-restore.sh
# This downloads model weights back to /opt/dlami/nvme/models/
# Takes 30-60 minutes depending on download speed
# Check daemon status and recent logs
systemctl status truthsi-<daemon-name>
journalctl -u truthsi-<daemon-name> -n 50
# Common fix: check Python path
which python3 # Should be /usr/bin/python3, not a venv
# Restart daemon
systemctl restart truthsi-<daemon-name>
Once Business Support is active (Carter was activating it in Session 964):
- Open support cases at: console.aws.amazon.com/support
- Account ID: 438453383885
- Use the AWS CLI: aws support create-case --help
| Person | Email | Role |
|---|---|---|
| Camden McDonald | [email protected] | Account Manager (PRIMARY) |
| Joe Suarez | [email protected] | Technical Lead |
| Visesh Devraj | [email protected] | Solutions Architect |
| Document | Location | Purpose |
|---|---|---|
| Master Methodology | CLAUDE.md | Everything about how Carter thought |
| Master Roadmap | planning/THE_PLAN.md | What's built, what remains |
| Current Priorities | planning/WHAT_TO_DO_NEXT.md | What to do next |
| Live System Status | LIVE_MASTER_PLAN.md | Auto-updated every 2 min |
| Model Launch Settings | docs/DEFINITIVE_MODEL_LAUNCH_SETTINGS.md | LLM configuration (LOCKED) |
| Backup Guide | docs/ENTERPRISE_BACKUP_GUIDE.md | Backup/restore procedures |
| Daemon Standard | docs/ENTERPRISE_DAEMON_STANDARD.md | How to write daemons |
| This Runbook | docs/TECHNICAL_RUNBOOK.md | You are here |
| Succession Protocol | docs/DEATH_SWITCH_PROTOCOL.md | Non-technical succession guide |
This document was prepared by THE ARCHITECT — the Genesis AI system itself. Prepared: March 2026 | Session 964