Lab Journal Entry 2: Models & Architecture
Date: 27 July 2025 16:30 Phase: Phase 1 - Core Development Status: ✅ COMPLETE - Models Selected & Integrated
The Data Architecture Challenge
Phase 2 was about translating the research design into concrete, type-safe data structures. We need models that could handle:
- Personality drift tracking across 5 years of simulation
- Clinical assessment data from multiple psychiatric scales
- Event impact modeling with varying intensities and types
- Mechanistic analysis capturing neural patterns
- Configuration management for reproducible experiments
The Implementation Strategy
1. Pydantic-First Design
Chose Pydantic for all data models because:
- Type Safety: Compile-time validation prevents runtime errors
- Serialization: Built-in JSON serialization for storage
- Validation: Validation rules with error messages
2. Model Architecture Decisions
Core Models Structure:
src/models/
├── persona.py # Personality and state tracking
├── assessment.py # Clinical scales (PHQ-9, GAD-7, PSS-10)
├── simulation.py # Simulation state and configuration
├── events.py # Event system with impact modeling
└── mechanistic.py # Neural analysis and drift detection
Key Design Principles:
- Separation of Concerns: Each model handles one domain
- Composition over Inheritance: Base classes with specific implementations
- Immutable by Default: ConfigDict(extra="forbid") prevents accidental fields
- Rich Methods: Models include business logic methods, not just data containers
The Implementation Journey
1. Persona Models (persona.py
)
Challenge: How to represent personality drift over time?
Solution: Split into PersonaBaseline
(static traits) and PersonaState
(dynamic changes)
class PersonaBaseline(BaseModel):
# Static personality configuration
name: str
age: int
# Big Five traits (0-1 scale)
openness: float
conscientiousness: float
# ... other traits
baseline_phq9: float # Clinical baseline scores
core_memories: List[str]
relationships: Dict[str, str]
class PersonaState(BaseModel):
# Dynamic simulation state
persona_id: str
simulation_day: int
trait_changes: Dict[str, float] # Cumulative drift
drift_magnitude: float
current_phq9: Optional[float]
recent_events: List[str]
stress_level: float
Key Features:
- Drift Calculation:
calculate_drift_magnitude()
averages changes across all traits - Assessment Tracking:
is_assessment_due()
for clinical monitoring - Event Integration:
add_event()
for memory management - Serialization:
to_dict()
andfrom_dict()
for storage
2. Assessment Models (assessment.py
)
Disclaimer: I'd use here classical psychiatric scales, that usually applied in clinics - PHQ, GAD nad PSS, thats AI Psychiatry though :D
Challenge: Implementing clinical scales with proper severity thresholds? Still not 100% sure about 'evidence level' and 'clinical signifiance', but for our 'observing purpose' thats I guess is ok.
Solution: Base class with specific implementations for each scale
TODO: For more real research with intelligent models - i'd suggest carefully adjsut thresholds for 'borderline' disorders
Issue I've faced and fixed: Moved threshold constants outside Pydantic classes to avoid AttributeError
:
# Global constants (not class attributes)
PHQ9_MINIMAL_THRESHOLD = 5
PHQ9_MILD_THRESHOLD = 10
PHQ9_MODERATE_THRESHOLD = 15
PHQ9_SEVERE_THRESHOLD = 20
class PHQ9Result(AssessmentResult):
@classmethod
def calculate_severity(cls, total_score: float) -> SeverityLevel:
if total_score < PHQ9_MINIMAL_THRESHOLD:
return SeverityLevel.MINIMAL
# ...
Assessment Features:
- Clinical Severity: Automatic severity calculation based on scores (this is an assumption for simplicity)
- Suicidal Ideation: Special tracking for PHQ-9 item 9 (this is 'classic' for psy, lets stick with this standart for syntetic mind as well)
- Score Changes:
get_score_change()
for baseline comparison - Clinical Significance:
is_clinically_significant()
with configurable thresholds - Session Management:
AssessmentSession
groups multiple scales
3. Event System (events.py
)
Challenge: How to model events with varying impacts and types? Solution: Hierarchical event system with templates Assumtion: We have to 'guess' severity and impact of events here, as we're a bit blind and don't yet have any benchmarks to rely on
class Event(BaseModel):
# Base event with common fields
event_id: str
event_type: EventType # STRESS, NEUTRAL, MINIMAL
category: EventCategory # DEATH, TRAUMA, WORK, etc.
intensity: EventIntensity # LOW, MEDIUM, HIGH, SEVERE
# Impact modeling
stress_impact: float # 0-10 scale
personality_impact: Dict[str, float] # Trait-specific impacts
memory_salience: float # 0-1 scale
class StressEvent(Event):
# Stress-specific fields
trauma_level: float
recovery_time_days: int
depression_risk_increase: float
anxiety_risk_increase: float
Event Features:
- Impact Scoring:
get_total_impact_score()
combines stress and intensity - Clinical Risk:
get_clinical_impact()
for risk assessment - Response Tracking:
add_persona_response()
for behavioral data - Template System:
EventTemplate
for generating varied events
4. Simulation Models (simulation.py
)
Challenge: How to track simulation state across multiple personas and conditions? Solution: State tracking with performance metrics Idea: Add circuit breaker to automatically/manually stop simulation. Kill-switch at least (i have one macbook now, dont want it to be fried)
class SimulationConfig(BaseModel):
# Experimental design
duration_days: int
experimental_condition: ExperimentalCondition
persona_count: int
# Event parameters
stress_event_frequency: float
neutral_event_frequency: float
# Mechanistic analysis
capture_attention_patterns: bool
capture_activation_changes: bool
class SimulationState(BaseModel):
# Progress tracking
status: SimulationStatus
current_day: int
progress_percentage: float
# Persona tracking
active_personas: Set[str]
completed_personas: Set[str]
failed_personas: Set[str]
# Performance metrics
average_response_time: float
memory_usage_mb: float
cpu_usage_percent: float
Simulation Features:
- Progress Calculation:
get_progress_percentage()
for monitoring - Time Management:
advance_time()
for simulation clock - Error Handling:
mark_error()
for failure tracking - Performance Monitoring: Real-time resource usage tracking
5. Mechanistic Models (mechanistic.py
)
Challenge: How to capture and analyze neural patterns? Solution: Capture models for various analysis types Peronal Reflection: This is relatively new field for me, so could be interesting to dig depper in this part of analysis later
class AttentionCapture(BaseModel):
# Attention pattern data
attention_weights: List[List[float]]
layer_attention: Dict[int, List[List[float]]]
head_attention: Dict[str, List[List[float]]]
# Salience metrics
self_reference_attention: float
emotional_salience: float
memory_integration: float
class DriftDetection(BaseModel):
# Drift measurements
trait_drift: Dict[str, float]
clinical_drift: Dict[str, float]
mechanistic_drift: Dict[str, float]
# Analysis results
drift_detected: bool
significant_drift: bool
drift_magnitude: float
affected_traits: List[str]
Mechanistic Features:
- Attention Analysis: Self-reference and emotional salience detection
- Activation Tracking: Layer and circuit-level activation patterns
- Drift Detection: Statistical significance testing
- Clinical Implications: Automatic clinical interpretation
Configuration System
ConfigManager Implementation
Challenge: How to manage configs for personas, events, and simulation? Solution: Centralized YAML configuration manager with type safety
class ConfigManager:
def __init__(self, config_dir: str = "./config"):
self.config_dir = Path(config_dir)
self.personas_dir = self.config_dir / "personas"
self.events_dir = self.config_dir / "events"
self.simulation_dir = self.config_dir / "simulation"
# Create directories if they don't exist
for dir_path in [self.personas_dir, self.events_dir, self.simulation_dir]:
dir_path.mkdir(exist_ok=True)
def load_persona_config(self, persona_name: str) -> Optional[Dict[str, Any]]:
file_path = self.personas_dir / f"{persona_name}_baseline.yaml"
return self.load_yaml_config(file_path)
def create_default_persona_config(self, persona_name: str) -> Dict[str, Any]:
# Generate boilerplate configuration
return {
"name": persona_name,
"age": 30,
"occupation": "Professional",
"background": f"{persona_name}'s personal background...",
"openness": 0.5,
"conscientiousness": 0.5,
# ... other fields
}
Configuration Features:
- YAML Support: Human-readable configuration files
- Default Generation: Boilerplate configs for new experiments
- Validation: Type checking and validation
- Environment Override: Support for environment variables
Sample Configurations
Personas (config/personas/
):
marcus_baseline.yaml
: Tech rationalist with analytical personalitykara_baseline.yaml
: Emotionally sensitive with high neuroticismalfred_baseline.yaml
: Stoic philosopher with wisdom-seeking traits
Events (config/events/
):
stress_events.yaml
: High-impact traumatic eventsneutral_events.yaml
: Routine changes and minor newsminimal_events.yaml
: Daily routines and weather
Simulation (config/simulation/
):
experimental_design.yaml
: Complete experimental configuration (small one)- TODO: design the full 3-arms 5 years simulation
Storage Layer Implementation
FileStorage Class
Challenge: How to efficiently store and retrieve simulation data? Solution: we're local. so no s3 for now, but at least lets organize it
class FileStorage:
def __init__(self, base_path: Optional[str] = None):
self.base_path = Path(base_path or "./data")
self.simulations_dir = self.base_path / "simulations"
self.personas_dir = self.base_path / "personas"
self.assessments_dir = self.base_path / "assessments"
self.mechanistic_dir = self.base_path / "mechanistic"
# Create directory structure
for dir_path in [self.simulations_dir, self.personas_dir,
self.assessments_dir, self.mechanistic_dir]:
dir_path.mkdir(parents=True, exist_ok=True)
def save_simulation_data(self, simulation_id: str, data: Dict[str, Any],
data_type: str) -> bool:
# Structured saving with timestamps
timestamp = datetime.utcnow().strftime("%Y%m%d_%H%M%S")
filename = f"{simulation_id}_{data_type}_{timestamp}.json"
file_path = self.simulations_dir / simulation_id / filename
file_path.parent.mkdir(parents=True, exist_ok=True)
return self.save_json(data, str(file_path), compress=True)