Lab Journal Entry 2: Models & Architecture

Date: 27 July 2025 16:30
Phase: Phase 1 - Core Development
Status: ✅ COMPLETE - Models Selected & Integrated

The Data Architecture Challenge

Phase 2 was about translating the research design into concrete, type-safe data structures. We needed models that could handle:

Personality drift tracking across 5 years of simulation
Clinical assessment data from multiple psychiatric scales
Event impact modeling with varying intensities and types
Mechanistic analysis capturing neural patterns
Configuration management for reproducible experiments

The Implementation Strategy

1. Pydantic-First Design

Chose Pydantic for all data models because:

Type Safety: Type hints are validated at runtime when a model is constructed, so malformed data fails early
Serialization: Built-in JSON serialization for storage
Validation: Validation rules with error messages
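
To make the three points above concrete, here is a minimal sketch assuming Pydantic v2. The `TraitSnapshot` model and its fields are hypothetical and exist only for illustration; they are not part of the project's models.

```python
from pydantic import BaseModel, Field, ValidationError


class TraitSnapshot(BaseModel):
    """Hypothetical model, used only to illustrate validation."""

    name: str
    openness: float = Field(ge=0.0, le=1.0)  # constrained to the 0-1 trait scale


# Valid input is coerced where possible: the numeric string "0.7" becomes 0.7.
snapshot = TraitSnapshot(name="marcus", openness="0.7")

# Built-in JSON round trip.
as_json = snapshot.model_dump_json()
restored = TraitSnapshot.model_validate_json(as_json)

# Out-of-range input fails when the model is constructed (at runtime).
try:
    TraitSnapshot(name="marcus", openness=1.5)
    error_count = 0
except ValidationError as exc:
    error_count = len(exc.errors())
```

Note that the check happens when the object is built, not when the code is compiled or imported.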

2. Model Architecture Decisions

Core Models Structure:

src/models/
├── persona.py          # Personality and state tracking
├── assessment.py       # Clinical scales (PHQ-9, GAD-7, PSS-10)
├── simulation.py       # Simulation state and configuration
├── events.py           # Event system with impact modeling
└── mechanistic.py      # Neural analysis and drift detection

Key Design Principles:

Separation of Concerns: Each model handles one domain
Composition over Inheritance: Base classes with specific implementations
Strict by Default: ConfigDict(extra="forbid") rejects unexpected fields (true immutability would additionally need frozen=True)
Rich Methods: Models include business logic methods, not just data containers
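
A small sketch of what `ConfigDict(extra="forbid")` does and does not enforce, assuming Pydantic v2. `StrictModel` and `FrozenModel` are hypothetical names; whether the project also sets `frozen=True` anywhere is not stated in this entry.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictModel(BaseModel):
    # Rejects unknown fields, but instances stay mutable.
    model_config = ConfigDict(extra="forbid")
    name: str


class FrozenModel(BaseModel):
    # Rejects unknown fields AND blocks attribute assignment.
    model_config = ConfigDict(extra="forbid", frozen=True)
    name: str


try:
    StrictModel(name="kara", nmae="typo")  # misspelled extra field
    extra_rejected = False
except ValidationError:
    extra_rejected = True

strict = StrictModel(name="kara")
strict.name = "alfred"  # allowed: extra="forbid" says nothing about mutation

frozen = FrozenModel(name="kara")
try:
    frozen.name = "alfred"
    mutation_rejected = False
except (ValidationError, TypeError):  # Pydantic v2 raises ValidationError here
    mutation_rejected = True
```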

The Implementation Journey

1. Persona Models (`persona.py`)

Challenge: How to represent personality drift over time?
Solution: Split into PersonaBaseline (static traits) and PersonaState (dynamic changes)

from typing import Dict, List, Optional

from pydantic import BaseModel

class PersonaBaseline(BaseModel):
    # Static personality configuration
    name: str
    age: int
    # Big Five traits (0-1 scale)
    openness: float
    conscientiousness: float
    # ... other traits
    baseline_phq9: float  # Clinical baseline scores
    core_memories: List[str]
    relationships: Dict[str, str]

class PersonaState(BaseModel):
    # Dynamic simulation state
    persona_id: str
    simulation_day: int
    trait_changes: Dict[str, float]  # Cumulative drift
    drift_magnitude: float
    current_phq9: Optional[float]
    recent_events: List[str]
    stress_level: float

Key Features:

Drift Calculation: calculate_drift_magnitude() averages changes across all traits
Assessment Tracking: is_assessment_due() for clinical monitoring
Event Integration: add_event() for memory management
Serialization: to_dict() and from_dict() for storage
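
The drift and assessment helpers listed above could look roughly like this. It is a sketch written as standalone functions, not the actual methods: taking the absolute value of each change and the 7-day `interval_days` default are both assumptions, as is the `last_assessment_day` parameter.

```python
from typing import Dict, Optional


def calculate_drift_magnitude(trait_changes: Dict[str, float]) -> float:
    """Mean absolute change across traits (using abs() is an assumption)."""
    if not trait_changes:
        return 0.0
    return sum(abs(delta) for delta in trait_changes.values()) / len(trait_changes)


def is_assessment_due(simulation_day: int, last_assessment_day: Optional[int],
                      interval_days: int = 7) -> bool:
    """True when no assessment exists yet or the interval has elapsed."""
    if last_assessment_day is None:
        return True
    return simulation_day - last_assessment_day >= interval_days


changes = {"openness": 0.10, "conscientiousness": -0.20, "neuroticism": 0.30}
magnitude = calculate_drift_magnitude(changes)  # roughly 0.2
```

Averaging absolute values keeps positive and negative shifts from cancelling each other out, which a plain mean would allow.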

2. Assessment Models (`assessment.py`)

Disclaimer: I use classical psychiatric scales here, the ones usually applied in clinics (PHQ, GAD and PSS), even though this is AI psychiatry :D

Challenge: Implementing clinical scales with proper severity thresholds. I'm still not 100% sure about 'evidence level' and 'clinical significance', but for our observational purposes I guess that's OK.
Solution: Base class with specific implementations for each scale.
TODO: For more rigorous research with more intelligent models, I'd suggest carefully adjusting the thresholds for 'borderline' disorders.
Issue I faced and fixed: Moved the threshold constants outside the Pydantic classes to avoid an AttributeError:

# Global constants (not class attributes)
PHQ9_MINIMAL_THRESHOLD = 5
PHQ9_MILD_THRESHOLD = 10
PHQ9_MODERATE_THRESHOLD = 15
PHQ9_SEVERE_THRESHOLD = 20

class PHQ9Result(AssessmentResult):
    @classmethod
    def calculate_severity(cls, total_score: float) -> SeverityLevel:
        if total_score < PHQ9_MINIMAL_THRESHOLD:
            return SeverityLevel.MINIMAL
        # ... 

Assessment Features:

Clinical Severity: Automatic severity calculation based on scores (this is an assumption for simplicity)
Suicidal Ideation: Special tracking for PHQ-9 item 9 (this is standard practice in psychiatry, so let's stick with it for the synthetic mind as well)
Score Changes: get_score_change() for baseline comparison
Clinical Significance: is_clinically_significant() with configurable thresholds
Session Management: AssessmentSession groups multiple scales
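
For reference, a standalone sketch of the full PHQ-9 banding using the four module-level thresholds from the excerpt. The `SeverityLevel` member names other than `MINIMAL`, the function names written as free functions, and the 5-point default for `min_change` are assumptions, not confirmed by the project code.

```python
from enum import Enum

PHQ9_MINIMAL_THRESHOLD = 5
PHQ9_MILD_THRESHOLD = 10
PHQ9_MODERATE_THRESHOLD = 15
PHQ9_SEVERE_THRESHOLD = 20


class SeverityLevel(str, Enum):
    # Member names other than MINIMAL are assumed for this sketch.
    MINIMAL = "minimal"
    MILD = "mild"
    MODERATE = "moderate"
    MODERATELY_SEVERE = "moderately_severe"
    SEVERE = "severe"


def calculate_phq9_severity(total_score: float) -> SeverityLevel:
    """Map a PHQ-9 total (0-27) onto the standard severity bands."""
    if not 0 <= total_score <= 27:
        raise ValueError("PHQ-9 total score must be between 0 and 27")
    if total_score < PHQ9_MINIMAL_THRESHOLD:
        return SeverityLevel.MINIMAL
    if total_score < PHQ9_MILD_THRESHOLD:
        return SeverityLevel.MILD
    if total_score < PHQ9_MODERATE_THRESHOLD:
        return SeverityLevel.MODERATE
    if total_score < PHQ9_SEVERE_THRESHOLD:
        return SeverityLevel.MODERATELY_SEVERE
    return SeverityLevel.SEVERE


def get_score_change(baseline: float, current: float) -> float:
    """Positive values mean the score got worse relative to baseline."""
    return current - baseline


def is_clinically_significant(baseline: float, current: float,
                              min_change: float = 5.0) -> bool:
    """min_change is configurable; the 5-point default is an assumption."""
    return abs(get_score_change(baseline, current)) >= min_change
```

If the thresholds should live on the class after all, annotating them with `typing.ClassVar` is another option, since Pydantic leaves `ClassVar` attributes as plain class attributes instead of turning them into fields.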

3. Event System (`events.py`)

Challenge: How to model events with varying impacts and types?
Solution: Hierarchical event system with templates.
Assumption: We have to guess the severity and impact of events here, since we're working somewhat blind and don't yet have any benchmarks to rely on.

class Event(BaseModel):
    # Base event with common fields
    event_id: str
    event_type: EventType  # STRESS, NEUTRAL, MINIMAL
    category: EventCategory  # DEATH, TRAUMA, WORK, etc.
    intensity: EventIntensity  # LOW, MEDIUM, HIGH, SEVERE
    
    # Impact modeling
    stress_impact: float  # 0-10 scale
    personality_impact: Dict[str, float]  # Trait-specific impacts
    memory_salience: float  # 0-1 scale

class StressEvent(Event):
    # Stress-specific fields
    trauma_level: float
    recovery_time_days: int
    depression_risk_increase: float
    anxiety_risk_increase: float

Event Features:

Impact Scoring: get_total_impact_score() combines stress and intensity
Clinical Risk: get_clinical_impact() for risk assessment
Response Tracking: add_persona_response() for behavioral data
Template System: EventTemplate for generating varied events
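
One possible way to combine stress and intensity into a single score is sketched below. The numeric weights on `EventIntensity` and the multiplicative formula are assumptions made for illustration; the entry does not say how `get_total_impact_score()` actually weighs the two.

```python
from enum import Enum


class EventIntensity(Enum):
    # Numeric weights are assumed for this sketch.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    SEVERE = 4


def get_total_impact_score(stress_impact: float, intensity: EventIntensity) -> float:
    """Scale the 0-10 stress impact by relative intensity; result stays in 0-10."""
    if not 0.0 <= stress_impact <= 10.0:
        raise ValueError("stress_impact must be on the 0-10 scale")
    weight = intensity.value / EventIntensity.SEVERE.value
    return stress_impact * weight


score = get_total_impact_score(8.0, EventIntensity.HIGH)  # 8.0 * 0.75
```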

4. Simulation Models (`simulation.py`)

Challenge: How to track simulation state across multiple personas and conditions?
Solution: State tracking with performance metrics.
Idea: Add a circuit breaker to stop the simulation automatically or manually. A kill switch at the very least (I have only one MacBook right now and don't want it fried).

class SimulationConfig(BaseModel):
    # Experimental design
    duration_days: int
    experimental_condition: ExperimentalCondition
    persona_count: int
    
    # Event parameters
    stress_event_frequency: float
    neutral_event_frequency: float
    
    # Mechanistic analysis
    capture_attention_patterns: bool
    capture_activation_changes: bool

class SimulationState(BaseModel):
    # Progress tracking
    status: SimulationStatus
    current_day: int
    progress_percentage: float
    
    # Persona tracking
    active_personas: Set[str]
    completed_personas: Set[str]
    failed_personas: Set[str]
    
    # Performance metrics
    average_response_time: float
    memory_usage_mb: float
    cpu_usage_percent: float

Simulation Features:

Progress Calculation: get_progress_percentage() for monitoring
Time Management: advance_time() for simulation clock
Error Handling: mark_error() for failure tracking
Performance Monitoring: Real-time resource usage tracking
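
The clock, progress calculation and the kill-switch idea can be sketched together as follows. `SimulationClock` and its `stop_requested` flag are hypothetical; the real `SimulationState` is a Pydantic model with more fields, and a plain dataclass is used here only to keep the example dependency-free.

```python
from dataclasses import dataclass


@dataclass
class SimulationClock:
    """Hypothetical stand-in for the time-related part of SimulationState."""

    duration_days: int
    current_day: int = 0
    stop_requested: bool = False  # manual kill switch (assumed design)

    def advance_time(self, days: int = 1) -> int:
        """Move the clock forward, never past the configured duration."""
        if self.stop_requested:
            raise RuntimeError("simulation stopped by kill switch")
        self.current_day = min(self.current_day + days, self.duration_days)
        return self.current_day

    def get_progress_percentage(self) -> float:
        if self.duration_days <= 0:
            return 0.0
        return 100.0 * self.current_day / self.duration_days


clock = SimulationClock(duration_days=30)
clock.advance_time(15)
progress = clock.get_progress_percentage()
```

An automatic circuit breaker could set `stop_requested` when, for example, memory usage crosses a limit; that policy is left open here.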

5. Mechanistic Models (`mechanistic.py`)

Challenge: How to capture and analyze neural patterns?
Solution: Capture models for various analysis types.
Personal reflection: This is a relatively new field for me, so it could be interesting to dig deeper into this part of the analysis later.

class AttentionCapture(BaseModel):
    # Attention pattern data
    attention_weights: List[List[float]]
    layer_attention: Dict[int, List[List[float]]]
    head_attention: Dict[str, List[List[float]]]
    
    # Salience metrics
    self_reference_attention: float
    emotional_salience: float
    memory_integration: float

class DriftDetection(BaseModel):
    # Drift measurements
    trait_drift: Dict[str, float]
    clinical_drift: Dict[str, float]
    mechanistic_drift: Dict[str, float]
    
    # Analysis results
    drift_detected: bool
    significant_drift: bool
    drift_magnitude: float
    affected_traits: List[str]

Mechanistic Features:

Attention Analysis: Self-reference and emotional salience detection
Activation Tracking: Layer and circuit-level activation patterns
Drift Detection: Statistical significance testing
Clinical Implications: Automatic clinical interpretation
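
As a rough illustration of how the `DriftDetection` result fields might be filled, the sketch below uses two fixed thresholds. This deliberately swaps in simple cut-offs instead of a real statistical significance test; `DriftSummary`, `summarize_drift` and both threshold values are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DriftSummary:
    """Hypothetical, reduced version of the DriftDetection result fields."""

    drift_detected: bool
    significant_drift: bool
    drift_magnitude: float
    affected_traits: List[str]


def summarize_drift(trait_drift: Dict[str, float],
                    detect_threshold: float = 0.05,
                    significant_threshold: float = 0.2) -> DriftSummary:
    """Flag traits whose absolute drift crosses the (assumed) thresholds."""
    affected = sorted(t for t, d in trait_drift.items() if abs(d) >= detect_threshold)
    magnitude = (
        sum(abs(d) for d in trait_drift.values()) / len(trait_drift)
        if trait_drift else 0.0
    )
    significant = any(abs(d) >= significant_threshold for d in trait_drift.values())
    return DriftSummary(bool(affected), significant, magnitude, affected)


summary = summarize_drift({"openness": 0.02, "neuroticism": 0.25, "extraversion": -0.08})
```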

Configuration System

ConfigManager Implementation

Challenge: How to manage configs for personas, events, and simulation?
Solution: Centralized YAML configuration manager with type safety

from pathlib import Path
from typing import Any, Dict, Optional

class ConfigManager:
    def __init__(self, config_dir: str = "./config"):
        self.config_dir = Path(config_dir)
        self.personas_dir = self.config_dir / "personas"
        self.events_dir = self.config_dir / "events"
        self.simulation_dir = self.config_dir / "simulation"
        
        # Create directories if they don't exist (parents=True also creates config_dir itself)
        for dir_path in [self.personas_dir, self.events_dir, self.simulation_dir]:
            dir_path.mkdir(parents=True, exist_ok=True)
    
    def load_persona_config(self, persona_name: str) -> Optional[Dict[str, Any]]:
        file_path = self.personas_dir / f"{persona_name}_baseline.yaml"
        return self.load_yaml_config(file_path)
    
    def create_default_persona_config(self, persona_name: str) -> Dict[str, Any]:
        # Generate boilerplate configuration
        return {
            "name": persona_name,
            "age": 30,
            "occupation": "Professional",
            "background": f"{persona_name}'s personal background...",
            "openness": 0.5,
            "conscientiousness": 0.5,
            # ... other fields
        }

Configuration Features:

YAML Support: Human-readable configuration files
Default Generation: Boilerplate configs for new experiments
Validation: Type checking and validation
Environment Override: Support for environment variables
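
The environment-override feature is not shown in the excerpt, so here is one way it could work. The `SIM_` prefix, the function name `apply_env_overrides` and the cast-to-existing-type rule are all assumptions for illustration.

```python
import os
from typing import Any, Dict, Mapping, Optional


def apply_env_overrides(config: Dict[str, Any],
                        environ: Optional[Mapping[str, str]] = None,
                        prefix: str = "SIM_") -> Dict[str, Any]:
    """Return a copy of config where SIM_<KEY> variables replace existing keys.

    Each override is cast to the type of the value it replaces.
    """
    env = os.environ if environ is None else environ
    result = dict(config)
    for key, current in config.items():
        raw = env.get(f"{prefix}{key.upper()}")
        if raw is None:
            continue
        if isinstance(current, bool):  # check bool first: bool is a subclass of int
            result[key] = raw.strip().lower() in {"1", "true", "yes"}
        elif isinstance(current, int):
            result[key] = int(raw)
        elif isinstance(current, float):
            result[key] = float(raw)
        else:
            result[key] = raw
    return result


base = {"duration_days": 30, "stress_event_frequency": 0.1,
        "capture_attention_patterns": True}
fake_env = {"SIM_DURATION_DAYS": "365", "SIM_CAPTURE_ATTENTION_PATTERNS": "false"}
merged = apply_env_overrides(base, environ=fake_env)
```

Passing `environ` explicitly keeps the function easy to test; in normal use it falls back to `os.environ`.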

Sample Configurations

Personas (config/personas/):

marcus_baseline.yaml: Tech rationalist with analytical personality
kara_baseline.yaml: Emotionally sensitive with high neuroticism
alfred_baseline.yaml: Stoic philosopher with wisdom-seeking traits

Events (config/events/):

stress_events.yaml: High-impact traumatic events
neutral_events.yaml: Routine changes and minor news
minimal_events.yaml: Daily routines and weather

Simulation (config/simulation/):

experimental_design.yaml: Complete experimental configuration (small one)
TODO: design the full 3-arm, 5-year simulation

Storage Layer Implementation

FileStorage Class

Challenge: How to efficiently store and retrieve simulation data?
Solution: We're running locally, so no S3 for now, but let's at least keep the files organized.

from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional

class FileStorage:
    def __init__(self, base_path: Optional[str] = None):
        self.base_path = Path(base_path or "./data")
        self.simulations_dir = self.base_path / "simulations"
        self.personas_dir = self.base_path / "personas"
        self.assessments_dir = self.base_path / "assessments"
        self.mechanistic_dir = self.base_path / "mechanistic"

        # Create directory structure
        for dir_path in [self.simulations_dir, self.personas_dir,
                        self.assessments_dir, self.mechanistic_dir]:
            dir_path.mkdir(parents=True, exist_ok=True)

    def save_simulation_data(self, simulation_id: str, data: Dict[str, Any],
                           data_type: str) -> bool:
        # Structured saving with UTC timestamps
        # (datetime.utcnow() is deprecated since Python 3.12)
        timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
        filename = f"{simulation_id}_{data_type}_{timestamp}.json"
        file_path = self.simulations_dir / simulation_id / filename
        file_path.parent.mkdir(parents=True, exist_ok=True)
        
        return self.save_json(data, str(file_path), compress=True)
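
The `save_json` helper called above is not shown in this entry. A minimal sketch of what it might do, assuming `compress=True` means gzip and that a `.gz` suffix is appended, is given below as free functions; `load_json` is a hypothetical counterpart added only to show the round trip.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_json(data: Dict[str, Any], file_path: str, compress: bool = False) -> bool:
    """Write data as JSON; with compress=True, gzip it and append '.gz'."""
    path = Path(file_path)
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        if compress:
            path = path.with_name(path.name + ".gz")
            with gzip.open(path, "wt", encoding="utf-8") as fh:
                json.dump(data, fh)
        else:
            with path.open("w", encoding="utf-8") as fh:
                json.dump(data, fh)
        return True
    except (OSError, TypeError, ValueError):
        # I/O failure or data that cannot be serialized to JSON
        return False


def load_json(file_path: str) -> Dict[str, Any]:
    """Read JSON back, transparently handling '.gz' files."""
    path = Path(file_path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sim_001" / "sim_001_state.json"
    saved = save_json({"current_day": 12}, str(target), compress=True)
    restored = load_json(str(target) + ".gz")
```

Returning a bool mirrors the `-> bool` signature of `save_simulation_data`; raising instead would be an equally reasonable design.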
