# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**cluster4npu** is a high-performance multi-stage inference pipeline system for Kneron NPU dongles. The project enables flexible single-stage and cascaded multi-stage AI inference workflows optimized for real-time video processing and high-throughput scenarios.

### Core Architecture

- **InferencePipeline**: Main orchestrator managing multi-stage workflows with automatic queue management and thread coordination
- **MultiDongle**: Hardware abstraction layer for Kneron NPU devices (KL520, KL720, etc.)
- **StageConfig**: Configuration system for individual pipeline stages
- **PipelineData**: Data structure that flows through pipeline stages, accumulating results
- **PreProcessor/PostProcessor**: Flexible data transformation components for inter-stage processing

### Key Design Patterns

- **Producer-Consumer**: Each stage runs in separate threads with input/output queues
- **Pipeline Architecture**: Linear data flow through configurable stages with result accumulation
- **Hardware Abstraction**: MultiDongle encapsulates Kneron SDK complexity
- **Callback-Based**: Asynchronous result handling via configurable callbacks

## Development Commands

### Environment Setup

```bash
# Set up a virtual environment with uv
uv venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
uv pip install -r requirements.txt
```

### Running Examples

```bash
# Single-stage pipeline
uv run python src/cluster4npu/test.py --example single

# Two-stage cascade pipeline
uv run python src/cluster4npu/test.py --example cascade

# Complex multi-stage pipeline
uv run python src/cluster4npu/test.py --example complex

# Basic MultiDongle usage
uv run python src/cluster4npu/Multidongle.py

# Complete UI application with full workflow
uv run python UI.py

# UI integration examples
uv run python ui_integration_example.py

# Test UI configuration system
uv run python ui_config.py
```

### UI Application Workflow

`UI.py` provides a complete visual workflow:

1. **Dashboard/Home** - Main entry point with recent files
2. **Pipeline Editor** - Visual node-based pipeline design
3. **Stage Configuration** - Dongle allocation and hardware setup
4. **Performance Estimation** - FPS calculations and optimization
5. **Save & Deploy** - Export configurations and cost estimation
6. **Monitoring & Management** - Real-time pipeline monitoring

```bash
# Access different workflow stages directly:
# 1. Create new pipeline → Pipeline Editor
# 2. Configure Stages & Deploy → Stage Configuration
# 3. Pipeline menu → Performance Analysis → Performance Panel
# 4. Pipeline menu → Deploy Pipeline → Save & Deploy Dialog
```

### Testing

```bash
# Run pipeline tests
uv run python test_pipeline.py

# Test MultiDongle functionality
uv run python src/cluster4npu/test.py
```

## Hardware Requirements

- **Kneron NPU dongles**: KL520, KL720, etc.
- **Firmware files**: `fw_scpu.bin`, `fw_ncpu.bin`
- **Models**: `.nef` format files
- **USB ports**: Multiple ports required for multi-dongle setups

## Critical Implementation Notes

### Pipeline Configuration

- Each stage requires a unique `stage_id` and dedicated `port_ids`
- Queue sizes (`max_queue_size`) must balance memory usage against throughput
- Stages process sequentially - output from stage N becomes input to stage N+1

### Thread Safety

- All pipeline operations are thread-safe
- Each stage runs in isolated worker threads
- Use callbacks for result handling, not direct queue access

### Data Flow

```
Input → Stage1 → Stage2 → ... → StageN → Output
           ↓        ↓               ↓        ↓
         Queue   Process         Process   Result
                + Results       + Results  Callback
```

### Hardware Management

- Always call `initialize()` before `start()`
- Always call `stop()` for a clean shutdown
- Firmware upload (`upload_fw=True`) is only needed once per session
- Port IDs must match actual USB connections

### Error Handling

- The pipeline continues on individual stage errors
- Failed stages return error results rather than blocking
- Comprehensive statistics are available via `get_pipeline_statistics()`

## UI Application Architecture

### Complete Workflow Components

- **DashboardLogin**: Main entry point with project management
- **PipelineEditor**: Node-based visual pipeline design using NodeGraphQt
- **StageConfigurationDialog**: Hardware allocation and dongle assignment
- **PerformanceEstimationPanel**: Real-time performance analysis and optimization
- **SaveDeployDialog**: Export configurations and deployment cost estimation
- **MonitoringDashboard**: Live pipeline monitoring and cluster management

### UI Integration System

- **ui_config.py**: Configuration management and UI/core integration
- **ui_integration_example.py**: Demonstrates conversion from UI configurations to core tools
- **UIIntegration class**: Bridges UI configurations to InferencePipeline

### Key UI Features

- **Auto-dongle allocation**: Smart assignment of dongles to pipeline stages
- **Performance estimation**: Real-time FPS and latency calculations
- **Cost analysis**: Hardware and operational cost projections
- **Export formats**: Python scripts, JSON configs, YAML, Docker containers
- **Live monitoring**: Real-time metrics and cluster scaling controls

## Code Patterns

### Basic Pipeline Setup

```python
config = StageConfig(
    stage_id="unique_name",
    port_ids=[28, 32],
    scpu_fw_path="fw_scpu.bin",
    ncpu_fw_path="fw_ncpu.bin",
    model_path="model.nef",
    upload_fw=True
)

pipeline = InferencePipeline([config])
pipeline.initialize()
pipeline.start()
pipeline.set_result_callback(callback_func)

# ... processing ...

# Retrieve runtime statistics before shutdown
stats = pipeline.get_pipeline_statistics()
pipeline.stop()
```

### Inter-Stage Processing

```python
# Custom preprocessing for stage input
preprocessor = PreProcessor(resize_fn=custom_resize_func)

# Custom postprocessing for stage output
postprocessor = PostProcessor(process_fn=custom_process_func)

config = StageConfig(
    # ... basic config ...
    input_preprocessor=preprocessor,
    output_postprocessor=postprocessor
)
```

## Performance Considerations

- **Queue Sizing**: Smaller queues = lower latency, larger queues = higher throughput
- **Dongle Distribution**: Spread dongles across stages for optimal parallelization
- **Processing Functions**: Keep preprocessors/postprocessors lightweight
- **Memory Management**: Monitor queue sizes to prevent memory buildup
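### Example: Lightweight Processing Functions

The `custom_resize_func` and `custom_process_func` names in the Inter-Stage Processing example are placeholders. The sketch below shows the kind of lightweight callables they might be; the exact signatures `PreProcessor`/`PostProcessor` expect are an assumption here, so adapt the argument and return types to the actual interface. Keeping these to cheap array operations follows the "keep preprocessors/postprocessors lightweight" guideline.

```python
import numpy as np


def custom_resize_func(frame: np.ndarray) -> np.ndarray:
    """Downsample a frame by striding - a cheap stand-in for a real resize.

    Assumes the next stage expects half-resolution input; swap in a proper
    interpolating resize (e.g. OpenCV) when quality matters.
    """
    return frame[::2, ::2]


def custom_process_func(results: list) -> list:
    """Drop low-confidence detections so downstream stages process fewer
    candidates. Assumes results are dicts with a 'confidence' key."""
    return [r for r in results if r.get("confidence", 0.0) >= 0.5]
```

For example, a 480x640 BGR frame becomes 240x320 after `custom_resize_func`, and `custom_process_func` keeps only detections at or above the 0.5 threshold.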
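### Example: Queue Sizing Trade-Off

The queue-sizing trade-off can be seen with Python's own `queue.Queue`, the standard bounded queue behind the producer-consumer pattern the stages are described as using. This standalone sketch (no cluster4npu APIs involved) shows how a small `maxsize` applies backpressure to a fast producer: memory stays bounded, at the cost of the producer blocking.

```python
import queue
import threading
import time


def producer(q: queue.Queue, n_items: int) -> None:
    # With a small maxsize, put() blocks until the consumer catches up -
    # this backpressure caps memory use but adds latency for the producer.
    for i in range(n_items):
        q.put(i)
    q.put(None)  # sentinel: no more work


def consumer(q: queue.Queue, results: list) -> None:
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(0.001)  # simulate per-item inference work
        results.append(item)


results: list = []
q: queue.Queue = queue.Queue(maxsize=2)  # small queue: low memory, throttled producer
t_prod = threading.Thread(target=producer, args=(q, 10))
t_cons = threading.Thread(target=consumer, args=(q, results))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(results)  # all 10 items arrive, in order
```

Raising `maxsize` lets the producer run ahead (higher throughput under bursty input) at the cost of more buffered frames in memory, which is exactly the `max_queue_size` balance noted under Pipeline Configuration.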