# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**cluster4npu** is a high-performance multi-stage inference pipeline system for Kneron NPU dongles. The project enables flexible single-stage and cascaded multi-stage AI inference workflows optimized for real-time video processing and high-throughput scenarios.

### Core Architecture

- **InferencePipeline**: Main orchestrator managing multi-stage workflows with automatic queue management and thread coordination
- **MultiDongle**: Hardware abstraction layer for Kneron NPU devices (KL520, KL720, etc.)
- **StageConfig**: Configuration system for individual pipeline stages
- **PipelineData**: Data structure that flows through pipeline stages, accumulating results
- **PreProcessor/PostProcessor**: Flexible data transformation components for inter-stage processing

### Key Design Patterns

- **Producer-Consumer**: Each stage runs in separate threads with input/output queues
- **Pipeline Architecture**: Linear data flow through configurable stages with result accumulation
- **Hardware Abstraction**: MultiDongle encapsulates Kneron SDK complexity
- **Callback-Based**: Asynchronous result handling via configurable callbacks

## Development Commands

### Environment Setup

```bash
# Set up a virtual environment with uv
uv venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
uv pip install -r requirements.txt
```

### Running Examples

```bash
# Single-stage pipeline
uv run python src/cluster4npu/test.py --example single

# Two-stage cascade pipeline
uv run python src/cluster4npu/test.py --example cascade

# Complex multi-stage pipeline
uv run python src/cluster4npu/test.py --example complex

# Basic MultiDongle usage
uv run python src/cluster4npu/Multidongle.py

# Complete UI application with full workflow
uv run python UI.py

# UI integration examples
uv run python ui_integration_example.py

# Test UI configuration system
uv run python ui_config.py
```

### UI Application Workflow

`UI.py` provides a complete visual workflow:

1. **Dashboard/Home** - Main entry point with recent files
2. **Pipeline Editor** - Visual node-based pipeline design
3. **Stage Configuration** - Dongle allocation and hardware setup
4. **Performance Estimation** - FPS calculations and optimization
5. **Save & Deploy** - Export configurations and cost estimation
6. **Monitoring & Management** - Real-time pipeline monitoring

```bash
# Access different workflow stages directly:
# 1. Create new pipeline → Pipeline Editor
# 2. Configure Stages & Deploy → Stage Configuration
# 3. Pipeline menu → Performance Analysis → Performance Panel
# 4. Pipeline menu → Deploy Pipeline → Save & Deploy Dialog
```

### Testing

```bash
# Run pipeline tests
uv run python test_pipeline.py

# Test MultiDongle functionality
uv run python src/cluster4npu/test.py
```

## Hardware Requirements

- **Kneron NPU dongles**: KL520, KL720, etc.
- **Firmware files**: `fw_scpu.bin`, `fw_ncpu.bin`
- **Models**: `.nef` format files
- **USB ports**: Multiple ports required for multi-dongle setups

## Critical Implementation Notes

### Pipeline Configuration

- Each stage requires a unique `stage_id` and dedicated `port_ids`
- Queue sizes (`max_queue_size`) must balance memory usage against throughput
- Stages process sequentially - output from stage N becomes input to stage N+1

### Thread Safety

- All pipeline operations are thread-safe
- Each stage runs in isolated worker threads
- Use callbacks for result handling, not direct queue access

### Data Flow

```
Input → Stage1 → Stage2 → ... → StageN → Output
           ↓        ↓               ↓        ↓
         Queue   Process         Process   Result
                + Results       + Results  Callback
```

### Hardware Management

- Always call `initialize()` before `start()`
- Always call `stop()` for a clean shutdown
- Firmware upload (`upload_fw=True`) is only needed once per session
- Port IDs must match actual USB connections

### Error Handling

- The pipeline continues on individual stage errors
- Failed stages return error results rather than blocking
- Comprehensive statistics are available via `get_pipeline_statistics()`

## UI Application Architecture

### Complete Workflow Components

- **DashboardLogin**: Main entry point with project management
- **PipelineEditor**: Node-based visual pipeline design using NodeGraphQt
- **StageConfigurationDialog**: Hardware allocation and dongle assignment
- **PerformanceEstimationPanel**: Real-time performance analysis and optimization
- **SaveDeployDialog**: Export configurations and deployment cost estimation
- **MonitoringDashboard**: Live pipeline monitoring and cluster management

### UI Integration System

- **ui_config.py**: Configuration management and UI/core integration
- **ui_integration_example.py**: Demonstrates conversion from UI configurations to core tools
- **UIIntegration class**: Bridges UI configurations to InferencePipeline

### Key UI Features

- **Auto-dongle allocation**: Smart assignment of dongles to pipeline stages
- **Performance estimation**: Real-time FPS and latency calculations
- **Cost analysis**: Hardware and operational cost projections
- **Export formats**: Python scripts, JSON configs, YAML, Docker containers
- **Live monitoring**: Real-time metrics and cluster scaling controls

## Code Patterns

### Basic Pipeline Setup

```python
config = StageConfig(
    stage_id="unique_name",
    port_ids=[28, 32],
    scpu_fw_path="fw_scpu.bin",
    ncpu_fw_path="fw_ncpu.bin",
    model_path="model.nef",
    upload_fw=True
)

pipeline = InferencePipeline([config])
pipeline.initialize()
pipeline.start()
pipeline.set_result_callback(callback_func)

# ... processing ...

# Retrieve runtime statistics before shutdown
stats = pipeline.get_pipeline_statistics()
pipeline.stop()
```

### Inter-Stage Processing

```python
# Custom preprocessing for stage input
preprocessor = PreProcessor(resize_fn=custom_resize_func)

# Custom postprocessing for stage output
postprocessor = PostProcessor(process_fn=custom_process_func)

config = StageConfig(
    # ... basic config ...
    input_preprocessor=preprocessor,
    output_postprocessor=postprocessor
)
```

## Performance Considerations

- **Queue Sizing**: Smaller queues = lower latency, larger queues = higher throughput
- **Dongle Distribution**: Spread dongles across stages for optimal parallelization
- **Processing Functions**: Keep preprocessors/postprocessors lightweight
- **Memory Management**: Monitor queue sizes to prevent memory buildup
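### Example: Lightweight Processing Functions

The `custom_resize_func` and `custom_process_func` names in the Inter-Stage Processing example are placeholders. The sketch below shows the kind of lightweight callables they might be; the exact signatures `PreProcessor`/`PostProcessor` expect are an assumption here, so adapt the argument and return types to the actual interface. Keeping these to cheap array operations follows the "keep preprocessors/postprocessors lightweight" guideline.

```python
import numpy as np


def custom_resize_func(frame: np.ndarray) -> np.ndarray:
    """Downsample a frame by striding - a cheap stand-in for a real resize.

    Assumes the next stage expects half-resolution input; swap in a proper
    interpolating resize (e.g. OpenCV) when quality matters.
    """
    return frame[::2, ::2]


def custom_process_func(results: list) -> list:
    """Drop low-confidence detections so downstream stages process fewer
    candidates. Assumes results are dicts with a 'confidence' key."""
    return [r for r in results if r.get("confidence", 0.0) >= 0.5]
```

For example, a 480x640 BGR frame becomes 240x320 after `custom_resize_func`, and `custom_process_func` keeps only detections at or above the 0.5 threshold.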
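### Example: Queue Sizing Trade-Off

The queue-sizing trade-off can be seen with Python's own `queue.Queue`, the standard bounded queue behind the producer-consumer pattern the stages are described as using. This standalone sketch (no cluster4npu APIs involved) shows how a small `maxsize` applies backpressure to a fast producer: memory stays bounded, at the cost of the producer blocking.

```python
import queue
import threading
import time


def producer(q: queue.Queue, n_items: int) -> None:
    # With a small maxsize, put() blocks until the consumer catches up -
    # this backpressure caps memory use but adds latency for the producer.
    for i in range(n_items):
        q.put(i)
    q.put(None)  # sentinel: no more work


def consumer(q: queue.Queue, results: list) -> None:
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(0.001)  # simulate per-item inference work
        results.append(item)


results: list = []
q: queue.Queue = queue.Queue(maxsize=2)  # small queue: low memory, throttled producer
t_prod = threading.Thread(target=producer, args=(q, 10))
t_cons = threading.Thread(target=consumer, args=(q, results))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(results)  # all 10 items arrive, in order
```

Raising `maxsize` lets the producer run ahead (higher throughput under bursty input) at the cost of more buffered frames in memory, which is exactly the `max_queue_size` balance noted under Pipeline Configuration.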