CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

cluster4npu is a high-performance multi-stage inference pipeline system for Kneron NPU dongles. The project enables flexible single-stage and cascaded multi-stage AI inference workflows optimized for real-time video processing and high-throughput scenarios.

Core Architecture

  • InferencePipeline: Main orchestrator managing multi-stage workflows with automatic queue management and thread coordination
  • MultiDongle: Hardware abstraction layer for Kneron NPU devices (KL520, KL720, etc.)
  • StageConfig: Configuration system for individual pipeline stages
  • PipelineData: Data structure that flows through pipeline stages, accumulating results
  • PreProcessor/PostProcessor: Flexible data transformation components for inter-stage processing
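The role of PipelineData — a payload that flows through the stages while per-stage results accumulate — can be sketched with a minimal stdlib analogue (class and field names here are illustrative assumptions, not the cluster4npu API):

```python
from dataclasses import dataclass, field

@dataclass
class FrameData:
    """Illustrative analogue of PipelineData: the payload travels through
    every stage, and each stage appends its result under its stage_id."""
    payload: bytes
    stage_results: dict = field(default_factory=dict)

    def add_result(self, stage_id: str, result) -> None:
        # Later stages can read earlier results, e.g. detection boxes.
        self.stage_results[stage_id] = result

d = FrameData(payload=b"frame")
d.add_result("detect", [("person", 0.9)])
d.add_result("classify", "adult")
print(sorted(d.stage_results))  # ['classify', 'detect']
```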

Key Design Patterns

  • Producer-Consumer: Each stage runs in separate threads with input/output queues
  • Pipeline Architecture: Linear data flow through configurable stages with result accumulation
  • Hardware Abstraction: MultiDongle encapsulates Kneron SDK complexity
  • Callback-Based: Asynchronous result handling via configurable callbacks
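The producer-consumer pattern above can be sketched with the stdlib alone — each stage is a worker thread joined to its neighbors by bounded queues (a simplified model, not the actual InferencePipeline internals):

```python
import queue
import threading

def run_stage(stage_fn, in_q, out_q):
    """Worker loop for one stage: pull, process, push.
    A None item is used as a shutdown sentinel and is propagated downstream."""
    while True:
        item = in_q.get()
        if item is None:
            out_q.put(None)
            break
        out_q.put(stage_fn(item))

# Two cascaded stages, each in its own thread, linked by bounded queues.
q_in, q_mid, q_out = (queue.Queue(maxsize=8) for _ in range(3))
threads = [
    threading.Thread(target=run_stage, args=(lambda x: x * 2, q_in, q_mid)),
    threading.Thread(target=run_stage, args=(lambda x: x + 1, q_mid, q_out)),
]
for t in threads:
    t.start()
for frame in [1, 2, 3]:
    q_in.put(frame)
q_in.put(None)  # signal shutdown

results = []
while (r := q_out.get()) is not None:
    results.append(r)
for t in threads:
    t.join()
print(results)  # [3, 5, 7]
```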

Development Commands

Environment Setup

# Setup virtual environment with uv
uv venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
uv pip install -r requirements.txt

Running Examples

# Single-stage pipeline
uv run python src/cluster4npu/test.py --example single

# Two-stage cascade pipeline  
uv run python src/cluster4npu/test.py --example cascade

# Complex multi-stage pipeline
uv run python src/cluster4npu/test.py --example complex

# Basic MultiDongle usage
uv run python src/cluster4npu/Multidongle.py

# Complete UI application with full workflow
uv run python UI.py

# UI integration examples
uv run python ui_integration_example.py

# Test UI configuration system
uv run python ui_config.py

UI Application Workflow

UI.py provides a complete visual workflow:

  1. Dashboard/Home - Main entry point with recent files
  2. Pipeline Editor - Visual node-based pipeline design
  3. Stage Configuration - Dongle allocation and hardware setup
  4. Performance Estimation - FPS calculations and optimization
  5. Save & Deploy - Export configurations and cost estimation
  6. Monitoring & Management - Real-time pipeline monitoring

# Access different workflow stages directly:
# 1. Create new pipeline → Pipeline Editor
# 2. Configure Stages & Deploy → Stage Configuration
# 3. Pipeline menu → Performance Analysis → Performance Panel
# 4. Pipeline menu → Deploy Pipeline → Save & Deploy Dialog

Testing

# Run pipeline tests
uv run python test_pipeline.py

# Test MultiDongle functionality
uv run python src/cluster4npu/test.py

Hardware Requirements

  • Kneron NPU dongles: KL520, KL720, etc.
  • Firmware files: fw_scpu.bin, fw_ncpu.bin
  • Models: .nef format files
  • USB ports: Multiple ports required for multi-dongle setups

Critical Implementation Notes

Pipeline Configuration

  • Each stage requires unique stage_id and dedicated port_ids
  • Queue sizes (max_queue_size) must be balanced between memory usage and throughput
  • Stages process sequentially - output from stage N becomes input to stage N+1
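These rules can be sketched as a two-stage cascade (a non-runnable fragment reusing the StageConfig fields from Code Patterns below; model file names and port numbers are placeholders):

```python
detect = StageConfig(
    stage_id="detect",            # unique per stage
    port_ids=[28],
    scpu_fw_path="fw_scpu.bin",
    ncpu_fw_path="fw_ncpu.bin",
    model_path="detector.nef",
    upload_fw=True,
)
classify = StageConfig(
    stage_id="classify",
    port_ids=[32],                # must not overlap another stage's ports
    scpu_fw_path="fw_scpu.bin",
    ncpu_fw_path="fw_ncpu.bin",
    model_path="classifier.nef",
    upload_fw=True,
)

# List order defines the flow: detect's output becomes classify's input.
pipeline = InferencePipeline([detect, classify])
```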

Thread Safety

  • All pipeline operations are thread-safe
  • Each stage runs in isolated worker threads
  • Use callbacks for result handling, not direct queue access
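Why callbacks instead of direct queue access can be illustrated with a minimal stdlib sketch (ResultDispatcher is a hypothetical stand-in, not the cluster4npu API): a worker thread drains an internal queue and invokes the registered callback, so callers never touch the queue themselves.

```python
import queue
import threading

class ResultDispatcher:
    """Minimal sketch of callback-based result delivery: the internal queue
    stays private; consumers only register a callback."""
    def __init__(self):
        self._q = queue.Queue()
        self._cb = lambda result: None
        self._worker = threading.Thread(target=self._drain, daemon=True)

    def set_result_callback(self, cb):
        self._cb = cb

    def start(self):
        self._worker.start()

    def _drain(self):
        # None is the shutdown sentinel.
        while (item := self._q.get()) is not None:
            self._cb(item)

    def push(self, result):
        self._q.put(result)

    def stop(self):
        self._q.put(None)
        self._worker.join()

received = []
d = ResultDispatcher()
d.set_result_callback(received.append)
d.start()
for r in ("det_a", "det_b"):
    d.push(r)
d.stop()
print(received)  # ['det_a', 'det_b']
```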

Data Flow

Input → Stage1 → Stage2 → ... → StageN → Output
  ↓        ↓        ↓               ↓        ↓
Queue   Process  Process   ...  Process   Result
        +Results +Results       +Results  Callback

Hardware Management

  • Always call initialize() before start()
  • Always call stop() for clean shutdown
  • Firmware upload (upload_fw=True) only needed once per session
  • Port IDs must match actual USB connections

Error Handling

  • Pipeline continues on individual stage errors
  • Failed stages return error results rather than blocking
  • Comprehensive statistics available via get_pipeline_statistics()
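The "failed stages return error results" behavior can be sketched with a generic wrapper (a simplified model of per-stage error isolation, not the library's internals):

```python
def safe_stage(stage_fn, item):
    """Sketch of per-stage error isolation: a failing stage yields an
    error result instead of raising, so the pipeline keeps flowing."""
    try:
        return {"ok": True, "result": stage_fn(item)}
    except Exception as exc:
        return {"ok": False, "error": repr(exc)}

# The division by zero becomes an error result; later items still process.
results = [safe_stage(lambda x: 10 // x, v) for v in (5, 0, 2)]
print([r["ok"] for r in results])  # [True, False, True]
```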

UI Application Architecture

Complete Workflow Components

  • DashboardLogin: Main entry point with project management
  • PipelineEditor: Node-based visual pipeline design using NodeGraphQt
  • StageConfigurationDialog: Hardware allocation and dongle assignment
  • PerformanceEstimationPanel: Real-time performance analysis and optimization
  • SaveDeployDialog: Export configurations and deployment cost estimation
  • MonitoringDashboard: Live pipeline monitoring and cluster management

UI Integration System

  • ui_config.py: Configuration management and UI/core integration
  • ui_integration_example.py: Demonstrates conversion from UI to core tools
  • UIIntegration class: Bridges UI configurations to InferencePipeline

Key UI Features

  • Auto-dongle allocation: Smart assignment of dongles to pipeline stages
  • Performance estimation: Real-time FPS and latency calculations
  • Cost analysis: Hardware and operational cost projections
  • Export formats: Python scripts, JSON configs, YAML, Docker containers
  • Live monitoring: Real-time metrics and cluster scaling controls

Code Patterns

Basic Pipeline Setup

config = StageConfig(
    stage_id="unique_name",
    port_ids=[28, 32],
    scpu_fw_path="fw_scpu.bin", 
    ncpu_fw_path="fw_ncpu.bin",
    model_path="model.nef",
    upload_fw=True
)

pipeline = InferencePipeline([config])
pipeline.initialize()
pipeline.set_result_callback(callback_func)  # register before start so no results are missed
pipeline.start()
# ... processing ...
pipeline.stop()

Inter-Stage Processing

# Custom preprocessing for stage input
preprocessor = PreProcessor(resize_fn=custom_resize_func)

# Custom postprocessing for stage output  
postprocessor = PostProcessor(process_fn=custom_process_func)

config = StageConfig(
    # ... basic config ...
    input_preprocessor=preprocessor,
    output_postprocessor=postprocessor
)

Performance Considerations

  • Queue Sizing: Smaller queues = lower latency, larger queues = higher throughput
  • Dongle Distribution: Spread dongles across stages for optimal parallelization
  • Processing Functions: Keep preprocessors/postprocessors lightweight
  • Memory Management: Monitor queue sizes to prevent memory buildup
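The queue-sizing tradeoff can be demonstrated with a bounded stdlib queue: once full, producers block (or time out) instead of growing memory without limit, which is the backpressure that keeps memory bounded at the cost of throughput headroom.

```python
import queue

# Small maxsize → low latency and bounded memory; larger maxsize → better
# throughput under bursty load, at the cost of memory and added latency.
q = queue.Queue(maxsize=2)
q.put("frame1")
q.put("frame2")
try:
    q.put("frame3", timeout=0.05)  # would exceed the bound
    overflowed = False
except queue.Full:
    overflowed = True
print(overflowed)  # True
```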