# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview

cluster4npu is a high-performance, multi-stage inference pipeline system for Kneron NPU dongles. The project enables flexible single-stage and cascaded multi-stage AI inference workflows, optimized for real-time video processing and high-throughput scenarios.
Core Architecture
- InferencePipeline: Main orchestrator managing multi-stage workflows with automatic queue management and thread coordination
- MultiDongle: Hardware abstraction layer for Kneron NPU devices (KL520, KL720, etc.)
- StageConfig: Configuration system for individual pipeline stages
- PipelineData: Data structure that flows through pipeline stages, accumulating results
- PreProcessor/PostProcessor: Flexible data transformation components for inter-stage processing
## Key Design Patterns

- **Producer-Consumer**: Each stage runs in separate threads with input/output queues
- **Pipeline Architecture**: Linear data flow through configurable stages with result accumulation
- **Hardware Abstraction**: MultiDongle encapsulates Kneron SDK complexity
- **Callback-Based**: Asynchronous result handling via configurable callbacks
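The producer-consumer pattern can be sketched with the standard library alone. This is an illustrative stand-alone sketch, not the project's actual implementation; names like `stage_worker` are invented for the example:

```python
import queue
import threading

def stage_worker(in_q: queue.Queue, out_q: queue.Queue, process) -> None:
    """Consume items from in_q, process them, and produce to out_q.

    A None item is the shutdown sentinel; it is forwarded so
    downstream stages stop as well.
    """
    while True:
        item = in_q.get()
        if item is None:          # shutdown sentinel
            out_q.put(None)
            break
        out_q.put(process(item))

# Wire two stages together: q0 -> stage1 -> q1 -> stage2 -> q2
q0, q1, q2 = queue.Queue(maxsize=4), queue.Queue(maxsize=4), queue.Queue(maxsize=4)
t1 = threading.Thread(target=stage_worker, args=(q0, q1, lambda x: x * 2))
t2 = threading.Thread(target=stage_worker, args=(q1, q2, lambda x: x + 1))
t1.start(); t2.start()

for frame in range(3):
    q0.put(frame)
q0.put(None)                      # signal end of input
t1.join(); t2.join()

results = []
while True:
    item = q2.get()
    if item is None:
        break
    results.append(item)
# results == [1, 3, 5]
```

Each stage only ever touches its own input and output queues, which is why the real pipeline can coordinate stages without shared state.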
## Development Commands

### Environment Setup

```bash
# Set up a virtual environment with uv
uv venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
uv pip install -r requirements.txt
```
### Running Examples

```bash
# Single-stage pipeline
uv run python src/cluster4npu/test.py --example single

# Two-stage cascade pipeline
uv run python src/cluster4npu/test.py --example cascade

# Complex multi-stage pipeline
uv run python src/cluster4npu/test.py --example complex

# Basic MultiDongle usage
uv run python src/cluster4npu/Multidongle.py

# Complete UI application with full workflow
uv run python UI.py

# UI integration examples
uv run python ui_integration_example.py

# Test the UI configuration system
uv run python ui_config.py
```
UI Application Workflow
The UI.py provides a complete visual workflow:
- Dashboard/Home - Main entry point with recent files
- Pipeline Editor - Visual node-based pipeline design
- Stage Configuration - Dongle allocation and hardware setup
- Performance Estimation - FPS calculations and optimization
- Save & Deploy - Export configurations and cost estimation
- Monitoring & Management - Real-time pipeline monitoring
# Access different workflow stages directly:
# 1. Create new pipeline → Pipeline Editor
# 2. Configure Stages & Deploy → Stage Configuration
# 3. Pipeline menu → Performance Analysis → Performance Panel
# 4. Pipeline menu → Deploy Pipeline → Save & Deploy Dialog
## Testing

```bash
# Run pipeline tests
uv run python test_pipeline.py

# Test MultiDongle functionality
uv run python src/cluster4npu/test.py
```
## Hardware Requirements

- **Kneron NPU dongles**: KL520, KL720, etc.
- **Firmware files**: `fw_scpu.bin`, `fw_ncpu.bin`
- **Models**: `.nef` format files
- **USB ports**: Multiple ports required for multi-dongle setups
## Critical Implementation Notes

### Pipeline Configuration

- Each stage requires a unique `stage_id` and dedicated `port_ids`
- Queue sizes (`max_queue_size`) must balance memory usage against throughput
- Stages process sequentially: the output of stage N becomes the input to stage N+1
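These constraints are easy to check up front. A minimal validation sketch, assuming stages are represented as plain dicts for illustration (`validate_stages` is a hypothetical helper, not part of the library):

```python
def validate_stages(stages):
    """Check that every stage has a unique stage_id and that no two
    stages claim the same NPU port."""
    errors = []
    seen_ids, seen_ports = set(), set()
    for s in stages:
        if s["stage_id"] in seen_ids:
            errors.append(f"duplicate stage_id: {s['stage_id']}")
        seen_ids.add(s["stage_id"])
        overlap = seen_ports & set(s["port_ids"])
        if overlap:
            errors.append(f"port(s) {sorted(overlap)} already assigned")
        seen_ports.update(s["port_ids"])
    return errors

ok = [{"stage_id": "detect", "port_ids": [28, 32]},
      {"stage_id": "classify", "port_ids": [36]}]
bad = [{"stage_id": "detect", "port_ids": [28]},
       {"stage_id": "detect", "port_ids": [28]}]
# validate_stages(ok) == []; validate_stages(bad) reports two errors
```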
### Thread Safety

- All pipeline operations are thread-safe
- Each stage runs in isolated worker threads
- Use callbacks for result handling, not direct queue access
### Data Flow

```
Input → Stage1 → Stage2 → ... → StageN → Output
          ↓         ↓             ↓         ↓
        Queue    Process       Process    Result
                + Results     + Results  Callback
```
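The result accumulation along this flow can be pictured with a tiny stand-in for `PipelineData`. The field names below are assumptions for illustration, not the real class definition:

```python
from dataclasses import dataclass, field

@dataclass
class PipelineData:
    payload: object                              # current stage input (e.g. a frame)
    results: dict = field(default_factory=dict)  # accumulated per-stage outputs

def run_stage(data: PipelineData, stage_id: str, infer) -> PipelineData:
    """Run one stage: infer on the payload and record the output under
    the stage's id, so later stages can see all earlier results."""
    data.results[stage_id] = infer(data.payload)
    return data

data = PipelineData(payload="frame_0")
data = run_stage(data, "stage1", lambda p: f"boxes({p})")
data = run_stage(data, "stage2", lambda p: f"labels({p})")
# data.results == {"stage1": "boxes(frame_0)", "stage2": "labels(frame_0)"}
```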
### Hardware Management

- Always call `initialize()` before `start()`
- Always call `stop()` for a clean shutdown
- Firmware upload (`upload_fw=True`) is only needed once per session
- Port IDs must match actual USB connections
### Error Handling

- The pipeline continues on individual stage errors
- Failed stages return error results rather than blocking
- Comprehensive statistics are available via `get_pipeline_statistics()`
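Because failed stages emit error results instead of blocking, a result callback can route them explicitly. A sketch, assuming results arrive as dicts with an `error` key (the real result schema may differ):

```python
successes, failures = [], []

def on_result(result: dict) -> None:
    """Route each pipeline result: collect errors separately so one
    failed frame never stalls the rest of the stream."""
    if result.get("error"):
        failures.append(result)
    else:
        successes.append(result)

# Simulated stream: one stage failure among normal results
for r in [{"frame": 0, "boxes": 3},
          {"frame": 1, "error": "stage 'classify' timed out"},
          {"frame": 2, "boxes": 1}]:
    on_result(r)
# successes holds 2 results, failures holds 1
```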
## UI Application Architecture

### Complete Workflow Components

- **DashboardLogin**: Main entry point with project management
- **PipelineEditor**: Node-based visual pipeline design using NodeGraphQt
- **StageConfigurationDialog**: Hardware allocation and dongle assignment
- **PerformanceEstimationPanel**: Real-time performance analysis and optimization
- **SaveDeployDialog**: Export configurations and deployment cost estimation
- **MonitoringDashboard**: Live pipeline monitoring and cluster management
### UI Integration System

- `ui_config.py`: Configuration management and UI/core integration
- `ui_integration_example.py`: Demonstrates conversion from UI configurations to core tools
- `UIIntegration` class: Bridges UI configurations to InferencePipeline
### Key UI Features

- **Auto-dongle allocation**: Smart assignment of dongles to pipeline stages
- **Performance estimation**: Real-time FPS and latency calculations
- **Cost analysis**: Hardware and operational cost projections
- **Export formats**: Python scripts, JSON configs, YAML, Docker containers
- **Live monitoring**: Real-time metrics and cluster scaling controls
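A back-of-the-envelope model for the performance estimation: pipeline throughput is capped by the slowest stage, and end-to-end latency is roughly the sum of per-stage latencies. This formula is an assumption for illustration, not the estimation panel's actual algorithm:

```python
def estimate_pipeline(stages):
    """Rough estimate for a linear pipeline.

    Each stage is (num_dongles, fps_per_dongle). A stage's throughput
    scales with its dongle count under the simplifying assumption of
    perfect parallelization; the slowest stage bounds the whole pipeline.
    """
    stage_fps = [n * fps for n, fps in stages]
    pipeline_fps = min(stage_fps)
    latency_ms = sum(1000.0 / fps for fps in stage_fps)
    return pipeline_fps, latency_ms

# Two-stage example: 2 dongles at 30 FPS each, then 1 dongle at 45 FPS
fps, latency = estimate_pipeline([(2, 30.0), (1, 45.0)])
# fps == 45.0 (second stage is the bottleneck)
```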
## Code Patterns

### Basic Pipeline Setup

```python
config = StageConfig(
    stage_id="unique_name",
    port_ids=[28, 32],
    scpu_fw_path="fw_scpu.bin",
    ncpu_fw_path="fw_ncpu.bin",
    model_path="model.nef",
    upload_fw=True
)

pipeline = InferencePipeline([config])
pipeline.initialize()
pipeline.start()
pipeline.set_result_callback(callback_func)
# ... processing ...
pipeline.stop()
```
### Inter-Stage Processing

```python
# Custom preprocessing for stage input
preprocessor = PreProcessor(resize_fn=custom_resize_func)

# Custom postprocessing for stage output
postprocessor = PostProcessor(process_fn=custom_process_func)

config = StageConfig(
    # ... basic config ...
    input_preprocessor=preprocessor,
    output_postprocessor=postprocessor
)
```
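A plausible `custom_process_func` filters a stage's output before it reaches the next stage, keeping the hand-off lightweight. The detection format here (dicts with `box` and `score`) is an assumption for the sketch:

```python
def custom_process_func(detections, min_score=0.5):
    """Drop low-confidence detections so the next stage only runs
    inference on crops worth processing."""
    return [d for d in detections if d["score"] >= min_score]

raw = [{"box": (0, 0, 10, 10), "score": 0.9},
       {"box": (5, 5, 8, 8),   "score": 0.2}]
kept = custom_process_func(raw)
# kept contains only the 0.9-confidence detection
```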
## Performance Considerations

- **Queue sizing**: Smaller queues mean lower latency; larger queues mean higher throughput
- **Dongle distribution**: Spread dongles across stages for optimal parallelization
- **Processing functions**: Keep preprocessors/postprocessors lightweight
- **Memory management**: Monitor queue sizes to prevent memory buildup
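The queue-sizing trade-off can be observed directly with the standard library: a full bounded queue blocks the producer, which caps memory use but adds latency upstream. A stdlib-only illustration of that back-pressure:

```python
import queue

small = queue.Queue(maxsize=2)   # low latency, applies back-pressure early
small.put("frame_0")
small.put("frame_1")

# A third non-blocking put fails: the producer must wait (back-pressure)
try:
    small.put_nowait("frame_2")
    overflowed = False
except queue.Full:
    overflowed = True

# Draining one item frees capacity again
small.get()
small.put_nowait("frame_2")      # now succeeds
```

A larger `maxsize` would have absorbed the burst instead, at the cost of more frames held in memory and higher end-to-end latency.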