# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**cluster4npu** is a high-performance multi-stage inference pipeline system for Kneron NPU dongles. The project enables flexible single-stage and cascaded multi-stage AI inference workflows optimized for real-time video processing and high-throughput scenarios.

### Core Architecture

- **InferencePipeline**: Main orchestrator managing multi-stage workflows with automatic queue management and thread coordination
- **MultiDongle**: Hardware abstraction layer for Kneron NPU devices (KL520, KL720, etc.)
- **StageConfig**: Configuration system for individual pipeline stages
- **PipelineData**: Data structure that flows through pipeline stages, accumulating results
- **PreProcessor/PostProcessor**: Flexible data transformation components for inter-stage processing

### Key Design Patterns

- **Producer-Consumer**: Each stage runs in separate threads with input/output queues
- **Pipeline Architecture**: Linear data flow through configurable stages with result accumulation
- **Hardware Abstraction**: MultiDongle encapsulates Kneron SDK complexity
- **Callback-Based**: Asynchronous result handling via configurable callbacks

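The sketch below illustrates the producer-consumer and result-accumulation patterns using plain `queue.Queue` and `threading`. It is a conceptual illustration only, not the project's actual internals; the stage names, queue sizes, and lambda stage functions are made up.

```python
import queue
import threading

def stage_worker(stage_id, input_q, output_q, process_fn):
    """Consume items from input_q, run this stage's processing, and pass the
    frame plus accumulated results downstream (None is a shutdown sentinel)."""
    while True:
        item = input_q.get()
        if item is None:
            output_q.put(None)
            break
        frame, results = item
        results[stage_id] = process_fn(frame)   # accumulate per-stage results
        output_q.put((frame, results))

# Two stages chained by bounded queues (bounded = backpressure between stages).
q_in, q_mid, q_out = (queue.Queue(maxsize=10) for _ in range(3))
threading.Thread(target=stage_worker, args=("detect", q_in, q_mid, lambda f: f"boxes({f})"), daemon=True).start()
threading.Thread(target=stage_worker, args=("classify", q_mid, q_out, lambda f: f"labels({f})"), daemon=True).start()

q_in.put(("frame_0", {}))
q_in.put(None)          # shut down after one frame
print(q_out.get())      # ('frame_0', {'detect': 'boxes(frame_0)', 'classify': 'labels(frame_0)'})
```
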
## Development Commands

### Environment Setup

```bash
# Setup virtual environment with uv
uv venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
uv pip install -r requirements.txt
```

### Running Examples

```bash
# Single-stage pipeline
uv run python src/cluster4npu/test.py --example single

# Two-stage cascade pipeline
uv run python src/cluster4npu/test.py --example cascade

# Complex multi-stage pipeline
uv run python src/cluster4npu/test.py --example complex

# Basic MultiDongle usage
uv run python src/cluster4npu/Multidongle.py

# Complete UI application with full workflow
uv run python UI.py

# UI integration examples
uv run python ui_integration_example.py

# Test UI configuration system
uv run python ui_config.py
```

### UI Application Workflow

UI.py provides a complete visual workflow:

1. **Dashboard/Home** - Main entry point with recent files
2. **Pipeline Editor** - Visual node-based pipeline design
3. **Stage Configuration** - Dongle allocation and hardware setup
4. **Performance Estimation** - FPS calculations and optimization
5. **Save & Deploy** - Export configurations and cost estimation
6. **Monitoring & Management** - Real-time pipeline monitoring

```bash
# Access different workflow stages directly:
# 1. Create new pipeline → Pipeline Editor
# 2. Configure Stages & Deploy → Stage Configuration
# 3. Pipeline menu → Performance Analysis → Performance Panel
# 4. Pipeline menu → Deploy Pipeline → Save & Deploy Dialog
```

### Testing

```bash
# Run pipeline tests
uv run python test_pipeline.py

# Test MultiDongle functionality
uv run python src/cluster4npu/test.py
```

## Hardware Requirements

- **Kneron NPU dongles**: KL520, KL720, etc.
- **Firmware files**: `fw_scpu.bin`, `fw_ncpu.bin`
- **Models**: `.nef` format files
- **USB ports**: Multiple ports required for multi-dongle setups

## Critical Implementation Notes

### Pipeline Configuration

- Each stage requires a unique `stage_id` and dedicated `port_ids`
- Queue sizes (`max_queue_size`) must be balanced between memory usage and throughput
- Stages process sequentially - output from stage N becomes input to stage N+1

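A hedged two-stage configuration sketch showing unique `stage_id`s and non-overlapping `port_ids`. The import path, file names, port numbers, and the assumption that `max_queue_size` is a `StageConfig` parameter are all illustrative.

```python
# Import path, model/firmware paths, port numbers, and the max_queue_size
# parameter are assumptions for illustration.
from cluster4npu import InferencePipeline, StageConfig  # adjust to the actual module path

detect_cfg = StageConfig(
    stage_id="detect",                 # unique per stage
    port_ids=[28, 32],                 # dongles dedicated to this stage
    scpu_fw_path="fw_scpu.bin",
    ncpu_fw_path="fw_ncpu.bin",
    model_path="detector.nef",
    upload_fw=True,
    max_queue_size=8,                  # smaller queue -> lower latency
)

classify_cfg = StageConfig(
    stage_id="classify",               # different stage_id
    port_ids=[33],                     # must not overlap with stage 1 ports
    scpu_fw_path="fw_scpu.bin",
    ncpu_fw_path="fw_ncpu.bin",
    model_path="classifier.nef",
    upload_fw=True,
    max_queue_size=16,                 # larger queue -> higher throughput
)

# Output of "detect" becomes the input of "classify" (list order = stage order).
pipeline = InferencePipeline([detect_cfg, classify_cfg])
```
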
### Thread Safety

- All pipeline operations are thread-safe
- Each stage runs in isolated worker threads
- Use callbacks for result handling, not direct queue access

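Since results should be consumed through callbacks rather than by touching stage queues directly, here is a minimal sketch of a thread-safe handler; the payload type passed to the callback is an assumption.

```python
import threading

results_lock = threading.Lock()
collected = []

def on_result(result):
    # Called from a pipeline worker thread, so guard shared state.
    with results_lock:
        collected.append(result)

pipeline.set_result_callback(on_result)   # pipeline created as in "Code Patterns" below
```
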
### Data Flow

```
Input → Stage1 →  Stage2 →  ... →  StageN → Output
  ↓        ↓         ↓                 ↓
Queue   Process   Process           Result
       + Results + Results         Callback
```

### Hardware Management

- Always call `initialize()` before `start()`
- Always call `stop()` for clean shutdown
- Firmware upload (`upload_fw=True`) only needed once per session
- Port IDs must match actual USB connections

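A minimal lifecycle sketch assembled from the calls above; wrapping processing in `try/finally` is a suggestion for guaranteeing `stop()`, not a requirement of the API.

```python
pipeline = InferencePipeline([config])   # config as in "Code Patterns" below
pipeline.initialize()                    # must come before start()
pipeline.start()
try:
    pass                                 # feed frames / wait for result callbacks
finally:
    pipeline.stop()                      # always release dongles cleanly
```
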
### Error Handling

- Pipeline continues on individual stage errors
- Failed stages return error results rather than blocking
- Comprehensive statistics available via `get_pipeline_statistics()`

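For example (the statistics call is documented above, but the shape of the returned data, assumed here to be a dict, is not specified):

```python
stats = pipeline.get_pipeline_statistics()
for name, value in stats.items():        # assumes a dict-like return value
    print(f"{name}: {value}")
```
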
## UI Application Architecture

### Complete Workflow Components

- **DashboardLogin**: Main entry point with project management
- **PipelineEditor**: Node-based visual pipeline design using NodeGraphQt
- **StageConfigurationDialog**: Hardware allocation and dongle assignment
- **PerformanceEstimationPanel**: Real-time performance analysis and optimization
- **SaveDeployDialog**: Export configurations and deployment cost estimation
- **MonitoringDashboard**: Live pipeline monitoring and cluster management

### UI Integration System

- **ui_config.py**: Configuration management and UI/core integration
- **ui_integration_example.py**: Demonstrates conversion from UI to core tools
- **UIIntegration class**: Bridges UI configurations to InferencePipeline

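A hedged sketch of the bridging flow; the helper and method names below are hypothetical placeholders, not the actual API — see `ui_integration_example.py` for the real usage.

```python
# Hypothetical usage only: the names below are placeholders, not the real
# UIIntegration API. ui_integration_example.py shows the actual flow.
ui_config = load_config("my_pipeline.json")     # hypothetical: config exported from the UI
integration = UIIntegration(ui_config)
pipeline = integration.build_pipeline()         # hypothetical: yields an InferencePipeline
```
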
### Key UI Features

- **Auto-dongle allocation**: Smart assignment of dongles to pipeline stages
- **Performance estimation**: Real-time FPS and latency calculations
- **Cost analysis**: Hardware and operational cost projections
- **Export formats**: Python scripts, JSON configs, YAML, Docker containers
- **Live monitoring**: Real-time metrics and cluster scaling controls

## Code Patterns

### Basic Pipeline Setup

```python
config = StageConfig(
    stage_id="unique_name",
    port_ids=[28, 32],
    scpu_fw_path="fw_scpu.bin",
    ncpu_fw_path="fw_ncpu.bin",
    model_path="model.nef",
    upload_fw=True
)

pipeline = InferencePipeline([config])
pipeline.initialize()
pipeline.start()
pipeline.set_result_callback(callback_func)
# ... processing ...
pipeline.stop()
```

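For reference, a sketch of what `callback_func` might look like; the exact payload passed by the pipeline is an assumption.

```python
def callback_func(result):
    # Runs on a pipeline worker thread once the final stage completes.
    # Keep it lightweight; hand heavy work off to another thread or queue.
    print("Pipeline result:", result)
```
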
### Inter-Stage Processing

```python
# Custom preprocessing for stage input
preprocessor = PreProcessor(resize_fn=custom_resize_func)

# Custom postprocessing for stage output
postprocessor = PostProcessor(process_fn=custom_process_func)

config = StageConfig(
    # ... basic config ...
    input_preprocessor=preprocessor,
    output_postprocessor=postprocessor
)
```

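A sketch of what the custom functions could look like; the exact signatures expected by `PreProcessor`/`PostProcessor` are assumptions, and OpenCV resizing is just one option.

```python
import cv2  # assumes frames arrive as numpy arrays; any resize routine works

def custom_resize_func(frame):
    # Resize the previous stage's output to the next model's input size
    # (the 224x224 target is illustrative).
    return cv2.resize(frame, (224, 224))

def custom_process_func(raw_output):
    # Keep postprocessing lightweight: decode or repackage the raw model
    # output before it is accumulated into the stage results.
    return {"raw": raw_output}
```
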
## Performance Considerations

- **Queue Sizing**: Smaller queues = lower latency, larger queues = higher throughput
- **Dongle Distribution**: Spread dongles across stages for optimal parallelization
- **Processing Functions**: Keep preprocessors/postprocessors lightweight
- **Memory Management**: Monitor queue sizes to prevent memory buildup