Cluster4NPU UI - Project Summary
Vision
Create an intuitive visual tool that lets users design parallel AI inference pipelines for Kneron NPU dongles without writing code, with clear visualization of performance benefits and hardware utilization.
Current System Status
✅ Current Capabilities
Visual Pipeline Designer:
- Drag-and-drop node-based interface using NodeGraphQt
- 5 node types: Input, Model, Preprocess, Postprocess, Output
- Real-time pipeline validation and stage counting
- Property configuration panels with type-aware widgets
- Pipeline persistence in .mflow JSON format
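Because .mflow is plain JSON, pipelines can be inspected and generated outside the editor. A minimal round-trip sketch, assuming a simple node/edge schema; the actual schema and property names are defined by the editor's serializer:

```python
import json
from pathlib import Path

def save_pipeline(path, nodes, edges):
    """Write a pipeline graph to a .mflow JSON file."""
    doc = {"version": 1, "nodes": nodes, "edges": edges}  # schema is assumed
    Path(path).write_text(json.dumps(doc, indent=2))

def load_pipeline(path):
    """Read a pipeline graph back from a .mflow JSON file."""
    doc = json.loads(Path(path).read_text())
    return doc["nodes"], doc["edges"]

# Hypothetical two-node fragment; node types match the five in the editor.
nodes = [
    {"id": "in0", "type": "Input", "props": {"source": "usb:0"}},
    {"id": "m0", "type": "Model", "props": {"device": "KL720"}},
]
save_pipeline("demo.mflow", nodes, edges=[{"from": "in0", "to": "m0"}])
```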
Professional UI:
- Three-panel layout (templates, editor, configuration)
- Global status bar with live statistics
- Real-time connection analysis and error detection
- Integrated project management and recent files
Inference Engine:
- Multi-stage pipeline orchestration with threading (see the sketch after this list)
- Kneron NPU dongle integration (KL520, KL720, KL1080)
- Hardware auto-detection and device management
- Real-time performance monitoring (FPS, latency)
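The orchestration pattern can be sketched generically: each stage runs in its own thread, and bounded queues hand frames downstream so a slow stage applies backpressure instead of buffering without limit. This is a minimal illustration of the pattern, not the actual InferencePipeline implementation:

```python
import queue
import threading

def stage_worker(fn, inbox, outbox):
    """Run one stage: pull an item, transform it, push it downstream."""
    while True:
        item = inbox.get()
        if item is None:           # sentinel: shut down and propagate
            outbox.put(None)
            break
        outbox.put(fn(item))

def build_pipeline(stage_fns):
    """Wire stage functions into a chain of daemon threads and queues."""
    queues = [queue.Queue(maxsize=8) for _ in range(len(stage_fns) + 1)]
    for fn, q_in, q_out in zip(stage_fns, queues, queues[1:]):
        threading.Thread(target=stage_worker, args=(fn, q_in, q_out),
                         daemon=True).start()
    return queues[0], queues[-1]   # feed the first queue, drain the last

# Toy Preprocess → Model → Postprocess chain on stand-in functions.
head, tail = build_pipeline([lambda x: x * 2, lambda x: x + 1, str])
for frame in range(3):
    head.put(frame)
head.put(None)
while (out := tail.get()) is not None:
    print(out)                     # "1", "3", "5"
```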
🎯 Core Use Cases
Pipeline Flow:
Input (Camera) → Preprocess (Resize) → Model (NPU Inference) → Postprocess (Format) → Output (Display)
Supported Sources:
- USB cameras with configurable resolution/FPS
- Video files (MP4, AVI, MOV) with frame processing
- Image files (JPG, PNG, BMP) for batch processing
- RTSP streams for live video (basic support)
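For reference, all four source types can be opened with OpenCV's standard capture API; the device index, file names, and RTSP URL below are placeholders:

```python
import cv2  # pip install opencv-python

usb_cam = cv2.VideoCapture(0)                          # USB camera
usb_cam.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)            # configurable resolution
usb_cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
usb_cam.set(cv2.CAP_PROP_FPS, 30)                      # configurable FPS

video = cv2.VideoCapture("clip.mp4")                   # video file (MP4/AVI/MOV)
rtsp = cv2.VideoCapture("rtsp://192.168.1.10/stream")  # RTSP live stream
image = cv2.imread("frame.jpg")                        # image file (JPG/PNG/BMP)

ok, frame = usb_cam.read()                             # one frame per read() call
if ok:
    print("captured", frame.shape)                     # e.g. (720, 1280, 3)
usb_cam.release()
```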
Development Priorities
Immediate Goals
- Performance Visualization: Show clear speedup benefits of parallel processing
- Device Management: Enhanced control over NPU dongle allocation
- Benchmarking System: Automated performance testing and comparison
- Real-time Dashboard: Live monitoring of pipeline execution
🚨 Key Missing Features
Performance Visualization
- Parallel vs sequential execution comparison
- Visual device allocation and load balancing
- Speedup calculation and metrics display (sketched after this list)
- Performance improvement charts
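The speedup metric itself is straightforward wall-clock arithmetic. A minimal sketch, where run_pipeline stands in for one sequential or parallel execution run:

```python
import time

def measure(run_pipeline, frames):
    """Wall-clock seconds for one pipeline run; run_pipeline is a placeholder."""
    start = time.perf_counter()
    run_pipeline(frames)
    return time.perf_counter() - start

def speedup_report(t_sequential, t_parallel, n_devices):
    speedup = t_sequential / t_parallel
    return {
        "speedup": round(speedup, 2),                 # e.g. 2.73x on 3 dongles
        "efficiency": round(speedup / n_devices, 2),  # fraction of ideal scaling
        "fps_gain": round((speedup - 1) * 100, 1),    # percent improvement
    }

print(speedup_report(t_sequential=9.0, t_parallel=3.3, n_devices=3))
# {'speedup': 2.73, 'efficiency': 0.91, 'fps_gain': 172.7}
```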
Advanced Monitoring
- Live performance graphs for FPS, latency, and throughput (see the sketch after this list)
- Resource utilization visualization
- Bottleneck identification and alerts
- Historical performance tracking
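Live graphs need a bounded history rather than an ever-growing log. A sketch of the rolling window that could back the FPS/latency charts; the window size and method names are illustrative:

```python
import time
from collections import deque

class RollingMetrics:
    """Fixed-size window of per-frame latencies for live FPS/latency graphs."""
    def __init__(self, window=120):
        self.latencies = deque(maxlen=window)   # seconds per frame
        self.stamps = deque(maxlen=window)      # frame completion timestamps

    def record(self, latency_s):
        self.latencies.append(latency_s)
        self.stamps.append(time.monotonic())

    def fps(self):
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / span if span > 0 else 0.0

    def avg_latency_ms(self):
        if not self.latencies:
            return 0.0
        return 1000 * sum(self.latencies) / len(self.latencies)
```

A Qt timer polling fps() and avg_latency_ms() every few hundred milliseconds would be enough to drive a chart widget in the existing PyQt5 UI.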
Device Management
- Visual device status dashboard
- Manual device assignment interface
- Device health monitoring and profiling
- Optimal allocation recommendations
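A least-loaded assignment policy is one plausible core for these features. In the sketch below, the DeviceStatus fields and the discovery step are assumptions, not the existing Multidongle API:

```python
from dataclasses import dataclass, field

@dataclass
class DeviceStatus:
    serial: str
    model: str                 # e.g. "KL520" or "KL720"
    healthy: bool = True
    assigned_stages: list = field(default_factory=list)

def assign_stage(devices, stage):
    """Pick the healthy device with the fewest assigned stages."""
    candidates = [d for d in devices if d.healthy]
    if not candidates:
        raise RuntimeError("no healthy NPU dongle available")
    best = min(candidates, key=lambda d: len(d.assigned_stages))
    best.assigned_stages.append(stage)
    return best

fleet = [DeviceStatus("A1", "KL720"), DeviceStatus("B2", "KL520")]
print(assign_stage(fleet, "detector").serial)    # "A1" (both idle, first wins)
print(assign_stage(fleet, "classifier").serial)  # "B2" (now least loaded)
```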
Pipeline Optimization
- Automated benchmark execution
- Performance prediction before deployment
- Configuration templates for common use cases (example after this list)
- Optimization suggestions based on analysis
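Configuration templates can be plain presets with user overrides layered on top; the keys and values below are illustrative, not a defined schema:

```python
# Named preset of pipeline settings the UI could apply in one click.
TEMPLATES = {
    "usb-camera-detection": {
        "input": {"source": "usb:0", "width": 1280, "height": 720, "fps": 30},
        "preprocess": {"resize": [640, 640], "normalize": True},
        "model": {"device_preference": "KL720", "parallel_devices": 2},
        "output": {"sink": "display"},
    },
}

def apply_template(name, overrides=None):
    """Copy a preset and layer user overrides on top of it."""
    config = {section: dict(values) for section, values in TEMPLATES[name].items()}
    for section, values in (overrides or {}).items():
        config.setdefault(section, {}).update(values)
    return config

cfg = apply_template("usb-camera-detection", {"input": {"fps": 15}})
print(cfg["input"]["fps"])  # 15
```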
🛠 Technical Architecture
Current Foundation
- Core Processing: InferencePipeline with multi-stage orchestration
- Hardware Integration: Multidongle with NPU auto-detection
- UI Framework: PyQt5 with NodeGraphQt visual editor
- Pipeline Analysis: Real-time validation and stage detection
Key Components Needed
- PerformanceBenchmarker: Automated speedup measurement
- DeviceManager: Advanced NPU allocation and monitoring
- VisualizationDashboard: Live performance charts and metrics
- OptimizationEngine: Automated configuration suggestions
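Sketching these as interface stubs pins down each component's responsibility before implementation; the method names and signatures here are proposals, not existing code:

```python
class PerformanceBenchmarker:
    def compare(self, pipeline, frames):
        """Run sequential and parallel passes; return speedup metrics."""
        ...

class DeviceManager:
    def discover(self):
        """Enumerate attached NPU dongles with health status."""
        ...
    def assign(self, stage, device):
        """Pin a pipeline stage to a specific dongle."""
        ...

class VisualizationDashboard:
    def update(self, metrics):
        """Refresh live FPS/latency/throughput charts."""
        ...

class OptimizationEngine:
    def suggest(self, benchmark):
        """Derive configuration changes from benchmark results."""
        ...
```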
🎯 Implementation Roadmap
Phase 1: Performance Visualization
- Implement parallel vs sequential benchmarking
- Add speedup calculation and display
- Create performance comparison charts
- Build real-time monitoring dashboard
Phase 2: Device Management
- Visual device allocation interface
- Device health monitoring and profiling
- Manual assignment capabilities
- Load balancing optimization
Phase 3: Advanced Features
- Pipeline optimization suggestions
- Configuration templates
- Performance prediction
- Advanced analytics and reporting
🎨 User Experience Goals
Target Workflow
1. Design: Drag-and-drop pipeline creation (< 5 minutes)
2. Configure: Automatic device detection and allocation
3. Preview: Performance prediction before execution
4. Monitor: Real-time speedup visualization
5. Optimize: Automated suggestions for improvements
Success Metrics
- Clear visualization of parallel processing benefits
- Intuitive interface requiring minimal training
- Measurable performance improvements from optimization
- Professional-grade monitoring and analytics
📈 Business Value
For Users:
- No-code parallel processing setup
- Clear ROI demonstration through speedup metrics
- Optimal hardware utilization without expert knowledge
For Platform:
- Unique visual approach to AI inference optimization
- Lower barrier to entry for complex parallel processing
- Scalable foundation for enterprise features