
Cluster4NPU Pipeline TODO

Current Status

Pipeline Core: Multi-stage pipeline with device auto-detection working
Hardware Integration: Kneron NPU dongles connecting and initializing successfully
Auto-resize Preprocessing: Model input shape detection and automatic preprocessing implemented
Data Input Sources: Missing camera and file input implementations
Result Persistence: No result saving or output mechanisms
End-to-End Workflow: Gaps between UI configuration and core pipeline execution


Priority 1: Essential Components for Complete Inference Workflow

1. Data Source Implementation

Status: 🔴 Critical Missing Components
Location: New classes under core/functions/, or extensions of existing ones

1.1 Camera Input Source

  • File: core/functions/camera_source.py (new)
  • Class: CameraSource
  • Purpose: Wrapper around cv2.VideoCapture for camera input
  • Integration: Connect to InferencePipeline.put_data()
  • Features:
    • Multiple camera index support
    • Resolution and FPS configuration
    • Format conversion (BGR → model input format)
    • Error handling for camera disconnection
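
A minimal sketch of what CameraSource could look like. The `capture_factory` parameter, the `stream_to`/`stop` method names, and the callback signatures are assumptions for illustration — only the cv2.VideoCapture wrapper idea comes from the plan above. Injecting the capture factory keeps the class testable without camera hardware:

```python
class CameraSource:
    """Sketch: wrap a cv2.VideoCapture-like object and push frames
    into the pipeline via a put_data callback."""

    def __init__(self, camera_index=0, capture_factory=None):
        if capture_factory is None:
            import cv2  # deferred so the module imports without OpenCV
            capture_factory = cv2.VideoCapture
        self._capture = capture_factory(camera_index)
        self._running = False

    def stream_to(self, put_data, on_error=None):
        """Read frames in a loop and hand each one to put_data
        (e.g. InferencePipeline.put_data)."""
        self._running = True
        while self._running:
            ok, frame = self._capture.read()
            if not ok:  # camera disconnected or read failure
                if on_error:
                    on_error("camera read failed")
                break
            put_data(frame)

    def stop(self):
        self._running = False
        self._capture.release()
```

Resolution/FPS configuration and BGR conversion would layer on top of this loop, e.g. via `capture.set(...)` and a per-frame transform before `put_data`.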

1.2 Video File Input Source

  • File: core/functions/video_source.py (new)
  • Class: VideoFileSource
  • Purpose: Process video files frame by frame
  • Integration: Feed frames to InferencePipeline
  • Features:
    • Support common video formats (MP4, AVI, MOV)
    • Frame rate control and seeking
    • Batch processing capabilities
    • Progress tracking
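
The frame-rate-control feature could use a simple stride policy: given a source and target FPS, decide which frame indices to process. This helper (name and policy are illustrative, not part of the existing codebase) would also drive progress tracking, since the index list gives a known total:

```python
def frames_to_keep(total_frames, source_fps, target_fps):
    """Return frame indices to process when downsampling a video
    from source_fps to target_fps (stride-based sampling)."""
    if target_fps >= source_fps:
        return list(range(total_frames))
    step = source_fps / target_fps
    indices, next_keep = [], 0.0
    for i in range(total_frames):
        if i >= next_keep:
            indices.append(i)
            next_keep += step
    return indices
```

Seeking then maps naturally onto the same indices (e.g. `capture.set(cv2.CAP_PROP_POS_FRAMES, i)` in an OpenCV-backed reader).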

1.3 Image File Input Source

  • File: core/functions/image_source.py (new)
  • Class: ImageFileSource
  • Purpose: Process single images or image directories
  • Integration: Single-shot inference through pipeline
  • Features:
    • Support common image formats (JPG, PNG, BMP)
    • Batch directory processing
    • Image validation and error handling
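
Batch directory processing could start from a small path iterator like the following (the function name and extension set are illustrative). It handles both a single image and a directory tree, and filters out unsupported files before they reach the pipeline:

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}

def iter_image_paths(root):
    """Yield supported image files under root (a file or a directory),
    in sorted order, skipping unsupported extensions."""
    root = Path(root)
    if root.is_file():
        candidates = [root]
    else:
        candidates = sorted(p for p in root.rglob("*") if p.is_file())
    for path in candidates:
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            yield path
```

Image validation (e.g. attempting a decode and reporting corrupt files) would wrap the consumer of this iterator rather than the iterator itself.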

1.4 RTSP/HTTP Stream Source

  • File: core/functions/stream_source.py (new)
  • Class: RTSPSource, HTTPStreamSource
  • Purpose: Process live video streams
  • Integration: Real-time streaming to pipeline
  • Features:
    • Stream connection management
    • Reconnection on failure
    • Buffer management and frame dropping

2. Result Persistence System

Status: 🔴 Critical Missing Components
Location: core/functions/result_handler.py (new)

2.1 Result Serialization

  • Class: ResultSerializer
  • Purpose: Convert inference results to standard formats
  • Features:
    • JSON export with timestamps
    • CSV export for analytics
    • Binary format for performance
    • Configurable fields and formatting
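
A minimal sketch of the serializer, covering the JSON-with-timestamp and CSV cases with the standard library (class and method names are assumptions; the binary format is omitted):

```python
import csv
import io
import json
from datetime import datetime, timezone

class ResultSerializer:
    """Convert inference result dicts to JSON or CSV."""

    @staticmethod
    def to_json(result, timestamp=None):
        """Wrap one result with a UTC ISO timestamp and serialize."""
        ts = timestamp or datetime.now(timezone.utc).isoformat()
        return json.dumps({"timestamp": ts, "result": result})

    @staticmethod
    def to_csv(results, fields):
        """Export a list of result dicts, keeping only the configured
        fields (extras are ignored, which makes the schema explicit)."""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(results)
        return buf.getvalue()
```

The `fields` parameter is what makes "configurable fields and formatting" concrete: the caller decides the CSV schema instead of dumping every key.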

2.2 File Output Manager

  • Class: FileOutputManager
  • Purpose: Handle result file writing and organization
  • Features:
    • Timestamped file naming
    • Directory organization by date/pipeline
    • File rotation and cleanup
    • Output format configuration
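
Timestamped naming plus date/pipeline directory organization can be captured in one path-building helper (the layout shown is one plausible convention, not an existing one):

```python
from datetime import datetime
from pathlib import Path

def output_path(base_dir, pipeline_name, when=None, ext="json"):
    """Build a timestamped result path organised by date and pipeline,
    e.g. <base>/2025-07-16/detect/detect_231900.json, creating the
    parent directories as needed."""
    when = when or datetime.now()
    day = when.strftime("%Y-%m-%d")
    stamp = when.strftime("%H%M%S")
    path = Path(base_dir) / day / pipeline_name / f"{pipeline_name}_{stamp}.{ext}"
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

File rotation and cleanup would then be a sweep over the dated directories, which this layout makes cheap (delete whole day folders past a retention window).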

2.3 Real-time Result Streaming

  • Class: ResultStreamer
  • Purpose: Stream results to external systems
  • Features:
    • WebSocket result broadcasting
    • REST API endpoints
    • Message queue integration (Redis, RabbitMQ)
    • Custom callback system
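
The custom callback system is the natural base layer here; WebSocket, REST, and message-queue backends would each register as one subscriber. A sketch (names are illustrative):

```python
class ResultStreamer:
    """Fan each published result out to all registered subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callable; returns an unsubscribe handle."""
        self._subscribers.append(callback)
        return lambda: self._subscribers.remove(callback)

    def publish(self, result):
        # Iterate over a copy so callbacks may unsubscribe mid-publish,
        # and isolate failures so one bad subscriber cannot block others.
        for cb in list(self._subscribers):
            try:
                cb(result)
            except Exception:
                pass
```

Returning an unsubscribe closure from `subscribe` avoids handing out index- or id-based handles.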

3. Input/Output Integration Bridge

Status: 🔴 Critical Missing Components
Location: core/functions/pipeline_manager.py (new)

3.1 Pipeline Configuration Manager

  • Class: PipelineConfigManager
  • Purpose: Convert UI configurations to executable pipelines
  • Integration: Bridge between UI and core pipeline
  • Features:
    • Parse UI node configurations
    • Instantiate appropriate data sources
    • Configure result handlers
    • Manage pipeline lifecycle
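
The "instantiate appropriate data sources" step could be a registry lookup over the UI node configuration. The config keys (`type`, `params`) and the function name below are hypothetical — the real UI node schema would dictate them:

```python
def build_source(node_config, registry):
    """Instantiate a data source from a UI node configuration.
    `registry` maps a node 'type' string (e.g. 'camera', 'video')
    to a source class."""
    kind = node_config["type"]
    if kind not in registry:
        raise ValueError(f"unknown source type: {kind}")
    params = node_config.get("params", {})
    return registry[kind](**params)
```

Keeping the registry as data (rather than an if/elif chain) means adding a new source class requires no change to the config manager itself.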

3.2 Unified Workflow Orchestrator

  • Class: WorkflowOrchestrator
  • Purpose: Coordinate complete data flow from input to output
  • Features:
    • Input source management
    • Pipeline execution control
    • Result handling and persistence
    • Error recovery and logging

Priority 2: Enhanced Preprocessing and Auto-resize

4. Enhanced Preprocessing System

Status: 🟡 Partially Implemented
Location: core/functions/Multidongle.py (existing) + new preprocessing modules

4.1 Current Auto-resize Implementation

  • Location: Multidongle.py:354-371 (preprocess_frame method)
  • Features: Already implemented
    • Automatic model input shape detection
    • Dynamic resizing based on model requirements
    • Format conversion (BGR565, RGB8888, YUYV, RAW8)
    • Aspect ratio handling

4.2 Enhanced Preprocessing Pipeline

  • File: core/functions/preprocessor.py (new)
  • Class: AdvancedPreprocessor
  • Purpose: Extended preprocessing capabilities
  • Features:
    • Smart cropping: Maintain aspect ratio with intelligent cropping
    • Normalization: Configurable pixel value normalization
    • Augmentation: Real-time data augmentation for training
    • Multi-model support: Different preprocessing for different models
    • Caching: Preprocessed frame caching for performance
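
The smart-cropping and normalization features can be sketched in NumPy (function names are illustrative; in practice the resize step would be cv2.resize with a proper interpolation mode rather than the nearest-neighbour indexing used here):

```python
import numpy as np

def smart_crop_resize(frame, target_h, target_w):
    """Center-crop to the target aspect ratio, then resize.
    Cropping first means no distortion, at the cost of edge pixels."""
    h, w = frame.shape[:2]
    target_ar = target_w / target_h
    if w / h > target_ar:            # too wide: crop width
        new_w = int(h * target_ar)
        x0 = (w - new_w) // 2
        frame = frame[:, x0:x0 + new_w]
    else:                             # too tall: crop height
        new_h = int(w / target_ar)
        y0 = (h - new_h) // 2
        frame = frame[y0:y0 + new_h, :]
    h, w = frame.shape[:2]
    ys = np.arange(target_h) * h // target_h  # nearest-neighbour rows
    xs = np.arange(target_w) * w // target_w  # nearest-neighbour cols
    return frame[ys[:, None], xs]

def normalize(frame, mean=0.0, scale=1 / 255.0):
    """Configurable pixel normalization to float32."""
    return (frame.astype(np.float32) - mean) * scale
```

Per-model `mean`/`scale` values are exactly the kind of thing the proposed preprocessing profiles would store.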

4.3 Model-Aware Preprocessing

  • Enhancement: Extend existing Multidongle class
  • Location: core/functions/Multidongle.py:188-199 (model_input_shape detection)
  • Features:
    • Dynamic preprocessing: Adjust preprocessing based on model metadata
    • Model-specific optimization: Tailored preprocessing for different model types
    • Preprocessing profiles: Saved preprocessing configurations per model

Priority 3: UI Integration and User Experience

5. Dashboard Integration

Status: 🟡 Partially Implemented
Location: ui/windows/dashboard.py (existing)

5.1 Real-time Pipeline Monitoring

  • Enhancement: Extend existing Dashboard class
  • Features:
    • Live inference statistics
    • Real-time result visualization
    • Performance metrics dashboard
    • Error monitoring and alerts

5.2 Input Source Configuration

  • Integration: Connect UI input nodes to actual data sources
  • Features:
    • Camera selection and preview
    • File browser integration
    • Stream URL validation
    • Input source testing

6. Result Visualization

Status: 🔴 Not Implemented
Location: ui/widgets/result_viewer.py (new)

6.1 Result Display Widget

  • Class: ResultViewer
  • Purpose: Display inference results in UI
  • Features:
    • Real-time result streaming
    • Result history and filtering
    • Export capabilities
    • Customizable display formats

Priority 4: Advanced Features and Optimization

7. Performance Optimization

Status: 🟡 Basic Implementation
Location: Multiple files

7.1 Memory Management

  • Enhancement: Optimize existing queue systems
  • Files: InferencePipeline.py, Multidongle.py
  • Features:
    • Smart queue sizing based on available memory
    • Frame dropping under load
    • Memory leak detection and prevention
    • Garbage collection optimization
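
Frame dropping under load usually means a bounded buffer that evicts the oldest frame rather than blocking the producer, so inference always sees recent data. A sketch (the class is hypothetical; the existing queues in InferencePipeline.py would be the integration point):

```python
import threading
from collections import deque

class DroppingQueue:
    """Bounded queue that silently drops the oldest item when full."""

    def __init__(self, maxsize):
        self._buf = deque(maxlen=maxsize)
        self._lock = threading.Lock()
        self.dropped = 0  # counter for monitoring/metrics

    def put(self, item):
        with self._lock:
            if len(self._buf) == self._buf.maxlen:
                self.dropped += 1  # deque evicts the oldest automatically
            self._buf.append(item)

    def get(self):
        with self._lock:
            return self._buf.popleft() if self._buf else None
```

The `dropped` counter doubles as a cheap load signal for the monitoring dashboard described in section 5.1.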

7.2 Multi-device Load Balancing

  • Enhancement: Extend existing multi-dongle support
  • Location: core/functions/Multidongle.py (existing auto-detection)
  • Features:
    • Intelligent device allocation
    • Load balancing across devices
    • Device health monitoring
    • Automatic failover
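
Least-loaded allocation with health awareness can be stated in a few lines (the load representation here — pending-frame counts with `None` for unhealthy devices — is an assumption for illustration):

```python
def pick_device(devices):
    """Route the next frame to the healthy device with the smallest
    queue depth. `devices` maps device id -> pending count, with
    None marking an unhealthy device (failover skips it)."""
    healthy = {d: load for d, load in devices.items() if load is not None}
    if not healthy:
        raise RuntimeError("no healthy devices available")
    return min(healthy, key=healthy.get)
```

Marking a device `None` when its health check fails gives automatic failover for free: the selector simply never routes to it until it recovers.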

8. Error Handling and Recovery

Status: 🟡 Basic Implementation
Location: Throughout codebase

8.1 Comprehensive Error Recovery

  • Enhancement: Extend existing error handling
  • Features:
    • Automatic device reconnection
    • Pipeline restart on critical errors
    • Input source recovery
    • Result persistence on failure

Implementation Roadmap

Phase 1: Core Data Flow (Weeks 1-2)

  1. Complete: Pipeline deployment and device initialization
  2. 🔄 In Progress: Auto-resize preprocessing (mostly implemented)
  3. Next: Implement basic camera input source
  4. Next: Add simple result file output
  5. Next: Create basic pipeline manager

Phase 2: Complete Workflow (Weeks 3-4)

  1. Add video file input support
  2. Implement comprehensive result persistence
  3. Create UI integration bridge
  4. Add real-time monitoring

Phase 3: Advanced Features (Weeks 5-6)

  1. Enhanced preprocessing pipeline
  2. Performance optimization
  3. Advanced error handling
  4. Result visualization

Phase 4: Production Features (Weeks 7-8)

  1. Multi-device load balancing
  2. Advanced stream input support
  3. Analytics and reporting
  4. Configuration management

Key Code Locations for Current Auto-resize Implementation

Model Input Shape Detection

  • File: core/functions/Multidongle.py
  • Lines: 188-199 (model_input_shape property)
  • Status: Working - detects model input dimensions from NEF files

Automatic Preprocessing

  • File: core/functions/Multidongle.py
  • Lines: 354-371 (preprocess_frame method)
  • Status: Working - auto-resizes based on model input shape
  • Features: Format conversion, aspect ratio handling

Pipeline Data Processing

  • File: core/functions/InferencePipeline.py
  • Lines: 165-240 (_process_data method)
  • Status: Working - integrates preprocessing with inference
  • Features: Inter-stage processing, result accumulation

Format Conversion

  • File: core/functions/Multidongle.py
  • Lines: 382-396 (_convert_format method)
  • Status: Working - supports BGR565, RGB8888, YUYV, RAW8

Notes for Development

  1. Auto-resize is already implemented - The system automatically detects model input shape and resizes accordingly
  2. Priority should be on input sources - Camera and file input are the critical missing pieces
  3. Result persistence is essential - the current system only provides callbacks; file output is still needed
  4. UI integration gap - UI configuration doesn't connect to core pipeline execution
  5. Performance is good - Multi-threading and device management are solid foundations

The core pipeline and preprocessing are working well; the focus should be on completing the input/output ecosystem around the existing, robust inference engine.