
Cluster4NPU Pipeline TODO

Current Status

Pipeline Core: Multi-stage pipeline with device auto-detection working
Hardware Integration: Kneron NPU dongles connecting and initializing successfully
Auto-resize Preprocessing: Model input shape detection and automatic preprocessing implemented
Data Input Sources: Missing camera and file input implementations
Result Persistence: No result saving or output mechanisms
End-to-End Workflow: Gaps between UI configuration and core pipeline execution


Priority 1: Essential Components for Complete Inference Workflow

1. Data Source Implementation

Status: 🔴 Critical Missing Components
Location: New classes under core/functions/, or extensions of existing ones

1.1 Camera Input Source

  • File: core/functions/camera_source.py (new)
  • Class: CameraSource
  • Purpose: Wrapper around cv2.VideoCapture for camera input
  • Integration: Connect to InferencePipeline.put_data()
  • Features:
    • Multiple camera index support
    • Resolution and FPS configuration
    • Format conversion (BGR → model input format)
    • Error handling for camera disconnection
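
A minimal sketch of what CameraSource could look like. The `capture_factory` parameter, the `stream_to`/`stop` method names, and the callback signatures are assumptions for illustration — only the cv2.VideoCapture wrapper idea comes from the plan above. Injecting the capture factory keeps the class testable without camera hardware:

```python
class CameraSource:
    """Sketch: wrap a cv2.VideoCapture-like object and push frames
    into the pipeline via a put_data callback."""

    def __init__(self, camera_index=0, capture_factory=None):
        if capture_factory is None:
            import cv2  # deferred so the module imports without OpenCV
            capture_factory = cv2.VideoCapture
        self._capture = capture_factory(camera_index)
        self._running = False

    def stream_to(self, put_data, on_error=None):
        """Read frames in a loop and hand each one to put_data
        (e.g. InferencePipeline.put_data)."""
        self._running = True
        while self._running:
            ok, frame = self._capture.read()
            if not ok:  # camera disconnected or read failure
                if on_error:
                    on_error("camera read failed")
                break
            put_data(frame)

    def stop(self):
        self._running = False
        self._capture.release()
```

Resolution/FPS configuration and BGR conversion would layer on top of this loop, e.g. via `capture.set(...)` and a per-frame transform before `put_data`.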

1.2 Video File Input Source

  • File: core/functions/video_source.py (new)
  • Class: VideoFileSource
  • Purpose: Process video files frame by frame
  • Integration: Feed frames to InferencePipeline
  • Features:
    • Support common video formats (MP4, AVI, MOV)
    • Frame rate control and seeking
    • Batch processing capabilities
    • Progress tracking
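
The frame-rate-control feature could use a simple stride policy: given a source and target FPS, decide which frame indices to process. This helper (name and policy are illustrative, not part of the existing codebase) would also drive progress tracking, since the index list gives a known total:

```python
def frames_to_keep(total_frames, source_fps, target_fps):
    """Return frame indices to process when downsampling a video
    from source_fps to target_fps (stride-based sampling)."""
    if target_fps >= source_fps:
        return list(range(total_frames))
    step = source_fps / target_fps
    indices, next_keep = [], 0.0
    for i in range(total_frames):
        if i >= next_keep:
            indices.append(i)
            next_keep += step
    return indices
```

Seeking then maps naturally onto the same indices (e.g. `capture.set(cv2.CAP_PROP_POS_FRAMES, i)` in an OpenCV-backed reader).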

1.3 Image File Input Source

  • File: core/functions/image_source.py (new)
  • Class: ImageFileSource
  • Purpose: Process single images or image directories
  • Integration: Single-shot inference through pipeline
  • Features:
    • Support common image formats (JPG, PNG, BMP)
    • Batch directory processing
    • Image validation and error handling
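
Batch directory processing could start from a small path iterator like the following (the function name and extension set are illustrative). It handles both a single image and a directory tree, and filters out unsupported files before they reach the pipeline:

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}

def iter_image_paths(root):
    """Yield supported image files under root (a file or a directory),
    in sorted order, skipping unsupported extensions."""
    root = Path(root)
    if root.is_file():
        candidates = [root]
    else:
        candidates = sorted(p for p in root.rglob("*") if p.is_file())
    for path in candidates:
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            yield path
```

Image validation (e.g. attempting a decode and reporting corrupt files) would wrap the consumer of this iterator rather than the iterator itself.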

1.4 RTSP/HTTP Stream Source

  • File: core/functions/stream_source.py (new)
  • Class: RTSPSource, HTTPStreamSource
  • Purpose: Process live video streams
  • Integration: Real-time streaming to pipeline
  • Features:
    • Stream connection management
    • Reconnection on failure
    • Buffer management and frame dropping

2. Result Persistence System

Status: 🔴 Critical Missing Components
Location: core/functions/result_handler.py (new)

2.1 Result Serialization

  • Class: ResultSerializer
  • Purpose: Convert inference results to standard formats
  • Features:
    • JSON export with timestamps
    • CSV export for analytics
    • Binary format for performance
    • Configurable fields and formatting
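
A minimal sketch of the serializer, covering the JSON-with-timestamp and CSV cases with the standard library (class and method names are assumptions; the binary format is omitted):

```python
import csv
import io
import json
from datetime import datetime, timezone

class ResultSerializer:
    """Convert inference result dicts to JSON or CSV."""

    @staticmethod
    def to_json(result, timestamp=None):
        """Wrap one result with a UTC ISO timestamp and serialize."""
        ts = timestamp or datetime.now(timezone.utc).isoformat()
        return json.dumps({"timestamp": ts, "result": result})

    @staticmethod
    def to_csv(results, fields):
        """Export a list of result dicts, keeping only the configured
        fields (extras are ignored, which makes the schema explicit)."""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(results)
        return buf.getvalue()
```

The `fields` parameter is what makes "configurable fields and formatting" concrete: the caller decides the CSV schema instead of dumping every key.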

2.2 File Output Manager

  • Class: FileOutputManager
  • Purpose: Handle result file writing and organization
  • Features:
    • Timestamped file naming
    • Directory organization by date/pipeline
    • File rotation and cleanup
    • Output format configuration
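
Timestamped naming plus date/pipeline directory organization can be captured in one path-building helper (the layout shown is one plausible convention, not an existing one):

```python
from datetime import datetime
from pathlib import Path

def output_path(base_dir, pipeline_name, when=None, ext="json"):
    """Build a timestamped result path organised by date and pipeline,
    e.g. <base>/2025-07-16/detect/detect_231900.json, creating the
    parent directories as needed."""
    when = when or datetime.now()
    day = when.strftime("%Y-%m-%d")
    stamp = when.strftime("%H%M%S")
    path = Path(base_dir) / day / pipeline_name / f"{pipeline_name}_{stamp}.{ext}"
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

File rotation and cleanup would then be a sweep over the dated directories, which this layout makes cheap (delete whole day folders past a retention window).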

2.3 Real-time Result Streaming

  • Class: ResultStreamer
  • Purpose: Stream results to external systems
  • Features:
    • WebSocket result broadcasting
    • REST API endpoints
    • Message queue integration (Redis, RabbitMQ)
    • Custom callback system
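
The custom callback system is the natural base layer here; WebSocket, REST, and message-queue backends would each register as one subscriber. A sketch (names are illustrative):

```python
class ResultStreamer:
    """Fan each published result out to all registered subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callable; returns an unsubscribe handle."""
        self._subscribers.append(callback)
        return lambda: self._subscribers.remove(callback)

    def publish(self, result):
        # Iterate over a copy so callbacks may unsubscribe mid-publish,
        # and isolate failures so one bad subscriber cannot block others.
        for cb in list(self._subscribers):
            try:
                cb(result)
            except Exception:
                pass
```

Returning an unsubscribe closure from `subscribe` avoids handing out index- or id-based handles.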

3. Input/Output Integration Bridge

Status: 🔴 Critical Missing Components
Location: core/functions/pipeline_manager.py (new)

3.1 Pipeline Configuration Manager

  • Class: PipelineConfigManager
  • Purpose: Convert UI configurations to executable pipelines
  • Integration: Bridge between UI and core pipeline
  • Features:
    • Parse UI node configurations
    • Instantiate appropriate data sources
    • Configure result handlers
    • Manage pipeline lifecycle
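
The "instantiate appropriate data sources" step could be a registry lookup over the UI node configuration. The config keys (`type`, `params`) and the function name below are hypothetical — the real UI node schema would dictate them:

```python
def build_source(node_config, registry):
    """Instantiate a data source from a UI node configuration.
    `registry` maps a node 'type' string (e.g. 'camera', 'video')
    to a source class."""
    kind = node_config["type"]
    if kind not in registry:
        raise ValueError(f"unknown source type: {kind}")
    params = node_config.get("params", {})
    return registry[kind](**params)
```

Keeping the registry as data (rather than an if/elif chain) means adding a new source class requires no change to the config manager itself.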

3.2 Unified Workflow Orchestrator

  • Class: WorkflowOrchestrator
  • Purpose: Coordinate complete data flow from input to output
  • Features:
    • Input source management
    • Pipeline execution control
    • Result handling and persistence
    • Error recovery and logging

Priority 2: Enhanced Preprocessing and Auto-resize

4. Enhanced Preprocessing System

Status: 🟡 Partially Implemented
Location: core/functions/Multidongle.py (existing) + new preprocessing modules

4.1 Current Auto-resize Implementation

  • Location: Multidongle.py:354-371 (preprocess_frame method)
  • Features: Already implemented
    • Automatic model input shape detection
    • Dynamic resizing based on model requirements
    • Format conversion (BGR565, RGB8888, YUYV, RAW8)
    • Aspect ratio handling

4.2 Enhanced Preprocessing Pipeline

  • File: core/functions/preprocessor.py (new)
  • Class: AdvancedPreprocessor
  • Purpose: Extended preprocessing capabilities
  • Features:
    • Smart cropping: Maintain aspect ratio with intelligent cropping
    • Normalization: Configurable pixel value normalization
    • Augmentation: Real-time data augmentation for training
    • Multi-model support: Different preprocessing for different models
    • Caching: Preprocessed frame caching for performance
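
The smart-cropping and normalization features can be sketched in NumPy (function names are illustrative; in practice the resize step would be cv2.resize with a proper interpolation mode rather than the nearest-neighbour indexing used here):

```python
import numpy as np

def smart_crop_resize(frame, target_h, target_w):
    """Center-crop to the target aspect ratio, then resize.
    Cropping first means no distortion, at the cost of edge pixels."""
    h, w = frame.shape[:2]
    target_ar = target_w / target_h
    if w / h > target_ar:            # too wide: crop width
        new_w = int(h * target_ar)
        x0 = (w - new_w) // 2
        frame = frame[:, x0:x0 + new_w]
    else:                             # too tall: crop height
        new_h = int(w / target_ar)
        y0 = (h - new_h) // 2
        frame = frame[y0:y0 + new_h, :]
    h, w = frame.shape[:2]
    ys = np.arange(target_h) * h // target_h  # nearest-neighbour rows
    xs = np.arange(target_w) * w // target_w  # nearest-neighbour cols
    return frame[ys[:, None], xs]

def normalize(frame, mean=0.0, scale=1 / 255.0):
    """Configurable pixel normalization to float32."""
    return (frame.astype(np.float32) - mean) * scale
```

Per-model `mean`/`scale` values are exactly the kind of thing the proposed preprocessing profiles would store.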

4.3 Model-Aware Preprocessing

  • Enhancement: Extend existing Multidongle class
  • Location: core/functions/Multidongle.py:188-199 (model_input_shape detection)
  • Features:
    • Dynamic preprocessing: Adjust preprocessing based on model metadata
    • Model-specific optimization: Tailored preprocessing for different model types
    • Preprocessing profiles: Saved preprocessing configurations per model

Priority 3: UI Integration and User Experience

5. Dashboard Integration

Status: 🟡 Partially Implemented
Location: ui/windows/dashboard.py (existing)

5.1 Real-time Pipeline Monitoring

  • Enhancement: Extend existing Dashboard class
  • Features:
    • Live inference statistics
    • Real-time result visualization
    • Performance metrics dashboard
    • Error monitoring and alerts

5.2 Input Source Configuration

  • Integration: Connect UI input nodes to actual data sources
  • Features:
    • Camera selection and preview
    • File browser integration
    • Stream URL validation
    • Input source testing

6. Result Visualization

Status: 🔴 Not Implemented
Location: ui/widgets/result_viewer.py (new)

6.1 Result Display Widget

  • Class: ResultViewer
  • Purpose: Display inference results in UI
  • Features:
    • Real-time result streaming
    • Result history and filtering
    • Export capabilities
    • Customizable display formats

Priority 4: Advanced Features and Optimization

7. Performance Optimization

Status: 🟡 Basic Implementation
Location: Multiple files

7.1 Memory Management

  • Enhancement: Optimize existing queue systems
  • Files: InferencePipeline.py, Multidongle.py
  • Features:
    • Smart queue sizing based on available memory
    • Frame dropping under load
    • Memory leak detection and prevention
    • Garbage collection optimization
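
Frame dropping under load usually means a bounded buffer that evicts the oldest frame rather than blocking the producer, so inference always sees recent data. A sketch (the class is hypothetical; the existing queues in InferencePipeline.py would be the integration point):

```python
import threading
from collections import deque

class DroppingQueue:
    """Bounded queue that silently drops the oldest item when full."""

    def __init__(self, maxsize):
        self._buf = deque(maxlen=maxsize)
        self._lock = threading.Lock()
        self.dropped = 0  # counter for monitoring/metrics

    def put(self, item):
        with self._lock:
            if len(self._buf) == self._buf.maxlen:
                self.dropped += 1  # deque evicts the oldest automatically
            self._buf.append(item)

    def get(self):
        with self._lock:
            return self._buf.popleft() if self._buf else None
```

The `dropped` counter doubles as a cheap load signal for the monitoring dashboard described in section 5.1.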

7.2 Multi-device Load Balancing

  • Enhancement: Extend existing multi-dongle support
  • Location: core/functions/Multidongle.py (existing auto-detection)
  • Features:
    • Intelligent device allocation
    • Load balancing across devices
    • Device health monitoring
    • Automatic failover
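
Least-loaded allocation with health awareness can be stated in a few lines (the load representation here — pending-frame counts with `None` for unhealthy devices — is an assumption for illustration):

```python
def pick_device(devices):
    """Route the next frame to the healthy device with the smallest
    queue depth. `devices` maps device id -> pending count, with
    None marking an unhealthy device (failover skips it)."""
    healthy = {d: load for d, load in devices.items() if load is not None}
    if not healthy:
        raise RuntimeError("no healthy devices available")
    return min(healthy, key=healthy.get)
```

Marking a device `None` when its health check fails gives automatic failover for free: the selector simply never routes to it until it recovers.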

8. Error Handling and Recovery

Status: 🟡 Basic Implementation
Location: Throughout codebase

8.1 Comprehensive Error Recovery

  • Enhancement: Extend existing error handling
  • Features:
    • Automatic device reconnection
    • Pipeline restart on critical errors
    • Input source recovery
    • Result persistence on failure

Implementation Roadmap

Phase 1: Core Data Flow (Weeks 1-2)

  1. Complete: Pipeline deployment and device initialization
  2. 🔄 In Progress: Auto-resize preprocessing (mostly implemented)
  3. Next: Implement basic camera input source
  4. Next: Add simple result file output
  5. Next: Create basic pipeline manager

Phase 2: Complete Workflow (Weeks 3-4)

  1. Add video file input support
  2. Implement comprehensive result persistence
  3. Create UI integration bridge
  4. Add real-time monitoring

Phase 3: Advanced Features (Weeks 5-6)

  1. Enhanced preprocessing pipeline
  2. Performance optimization
  3. Advanced error handling
  4. Result visualization

Phase 4: Production Features (Weeks 7-8)

  1. Multi-device load balancing
  2. Advanced stream input support
  3. Analytics and reporting
  4. Configuration management

Key Code Locations for Current Auto-resize Implementation

Model Input Shape Detection

  • File: core/functions/Multidongle.py
  • Lines: 188-199 (model_input_shape property)
  • Status: Working - detects model input dimensions from NEF files

Automatic Preprocessing

  • File: core/functions/Multidongle.py
  • Lines: 354-371 (preprocess_frame method)
  • Status: Working - auto-resizes based on model input shape
  • Features: Format conversion, aspect ratio handling

Pipeline Data Processing

  • File: core/functions/InferencePipeline.py
  • Lines: 165-240 (_process_data method)
  • Status: Working - integrates preprocessing with inference
  • Features: Inter-stage processing, result accumulation

Format Conversion

  • File: core/functions/Multidongle.py
  • Lines: 382-396 (_convert_format method)
  • Status: Working - supports BGR565, RGB8888, YUYV, RAW8

Notes for Development

  1. Auto-resize is already implemented - The system automatically detects model input shape and resizes accordingly
  2. Priority should be on input sources - Camera and file input are the critical missing pieces
  3. Result persistence is essential - the current system only provides callbacks; file output is still needed
  4. UI integration gap - UI configuration doesn't connect to core pipeline execution
  5. Performance is good - Multi-threading and device management are solid foundations

The core pipeline and preprocessing are working well; the focus should be on completing the input/output ecosystem around the existing, robust inference engine.