cluster4npu

HuangMason320 c4090b2420 perf: Optimize multi-series dongle performance and prevent bottlenecks

Key improvements:
- Add timeout mechanism (2s) for result ordering to prevent slow devices from blocking pipeline
- Implement performance-biased load balancing with 2x penalty for low-GOPS devices (< 10 GOPS)
- Adjust KL520 GOPS from 3 to 2 for more accurate performance representation
- Remove KL540 references to focus on available hardware
- Skip sequence numbers whose results time out and emit a timeout result in their place, for better throughput
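
One way to read the performance-biased load balancing item above is as a weighted split in which each device's weight is its GOPS rating, halved when the rating falls below 10 GOPS. The sketch below is an illustration under that assumption, not the repository's code: the function name `allocation_shares`, the constants, and the KL720 rating used in the example are all hypothetical placeholders. Only the KL520 value of 2 GOPS, the 10 GOPS threshold, and the 2x penalty come from the list above.

```python
# Hypothetical sketch of GOPS-weighted task allocation with a low-GOPS penalty.
LOW_GOPS_THRESHOLD = 10.0  # devices rated below this are penalized
LOW_GOPS_PENALTY = 2.0     # "2x penalty": the weight is divided by this factor


def allocation_shares(device_gops):
    """Return the fraction of tasks each device would receive.

    device_gops maps a device name to its GOPS rating.
    """
    weights = {}
    for name, gops in device_gops.items():
        weight = float(gops)
        if gops < LOW_GOPS_THRESHOLD:
            # Bias work away from slow devices beyond their raw GOPS share.
            weight /= LOW_GOPS_PENALTY
        weights[name] = weight

    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one device needs a positive GOPS rating")
    return {name: weight / total for name, weight in weights.items()}


# KL520 = 2 GOPS is from the commit message; the KL720 value is a placeholder.
shares = allocation_shares({"KL520": 2, "KL720": 21})
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The penalty is applied to the weight rather than to a queue length or latency estimate; the actual change may apply it elsewhere in the balancing score.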

This resolves the issue where multi-series mode had lower FPS than a single KL720
because KL520 devices created bottlenecks in the result ordering queue.

Performance impact:
- Reduces KL520 task allocation from ~12.5% to ~5-8%
- Prevents pipeline stalls from slow inference results
- Maintains result ordering integrity with timeout fallback
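
The result ordering with timeout fallback described above can be sketched as a reorder buffer: results are released strictly in sequence order, and when the next expected sequence number has kept later results waiting longer than the timeout, a placeholder is emitted in its place and the sequence is skipped. This is a minimal single-threaded sketch under assumptions, not the repository's implementation: `OrderedResultBuffer`, `TIMEOUT`, `put`, and `pop_ready` are hypothetical names, and the injectable `clock` parameter exists only to make the behavior easy to test.

```python
import time

# Hypothetical placeholder emitted for a sequence number that timed out.
TIMEOUT = object()


class OrderedResultBuffer:
    """Release results in sequence order, skipping ones that arrive too late."""

    def __init__(self, timeout_s=2.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.pending = {}          # seq -> result, waiting for their turn
        self.next_seq = 0          # next sequence number to release
        self.blocked_since = None  # when the current head-of-line wait began

    def put(self, seq, result):
        if seq < self.next_seq:
            return  # late result for an already skipped sequence: drop it
        self.pending[seq] = result
        if self.blocked_since is None:
            self.blocked_since = self.clock()

    def pop_ready(self):
        """Return a list of (seq, result) pairs that can be released now."""
        released = []
        while self.pending:
            if self.next_seq in self.pending:
                released.append((self.next_seq, self.pending.pop(self.next_seq)))
            elif self.clock() - self.blocked_since >= self.timeout_s:
                # The expected result is overdue: emit a timeout placeholder
                # so the results queued behind it are no longer blocked.
                released.append((self.next_seq, TIMEOUT))
            else:
                break  # still within the timeout; keep waiting
            self.next_seq += 1
            self.blocked_since = self.clock()
        if not self.pending:
            self.blocked_since = None
        return released
```

The timer starts only when a later result is actually waiting behind a missing one, so an idle pipeline does not produce spurious timeouts. A multi-threaded pipeline would additionally need a lock around `put` and `pop_ready`.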

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-08-14 17:15:39 +08:00

__init__.py

Remove cluster4npu_ui package prefix and remove export/analysis buttons

2025-08-07 12:17:59 +08:00

base_node.py

Remove cluster4npu_ui package prefix and remove export/analysis buttons

2025-08-07 12:17:59 +08:00

exact_nodes.py

perf: Optimize multi-series dongle performance and prevent bottlenecks

2025-08-14 17:15:39 +08:00

input_node.py

Remove cluster4npu_ui package prefix and remove export/analysis buttons