HuangMason320 c4090b2420 perf: Optimize multi-series dongle performance and prevent bottlenecks
Key improvements:
- Add timeout mechanism (2s) for result ordering to prevent slow devices from blocking pipeline
- Implement performance-biased load balancing with 2x penalty for low-GOPS devices (< 10 GOPS)
- Adjust KL520 GOPS from 3 to 2 for more accurate performance representation
- Remove KL540 references to focus on available hardware
- Add intelligent sequence skipping with timeout results for better throughput

This resolves the issue where multi-series mode had lower FPS than single KL720
due to KL520 devices creating bottlenecks in the result ordering queue.

Performance impact:
- Reduces KL520 task allocation from ~12.5% to ~5-8%
- Prevents pipeline stalls from slow inference results
- Maintains result ordering integrity with timeout fallback

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-14 17:15:39 +08:00
..