|
|
cbd1b9db28
|
fix(task-scheduler): Bug #10 — convention path fallback(visionA promote/result 拿不到 NEF)
visionA e2e 撞到:promote / result endpoint 在 status=COMPLETED 仍拿不到
NEF(409 source_not_available / 404 result_not_found)。
根因:worker (services/workers/consumer.py:118) 把 NEF/BIE/ONNX 上傳到
固定 convention path `jobs/{job_id}/out.{output_name}`、但 scheduler 端
advanceJob (jobService.js:246) 沒接收 worker done event 的 output path、
所以 job.output.{source}_path 永遠 null、讀取端拿不到。
修法 A(讀取端 fallback、最低風險):
- promote.js getJobOutputKey() + result.js extractNefObjectKey() 在
status=COMPLETED + jobId 有效 + source ∈ {onnx,bie,nef} 時、反推
convention path
- 不改 worker / 不改 advanceJob / 不改 redis schema
- fallback 放最後、保留 result_object_keys / output.{source}_path 兩種
顯式設定優先級
Phase 2 backlog(待補完):
- 補完 worker → scheduler done event 寫 output path
- advanceJob 接收 output path 並寫進 redis
- 清掉本批 fallback dead branch + promote 409 source_not_available
dead branch(fallback 後 valid source 永遠拿得到 key)
Tests: 666/666 pass(無回歸)
Reviewer: ✅ 通過、guard 嚴格、對齊 worker convention、無 path traversal 風險
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-18 15:55:22 +08:00 |
|
|
|
4d381c0b50
|
feat(task-scheduler): Phase 1 — modularize server + add OAuth/JWKS + /api/v1/* routes
Refactor server.js (647 → 99 lines) into 30+ modules under src/:
- auth/: JWKS validation, JWT middleware, OAuth client_credentials
- routes/v1/: jobs (POST/GET/:id) + promote with input validation
- routes/legacy.js: existing /jobs multipart path (backward compatible)
- services/: jobService, healthService, sseService, statusMapper,
doneListener
- middleware/: requestId, errorHandler, perClientRateLimit,
uploadConcurrency, upload (multer + storage)
- redis/: Lua scripts for atomic claim/release_active_job
- storage/: local + minio adapters; fileAccessAgent/: PUT promote client
- config.js: env var validation with fail-fast
Phase 1 features (T1–T11):
- T1 Auth middleware + JWKS (Member Center OAuth2 resource server)
- T2 OAuth client (Member Center client_credentials, Basic auth)
- T3 /api/v1/* router skeleton
- T4 server.js refactor (legacy endpoints fully preserved, real-Redis
regression verified — existing worker consumer group untouched)
- T5 POST /api/v1/jobs (multipart, OWASP-audited, 2 Critical / 6 Major
fixed; Risk-A/B documented as accepted)
- T6 GET /api/v1/jobs + GET /:id (cursor pagination, ETag, IDOR-safe)
- T7 POST /jobs/:id/promote (FAA PUT with own service token, 300s
timeout, fail-fast on missing FAA URL)
- T8 /health upgrade (healthy/degraded/unhealthy + 30s background cache)
- T9 stage_timings (release_active_job in terminal states)
- T10 env + Docker integration (MULTIPART_* + concurrency limiter)
- T11 README (498 lines) + OpenAPI 3.0 spec (1588 lines)
Tests: 630 pass across 29 suites. Updated Dockerfile + .dockerignore +
docker-compose.yml env passthrough (no hardcoded secrets, fail-fast on
missing required vars).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-01 10:55:05 +08:00 |
|
|
|
efa67d59a4
|
Add web frontend, MinIO storage, monitoring, and docker-compose deployment
- Frontend: rewrite Home.vue to match backend POST /jobs API (remove single-stage options)
- Frontend: add Monitor page (/monitor) for queue and job monitoring
- Frontend: add job history with localStorage tracking (per-browser)
- Frontend: fix Nginx proxy rewrite (/api -> /) and add 500MB upload limit
- Backend: add MinIO storage support (STORAGE_BACKEND=minio) alongside local mode
- Backend: add GET /queues/stats API for queue monitoring
- Backend: fix download handler for MinIO (buffer mode for Node 18 compat)
- Workers: add S3/MinIO download/upload in consumer.py with isolated temp dirs
- Workers: add s3_storage.py helper with lifecycle rule (7-day TTL)
- Docker: add docker-compose.yml with all services (web, scheduler, redis, workers)
- Docker: ports mapped to 9500 (web) and 9501 (scheduler)
- Config: add .env to .gitignore to protect secrets
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-01 15:04:09 +08:00 |
|