The auth pillar moves from an OAuth 2.0 resource server to a pre-shared API key (visionA ↔ converter 1:1 internal trust). Adds a GET /api/v1/jobs/:id/result streaming endpoint so the visionA backend can relay NEF downloads.

Phase A (auth switch):
- Add apiKeyMiddleware (constant-time compare, tokenFingerprint, 4 audit events)
- Remove the OAuth middleware + JWKS (keep oauthClient for promote → FAA)
- Switch 4 endpoints to requireApiKey
- Add the TRUST_PROXY env + Express trust proxy setting (forensic source_ip)

Phase B (/result endpoint):
- Streaming NEF download with 5min timeout + concurrent cap 10
- Two-tier rate limit (burst 5/10s + sustained 20/min)
- Bandwidth quota (1 GB/hr + 6 GB/24hr) by token_fingerprint
- Range header silently ignored + Accept-Ranges: none
- filename quote-escape + RFC 5987 fallback + sanitize
- 8 /result audit events (complete forensic coverage)

Design evolution record: docs/TODO-visionA-integration-v2.md (5/2 OAuth → 5/16 API key → 5/16 download via converter; corresponds to visionA repo ADR-015/016)

Tests: 597 → 666 (+69), 29 suites all pass
Security: APPROVE WITH CONDITIONS (single-instance deployment, 6 new env vars, 24hr monitoring)
npm audit: 3 vuln → 0 (transitive AWS SDK xml chain)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Database Design
Status: Phase 1 complete; Phase 0.8b leaves it completely untouched.
Related docs: design-doc.md §3.7, api/api-jobs.md.
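1. Why Redis instead of PostgreSQL
- The Phase 1 data model is simple: a job is a state machine and the user index is key-value
- The existing "crash means reset" philosophy suits Redis (the persistence PG brings would only add complexity)
- A Redis Set is enough for the user index (a single user has < 10 jobs within 7 days)
- If cross-crash recovery or multi-instance HA is needed later, re-evaluate PG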
2. Key layout
| Key | Type | Purpose | TTL |
|---|---|---|---|
| job:{job_id} | String (JSON) | Full job record | 7 days |
| user:{user_id}:jobs | Set | All job_ids of the user (any status) | EXPIRE 7d on every write |
| user:{user_id}:active_job | String | Current in-progress job_id (= created or running) | Deleted when the job ends |
| ratelimit:client:{client_id} | Managed by express-rate-limit | Per-client_id rate limit | 5 min |
| queue:onnx / queue:bie / queue:nef | Redis Stream | Worker task queues | none |
| queue:done | Redis Stream | Worker completion events | none |
| queue:progress | Redis Stream | In-stage worker progress (optional, Phase 2) | none |
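The key layout above can be centralized in small builder functions so that handlers never assemble key strings by hand. A minimal TypeScript sketch; the `keys` object and the `JOB_TTL_SECONDS` constant are illustrative names, not taken from the codebase:

```typescript
// Hypothetical helpers: names are illustrative, not from the codebase.
const JOB_TTL_SECONDS = 7 * 24 * 60 * 60; // 7 days = 604800 seconds

type Stage = "onnx" | "bie" | "nef";

const keys = {
  // Full job record (String holding JSON)
  job: (jobId: string): string => `job:${jobId}`,
  // Set of every job_id that belongs to the user
  userJobs: (userId: string): string => `user:${userId}:jobs`,
  // Marker for the single in-progress job of the user
  activeJob: (userId: string): string => `user:${userId}:active_job`,
  // Per-stage worker queue (Redis Stream)
  queue: (stage: Stage): string => `queue:${stage}`,
};

console.log(keys.activeJob("visionA-user-12345"));
// prints "user:visionA-user-12345:active_job"
```

Keeping the templates in one place makes a typo in a key name a compile-time or single-file problem rather than a silent cache miss.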
3. Job record schema
{
  // existing fields
  "job_id": "uuid",
  "created_at": "...",
  "updated_at": "...",
  "status": "ONNX | BIE | NEF | COMPLETED | FAILED", // still uppercase internally
  "stage": "onnx | bie | nef | null",
  "progress": 0,
  "parameters": {
    "model_id": 1001,
    "version": "0001",
    "platform": "520",
    "enable_evaluate": false,
    "enable_sim_fp": false,
    "enable_sim_fixed": false,
    "enable_sim_hw": false
  },
  "output": { // legacy format (backward compatible)
    "bie_path": null,
    "nef_path": null,
    "onnx_path": null
  },
  "result_object_keys": { // new format
    "onnx": "jobs/{job_id}/output/out.onnx",
    "bie": "jobs/{job_id}/output/out.bie",
    "nef": "jobs/{job_id}/output/out.nef"
  },
  "error": null,
  "origin": "api | web",
  "user_id": "visionA-user-12345",
  "tenant_id": "uuid-or-null",
  "created_by_client_id": "visionA-service", // fixed value in API key mode
  "source_filename": "model.onnx", // added in Phase 0.8b (used for the /result endpoint filename)
  "input": {
    "filename": "model.onnx",
    "object_key": "jobs/{job_id}/input/model.onnx",
    "size_bytes": 204800000,
    "ref_images_count": 0
  },
  "stage_timings": {
    "onnx": { "started_at": "...", "completed_at": "..." },
    "bie": { "started_at": "...", "completed_at": null },
    "nef": null
  },
  "stage_progress": 0,
  "expires_at": "2026-05-23T12:00:00Z",
  "metadata": {},
  "promoted": false, // idempotency flag
  "promoted_object_keys": [] // targets already promoted
}
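For handler code it can help to mirror the record as a type. The sketch below covers only a subset of the fields shown above, and its shapes are inferred from the example values rather than from the codebase; `isTerminal` additionally assumes that COMPLETED and FAILED are the only terminal states.

```typescript
// Assumed shapes: inferred from the JSON example, not from the codebase.
type InternalStatus = "ONNX" | "BIE" | "NEF" | "COMPLETED" | "FAILED";
type Stage = "onnx" | "bie" | "nef";

interface StageTiming {
  started_at: string;
  completed_at: string | null;
}

// Subset of the job record; remaining fields are omitted for brevity.
interface JobRecord {
  job_id: string;
  status: InternalStatus;
  stage: Stage | null;
  progress: number;
  user_id: string;
  tenant_id: string | null;
  created_by_client_id: string;
  source_filename: string;
  result_object_keys: Partial<Record<Stage, string>>;
  stage_timings: Record<Stage, StageTiming | null>;
  expires_at: string;
  promoted: boolean;
  promoted_object_keys: string[];
}

// Assumption: COMPLETED and FAILED are the only terminal states.
function isTerminal(status: InternalStatus): boolean {
  return status === "COMPLETED" || status === "FAILED";
}

// Parse a stored record; returns null when the key is missing (e.g. TTL expired).
// Note: this trusts the stored JSON and performs no runtime validation.
function parseJobRecord(raw: string | null): JobRecord | null {
  return raw === null ? null : (JSON.parse(raw) as JobRecord);
}
```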
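3.1 The source_filename field
New requirement in Phase 0.8b: the /result endpoint needs this field to build the download filename.
Write point: after multer receives the model file, the POST /api/v1/jobs handler writes multipart.filename (already sanitized) into job.source_filename.
Backend task: confirm that jobService.createJob writes this field (check the existing code, where it may already exist; add it if not).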
4. External status mapping (unchanged)
See api/api-jobs.md §5.3.
5. User index design
5.1 When keys are written
On job creation:
MULTI
SET job:{id} {...}
SADD user:{user_id}:jobs {id}
EXPIRE user:{user_id}:jobs 604800
SETNX user:{user_id}:active_job {id}
EXEC
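If SETNX returns 0 (visible only in the EXEC reply) → conflict: undo the SET and SADD that the transaction already applied, then return 409
If SETNX returns 1 → success
On completion / failure: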
MULTI
SET job:{id} {...}
DEL user:{user_id}:active_job # DEL only when value == the current job_id
EXEC
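A plain DEL inside MULTI cannot check the stored value first, so the "only when value == the current job_id" condition needs a compare-and-delete step. One common way to get that atomically is a short Lua script. The sketch below assumes an ioredis-style `eval(script, numKeys, ...keysAndArgs)` call; `EvalClient`, `RELEASE_ACTIVE_JOB_LUA`, and `releaseActiveJob` are hypothetical names, not the project's implementation.

```typescript
// Minimal client surface; ioredis exposes eval(script, numKeys, ...keysAndArgs).
interface EvalClient {
  eval(script: string, numKeys: number, ...args: (string | number)[]): Promise<unknown>;
}

// Compare-and-delete: the GET and DEL run atomically inside Redis.
// KEYS[1] = user:{user_id}:active_job, ARGV[1] = job_id
const RELEASE_ACTIVE_JOB_LUA = `
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
end
return 0
`;

// Returns true when the marker belonged to jobId and was removed,
// false when it was missing or already points at a different job.
async function releaseActiveJob(
  client: EvalClient,
  userId: string,
  jobId: string,
): Promise<boolean> {
  const reply = await client.eval(
    RELEASE_ACTIVE_JOB_LUA,
    1,
    `user:${userId}:active_job`,
    jobId,
  );
  return reply === 1;
}
```

This keeps a late-finishing job from deleting the marker of a newer job that has since claimed the slot.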
5.2 Lua script (claim_active_job)
-- KEYS[1] = user:{user_id}:active_job
-- KEYS[2] = job:{job_id}
-- KEYS[3] = user:{user_id}:jobs
-- ARGV[1] = job_id
-- ARGV[2] = job_json
-- ARGV[3] = ttl_seconds
if redis.call('EXISTS', KEYS[1]) == 1 then
  return {'conflict', redis.call('GET', KEYS[1])}
end
redis.call('SET', KEYS[1], ARGV[1])
redis.call('SET', KEYS[2], ARGV[2])
redis.call('SADD', KEYS[3], ARGV[1])
redis.call('EXPIRE', KEYS[3], tonumber(ARGV[3]))
return {'ok'}
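A caller for the script above might look like the following sketch. It assumes an ioredis-style `eval(script, numKeys, ...keysAndArgs)` that returns the Lua table as an array; `EvalClient`, `ClaimResult`, and `claimActiveJob` are hypothetical names, and the script source is passed in rather than repeated.

```typescript
// Minimal client surface; ioredis exposes eval(script, numKeys, ...keysAndArgs).
interface EvalClient {
  eval(script: string, numKeys: number, ...args: (string | number)[]): Promise<unknown>;
}

type ClaimResult =
  | { ok: true }
  | { ok: false; activeJobId: string }; // the job that already holds the slot

async function claimActiveJob(
  client: EvalClient,
  script: string, // source of the claim_active_job Lua script
  userId: string,
  jobId: string,
  jobJson: string,
  ttlSeconds: number,
): Promise<ClaimResult> {
  const reply = await client.eval(
    script,
    3, // number of KEYS
    `user:${userId}:active_job`, // KEYS[1]
    `job:${jobId}`,              // KEYS[2]
    `user:${userId}:jobs`,       // KEYS[3]
    jobId,                       // ARGV[1]
    jobJson,                     // ARGV[2]
    ttlSeconds,                  // ARGV[3]
  );
  if (Array.isArray(reply) && reply[0] === "ok") {
    return { ok: true };
  }
  if (Array.isArray(reply) && reply[0] === "conflict") {
    return { ok: false, activeJobId: String(reply[1]) };
  }
  throw new Error(`unexpected reply from claim script: ${JSON.stringify(reply)}`);
}
```

A handler would typically map `{ ok: false }` to the 409 response and use `activeJobId` to tell the client which job is still in progress.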
5.3 Avoid KEYS *
Wrong: redis.keys('job:*') is O(N) and blocks the server.
Right:
const ids = await redis.smembers(`user:${userId}:jobs`);
const pipeline = redis.pipeline();
for (const id of ids) pipeline.get(`job:${id}`);
const results = await pipeline.exec();
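In ioredis, `pipeline.exec()` resolves to an array of `[error, result]` pairs, so the replies still need unpacking. Because the user Set has its TTL refreshed on every write, it can also contain ids whose `job:{id}` key has already expired; those come back as `null`. A small sketch of the unpacking step, where `parseJobReplies` is a hypothetical helper name:

```typescript
// Shape of the value ioredis resolves from pipeline.exec().
type PipelineReply = [Error | null, unknown][] | null;

// Turn pipelined GET replies into parsed job records.
// Throws on the first command error; skips ids whose job key no longer exists.
function parseJobReplies(replies: PipelineReply): Record<string, unknown>[] {
  const jobs: Record<string, unknown>[] = [];
  for (const [err, raw] of replies ?? []) {
    if (err) throw err;
    if (typeof raw !== "string") continue; // GET returned null: key expired
    jobs.push(JSON.parse(raw) as Record<string, unknown>);
  }
  return jobs;
}

const jobs = parseJobReplies([
  [null, '{"job_id":"a","status":"NEF"}'],
  [null, null], // stale id left in the Set
]);
console.log(jobs.length); // prints 1
```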
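6. Memory estimate
- Each job record is about 2-4 KB (including stage_timings etc.)
- Each element of a user index Set is < 40 bytes
- 1000 concurrent users × 10 jobs = 10k job records ≈ 40 MB
Easy for Redis. The Converter Bucket lifecycle is 7 days and the Redis TTLs follow at 7 days, so the memory ceiling is bounded.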
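The arithmetic behind the estimate, worked through with the upper bounds from the bullets above. This counts payload bytes only (1 KB taken as 1000 bytes) and ignores Redis's own per-key overhead, so it is a rough floor rather than a measured figure; `estimateRedisBytes` is an illustrative helper.

```typescript
interface Workload {
  users: number;
  jobsPerUser: number;
  bytesPerJobRecord: number;
  bytesPerSetMember: number;
}

// Payload-only estimate: no allowance for Redis internal overhead.
function estimateRedisBytes(w: Workload): {
  jobRecords: number;
  recordBytes: number;
  indexBytes: number;
} {
  const jobRecords = w.users * w.jobsPerUser;
  return {
    jobRecords,
    recordBytes: jobRecords * w.bytesPerJobRecord,
    indexBytes: jobRecords * w.bytesPerSetMember,
  };
}

const worstCase = estimateRedisBytes({
  users: 1000,
  jobsPerUser: 10,
  bytesPerJobRecord: 4000, // upper end of the 2-4 KB range
  bytesPerSetMember: 40,   // upper bound per Set element
});

console.log(worstCase.jobRecords);        // prints 10000
console.log(worstCase.recordBytes / 1e6); // prints 40  (MB of job records)
console.log(worstCase.indexBytes / 1e6);  // prints 0.4 (MB of Set members)
```

The user index adds well under 1 MB, so the job records dominate the total.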
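7. M5 option A: write to MinIO first, then Lua claim
This avoids the complexity of rolling back Redis when the Lua claim succeeds but MinIO then fails:
- MinIO fails → return 502 immediately; Redis stays completely clean
- Lua conflict / throw → clean up MinIO (fire-and-forget, with the 7d lifecycle as the backstop)
- enqueue fails → compensate by releasing the Redis claim + cleaning up MinIO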
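The ordering above can be sketched as a single function with its side effects injected. Everything here is hypothetical (`CreateJobDeps`, `createJob`, the outcome names). The sketch returns an outcome for the two cases whose HTTP status the text specifies (502 and 409) and rethrows in the other failure cases, since their status codes are not stated here.

```typescript
// Hypothetical dependency surface: each function wraps one side effect.
interface CreateJobDeps {
  uploadInput(): Promise<void>;        // write the model file to MinIO
  claim(): Promise<"ok" | "conflict">; // run the Lua claim script
  enqueue(): Promise<void>;            // push the job onto the worker queue
  releaseClaim(): Promise<void>;       // compensate: undo the Redis claim
  cleanupInput(): Promise<void>;       // remove the uploaded MinIO object
}

type CreateJobOutcome = "created" | "conflict" | "storage_failed";

// Fire-and-forget cleanup: errors are swallowed, the 7d lifecycle is the backstop.
function cleanupInBackground(deps: CreateJobDeps): void {
  void deps.cleanupInput().catch(() => undefined);
}

async function createJob(deps: CreateJobDeps): Promise<CreateJobOutcome> {
  // Step 1: MinIO first. On failure Redis has not been touched at all.
  try {
    await deps.uploadInput();
  } catch {
    return "storage_failed"; // maps to 502
  }

  // Step 2: Lua claim. Conflict or throw leaves only MinIO to clean up.
  let claim: "ok" | "conflict";
  try {
    claim = await deps.claim();
  } catch (err) {
    cleanupInBackground(deps);
    throw err;
  }
  if (claim === "conflict") {
    cleanupInBackground(deps);
    return "conflict"; // maps to 409
  }

  // Step 3: enqueue. On failure compensate in Redis, then clean up MinIO.
  try {
    await deps.enqueue();
  } catch (err) {
    await deps.releaseClaim();
    cleanupInBackground(deps);
    throw err;
  }
  return "created";
}
```

Doing the upload first means the only step that ever needs a Redis compensation is the last one, which is the point of option A.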
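8. Phase 0.8b changes
None. The database is completely untouched.
The only related changes:
- created_by_client_id is fixed to visionA-service in API key mode (the middleware sets req.auth.clientId); this is handler behavior, not a schema change
- Confirm that the source_filename field exists (the existing implementation may already have it; if not, Backend adds it as part of the Phase B tasks)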