jim800121chen d8a9517c9d feat(task-scheduler): Phase 0.8b — API key auth + /result endpoint
Auth pillar 從 OAuth 2.0 resource server 改成 pre-shared API key
(visionA ↔ converter 1:1 internal trust)。新增 GET /api/v1/jobs/:id/result
streaming endpoint 給 visionA backend 中轉 NEF 下載。

Phase A(auth 切換):
- 新增 apiKeyMiddleware(constant-time compare、tokenFingerprint、4 audit events)
- 砍 OAuth middleware + JWKS(保留 oauthClient 供 promote → FAA 使用)
- 4 個 endpoint 換掛 requireApiKey
- 加 TRUST_PROXY env + Express trust proxy 設定(forensic source_ip)

Phase B(/result endpoint):
- streaming NEF download with 5min timeout + concurrent cap 10
- Two-tier rate limit(burst 5/10s + sustained 20/min)
- Bandwidth quota(1 GB/hr + 6 GB/24hr)by token_fingerprint
- Range header silently ignored + Accept-Ranges: none
- filename quote-escape + RFC 5987 fallback + sanitize
- 8 個 /result audit events(forensic 完整)

設計演進記錄:docs/TODO-visionA-integration-v2.md(5/2 OAuth → 5/16 API key
→ 5/16 download via converter;對應 visionA repo ADR-015/016)

Tests: 597 → 666 (+69)、29 suites all pass
Security: APPROVE WITH CONDITIONS(單 instance 部署、6 新 env、24hr 監控)
npm audit: 3 vuln → 0(transitive AWS SDK xml chain)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:47:28 +08:00

188 lines
5.5 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Database 設計
> **狀態**Phase 1 完工 — Phase 0.8b **完全不動**。
>
> **配套**`design-doc.md` §3.7、`api/api-jobs.md`。
---
## 1. 為什麼用 Redis、不用 PostgreSQL
- Phase 1 資料模式簡單job 是 state machine、user index 是 key-value
- 既有哲學「Crash 即 Reset」對 Redis 友善PG 引入持久化反而變複雜)
- Redis Set 做 user 索引足夠(單 user 7 天內 < 10 job
- 未來若要跨 Crash recovery / instance HA再評估 PG
---
## 2. Key 規劃
| Key | 類型 | 用途 | TTL |
|-----|------|------|-----|
| `job:{job_id}` | String (JSON) | Job 完整 record | 7 |
| `user:{user_id}:jobs` | Set | user 所有 job_id不分狀態 | 每次寫入時 `EXPIRE 7d` |
| `user:{user_id}:active_job` | String | 當前 in-progress job_id= `created` `running`| job 結束刪除 |
| `ratelimit:client:{client_id}` | `express-rate-limit` 管理 | per-client_id rate limit | 5 min |
| `queue:onnx` / `queue:bie` / `queue:nef` | Redis Stream | Worker 任務佇列 | |
| `queue:done` | Redis Stream | Worker 完成事件 | |
| `queue:progress` | Redis Stream | Worker stage 內進度選配Phase 2| |
---
## 3. Job record schema
```jsonc
{
// 既有欄位
"job_id": "uuid",
"created_at": "...",
"updated_at": "...",
"status": "ONNX | BIE | NEF | COMPLETED | FAILED", // 內部仍用大寫
"stage": "onnx | bie | nef | null",
"progress": 0,
"parameters": {
"model_id": 1001,
"version": "0001",
"platform": "520",
"enable_evaluate": false,
"enable_sim_fp": false,
"enable_sim_fixed": false,
"enable_sim_hw": false
},
"output": { // 舊格式(向後相容)
"bie_path": null,
"nef_path": null,
"onnx_path": null
},
"result_object_keys": { // 新格式
"onnx": "jobs/{job_id}/output/out.onnx",
"bie": "jobs/{job_id}/output/out.bie",
"nef": "jobs/{job_id}/output/out.nef"
},
"error": null,
"origin": "api | web",
"user_id": "visionA-user-12345",
"tenant_id": "uuid-or-null",
"created_by_client_id": "visionA-service", // API key 模式下固定值
"source_filename": "model.onnx", // Phase 0.8b 新增(/result endpoint filename 用)
"input": {
"filename": "model.onnx",
"object_key": "jobs/{job_id}/input/model.onnx",
"size_bytes": 204800000,
"ref_images_count": 0
},
"stage_timings": {
"onnx": { "started_at": "...", "completed_at": "..." },
"bie": { "started_at": "...", "completed_at": null },
"nef": null
},
"stage_progress": 0,
"expires_at": "2026-05-23T12:00:00Z",
"metadata": {},
"promoted": false, // 冪等性 flag
"promoted_object_keys": [] // 已 promote 的目標
}
```
### 3.1 `source_filename` 欄位
Phase 0.8b 新增需求`/result` endpoint 需要這個欄位構造 download filename
**寫入點**`POST /api/v1/jobs` handler multer 接收 `model` 檔後 `multipart.filename` 寫入 `job.source_filename` sanitized)。
**Backend 端 task**確認 `jobService.createJob` 寫入這個欄位檢查既有 code可能已存在若沒有則補上)。
---
## 4. 對外 status 映射(不變)
詳見 `api/api-jobs.md` §5.3
---
## 5. User 索引設計
### 5.1 Key 寫入時機
```
建立 job:
MULTI
SET job:{id} {...}
SADD user:{user_id}:jobs {id}
EXPIRE user:{user_id}:jobs 604800
SETNX user:{user_id}:active_job {id}
EXEC
若 SETNX 回 0 → 衝突,回滾,回 409
若 SETNX 回 1 → 成功
完成 / 失敗時:
MULTI
SET job:{id} {...}
DEL user:{user_id}:active_job # 僅在 value == 當前 job_id 時才 DEL
EXEC
```
### 5.2 Lua scriptclaim_active_job
```lua
-- KEYS[1] = user:{user_id}:active_job
-- KEYS[2] = job:{job_id}
-- KEYS[3] = user:{user_id}:jobs
-- ARGV[1] = job_id
-- ARGV[2] = job_json
-- ARGV[3] = ttl_seconds
if redis.call('EXISTS', KEYS[1]) == 1 then
return {'conflict', redis.call('GET', KEYS[1])}
end
redis.call('SET', KEYS[1], ARGV[1])
redis.call('SET', KEYS[2], ARGV[2])
redis.call('SADD', KEYS[3], ARGV[1])
redis.call('EXPIRE', KEYS[3], tonumber(ARGV[3]))
return {'ok'}
```
### 5.3 避免 `KEYS *`
**錯誤做法**`redis.keys('job:*')` O(N) 阻塞
**正確做法**
```javascript
const ids = await redis.smembers(`user:${userId}:jobs`);
const pipeline = redis.pipeline();
for (const id of ids) pipeline.get(`job:${id}`);
const results = await pipeline.exec();
```
---
## 6. 記憶體預估
- 每個 job record 2-4 KB stage_timings
- 每個 user index Set 每個元素 < 40 bytes
- 1000 並發 user × 10 jobs = 10k job record 40 MB
Redis 輕鬆Converter Bucket lifecycle 7 Redis 也跟著 TTL 7 記憶體上限可控
---
## 7. M5 方案 A先寫 MinIO 後 Lua claim
避免拿到 Lua claim MinIO 失敗需要 rollback Redis 的複雜度
- MinIO 失敗 直接回 502Redis 完全乾淨
- Lua conflict / throw cleanup MinIOfire-and-forget 7d lifecycle 兜底
- enqueue 失敗 補償 release Redis + cleanup MinIO
---
## 8. Phase 0.8b 變動
**無**Database 完全不動
唯一相關變動
- `created_by_client_id` API key 模式下會固定為 `visionA-service`middleware 設定 `req.auth.clientId`)— 此為 handler 行為不是 schema 改變
- `source_filename` 欄位確認存在既有實作可能已有若無Backend 補上 屬於 Phase B 任務