# TDD: Kneron Model Converter External API (Phase 1)

## Author: Architect Agent
## Status: Draft (before three-way cross review)
## Last updated: 2026-04-25
## Companion documents:
- `design-doc.md` (architecture decisions)
- `../02-prd/PRD.md` (requirements)
- `../03-design/design-review.md` (UX feedback)

This TDD focuses on Phase 1 implementation details. For the rationale behind each decision, see `design-doc.md`.

## Change log

| Date | Change | Author |
|------|--------|--------|
| 2026-04-25 | Initial Draft 1.0 | Architect Agent |
| 2026-04-25 | Original model upload path changed to a direct multipart upload from visionA-backend to the Converter; `POST /api/v1/jobs` switched to `multipart/form-data`; removed content related to FAA `getFile()` / `headFile()` / `files:download.read` / `files:metadata.read`; removed content related to TBD-1, `input_object_key`, and `input_not_found` accordingly | Architect Agent |

---

## 1. API Specification (required for Phase 1)

### 1.1 General conventions

- **Base URL**: `https://<converter-host>/api/v1` (public vhost; only this path is exposed externally)
- **Content-Type**:
  - `POST /api/v1/jobs`: `multipart/form-data` (same as the existing Web UI `POST /jobs`)
  - Other endpoints (GET / POST `/promote`): requests use `application/json; charset=utf-8`
  - **All responses**: `application/json; charset=utf-8`
- **Time format**: ISO 8601 UTC (e.g. `2026-04-25T12:00:00Z`)
- **ID format**: `job_id` is a UUIDv4 string
- **Authentication**: `Authorization: Bearer <JWT>` (required on every endpoint except `/health`)
- **Request ID**: if the client sends `X-Request-Id`, the response echoes the same value; otherwise the server generates a UUIDv4. Every log entry must record it.
- **Rate limit**: 300 requests per 5 minutes per `client_id` (response headers: `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`)

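The Request ID rule can be sketched as a small Express-style middleware. This is a minimal illustration, not the actual `src/middleware/requestId.js`: the function name `requestId` and the choice to trim the incoming header are assumptions; `req.requestId` matches the property the route handlers in this document read.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical middleware: echo the client's X-Request-Id, or generate a UUIDv4.
function requestId(req, res, next) {
  const incoming = req.headers['x-request-id'];
  // Accept only a single non-empty header value; otherwise generate a new ID.
  req.requestId =
    typeof incoming === 'string' && incoming.trim() !== ''
      ? incoming.trim()
      : randomUUID(); // crypto.randomUUID() produces a version 4 UUID
  res.setHeader('X-Request-Id', req.requestId);
  next();
}

module.exports = { requestId };
```

Mounting it before every other middleware lets the logger and the error handler pick up `req.requestId` without re-deriving it.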
### 1.2 Unified error format

All 4xx / 5xx responses:

```json
{
  "error": {
    "code": "string_code",
    "message": "human readable message (zh-TW)",
    "details": { /* optional; structure depends on code */ },
    "request_id": "uuid-v4"
  }
}
```

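A helper that produces this envelope might look like the sketch below. The names `buildErrorBody` and `sendError` and the argument order are assumptions chosen to match how `sendError(res, status, code, message, req, details)` is called in the middleware skeleton of §2.2; the real `errorHandler.js` may differ.

```javascript
// Hypothetical helper: build the unified error envelope.
// `details` is omitted from the body when not provided.
function buildErrorBody(code, message, requestId, details) {
  const error = { code, message };
  if (details !== undefined) error.details = details;
  error.request_id = requestId;
  return { error };
}

// Hypothetical helper: send the envelope on an Express-style response object.
function sendError(res, status, code, message, req, details) {
  return res.status(status).json(buildErrorBody(code, message, req.requestId, details));
}

module.exports = { buildErrorBody, sendError };
```

Keeping the body builder separate from the response call makes the format easy to unit-test without an HTTP server.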
### 1.3 Endpoint list

| Method | Path | Description | Required scope |
|--------|------|-------------|----------------|
| GET | `/health` | Health check | none |
| POST | `/api/v1/jobs` | Create a conversion job | `converter:job.write` |
| GET | `/api/v1/jobs` | List jobs (with filters) | `converter:job.read` |
| GET | `/api/v1/jobs/:id` | Status of a single job | `converter:job.read` |
| POST | `/api/v1/jobs/:id/promote` | Copy result files to the File Access Agent | `converter:job.write` |

**Reserved for Phase 2 (returns 501 Not Implemented in Phase 1)**:

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/jobs/:id/download-tokens` | Exchange for a delegated download token (pending Member Center) |
| DELETE | `/api/v1/jobs/:id` | Cancel a job |

### 1.4 Endpoint details

#### 1.4.1 `GET /health` (no auth required)

**Response 200**:
```json
{
  "service": "kneron-converter-api",
  "status": "healthy",
  "version": "1.0.0",
  "timestamp": "2026-04-25T12:00:00Z",
  "dependencies": {
    "redis": "connected",
    "member_center": "reachable",
    "file_access_agent": "reachable"
  }
}
```

**Response 503** (any dependency failing):
```json
{
  "service": "kneron-converter-api",
  "status": "unhealthy",
  "dependencies": {
    "redis": "disconnected",
    "member_center": "reachable",
    "file_access_agent": "reachable"
  }
}
```

Note: the reachability checks for Member Center and the File Access Agent can be served from a background cache (refreshed every 30s), so that `/health` itself does not become slow.

---

#### 1.4.2 `POST /api/v1/jobs`

**Request**:
```http
POST /api/v1/jobs
Authorization: Bearer <JWT>
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary...
X-Request-Id: <uuid> (optional)

------WebKitFormBoundary...
Content-Disposition: form-data; name="model"; filename="model.onnx"
Content-Type: application/octet-stream

<binary model file>
------WebKitFormBoundary...
Content-Disposition: form-data; name="ref_images[]"; filename="img_0.jpg"
Content-Type: image/jpeg

<binary image>
------WebKitFormBoundary...
Content-Disposition: form-data; name="user_id"

visionA-user-12345
------WebKitFormBoundary...
Content-Disposition: form-data; name="model_id"

1001
------WebKitFormBoundary...
Content-Disposition: form-data; name="version"

0001
------WebKitFormBoundary...
Content-Disposition: form-data; name="platform"

520
------WebKitFormBoundary...
Content-Disposition: form-data; name="enable_evaluate"

false
------WebKitFormBoundary...--
```

**Multer settings**:
- `multer.memoryStorage()` (same as the existing Web UI `POST /jobs`)
- `limits.fileSize`: 500MB (per-file cap for `model`)
- `fields`: `model` (1 file), `ref_images[]` (`maxCount: 100`)

**Field definitions**:

| Field | Type | Location | Required | Validation |
|-------|------|----------|----------|------------|
| `model` | file | multipart file | ✅ | Extension ∈ {`.onnx`, `.pt`, `.pth`, `.tflite`, `.h5`, `.pb`}; size ≤ 500MB |
| `ref_images[]` | file[] | multipart file | ❌ | `image/*`; at most 100 images; same rules as the existing Web UI |
| `user_id` | string | multipart field | ✅ | 1-128 characters; must not contain `/`, `\`, or `..`; format is decided by VisionA |
| `model_id` | string → int | multipart field | ✅ | After conversion to int, 1 ≤ x ≤ 65535 |
| `version` | string | multipart field | ✅ | 1-32 characters; a numeric string is recommended |
| `platform` | string | multipart field | ✅ | enum: `520`, `720`, `530`, `630`, `730` |
| `enable_evaluate` | string `'true'` / `'false'` | multipart field | ❌ | Defaults to `'false'` |
| `enable_sim_fp` | string `'true'` / `'false'` | multipart field | ❌ | Defaults to `'false'` |
| `enable_sim_fixed` | string `'true'` / `'false'` | multipart field | ❌ | Defaults to `'false'` |
| `enable_sim_hw` | string `'true'` / `'false'` | multipart field | ❌ | Defaults to `'false'` |
| `metadata` | string (JSON) | multipart field | ❌ | If present, must be a valid JSON object string; reserved for future extension |

**Notes**:
- Every multipart field value arrives as a string, so the server must convert `'true'` / `'false'` to boolean and `model_id` to integer.
- The fields are fully aligned with the existing Web UI `POST /jobs` multipart fields; `user_id` is the one field added for the external API (the Web UI does not need it).
- Validation order: verify the OAuth token first, then parse the multipart body, so that large files are never accepted from unauthenticated callers. In practice, place the `requireAuth` middleware before the `multer` middleware so that an invalid token is rejected before multer starts parsing.

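The string-to-type conversion described in the notes could be sketched as follows. The function name `parseJobFields` and the `{ field, reason }` error shape are assumptions for illustration; the limits come from the field table above. File checks (`model`, `ref_images[]`) are left out because they operate on `req.files`, not `req.body`.

```javascript
const PLATFORMS = ['520', '720', '530', '630', '730'];
const BOOLEAN_FLAGS = ['enable_evaluate', 'enable_sim_fp', 'enable_sim_fixed', 'enable_sim_hw'];

// Hypothetical helper: turn multer's all-string req.body into typed values.
// Returns { value, errors }, where errors is a list of { field, reason }.
function parseJobFields(body) {
  const errors = [];
  const value = {};

  const userId = body.user_id;
  if (typeof userId !== 'string' || userId.length < 1 || userId.length > 128 ||
      /[\/\\]/.test(userId) || userId.includes('..')) {
    errors.push({ field: 'user_id', reason: 'must be 1-128 chars without "/", "\\" or ".."' });
  } else {
    value.user_id = userId;
  }

  // Only plain decimal digits are accepted, so "1e3" or "12abc" are rejected.
  const modelId = /^\d+$/.test(body.model_id || '') ? Number(body.model_id) : NaN;
  if (!Number.isInteger(modelId) || modelId < 1 || modelId > 65535) {
    errors.push({ field: 'model_id', reason: 'must be an integer in 1..65535' });
  } else {
    value.model_id = modelId;
  }

  const version = body.version;
  if (typeof version !== 'string' || version.length < 1 || version.length > 32) {
    errors.push({ field: 'version', reason: 'must be 1-32 chars' });
  } else {
    value.version = version;
  }

  if (!PLATFORMS.includes(body.platform)) {
    errors.push({ field: 'platform', reason: `must be one of ${PLATFORMS.join(', ')}` });
  } else {
    value.platform = body.platform;
  }

  for (const flag of BOOLEAN_FLAGS) {
    const raw = body[flag] === undefined ? 'false' : body[flag]; // default 'false'
    if (raw !== 'true' && raw !== 'false') {
      errors.push({ field: flag, reason: "must be 'true' or 'false'" });
    } else {
      value[flag] = raw === 'true';
    }
  }

  if (body.metadata !== undefined) {
    try {
      const parsed = JSON.parse(body.metadata);
      if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
        throw new Error('not a JSON object');
      }
      value.metadata = parsed;
    } catch {
      errors.push({ field: 'metadata', reason: 'must be a JSON object string' });
    }
  }

  return { value, errors };
}
```

A non-empty `errors` list would map to `400 validation_error`, with the offending fields reported under `details.field`.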
**Response 201 Created**:
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "created",
  "stage": "onnx",
  "progress": 0,
  "created_at": "2026-04-25T12:00:00Z",
  "expires_at": "2026-05-02T12:00:00Z",
  "user_id": "visionA-user-12345"
}
```

**Error responses**:

| Status | error.code | Scenario |
|--------|-----------|----------|
| 400 | `validation_error` | Missing or malformed field (`details.field` lists the offending fields) |
| 400 | `invalid_multipart` | Multipart parse failure, missing required file / field, or disallowed file extension |
| 401 | `invalid_token` | JWT invalid / expired / missing a claim |
| 403 | `insufficient_scope` | Token lacks `converter:job.write` (`details.required_scope`) |
| 403 | `tenant_mismatch` | The token's `tenant_id` does not match the Converter configuration |
| 409 | `user_has_active_job` | The user_id already has a job in progress (see §1.5) |
| 413 | `file_too_large` | Uploaded file exceeds 500MB (triggered by multer `LIMIT_FILE_SIZE`) |
| 500 | `misconfiguration` | `STORAGE_BACKEND !== 'minio'`, etc. |
| 500 | `internal_error` | Anything else |

---

#### 1.4.3 `GET /api/v1/jobs/:id`

**Request**:
```http
GET /api/v1/jobs/550e8400-e29b-41d4-a716-446655440000
Authorization: Bearer <JWT>
If-None-Match: "etag-value" (optional)
```

**Response 200 OK**:
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "user_id": "visionA-user-12345",
  "status": "running",
  "stage": "bie",
  "progress": 45,
  "stage_progress": 60,
  "created_at": "2026-04-25T12:00:00Z",
  "updated_at": "2026-04-25T12:05:30Z",
  "expires_at": "2026-05-02T12:00:00Z",
  "stage_timings": {
    "onnx": { "started_at": "2026-04-25T12:00:05Z", "completed_at": "2026-04-25T12:02:10Z" },
    "bie": { "started_at": "2026-04-25T12:02:15Z", "completed_at": null },
    "nef": null
  },
  "input": {
    "filename": "model.onnx",
    "object_key": "jobs/550e8400-e29b-41d4-a716-446655440000/input/model.onnx",
    "size_bytes": 204800000,
    "ref_images_count": 0
  },
  "result_object_keys": null,
  "error": null,
  "parameters": {
    "model_id": 1001,
    "version": "0001",
    "platform": "520",
    "enable_evaluate": false,
    "enable_sim_fp": false,
    "enable_sim_fixed": false,
    "enable_sim_hw": false
  },
  "metadata": {
    "source": "visionA-web",
    "tags": ["experiment-001"]
  },
  "estimated_completion_at": null
}
```

**State machine** (`status` field):

- `created`: just created, waiting for the first stage to start
- `running`: currently in a stage (the `stage` field is set)
- `completed`: all stages finished (`result_object_keys` is set, `stage=null`)
- `failed`: failed (`error` is set)

**`result_object_keys` on completion** (keys in the Converter Bucket):
```json
"result_object_keys": {
  "onnx": "jobs/{job_id}/output/out.onnx",
  "bie": "jobs/{job_id}/output/out.bie",
  "nef": "jobs/{job_id}/output/out.nef"
}
```

**`error` on failure**:
```json
"error": {
  "stage": "bie",
  "code": "quantization_failed",
  "message": "參考圖片不足或格式不符,BIE 量化階段失敗",
  "details": { "raw": "..." }
}
```

**Response 304 Not Modified**: returned when `If-None-Match` matches the current ETag (a hash of `updated_at` is the suggested ETag).

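The ETag suggestion could be implemented along these lines. This is a sketch under assumptions: the helper names `computeEtag` and `isNotModified` are hypothetical, SHA-256 truncated to 32 hex characters is an arbitrary choice, and hashing only `updated_at` is safe only if every change to the job (including `stage_progress` updates) also bumps `updated_at`.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: derive a quoted strong ETag from the job's updated_at.
function computeEtag(job) {
  const digest = createHash('sha256').update(String(job.updated_at)).digest('hex');
  return `"${digest.slice(0, 32)}"`;
}

// Hypothetical helper: true when the If-None-Match header matches the ETag.
// Handles "*", comma-separated lists, and weak validators (W/ prefix).
function isNotModified(ifNoneMatch, etag) {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === '*') return true;
  return ifNoneMatch
    .split(',')
    .map((tag) => tag.trim().replace(/^W\//, ''))
    .includes(etag);
}

module.exports = { computeEtag, isNotModified };
```

A handler would set the `ETag` response header on every 200 and answer 304 with an empty body when `isNotModified(req.headers['if-none-match'], etag)` is true.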
**Error responses**:

| Status | error.code | Scenario |
|--------|-----------|----------|
| 401/403 | Same as above | See §1.4.2 |
| 404 | `job_not_found` | The job does not exist, or does not belong to the calling client_id (to avoid leaking information) |

---

#### 1.4.4 `GET /api/v1/jobs` (list / recovery)

**Query parameters**:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `user_id` | string | ❌ | Filter by user_id (always used for recovery) |
| `status` | string | ❌ | `in_progress` (= `created` ∪ `running`), `completed`, `failed`, `all` (default `all`) |
| `limit` | int | ❌ | Default 20, maximum 100 |
| `offset` | int | ❌ | Default 0 |
| `created_after` | ISO 8601 | ❌ | Filter `created_at >= created_after` |

**Response 200**:
```json
{
  "total": 2,
  "limit": 20,
  "offset": 0,
  "items": [
    { /* same shape as GET /jobs/:id, but trimmed: stage_timings.details and metadata may be omitted */ }
  ]
}
```

**Implementation note**: use the `user:{user_id}:jobs` Set as the index to avoid a full scan with `KEYS job:*` (adopting the recommendation in Design 4.1.2).

---

#### 1.4.5 `POST /api/v1/jobs/:id/promote`

**Request**:
```http
POST /api/v1/jobs/550e8400-.../promote
Authorization: Bearer <JWT>
Content-Type: application/json

{
  "targets": [
    {
      "source": "nef",
      "target_object_key": "visionA/models/user-12345/model-1001/v0001/out.nef"
    },
    {
      "source": "bie",
      "target_object_key": "visionA/models/user-12345/model-1001/v0001/out.bie"
    }
  ]
}
```

**Field definitions**:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `targets` | array | ✅ | Files to promote (at least one) |
| `targets[].source` | string | ✅ | enum: `onnx`, `bie`, `nef`; refers to the job's output file for that stage |
| `targets[].target_object_key` | string | ✅ | Destination key in the File Access Agent (naming is decided by VisionA) |

**Response 200 OK**:
```json
{
  "job_id": "550e8400-...",
  "promoted": [
    {
      "source": "nef",
      "target_object_key": "visionA/models/user-12345/model-1001/v0001/out.nef",
      "size_bytes": 10485760,
      "file_access_agent_etag": "abc123",
      "promoted_at": "2026-04-25T12:30:00Z"
    },
    {
      "source": "bie",
      "target_object_key": "visionA/models/user-12345/model-1001/v0001/out.bie",
      "size_bytes": 5242880,
      "file_access_agent_etag": "def456",
      "promoted_at": "2026-04-25T12:30:02Z"
    }
  ]
}
```

**Error responses**:

| Status | error.code | Scenario |
|--------|-----------|----------|
| 400 | `validation_error` | Malformed `targets`, or `source` is not a valid stage |
| 404 | `job_not_found` | Same as in §1.4.3 |
| 409 | `job_not_ready_for_promote` | `status != completed` (`details.current_status`) |
| 409 | `source_not_available` | The job produced no result for this stage (e.g. only onnx ran but nef was requested) |
| 502 | `file_gateway_unavailable` | PUT to the File Access Agent failed |
| 503 | `auth_service_unavailable` | The Converter failed to obtain its own token |

**Retry semantics**: `promote` is idempotent (a PUT to the same target_object_key twice yields the same result, because the File Access Agent overwrites). Files in the Converter Bucket are retained for 7 days, so retries are possible within that window.

### 1.5 Key error payload examples

#### `user_has_active_job` (adopting the Design recommendation)

```json
{
  "error": {
    "code": "user_has_active_job",
    "message": "使用者目前已有進行中的轉檔任務",
    "details": {
      "active_job_id": "550e8400-...",
      "active_job_status": "running",
      "active_job_stage": "bie",
      "active_job_progress": 45,
      "active_job_created_at": "2026-04-25T12:00:00Z"
    },
    "request_id": "req-uuid"
  }
}
```

#### `insufficient_scope`

```json
{
  "error": {
    "code": "insufficient_scope",
    "message": "token 缺少必要權限",
    "details": {
      "required_scope": "converter:job.write",
      "provided_scopes": ["converter:job.read"]
    },
    "request_id": "req-uuid"
  }
}
```

---

## 2. Task Scheduler changes

### 2.1 Suggested directory structure

```
apps/task-scheduler/
├── server.js                    ← existing; entry point only (init + mount routes)
├── src/
│   ├── config.js                ← new: reads all env vars in one place (fail fast)
│   ├── redis.js                 ← new: Redis client + helpers
│   ├── auth/
│   │   ├── jwks.js              ← new: JWKS cache + JWT verification
│   │   ├── middleware.js        ← new: Express middleware (token + scope checks)
│   │   └── oauthClient.js       ← new: Converter as an OAuth client (token cache)
│   ├── fileAccessAgent/
│   │   ├── client.js            ← new: File Access Agent HTTP client (PUT only, for promote)
│   │   └── errors.js            ← new: error translation
│   ├── routes/
│   │   ├── legacy.js            ← existing routes (/jobs, /jobs/:id, /jobs/:id/events, ...)
│   │   └── v1/
│   │       ├── index.js         ← mounts the new routes
│   │       ├── jobs.js          ← POST/GET /api/v1/jobs, GET /:id
│   │       └── promote.js       ← POST /api/v1/jobs/:id/promote
│   ├── services/
│   │   ├── jobService.js        ← new: job CRUD, user index, active-job check
│   │   └── doneListener.js      ← existing listenDoneQueue extracted into a module
│   ├── middleware/
│   │   ├── errorHandler.js      ← new: unified error format
│   │   └── requestId.js         ← new: X-Request-Id
│   └── utils/
│       └── logger.js            ← new: structured logging
├── package.json
└── Dockerfile
```

**Implementation principle**: refactor conservatively. Existing features keep their semantics; code is only moved and abstracted.

### 2.2 Auth middleware (T1)

```javascript
// src/auth/middleware.js (skeleton)

const { verifyJwt } = require('./jwks');
const config = require('../config');
// Assumption: sendError(res, status, code, message, req, details) is exported
// by the unified error-format module listed in §2.1.
const { sendError } = require('../middleware/errorHandler');

function requireAuth(requiredScope) {
  return async (req, res, next) => {
    try {
      const authHeader = req.headers.authorization || '';
      const match = authHeader.match(/^Bearer\s+(.+)$/);
      if (!match) {
        return sendError(res, 401, 'invalid_token', 'Missing bearer token', req);
      }

      const token = match[1];
      const claims = await verifyJwt(token, {
        issuer: config.memberCenter.issuer,
        audience: config.converter.audience,
        clockSkew: 60, // seconds
      });

      // Scope check: `scope` is a space-delimited string
      const scopes = (claims.scope || '').split(' ').filter(Boolean);
      if (!scopes.includes(requiredScope)) {
        return sendError(res, 403, 'insufficient_scope', 'Missing required scope', req, {
          required_scope: requiredScope,
          provided_scopes: scopes,
        });
      }

      // Tenant check (optional): enforced only when both sides carry a tenant
      if (config.converter.tenantId && claims.tenant_id) {
        if (claims.tenant_id !== config.converter.tenantId) {
          return sendError(res, 403, 'tenant_mismatch', 'Tenant mismatch', req);
        }
      }

      // Expose the claims on req for downstream handlers
      req.auth = {
        clientId: claims.client_id || claims.sub,
        tenantId: claims.tenant_id || null,
        scopes,
        tokenClaims: claims,
      };

      next();
    } catch (err) {
      // Map specific jose error codes. Every case keeps the `invalid_token`
      // code so that it matches the 401 row of the §1.4.2 error table.
      if (err.code === 'ERR_JWT_EXPIRED') {
        return sendError(res, 401, 'invalid_token', 'Token expired', req);
      }
      if (err.code === 'ERR_JWKS_NO_MATCHING_KEY') {
        return sendError(res, 401, 'invalid_token', 'Signature verification failed', req);
      }
      return sendError(res, 401, 'invalid_token', 'Token verification failed', req);
    }
  };
}

module.exports = { requireAuth };
```

### 2.3 JWKS cache (T1)

Use `createRemoteJWKSet` from the `jose` npm package. It caches the fetched key set in memory (`cacheMaxAge`) and rate-limits refetches triggered by unknown key IDs (`cooldownDuration`).

```javascript
// src/auth/jwks.js

const { createRemoteJWKSet, jwtVerify } = require('jose');
const config = require('../config');

const jwks = createRemoteJWKSet(new URL(config.memberCenter.jwksUrl), {
  cacheMaxAge: 10 * 60 * 1000,  // 10 min
  cooldownDuration: 30 * 1000,  // no repeated refresh within 30s
});

// Verifies signature, issuer, audience, exp and nbf; returns the JWT payload.
async function verifyJwt(token, { issuer, audience, clockSkew }) {
  const { payload } = await jwtVerify(token, jwks, {
    issuer,
    audience,
    clockTolerance: clockSkew, // a number is interpreted as seconds
  });
  return payload;
}

module.exports = { verifyJwt };
```

### 2.4 OAuth client (T2)

```javascript
// src/auth/oauthClient.js

const config = require('../config');

class OAuthClient {
  constructor() {
    this._cache = new Map(); // scope-key -> { token, expiresAt }
  }

  async getToken(scope) {
    const key = scope;
    const cached = this._cache.get(key);
    // Reuse the cached token until 60s before it expires.
    if (cached && cached.expiresAt - 60000 > Date.now()) {
      return cached.token;
    }

    const params = new URLSearchParams({
      grant_type: 'client_credentials',
      client_id: config.converter.clientId,
      client_secret: config.converter.clientSecret,
      scope,
      audience: config.fileAccessAgent.audience,
    });

    const res = await fetch(config.memberCenter.tokenUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: params.toString(),
    });
    if (!res.ok) {
      throw new Error(`token endpoint ${res.status}`);
    }
    const data = await res.json();
    const entry = {
      token: data.access_token,
      // Fall back to 1 hour when the server omits expires_in.
      expiresAt: Date.now() + (data.expires_in || 3600) * 1000,
    };
    this._cache.set(key, entry);
    return entry.token;
  }

  // Drop the cached token, e.g. after the resource server answers 401.
  invalidate(scope) {
    this._cache.delete(scope);
  }
}

module.exports = new OAuthClient();
```

**Error handling**: when the caller catches a failure from `getToken`, it responds with 503 `auth_service_unavailable`.

### 2.5 File Access Agent client (T6)

In Phase 1 the Converter calls the File Access Agent only during `promote` (to write result files); **HEAD / GET are not needed**.

```javascript
// src/fileAccessAgent/client.js

const config = require('../config');
const oauthClient = require('../auth/oauthClient');
// Assumption: FAAError(status, body) is exported by the error-translation
// module listed in §2.1.
const { FAAError } = require('./errors');

// Encode each path segment separately so that "/" stays a separator while
// characters such as "?", "#" and spaces inside a segment are escaped.
function encodeObjectKey(objectKey) {
  return objectKey.split('/').map(encodeURIComponent).join('/');
}

async function putFile(objectKey, stream, { contentType, contentLength }) {
  const token = await oauthClient.getToken('files:upload.write');
  const res = await fetch(
    `${config.fileAccessAgent.baseUrl}/files/${encodeObjectKey(objectKey)}`,
    {
      method: 'PUT',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': contentType,
        'Content-Length': String(contentLength),
      },
      body: stream,
      duplex: 'half', // required by Node 18 fetch for a streaming request body
    }
  );
  if (!res.ok) throw new FAAError(res.status, await res.text());
  return await res.json();
}

module.exports = { putFile };
```

**Large-file streaming (for promote)**: pipe the Body stream of the MinIO `GetObjectCommand` straight into the fetch PUT body, so the result file is never loaded into memory as a whole. Flow of `POST /api/v1/jobs/:id/promote`:

```
MinIO GetObjectCommand.Body (stream)
  ↓ pipe
fetch PUT body (stream, duplex: 'half')
  ↓
File Access Agent
```

### 2.6 New route group (T3)

```javascript
// src/routes/v1/index.js

const express = require('express');
const jobsRouter = require('./jobs');
const promoteRouter = require('./promote');
const { requireAuth } = require('../../auth/middleware');
const { apiV1RateLimit } = require('../../middleware/rateLimit');

const router = express.Router();

router.use(apiV1RateLimit);

router.post('/jobs', requireAuth('converter:job.write'), jobsRouter.create);
router.get('/jobs', requireAuth('converter:job.read'), jobsRouter.list);
router.get('/jobs/:id', requireAuth('converter:job.read'), jobsRouter.get);
router.post('/jobs/:id/promote', requireAuth('converter:job.write'), promoteRouter.promote);

// Reserved for Phase 2
router.post('/jobs/:id/download-tokens', requireAuth('converter:job.read'), (req, res) => {
  res.status(501).json({
    error: { code: 'not_implemented', message: 'Phase 2 功能,待 Member Center 補完', request_id: req.requestId },
  });
});
router.delete('/jobs/:id', requireAuth('converter:job.write'), (req, res) => {
  res.status(501).json({
    error: { code: 'not_implemented', message: '尚未實作', request_id: req.requestId },
  });
});

module.exports = router;
```

### 2.7 Redis data model changes

#### 2.7.1 Job record (JSON, key = `job:{id}`): new fields

```jsonc
{
  // Existing fields
  "job_id": "uuid",
  "created_at": "...",
  "updated_at": "...",
  "status": "ONNX | BIE | NEF | COMPLETED | FAILED",  // note: the legacy Web UI still uses uppercase statuses
  "stage": "onnx | bie | nef | null",
  "progress": 0,
  "parameters": { /* model_id, version, platform, options */ },
  "output": { "bie_path": null, "nef_path": null },
  "error": null,

  // New fields (Phase 1)
  "origin": "api | web",                                // created through the new API or the legacy Web UI
  "user_id": "visionA-user-12345",
  "tenant_id": "uuid-or-null",
  "created_by_client_id": "kneron_converter_client_abc",
  "input": {
    "filename": "model.onnx",                           // original multipart filename
    "object_key": "jobs/{job_id}/input/model.onnx",     // key inside the Converter Bucket
    "size_bytes": 204800000,
    "ref_images_count": 0
  },
  "stage_timings": {
    "onnx": { "started_at": "...", "completed_at": "..." },
    "bie": { "started_at": "...", "completed_at": null },
    "nef": null
  },
  "stage_progress": 0,                                  // 0-100, progress within the current stage (pushed by the Worker)
  "expires_at": "2026-05-02T12:00:00Z",
  "metadata": {}
}
```

**About `status` casing**: the existing Web UI reads uppercase values (`ONNX`, `COMPLETED`, etc.). The new API must **map them to lowercase semantic statuses** (`created`, `running`, `completed`, `failed`) in its responses. Mapping table:

| Internal status | External `status` + `stage` |
|-----------------|-----------------------------|
| `ONNX` | `running` + stage=`onnx` |
| `BIE` | `running` + stage=`bie` |
| `NEF` | `running` + stage=`nef` |
| `COMPLETED` | `completed` + stage=`null` |
| `FAILED` | `failed` + stage=<stage at the time of failure> |

**Note**: the existing Scheduler `advanceJob` sets the initial status to `ONNX` and has no separate "created" state. After the new API creates a job and before the onnx worker picks it up, the internal status is still `ONNX`. The external status should be `created` in that window (stage=onnx, but `stage_timings.onnx.started_at` is null). **In the implementation, use `stage_timings.onnx.started_at == null` to distinguish `created` from `running`.**

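The mapping table and the `created` rule can be sketched as one pure function. The name `toExternalState` is hypothetical, and reading the failed stage from `job.error.stage` with a fallback to `job.stage` is an assumption about where the record keeps that information.

```javascript
// Sketch: map the internal uppercase status to the external { status, stage } pair.
function toExternalState(job) {
  switch (job.status) {
    case 'ONNX':
    case 'BIE':
    case 'NEF': {
      const stage = job.status.toLowerCase();
      const onnx = job.stage_timings && job.stage_timings.onnx;
      // `created` applies only while the onnx stage has not started yet.
      const notStarted = stage === 'onnx' && !(onnx && onnx.started_at);
      return { status: notStarted ? 'created' : 'running', stage };
    }
    case 'COMPLETED':
      return { status: 'completed', stage: null };
    case 'FAILED':
      // Assumption: the failed stage lives in error.stage, else in job.stage.
      return { status: 'failed', stage: (job.error && job.error.stage) || job.stage || null };
    default:
      throw new Error(`unknown internal status: ${job.status}`);
  }
}

module.exports = { toExternalState };
```

Keeping this as a pure function means the single-job endpoint and the list endpoint can share it, so both always agree on what `in_progress` means.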
#### 2.7.2 User index (new)

| Key | Type | Purpose | TTL |
|-----|------|---------|-----|
| `user:{user_id}:jobs` | Set | All job_ids of the user (any status) | `EXPIRE 7d` on every write |
| `user:{user_id}:active_job` | String | The current in-progress job_id (= `created` or `running`) | Deleted when the job ends |

**Write timing** (wrapped in MULTI for atomicity):

```
On job creation:
  MULTI
  SET job:{id} {...}
  SADD user:{user_id}:jobs {id}
  EXPIRE user:{user_id}:jobs 604800
  SETNX user:{user_id}:active_job {id}   # NX is what makes this a per-user lock
  EXEC

  If SETNX returns 0 → conflict: roll back (DEL job:{id}, SREM user:{user_id}:jobs {id}) and return 409
  If SETNX returns 1 → success

On completion / failure:
  MULTI
  SET job:{id} {...}
  DEL user:{user_id}:active_job
  EXEC

  DEL only when the value of active_job equals the current job_id (enforce with WATCH or a Lua script)
```

**Lua script (recommended)**: guarantees that "check + set active + write job" happens atomically.

```lua
-- claim_active_job.lua
-- KEYS[1] = user:{user_id}:active_job
-- KEYS[2] = job:{job_id}
-- KEYS[3] = user:{user_id}:jobs
-- ARGV[1] = job_id
-- ARGV[2] = job_json
-- ARGV[3] = ttl_seconds

if redis.call('EXISTS', KEYS[1]) == 1 then
  return {'conflict', redis.call('GET', KEYS[1])}
end
redis.call('SET', KEYS[1], ARGV[1])
redis.call('SET', KEYS[2], ARGV[2])
redis.call('SADD', KEYS[3], ARGV[1])
redis.call('EXPIRE', KEYS[3], tonumber(ARGV[3]))
return {'ok'}
```

#### 2.7.3 Avoiding `KEYS *`

**Anti-pattern** (used by existing code, but not by the new API):
```javascript
const keys = await redis.keys('job:*');  // O(N), blocks Redis
```

**List query for the new API**:
```javascript
async function listJobsByUser(userId, { status, limit, offset }) {
  const ids = await redis.smembers(`user:${userId}:jobs`);
  const pipeline = redis.pipeline();
  for (const id of ids) pipeline.get(`job:${id}`);
  const results = await pipeline.exec();
  // Skip failed reads and job records that have already expired (null).
  let jobs = results
    .map(([err, raw]) => (err || !raw ? null : JSON.parse(raw)))
    .filter(Boolean);
  // Status filter
  if (status === 'in_progress') {
    jobs = jobs.filter(j => ['created', 'running'].includes(mapStatus(j)));
  } else if (status && status !== 'all') {
    jobs = jobs.filter(j => mapStatus(j) === status);
  }
  // Sort newest first, then paginate
  jobs.sort((a, b) => new Date(b.created_at) - new Date(a.created_at));
  return { total: jobs.length, items: jobs.slice(offset, offset + limit) };
}
```

### 2.8 POST /api/v1/jobs flow (T4)

```
1. requireAuth('converter:job.write'): middleware verifies the token (placed before multer so that unauthenticated requests never upload large files)
2. multer middleware parses the multipart body (memoryStorage, fileSize=500MB):
   - req.files.model[0] (required)
   - req.files['ref_images[]'] / req.files.ref_images (optional, maxCount=100)
   - req.body.user_id / model_id / version / platform / enable_*
   ├── LIMIT_FILE_SIZE → 413 file_too_large
   ├── other multer errors → 400 invalid_multipart
   └── ok → continue
3. Validate fields (joi / zod / hand-written):
   - user_id, model_id, version, platform are required
   - enable_* converted to boolean
   - model file extension checked against the allowlist
   ├── failure → 400 validation_error (details.field)
4. Check STORAGE_BACKEND === 'minio'
   ├── no → 500 misconfiguration
5. Generate job_id (UUIDv4)
6. Run the claim_active_job Lua script (see §2.7.2)
   ├── conflict → 409 user_has_active_job + details of the current active job
   └── ok → continue
7. Write synchronously to MinIO (Converter Bucket):
   - jobs/{job_id}/input/{sanitized_model_filename} ← req.files.model[0].buffer
   - jobs/{job_id}/ref_images/{index}_{sanitized_filename} ← each ref_image.buffer
   - failure → roll back (DEL job:{id}, DEL user:{user_id}:active_job, SREM user:{user_id}:jobs {id}) and return 502 `storage_unavailable`
8. Update the job record (fill in input.object_key, size_bytes, ref_images_count, stage_timings.onnx.started_at=now)
9. enqueueStage('onnx', job)
10. Return 201 + { job_id, status: 'created', ... }
```

**Key points**:
- The auth middleware must run before multer, so that a 500MB upload is never parsed for an unauthenticated caller.
- If any file write to MinIO fails in step 7, roll back, so that Redis never holds a job whose files are missing from MinIO.
- Write to MinIO only after `claim_active_job` succeeds, so that a lock conflict never leaves orphaned objects in MinIO to clean up (order: validate → lock → write files → enqueue).

**Latency budget**: SLA p95 < 5s (200MB @ 50MB/s ≈ 4s multipart upload + 1s MinIO write). A 500MB file takes about 12s (see design-doc §6.1).

### 2.9 GET /api/v1/jobs/:id flow (T5)

```
1. requireAuth('converter:job.read')
2. Read job:{id}
3. If it does not exist → 404 job_not_found
4. If job.created_by_client_id !== req.auth.clientId → 404 (do not leak existence)
5. Compute ETag = hash(job.updated_at); if If-None-Match matches → 304
6. Map internal status → external status + stage
7. Return 200 + serialized response
```

### 2.10 promote flow (T6)

```
1. requireAuth('converter:job.write')
2. Validate body (targets format)
3. Read job:{id} (+ client isolation check)
4. If status != 'completed' → 409 job_not_ready_for_promote
5. For each target:
   a. Read the result file from the Converter Bucket (stream)
   b. faa.putFile(target.target_object_key, stream, ...)
   c. Record promoted_at / etag / size
6. All succeeded → 200 + promoted[]
7. Partial failure → 502, with details listing which targets succeeded / failed
```

**Idempotency**: promote is idempotent (a File Access Agent PUT overwrites), so it can be retried.

### 2.11 Done listener changes

The existing `listenDoneQueue` calls `advanceJob` when it receives a worker done event. New behavior:

- `advanceJob` updates `stage_timings` whenever the status changes
- On completion, it automatically runs `DEL user:{user_id}:active_job` (a Lua script guarantees atomicity)
- On failure, the same applies

### 2.12 /health upgrade

The existing `/health` checks only Redis. The new version adds:
- Member Center reachability (`GET /.well-known/openid-configuration`, checked in the background every 30s, result cached)
- File Access Agent reachability (`GET /health`, same approach)
- A 503 response if any critical dependency is unhealthy

---

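## 3. Worker changes

**Phase 1 decision: no major Worker changes.**

The existing `services/workers/s3_storage.py` already supports reading from and writing to MinIO. A Worker starts as soon as it sees the input under `jobs/{job_id}/input/`; it does not need to know that the File Access Agent exists.

The only changes needed:

1. **stage_progress reporting** (optional): if the Worker can report progress within a stage (e.g. 30%, 60%), it can push it to the Scheduler through a new Redis Stream, `queue:progress`. Phase 1 may simply report 0 or 100 and improve later.
2. **`started_at` in `stage_timings`**: the Worker could emit a `stage_started` event when it picks up a task, before the existing done event. A simpler option is for the Scheduler to write `stage_timings.{stage}.started_at = now` inside `enqueueStage`. **The latter is recommended**, since it leaves the Worker untouched.

---
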
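## 4. Data model and indexes

### 4.1 Why not PostgreSQL

- The Phase 1 data model is simple: a job is a state machine, and the user index is key-value
- The existing philosophy is "crash means reset"; PG would introduce the opposite, durable semantics and make things more complex
- A Redis Set as the user index is enough for the expected volume (fewer than 10 jobs per user per 7 days)
- Re-evaluate PG later if cross-crash recovery or multi-instance HA becomes a requirement

### 4.2 Redis memory estimate

- Each job record is about 2-4 KB (including stage_timings etc.)
- Each element of a user index Set is under 40 bytes
- 1000 concurrent users × 10 jobs = 10k job records ≈ 40 MB at the 4 KB upper bound (comfortable for Redis)
- The Converter Bucket lifecycle is 7 days and Redis uses a matching 7-day TTL, so the memory ceiling is bounded

---
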
## 5. OAuth integration details

### 5.1 Token verification (as a resource server)

| Claim | Check |
|-------|-------|
| `iss` | Equals `MC_ISSUER` |
| `aud` | Contains `KNERON_CONVERTER_AUDIENCE` (array or string both supported) |
| `exp` | Not expired (with 60s clock skew) |
| `nbf` | If present, already reached |
| `scope` | Space-delimited; contains the scope required by the endpoint |
| `client_id` | Must be present (for logging) |
| `tenant_id` | If present, equals `CONVERTER_TENANT_ID` (Phase 1 may start as warn-only) |

**JWKS cache**: built into `jose.createRemoteJWKSet`; TTL 10 min, 30s cooldown.

### 5.2 Converter as an OAuth client

- `client_credentials` grant
- Phase 1 needs only one scope: `files:upload.write` (`aud=file_access_api`), requested only during `promote`
- Cache key = scope (if more scopes are added later, each gets its own cache entry automatically)
- A token within 60s of its `expires_in` deadline is treated as expired and refreshed on the next `getToken` call
- Failures are caught and converted to 503 `auth_service_unavailable`

### 5.3 Impact of Member Center being offline

| Scenario | Impact | Mitigation |
|----------|--------|------------|
| JWKS fetch fails | Tokens signed with a new kid cannot be verified | While the cached key set is still valid, tokens signed with a cached kid pass; tokens with a new kid fail |
| Token endpoint fails | The Converter cannot obtain a new token to call the File Access Agent (promote only) | No impact while the cached token is valid; once it expires, promote fails with 503. Creating jobs via `POST /api/v1/jobs` is unaffected (it only verifies the caller's token and does not fetch the Converter's own) |
| Discovery fails | Health check reports unhealthy | A K8s / Docker restart does not fix it; manual intervention is required |

---

## 6. File Access Agent 整合
|
||
|
||
### 6.1 Object key 命名約定(建議)
|
||
|
||
| 用途 | 建議命名 | 說明 |
|
||
|------|---------|------|
|
||
| promote 結果到模型庫(File Access Agent)| `visionA/models/{user_id}/{model_id}/v{version}/{filename}` | VisionA 決定 target_object_key(Converter 不強制命名規則)|
|
||
| Converter Bucket 內部(原始模型 input)| `jobs/{job_id}/input/{filename}` | Converter 自己管,multipart 上傳後寫入 |
|
||
| Converter Bucket 內部(參考圖片)| `jobs/{job_id}/ref_images/{index}_{filename}` | Converter 自己管 |
|
||
| Converter Bucket 內部(結果檔)| `jobs/{job_id}/output/{filename}` | Converter 自己管 |
|
||
|
||
**約定**:
|
||
- `target_object_key`(promote 目標)的命名規則由 VisionA 定義,Converter 只做基本 sanity check(不能有 `..`、反斜線)。
|
||
- Converter Bucket 內部 object key 由 Converter 控制,外部看不到也不需對齊。
|
||
- Phase 1 不涉及 File Access Agent 上原始模型的 object key,該情境已不存在(原始模型直接 multipart 到 Converter)。
|
||
|
||
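The sanity check and the internal key layout might look like the following sketch. The helper names are hypothetical. Rejecting empty or non-string keys is an added assumption; the `..` and backslash rules come from the convention above and are read literally (any occurrence is rejected).

```javascript
// Basic sanity check for target_object_key: no "..", no backslashes.
// Rejecting empty / non-string input is an extra assumption of this sketch.
function isSaneObjectKey(key) {
  if (typeof key !== 'string' || key.length === 0) return false;
  if (key.includes('\\')) return false;
  return !key.includes('..');
}

// Builders for the Converter Bucket layout listed in the table (hypothetical helper).
const converterKeys = {
  input: (jobId, filename) => `jobs/${jobId}/input/${filename}`,
  refImage: (jobId, index, filename) => `jobs/${jobId}/ref_images/${index}_${filename}`,
  output: (jobId, filename) => `jobs/${jobId}/output/${filename}`,
};

console.log(isSaneObjectKey('visionA/models/u-12345/m-1001/v0001/out.nef'));
console.log(converterKeys.refImage('job-1', 0, 'img_0.jpg'));
```

A key that fails the check would map to 422 `invalid_object_key` from the error-code appendix.
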
### 6.2 HTTP headers overview

| Request | Headers |
|---------|---------|
| PUT /files/{key} (used by promote) | `Authorization: Bearer <S2S JWT files:upload.write>`, `Content-Type`, `Content-Length` |

**Note**: in Phase 1 Converter sends only `PUT` requests to File Access Agent (to promote result files); HEAD / GET are not needed.

### 6.3 Failure retry strategy (PUT /files/{key} only)

| Error | Converter behavior |
|-------|--------------------|
| 401 (token invalid) | Force `oauthClient.invalidate('files:upload.write')`, fetch a new token, and retry once; if it still fails → 503 `auth_service_unavailable` |
| Other 4xx (client error) | No retry; return the corresponding 4xx to visionA-backend (for example, an invalid `target_object_key`) |
| 5xx (server error) | Retry up to 2 times (exponential backoff, 500 ms then 2000 ms); if all attempts fail → 502 `file_gateway_unavailable` |
| Network timeout | Same as 5xx |

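The retry table can be expressed as a small decision function. This is a sketch: the function name, the `failure` shape, and the action labels are hypothetical, while the status codes, delays, and error codes come from the table.

```javascript
// Decide what to do after a failed PUT to File Access Agent.
// failure: { kind: 'http', status } or { kind: 'timeout' }
// attempt: how many retries have already been made for this request (0-based).
const BACKOFF_MS = [500, 2000];

function nextAction(failure, attempt) {
  if (failure.kind === 'http' && failure.status === 401) {
    // One forced token refresh, then give up.
    return attempt === 0
      ? { action: 'refresh_token_and_retry', delayMs: 0 }
      : { action: 'fail', status: 503, code: 'auth_service_unavailable' };
  }
  if (failure.kind === 'http' && failure.status >= 400 && failure.status < 500) {
    // Other client errors are passed through without retrying.
    return { action: 'fail', status: failure.status };
  }
  // 5xx responses and network timeouts share one policy.
  if (attempt < BACKOFF_MS.length) {
    return { action: 'retry', delayMs: BACKOFF_MS[attempt] };
  }
  return { action: 'fail', status: 502, code: 'file_gateway_unavailable' };
}

console.log(nextAction({ kind: 'http', status: 503 }, 0));
```

Because the source object is streamed, each retry has to reopen the MinIO object rather than reuse the consumed stream.
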
### 6.4 Timeout

- PUT /files/{key}: scales with file size in principle; the default is 300 s, controlled by the `PROMOTE_TIMEOUT_MS` env var. A 500 MB file at a worst-case 5 MB/s takes about 100 s, so the default leaves roughly 3× headroom.

### 6.5 Streaming large files

- Use Node 18's native `fetch` with `body: ReadableStream`.
- The `duplex: 'half'` flag is required (Node 18.17+).
- Pipe the Body (stream) of MinIO's GetObjectCommand directly into the fetch PUT body.
- No in-memory buffering.

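A sketch of how the streamed PUT request could be assembled is shown below. The helper name and its parameters are hypothetical. It assumes the object Body is a Node.js `Readable`, which `Readable.toWeb` converts into the web `ReadableStream` that `fetch` accepts. Whether the runtime honors an explicit `Content-Length` on a streamed body should be verified against the target Node version.

```javascript
const { Readable } = require('node:stream');

// Build the fetch() init for a streamed PUT (hypothetical helper).
// Nothing is buffered: the Node stream is wrapped, not read.
function buildStreamingPutInit(nodeStream, { token, contentType, contentLength }) {
  return {
    method: 'PUT',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': contentType,
      'Content-Length': String(contentLength),
    },
    body: Readable.toWeb(nodeStream), // Node Readable -> web ReadableStream
    duplex: 'half', // required by Node's fetch when the body is a stream
  };
}

// Example with an in-memory stream standing in for the MinIO object body.
const init = buildStreamingPutInit(Readable.from([Buffer.from('demo')]), {
  token: 'example-token',
  contentType: 'application/octet-stream',
  contentLength: 4,
});
console.log(init.method, init.duplex);
```

The caller would pass this object to `fetch(url, init)` and could add `signal: AbortSignal.timeout(...)` with the `PROMOTE_TIMEOUT_MS` value to enforce the timeout from §6.4.
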

---

## 7. Deployment architecture

### 7.1 Nginx configuration (dual vhost)

```nginx
# /etc/nginx/conf.d/converter.conf

# Upstream
upstream scheduler_upstream {
    server scheduler:4000;
    keepalive 32;
}

# Public vhost (internet-facing, port 443)
server {
    listen 443 ssl http2;
    server_name converter.innovedus.com;

    ssl_certificate     /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    # Proxy only /api/v1/*
    location /api/v1/ {
        proxy_pass http://scheduler_upstream;
        proxy_http_version 1.1;           # needed for upstream keepalive
        proxy_set_header Connection "";   # needed for upstream keepalive
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_request_buffering off;      # stream large uploads
        proxy_read_timeout 300s;
        client_max_body_size 600M;        # slightly above the 500 MB multipart limit (model upload via POST /api/v1/jobs)
    }

    # /health may be public
    location = /health {
        proxy_pass http://scheduler_upstream;
    }

    # Everything else returns 404
    location / {
        return 404 '{"error":{"code":"not_found","message":"Not found"}}';
        default_type application/json;
    }
}

# Internal vhost (bound to the internal interface only, port 80)
server {
    listen 10.0.0.1:80;  # internal IP, not exposed externally
    server_name converter-internal.innovedus.com;

    # Paths used by the Web UI / legacy tools
    location /jobs {
        proxy_pass http://scheduler_upstream;
        proxy_http_version 1.1;
        proxy_set_header Connection "";   # needed for upstream keepalive
        proxy_set_header Host $host;
        proxy_buffering off;  # required for SSE
    }

    location /queues/stats {
        proxy_pass http://scheduler_upstream;
    }

    # Web UI static assets
    location / {
        proxy_pass http://web:3000;
    }
}
```

### 7.2 docker-compose.yml changes

```yaml
services:
  scheduler:
    environment:
      # Existing
      - PORT=4000
      - REDIS_URL=redis://redis:6379
      - STORAGE_BACKEND=minio
      # ... MinIO-related settings
      # New (Phase 1)
      - MC_ISSUER=${MC_ISSUER}
      - MC_JWKS_URL=${MC_JWKS_URL}
      - MC_TOKEN_URL=${MC_TOKEN_URL}
      - KNERON_CONVERTER_AUDIENCE=${KNERON_CONVERTER_AUDIENCE:-kneron_converter_api}
      - KNERON_CONVERTER_CLIENT_ID=${KNERON_CONVERTER_CLIENT_ID}
      - KNERON_CONVERTER_CLIENT_SECRET=${KNERON_CONVERTER_CLIENT_SECRET}
      - FILE_ACCESS_AGENT_BASE_URL=${FILE_ACCESS_AGENT_BASE_URL}
      - FILE_ACCESS_AGENT_AUDIENCE=${FILE_ACCESS_AGENT_AUDIENCE:-file_access_api}
      - CONVERTER_TENANT_ID=${CONVERTER_TENANT_ID:-}
      - CONVERTER_SCOPE_WRITE=${CONVERTER_SCOPE_WRITE:-converter:job.write}
      - CONVERTER_SCOPE_READ=${CONVERTER_SCOPE_READ:-converter:job.read}
      - API_V1_RATE_LIMIT_WINDOW_MS=${API_V1_RATE_LIMIT_WINDOW_MS:-300000}
      - API_V1_RATE_LIMIT_MAX=${API_V1_RATE_LIMIT_MAX:-300}
      - NODE_ENV=${NODE_ENV:-development}
```

### 7.3 Additions to `.env.example`

```bash
# === OAuth (Member Center) ===
MC_ISSUER=https://auth.innovedus.com
MC_JWKS_URL=https://auth.innovedus.com/.well-known/jwks
MC_TOKEN_URL=https://auth.innovedus.com/oauth/token

# === Converter identity (Resource Server) ===
KNERON_CONVERTER_AUDIENCE=kneron_converter_api

# === Converter identity (OAuth Client, used to call File Access Agent) ===
KNERON_CONVERTER_CLIENT_ID=kneron_converter
KNERON_CONVERTER_CLIENT_SECRET=change-me
CONVERTER_TENANT_ID=

# === File Access Agent ===
FILE_ACCESS_AGENT_BASE_URL=https://files.nas.internal
FILE_ACCESS_AGENT_AUDIENCE=file_access_api

# === Scope names (configurable in case the Member Center owner requires different names) ===
CONVERTER_SCOPE_WRITE=converter:job.write
CONVERTER_SCOPE_READ=converter:job.read

# === Rate Limit ===
API_V1_RATE_LIMIT_WINDOW_MS=300000
API_V1_RATE_LIMIT_MAX=300
```

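To illustrate how the two rate-limit variables interact, here is a minimal fixed-window limiter keyed by `client_id`. The windowing algorithm is an assumption (this section does not say whether the limiter is fixed-window or sliding), the class name is hypothetical, and the state is in-memory per process rather than shared through Redis.

```javascript
// Fixed-window rate limiter keyed by client_id (hypothetical sketch).
// windowMs and max correspond to API_V1_RATE_LIMIT_WINDOW_MS / API_V1_RATE_LIMIT_MAX.
class FixedWindowRateLimiter {
  constructor({ windowMs, max }) {
    this.windowMs = windowMs;
    this.max = max;
    this.windows = new Map(); // clientId -> { windowStart, count }
  }

  // Returns true when the request is allowed for this client at time nowMs.
  allow(clientId, nowMs) {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    let entry = this.windows.get(clientId);
    if (!entry || entry.windowStart !== windowStart) {
      entry = { windowStart, count: 0 };
      this.windows.set(clientId, entry);
    }
    if (entry.count >= this.max) return false;
    entry.count += 1;
    return true;
  }
}

// Small numbers for readability; the defaults would be 300 requests per 300000 ms.
const limiter = new FixedWindowRateLimiter({ windowMs: 300000, max: 3 });
const results = [0, 1, 2, 3].map((t) => limiter.allow('client-a', t));
console.log(results);
```

With more than one Scheduler instance, the counters would need to live in a shared store for the limit to hold across instances.
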

---

## 8. Scope summary (for cross-team alignment)

### 8.1 Converter as Resource Server (receiving side)

| Scope | Purpose | Obtained by |
|-------|---------|-------------|
| `converter:job.write` | Create job, promote | visionA-backend |
| `converter:job.read` | Query job | visionA-backend |
| (future) `converter:admin.read` | Query jobs across clients | Internal monitoring |

### 8.2 Converter as OAuth Client (initiating side)

| Scope | Purpose | Where used |
|-------|---------|------------|
| `files:upload.write` | PUT to File Access Agent | Promoting result files to the NAS model library |

**Phase 1 needs only this one scope.** Converter reads nothing from File Access Agent (original models are now uploaded by visionA-backend directly to Converter via multipart), so `files:download.read` / `files:metadata.read` are not needed.

### 8.3 What Member Center needs to do (cross-team coordination; corresponds to the open issues in progress.md)

1. Add the resource audience `kneron_converter_api`.
2. Add the OAuth client `kneron_converter` (for Converter's own use, grant=client_credentials).
3. Grant the visionA-backend client the `converter:job.write` and `converter:job.read` scopes.
4. Grant the `kneron_converter` client the `files:upload.write` scope (**this one only, used for promote**).
5. Confirm whether the `tenant_id` claim is available in S2S tokens.
6. (Phase 2) Implement `POST /file-access/download-tokens`.


---

## 9. Configuration management (complete environment variable list)

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `PORT` | ❌ | `4000` | Scheduler listen port |
| `NODE_ENV` | ❌ | `development` | Node environment |
| `REDIS_URL` | ✅ | `redis://redis:6379` | Redis connection |
| `JOB_DATA_DIR` | ❌ | `/data/jobs` | Path for the legacy local mode |
| `FRONTEND_URL` | ❌ | `http://localhost:3000` | CORS |
| `STORAGE_BACKEND` | ❌ | `local` | `local` / `minio` |
| `MINIO_*` | Depends on STORAGE_BACKEND | - | Existing MinIO parameters |
| **New (Phase 1)** | | | |
| `MC_ISSUER` | ✅ | - | Member Center issuer URL |
| `MC_JWKS_URL` | ✅ | - | JWKS endpoint |
| `MC_TOKEN_URL` | ✅ | - | Token endpoint |
| `KNERON_CONVERTER_AUDIENCE` | ✅ | `kneron_converter_api` | Accepted `aud` |
| `KNERON_CONVERTER_CLIENT_ID` | ✅ | - | Converter acting as a client |
| `KNERON_CONVERTER_CLIENT_SECRET` | ✅ | - | Must never be committed to Git |
| `FILE_ACCESS_AGENT_BASE_URL` | ✅ | - | File Access Agent URL (used by promote only) |
| `FILE_ACCESS_AGENT_AUDIENCE` | ✅ | `file_access_api` | File Access Agent `aud` (used by promote only) |
| `CONVERTER_TENANT_ID` | ❌ | `""` | If empty, no tenant check is performed |
| `CONVERTER_SCOPE_WRITE` | ❌ | `converter:job.write` | Overridable |
| `CONVERTER_SCOPE_READ` | ❌ | `converter:job.read` | Overridable |
| `API_V1_RATE_LIMIT_WINDOW_MS` | ❌ | `300000` | 5 min |
| `API_V1_RATE_LIMIT_MAX` | ❌ | `300` | Per client_id |
| `MULTIPART_MODEL_MAX_BYTES` | ❌ | `524288000` | Model file size limit for `POST /api/v1/jobs` (500 MB, overridable) |
| `MULTIPART_REF_IMAGES_MAX_COUNT` | ❌ | `100` | Maximum number of ref_images for `POST /api/v1/jobs` |
| `PROMOTE_TIMEOUT_MS` | ❌ | `300000` | Per-file promote timeout |

**Secret management**: `KNERON_CONVERTER_CLIENT_SECRET` must never be committed to Git. Use `.env` in dev; in prod, inject it via Docker secrets / K8s secrets.

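A startup-time loader for the Phase 1 variables could look like this sketch. The function name, the split into constants, and the error messages are hypothetical. It covers only the Phase 1 additions, and it treats an empty string as unset to match the `${VAR:-default}` style used in docker-compose.

```javascript
// Phase 1 variables that have no default and must be provided.
const REQUIRED_WITHOUT_DEFAULT = [
  'MC_ISSUER', 'MC_JWKS_URL', 'MC_TOKEN_URL',
  'KNERON_CONVERTER_CLIENT_ID', 'KNERON_CONVERTER_CLIENT_SECRET',
  'FILE_ACCESS_AGENT_BASE_URL',
];

// Defaults taken from the table above.
const DEFAULTS = {
  KNERON_CONVERTER_AUDIENCE: 'kneron_converter_api',
  FILE_ACCESS_AGENT_AUDIENCE: 'file_access_api',
  CONVERTER_TENANT_ID: '',
  CONVERTER_SCOPE_WRITE: 'converter:job.write',
  CONVERTER_SCOPE_READ: 'converter:job.read',
  API_V1_RATE_LIMIT_WINDOW_MS: '300000',
  API_V1_RATE_LIMIT_MAX: '300',
  MULTIPART_MODEL_MAX_BYTES: '524288000',
  MULTIPART_REF_IMAGES_MAX_COUNT: '100',
  PROMOTE_TIMEOUT_MS: '300000',
};

const NUMERIC = [
  'API_V1_RATE_LIMIT_WINDOW_MS', 'API_V1_RATE_LIMIT_MAX',
  'MULTIPART_MODEL_MAX_BYTES', 'MULTIPART_REF_IMAGES_MAX_COUNT', 'PROMOTE_TIMEOUT_MS',
];

// Hypothetical loader: fail fast on missing or malformed values.
function loadConfig(env) {
  const missing = REQUIRED_WITHOUT_DEFAULT.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(', ')}`);
  }
  const config = {};
  for (const name of REQUIRED_WITHOUT_DEFAULT) config[name] = env[name];
  for (const [name, fallback] of Object.entries(DEFAULTS)) {
    config[name] = env[name] ? env[name] : fallback; // empty string counts as unset
  }
  for (const name of NUMERIC) {
    const value = Number(config[name]);
    if (!Number.isInteger(value) || value <= 0) {
      throw new Error(`${name} must be a positive integer`);
    }
    config[name] = value;
  }
  return config;
}

const exampleEnv = {
  MC_ISSUER: 'https://auth.innovedus.com',
  MC_JWKS_URL: 'https://auth.innovedus.com/.well-known/jwks',
  MC_TOKEN_URL: 'https://auth.innovedus.com/oauth/token',
  KNERON_CONVERTER_CLIENT_ID: 'kneron_converter',
  KNERON_CONVERTER_CLIENT_SECRET: 'change-me',
  FILE_ACCESS_AGENT_BASE_URL: 'https://files.nas.internal',
};
const config = loadConfig(exampleEnv);
console.log(config.MULTIPART_MODEL_MAX_BYTES === 500 * 1024 * 1024);
```

In the service itself this would be called once with `process.env`, so a bad deployment fails at boot instead of on the first request.
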

---

## 10. Backward compatibility and migration

### 10.1 Behavior of existing paths (unchanged)

| Path | Phase 1 behavior |
|------|------------------|
| `POST /jobs` (multipart) | **Unchanged**; continues to accept Web UI uploads |
| `GET /jobs/:id` | **Unchanged**; jobs with `origin=web` are not filtered, and jobs with `origin=api` are visible too (the internal vhost has no authorization, so it cannot tell them apart) |
| `GET /jobs/:id/events` (SSE) | **Unchanged**; the Web UI keeps using it |
| `GET /jobs/:id/download/:filename` | **Unchanged**; the Web UI downloads results through it |
| `GET /jobs` | **Unchanged**; lists everything |
| `GET /health`, `GET /queues/stats` | **Unchanged** |

### 10.2 When the Web UI migrates

**Out of scope for this work.** If the Web UI is later brought under OAuth, that is a separate L-sized task that requires designing the Member Center login flow, token refresh, and other UX details.

### 10.3 `STORAGE_BACKEND=local` mode

The existing local mode (Shared Volume) keeps working. The new API requires `STORAGE_BACKEND=minio` because:

- Data received via multipart must be written to a bucket for the Worker to read.
- Shared Volume paths are awkward across containers and unsuitable for future multi-host deployment.

**Implementation check**: `POST /api/v1/jobs` checks `STORAGE_BACKEND === 'minio'` at the start of request handling; if the check fails it returns 500 `misconfiguration`.

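The implementation check amounts to a small guard. The sketch below uses a hypothetical function name and message text; the response envelope follows the `{"error":{"code","message"}}` shape used by the Nginx 404 body in §7.1.

```javascript
// Guard for POST /api/v1/jobs: the new API only works on the MinIO backend.
// Returns null when the backend is acceptable, otherwise an error response.
function checkStorageBackend(storageBackend) {
  if (storageBackend === 'minio') return null;
  return {
    status: 500,
    body: {
      error: {
        code: 'misconfiguration',
        message: 'POST /api/v1/jobs requires STORAGE_BACKEND=minio',
      },
    },
  };
}

console.log(checkStorageBackend('local').body.error.code);
```
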

---

## 11. Testing strategy

### 11.1 Unit tests (Jest / Mocha)

- `auth/jwks.js`: mock JWKS responses; cover expired tokens, bad signatures, wrong `aud`, and insufficient scope.
- `auth/oauthClient.js`: mock the token endpoint; cover cache hits, re-fetch after expiry, and failure handling.
- `fileAccessAgent/client.js`: mock fetch; cover PUT retry on 5xx, invalidate-and-retry on 401, and timeout.
- `services/jobService.js`: test the concurrency of `claim_active_job` (simulate two simultaneous job creations with the same user_id).
- `routes/v1/jobs.js` multipart validation: mock `multer`; cover files over 500 MB, a missing `model`, a non-numeric model_id, a platform outside the enum, and a user_id containing `/`.
- Response schema mapping (internal status → external status + stage).

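The field-level cases in the multipart validation bullet can be exercised against a pure validator like the sketch below. The function name and return shape are hypothetical, and the platform enum is passed in because its actual members are not listed in this section; `'520'` and `'720'` are placeholder values for the example.

```javascript
// Validate the text fields of POST /api/v1/jobs (hypothetical helper).
// allowedPlatforms is injected; the real enum is defined elsewhere.
function validateJobFields(fields, { allowedPlatforms }) {
  const invalid = [];

  const userId = fields.user_id;
  if (typeof userId !== 'string' || userId.length === 0 || userId.includes('/')) {
    invalid.push('user_id');
  }
  // Multipart text fields arrive as strings, so "numeric" means digits only.
  if (!/^\d+$/.test(String(fields.model_id ?? ''))) invalid.push('model_id');
  if (!allowedPlatforms.includes(fields.platform)) invalid.push('platform');

  return invalid.length === 0
    ? { ok: true }
    : { ok: false, code: 'validation_error', fields: invalid };
}

const validationOptions = { allowedPlatforms: ['520', '720'] }; // placeholder enum
console.log(validateJobFields({ user_id: 'u-12345', model_id: '1001', platform: '520' }, validationOptions));
```

Keeping this logic free of `multer` means only the file-size and missing-file cases need the multipart mock.
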
### 11.2 Integration tests

- **Member Center mock**: use `wiremock` or a hand-written Express mock to simulate the JWKS and token endpoints.
- **File Access Agent mock**: simulate success / failure responses for PUT (used by promote).
- **Redis**: use a real Redis (docker-compose test environment).
- **Multipart upload**: use `supertest` with `attach('model', buffer, 'model.onnx')` to exercise the real multipart flow (small files, medium files, and boundary files at 499 MB / 501 MB).

### 11.3 E2E tests (black box)

- Requires real Member Center and File Access Agent test environments (to be prepared before Phase 1 kickoff).
- Test cases:
  1. Full flow: multipart upload → polling → promote succeeds
  2. 409 test: the same user creates jobs back to back
  3. Permission tests: invalid token / missing scope / wrong `aud`
  4. Error paths: upload over 500 MB → 413, missing `model` file → 400, File Access Agent returns 500 during promote → 502
  5. File size tests: small (1 MB), medium (50 MB), and large (200 MB, 500 MB) files, each verified against p95

### 11.4 Load testing

- `POST /api/v1/jobs` does not need high QPS (in practice a user submits on the order of once per minute), but large multipart uploads must be shown not to cause OOM (test 10 users uploading 200 MB simultaneously).
- `GET /api/v1/jobs/:id` is the hot path (polling); test 100 req/s per Scheduler instance.
- Verify p95 < 200 ms for GET, and p95 < 5 s / 12 s for POST at 200 MB / 500 MB.

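
---

## 12. Implementation task breakdown (per the Autoflow incremental development guidelines)

Each task is one independently reviewable unit. The Reviewer reviews them one by one.

| # | Task | Depends on | Parallelizable? | Estimate | Acceptance criteria |
|---|------|------------|-----------------|----------|---------------------|
| T1 | auth middleware + JWKS validation | - | - | 3d | All unit tests pass; a mock token can be validated on an empty route |
| T2 | Converter OAuth client (client_credentials + cache) | - | ✅ in parallel with T1 | 2d | Unit tests pass; a token can be fetched from a mock token endpoint and cached |
| T3 | New `/api/v1/*` route skeleton + unified error format + request_id middleware | T1 | - | 2d | All new endpoints are reachable; returning 501 is the expected path |
| T4 | POST /api/v1/jobs (multer receives multipart, write to MinIO, active-job lock, enqueue) | T1, T3 | - | 3d | Job creation works, 409 works, 413 works, rollback works, large files do not OOM |
| T5 | GET /api/v1/jobs + GET /api/v1/jobs/:id (with ETag, client isolation, user index) | T1, T3, T4 | ✅ in parallel with T6 | 3d | Recovery queries are correct; ETag 304 works |
| T6 | POST /api/v1/jobs/:id/promote (with streamed PUT, retry, FAA client) | T1, T2, T3 | ✅ in parallel with T5 | 4d | Promote succeeds, is idempotent, and can be retried after failure |
| T7 | Deployment split (Nginx dual-vhost config + docker-compose update) | - | ✅ in parallel with T1-T6 | 1d | `/jobs` is reachable from the internal network; only `/api/v1/*` is reachable from the internet |
| T8 | OpenAPI 3.0 spec (hand-written) + complete error-code documentation | T3-T6 | - | 2d | Spec passes lint; visionA-backend can import it directly |

**Estimated total effort**: 3-4 person-weeks (the task estimates sum to 20 working days when executed serially by one person); with 2 people in parallel this can be compressed to about 2 weeks. This aligns with the PRD's RICE Effort=4 estimate (slightly lower than the original estimate because T4 no longer needs to implement the FAA GET / HEAD branch).

**External dependency triggers**:

- T1 needs the Member Center JWKS URL (a mock is acceptable).
- T6 needs a File Access Agent test environment (or a mock PUT endpoint).
- T7 needs the user to confirm the deployment topology.
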
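As a cross-check on the estimates, the serial total and the longest dependency chain can be computed from the table. This is a worked calculation, not a scheduling tool: the data is copied from the table, and the function name is illustrative.

```javascript
// Task estimates (days) and dependencies, copied from the table above.
const tasks = {
  T1: { days: 3, deps: [] },
  T2: { days: 2, deps: [] },
  T3: { days: 2, deps: ['T1'] },
  T4: { days: 3, deps: ['T1', 'T3'] },
  T5: { days: 3, deps: ['T1', 'T3', 'T4'] },
  T6: { days: 4, deps: ['T1', 'T2', 'T3'] },
  T7: { days: 1, deps: [] },
  T8: { days: 2, deps: ['T3', 'T4', 'T5', 'T6'] },
};

// Earliest finish of a task = latest finish among its dependencies + its own duration.
function earliestFinish(id, memo) {
  if (memo[id] === undefined) {
    const start = Math.max(0, ...tasks[id].deps.map((dep) => earliestFinish(dep, memo)));
    memo[id] = start + tasks[id].days;
  }
  return memo[id];
}

const memo = {};
const totalDays = Object.values(tasks).reduce((sum, task) => sum + task.days, 0);
const criticalPathDays = Math.max(...Object.keys(tasks).map((id) => earliestFinish(id, memo)));
console.log({ totalDays, criticalPathDays });
```

With the dependencies exactly as listed, the longest chain is T1 → T3 → T4 → T5 → T8 at 13 working days, so a two-week schedule with two people presumably relies on some overlap between dependent tasks (for example, starting the OpenAPI spec before T5 is fully complete).
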

---

## 13. Open items / to be confirmed (TBD)

| # | Item | Impact | Who confirms |
|---|------|--------|--------------|
| TBD-1 | Whether Member Center's `tenant_id` claim appears in client_credentials tokens | T1 configuration | Member Center owner |
| TBD-2 | Final naming of the `kneron_converter_api` audience / `kneron_converter` client / scopes | T1, T2 | Member Center owner |
| TBD-3 | File Access Agent base URL (test and prod environments) and tenant_id | T6 | File Access Agent owner |
| TBD-4 | Actual rate-limit values (300 req / 5 min is an estimate that needs calibration from observation) | Adjust after launch | Observed data |
| TBD-5 | Concrete IP / hostname for the Nginx dual vhost (depends on deployment topology) | T7 | User / DevOps |
| TBD-6 | Granularity of stage_progress (whether the Worker can report a percentage within a stage) | P2 feature | Worker development team |

---

## 14. Appendix: complete error code table

| Code | HTTP | Description |
|------|------|-------------|
| `validation_error` | 400 | Field format error (missing multipart field, non-numeric model_id, platform outside the enum, etc.) |
| `invalid_multipart` | 400 | Multipart parse failure, missing required file, or wrong file extension |
| `invalid_token` | 401 | JWT invalid / bad signature / missing claim |
| `token_expired` | 401 | JWT expired |
| `insufficient_scope` | 403 | Insufficient scope |
| `tenant_mismatch` | 403 | tenant_id does not match |
| `job_not_found` | 404 | Job does not exist or does not belong to the client (avoids information leakage) |
| `not_found` | 404 | Path does not exist |
| `user_has_active_job` | 409 | The same user already has an in-progress job |
| `job_not_ready_for_promote` | 409 | Job is not completed at promote time |
| `source_not_available` | 409 | The promote source stage produced no output |
| `file_too_large` | 413 | Multipart upload exceeds 500 MB (triggered by multer `LIMIT_FILE_SIZE`) |
| `invalid_object_key` | 422 | target_object_key has an invalid format |
| `misconfiguration` | 500 | Server configuration error (for example, wrong STORAGE_BACKEND) |
| `internal_error` | 500 | Other, unclassified |
| `not_implemented` | 501 | Phase 2 feature |
| `file_gateway_unavailable` | 502 | File Access Agent failure (promote only) |
| `storage_unavailable` | 502 | MinIO write failure (when `POST /api/v1/jobs` writes the input) |
| `auth_service_unavailable` | 503 | Failed to obtain a token from Member Center (promote only) |
| `service_unavailable` | 503 | Other dependency failure |

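The table maps naturally onto a lookup plus a response builder. The sketch below uses hypothetical names; the envelope follows the `{"error":{"code","message"}}` shape used by the Nginx 404 body in §7.1, and falling back to `internal_error` for an unmapped code is an assumption of this example.

```javascript
// Error code -> HTTP status, copied from the table above.
const ERROR_STATUS = new Map([
  ['validation_error', 400],
  ['invalid_multipart', 400],
  ['invalid_token', 401],
  ['token_expired', 401],
  ['insufficient_scope', 403],
  ['tenant_mismatch', 403],
  ['job_not_found', 404],
  ['not_found', 404],
  ['user_has_active_job', 409],
  ['job_not_ready_for_promote', 409],
  ['source_not_available', 409],
  ['file_too_large', 413],
  ['invalid_object_key', 422],
  ['misconfiguration', 500],
  ['internal_error', 500],
  ['not_implemented', 501],
  ['file_gateway_unavailable', 502],
  ['storage_unavailable', 502],
  ['auth_service_unavailable', 503],
  ['service_unavailable', 503],
]);

// Build a response for a known code; unknown codes degrade to internal_error.
function errorResponse(code, message) {
  if (!ERROR_STATUS.has(code)) {
    return errorResponse('internal_error', `unmapped error code: ${code}`);
  }
  return { status: ERROR_STATUS.get(code), body: { error: { code, message } } };
}

console.log(errorResponse('user_has_active_job', 'user already has an active job'));
```

Keeping the map in one module gives route handlers and the OpenAPI spec a single source for code-to-status pairs.
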

---

## 15. Appendix: request / response quick reference

### Create job (multipart)

```bash
curl -X POST https://converter.innovedus.com/api/v1/jobs \
  -H "Authorization: Bearer $TOKEN" \
  -F "model=@./model.onnx" \
  -F "user_id=u-12345" \
  -F "model_id=1001" \
  -F "version=0001" \
  -F "platform=520" \
  -F "enable_evaluate=false" \
  -F "enable_sim_fp=false" \
  -F "enable_sim_fixed=false" \
  -F "enable_sim_hw=false"
```

**With reference images** (repeat `-F "ref_images[]=@..."` as needed):

```bash
curl -X POST https://converter.innovedus.com/api/v1/jobs \
  -H "Authorization: Bearer $TOKEN" \
  -F "model=@./model.onnx" \
  -F "ref_images[]=@./img_0.jpg" \
  -F "ref_images[]=@./img_1.jpg" \
  -F "user_id=u-12345" \
  -F "model_id=1001" \
  -F "version=0001" \
  -F "platform=520"
```

### Query job

```bash
curl -H "Authorization: Bearer $TOKEN" \
  https://converter.innovedus.com/api/v1/jobs/550e8400-...
```

### Recovery

```bash
curl -H "Authorization: Bearer $TOKEN" \
  'https://converter.innovedus.com/api/v1/jobs?user_id=u-12345&status=in_progress'
```

### Promote

```bash
curl -X POST https://converter.innovedus.com/api/v1/jobs/550e8400-.../promote \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "targets": [
      {"source": "nef", "target_object_key": "visionA/models/u-12345/m-1001/v0001/out.nef"}
    ]
  }'
```


---

## 16. Change log

| Date | Version | Change | Author |
|------|---------|--------|--------|
| 2026-04-25 | Draft 1.0 | Initial version; complete Phase 1 specification | Architect Agent |
| 2026-04-25 | Draft 1.1 | POST /api/v1/jobs switched to multipart/form-data; removed the FAA getFile/headFile implementation, the `files:download.read`/`files:metadata.read` scopes, the `input_object_key` field, and the `input_not_found` error code; added the `invalid_multipart`/`file_too_large`/`storage_unavailable` error codes; deleted TBD-1 and renumbered the TBDs; §2.5 File Access Agent client keeps only putFile; §2.8 POST jobs flow changed to multer receive → write to MinIO; §6 FAA integration trimmed to PUT only | Architect Agent |

---

**Note**: this TDD is about 1,390 lines, well past the splitting threshold. This update focuses on content corrections and does not split the file; the next update should split it into:

- `TDD.md` (index)
- `TDD-api.md` (§1, §14, §15)
- `TDD-backend.md` (§2, §3, §4)
- `TDD-integration.md` (§5, §6)
- `TDD-infra.md` (§7, §9)
- `TDD-testing.md` (§11)