jim800121chen cff9236699 docs: migrate Autoflow shared documents to docs/autoflow/
Move PRD, design specs, architecture docs, and TDD from .autoflow/
(personal/per-branch layer) to docs/autoflow/ (shared layer that
goes into git) per the new Autoflow workspace layout.

Files moved:
- 02-prd/PRD.md
- 03-design/design-review.md
- 03-design/user-flow-cross-system.md
- 04-architecture/TDD.md
- 04-architecture/design-doc.md
- 04-architecture/security.md

The originals were never tracked, so git mv reduced to a filesystem
rename with no history to preserve. .autoflow/ remains for personal
notes (progress.md, review reports, testing logs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:59:21 +08:00


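# TDD — Kneron Model Converter Public API (Phase 1)
## Author: Architect Agent
## Status: Draft (before three-way cross-review)
## Last updated: 2026-04-25
## Companion documents:
- `design-doc.md` (architecture decisions)
- `../02-prd/PRD.md` (requirements)
- `../03-design/design-review.md` (UX feedback)
This TDD focuses on Phase 1 implementation details. For the "why" behind each decision, see `design-doc.md`.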
## Change history
| Date | Change | Author |
|------|------|------|
| 2026-04-25 | Initial Draft 1.0 | Architect Agent |
| 2026-04-25 | Raw model upload path changed so that visionA-backend uploads directly to Converter via multipart; `POST /api/v1/jobs` changed to multipart/form-data; removed the FAA `getFile()` / `headFile()` / `files:download.read` / `files:metadata.read` content; removed TBD-1, `input_object_key`, and `input_not_found` content accordingly | Architect Agent |
---
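## 1. API Specification (Phase 1, required)
### 1.1 General conventions
- **Base URL**: `https://<converter-host>/api/v1` (public vhost; only this path is exposed externally)
- **Content-Type**:
  - `POST /api/v1/jobs`: `multipart/form-data` (same as the existing Web UI `POST /jobs`)
  - Other endpoints (GET, and `POST /api/v1/jobs/:id/promote`): requests use `application/json; charset=utf-8`
- **All responses**: `application/json; charset=utf-8`
- **Time format**: ISO 8601 UTC (`2026-04-25T12:00:00Z`)
- **ID format**: `job_id` is a UUIDv4 string
- **Authentication**: `Authorization: Bearer <JWT>` (required on every endpoint except `/health`)
- **Request ID**: if the client sends `X-Request-Id`, the response carries the same value; otherwise the server generates a UUIDv4. Every log entry must record it.
- **Rate limit**: 300 requests per 5 minutes per `client_id`; responses include the `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` headers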
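The Request ID rule above fits naturally into a small Express middleware. The following is a minimal sketch, assuming it lives in the `src/middleware/requestId.js` file listed in §2.1 and stores the ID on `req.requestId` (the property the §2.6 routes read). The function name and the 128-character cap on client-supplied IDs are assumptions, not part of this spec.

```javascript
// src/middleware/requestId.js (sketch)
const { randomUUID } = require('node:crypto');

// Hypothetical cap on client-supplied IDs so oversized values never reach the logs.
const MAX_REQUEST_ID_LENGTH = 128;

// Reuse the client's X-Request-Id when it is usable, otherwise generate a UUIDv4,
// then expose it as req.requestId and echo it back on the response.
function requestId(req, res, next) {
  const incoming = req.headers['x-request-id'];
  const isUsable =
    typeof incoming === 'string' &&
    incoming.length > 0 &&
    incoming.length <= MAX_REQUEST_ID_LENGTH;
  const id = isUsable ? incoming : randomUUID();
  req.requestId = id;
  res.setHeader('X-Request-Id', id);
  next();
}

module.exports = { requestId };
```

It would be mounted with `app.use(requestId)` ahead of every route, so error responses and log lines can carry the same `request_id`.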
### 1.2 Unified error format
Every 4xx / 5xx response:
```json
{
"error": {
"code": "string_code",
"message": "human readable message (zh-TW)",
    "details": { /* optional; structure depends on code */ },
"request_id": "uuid-v4"
}
}
```
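The §2.2 middleware calls a `sendError(res, status, code, message, req, details)` helper that this TDD does not define. Here is a minimal sketch of how that helper could produce the envelope above. It assumes the helper is exported from `src/middleware/errorHandler.js` and that `req.requestId` was set earlier by a request ID middleware.

```javascript
// src/middleware/errorHandler.js (sketch): writes the §1.2 error envelope.
function sendError(res, status, code, message, req, details) {
  const error = { code, message };
  // details is optional; its structure depends on the error code
  if (details !== undefined) error.details = details;
  error.request_id = (req && req.requestId) || null;
  return res.status(status).json({ error });
}

module.exports = { sendError };
```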
### 1.3 Endpoint list
| Method | Path | Description | Required scope |
|------|------|------|---------|
| GET | `/health` | Health check | — |
| POST | `/api/v1/jobs` | Create a conversion job | `converter:job.write` |
| GET | `/api/v1/jobs` | List jobs (with filters) | `converter:job.read` |
| GET | `/api/v1/jobs/:id` | Status of a single job | `converter:job.read` |
| POST | `/api/v1/jobs/:id/promote` | Transfer result files to File Access Agent | `converter:job.write` |
**Reserved for Phase 2 (Phase 1 returns 501 Not Implemented)**
| Method | Path | Description |
|------|------|------|
| POST | `/api/v1/jobs/:id/download-tokens` | Exchange for a delegated download token (pending Member Center) |
| DELETE | `/api/v1/jobs/:id` | Cancel a job |
### 1.4 Endpoint details
#### 1.4.1 `GET /health` (no auth required)
**Response 200**
```json
{
"service": "kneron-converter-api",
"status": "healthy",
"version": "1.0.0",
"timestamp": "2026-04-25T12:00:00Z",
"dependencies": {
"redis": "connected",
"member_center": "reachable",
"file_access_agent": "reachable"
}
}
```
**Response 503** (any dependency failing):
```json
{
"service": "kneron-converter-api",
"status": "unhealthy",
"dependencies": {
"redis": "disconnected",
"member_center": "reachable",
"file_access_agent": "reachable"
}
}
```
Note: reachability checks for Member Center and File Access Agent can run in the background with cached results (checked every 30 s), so that `/health` itself does not become slow.
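One way to keep `/health` fast is to probe dependencies on a timer and have the handler read only the cached results. The sketch below makes these assumptions:

- The probe functions are injected, for example fetching Member Center's `/.well-known/openid-configuration` and File Access Agent's `GET /health`, per §2.12.
- A failed probe is reported as `unreachable`. This value is an assumption, since the examples above only show `reachable`.

```javascript
// Sketch: probes run in the background; GET /health only reads the cache.
function createHealthCache(probes, intervalMs = 30_000) {
  // Every dependency counts as unreachable until its first probe succeeds.
  const cache = Object.fromEntries(Object.keys(probes).map((name) => [name, false]));
  async function refresh() {
    await Promise.all(
      Object.entries(probes).map(async ([name, probe]) => {
        try {
          cache[name] = Boolean(await probe());
        } catch {
          cache[name] = false;
        }
      }),
    );
  }
  const timer = setInterval(refresh, intervalMs);
  timer.unref(); // health probing alone should not keep the process alive
  return { refresh, snapshot: () => ({ ...cache }), stop: () => clearInterval(timer) };
}

// Builds the status code and body in the shape of the examples above.
function buildHealthResponse({ redisConnected, cached, version, now = new Date() }) {
  const dependencies = {
    redis: redisConnected ? 'connected' : 'disconnected',
    member_center: cached.member_center ? 'reachable' : 'unreachable',
    file_access_agent: cached.file_access_agent ? 'reachable' : 'unreachable',
  };
  const healthy = Boolean(redisConnected && cached.member_center && cached.file_access_agent);
  if (!healthy) {
    return { httpStatus: 503, body: { service: 'kneron-converter-api', status: 'unhealthy', dependencies } };
  }
  return {
    httpStatus: 200,
    body: {
      service: 'kneron-converter-api',
      status: 'healthy',
      version,
      timestamp: now.toISOString().replace(/\.\d{3}Z$/, 'Z'), // drop milliseconds, as in the examples
      dependencies,
    },
  };
}

module.exports = { createHealthCache, buildHealthResponse };
```

At startup, calling `refresh()` once before serving traffic avoids reporting every dependency as unreachable for the first 30 seconds.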
---
#### 1.4.2 `POST /api/v1/jobs`
**Request**
```http
POST /api/v1/jobs
Authorization: Bearer <JWT>
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary...
X-Request-Id: <uuid> (optional)
------WebKitFormBoundary...
Content-Disposition: form-data; name="model"; filename="model.onnx"
Content-Type: application/octet-stream
<binary model file>
------WebKitFormBoundary...
Content-Disposition: form-data; name="ref_images[]"; filename="img_0.jpg"
Content-Type: image/jpeg
<binary image>
------WebKitFormBoundary...
Content-Disposition: form-data; name="user_id"
visionA-user-12345
------WebKitFormBoundary...
Content-Disposition: form-data; name="model_id"
1001
------WebKitFormBoundary...
Content-Disposition: form-data; name="version"
0001
------WebKitFormBoundary...
Content-Disposition: form-data; name="platform"
520
------WebKitFormBoundary...
Content-Disposition: form-data; name="enable_evaluate"
false
------WebKitFormBoundary...--
```
**Multer configuration**:
- `multer.memoryStorage()` (same as the existing Web UI `POST /jobs`)
- `limits.fileSize`: 500MB (per-file limit for `model`)
- `fields`: `model` (1 file), `ref_images[]` (`maxCount: 100`)
**Field definitions**:
| Field | Type | Location | Required | Validation |
|------|------|------|------|------|
| `model` | file | multipart file | ✅ | Extension ∈ {`.onnx`, `.pt`, `.pth`, `.tflite`, `.h5`, `.pb`}; size ≤ 500MB |
| `ref_images[]` | file[] | multipart file | ❌ | `image/*`; at most 100 images; same rules as the existing Web UI |
| `user_id` | string | multipart field | ✅ | 1-128 characters; must not contain `/`, `\`, or `..`; format decided by VisionA |
| `model_id` | string → int | multipart field | ✅ | After conversion to int, 1 ≤ x ≤ 65535 |
| `version` | string | multipart field | ✅ | 1-32 characters; a numeric string is recommended |
| `platform` | string | multipart field | ✅ | enum: `520`, `720`, `530`, `630`, `730` |
| `enable_evaluate` | string `'true'` / `'false'` | multipart field | ❌ | Default `'false'` |
| `enable_sim_fp` | string `'true'` / `'false'` | multipart field | ❌ | Default `'false'` |
| `enable_sim_fixed` | string `'true'` / `'false'` | multipart field | ❌ | Default `'false'` |
| `enable_sim_hw` | string `'true'` / `'false'` | multipart field | ❌ | Default `'false'` |
| `metadata` | string (JSON) | multipart field | ❌ | If provided, must be a valid JSON object string; reserved for future extension |
**Notes**:
- Every multipart field value is a string. The server must convert `'true'` / `'false'` to booleans and `model_id` to an integer.
- The multipart fields match the existing Web UI `POST /jobs` exactly. `user_id` is the only field added for the public API (the Web UI does not need it).
- Validation order: verify the OAuth token first, then the multipart body, so that no large upload is accepted for an unauthenticated request. In practice, mount the `requireAuth` middleware before the `multer` middleware so an invalid token is rejected before multer starts parsing.
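Multer hands over every field as a string. The conversions and checks in the table above can therefore live in one pure function that route handlers call after multer has run. This is a minimal hand-written sketch; the TDD also allows joi or zod. `parseJobFields` and `FieldError` are hypothetical names, and lower-casing the extension before the whitelist check is an assumption.

```javascript
const path = require('node:path');

const PLATFORMS = new Set(['520', '720', '530', '630', '730']);
const MODEL_EXTENSIONS = new Set(['.onnx', '.pt', '.pth', '.tflite', '.h5', '.pb']);
const BOOL_FIELDS = ['enable_evaluate', 'enable_sim_fp', 'enable_sim_fixed', 'enable_sim_hw'];

class FieldError extends Error {
  constructor(field, message) {
    super(message);
    this.name = 'FieldError';
    this.field = field; // reported to the client as details.field
  }
}

// Validates and converts the text fields of POST /api/v1/jobs (all strings from multer).
function parseJobFields(body, modelFilename) {
  const { user_id: userId, model_id: modelIdRaw, version, platform, metadata } = body;
  if (typeof userId !== 'string' || userId.length < 1 || userId.length > 128 ||
      userId.includes('/') || userId.includes('\\') || userId.includes('..')) {
    throw new FieldError('user_id', 'user_id must be 1-128 chars without "/", "\\" or ".."');
  }
  // Digits only, so values such as "0x10", "1e3" or "" are rejected before conversion.
  if (typeof modelIdRaw !== 'string' || !/^\d{1,5}$/.test(modelIdRaw) ||
      Number(modelIdRaw) < 1 || Number(modelIdRaw) > 65535) {
    throw new FieldError('model_id', 'model_id must be an integer between 1 and 65535');
  }
  if (typeof version !== 'string' || version.length < 1 || version.length > 32) {
    throw new FieldError('version', 'version must be 1-32 chars');
  }
  if (!PLATFORMS.has(platform)) throw new FieldError('platform', 'unsupported platform');
  const ext = path.extname(modelFilename || '').toLowerCase();
  if (!MODEL_EXTENSIONS.has(ext)) throw new FieldError('model', `unsupported model extension "${ext}"`);

  const flags = {};
  for (const name of BOOL_FIELDS) {
    const raw = body[name] ?? 'false'; // optional, defaults to 'false'
    if (raw !== 'true' && raw !== 'false') throw new FieldError(name, `${name} must be 'true' or 'false'`);
    flags[name] = raw === 'true';
  }

  let parsedMetadata = {};
  if (metadata !== undefined) {
    try {
      parsedMetadata = JSON.parse(metadata);
    } catch {
      throw new FieldError('metadata', 'metadata must be a JSON object string');
    }
    if (parsedMetadata === null || typeof parsedMetadata !== 'object' || Array.isArray(parsedMetadata)) {
      throw new FieldError('metadata', 'metadata must be a JSON object string');
    }
  }
  return { user_id: userId, model_id: Number(modelIdRaw), version, platform, ...flags, metadata: parsedMetadata };
}

module.exports = { parseJobFields, FieldError };
```

A thrown `FieldError` maps to 400 `validation_error`, with `details.field` set to `err.field`.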
**Response 201 Created**
```json
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "created",
"stage": "onnx",
"progress": 0,
"created_at": "2026-04-25T12:00:00Z",
"expires_at": "2026-05-02T12:00:00Z",
"user_id": "visionA-user-12345"
}
```
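**Error responses**:
| Status | error.code | Scenario |
|------|-----------|------|
| 400 | `validation_error` | Missing field or bad format (`details.field` lists the offending field) |
| 400 | `invalid_multipart` | Multipart parse failure, missing required file / field, or disallowed extension |
| 401 | `invalid_token` | JWT invalid / expired / missing claim |
| 403 | `insufficient_scope` | Token lacks `converter:job.write` (`details.required_scope`) |
| 403 | `tenant_mismatch` | The token's `tenant_id` does not match the Converter configuration |
| 409 | `user_has_active_job` | The user_id already has an in-progress job (see §1.5) |
| 413 | `file_too_large` | Upload exceeds 500MB (raised by multer `LIMIT_FILE_SIZE`) |
| 500 | `misconfiguration` | e.g. `STORAGE_BACKEND !== 'minio'` |
| 500 | `internal_error` | Anything else |
| 502 | `storage_unavailable` | Writing the inputs to MinIO (Converter Bucket) failed (§2.8 step 7) |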
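The `413` and `invalid_multipart` rows above depend on catching multer's errors, which multer passes to its callback instead of throwing. Below is a minimal sketch that wraps an already configured multer middleware:

- `withUploadErrors` and `mapUploadError` are hypothetical names.
- `sendError` is injected so the sketch stays self-contained.
- The only multer-specific behavior relied on is that exceeding `limits.fileSize` produces an error whose `code` is `'LIMIT_FILE_SIZE'`.

```javascript
// Translate an error reported by the multer middleware into the table above.
function mapUploadError(err) {
  if (err.code === 'LIMIT_FILE_SIZE') {
    return { status: 413, code: 'file_too_large', message: 'Uploaded file exceeds the 500MB limit' };
  }
  // Any other parse failure (bad boundary, truncated body, unexpected field, ...)
  return {
    status: 400,
    code: 'invalid_multipart',
    message: 'Malformed multipart request',
    details: err.code ? { reason: err.code } : undefined,
  };
}

// upload: e.g. multer({ storage: multer.memoryStorage(), limits: { fileSize } }).fields([...])
// sendError: the §1.2 error helper, injected here to keep the sketch self-contained.
function withUploadErrors(upload, sendError) {
  return (req, res, next) =>
    upload(req, res, (err) => {
      if (!err) return next();
      const mapped = mapUploadError(err);
      return sendError(res, mapped.status, mapped.code, mapped.message, req, mapped.details);
    });
}

module.exports = { withUploadErrors, mapUploadError };
```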
---
#### 1.4.3 `GET /api/v1/jobs/:id`
**Request**
```http
GET /api/v1/jobs/550e8400-e29b-41d4-a716-446655440000
Authorization: Bearer <JWT>
If-None-Match: "etag-value" (optional)
```
**Response 200 OK**
```json
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"user_id": "visionA-user-12345",
"status": "running",
"stage": "bie",
"progress": 45,
"stage_progress": 60,
"created_at": "2026-04-25T12:00:00Z",
"updated_at": "2026-04-25T12:05:30Z",
"expires_at": "2026-05-02T12:00:00Z",
"stage_timings": {
"onnx": { "started_at": "2026-04-25T12:00:05Z", "completed_at": "2026-04-25T12:02:10Z" },
"bie": { "started_at": "2026-04-25T12:02:15Z", "completed_at": null },
"nef": null
},
"input": {
"filename": "model.onnx",
"object_key": "jobs/550e8400-e29b-41d4-a716-446655440000/input/model.onnx",
"size_bytes": 204800000,
"ref_images_count": 0
},
"result_object_keys": null,
"error": null,
"parameters": {
"model_id": 1001,
"version": "0001",
"platform": "520",
"enable_evaluate": false,
"enable_sim_fp": false,
"enable_sim_fixed": false,
"enable_sim_hw": false
},
"metadata": {
"source": "visionA-web",
"tags": ["experiment-001"]
},
"estimated_completion_at": null
}
```
**State machine** (`status` field):
- `created`: just created, waiting for the first stage to start
- `running`: in some stage (`stage` is set)
- `completed`: everything finished (`result_object_keys` is set, `stage=null`)
- `failed`: failed (`error` is set)
**`result_object_keys` on completion** (keys in the Converter Bucket):
```json
"result_object_keys": {
"onnx": "jobs/{job_id}/output/out.onnx",
"bie": "jobs/{job_id}/output/out.bie",
"nef": "jobs/{job_id}/output/out.nef"
}
```
**`error` on failure**:
```json
"error": {
"stage": "bie",
"code": "quantization_failed",
"message": "參考圖片不足或格式不符BIE 量化階段失敗",
"details": { "raw": "..." }
}
```
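**Response 304 Not Modified**: returned when `If-None-Match` matches the current ETag (the ETag is suggested to be a hash of `updated_at`).
**Error responses**:
| Status | error.code | Scenario |
|------|-----------|------|
| 401/403 | Same as above | — |
| 404 | `job_not_found` | The job does not exist, or does not belong to the calling client_id (avoids information leakage) |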
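The 304 rule above needs two small pieces: an ETag derived from `updated_at`, and an `If-None-Match` comparison. Below is a minimal sketch with hypothetical function names; the SHA-256 hash and 16-character truncation are arbitrary choices.

One caveat: if `updated_at` only has second resolution, two updates within the same second produce the same ETag. A client could then miss the second update until the next change.

```javascript
const { createHash } = require('node:crypto');

// Strong ETag derived from updated_at; ETag values are quoted strings.
function computeEtag(updatedAt) {
  const digest = createHash('sha256').update(String(updatedAt)).digest('hex');
  return `"${digest.slice(0, 16)}"`;
}

// If-None-Match may be "*" or a comma-separated list of tags (possibly weak, W/"...").
// RFC 9110 uses weak comparison for If-None-Match, so the W/ prefix is ignored.
function ifNoneMatchHits(headerValue, etag) {
  if (!headerValue) return false;
  if (headerValue.trim() === '*') return true;
  return headerValue
    .split(',')
    .map((tag) => tag.trim().replace(/^W\//, ''))
    .includes(etag);
}

module.exports = { computeEtag, ifNoneMatchHits };
```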
---
#### 1.4.4 `GET /api/v1/jobs` (list / recovery)
**Query parameters**:
| Parameter | Type | Required | Description |
|------|------|------|------|
| `user_id` | string | ❌ | Filter by user_id (required for recovery) |
| `status` | string | ❌ | `in_progress` (= `created` or `running`), `completed`, `failed`, `all` (default `all`) |
| `limit` | int | ❌ | Default 20, maximum 100 |
| `offset` | int | ❌ | Default 0 |
| `created_after` | ISO 8601 | ❌ | Filter to `created_at >= created_after` |
**Response 200**
```json
{
"total": 2,
"limit": 20,
"offset": 0,
"items": [
    { /* same shape as GET /jobs/:id, but items are compact: stage_timings details and metadata may be omitted */ }
]
}
```
**Implementation note**: use the `user:{user_id}:jobs` Set as the index instead of a full `KEYS job:*` scan (adopting Design review suggestion 4.1.2).
---
#### 1.4.5 `POST /api/v1/jobs/:id/promote`
**Request**
```http
POST /api/v1/jobs/550e8400-.../promote
Authorization: Bearer <JWT>
Content-Type: application/json
{
"targets": [
{
"source": "nef",
"target_object_key": "visionA/models/user-12345/model-1001/v0001/out.nef"
},
{
"source": "bie",
"target_object_key": "visionA/models/user-12345/model-1001/v0001/out.bie"
}
]
}
```
**Field definitions**:
| Field | Type | Required | Description |
|------|------|------|------|
| `targets` | array | ✅ | List of files to promote (at least 1) |
| `targets[].source` | string | ✅ | enum: `onnx`, `bie`, `nef`; maps to a job output file |
| `targets[].target_object_key` | string | ✅ | Target key in File Access Agent (naming decided by VisionA) |
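The body rules in the table above can be checked before any job lookup. So can the `target_object_key` sanity check described in §6.1 (no `..`, no backslash). Below is a minimal sketch with hypothetical names; a failed result maps to 400 `validation_error` with `details.field`.

```javascript
const PROMOTE_SOURCES = new Set(['onnx', 'bie', 'nef']);

// Basic sanity check only; naming rules belong to VisionA (§6.1).
function isSaneObjectKey(key) {
  return typeof key === 'string' && key.length > 0 && !key.includes('..') && !key.includes('\\');
}

// Returns { ok: true, targets } or { ok: false, field } for the first invalid entry.
function validatePromoteBody(body) {
  const targets = body && body.targets;
  if (!Array.isArray(targets) || targets.length === 0) {
    return { ok: false, field: 'targets' };
  }
  for (const [i, target] of targets.entries()) {
    if (!target || !PROMOTE_SOURCES.has(target.source)) {
      return { ok: false, field: `targets[${i}].source` };
    }
    if (!isSaneObjectKey(target.target_object_key)) {
      return { ok: false, field: `targets[${i}].target_object_key` };
    }
  }
  return { ok: true, targets };
}

module.exports = { validatePromoteBody, isSaneObjectKey };
```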
**Response 200 OK**
```json
{
"job_id": "550e8400-...",
"promoted": [
{
"source": "nef",
"target_object_key": "visionA/models/user-12345/model-1001/v0001/out.nef",
"size_bytes": 10485760,
"file_access_agent_etag": "abc123",
"promoted_at": "2026-04-25T12:30:00Z"
},
{
"source": "bie",
"target_object_key": "visionA/models/user-12345/model-1001/v0001/out.bie",
"size_bytes": 5242880,
"file_access_agent_etag": "def456",
"promoted_at": "2026-04-25T12:30:02Z"
}
]
}
```
**Error responses**:
| Status | error.code | Scenario |
|------|-----------|------|
| 400 | `validation_error` | Malformed targets, or source is not a valid stage |
| 404 | `job_not_found` | Same as above |
| 409 | `job_not_ready_for_promote` | `status != completed` (`details.current_status`) |
| 409 | `source_not_available` | The job did not produce this stage's output (e.g. only onnx ran but nef is requested) |
| 502 | `file_gateway_unavailable` | File Access Agent PUT failed |
| 503 | `auth_service_unavailable` | Converter failed to obtain its own token |
**Retry semantics**: `promote` is idempotent (PUTting the same target_object_key twice gives the same result, because File Access Agent overwrites). Converter Bucket files are kept for 7 days, so retries are possible.
### 1.5 Key error payload examples
#### `user_has_active_job` (adopting the Design suggestion)
```json
{
"error": {
"code": "user_has_active_job",
"message": "使用者目前已有進行中的轉檔任務",
"details": {
"active_job_id": "550e8400-...",
"active_job_status": "running",
"active_job_stage": "bie",
"active_job_progress": 45,
"active_job_created_at": "2026-04-25T12:00:00Z"
},
"request_id": "req-uuid"
}
}
```
#### `insufficient_scope`
```json
{
"error": {
"code": "insufficient_scope",
"message": "token 缺少必要權限",
"details": {
"required_scope": "converter:job.write",
"provided_scopes": ["converter:job.read"]
},
"request_id": "req-uuid"
}
}
```
---
## 2. Task Scheduler changes
### 2.1 Suggested directory layout
```
apps/task-scheduler/
├── server.js                 ← existing; entry point only (initialization + mount routes)
├── src/
│   ├── config.js             ← new: reads all env vars in one place, fails fast
│   ├── redis.js              ← new: Redis client + helpers
│   ├── auth/
│   │   ├── jwks.js           ← new: JWKS cache + JWT verification
│   │   ├── middleware.js     ← new: Express middleware (verifies token + scope)
│   │   └── oauthClient.js    ← new: Converter as an OAuth client (token cache)
│   ├── fileAccessAgent/
│   │   ├── client.js         ← new: File Access Agent HTTP client (PUT only, for promote)
│   │   └── errors.js         ← new: error translation
│   ├── routes/
│   │   ├── legacy.js         ← existing routes (/jobs, /jobs/:id, /jobs/:id/events, ...)
│   │   └── v1/
│   │       ├── index.js      ← mounts the new routes
│   │       ├── jobs.js       ← POST/GET /api/v1/jobs, GET /:id
│   │       └── promote.js    ← POST /api/v1/jobs/:id/promote
│   ├── services/
│   │   ├── jobService.js     ← new: job CRUD, user index, active-job check
│   │   └── doneListener.js   ← existing listenDoneQueue extracted into a module
│   ├── middleware/
│   │   ├── errorHandler.js   ← new: unified error format
│   │   ├── rateLimit.js      ← new: per-client_id rate limit (imported by routes/v1/index.js)
│   │   └── requestId.js      ← new: X-Request-Id
│   └── utils/
│       └── logger.js         ← new: structured logging
├── package.json
└── Dockerfile
```
**Implementation principle**: refactor conservatively. Existing features keep their semantics; code is only moved and abstracted.
### 2.2 Auth middleware (T1)
```javascript
// src/auth/middleware.js (skeleton)
const { verifyJwt } = require('./jwks');
const config = require('../config');
// sendError(res, status, code, message, req, details) writes the §1.2 error envelope;
// assumed to be exported by src/middleware/errorHandler.js
const { sendError } = require('../middleware/errorHandler');
function requireAuth(requiredScope) {
  return async (req, res, next) => {
    try {
      const authHeader = req.headers.authorization || '';
      const match = authHeader.match(/^Bearer\s+(.+)$/);
      if (!match) {
        return sendError(res, 401, 'invalid_token', 'Missing bearer token', req);
      }
      const token = match[1];
      const claims = await verifyJwt(token, {
        issuer: config.memberCenter.issuer,
        audience: config.converter.audience,
        clockSkew: 60, // seconds
      });
      // Scope check: the scope claim is a space-separated list
      const scopes = (claims.scope || '').split(' ').filter(Boolean);
      if (!scopes.includes(requiredScope)) {
        return sendError(res, 403, 'insufficient_scope', 'Missing required scope', req, {
          required_scope: requiredScope,
          provided_scopes: scopes,
        });
      }
      // Tenant check (optional): only when CONVERTER_TENANT_ID is set and the token carries tenant_id
      if (config.converter.tenantId && claims.tenant_id) {
        if (claims.tenant_id !== config.converter.tenantId) {
          return sendError(res, 403, 'tenant_mismatch', 'Tenant mismatch', req);
        }
      }
      // Attach the claims to req for downstream handlers
      req.auth = {
        clientId: claims.client_id || claims.sub,
        tenantId: claims.tenant_id || null,
        scopes,
        tokenClaims: claims,
      };
      next();
    } catch (err) {
      // Map specific jose error codes; per §1.4.2 all of them are 401 invalid_token
      if (err.code === 'ERR_JWT_EXPIRED') {
        return sendError(res, 401, 'invalid_token', 'Token expired', req);
      }
      if (err.code === 'ERR_JWKS_NO_MATCHING_KEY') {
        return sendError(res, 401, 'invalid_token', 'Signature verification failed', req);
      }
      return sendError(res, 401, 'invalid_token', 'Token verification failed', req);
    }
  };
}
module.exports = { requireAuth };
```
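### 2.3 JWKS cache (T1)
Use `createRemoteJWKSet` from the `jose` npm package. It caches the fetched key set for `cacheMaxAge`. When a token carries an unknown `kid`, it refetches the JWKS, but no more often than once per `cooldownDuration`.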
```javascript
// src/auth/jwks.js
const { createRemoteJWKSet, jwtVerify } = require('jose');
const config = require('../config');
const jwks = createRemoteJWKSet(new URL(config.memberCenter.jwksUrl), {
cacheMaxAge: 10 * 60 * 1000, // 10 min
  cooldownDuration: 30 * 1000, // refetch at most once per 30 s
});
async function verifyJwt(token, { issuer, audience, clockSkew }) {
const { payload } = await jwtVerify(token, jwks, {
issuer,
audience,
clockTolerance: clockSkew,
});
return payload;
}
module.exports = { verifyJwt };
```
### 2.4 OAuth client (T2)
```javascript
// src/auth/oauthClient.js
const config = require('../config');
class OAuthClient {
constructor() {
this._cache = new Map(); // scope-key -> { token, expiresAt }
}
async getToken(scope) {
const key = scope;
const cached = this._cache.get(key);
if (cached && cached.expiresAt - 60000 > Date.now()) {
return cached.token;
}
const params = new URLSearchParams({
grant_type: 'client_credentials',
client_id: config.converter.clientId,
client_secret: config.converter.clientSecret,
scope,
audience: config.fileAccessAgent.audience,
});
const res = await fetch(config.memberCenter.tokenUrl, {
method: 'POST',
headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
body: params.toString(),
});
if (!res.ok) {
throw new Error(`token endpoint ${res.status}`);
}
const data = await res.json();
const entry = {
token: data.access_token,
expiresAt: Date.now() + (data.expires_in || 3600) * 1000,
};
this._cache.set(key, entry);
return entry.token;
}
invalidate(scope) {
this._cache.delete(scope);
}
}
module.exports = new OAuthClient();
```
**Error handling**: when a caller catches a `getToken` failure, it returns 503 `auth_service_unavailable`.
### 2.5 File Access Agent client (T6)
In Phase 1, Converter calls File Access Agent only during `promote` (to write result files); **no HEAD / GET is needed**.
```javascript
// src/fileAccessAgent/client.js
const config = require('../config');
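const oauthClient = require('../auth/oauthClient');
// FAAError(status, body): assumed to be exported by src/fileAccessAgent/errors.js (§2.1)
const { FAAError } = require('./errors');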
async function putFile(objectKey, stream, { contentType, contentLength }) {
const token = await oauthClient.getToken('files:upload.write');
const res = await fetch(
`${config.fileAccessAgent.baseUrl}/files/${encodeURI(objectKey)}`,
{
method: 'PUT',
headers: {
Authorization: `Bearer ${token}`,
'Content-Type': contentType,
'Content-Length': String(contentLength),
},
body: stream,
      duplex: 'half', // required by Node 18 fetch when the body is a stream
}
);
if (!res.ok) throw new FAAError(res.status, await res.text());
return await res.json();
}
module.exports = { putFile };
```
**Large-file streaming (promote)**: the Body (a stream) of MinIO's `GetObjectCommand` is piped directly into the fetch PUT body, so the whole result file is never loaded into memory. Flow of `POST /api/v1/jobs/:id/promote`:
```
MinIO GetObjectCommand.Body (stream)
↓ pipe
fetch PUT body (stream, duplex: 'half')
File Access Agent
```
### 2.6 New route group (T3)
```javascript
// src/routes/v1/index.js
const express = require('express');
const jobsRouter = require('./jobs');
const promoteRouter = require('./promote');
const { requireAuth } = require('../../auth/middleware');
const { apiV1RateLimit } = require('../../middleware/rateLimit');
const router = express.Router();
router.use(apiV1RateLimit);
router.post('/jobs', requireAuth('converter:job.write'), jobsRouter.create);
router.get('/jobs', requireAuth('converter:job.read'), jobsRouter.list);
router.get('/jobs/:id', requireAuth('converter:job.read'), jobsRouter.get);
router.post('/jobs/:id/promote', requireAuth('converter:job.write'), promoteRouter.promote);
// Reserved for Phase 2
router.post('/jobs/:id/download-tokens', requireAuth('converter:job.read'), (req, res) => {
res.status(501).json({
error: { code: 'not_implemented', message: 'Phase 2 功能,待 Member Center 補完', request_id: req.requestId },
});
});
router.delete('/jobs/:id', requireAuth('converter:job.write'), (req, res) => {
res.status(501).json({
error: { code: 'not_implemented', message: '尚未實作', request_id: req.requestId },
});
});
module.exports = router;
```
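The routes above depend on an `apiV1RateLimit` middleware that the TDD does not spell out. Below is a minimal in-memory sketch of a fixed-window limiter using the `API_V1_RATE_LIMIT_*` defaults. Several details are assumptions:

- The fixed-window algorithm.
- A 429 status with a `rate_limited` error code.
- `X-RateLimit-Reset` expressed as Unix epoch seconds.

Because `router.use(apiV1RateLimit)` runs before `requireAuth`, `req.auth` is not yet populated at that point, so the sketch falls back to the client IP. Limiting strictly per `client_id` would require running the limiter after token verification. Counters live in process memory, are never evicted, and are not shared across scheduler instances, so a multi-instance deployment would need a shared store such as Redis.

```javascript
// Sketch: fixed-window rate limiter (algorithm and 429 payload are assumptions).
function createRateLimiter({ windowMs = 300_000, max = 300, now = () => Date.now() } = {}) {
  const windows = new Map(); // key -> { count, resetAt }

  return function apiV1RateLimit(req, res, next) {
    // req.auth exists only if requireAuth already ran; otherwise use the client IP.
    const key = (req.auth && req.auth.clientId) || req.ip;
    const t = now();
    let w = windows.get(key);
    if (!w || t >= w.resetAt) {
      w = { count: 0, resetAt: t + windowMs };
      windows.set(key, w);
    }
    w.count += 1;
    res.setHeader('X-RateLimit-Limit', String(max));
    res.setHeader('X-RateLimit-Remaining', String(Math.max(0, max - w.count)));
    res.setHeader('X-RateLimit-Reset', String(Math.ceil(w.resetAt / 1000))); // epoch seconds (assumed)
    if (w.count > max) {
      return res.status(429).json({
        error: { code: 'rate_limited', message: 'Too many requests', request_id: req.requestId || null },
      });
    }
    return next();
  };
}

module.exports = { createRateLimiter };
```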
### 2.7 Redis data model changes
#### 2.7.1 New fields in the job record (JSON, key = `job:{id}`)
```jsonc
{
  // Existing fields
"job_id": "uuid",
"created_at": "...",
"updated_at": "...",
  "status": "ONNX | BIE | NEF | COMPLETED | FAILED", // note: the legacy Web UI still uses uppercase statuses
"stage": "onnx | bie | nef | null",
"progress": 0,
"parameters": { /* model_id, version, platform, options */ },
"output": { "bie_path": null, "nef_path": null },
"error": null,
  // New fields (Phase 1)
  "origin": "api | web", // created via the new API or the legacy Web UI
"user_id": "visionA-user-12345",
"tenant_id": "uuid-or-null",
"created_by_client_id": "kneron_converter_client_abc",
"input": {
    "filename": "model.onnx", // original multipart filename
    "object_key": "jobs/{job_id}/input/model.onnx", // key inside the Converter Bucket
"size_bytes": 204800000,
"ref_images_count": 0
},
"stage_timings": {
"onnx": { "started_at": "...", "completed_at": "..." },
"bie": { "started_at": "...", "completed_at": null },
"nef": null
},
  "stage_progress": 0, // 0-100, progress within the current stage (pushed by the Worker)
"expires_at": "2026-05-02T12:00:00Z",
"metadata": {}
}
```
**About `status` casing**: the existing Web UI reads uppercase values (`ONNX`, `COMPLETED`, etc.). When the new API returns a job, it must **map these to lowercase, semantic statuses** (`created`, `running`, `completed`, `failed`). Mapping table:
| Internal status | External `status` + `stage` |
|------------|----------------------|
| `ONNX` | `running` + stage=`onnx` |
| `BIE` | `running` + stage=`bie` |
| `NEF` | `running` + stage=`nef` |
| `COMPLETED` | `completed` + stage=`null` |
| `FAILED` | `failed` + stage=<stage at failure> |
**Note**: the existing Scheduler `advanceJob` sets the initial status to `ONNX` and has no separate "created" state. After the new API creates a job and before an onnx worker picks it up, the status is still `ONNX`. During that window the external status should be `created` (stage=onnx, with stage_timings.onnx.started_at still null). **In practice, use `stage_timings.onnx.started_at == null` to decide between `created` and `running`.**
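The mapping table and the `created` rule above can be expressed as one function. Below is a minimal sketch with hypothetical names. Where a failed job records its stage (`job.stage` or `job.error.stage`) is an assumption, since the record format above does not pin it down.

```javascript
// Sketch: internal (legacy, uppercase) status -> external status + stage.
const RUNNING_STAGES = new Map([['ONNX', 'onnx'], ['BIE', 'bie'], ['NEF', 'nef']]);

function toExternalState(job) {
  if (RUNNING_STAGES.has(job.status)) {
    const onnxStartedAt = job.stage_timings && job.stage_timings.onnx
      ? job.stage_timings.onnx.started_at
      : null;
    // Still ONNX but never started: report "created".
    if (job.status === 'ONNX' && onnxStartedAt == null) {
      return { status: 'created', stage: 'onnx' };
    }
    return { status: 'running', stage: RUNNING_STAGES.get(job.status) };
  }
  if (job.status === 'COMPLETED') return { status: 'completed', stage: null };
  if (job.status === 'FAILED') {
    // Assumed location of the stage at failure.
    const failedStage = job.stage || (job.error && job.error.stage) || null;
    return { status: 'failed', stage: failedStage };
  }
  throw new Error(`unknown internal status: ${job.status}`);
}

// Convenience wrapper for status filtering (e.g. listJobsByUser in §2.7.3).
const mapStatus = (job) => toExternalState(job).status;

module.exports = { toExternalState, mapStatus };
```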
#### 2.7.2 User index (new)
| Key | Type | Purpose | TTL |
|-----|------|------|-----|
| `user:{user_id}:jobs` | Set | All job_ids of the user (any status) | `EXPIRE 7d` on every write |
| `user:{user_id}:active_job` | String | The current in-progress job_id (= `created` or `running`) | Deleted when the job ends |
**When to write** (wrapped in MULTI for atomicity):
```
Create job:
  MULTI
    SET job:{id} {...}
    SADD user:{user_id}:jobs {id}
    EXPIRE user:{user_id}:jobs 604800
    SETNX user:{user_id}:active_job {id}   # NX is what makes this a per-user lock
  EXEC
  If SETNX returned 0 → conflict: roll back (DEL job:{id}, SREM user:{user_id}:jobs {id}) and return 409
  If SETNX returned 1 → success
On completion / failure:
  MULTI
    SET job:{id} {...}
    DEL user:{user_id}:active_job
  EXEC
  DEL only if the value of active_job equals the current job_id (enforce with WATCH or a Lua script)
```
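The completion path above says the lock may be deleted only if `active_job` still holds this job's ID. Below is a minimal sketch that does this together with the final job write in one Lua script, registered through ioredis's `defineCommand`, assuming the Redis client is ioredis. The command name, the key/argument order, and re-applying the 7-day TTL to `job:{id}` (following §4.2) are assumptions.

```javascript
// Sketch: write the final job record and release the per-user lock atomically.
const RELEASE_ACTIVE_JOB_LUA = `
-- KEYS[1] = user:{user_id}:active_job
-- KEYS[2] = job:{job_id}
-- ARGV[1] = job_id, ARGV[2] = job_json, ARGV[3] = ttl_seconds
redis.call('SET', KEYS[2], ARGV[2], 'EX', ARGV[3])
if redis.call('GET', KEYS[1]) == ARGV[1] then
  redis.call('DEL', KEYS[1])
  return 1
end
return 0
`;

// Registers the script so it can be called as redis.releaseActiveJob(...).
function registerReleaseActiveJob(redis) {
  redis.defineCommand('releaseActiveJob', { numberOfKeys: 2, lua: RELEASE_ACTIVE_JOB_LUA });
}

// Returns true only if this job still held the user's lock and released it.
async function finishJob(redis, job, ttlSeconds = 7 * 24 * 3600) {
  const released = await redis.releaseActiveJob(
    `user:${job.user_id}:active_job`,
    `job:${job.job_id}`,
    job.job_id,
    JSON.stringify(job),
    String(ttlSeconds),
  );
  return released === 1;
}

module.exports = { registerReleaseActiveJob, finishJob };
```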
**Lua script (recommended)**: guarantees that "check + set active + write job" is atomic.
```lua
-- claim_active_job.lua
-- KEYS[1] = user:{user_id}:active_job
-- KEYS[2] = job:{job_id}
-- KEYS[3] = user:{user_id}:jobs
-- ARGV[1] = job_id
-- ARGV[2] = job_json
-- ARGV[3] = ttl_seconds
if redis.call('EXISTS', KEYS[1]) == 1 then
return {'conflict', redis.call('GET', KEYS[1])}
end
redis.call('SET', KEYS[1], ARGV[1])
redis.call('SET', KEYS[2], ARGV[2])
redis.call('SADD', KEYS[3], ARGV[1])
redis.call('EXPIRE', KEYS[3], tonumber(ARGV[3]))
return {'ok'}
```
#### 2.7.3 Avoiding `KEYS *`
**Wrong approach** (used by the existing code, but not by the new API):
```javascript
const keys = await redis.keys('job:*'); // O(N), blocks Redis
```
**List query for the new API**:
```javascript
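// src/services/jobService.js (excerpt). `redis` is the ioredis client from src/redis.js;
// mapStatus(job) returns the external status per the §2.7.1 mapping table.
async function listJobsByUser(userId, { status = 'all', limit = 20, offset = 0, createdAfter } = {}) {
  const ids = await redis.smembers(`user:${userId}:jobs`);
  if (ids.length === 0) return { total: 0, items: [] };
  const pipeline = redis.pipeline();
  for (const id of ids) pipeline.get(`job:${id}`);
  const results = await pipeline.exec();
  // Skip per-command errors and ids whose job:{id} has already expired (GET returned null)
  let jobs = results
    .filter(([err, raw]) => !err && raw)
    .map(([, raw]) => JSON.parse(raw));
  // Status filter
  if (status === 'in_progress') {
    jobs = jobs.filter(j => ['created', 'running'].includes(mapStatus(j)));
  } else if (status !== 'all') {
    jobs = jobs.filter(j => mapStatus(j) === status);
  }
  // created_after filter (§1.4.4)
  if (createdAfter) {
    const since = new Date(createdAfter);
    jobs = jobs.filter(j => new Date(j.created_at) >= since);
  }
  // Newest first, then paginate
  jobs.sort((a, b) => new Date(b.created_at) - new Date(a.created_at));
  return { total: jobs.length, items: jobs.slice(offset, offset + limit) };
}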
```
### 2.8 POST /api/v1/jobs flow (T4)
```
1. requireAuth('converter:job.write'): middleware verifies the token (placed before multer so no large upload is parsed for an unauthenticated request)
2. multer middleware parses the multipart body (memoryStorage, fileSize=500MB)
   - req.files.model[0] (required)
   - req.files['ref_images[]'] / req.files.ref_images (optional, maxCount=100)
   - req.body.user_id / model_id / version / platform / enable_*
   ├── LIMIT_FILE_SIZE → 413 file_too_large
   ├── other multer errors → 400 invalid_multipart
   └── ok → continue
3. Validate fields (joi / zod / hand-written):
   - user_id, model_id, version, platform are required
   - convert enable_* to booleans
   - model file extension whitelist
   ├── failure → 400 validation_error (details.field)
4. Check STORAGE_BACKEND === 'minio'
   ├── otherwise → 500 misconfiguration
5. Generate job_id (UUIDv4)
6. Try the claim_active_job Lua script (see §2.7.2)
   ├── conflict → 409 user_has_active_job + details of the current active job
   └── ok → continue
7. Synchronously write to MinIO (Converter Bucket)
   - jobs/{job_id}/input/{sanitized_model_filename} ← req.files.model[0].buffer
   - jobs/{job_id}/ref_images/{index}_{sanitized_filename} ← each ref_image.buffer
   - failure → roll back (DEL job:{id}, DEL user:{user_id}:active_job, SREM user:{user_id}:jobs {id}) and return 502 `storage_unavailable`
8. Update the job record (add input.object_key, size_bytes, ref_images_count, stage_timings.onnx.started_at=now)
9. enqueueStage('onnx', job)
10. Return 201 + { job_id, status: 'created', ... }
```
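**Key points**:
- The auth middleware must run before multer, so a 500MB body is never parsed for an unauthenticated request
- If writing any file to MinIO fails in step 7, roll back; otherwise Redis would hold a job whose files are missing from MinIO
- Write to MinIO only after `claim_active_job`. A lock conflict then never leaves MinIO objects to clean up, and a MinIO failure needs only the Redis rollback. Order: validate → lock → write files → enqueue
**Latency budget**: SLA p95 < 5 s for a 200MB model (at 50MB/s, about 4 s of multipart upload plus 1 s of MinIO write). A 500MB file takes about 12 s (see design-doc §6.1).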
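Steps 4-10 can be sketched as one function with every I/O dependency injected. This makes the validate → lock → write → enqueue ordering and the step 7 rollback easy to test. The sketch makes these assumptions:

- All `deps.*` functions are hypothetical stand-ins for the Redis, MinIO, and queue helpers.
- `files.model` is a single multer file object (`originalname`, `buffer`, `size`).
- MinIO objects left over from a failed write are cleaned up by the 7-day bucket lifecycle.

```javascript
const { randomUUID } = require('node:crypto');

// Sketch of steps 4-10; deps.* are hypothetical injected helpers.
async function createJob({ auth, fields, files }, deps) {
  // Step 4: API-created jobs require the MinIO backend
  if (deps.storageBackend !== 'minio') return { status: 500, code: 'misconfiguration' };

  // Step 5
  const jobId = randomUUID();
  const now = deps.now();
  const job = {
    job_id: jobId, status: 'ONNX', stage: 'onnx', origin: 'api',
    user_id: fields.user_id, created_by_client_id: auth.clientId, tenant_id: auth.tenantId,
    created_at: now, updated_at: now,
  };

  // Step 6: take the per-user lock before touching storage
  const claim = await deps.claimActiveJob(job);
  if (!claim.ok) return { status: 409, code: 'user_has_active_job', activeJobId: claim.activeJobId };

  // Step 7: write the inputs; on any failure undo the Redis side (job, Set member, lock)
  const inputKey = `jobs/${jobId}/input/${deps.sanitize(files.model.originalname)}`;
  const refImages = files.refImages || [];
  try {
    await deps.putObject(inputKey, files.model.buffer);
    for (const [i, img] of refImages.entries()) {
      await deps.putObject(`jobs/${jobId}/ref_images/${i}_${deps.sanitize(img.originalname)}`, img.buffer);
    }
  } catch {
    await deps.rollbackClaim(job);
    return { status: 502, code: 'storage_unavailable' };
  }

  // Step 8: record the input details and the ONNX start time
  Object.assign(job, {
    input: { filename: files.model.originalname, object_key: inputKey, size_bytes: files.model.size, ref_images_count: refImages.length },
    stage_timings: { onnx: { started_at: now, completed_at: null }, bie: null, nef: null },
  });
  await deps.saveJob(job);

  // Steps 9-10
  await deps.enqueueStage('onnx', job);
  return {
    status: 201,
    body: { job_id: jobId, status: 'created', stage: 'onnx', progress: 0, created_at: now, user_id: job.user_id },
  };
}

module.exports = { createJob };
```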
### 2.9 GET /api/v1/jobs/:id flow (T5)
```
1. requireAuth('converter:job.read')
2. Read job:{id}
3. If it does not exist, return 404 job_not_found
4. If job.created_by_client_id !== req.auth.clientId → 404 (do not leak existence)
5. Compute ETag = hash(job.updated_at); if If-None-Match matches → 304
6. Map the internal status → external status + stage
7. Return 200 + the serialized response
```
### 2.10 promote flow (T6)
```
1. requireAuth('converter:job.write')
2. Validate the body (targets format)
3. Read job:{id} (+ client isolation check)
4. If status != 'completed' → 409 job_not_ready_for_promote
5. For each target:
   a. Read the result file from the Converter Bucket (stream)
   b. faa.putFile(target.target_object_key, stream, ...)
   c. Record promoted_at / etag / size
6. All succeeded → 200 + promoted[]
7. Partial failure → 502, with details marking which targets succeeded / failed
```
**Idempotency**: promote is idempotent (the File Access Agent PUT overwrites), so it can be retried.
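Steps 5-7 of the promote flow can be sketched as follows, assuming the `status === 'completed'` check already passed. The injected helpers are hypothetical:

- `deps.openResult` opens a MinIO stream for an output key.
- `deps.putFile` is the §2.5 PUT, ideally wrapped with the §6.3 retry policy.
- `deps.now` supplies the timestamp.

Reading an `etag` field from the File Access Agent response is an assumption about its response body. Targets are promoted one at a time so only one MinIO → File Access Agent stream is open at once. For simplicity, every per-target failure is reported as 502, even when the cause was an auth failure that §1.4.5 maps to 503.

```javascript
// Sketch of promote steps 5-7; deps.* are hypothetical injected helpers.
async function promoteAll(job, targets, deps) {
  const outputs = job.result_object_keys || {};
  // Reject up front if a requested stage produced no output (409 source_not_available).
  const missing = targets.filter((t) => !outputs[t.source]).map((t) => t.source);
  if (missing.length > 0) {
    return {
      status: 409,
      body: { error: { code: 'source_not_available', message: 'Requested stage output does not exist', details: { sources: missing } } },
    };
  }

  const promoted = [];
  const failed = [];
  for (const target of targets) {
    try {
      const { stream, contentLength } = await deps.openResult(outputs[target.source]);
      const res = await deps.putFile(target.target_object_key, stream, {
        contentType: 'application/octet-stream',
        contentLength,
      });
      promoted.push({
        source: target.source,
        target_object_key: target.target_object_key,
        size_bytes: contentLength,
        file_access_agent_etag: res && res.etag !== undefined ? res.etag : null,
        promoted_at: deps.now(),
      });
    } catch (err) {
      failed.push({ source: target.source, target_object_key: target.target_object_key, reason: err.apiCode || 'file_gateway_unavailable' });
    }
  }

  if (failed.length === 0) return { status: 200, body: { job_id: job.job_id, promoted } };
  // Partial (or total) failure: report which targets succeeded and which did not.
  return {
    status: 502,
    body: { error: { code: 'file_gateway_unavailable', message: 'Some targets could not be promoted', details: { promoted, failed } } },
  };
}

module.exports = { promoteAll };
```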
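### 2.11 Done listener changes
The existing `listenDoneQueue` calls `advanceJob` when it receives a worker done event. New changes:
- `advanceJob` also updates `stage_timings` whenever the status changes
- On completion, automatically `DEL user:{user_id}:active_job` (atomicity guaranteed by a Lua script)
- Same on failure
### 2.12 /health upgrade
The existing `/health` checks only Redis; the new version adds:
- Member Center reachability (`GET /.well-known/openid-configuration`, checked in the background every 30 s, result cached)
- File Access Agent reachability (`GET /health`, same approach)
- Respond 503 if any critical dependency is unhealthy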
---
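## 3. Worker changes
**Phase 1 decision: Workers stay largely unchanged.**
The existing `services/workers/s3_storage.py` already reads from and writes to MinIO. A Worker only needs to see input under the `jobs/{job_id}/input/` path to start; it does not need to know that File Access Agent exists.
The only changes needed:
1. **stage_progress reporting** (optional): if a Worker can report progress within a stage (e.g. 30%, 60%), it can push it to the Scheduler through a new Redis Stream `queue:progress`. Phase 1 may report only 0 or 100 and refine this later.
2. **`stage_timings` started_at**: when a Worker picks up a task, it could first emit a `stage_started` event through the existing done-event channel. A simpler option is for the Scheduler to write `stage_timings.{stage}.started_at = now` in `enqueueStage`. **The latter is recommended** (no Worker changes).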
---
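## 4. Data model and indexes
### 4.1 Why not PostgreSQL
- Phase 1's data model is simple (a job state machine plus a user index, all key-value)
- The existing philosophy is "a crash is a reset"; PostgreSQL would introduce the opposite, durable-persistence semantics and only add complexity
- Redis Sets as user indexes easily handle the expected volume (< 10 jobs per user per 7 days)
- If cross-crash recovery or multi-instance HA is needed later, re-evaluate PostgreSQL
### 4.2 Redis memory estimate
- Each job record is about 2-4 KB (including stage_timings)
- Each element of a user index Set is < 40 bytes
- 1000 concurrent users × 10 jobs = 10k job records ≈ 20-40 MB (10k × 2-4 KB), easily handled by Redis
- The Converter Bucket lifecycle is 7 days and Redis keys also use a 7-day TTL, so memory usage has a bounded upper limit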
---
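## 5. OAuth integration details
### 5.1 Token verification (as resource server)
| Claim | Check |
|-------|------|
| `iss` | Equals `MC_ISSUER` |
| `aud` | Contains `KNERON_CONVERTER_AUDIENCE` (array or string supported) |
| `exp` | Not expired (60 s clock skew) |
| `nbf` | If present, already reached |
| `scope` | Space-separated; contains the scope the endpoint requires |
| `client_id` | Must be present (used for record-keeping) |
| `tenant_id` | If present, equals `CONVERTER_TENANT_ID` (Phase 1 may be warn-only) |
**JWKS cache**: built into `jose.createRemoteJWKSet` (TTL 10 min, 30 s cooldown)
### 5.2 Converter as an OAuth client
- `client_credentials` grant
- Phase 1 needs only one scope: `files:upload.write` (`aud=file_access_api`), used when calling `promote`
- Cache key = scope (if scopes are added later, caching is automatically per scope)
- Proactively refresh when 60 s remain before `expires_in` runs out
- On failure, catch and return 503 `auth_service_unavailable`
### 5.3 Impact of Member Center being offline
| Scenario | Impact | Mitigation |
|------|------|------|
| JWKS fetch fails | Tokens signed with a new `kid` cannot be verified | Keys for `kid`s already in the cache still work, so tokens signed with them pass; tokens signed with a new key fail |
| Token endpoint fails | Converter cannot obtain new tokens for File Access Agent (promote) | No impact while the cached token is valid; once it expires, promote fails with 503. `POST /api/v1/jobs` and job queries are unaffected (they only verify other clients' tokens and never fetch Converter's own) |
| Discovery fails | Health check reports unhealthy | Restarting via K8s / Docker does not fix it; manual intervention is required |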
---
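## 6. File Access Agent integration
### 6.1 Object key naming conventions (recommended)
| Purpose | Recommended name | Notes |
|------|---------|------|
| Promoted results in the model library (File Access Agent) | `visionA/models/{user_id}/{model_id}/v{version}/{filename}` | VisionA decides target_object_key; Converter does not enforce a naming rule |
| Raw model input (inside the Converter Bucket) | `jobs/{job_id}/input/{filename}` | Managed by Converter; written after the multipart upload |
| Reference images (inside the Converter Bucket) | `jobs/{job_id}/ref_images/{index}_{filename}` | Managed by Converter |
| Result files (inside the Converter Bucket) | `jobs/{job_id}/output/{filename}` | Managed by Converter |
**Conventions**:
- `target_object_key` (the promote target): naming rules are defined by VisionA; Converter only does a basic sanity check (no `..`, no backslash).
- Object keys inside the Converter Bucket are controlled by Converter, invisible externally, and need not align with anything else.
- Phase 1 does not involve object keys for raw models on File Access Agent. That scenario no longer exists, because raw models are uploaded to Converter directly via multipart.
### 6.2 HTTP headers at a glance
| Request | Headers |
|---------|---------|
| PUT /files/{key} (promote) | `Authorization: Bearer <S2S JWT with files:upload.write>`, `Content-Type`, `Content-Length` |
**Note**: in Phase 1, Converter sends only `PUT` requests to File Access Agent (to promote result files); no HEAD / GET is needed.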
### 6.3 Failure retry strategy (PUT /files/{key} only)
| Error | Converter behavior |
|------|--------------|
| 4xx (client error) | No retry; return the corresponding 4xx to visionA-backend (e.g. invalid target_object_key) |
| 401 (token invalid) | Force `oauthClient.invalidate('files:upload.write')`, fetch a new token, and retry once; if it still fails, 503 `auth_service_unavailable` |
| 5xx (server error) | Retry up to 2 times with exponential backoff (500 ms / 2000 ms); if all attempts fail, 502 `file_gateway_unavailable` |
| Network timeout | Treated as 5xx |
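The retry table above can be sketched as a wrapper around a single PUT attempt. A stream body is consumed by the first attempt, so each retry has to call `attemptPut()` again, re-opening the MinIO stream instead of reusing the old one. `putWithRetry`, `HttpStatusError`, and the `apiCode` / `httpStatus` properties added to thrown errors are hypothetical. `sleep` is injected so the backoff can be tested without waiting.

```javascript
// Hypothetical error type carrying the File Access Agent HTTP status.
class HttpStatusError extends Error {
  constructor(status) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

// attemptPut(): performs one PUT (opening a fresh MinIO stream each time).
async function putWithRetry(attemptPut, { invalidateToken, sleep, backoffMs = [500, 2000] }) {
  let tokenRefreshed = false;
  let retries = 0;
  for (;;) {
    try {
      return await attemptPut();
    } catch (err) {
      const status = err.status; // undefined for network errors and timeouts
      if (status === 401 && !tokenRefreshed) {
        invalidateToken(); // e.g. oauthClient.invalidate('files:upload.write')
        tokenRefreshed = true;
        continue; // one immediate retry with a fresh token
      }
      if (status === 401) {
        throw Object.assign(err, { apiCode: 'auth_service_unavailable', httpStatus: 503 });
      }
      if (status !== undefined && status >= 400 && status < 500) {
        throw Object.assign(err, { httpStatus: status }); // other 4xx: no retry
      }
      // 5xx, network errors and timeouts are retried with backoff
      if (retries >= backoffMs.length) {
        throw Object.assign(err, { apiCode: 'file_gateway_unavailable', httpStatus: 502 });
      }
      await sleep(backoffMs[retries]);
      retries += 1;
    }
  }
}

module.exports = { putWithRetry, HttpStatusError };
```

In production, `sleep` could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`.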
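### 6.4 Timeout
- PUT /files/{key}: should scale with file size. The default is 300 s (500 MB at a worst-case 5 MB/s takes about 100 s, leaving roughly 3x headroom), controlled by the `PROMOTE_TIMEOUT_MS` env var.
### 6.5 Large-file streaming
- Use Node 18's native `fetch` + `body: ReadableStream`
- The `duplex: 'half'` option is required (Node 18.17+)
- Pipe the MinIO GetObjectCommand Body (stream) directly into the fetch PUT body
- No in-memory buffering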
---
## 7. 部署架構
### 7.1 Nginx 設定(雙 vhost
```nginx
# /etc/nginx/conf.d/converter.conf
# Upstream
upstream scheduler_upstream {
server scheduler:4000;
keepalive 32;
}
# Public vhost (internet-facing, port 443)
server {
listen 443 ssl http2;
server_name converter.innovedus.com;
ssl_certificate /etc/nginx/certs/fullchain.pem;
ssl_certificate_key /etc/nginx/certs/privkey.pem;
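    # Proxy only /api/v1/*
    location /api/v1/ {
        proxy_pass http://scheduler_upstream;
        proxy_http_version 1.1;          # needed for upstream keepalive and unbuffered request bodies
        proxy_set_header Connection "";  # clear Connection so upstream keepalive connections are reused
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_request_buffering off;     # stream large uploads to the upstream
        proxy_read_timeout 300s;
        client_max_body_size 600M;       # slightly above the 500MB multipart limit (raw model upload via POST /api/v1/jobs)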
}
    # /health may be public
location = /health {
proxy_pass http://scheduler_upstream;
}
    # All other paths return 404
location / {
return 404 '{"error":{"code":"not_found","message":"Not found"}}';
default_type application/json;
}
}
# Internal vhost (internal network only; port 80 bound to the internal interface)
server {
    listen 10.0.0.1:80;   # internal IP, not exposed publicly
    server_name converter-internal.innovedus.com;
    # Paths used by the Web UI / legacy tools
location /jobs {
proxy_pass http://scheduler_upstream;
proxy_http_version 1.1;
proxy_set_header Host $host;
        proxy_buffering off;   # required for SSE
}
location /queues/stats {
proxy_pass http://scheduler_upstream;
}
    # Web UI static assets
location / {
proxy_pass http://web:3000;
}
}
```
### 7.2 docker-compose.yml changes
```yaml
services:
scheduler:
environment:
      # Existing
- PORT=4000
- REDIS_URL=redis://redis:6379
- STORAGE_BACKEND=minio
      # ... MinIO-related settings
      # New (Phase 1)
- MC_ISSUER=${MC_ISSUER}
- MC_JWKS_URL=${MC_JWKS_URL}
- MC_TOKEN_URL=${MC_TOKEN_URL}
- KNERON_CONVERTER_AUDIENCE=${KNERON_CONVERTER_AUDIENCE:-kneron_converter_api}
- KNERON_CONVERTER_CLIENT_ID=${KNERON_CONVERTER_CLIENT_ID}
- KNERON_CONVERTER_CLIENT_SECRET=${KNERON_CONVERTER_CLIENT_SECRET}
- FILE_ACCESS_AGENT_BASE_URL=${FILE_ACCESS_AGENT_BASE_URL}
- FILE_ACCESS_AGENT_AUDIENCE=${FILE_ACCESS_AGENT_AUDIENCE:-file_access_api}
- CONVERTER_TENANT_ID=${CONVERTER_TENANT_ID:-}
- CONVERTER_SCOPE_WRITE=${CONVERTER_SCOPE_WRITE:-converter:job.write}
- CONVERTER_SCOPE_READ=${CONVERTER_SCOPE_READ:-converter:job.read}
- API_V1_RATE_LIMIT_WINDOW_MS=${API_V1_RATE_LIMIT_WINDOW_MS:-300000}
- API_V1_RATE_LIMIT_MAX=${API_V1_RATE_LIMIT_MAX:-300}
- NODE_ENV=${NODE_ENV:-development}
```
### 7.3 `.env.example` additions
```bash
# === OAuth (Member Center) ===
MC_ISSUER=https://auth.innovedus.com
MC_JWKS_URL=https://auth.innovedus.com/.well-known/jwks
MC_TOKEN_URL=https://auth.innovedus.com/oauth/token
# === Converter identity (Resource Server) ===
KNERON_CONVERTER_AUDIENCE=kneron_converter_api
# === Converter identity (OAuth client, used to call File Access Agent) ===
KNERON_CONVERTER_CLIENT_ID=kneron_converter
KNERON_CONVERTER_CLIENT_SECRET=change-me
CONVERTER_TENANT_ID=
# === File Access Agent ===
FILE_ACCESS_AGENT_BASE_URL=https://files.nas.internal
FILE_ACCESS_AGENT_AUDIENCE=file_access_api
# === Scope names (configurable in case the Member Center owner requires different names) ===
CONVERTER_SCOPE_WRITE=converter:job.write
CONVERTER_SCOPE_READ=converter:job.read
# === Rate Limit ===
API_V1_RATE_LIMIT_WINDOW_MS=300000
API_V1_RATE_LIMIT_MAX=300
```
---
## 8. Scope 設計總表(給跨團隊對齊用)
### 8.1 Converter 作為 Resource Server接收端
| Scope | 用途 | 被誰取 |
|-------|------|--------|
| `converter:job.write` | jobpromote | visionA-backend |
| `converter:job.read` | job | visionA-backend |
| 未來`converter:admin.read` | client job | 內部監控用 |
### 8.2 Converter 作為 OAuth Client發起端
| Scope | 用途 | 在哪裡用 |
|-------|------|---------|
| `files:upload.write` | PUT File Access Agent | promote 結果檔到 NAS 模型庫 |
**Phase 1 僅需上述一個 scope。** Converter 完全不從 File Access Agent 讀取任何東西原始模型已改為 visionA-backend 直接 multipart 上傳 Converter因此不需要 `files:download.read` / `files:metadata.read`
### 8.3 Action items for Member Center (cross-team coordination; tracks the open issues in progress.md)
1. Add the resource audience `kneron_converter_api`
2. Add the OAuth client `kneron_converter` (used by Converter itself, grant=client_credentials)
3. Grant the `converter:job.write` and `converter:job.read` scopes to the visionA-backend client
4. Grant the `files:upload.write` scope to the `kneron_converter` client (**this is the only one, used for promote**)
5. Confirm whether the `tenant_id` claim is available in S2S tokens
6. (Phase 2) Implement `POST /file-access/download-tokens`
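Assume the JWT signature, `iss`, `aud` and `exp` have already been verified (T1). The remaining authorization checks from this section then reduce to comparing claims. Below is a minimal sketch with two assumptions. First, the token carries a space-delimited `scope` claim (the RFC 8693 / RFC 9068 convention); an array form is tolerated as well. Second, there is a `tenant_id` claim, whose presence is still open (TBD-1). `checkClaims` is a hypothetical helper name.

```javascript
// Hypothetical post-verification claims check for /api/v1/* routes.
// Assumes signature, iss, aud and exp were already validated upstream.
function checkClaims(claims, { requiredScope, expectedTenantId = "" }) {
  // `scope` is usually a space-delimited string; tolerate an array as well.
  const raw = claims.scope ?? "";
  const granted = Array.isArray(raw) ? raw : String(raw).split(" ").filter(Boolean);
  if (!granted.includes(requiredScope)) {
    return { ok: false, status: 403, code: "insufficient_scope" };
  }
  // An empty CONVERTER_TENANT_ID disables the tenant check (§9).
  if (expectedTenantId && claims.tenant_id !== expectedTenantId) {
    return { ok: false, status: 403, code: "tenant_mismatch" };
  }
  return { ok: true };
}
```

A write route would call `checkClaims(claims, { requiredScope: process.env.CONVERTER_SCOPE_WRITE, expectedTenantId: process.env.CONVERTER_TENANT_ID })`. Reading the scope names from configuration keeps them renameable if TBD-2 changes them.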
---
## 9. Configuration management (complete environment variable list)
| Variable | Required | Default | Description |
|------|------|------|------|
| `PORT` | No | `4000` | Scheduler listen port |
| `NODE_ENV` | No | `development` | Node environment |
| `REDIS_URL` | No | `redis://redis:6379` | Redis connection |
| `JOB_DATA_DIR` | No | `/data/jobs` | Job data path in local mode |
| `FRONTEND_URL` | No | `http://localhost:3000` | Allowed CORS origin |
| `STORAGE_BACKEND` | No | `local` | `local` / `minio` |
| `MINIO_*` | When `STORAGE_BACKEND=minio` | - | Existing MinIO parameters |
| **New in Phase 1** | | | |
| `MC_ISSUER` | Yes | - | Member Center issuer URL |
| `MC_JWKS_URL` | Yes | - | JWKS endpoint |
| `MC_TOKEN_URL` | Yes | - | Token endpoint |
| `KNERON_CONVERTER_AUDIENCE` | No | `kneron_converter_api` | Accepted `aud` |
| `KNERON_CONVERTER_CLIENT_ID` | Yes | - | Client ID of Converter acting as an OAuth client |
| `KNERON_CONVERTER_CLIENT_SECRET` | Yes | - | Must never be committed to Git |
| `FILE_ACCESS_AGENT_BASE_URL` | Yes | - | File Access Agent URL (used by promote) |
| `FILE_ACCESS_AGENT_AUDIENCE` | No | `file_access_api` | File Access Agent `aud` (used by promote) |
| `CONVERTER_TENANT_ID` | No | `""` | If empty, the tenant check is skipped |
| `CONVERTER_SCOPE_WRITE` | No | `converter:job.write` | Overridable |
| `CONVERTER_SCOPE_READ` | No | `converter:job.read` | Overridable |
| `API_V1_RATE_LIMIT_WINDOW_MS` | No | `300000` | Rate-limit window (5 min) |
| `API_V1_RATE_LIMIT_MAX` | No | `300` | Max requests per window, per `client_id` |
| `MULTIPART_MODEL_MAX_BYTES` | No | `524288000` | Model file size limit for `POST /api/v1/jobs` (500 MB, overridable) |
| `MULTIPART_REF_IMAGES_MAX_COUNT` | No | `100` | Max number of `ref_images` for `POST /api/v1/jobs` |
| `PROMOTE_TIMEOUT_MS` | No | `300000` | Per-file promote timeout |
**Secret management**: `KNERON_CONVERTER_CLIENT_SECRET` must never be committed to Git. Use `.env` in dev; in prod, inject it via Docker secrets or K8s secrets.
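The table above can be enforced at startup, so a missing secret fails fast instead of surfacing as a 503 on the first promote. This minimal sketch assumes that the variables without a default are the required ones. `loadConfig` is a hypothetical helper, and its key lists mirror the Phase 1 rows of the table.

```javascript
// Hypothetical startup config loader for the Phase 1 variables in §9.
const REQUIRED = [
  "MC_ISSUER", "MC_JWKS_URL", "MC_TOKEN_URL",
  "KNERON_CONVERTER_CLIENT_ID", "KNERON_CONVERTER_CLIENT_SECRET",
  "FILE_ACCESS_AGENT_BASE_URL",
];

const DEFAULTS = {
  KNERON_CONVERTER_AUDIENCE: "kneron_converter_api",
  FILE_ACCESS_AGENT_AUDIENCE: "file_access_api",
  CONVERTER_TENANT_ID: "",
  CONVERTER_SCOPE_WRITE: "converter:job.write",
  CONVERTER_SCOPE_READ: "converter:job.read",
  API_V1_RATE_LIMIT_WINDOW_MS: "300000",
  API_V1_RATE_LIMIT_MAX: "300",
  MULTIPART_MODEL_MAX_BYTES: "524288000",
  MULTIPART_REF_IMAGES_MAX_COUNT: "100",
  PROMOTE_TIMEOUT_MS: "300000",
};

const NUMERIC = new Set([
  "API_V1_RATE_LIMIT_WINDOW_MS", "API_V1_RATE_LIMIT_MAX",
  "MULTIPART_MODEL_MAX_BYTES", "MULTIPART_REF_IMAGES_MAX_COUNT", "PROMOTE_TIMEOUT_MS",
]);

function loadConfig(env = process.env) {
  const missing = REQUIRED.filter((k) => !env[k]);
  if (missing.length > 0) {
    // List names only; never echo secret values into logs.
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const cfg = {};
  for (const key of [...REQUIRED, ...Object.keys(DEFAULTS)]) {
    // docker-compose's ${VAR:-} yields "", so empty strings also fall back to the default.
    const raw = env[key] || DEFAULTS[key];
    if (NUMERIC.has(key)) {
      const n = Number(raw);
      if (!Number.isInteger(n) || n <= 0) throw new Error(`${key} must be a positive integer, got "${raw}"`);
      cfg[key] = n;
    } else {
      cfg[key] = raw;
    }
  }
  return cfg;
}
```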
---
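## 10. Backward compatibility and migration
### 10.1 Existing path behavior (unchanged)
| Path | Phase 1 behavior |
|------|------------|
| `POST /jobs` (multipart) | **Unchanged**: continues to accept Web UI uploads |
| `GET /jobs/:id` | **Unchanged**: no filtering by origin; both `origin=web` and `origin=api` jobs are visible (the internal vhost has no authorization, so it cannot tell them apart) |
| `GET /jobs/:id/events` (SSE) | **Unchanged**: still used by the Web UI |
| `GET /jobs/:id/download/:filename` | **Unchanged**: the Web UI downloads results here |
| `GET /jobs` | **Unchanged**: lists all jobs |
| `GET /health`, `GET /queues/stats` | **Unchanged** |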
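### 10.2 When to migrate the Web UI
**Out of scope for this work.** If we later decide to bring the Web UI under OAuth as well, that is a separate L-size task. It would require designing the Member Center login flow, token refresh, and the related UX details.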
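### 10.3 `STORAGE_BACKEND=local` mode
The existing local mode (Shared Volume) keeps working, but the API requires `STORAGE_BACKEND=minio` for two reasons:
- the buffer received via multipart must be written to a bucket so the Worker can read it;
- Shared Volume paths across containers are complicated and do not suit a future cross-host deployment.
**Implementation check**: at the start of request handling, `POST /api/v1/jobs` checks `STORAGE_BACKEND === 'minio'`. If the check fails, it returns 500 `misconfiguration`.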
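The implementation check above could be a small Express-style middleware mounted in front of the route. This is a sketch only. The error body shape `{ error: { code, message } }` is an assumption standing in for the unified error format from T3, and `requireMinioBackend` is a hypothetical name.

```javascript
// Hypothetical guard mounted in front of POST /api/v1/jobs.
// The API path requires MinIO because multipart buffers must be readable by the Worker.
function requireMinioBackend(getBackend = () => process.env.STORAGE_BACKEND || "local") {
  return function (req, res, next) {
    if (getBackend() !== "minio") {
      // Error body shape is assumed; align it with the unified format from T3.
      return res.status(500).json({
        error: { code: "misconfiguration", message: "POST /api/v1/jobs requires STORAGE_BACKEND=minio" },
      });
    }
    return next();
  };
}
```

Usage might look like `router.post("/jobs", requireMinioBackend(), upload, createJobHandler)`, where `upload` and `createJobHandler` are placeholders. The guard runs before multer, so a misconfigured instance never buffers a 500 MB upload.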
---
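## 11. Test strategy
### 11.1 Unit test (Jest / Mocha)
- `auth/jwks.js`: mock JWKS responses; test expired tokens, bad signatures, wrong `aud`, insufficient scope
- `auth/oauthClient.js`: mock the token endpoint; test cache hits, re-fetch on expiry, failure handling
- `fileAccessAgent/client.js`: mock fetch; test PUT 5xx → retry, 401 → invalidate and retry, timeout
- `services/jobService.js`: concurrency simulation for `claim_active_job` (two requests with the same `user_id` creating jobs simultaneously)
- `routes/v1/jobs.js` multipart validation (mock `multer`): test a `model` over 500MB, non-numeric `model_id`, `platform` not in the enum, `user_id` containing `/`
- Response schema mapping (internal status → external status + stage)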
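The field-level rules in the multipart validation bullet can live in a pure function that runs after multer has parsed the form. That keeps them testable without `multer`. This minimal sketch assumes `multer().fields(...)`, so `files.model` is an array. The `PLATFORMS` set is a hypothetical placeholder for the real platform enum, and the returned shape is also an assumption.

```javascript
// Hypothetical validation of parsed multipart fields for POST /api/v1/jobs.
// PLATFORMS is a placeholder; use the authoritative platform enum instead.
const PLATFORMS = new Set(["520", "720"]);
const FLAG_FIELDS = ["enable_evaluate", "enable_sim_fp", "enable_sim_fixed", "enable_sim_hw"];

function validateJobForm(fields, files) {
  // A missing model file is a multipart-level problem, not a field error (§14).
  if (!files || !files.model || files.model.length === 0) {
    return { ok: false, code: "invalid_multipart", details: ["model file is required"] };
  }
  const errors = [];
  for (const name of ["user_id", "model_id", "version", "platform"]) {
    if (!fields[name]) errors.push(`${name} is required`);
  }
  if (fields.user_id && fields.user_id.includes("/")) errors.push("user_id must not contain '/'");
  if (fields.model_id && !/^\d+$/.test(fields.model_id)) errors.push("model_id must be numeric");
  if (fields.platform && !PLATFORMS.has(fields.platform)) errors.push("platform is not supported");
  for (const flag of FLAG_FIELDS) {
    // Flags are optional; when present they must be the literal strings "true" or "false".
    if (fields[flag] !== undefined && fields[flag] !== "true" && fields[flag] !== "false") {
      errors.push(`${flag} must be "true" or "false"`);
    }
  }
  return errors.length ? { ok: false, code: "validation_error", details: errors } : { ok: true };
}
```

Collecting every field error, rather than stopping at the first, lets visionA-backend show all problems from one rejected upload.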
### 11.2 Integration test
- **Member Center mock**: use `wiremock` or a hand-written Express mock to simulate the JWKS and token endpoints
- **File Access Agent mock**: simulate PUT success / failure responses (promote)
- **Redis**: use a real Redis (docker-compose test environment)
- **Multipart upload**: use `supertest` + `attach('model', buffer, 'model.onnx')` to test the real multipart flow (small file, medium file, boundary files of 499MB / 501MB)
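Under 11.1, the `fileAccessAgent/client.js` behaviour that the FAA mock exercises is: PUT 5xx → retry, 401 → invalidate the token and retry, and timeout. It might be structured as below. This is a minimal sketch under stated assumptions:
- the `PUT {base}/files/{key}` URL shape, the three-attempt limit and the backoff schedule are hypothetical;
- the timeout is applied per attempt;
- the body is a `Buffer`, because a stream consumed by a failed attempt cannot be replayed (a streaming version would reopen the MinIO object on every attempt);
- `tokens` is any object exposing `getToken()` / `invalidate()`.

```javascript
// Hypothetical File Access Agent upload used by promote, with bounded retries.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function gatewayError(message, extra) {
  // The route layer would map this to 502 file_gateway_unavailable (§14).
  return Object.assign(new Error(message), { code: "file_gateway_unavailable" }, extra);
}

async function putFileWithRetry({ baseUrl, objectKey, body, tokens, fetchImpl = fetch,
                                  timeoutMs = 300000, maxAttempts = 3, baseDelayMs = 500 }) {
  // Assumed URL shape: encode each key segment but keep the "/" separators.
  const url = `${baseUrl}/files/${objectKey.split("/").map(encodeURIComponent).join("/")}`;
  let failures = 0;           // network errors, timeouts and 5xx responses
  let refreshedToken = false; // a 401 triggers at most one token refresh

  for (;;) {
    // Token errors propagate unchanged so they can surface as 503 auth_service_unavailable.
    const token = await tokens.getToken();
    let res = null;
    try {
      res = await fetchImpl(url, {
        method: "PUT",
        headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/octet-stream" },
        body,
        signal: AbortSignal.timeout(timeoutMs), // per-attempt timeout
      });
    } catch (err) {
      if (++failures >= maxAttempts) throw gatewayError(`PUT failed: ${err.message}`, { cause: err });
    }
    if (res) {
      if (res.ok) return res.status;
      if (res.status === 401 && !refreshedToken) {
        tokens.invalidate(); // cached token rejected: fetch a fresh one and retry at once
        refreshedToken = true;
        continue;
      }
      // Other 4xx (and a second 401) are not retried; 5xx are retried up to maxAttempts.
      if (res.status < 500 || ++failures >= maxAttempts) {
        throw gatewayError(`PUT failed with HTTP ${res.status}`, { status: res.status });
      }
    }
    await sleep(baseDelayMs * 2 ** (failures - 1)); // exponential backoff: 500 ms, 1 s, ...
  }
}
```

The 401 refresh does not count against `maxAttempts`. A token revoked mid-flight therefore costs one extra request, not one of the retries reserved for a flaky gateway.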
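### 11.3 E2E test (black-box)
- Requires real Member Center and File Access Agent test environments (to be prepared before the Phase 1 kickoff)
- Test cases:
  1. Full flow: multipart upload → polling → promote succeeds
  2. 409: the same user creates jobs back-to-back
  3. Authorization: invalid token / wrong scope / wrong `aud`
  4. Error paths: upload over 500MB → 413; missing `model` file → 400; File Access Agent returns 500 during promote → 502
  5. File-size matrix: small (1MB), medium (50MB), large (200MB), and 500MB; verify p95 for each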
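### 11.4 Load testing
- `POST /api/v1/jobs` does not need high QPS, since real usage is on the order of minutes between jobs per user. Large multipart uploads must not cause OOM, though: test 10 users uploading 200MB concurrently.
- `GET /api/v1/jobs/:id` is the hot path (polling): test 100 req/s per Scheduler instance.
- Verify p95 < 200ms (GET); p95 < 5s / 12s (POST with 200MB / 500MB).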
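To check the p95 targets above, the load-test harness has to reduce raw latencies to a percentile. Below is a minimal sketch using the nearest-rank method. The method choice is an assumption, and load-testing tools may report percentiles under a different definition.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of all samples at or below it.
function percentile(samplesMs, p) {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100)); // 1-based rank
  return sorted[rank - 1];
}

// §11.4 budgets: GET p95 < 200 ms; POST p95 < 5 s (200 MB) / 12 s (500 MB).
function meetsP95Budget(samplesMs, budgetMs) {
  return percentile(samplesMs, 95) < budgetMs;
}
```

For example, `meetsP95Budget(getLatenciesMs, 200)` would gate the GET polling run. Computing `p * n / 100` before dividing avoids floating-point error in the rank (e.g. `0.07 * 100` is not exactly 7).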
---
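## 12. Implementation task breakdown (following the Autoflow incremental development rules)
Each task is an independently reviewable unit; the Reviewer reviews them one at a time.
| # | Task | Depends on | Parallel with | Estimate | Acceptance criteria |
|---|------|------|---------|------|---------|
| T1 | Auth middleware + JWKS verification | - | - | 3d | All unit tests pass; a mock token can be verified on an empty route |
| T2 | Converter OAuth client (client_credentials + cache) | - | T1 | 2d | Unit tests can obtain a token from a mock token endpoint and cache it |
| T3 | `/api/v1/*` route skeleton + unified error format + request_id middleware | T1 | - | 2d | All new endpoints are reachable (returning 501 is an acceptable outcome at this stage) |
| T4 | POST /api/v1/jobs (multer receives multipart → MinIO, active-job check, enqueue) | T1, T3 | - | 3d | Jobs can be created; 409, 413 and rollback behave correctly; large files do not cause OOM |
| T5 | GET /api/v1/jobs + GET /api/v1/jobs/:id (ETag, client isolation, user index) | T1, T3, T4 | T6 | 3d | Recovery queries return correct results; ETag → 304 works |
| T6 | POST /api/v1/jobs/:id/promote (streaming PUT, retries, FAA client) | T1, T2, T3 | T5 | 4d | Promote succeeds, is idempotent, and failures can be retried |
| T7 | Deployment split (Nginx vhost config + docker-compose update) | - | T1-T6 | 1d | The internal network can reach `/jobs`; the public network can reach only `/api/v1/*` |
| T8 | OpenAPI 3.0 spec (hand-written) + complete error-code documentation | T3-T6 | - | 2d | Spec passes lint; visionA-backend can import it directly |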
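**Estimated total effort**: 3-4 person-weeks for one person working sequentially; with 2 people in parallel this can be compressed to about 2 weeks. This aligns with the PRD RICE Effort=4 estimate. It is slightly lower than the original estimate because T4 no longer needs to implement the FAA GET / HEAD branches.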
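**External dependency triggers**:
- T1 needs the Member Center JWKS URL (a mock can be used)
- T6 needs a File Access Agent test environment (or a mock PUT endpoint)
- T7 needs the user to confirm the deployment topology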
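T5's acceptance criterion (ETag → 304) fits the polling hot path: an unchanged job costs only a hash and an empty response. Below is a minimal sketch that assumes a strong ETag derived from a SHA-256 of the serialized job view. `matchesIfNoneMatch` / `conditionalJobResponse` are hypothetical helpers, and the `If-None-Match` parsing is simplified.

```javascript
const { createHash } = require("node:crypto");

// If-None-Match uses the weak comparison function (RFC 9110), so a W/ prefix is ignored.
// Splitting on "," is a simplification; our base64url tags never contain commas.
function matchesIfNoneMatch(header, etag) {
  if (!header) return false;
  if (header.trim() === "*") return true;
  const opaque = (tag) => tag.trim().replace(/^W\//, "");
  return header.split(",").some((tag) => opaque(tag) === opaque(etag));
}

// Builds the GET /api/v1/jobs/:id response. The job view must serialize deterministically
// (same key order for the same state), otherwise the ETag changes on every poll.
function conditionalJobResponse(jobView, ifNoneMatch) {
  const json = JSON.stringify(jobView);
  const etag = `"${createHash("sha256").update(json).digest("base64url")}"`;
  if (matchesIfNoneMatch(ifNoneMatch, etag)) {
    return { status: 304, headers: { ETag: etag }, body: null };
  }
  return { status: 200, headers: { ETag: etag, "Content-Type": "application/json; charset=utf-8" }, body: json };
}
```

Hashing the exact response body keeps the ETag correct without any version counter in Redis. The trade-off is that the job still has to be loaded and serialized on every poll.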
---
## 13. Open items / to be confirmed (TBD)
| # | Item | Impact | To be confirmed by |
|---|------|------|---------|
| TBD-1 | Whether the Member Center `tenant_id` claim appears in client_credentials tokens | T1 configuration | Member Center owner |
| TBD-2 | Final naming of the `kneron_converter_api` audience, the `kneron_converter` client, and the scopes | T1, T2 | Member Center owner |
| TBD-3 | File Access Agent base URL (test and prod environments) and `tenant_id` | T6 | File Access Agent owner |
| TBD-4 | Actual rate-limit values (300 req / 5 min is an estimate; calibrate after observation) | Tune after launch | Observational data |
| TBD-5 | Concrete IP / hostname for the Nginx vhosts (depends on deployment topology) | T7 | User / DevOps |
| TBD-6 | Granularity of `stage_progress` (can the Worker report per-stage %?) | P2 feature | Worker development team |
---
## 14. Appendix: complete error code table
| Code | HTTP | Description |
|------|------|------|
| `validation_error` | 400 | Invalid field (missing multipart field, non-numeric `model_id`, `platform` not in the enum) |
| `invalid_multipart` | 400 | Multipart parse failure, missing required file, or wrong file extension |
| `invalid_token` | 401 | Invalid JWT, bad signature, or invalid claims |
| `token_expired` | 401 | JWT expired |
| `insufficient_scope` | 403 | Insufficient scope |
| `tenant_mismatch` | 403 | `tenant_id` does not match |
| `job_not_found` | 404 | Job does not exist or does not belong to this client (same response in both cases to avoid information leakage) |
| `not_found` | 404 | Path does not exist |
| `user_has_active_job` | 409 | The user already has an in-progress job |
| `job_not_ready_for_promote` | 409 | Promote was called while the job is not `completed` |
| `source_not_available` | 409 | The promote source stage produced no output |
| `file_too_large` | 413 | Multipart upload exceeds 500MB (triggered by multer `LIMIT_FILE_SIZE`) |
| `invalid_object_key` | 422 | Invalid `target_object_key` format |
| `misconfiguration` | 500 | Server misconfiguration (e.g. `STORAGE_BACKEND` is not `minio`) |
| `storage_unavailable` | 502 | MinIO write failed (while `POST /api/v1/jobs` stores the input) |
| `internal_error` | 500 | Any other unclassified error |
| `not_implemented` | 501 | Phase 2 feature |
| `file_gateway_unavailable` | 502 | File Access Agent failure (used by promote) |
| `auth_service_unavailable` | 503 | Failed to obtain a token from Member Center (used by promote) |
| `service_unavailable` | 503 | Other dependency failures |
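The 400 / 413 rows above depend on translating multer's failures. multer rejects with a `MulterError` whose `code` is one of `LIMIT_FILE_SIZE`, `LIMIT_FILE_COUNT`, `LIMIT_UNEXPECTED_FILE`, `LIMIT_PART_COUNT` or `LIMIT_FIELD_*`. When a `.fields()` entry exceeds its `maxCount`, multer also reports `LIMIT_UNEXPECTED_FILE`. In the minimal sketch below, only the `LIMIT_FILE_SIZE` → `file_too_large` mapping comes from the table. The other mappings and the error body shape are assumptions.

```javascript
// Hypothetical mapping from multer failures to the §14 error codes.
const MULTER_TO_API = {
  LIMIT_FILE_SIZE: { status: 413, code: "file_too_large" },
  LIMIT_FILE_COUNT: { status: 400, code: "invalid_multipart" },      // limits.files exceeded
  LIMIT_UNEXPECTED_FILE: { status: 400, code: "invalid_multipart" }, // unknown file field or maxCount exceeded
  LIMIT_PART_COUNT: { status: 400, code: "invalid_multipart" },
  LIMIT_FIELD_KEY: { status: 400, code: "validation_error" },
  LIMIT_FIELD_VALUE: { status: 400, code: "validation_error" },
  LIMIT_FIELD_COUNT: { status: 400, code: "validation_error" },
};

// Returns { status, body } for multer errors, or null so other errors reach the generic handler.
function mapUploadError(err, requestId) {
  if (!err || err.name !== "MulterError") return null;
  const mapped = MULTER_TO_API[err.code] ?? { status: 400, code: "invalid_multipart" };
  return {
    status: mapped.status,
    body: { error: { code: mapped.code, message: err.message, field: err.field, request_id: requestId } },
  };
}
```

Returning `null` for non-multer errors keeps MinIO failures in the storage engine on their own path, where they can become 502 `storage_unavailable`.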
---
## 15. Appendix: request / response quick reference
### Create a job (multipart)
```bash
curl -X POST https://converter.innovedus.com/api/v1/jobs \
-H "Authorization: Bearer $TOKEN" \
-F "model=@./model.onnx" \
-F "user_id=u-12345" \
-F "model_id=1001" \
-F "version=0001" \
-F "platform=520" \
-F "enable_evaluate=false" \
-F "enable_sim_fp=false" \
-F "enable_sim_fixed=false" \
-F "enable_sim_hw=false"
```
**With reference images** (repeat `-F "ref_images[]=@..."` once per image):
```bash
curl -X POST https://converter.innovedus.com/api/v1/jobs \
-H "Authorization: Bearer $TOKEN" \
-F "model=@./model.onnx" \
-F "ref_images[]=@./img_0.jpg" \
-F "ref_images[]=@./img_1.jpg" \
-F "user_id=u-12345" \
-F "model_id=1001" \
-F "version=0001" \
-F "platform=520"
```
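On the visionA-backend side, the same request can be built without curl, using the WHATWG `FormData` / `Blob` / `fetch` globals available in Node 18+. This is a minimal sketch: `buildJobForm` and `createJob` are hypothetical helpers, and the response handling assumes a JSON body in both the success and error cases.

```javascript
// Hypothetical visionA-backend helpers mirroring the curl examples above.
function buildJobForm({ model, modelName = "model.onnx", refImages = [], fields }) {
  const form = new FormData();
  form.append("model", new Blob([model]), modelName);
  for (const img of refImages) {
    // Repeat the same field name once per image, like curl's repeated -F.
    form.append("ref_images[]", new Blob([img.data]), img.name);
  }
  for (const [key, value] of Object.entries(fields)) {
    form.append(key, String(value)); // multipart fields are strings, e.g. "false"
  }
  return form;
}

async function createJob(baseUrl, token, formParts, fetchImpl = fetch) {
  // Do not set Content-Type by hand: fetch adds it together with the multipart boundary.
  const res = await fetchImpl(`${baseUrl}/api/v1/jobs`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
    body: buildJobForm(formParts),
  });
  const payload = await res.json();
  if (!res.ok) throw Object.assign(new Error(`create job failed: ${res.status}`), { status: res.status, payload });
  return payload;
}
```

Note that `Blob` keeps the whole model in memory on the visionA-backend side. For files near the 500MB limit, a streamed body may be preferable there as well.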
### Get a job
```bash
curl -H "Authorization: Bearer $TOKEN" \
https://converter.innovedus.com/api/v1/jobs/550e8400-...
```
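Clients polling this endpoint can send the last `ETag` back in `If-None-Match`, so an unchanged job returns a bodyless 304. Below is a minimal polling sketch. The terminal status names (`completed`, `failed`), the `status` field and the 5-second interval are assumptions. `sleep` is injectable so tests run instantly.

```javascript
// Hypothetical polling loop for GET /api/v1/jobs/:id using ETag revalidation.
async function pollJob(jobUrl, token, {
  fetchImpl = fetch,
  intervalMs = 5000,
  isTerminal = (job) => ["completed", "failed"].includes(job.status), // assumed status names
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  onUpdate = () => {},
} = {}) {
  let etag = null;
  for (;;) {
    const headers = { Authorization: `Bearer ${token}` };
    if (etag) headers["If-None-Match"] = etag;
    const res = await fetchImpl(jobUrl, { headers });
    if (res.status === 200) {
      etag = res.headers.get("ETag");
      const job = await res.json();
      onUpdate(job); // only called when the job actually changed
      if (isTerminal(job)) return job;
    } else if (res.status !== 304) {
      throw Object.assign(new Error(`poll failed: ${res.status}`), { status: res.status });
    }
    await sleep(intervalMs);
  }
}
```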
### Recovery
```bash
curl -H "Authorization: Bearer $TOKEN" \
'https://converter.innovedus.com/api/v1/jobs?user_id=u-12345&status=in_progress'
```
### Promote
```bash
curl -X POST https://converter.innovedus.com/api/v1/jobs/550e8400-.../promote \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"targets": [
{"source": "nef", "target_object_key": "visionA/models/u-12345/m-1001/v0001/out.nef"}
]
}'
```
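The 422 `invalid_object_key` check applies to each `target_object_key` in the promote body. This section does not fix the exact rules. The sketch below assumes a conservative set: a relative path, no empty, `.` or `..` segments, a limited character set, and a bounded length. All of these rules are hypothetical and should be aligned with File Access Agent's own key rules.

```javascript
// Hypothetical validation of promote targets; every rule here is an assumption.
const MAX_KEY_LENGTH = 1024;
const SEGMENT = /^[A-Za-z0-9._-]+$/;

function validateObjectKey(key) {
  if (typeof key !== "string" || key.length === 0 || key.length > MAX_KEY_LENGTH) return false;
  // Splitting on "/" rejects absolute paths and "a//b" (empty segments fail SEGMENT)
  // as well as "." / ".." traversal segments.
  return key.split("/").every((seg) => seg !== "." && seg !== ".." && SEGMENT.test(seg));
}

function validatePromoteBody(body) {
  if (!body || !Array.isArray(body.targets) || body.targets.length === 0) {
    return { ok: false, status: 400, code: "validation_error" };
  }
  const bad = body.targets.filter((t) => !validateObjectKey(t && t.target_object_key));
  if (bad.length > 0) return { ok: false, status: 422, code: "invalid_object_key" };
  return { ok: true };
}
```

Under these rules, the example key `visionA/models/u-12345/m-1001/v0001/out.nef` is accepted, while `../x` or `/abs/out.nef` is rejected with 422.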
---
## 16. Change log
| Date | Version | Change | Author |
|------|------|------|------|
| 2026-04-25 | Draft 1.0 | Initial version; complete Phase 1 specification | Architect Agent |
| 2026-04-25 | Draft 1.1 | Changed POST /api/v1/jobs to multipart/form-data. Removed the FAA getFile/headFile implementation, the `files:download.read`/`files:metadata.read` scopes, the `input_object_key` field and the `input_not_found` error code. Added the `invalid_multipart`/`file_too_large`/`storage_unavailable` error codes. Deleted the old TBD-1 and renumbered the TBDs. §2.5 File Access Agent client keeps only putFile; §2.8 POST jobs flow changed to multer receive → MinIO; §6 FAA integration trimmed to PUT only | Architect Agent |
---
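**Note**: this TDD is about 1390 lines, well past the split threshold. This update focuses on content corrections and does not split the file. For the next update, splitting is strongly recommended, into:
- `TDD.md` (index)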
- `TDD-api.md`(§1、§14、§15
- `TDD-backend.md`(§2、§3、§4
- `TDD-integration.md`(§5、§6
- `TDD-infra.md`(§7、§9
- `TDD-testing.md`(§11