Add FFmpeg worker design document
This commit is contained in:
152
docs/plans/2025-11-30-ffmpeg-worker-design.md
Normal file
152
docs/plans/2025-11-30-ffmpeg-worker-design.md
Normal file
@@ -0,0 +1,152 @@
|
||||
# FFmpeg Worker Design
|
||||
|
||||
## Overview
|
||||
|
||||
A dockerized worker capable of running arbitrary FFmpeg commands, exposing API endpoints to create jobs and check job status. Files are provided and returned via a mounted volume using relative paths.
|
||||
|
||||
## Technology Stack
|
||||
|
||||
- **Language:** Python 3.14
|
||||
- **Framework:** FastAPI
|
||||
- **Task Queue:** In-memory (asyncio.Queue)
|
||||
- **Container:** Docker with FFmpeg installed
|
||||
|
||||
## Architecture
|
||||
|
||||
### Components
|
||||
|
||||
- **FastAPI application** - Exposes REST endpoints for job management
|
||||
- **Job Queue** - In-memory queue (asyncio.Queue) holding pending jobs
|
||||
- **Worker Loop** - Background async task that processes jobs one at a time
|
||||
- **Job Store** - In-memory dict mapping job IDs to job state
|
||||
|
||||
### File Handling
|
||||
|
||||
- A single volume mounted at `/data` (configurable via env var)
|
||||
- API requests specify relative paths like `input/video.mp4`
|
||||
- Worker resolves to absolute paths: `/data/input/video.mp4`
|
||||
- Output files written to the same volume, returned as relative paths
|
||||
|
||||
### Project Structure
|
||||
|
||||
```
|
||||
ffmpeg-worker/
|
||||
├── app/
|
||||
│ ├── __init__.py
|
||||
│ ├── main.py # FastAPI app, endpoints
|
||||
│ ├── models.py # Pydantic models
|
||||
│ ├── queue.py # Job queue and worker loop
|
||||
│ └── ffmpeg.py # FFmpeg execution and progress parsing
|
||||
├── Dockerfile
|
||||
├── requirements.txt
|
||||
└── docker-compose.yml
|
||||
```
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### POST /jobs - Create a new job
|
||||
|
||||
```json
|
||||
Request:
|
||||
{
|
||||
"command": "-i input/source.mp4 -c:v libx264 -crf 23 output/result.mp4"
|
||||
}
|
||||
|
||||
Response (201 Created):
|
||||
{
|
||||
"id": "job_abc123",
|
||||
"status": "queued",
|
||||
"created_at": "2025-11-30T10:30:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
### GET /jobs/{id} - Get job status
|
||||
|
||||
```json
|
||||
Response:
|
||||
{
|
||||
"id": "job_abc123",
|
||||
"status": "running",
|
||||
"command": "-i input/source.mp4 ...",
|
||||
"created_at": "2025-11-30T10:30:00Z",
|
||||
"started_at": "2025-11-30T10:30:05Z",
|
||||
"completed_at": null,
|
||||
"progress": {
|
||||
"frame": 1234,
|
||||
"fps": 30.2,
|
||||
"time": "00:01:23.45",
|
||||
"bitrate": "1250kbits/s",
|
||||
"percent": 45.2
|
||||
},
|
||||
"output_files": [],
|
||||
"error": null
|
||||
}
|
||||
```
|
||||
|
||||
### GET /jobs - List all jobs
|
||||
|
||||
Optional query parameter: `?status=running`
|
||||
|
||||
### GET /health - Health check
|
||||
|
||||
Returns 200 OK for container orchestration.
|
||||
|
||||
## Job States
|
||||
|
||||
```
|
||||
queued → running → completed
|
||||
↘ failed
|
||||
```
|
||||
|
||||
## FFmpeg Execution
|
||||
|
||||
### Command Execution
|
||||
|
||||
- Parse the raw command string to extract input/output file paths
|
||||
- Resolve relative paths against the `/data` mount point
|
||||
- Run FFmpeg as async subprocess via `asyncio.create_subprocess_exec`
|
||||
- Add `-progress pipe:1` flag to get machine-readable progress output
|
||||
|
||||
### Progress Parsing
|
||||
|
||||
FFmpeg with `-progress pipe:1` outputs key=value pairs:
|
||||
|
||||
```
|
||||
frame=1234
|
||||
fps=30.24
|
||||
total_size=5678900
|
||||
out_time_ms=83450000
|
||||
bitrate=1250.5kbits/s
|
||||
progress=continue
|
||||
```
|
||||
|
||||
For percent complete, probe the input file with `ffprobe -show_entries format=duration` before starting.
|
||||
|
||||
### Output File Detection
|
||||
|
||||
- Parse command for output path (last non-flag argument)
|
||||
- Verify file exists after completion
|
||||
- Return as relative path in response
|
||||
|
||||
### Error Handling
|
||||
|
||||
- Non-zero exit code → job marked `failed`, stderr captured in `error` field
|
||||
- Missing input file → immediate failure with clear message
|
||||
- FFmpeg timeout (configurable, default 1 hour) → kill process, mark failed
|
||||
|
||||
## Configuration
|
||||
|
||||
Environment variables:
|
||||
|
||||
| Variable | Description | Default |
|
||||
|----------|-------------|---------|
|
||||
| `DATA_PATH` | Mount point for media files | `/data` |
|
||||
| `FFMPEG_TIMEOUT` | Max job duration in seconds | `3600` |
|
||||
| `HOST` | Server bind address | `0.0.0.0` |
|
||||
| `PORT` | Server port | `8000` |
|
||||
|
||||
## Memory Management
|
||||
|
||||
- Jobs stay in memory indefinitely (simple approach)
|
||||
- Optional `DELETE /jobs/{id}` endpoint for cleanup
|
||||
- Future: auto-purge completed/failed jobs older than N hours
|
||||
Reference in New Issue
Block a user