Initial commit: Windmill Git Sync service

Add containerized service for syncing Windmill workspaces to Git repositories.

Features:
- Flask webhook server for triggering syncs from Windmill
- wmill CLI integration for pulling workspace content
- Automated Git commits and push to remote repository
- Network-isolated (only accessible within Docker network)
- Designed to integrate with existing Windmill docker-compose files

Key components:
- Docker container with Python 3.11, wmill CLI, Git, and Flask
- Sync engine with error handling and logging
- External volume support for persistent workspace data
- Comprehensive documentation (README.md and CLAUDE.md)
2025-11-08 18:40:26 -05:00
commit c838fa568c
10 changed files with 626 additions and 0 deletions

14
.env.example Normal file

@@ -0,0 +1,14 @@
# Windmill Configuration
WINDMILL_BASE_URL=http://windmill_server:8000
WINDMILL_TOKEN=your-windmill-token-here
WINDMILL_WORKSPACE=home
# Workspace Volume (external Docker volume name)
WORKSPACE_VOLUME=windmill-workspace-data
# Git Configuration
GIT_REMOTE_URL=https://github.com/username/repo.git
GIT_TOKEN=your-github-pat-here
GIT_BRANCH=main
GIT_USER_NAME=Windmill Git Sync
GIT_USER_EMAIL=windmill@example.com

37
.gitignore vendored Normal file

@@ -0,0 +1,37 @@
# Environment variables
.env
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# IDEs
.vscode/
.idea/
*.swp
*.swo
*~
# Logs
*.log
# Docker
.dockerignore

165
CLAUDE.md Normal file

@@ -0,0 +1,165 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a containerized service for synchronizing Windmill workspaces to Git repositories. The service provides a Flask webhook server that Windmill can call to trigger automated backups of workspace content to a remote Git repository.
### Architecture
The system consists of three main components:
1. **Flask Web Server** (`app/server.py`): Lightweight HTTP server that exposes webhook endpoints for triggering syncs and health checks. Only accessible within the Docker network (not exposed to host).
2. **Sync Engine** (`app/sync.py`): Core logic that orchestrates the sync process:
   - Pulls workspace content from Windmill using the `wmill` CLI
   - Manages Git repository state (init on first run, subsequent updates)
   - Commits changes and pushes to remote Git repository with PAT authentication
   - Handles error cases and provides detailed logging
3. **Docker Container**: Bundles Python 3.11, wmill CLI, Git, and the Flask application. Uses volume mounts for persistent workspace storage.
### Key Design Decisions
- **Integrated with Windmill docker-compose**: This service is designed to be added as an additional service in your existing Windmill docker-compose file. It shares the same Docker network and can reference Windmill services directly (e.g., `windmill_server`).
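- **Network isolation**: The service uses `expose` instead of `ports`, so it is reachable only within the Docker network and not from the host machine. The endpoints perform no authentication and rely on this isolation, which means any container on the same network can trigger a sync.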
- **Webhook-only triggering**: Sync happens only when explicitly triggered via HTTP POST to `/sync`. This gives Windmill full control over backup timing via scheduled flows.
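- **HTTPS + Personal Access Token**: Git authentication uses a PAT injected into the HTTPS URL (format: `https://TOKEN@github.com/user/repo.git`). No SSH key management is required. The authenticated URL is saved as the `origin` remote, so the token ends up in `/workspace/.git/config` on the workspace volume.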
- **Stateless operation**: Each sync is independent. The container can be restarted without losing state (workspace data persists in Docker volume).
- **Single workspace focus**: Designed to sync one Windmill workspace per container instance. For multiple workspaces, run multiple containers with different configurations.
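The PAT injection described in the list above can be sketched as follows. This is a minimal illustration and not the code in `app/sync.py`: `build_authenticated_url` is a hypothetical name, and percent-encoding the token with `urllib.parse.quote` is an added assumption to cope with tokens that contain reserved characters such as `@` or `/`.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def build_authenticated_url(url: str, token: str) -> str:
    """Return `url` with `token` placed in the userinfo part (HTTPS only)."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        # Non-HTTPS remotes (for example SSH) are returned unchanged.
        return url
    # Percent-encode the token so reserved characters cannot break the URL.
    netloc = f"{quote(token, safe='')}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(build_authenticated_url("https://github.com/user/repo.git", "abc123"))
```

Parsing the URL instead of doing a string replacement keeps the token out of any later part of the URL that happens to contain `https://`.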
## Common Development Commands
### Build and Run
```bash
# Build the Docker image
docker-compose build
# Start the service
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the service
docker-compose down
```
### Testing
```bash
# Test the sync manually (from inside container)
docker-compose exec windmill-git-sync python app/sync.py
# Test webhook endpoint (from another container in the network)
docker-compose exec windmill_server curl -X POST http://windmill-git-sync:8080/sync
# Health check (from another container in the network)
docker-compose exec windmill_server curl http://windmill-git-sync:8080/health
```
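The webhook test above can also be driven from Python, for example from a container on the same network that has Python available. The sketch below uses only the standard library; `trigger_sync` is a hypothetical helper, and the 300-second timeout is an assumption, since a sync blocks until the pull and push finish.

```python
import json
import urllib.error
import urllib.request


def trigger_sync(base_url: str = "http://windmill-git-sync:8080",
                 timeout: float = 300.0) -> dict:
    """POST to /sync and return the decoded JSON result dict."""
    request = urllib.request.Request(f"{base_url}/sync", data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as err:
        # A failed sync is answered with HTTP 500 and the same JSON shape,
        # so decode the error body instead of propagating the exception.
        return json.loads(err.read())
```

Calling `trigger_sync()` returns the same `success` and `message` fields that the `curl` command prints.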
### Development Workflow
```bash
# Edit code locally, rebuild and restart
docker-compose down
docker-compose up -d --build
# View live logs during testing
docker-compose logs -f windmill-git-sync
# Access container shell for debugging
docker-compose exec windmill-git-sync /bin/bash
# Inspect workspace directory
docker-compose exec windmill-git-sync ls -la /workspace
```
## Environment Configuration
All configuration is done via a `.env` file (copy it from `.env.example`). Required variables:
- `WINDMILL_TOKEN`: API token from Windmill for workspace access
- `WORKSPACE_VOLUME`: External Docker volume name for persistent workspace storage (`.env.example` uses `windmill-workspace-data`)
- `GIT_REMOTE_URL`: HTTPS URL of Git repository (e.g., `https://github.com/user/repo.git`)
- `GIT_TOKEN`: Personal Access Token with repo write permissions
### Docker Compose Integration
The `docker-compose.yml` file contains a service definition meant to be **added to your existing Windmill docker-compose file**, not run standalone. The service:
- Does not declare its own network (uses the implicit network from the parent compose file)
- Assumes a Windmill service named `windmill_server` exists in the same compose file
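- Uses `depends_on: windmill_server` so the container starts after the Windmill server container (this orders container startup only; it does not wait for Windmill to be ready)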
- Requires an external Docker volume specified in `WORKSPACE_VOLUME` env var (created via `docker volume create windmill-workspace-data`)
## Code Structure
```
app/
├── server.py # Flask application with /health and /sync endpoints
└── sync.py # Core sync logic (wmill pull → git commit → push)
```
### Important Functions
- `sync.sync_windmill_to_git()`: Main entry point for sync operation. Returns dict with `success` bool and `message` string.
- `sync.validate_config()`: Checks required env vars are set. Raises ValueError if missing.
- `sync.run_wmill_sync()`: Executes `wmill sync pull` command with proper environment variables.
- `sync.commit_and_push_changes()`: Stages all changes, commits with automated message, and pushes to remote.
### Error Handling
The sync engine uses a try/except pattern that always returns a result dict, never raises to the web server. This ensures webhook requests always get a proper HTTP response with error details in JSON.
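A minimal sketch of this pattern, reduced to its two halves: a wrapper that converts any exception into a result dict, and a mapping from that dict to an HTTP status. The names `run_safely`, `to_http_response`, and `failing_sync` are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Tuple


def run_safely(operation: Callable[[], str]) -> Dict[str, object]:
    """Run `operation` and always return a result dict instead of raising."""
    try:
        return {"success": True, "message": operation()}
    except Exception as exc:  # deliberately broad: the caller never sees a raise
        return {"success": False, "message": str(exc)}


def to_http_response(result: Dict[str, object]) -> Tuple[Dict[str, object], int]:
    """Map a result dict to a (JSON body, status code) pair."""
    return result, 200 if result["success"] else 500


def failing_sync() -> str:
    """Stand-in for a sync step that fails."""
    raise RuntimeError("wmill sync failed")


print(to_http_response(run_safely(lambda: "synced")))
print(to_http_response(run_safely(failing_sync)))
```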
## Git Workflow
When making changes to this codebase:
1. Changes are tracked in the project's own Git repository (not the Windmill workspace backup repo)
2. The service manages commits to the **remote backup repository** specified in `GIT_REMOTE_URL`
3. Commits to the backup repo use the automated format: "Automated Windmill workspace backup - {workspace_name}"
## Network Architecture
This service is designed to be added to your existing Windmill docker-compose file. When added, all services share the same Docker Compose network automatically.
Expected service topology within the same docker-compose file:
```
Services in docker-compose.yml:
├── windmill_server (Windmill API server on port 8000)
├── windmill_worker (Windmill workers)
├── postgres (Database)
└── windmill-git-sync (this service on port 8080)
```
The service references `windmill_server` via `WINDMILL_BASE_URL=http://windmill_server:8000`. If your Windmill server service has a different name, update `WINDMILL_BASE_URL` in `.env`.
## Extending the Service
### Adding Scheduled Syncs
To add cron-based scheduling in addition to webhooks:
1. Add `APScheduler` to `requirements.txt`
2. Add scheduler initialization in `server.py`
3. Update configuration to support `SYNC_SCHEDULE` env var (e.g., `0 */6 * * *` for every 6 hours)
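The steps above could look roughly like the sketch below. It assumes APScheduler 3.x (the `BackgroundScheduler` and `CronTrigger.from_crontab` API); `read_schedule` and `start_scheduler` are hypothetical helper names, and the five-field check is a simple sanity check rather than full cron validation.

```python
import os


def read_schedule(env=os.environ, default="0 */6 * * *"):
    """Return the cron expression from SYNC_SCHEDULE, checking the field count."""
    expr = env.get("SYNC_SCHEDULE", default).strip()
    if len(expr.split()) != 5:
        raise ValueError(f"SYNC_SCHEDULE must have 5 cron fields, got: {expr!r}")
    return expr


def start_scheduler(sync_fn, cron_expr):
    """Start a background scheduler that calls `sync_fn` on the cron schedule."""
    # Imported lazily so the module still loads when APScheduler is not installed.
    from apscheduler.schedulers.background import BackgroundScheduler
    from apscheduler.triggers.cron import CronTrigger

    scheduler = BackgroundScheduler()
    scheduler.add_job(sync_fn, CronTrigger.from_crontab(cron_expr))
    scheduler.start()
    return scheduler
```

In `server.py`, one possible wiring is `start_scheduler(sync_windmill_to_git, read_schedule())` just before `app.run(...)`, so the webhook and the schedule share the same sync function.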
### Adding Slack/Discord Notifications
To notify on sync completion:
1. Add `slack-sdk` or `discord-webhook` to `requirements.txt`
2. Add notification function in `sync.py`
3. Call notification function in `sync_windmill_to_git()` after successful push
4. Add webhook URL as env var in `.env` and `docker-compose.yml`
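As a rough sketch of steps 2 and 3, the functions below post a message to an incoming-webhook URL using only the standard library instead of `slack-sdk` or `discord-webhook`. The payload uses Slack's `{"text": ...}` shape (Discord webhooks expect a `content` field instead); `build_notification` and `send_notification` are hypothetical names.

```python
import json
import urllib.request


def build_notification(result: dict, workspace: str) -> dict:
    """Build a Slack-style webhook payload from a sync result dict."""
    status = "succeeded" if result.get("success") else "failed"
    message = result.get("message", "")
    return {"text": f"Windmill sync for workspace '{workspace}' {status}: {message}"}


def send_notification(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


print(build_notification({"success": True, "message": "ok"}, "home"))
```

Wrapping the `send_notification` call in its own `try`/`except` keeps a notification outage from turning a successful sync into a reported failure.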
### Supporting SSH Authentication
To support SSH keys instead of PAT:
1. Update `docker-compose.yml` to mount SSH key: `~/.ssh/id_rsa:/root/.ssh/id_rsa:ro`
2. Add logic in `sync.get_authenticated_url()` to detect SSH vs HTTPS URLs
3. Configure Git to use SSH: `git config core.sshCommand "ssh -i /root/.ssh/id_rsa"`
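The URL detection in step 2 might be sketched like this. `is_ssh_url` is a hypothetical helper, and it assumes scp-like remotes always carry a `user@` prefix (as in `git@github.com:user/repo.git`), which is the common form but not the only one Git accepts.

```python
def is_ssh_url(url: str) -> bool:
    """Return True for ssh:// URLs and scp-like remotes such as git@host:path."""
    if url.startswith("ssh://"):
        return True
    if "://" in url:
        # Any other explicit scheme (https://, http://, file://) is not SSH.
        return False
    # scp-like syntax: user@host:path, with no scheme.
    host_part, separator, _path = url.partition(":")
    return bool(separator) and "@" in host_part


for remote in ("git@github.com:user/repo.git", "https://github.com/user/repo.git"):
    print(remote, is_ssh_url(remote))
```

With such a check, the token would be injected only when the function returns `False`, and SSH remotes would be passed to Git unchanged.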

29
Dockerfile Normal file

@@ -0,0 +1,29 @@
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Install wmill CLI
# -f makes curl fail on an HTTP error instead of saving the error page as the binary
RUN curl -fL https://github.com/windmill-labs/windmill/releases/latest/download/wmill-linux-amd64 -o /usr/local/bin/wmill \
    && chmod +x /usr/local/bin/wmill
# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY app/ ./app/
# Create workspace directory
RUN mkdir -p /workspace
# Expose port for webhook server
EXPOSE 8080
# Run the Flask server
CMD ["python", "-u", "app/server.py"]

91
README.md Normal file

@@ -0,0 +1,91 @@
# Windmill Git Sync
A containerized service for syncing Windmill workspaces to Git repositories via webhook triggers.
## Overview
This service provides automated backup of Windmill workspaces to Git. It runs a lightweight Flask web server that responds to webhook requests from Windmill, syncing the workspace content using the `wmill` CLI and pushing changes to a remote Git repository.
## Features
- **Webhook-triggered sync**: Windmill can trigger backups via HTTP POST requests
- **Dockerized**: Runs as a container in the same network as Windmill
- **Git integration**: Automatic commits and pushes to remote repository
- **Authentication**: Supports Personal Access Token (PAT) authentication for Git
- **Health checks**: Built-in health endpoint for monitoring
## Quick Start
This service is designed to be added to your existing Windmill docker-compose file.
1. Copy the example environment file:
```bash
cp .env.example .env
```
2. Edit `.env` with your configuration:
   - Set `WINDMILL_TOKEN` to your Windmill API token
   - Set `GIT_REMOTE_URL` to your Git repository URL
   - Set `GIT_TOKEN` to your Git Personal Access Token
   - Set `WORKSPACE_VOLUME` to an external Docker volume name
3. Create the external volume:
```bash
docker volume create windmill-workspace-data
```
4. Add the `windmill-git-sync` service block from `docker-compose.yml` to your existing Windmill docker-compose file.
5. Build and start the service:
```bash
docker-compose up -d windmill-git-sync
```
6. Trigger a sync from Windmill (see Integration section below) or test from another container:
```bash
docker-compose exec windmill_server curl -X POST http://windmill-git-sync:8080/sync
```
## Configuration
All configuration is done via environment variables in `.env`:
| Variable | Required | Description |
|----------|----------|-------------|
| `WINDMILL_BASE_URL` | No | URL of Windmill instance (default: `http://windmill_server:8000`) |
| `WINDMILL_TOKEN` | Yes | Windmill API token for authentication |
| `WINDMILL_WORKSPACE` | No | Workspace name (default: `default`) |
| `WORKSPACE_VOLUME` | Yes | External Docker volume name for workspace data |
| `GIT_REMOTE_URL` | Yes | HTTPS Git repository URL |
| `GIT_TOKEN` | Yes | Git Personal Access Token |
| `GIT_BRANCH` | No | Branch to push to (default: `main`) |
| `GIT_USER_NAME` | No | Git commit author name (default: `Windmill Git Sync`) |
| `GIT_USER_EMAIL` | No | Git commit author email (default: `windmill@example.com`) |
## API Endpoints
This service is only accessible within the Docker network (not exposed to the host).
- `GET /health` - Health check endpoint
- `POST /sync` - Trigger a workspace sync to Git
## Integration with Windmill
Create a scheduled flow or script in Windmill to trigger backups:
```typescript
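export async function main() {
  const response = await fetch('http://windmill-git-sync:8080/sync', {
    method: 'POST'
  });
  const result = await response.json();
  // The service answers HTTP 500 with a JSON error body when the sync fails;
  // throw so the Windmill job is marked as failed too.
  if (!response.ok) {
    throw new Error(`Sync failed: ${result.message}`);
  }
  return result;
}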
```
## Development
See [CLAUDE.md](CLAUDE.md) for development instructions and architecture details.
## License
MIT

54
app/server.py Normal file

@@ -0,0 +1,54 @@
#!/usr/bin/env python3
"""
Flask server for receiving webhook triggers from Windmill to sync workspace to Git.
Internal service - not exposed outside Docker network.
"""
import logging

from flask import Flask, jsonify

from sync import sync_windmill_to_git

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

app = Flask(__name__)


@app.route('/health', methods=['GET'])
def health():
    """Health check endpoint."""
    return jsonify({'status': 'healthy'}), 200


@app.route('/sync', methods=['POST'])
def trigger_sync():
    """
    Trigger a sync from Windmill workspace to Git repository.
    This endpoint is only accessible within the Docker network.
    """
    logger.info("Sync triggered via webhook")

    try:
        result = sync_windmill_to_git()

        if result['success']:
            logger.info(f"Sync completed successfully: {result['message']}")
            return jsonify(result), 200
        else:
            logger.error(f"Sync failed: {result['message']}")
            return jsonify(result), 500

    except Exception as e:
        logger.exception("Unexpected error during sync")
        return jsonify({
            'success': False,
            'message': f'Sync failed with error: {str(e)}'
        }), 500


if __name__ == '__main__':
    logger.info("Starting Windmill Git Sync server on port 8080")
    app.run(host='0.0.0.0', port=8080, debug=False)

176
app/sync.py Normal file

@@ -0,0 +1,176 @@
#!/usr/bin/env python3
"""
Core sync logic for pulling Windmill workspace and pushing to Git.
"""
import os
import subprocess
import logging
from pathlib import Path
from git import Repo, GitCommandError
logger = logging.getLogger(__name__)
# Configuration from environment variables
WORKSPACE_DIR = Path('/workspace')
WINDMILL_BASE_URL = os.getenv('WINDMILL_BASE_URL', 'http://windmill_server:8000')
WINDMILL_TOKEN = os.getenv('WINDMILL_TOKEN', '')
WINDMILL_WORKSPACE = os.getenv('WINDMILL_WORKSPACE', 'default')
GIT_REMOTE_URL = os.getenv('GIT_REMOTE_URL', '')
GIT_TOKEN = os.getenv('GIT_TOKEN', '')
GIT_BRANCH = os.getenv('GIT_BRANCH', 'main')
GIT_USER_NAME = os.getenv('GIT_USER_NAME', 'Windmill Git Sync')
GIT_USER_EMAIL = os.getenv('GIT_USER_EMAIL', 'windmill@example.com')


def validate_config():
    """Validate required configuration is present."""
    missing = []

    if not WINDMILL_TOKEN:
        missing.append('WINDMILL_TOKEN')
    if not GIT_REMOTE_URL:
        missing.append('GIT_REMOTE_URL')
    if not GIT_TOKEN:
        missing.append('GIT_TOKEN')

    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")


def get_authenticated_url(url: str, token: str) -> str:
    """Insert token into HTTPS Git URL for authentication."""
    if url.startswith('https://'):
        # Format: https://TOKEN@github.com/user/repo.git
        # count=1 limits the substitution to the scheme prefix
        return url.replace('https://', f'https://{token}@', 1)
    return url


def run_wmill_sync():
    """Run wmill sync to pull workspace from Windmill."""
    logger.info(f"Syncing Windmill workspace '{WINDMILL_WORKSPACE}' from {WINDMILL_BASE_URL}")

    env = os.environ.copy()
    env['WM_BASE_URL'] = WINDMILL_BASE_URL
    env['WM_TOKEN'] = WINDMILL_TOKEN
    env['WM_WORKSPACE'] = WINDMILL_WORKSPACE

    try:
        # Run wmill sync in the workspace directory
        result = subprocess.run(
            ['wmill', 'sync', 'pull', '--yes'],
            cwd=WORKSPACE_DIR,
            env=env,
            capture_output=True,
            text=True,
            check=True
        )
        logger.info("Windmill sync completed successfully")
        logger.debug(f"wmill output: {result.stdout}")
        return True
    except subprocess.CalledProcessError as e:
        logger.error(f"wmill sync failed: {e.stderr}")
        raise RuntimeError(f"Failed to sync from Windmill: {e.stderr}")


def init_or_update_git_repo():
    """Initialize Git repository or open existing one."""
    git_dir = WORKSPACE_DIR / '.git'

    if git_dir.exists():
        logger.info("Opening existing Git repository")
        repo = Repo(WORKSPACE_DIR)
    else:
        logger.info("Initializing new Git repository")
        repo = Repo.init(WORKSPACE_DIR)

    # Configure user
    repo.config_writer().set_value("user", "name", GIT_USER_NAME).release()
    repo.config_writer().set_value("user", "email", GIT_USER_EMAIL).release()

    return repo


def commit_and_push_changes(repo: Repo):
    """Commit changes and push to remote Git repository."""
    # Check if there are any changes
    # Note: this returns before the push step, so a commit left unpushed by an
    # earlier failed push is not retried until the workspace changes again.
    if not repo.is_dirty(untracked_files=True):
        logger.info("No changes to commit")
        return False

    # Stage all changes
    repo.git.add(A=True)

    # Create commit
    commit_message = f"Automated Windmill workspace backup - {WINDMILL_WORKSPACE}"
    repo.index.commit(commit_message)
    logger.info(f"Created commit: {commit_message}")

    # Configure remote with authentication
    authenticated_url = get_authenticated_url(GIT_REMOTE_URL, GIT_TOKEN)

    try:
        # Check if remote exists
        if 'origin' in [remote.name for remote in repo.remotes]:
            origin = repo.remote('origin')
            origin.set_url(authenticated_url)
        else:
            origin = repo.create_remote('origin', authenticated_url)

        # Push to remote
        logger.info(f"Pushing to {GIT_REMOTE_URL} (branch: {GIT_BRANCH})")
        # push() can report failures through its return value rather than by
        # raising, so surface them explicitly as GitCommandError.
        origin.push(refspec=f'HEAD:{GIT_BRANCH}', force=False).raise_if_error()
        logger.info("Push completed successfully")

        return True
    except GitCommandError as e:
        logger.error(f"Git push failed: {str(e)}")
        raise RuntimeError(f"Failed to push to Git remote: {str(e)}")


def sync_windmill_to_git():
    """
    Main sync function: pulls from Windmill, commits, and pushes to Git.

    Returns:
        dict: Result with 'success' boolean and 'message' string
    """
    try:
        # Validate configuration
        validate_config()

        # Pull from Windmill
        run_wmill_sync()

        # Initialize/update Git repo
        repo = init_or_update_git_repo()

        # Commit and push changes
        has_changes = commit_and_push_changes(repo)

        if has_changes:
            message = f"Successfully synced workspace '{WINDMILL_WORKSPACE}' to Git"
        else:
            message = "Sync completed - no changes to commit"

        return {
            'success': True,
            'message': message
        }

    except Exception as e:
        logger.exception("Sync failed")
        return {
            'success': False,
            'message': str(e)
        }


if __name__ == '__main__':
    # Allow running sync directly for testing
    logging.basicConfig(level=logging.INFO)
    result = sync_windmill_to_git()
    print(result)

25
docker-compose.yml Normal file

@@ -0,0 +1,25 @@
services:
  # ... existing Windmill services (windmill_server, windmill_worker, postgres, etc.) ...

  windmill-git-sync:
    build: .
    container_name: windmill-git-sync
    expose:
      - "8080"
    volumes:
      - workspace_data:/workspace
    environment:
      # Windmill connection
      - WINDMILL_BASE_URL=${WINDMILL_BASE_URL:-http://windmill_server:8000}
      - WINDMILL_TOKEN=${WINDMILL_TOKEN}
      - WINDMILL_WORKSPACE=${WINDMILL_WORKSPACE:-default}
      # Git configuration
      - GIT_REMOTE_URL=${GIT_REMOTE_URL}
      - GIT_TOKEN=${GIT_TOKEN}
      - GIT_BRANCH=${GIT_BRANCH:-main}
      - GIT_USER_NAME=${GIT_USER_NAME:-Windmill Git Sync}
      - GIT_USER_EMAIL=${GIT_USER_EMAIL:-windmill@example.com}
    restart: unless-stopped
    depends_on:
      - windmill_server

# Named volumes must be declared at the top level; merge this entry into the
# existing `volumes:` section of your Windmill compose file if it has one.
volumes:
  workspace_data:
    external: true
    name: ${WORKSPACE_VOLUME}

4
requirements.txt Normal file

@@ -0,0 +1,4 @@
Flask==3.0.0
GitPython==3.1.40
requests==2.31.0
python-dotenv==1.0.0

31
setup.sh Executable file

@@ -0,0 +1,31 @@
#!/bin/bash
# Setup script for windmill-git-sync
set -e
echo "Setting up Windmill Git Sync..."
# Create .env file if it doesn't exist
if [ ! -f .env ]; then
    echo "Creating .env file from template..."
    cp .env.example .env
    echo "⚠️ Please edit .env with your configuration"
else
    echo "✓ .env file already exists"
fi
# Create Docker volume if it doesn't exist
if ! docker volume inspect windmill-workspace-data >/dev/null 2>&1; then
    echo "Creating windmill-workspace-data Docker volume..."
    docker volume create windmill-workspace-data
    echo "✓ Volume created"
else
    echo "✓ windmill-workspace-data already exists"
fi
echo ""
echo "Setup complete! Next steps:"
echo "1. Edit .env with your Windmill and Git configuration"
echo "2. Add the windmill-git-sync service block from docker-compose.yml to your Windmill docker-compose file"
echo "3. Run: docker-compose up -d windmill-git-sync"
echo "4. Test from within Docker network: docker-compose exec windmill_server curl -X POST http://windmill-git-sync:8080/sync"