commit c838fa568c7e021e6c9d59537cde07a38609626c
Author: Bill
Date:   Sat Nov 8 18:40:26 2025 -0500

    Initial commit: Windmill Git Sync service

    Add containerized service for syncing Windmill workspaces to Git repositories.

    Features:
    - Flask webhook server for triggering syncs from Windmill
    - wmill CLI integration for pulling workspace content
    - Automated Git commits and push to remote repository
    - Network-isolated (only accessible within Docker network)
    - Designed to integrate with existing Windmill docker-compose files

    Key components:
    - Docker container with Python 3.11, wmill CLI, Git, and Flask
    - Sync engine with error handling and logging
    - External volume support for persistent workspace data
    - Comprehensive documentation (README.md and CLAUDE.md)

diff --git a/.env.example b/.env.example
new file mode 100644
index 0000000..b2bba19
--- /dev/null
+++ b/.env.example
@@ -0,0 +1,14 @@
+# Windmill Configuration
+WINDMILL_BASE_URL=http://windmill_server:8000
+WINDMILL_TOKEN=your-windmill-token-here
+WINDMILL_WORKSPACE=home
+
+# Workspace Volume (external Docker volume name)
+WORKSPACE_VOLUME=windmill-workspace-data
+
+# Git Configuration
+GIT_REMOTE_URL=https://github.com/username/repo.git
+GIT_TOKEN=your-github-pat-here
+GIT_BRANCH=main
+GIT_USER_NAME=Windmill Git Sync
+GIT_USER_EMAIL=windmill@example.com
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..f76645b
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,37 @@
+# Environment variables
+.env
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# IDEs
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# Logs
+*.log
+
+# Docker
+.dockerignore
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..bceb5bf
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,165 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+This is a containerized service for synchronizing Windmill workspaces to Git repositories. The service provides a Flask webhook server that Windmill can call to trigger automated backups of workspace content to a remote Git repository.
+
+### Architecture
+
+The system consists of three main components:
+
+1. **Flask Web Server** (`app/server.py`): Lightweight HTTP server that exposes webhook endpoints for triggering syncs and health checks. Only accessible within the Docker network (not exposed to host).
+
+2. **Sync Engine** (`app/sync.py`): Core logic that orchestrates the sync process:
+   - Pulls workspace content from Windmill using the `wmill` CLI
+   - Manages Git repository state (init on first run, subsequent updates)
+   - Commits changes and pushes to remote Git repository with PAT authentication
+   - Handles error cases and provides detailed logging
+
+3. **Docker Container**: Bundles Python 3.11, wmill CLI, Git, and the Flask application. Uses volume mounts for persistent workspace storage.
+
+### Key Design Decisions
+
+- **Integrated with Windmill docker-compose**: This service is designed to be added as an additional service in your existing Windmill docker-compose file. It shares the same Docker network and can reference Windmill services directly (e.g., `windmill_server`).
+- **Network isolation**: Service uses `expose` instead of `ports` - accessible only within Docker network, not from host machine. No authentication needed since it's isolated.
+- **Webhook-only triggering**: Sync happens only when explicitly triggered via HTTP POST to `/sync`. This gives Windmill full control over backup timing via scheduled flows.
+- **HTTPS + Personal Access Token**: Git authentication uses PAT injected into HTTPS URL (format: `https://TOKEN@github.com/user/repo.git`). No SSH key management required.
+- **Stateless operation**: Each sync is independent. The container can be restarted without losing state (workspace data persists in Docker volume).
+- **Single workspace focus**: Designed to sync one Windmill workspace per container instance. For multiple workspaces, run multiple containers with different configurations.
+
+## Common Development Commands
+
+### Build and Run
+
+```bash
+# Build the Docker image
+docker-compose build
+
+# Start the service
+docker-compose up -d
+
+# View logs
+docker-compose logs -f
+
+# Stop the service
+docker-compose down
+```
+
+### Testing
+
+```bash
+# Test the sync manually (from inside container)
+docker-compose exec windmill-git-sync python app/sync.py
+
+# Test webhook endpoint (from another container in the network)
+docker-compose exec windmill_server curl -X POST http://windmill-git-sync:8080/sync
+
+# Health check (from another container in the network)
+docker-compose exec windmill_server curl http://windmill-git-sync:8080/health
+```
+
+### Development Workflow
+
+```bash
+# Edit code locally, rebuild and restart
+docker-compose down
+docker-compose up -d --build
+
+# View live logs during testing
+docker-compose logs -f windmill-git-sync
+
+# Access container shell for debugging
+docker-compose exec windmill-git-sync /bin/bash
+
+# Inspect workspace directory
+docker-compose exec windmill-git-sync ls -la /workspace
+```
+
+## Environment Configuration
+
+All configuration is done via `.env` file (copy from `.env.example`). Required variables:
+
+- `WINDMILL_TOKEN`: API token from Windmill for workspace access
+- `WORKSPACE_VOLUME`: External Docker volume name for persistent workspace storage (default: `windmill-workspace-data`)
+- `GIT_REMOTE_URL`: HTTPS URL of Git repository (e.g., `https://github.com/user/repo.git`)
+- `GIT_TOKEN`: Personal Access Token with repo write permissions
+
+### Docker Compose Integration
+
+The `docker-compose.yml` file contains a service definition meant to be **added to your existing Windmill docker-compose file**, not run standalone. The service:
+- Does not declare its own network (uses the implicit network from the parent compose file)
+- Assumes a Windmill service named `windmill_server` exists in the same compose file
+- Uses `depends_on: windmill_server` to ensure proper startup order
+- Requires an external Docker volume specified in `WORKSPACE_VOLUME` env var (created via `docker volume create windmill-workspace-data`)
+
+## Code Structure
+
+```
+app/
+├── server.py    # Flask application with /health and /sync endpoints
+└── sync.py      # Core sync logic (wmill pull → git commit → push)
+```
+
+### Important Functions
+
+- `sync.sync_windmill_to_git()`: Main entry point for sync operation. Returns dict with `success` bool and `message` string.
+- `sync.validate_config()`: Checks required env vars are set. Raises ValueError if missing.
+- `sync.run_wmill_sync()`: Executes `wmill sync pull` command with proper environment variables.
+- `sync.commit_and_push_changes()`: Stages all changes, commits with automated message, and pushes to remote.
+
+### Error Handling
+
+The sync engine uses a try/except pattern that always returns a result dict, never raises to the web server. This ensures webhook requests always get a proper HTTP response with error details in JSON.
+
+## Git Workflow
+
+When making changes to this codebase:
+
+1. Changes are tracked in the project's own Git repository (not the Windmill workspace backup repo)
+2. The service manages commits to the **remote backup repository** specified in `GIT_REMOTE_URL`
+3. Commits to the backup repo use the automated format: "Automated Windmill workspace backup - {workspace_name}"
+
+## Network Architecture
+
+This service is designed to be added to your existing Windmill docker-compose file. When added, all services share the same Docker Compose network automatically.
+
+Expected service topology within the same docker-compose file:
+
+```
+Services in docker-compose.yml:
+├── windmill_server (Windmill API server on port 8000)
+├── windmill_worker (Windmill workers)
+├── postgres (Database)
+└── windmill-git-sync (this service on port 8080)
+```
+
+The service references `windmill_server` via `WINDMILL_BASE_URL=http://windmill_server:8000`. If your Windmill server service has a different name, update `WINDMILL_BASE_URL` in `.env`.
+
+## Extending the Service
+
+### Adding Scheduled Syncs
+
+To add cron-based scheduling in addition to webhooks:
+
+1. Install `APScheduler` in `requirements.txt`
+2. Add scheduler initialization in `server.py`
+3. Update configuration to support `SYNC_SCHEDULE` env var (e.g., `0 */6 * * *` for every 6 hours)
+
+### Adding Slack/Discord Notifications
+
+To notify on sync completion:
+
+1. Add `slack-sdk` or `discord-webhook` to `requirements.txt`
+2. Add notification function in `sync.py`
+3. Call notification function in `sync_windmill_to_git()` after successful push
+4. Add webhook URL as env var in `.env` and `docker-compose.yml`
+
+### Supporting SSH Authentication
+
+To support SSH keys instead of PAT:
+
+1. Update `docker-compose.yml` to mount SSH key: `~/.ssh/id_rsa:/root/.ssh/id_rsa:ro`
+2. Add logic in `sync.get_authenticated_url()` to detect SSH vs HTTPS URLs
+3. Configure Git to use SSH: `git config core.sshCommand "ssh -i /root/.ssh/id_rsa"`
diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 0000000..c2e99bd
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,29 @@
+FROM python:3.11-slim
+
+WORKDIR /app
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    git \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install wmill CLI
+RUN curl -L https://github.com/windmill-labs/windmill/releases/latest/download/wmill-linux-amd64 -o /usr/local/bin/wmill \
+    && chmod +x /usr/local/bin/wmill
+
+# Copy requirements and install Python dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy application code
+COPY app/ ./app/
+
+# Create workspace directory
+RUN mkdir -p /workspace
+
+# Expose port for webhook server
+EXPOSE 8080
+
+# Run the Flask server
+CMD ["python", "-u", "app/server.py"]
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..258d961
--- /dev/null
+++ b/README.md
@@ -0,0 +1,91 @@
+# Windmill Git Sync
+
+A containerized service for syncing Windmill workspaces to Git repositories via webhook triggers.
+
+## Overview
+
+This service provides automated backup of Windmill workspaces to Git. It runs a lightweight Flask web server that responds to webhook requests from Windmill, syncing the workspace content using the `wmill` CLI and pushing changes to a remote Git repository.
+
+## Features
+
+- **Webhook-triggered sync**: Windmill can trigger backups via HTTP POST requests
+- **Dockerized**: Runs as a container in the same network as Windmill
+- **Git integration**: Automatic commits and pushes to remote repository
+- **Authentication**: Supports Personal Access Token (PAT) authentication for Git
+- **Health checks**: Built-in health endpoint for monitoring
+
+## Quick Start
+
+This service is designed to be added to your existing Windmill docker-compose file.
+
+1. Copy the example environment file:
+   ```bash
+   cp .env.example .env
+   ```
+
+2. Edit `.env` with your configuration:
+   - Set `WINDMILL_TOKEN` to your Windmill API token
+   - Set `GIT_REMOTE_URL` to your Git repository URL
+   - Set `GIT_TOKEN` to your Git Personal Access Token
+   - Set `WORKSPACE_VOLUME` to an external Docker volume name
+
+3. Create the external volume:
+   ```bash
+   docker volume create windmill-workspace-data
+   ```
+
+4. Add the `windmill-git-sync` service block from `docker-compose.yml` to your existing Windmill docker-compose file.
+
+5. Build and start the service:
+   ```bash
+   docker-compose up -d windmill-git-sync
+   ```
+
+6. Trigger a sync from Windmill (see Integration section below) or test from another container:
+   ```bash
+   docker-compose exec windmill_server curl -X POST http://windmill-git-sync:8080/sync
+   ```
+
+## Configuration
+
+All configuration is done via environment variables in `.env`:
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `WINDMILL_BASE_URL` | Yes | URL of Windmill instance (e.g., `http://windmill:8000`) |
+| `WINDMILL_TOKEN` | Yes | Windmill API token for authentication |
+| `WINDMILL_WORKSPACE` | No | Workspace name (default: `default`) |
+| `WORKSPACE_VOLUME` | Yes | External Docker volume name for workspace data |
+| `GIT_REMOTE_URL` | Yes | HTTPS Git repository URL |
+| `GIT_TOKEN` | Yes | Git Personal Access Token |
+| `GIT_BRANCH` | No | Branch to push to (default: `main`) |
+| `GIT_USER_NAME` | No | Git commit author name |
+| `GIT_USER_EMAIL` | No | Git commit author email |
+
+## API Endpoints
+
+This service is only accessible within the Docker network (not exposed to the host).
+
+- `GET /health` - Health check endpoint
+- `POST /sync` - Trigger a workspace sync to Git
+
+## Integration with Windmill
+
+Create a scheduled flow or script in Windmill to trigger backups:
+
+```typescript
+export async function main() {
+  const response = await fetch('http://windmill-git-sync:8080/sync', {
+    method: 'POST'
+  });
+  return await response.json();
+}
+```
+
+## Development
+
+See [CLAUDE.md](CLAUDE.md) for development instructions and architecture details.
+
+## License
+
+MIT
diff --git a/app/server.py b/app/server.py
new file mode 100644
index 0000000..34447dc
--- /dev/null
+++ b/app/server.py
@@ -0,0 +1,54 @@
+#!/usr/bin/env python3
+"""
+Flask server for receiving webhook triggers from Windmill to sync workspace to Git.
+Internal service - not exposed outside Docker network.
+"""
+import logging
+from flask import Flask, jsonify
+from sync import sync_windmill_to_git
+
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+
+app = Flask(__name__)
+
+
+@app.route('/health', methods=['GET'])
+def health():
+    """Health check endpoint."""
+    return jsonify({'status': 'healthy'}), 200
+
+
+@app.route('/sync', methods=['POST'])
+def trigger_sync():
+    """
+    Trigger a sync from Windmill workspace to Git repository.
+    This endpoint is only accessible within the Docker network.
+    """
+    logger.info("Sync triggered via webhook")
+
+    try:
+        result = sync_windmill_to_git()
+
+        if result['success']:
+            logger.info(f"Sync completed successfully: {result['message']}")
+            return jsonify(result), 200
+        else:
+            logger.error(f"Sync failed: {result['message']}")
+            return jsonify(result), 500
+
+    except Exception as e:
+        logger.exception("Unexpected error during sync")
+        return jsonify({
+            'success': False,
+            'message': f'Sync failed with error: {str(e)}'
+        }), 500
+
+
+if __name__ == '__main__':
+    logger.info("Starting Windmill Git Sync server on port 8080")
+    app.run(host='0.0.0.0', port=8080, debug=False)
diff --git a/app/sync.py b/app/sync.py
new file mode 100644
index 0000000..20714af
--- /dev/null
+++ b/app/sync.py
@@ -0,0 +1,176 @@
+#!/usr/bin/env python3
+"""
+Core sync logic for pulling Windmill workspace and pushing to Git.
+"""
+import os
+import subprocess
+import logging
+from pathlib import Path
+from git import Repo, GitCommandError
+
+logger = logging.getLogger(__name__)
+
+# Configuration from environment variables
+WORKSPACE_DIR = Path('/workspace')
+WINDMILL_BASE_URL = os.getenv('WINDMILL_BASE_URL', 'http://windmill:8000')
+WINDMILL_TOKEN = os.getenv('WINDMILL_TOKEN', '')
+WINDMILL_WORKSPACE = os.getenv('WINDMILL_WORKSPACE', 'default')
+GIT_REMOTE_URL = os.getenv('GIT_REMOTE_URL', '')
+GIT_TOKEN = os.getenv('GIT_TOKEN', '')
+GIT_BRANCH = os.getenv('GIT_BRANCH', 'main')
+GIT_USER_NAME = os.getenv('GIT_USER_NAME', 'Windmill Git Sync')
+GIT_USER_EMAIL = os.getenv('GIT_USER_EMAIL', 'windmill@example.com')
+
+
+def validate_config():
+    """Validate required configuration is present."""
+    missing = []
+
+    if not WINDMILL_TOKEN:
+        missing.append('WINDMILL_TOKEN')
+    if not GIT_REMOTE_URL:
+        missing.append('GIT_REMOTE_URL')
+    if not GIT_TOKEN:
+        missing.append('GIT_TOKEN')
+
+    if missing:
+        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
+
+
+def get_authenticated_url(url: str, token: str) -> str:
+    """Insert token into HTTPS Git URL for authentication."""
+    if url.startswith('https://'):
+        # Format: https://TOKEN@github.com/user/repo.git
+        return url.replace('https://', f'https://{token}@')
+    return url
+
+
+def run_wmill_sync():
+    """Run wmill sync to pull workspace from Windmill."""
+    logger.info(f"Syncing Windmill workspace '{WINDMILL_WORKSPACE}' from {WINDMILL_BASE_URL}")
+
+    env = os.environ.copy()
+    env['WM_BASE_URL'] = WINDMILL_BASE_URL
+    env['WM_TOKEN'] = WINDMILL_TOKEN
+    env['WM_WORKSPACE'] = WINDMILL_WORKSPACE
+
+    try:
+        # Run wmill sync in the workspace directory
+        result = subprocess.run(
+            ['wmill', 'sync', 'pull', '--yes'],
+            cwd=WORKSPACE_DIR,
+            env=env,
+            capture_output=True,
+            text=True,
+            check=True
+        )
+
+        logger.info("Windmill sync completed successfully")
+        logger.debug(f"wmill output: {result.stdout}")
+
+        return True
+
+    except subprocess.CalledProcessError as e:
+        logger.error(f"wmill sync failed: {e.stderr}")
+        raise RuntimeError(f"Failed to sync from Windmill: {e.stderr}")
+
+
+def init_or_update_git_repo():
+    """Initialize Git repository or open existing one."""
+    git_dir = WORKSPACE_DIR / '.git'
+
+    if git_dir.exists():
+        logger.info("Opening existing Git repository")
+        repo = Repo(WORKSPACE_DIR)
+    else:
+        logger.info("Initializing new Git repository")
+        repo = Repo.init(WORKSPACE_DIR)
+
+    # Configure user
+    repo.config_writer().set_value("user", "name", GIT_USER_NAME).release()
+    repo.config_writer().set_value("user", "email", GIT_USER_EMAIL).release()
+
+    return repo
+
+
+def commit_and_push_changes(repo: Repo):
+    """Commit changes and push to remote Git repository."""
+    # Check if there are any changes
+    if not repo.is_dirty(untracked_files=True):
+        logger.info("No changes to commit")
+        return False
+
+    # Stage all changes
+    repo.git.add(A=True)
+
+    # Create commit
+    commit_message = f"Automated Windmill workspace backup - {WINDMILL_WORKSPACE}"
+    repo.index.commit(commit_message)
+    logger.info(f"Created commit: {commit_message}")
+
+    # Configure remote with authentication
+    authenticated_url = get_authenticated_url(GIT_REMOTE_URL, GIT_TOKEN)
+
+    try:
+        # Check if remote exists
+        if 'origin' in [remote.name for remote in repo.remotes]:
+            origin = repo.remote('origin')
+            origin.set_url(authenticated_url)
+        else:
+            origin = repo.create_remote('origin', authenticated_url)
+
+        # Push to remote
+        logger.info(f"Pushing to {GIT_REMOTE_URL} (branch: {GIT_BRANCH})")
+        origin.push(refspec=f'HEAD:{GIT_BRANCH}', force=False)
+        logger.info("Push completed successfully")
+
+        return True
+
+    except GitCommandError as e:
+        logger.error(f"Git push failed: {str(e)}")
+        raise RuntimeError(f"Failed to push to Git remote: {str(e)}")
+
+
+def sync_windmill_to_git():
+    """
+    Main sync function: pulls from Windmill, commits, and pushes to Git.
+
+    Returns:
+        dict: Result with 'success' boolean and 'message' string
+    """
+    try:
+        # Validate configuration
+        validate_config()
+
+        # Pull from Windmill
+        run_wmill_sync()
+
+        # Initialize/update Git repo
+        repo = init_or_update_git_repo()
+
+        # Commit and push changes
+        has_changes = commit_and_push_changes(repo)
+
+        if has_changes:
+            message = f"Successfully synced workspace '{WINDMILL_WORKSPACE}' to Git"
+        else:
+            message = "Sync completed - no changes to commit"
+
+        return {
+            'success': True,
+            'message': message
+        }
+
+    except Exception as e:
+        logger.exception("Sync failed")
+        return {
+            'success': False,
+            'message': str(e)
+        }
+
+
+if __name__ == '__main__':
+    # Allow running sync directly for testing
+    logging.basicConfig(level=logging.INFO)
+    result = sync_windmill_to_git()
+    print(result)
diff --git a/docker-compose.yml b/docker-compose.yml
new file mode 100644
index 0000000..41707ee
--- /dev/null
+++ b/docker-compose.yml
@@ -0,0 +1,25 @@
+services:
+  # ... existing Windmill services (windmill_server, windmill_worker, postgres, etc.) ...
+
+  windmill-git-sync:
+    build: .
+    container_name: windmill-git-sync
+    expose:
+      - "8080"
+    volumes:
+      - ${WORKSPACE_VOLUME}:/workspace
+    environment:
+      # Windmill connection
+      - WINDMILL_BASE_URL=http://windmill_server:8000
+      - WINDMILL_TOKEN=${WINDMILL_TOKEN}
+      - WINDMILL_WORKSPACE=${WINDMILL_WORKSPACE:-default}
+
+      # Git configuration
+      - GIT_REMOTE_URL=${GIT_REMOTE_URL}
+      - GIT_TOKEN=${GIT_TOKEN}
+      - GIT_BRANCH=${GIT_BRANCH:-main}
+      - GIT_USER_NAME=${GIT_USER_NAME:-Windmill Git Sync}
+      - GIT_USER_EMAIL=${GIT_USER_EMAIL:-windmill@example.com}
+    restart: unless-stopped
+    depends_on:
+      - windmill_server
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..fb5b700
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,4 @@
+Flask==3.0.0
+GitPython==3.1.40
+requests==2.31.0
+python-dotenv==1.0.0
diff --git a/setup.sh b/setup.sh
new file mode 100755
index 0000000..7077563
--- /dev/null
+++ b/setup.sh
@@ -0,0 +1,31 @@
+#!/bin/bash
+# Setup script for windmill-git-sync
+
+set -e
+
+echo "Setting up Windmill Git Sync..."
+
+# Create .env file if it doesn't exist
+if [ ! -f .env ]; then
+    echo "Creating .env file from template..."
+    cp .env.example .env
+    echo "⚠️ Please edit .env with your configuration"
+else
+    echo "✓ .env file already exists"
+fi
+
+# Create Docker volume if it doesn't exist
+if ! docker volume inspect windmill-workspace-data >/dev/null 2>&1; then
+    echo "Creating windmill-workspace-data Docker volume..."
+    docker volume create windmill-workspace-data
+    echo "✓ Volume created"
+else
+    echo "✓ windmill-workspace-data already exists"
+fi
+
+echo ""
+echo "Setup complete! Next steps:"
+echo "1. Edit .env with your Windmill and Git configuration"
+echo "2. Add the windmill-git-sync service block from docker-compose.yml to your Windmill docker-compose file"
+echo "3. Run: docker-compose up -d windmill-git-sync"
+echo "4. Test from within Docker network: docker-compose exec windmill_server curl -X POST http://windmill-git-sync:8080/sync"
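
Note on the PAT handling in `app/sync.py`: `get_authenticated_url()` builds the `https://TOKEN@github.com/user/repo.git` form with a plain string replace, and the error path returns `str(e)` to the webhook caller. The following is a minimal sketch, not part of this commit, of how the same URL could be built with `urllib.parse` and how the token could be masked before an error string leaves the service. The names `build_authenticated_url` and `redact_token` are hypothetical, and whether Git error text actually contains the token depends on the Git version and the failure, so treat the redaction step as a precaution rather than a known fix.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def build_authenticated_url(url: str, token: str) -> str:
    """Return `url` with `token` as the userinfo part (HTTPS URLs only).

    Hypothetical helper: same intent as get_authenticated_url() in the
    commit, but it replaces any existing credentials and percent-encodes
    the token instead of using str.replace().
    """
    parts = urlsplit(url)
    if parts.scheme != "https" or not token:
        # Non-HTTPS remotes (e.g. SSH-style URLs) are returned unchanged.
        return url
    # Strip credentials that are already present so the token is not doubled.
    host = parts.netloc.rpartition("@")[2]
    netloc = f"{quote(token, safe='')}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def redact_token(text: str, token: str) -> str:
    """Mask the token in log or error text before returning it to a caller."""
    return text.replace(token, "***") if token else text


if __name__ == "__main__":
    authed = build_authenticated_url("https://github.com/user/repo.git", "example-token")
    print(authed)
    print(redact_token(f"push to {authed} failed", "example-token"))
```

Under these assumptions, `commit_and_push_changes()` could wrap its `RuntimeError` message in `redact_token(str(e), GIT_TOKEN)`. Note that `origin.set_url(authenticated_url)` still writes the token into `.git/config` on the workspace volume, which this sketch does not change.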