Celery Alternative Example
This example demonstrates how to replace Celery with Gunicorn's dirty arbiters for background task processing, using async ASGI for non-blocking HTTP handling.
Why Use This Instead of Celery?
The Problem with Celery
Celery requires:
- An external message broker (Redis or RabbitMQ)
- Separate worker processes (`celery -A app worker`)
- Stateless workers that reload models/connections on every task
- Polling or WebSockets for progress updates
What Dirty Arbiters Provide
| Feature | Celery | Dirty Arbiters |
|---|---|---|
| External broker | Required (Redis/RabbitMQ) | None - uses Unix sockets |
| Deployment | Multiple processes | Single gunicorn command |
| Worker state | Stateless | Stateful - keep ML models, DB connections loaded |
| Progress updates | Polling or WebSocket | Native streaming |
| HTTP blocking | N/A (separate process) | Non-blocking with async ASGI |
When to Use Dirty Arbiters
Good fit:
- Tasks that benefit from keeping state (ML models, DB connection pools, caches)
- Tasks where you want immediate results (not fire-and-forget)
- Real-time progress streaming
- Simpler deployment without external dependencies
Not ideal for:
- True fire-and-forget queuing with persistence
- Distributed task execution across multiple machines
- Tasks that must survive server restarts
How It Works
```
Gunicorn Master
├── ASGI Workers (uvloop)
│     Non-blocking! One worker handles many requests;
│     await client.execute_async() doesn't block
│                 │
│                 │  Unix socket IPC
│                 ▼
└── Dirty Arbiter
      └── Dirty Workers (stateful)
            ├── EmailWorker  (2 procs)  SMTP conn kept alive
            ├── ImageWorker  (2 procs)  PIL loaded in memory
            ├── DataWorker   (4 procs)  DB pool cached
            └── ...
```
Key insight: the HTTP workers use async I/O, so `await client.execute_async()` suspends only the current request and hands control back to the event loop. While a dirty worker runs the task, the same ASGI worker keeps serving other requests, so one worker can have many (potentially thousands of) requests in flight at once.
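To see why awaiting a slow call does not stall other requests, here is a stdlib-only simulation of the pattern. `fake_execute_async` is a hypothetical stand-in for `client.execute_async()` that just sleeps, as if a dirty worker were busy; it is not the gunicorn API. Fifty concurrent "requests" share one event loop, so total wall time stays close to one call's latency instead of fifty times it.

```python
import asyncio
import time


async def fake_execute_async(app_path: str, action: str, delay: float) -> dict:
    """Hypothetical stand-in for client.execute_async(): waits without blocking the loop."""
    await asyncio.sleep(delay)  # the event loop runs other tasks meanwhile
    return {"app": app_path, "action": action, "status": "done"}


async def handle_request(request_id: int) -> dict:
    # Each simulated handler awaits the slow task, like the ASGI endpoints here
    result = await fake_execute_async("tasks:EmailWorker", "send_email", delay=0.2)
    return {**result, "request": request_id}


async def main(n_requests: int = 50) -> tuple[int, float]:
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request(i) for i in range(n_requests)))
    return len(results), time.perf_counter() - start


if __name__ == "__main__":
    count, elapsed = asyncio.run(main())
    # Expect roughly 0.2s here, not 50 * 0.2s
    print(f"{count} requests finished in {elapsed:.2f}s")
```

A blocking call (for example `time.sleep` inside the handler) would serialize the same fifty requests, which is the situation the async ASGI workers avoid.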
Quick Start
Local Development
```bash
# Install dependencies
pip install fastapi uvloop httpx pytest pytest-asyncio
pip install -e ../..  # Install gunicorn from source

# Run the application
gunicorn -c gunicorn_conf.py app:app

# In another terminal, test it
curl http://localhost:8000/health
curl -X POST http://localhost:8000/api/email/send \
  -H "Content-Type: application/json" \
  -d '{"to": "test@example.com", "subject": "Hello", "body": "World"}'

# Interactive API docs (open is macOS-only; use xdg-open on Linux)
open http://localhost:8000/docs
```
Docker
```bash
# Build and run
docker compose up --build

# Run with tests
docker compose --profile test up --build --abort-on-container-exit
```
Task Workers
Each worker class maintains state across requests:
EmailWorker (2 workers)
- Keeps SMTP connection alive
- `send_email(to, subject, body)` - Send single email
- `send_bulk_emails(recipients, subject, body)` - Bulk send with streaming progress
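The streaming client later in this README expects events shaped like `{"type": "progress", "percent": ...}` followed by `{"type": "complete", "sent": ...}`. A minimal sketch of a bulk-send generator producing that shape might look like this. It is not the example's actual `EmailWorker` code: the `send_one` callable is a hypothetical injection point standing in for the kept-alive SMTP connection, so the sketch runs without a mail server.

```python
from typing import Callable, Iterator


def send_bulk_emails(
    recipients: list[str],
    subject: str,
    body: str,
    send_one: Callable[[str, str, str], None],
) -> Iterator[dict]:
    """Send to each recipient and yield progress events (hypothetical sketch).

    send_one stands in for a call on a long-lived SMTP connection.
    """
    total = len(recipients)
    sent = 0
    failed = []
    for i, to in enumerate(recipients, start=1):
        try:
            send_one(to, subject, body)
            sent += 1
        except Exception as exc:  # keep going; report failures at the end
            failed.append({"to": to, "error": str(exc)})
        # Integer percentage of recipients processed so far
        yield {"type": "progress", "current": i, "total": total, "percent": i * 100 // total}
    yield {"type": "complete", "sent": sent, "failed": failed}


if __name__ == "__main__":
    outbox: list[str] = []
    for event in send_bulk_emails(
        ["a@x.com", "b@x.com", "c@x.com"], "Newsletter", "Hello!",
        send_one=lambda to, subject, body: outbox.append(to),
    ):
        print(event)
```

Because a failure on one address is recorded rather than raised, a single bad recipient does not abort the rest of the batch or the stream.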
ImageWorker (2 workers)
- Keeps PIL/image libraries loaded
- `resize(image_data, width, height)` - Resize image
- `process_batch(images, operation)` - Batch process with streaming
DataWorker (4 workers)
- Maintains DB connection pool and query cache
- `aggregate(data, group_by, agg_field)` - Aggregate data
- `etl_pipeline(source_data, transformations)` - ETL with streaming progress
- `cached_query(query_key, ttl)` - Query with in-memory caching
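`cached_query(query_key, ttl)` depends on state that survives between calls, which a stateless task cannot keep. Here is a minimal sketch of a per-process TTL cache such a method could use. The `QueryCache` class and its `compute` callback are hypothetical names for illustration, not part of the example code; the clock is injectable so expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable


class QueryCache:
    """In-memory cache whose entries expire after ttl seconds (hypothetical sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: str, ttl: float, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: recompute and store with a fresh expiry time
        self.misses += 1
        value = compute()
        self._entries[key] = (now + ttl, value)
        return value
```

In a stateful worker, the cache would be created once in `init()` and consulted on every call, so repeated queries within the TTL skip the database. Each dirty worker process would hold its own copy, so hit rates depend on how requests are spread across the four `DataWorker` processes.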
ScheduledWorker (1 worker)
- For periodic tasks (call from cron)
- `cleanup_old_files(directory, max_age_days)`
- `generate_daily_report()`
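`cleanup_old_files(directory, max_age_days)` is the kind of task a cron job would trigger through the `/api/scheduled/*` endpoints. A plausible stdlib-only body is sketched below; it deletes regular files directly inside one directory (not recursively) whose modification time is older than the cutoff. The `now` parameter and the return shape are assumptions added for testability, and the real worker may behave differently.

```python
import time
from pathlib import Path
from typing import Optional


def cleanup_old_files(directory: str, max_age_days: float, now: Optional[float] = None) -> dict:
    """Delete files in `directory` older than `max_age_days` (hypothetical sketch).

    Only regular files directly inside the directory are considered.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in Path(directory).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return {"directory": directory, "removed": sorted(removed), "count": len(removed)}
```

Returning the list of removed names gives the cron caller something concrete to log.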
Streaming Progress Example
Real-time progress without polling:
```python
import asyncio
import json

import httpx


async def main():
    # timeout=None disables httpx's default 5s timeout for long-running streams
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST",
            "http://localhost:8000/api/email/send-bulk",
            json={
                "recipients": ["a@x.com", "b@x.com", "c@x.com"],
                "subject": "Newsletter",
                "body": "Hello!",
            },
        ) as response:
            # Each SSE message arrives as a "data: {...}" line
            async for line in response.aiter_lines():
                if line.startswith("data: "):
                    progress = json.loads(line[6:])
                    if progress["type"] == "progress":
                        print(f"Progress: {progress['percent']}%")
                    elif progress["type"] == "complete":
                        print(f"Done! Sent: {progress['sent']}")


asyncio.run(main())
```
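The client keeps lines that start with `data: ` and decodes the rest as JSON. The matching server side only has to frame each yielded progress dict as a Server-Sent Events message. Below is a minimal sketch of that framing, plus a parser that mirrors the client loop; `to_sse` and `parse_sse_lines` are hypothetical helper names. In a FastAPI app, framed strings like these would typically be returned through `StreamingResponse(..., media_type="text/event-stream")`.

```python
import json
from typing import Iterable, Iterator


def to_sse(events: Iterable[dict]) -> Iterator[str]:
    """Frame each event dict as one SSE message: a data line plus a blank line."""
    for event in events:
        # json.dumps without indent emits no newlines, so the payload fits on one line
        yield f"data: {json.dumps(event)}\n\n"


def parse_sse_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Mirror of the client loop: keep 'data: ' lines and decode their JSON payload."""
    for line in lines:
        if line.startswith("data: "):
            yield json.loads(line[len("data: "):])


if __name__ == "__main__":
    progress = [
        {"type": "progress", "percent": 50},
        {"type": "progress", "percent": 100},
        {"type": "complete", "sent": 2},
    ]
    wire = "".join(to_sse(progress))
    for event in parse_sse_lines(wire.splitlines()):
        print(event)
```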
Celery Migration Guide
Before (Celery)
```python
# tasks.py
import smtplib

from celery import Celery

app = Celery('tasks', broker='redis://localhost')


@app.task
def send_email(to, subject, body):
    smtp = smtplib.SMTP(...)  # New connection every task!
    smtp.send_message(...)
    return {"status": "sent"}


@app.task(bind=True)
def send_bulk(self, recipients, subject, body):
    for i, to in enumerate(recipients):
        send_email(to, subject, body)
        self.update_state(state='PROGRESS', meta={'current': i})  # Requires polling!
```

```python
# views.py - Flask
from flask import Flask, request

from tasks import send_email

app = Flask(__name__)


@app.route('/send', methods=['POST'])
def send_view():
    data = request.get_json()
    send_email.delay(data['to'], data['subject'], data['body'])  # Fire and forget
    return {"status": "queued"}  # Can't get result without polling
```
After (Dirty Arbiters)
```python
# tasks.py
import smtplib

from gunicorn.dirty.app import DirtyApp


class EmailWorker(DirtyApp):
    workers = 2

    def init(self):
        self.smtp = smtplib.SMTP(...)  # Connected once, reused!

    def __call__(self, action, *args, **kwargs):
        # Dispatch the action name to the method of the same name
        return getattr(self, action)(*args, **kwargs)

    def send_email(self, to, subject, body):
        self.smtp.send_message(...)  # Reuses connection
        return {"status": "sent"}

    def send_bulk(self, recipients, subject, body):
        for i, to in enumerate(recipients):
            self.send_email(to, subject, body)
            yield {"type": "progress", "current": i}  # Native streaming!
```

```python
# views.py - FastAPI (async)
from fastapi import FastAPI
from pydantic import BaseModel

from gunicorn.dirty import get_dirty_client_async

app = FastAPI()


class EmailRequest(BaseModel):
    to: str
    subject: str
    body: str


@app.post('/send')
async def send_view(data: EmailRequest):
    client = await get_dirty_client_async()
    # Non-blocking! Other requests handled while waiting
    result = await client.execute_async("tasks:EmailWorker", "send_email", ...)
    return result  # Immediate result, no polling!
```
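The `__call__` method in the migrated worker dispatches on the action name with a bare `getattr`, which would also let a caller invoke `init` or any other attribute. A slightly stricter variant checks the name against an explicit allow-list first. The sketch below uses plain stand-in classes so it runs without gunicorn or an SMTP server; `EmailWorkerSketch`, `FakeSMTP` and `ALLOWED_ACTIONS` are hypothetical names, not part of the `DirtyApp` API.

```python
from typing import Any, Iterator


class FakeSMTP:
    """Hypothetical stand-in for smtplib.SMTP so the sketch runs offline."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def send_to(self, to: str) -> None:
        self.sent.append(to)


class EmailWorkerSketch:
    """Plain-class stand-in for a DirtyApp subclass (no gunicorn import)."""

    # Hypothetical convention: only these names may be invoked via __call__
    ALLOWED_ACTIONS = frozenset({"send_email", "send_bulk"})

    def init(self) -> None:
        self.smtp = FakeSMTP()  # created once, reused by every action

    def __call__(self, action: str, *args: Any, **kwargs: Any) -> Any:
        if action not in self.ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        return getattr(self, action)(*args, **kwargs)

    def send_email(self, to: str, subject: str, body: str) -> dict:
        self.smtp.send_to(to)
        return {"status": "sent", "to": to}

    def send_bulk(self, recipients: list[str], subject: str, body: str) -> Iterator[dict]:
        # A generator: each yield is one streamed progress message
        for i, to in enumerate(recipients):
            self.send_email(to, subject, body)
            yield {"type": "progress", "current": i}
```

Since the worker is an ordinary class, its actions can be exercised directly in unit tests without starting a server.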
Configuration
```python
# gunicorn_conf.py

# ASGI workers for non-blocking HTTP
worker_class = "asgi"
asgi_loop = "uvloop"
workers = 4

# Dirty workers (replace Celery)
dirty_apps = [
    "tasks:EmailWorker",
    "tasks:ImageWorker",
    "tasks:DataWorker",
]
dirty_workers = 9
dirty_timeout = 300
```
Running Tests
```bash
# Unit tests (no server needed)
pytest tests/test_tasks.py -v

# Integration tests (server must be running)
APP_URL=http://localhost:8000 pytest tests/test_integration.py -v

# All tests via Docker
docker compose --profile test up --build --abort-on-container-exit
```
API Endpoints
Visit /docs for interactive Swagger documentation.
| Endpoint | Method | Description |
|---|---|---|
| `/api/email/send` | POST | Send single email |
| `/api/email/send-bulk` | POST | Bulk send (SSE streaming) |
| `/api/image/resize` | POST | Resize image |
| `/api/image/process-batch` | POST | Batch process (SSE streaming) |
| `/api/data/aggregate` | POST | Aggregate data |
| `/api/data/etl` | POST | ETL pipeline (SSE streaming) |
| `/api/data/query` | POST | Cached query |
| `/api/scheduled/*` | POST | Scheduled tasks |
| `/health` | GET | Health check |