mirror of https://github.com/frappe/gunicorn.git
synced 2026-07-03 03:01:31 +08:00
feat(examples): add FastAPI embedding service with Docker testing
Add a complete example demonstrating dirty workers with sentence-transformers for text embeddings via FastAPI:
- EmbeddingApp DirtyApp that loads and manages the ML model
- FastAPI endpoints for /embed and /health
- Docker and docker-compose configuration
- Integration tests with numpy similarity checks
- GitHub Actions CI workflow
parent ce2e06ceba
commit 0e05c824e9
42
.github/workflows/embedding-integration.yml
vendored
Normal file
@@ -0,0 +1,42 @@
name: Embedding Service Integration Tests

on:
  push:
    paths:
      - 'examples/embedding_service/**'
      - 'gunicorn/dirty/**'
  pull_request:
    paths:
      - 'examples/embedding_service/**'
      - 'gunicorn/dirty/**'

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4

      - name: Build and start service
        run: |
          cd examples/embedding_service
          docker compose up -d --build
          docker compose logs -f &

      - name: Wait for healthy
        run: |
          for i in {1..30}; do
            curl -s http://127.0.0.1:8000/health && break
            sleep 2
          done

      - name: Run tests
        run: |
          pip install requests numpy
          python examples/embedding_service/test_embedding.py

      - name: Cleanup
        if: always()
        run: |
          cd examples/embedding_service
          docker compose down

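The `Wait for healthy` step above polls `/health` up to 30 times, but the shell loop finishes with a zero exit status even if the service never answers, so a failed startup only surfaces later in the test step. The following is a minimal Python sketch of the same polling that fails loudly instead. `http_probe`, `wait_for_health`, and the injectable `probe`/`sleep` parameters are hypothetical names introduced here for illustration; they are not part of the commit.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if `url` answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_for_health(url, attempts=30, delay=2.0, probe=http_probe, sleep=time.sleep):
    """Poll `url` until the probe succeeds; raise TimeoutError if it never does."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt  # how many attempts were needed
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"{url} not healthy after {attempts} attempts")


if __name__ == "__main__":
    # Same address and retry budget as the workflow step (30 tries, 2 s apart).
    wait_for_health("http://127.0.0.1:8000/health")
```

Passing `probe` and `sleep` as parameters is only a convenience so the retry logic can be exercised without a running container.
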
21
examples/embedding_service/Dockerfile
Normal file
@@ -0,0 +1,21 @@
FROM python:3.12-slim

WORKDIR /app

# Install dependencies
RUN pip install --no-cache-dir \
    sentence-transformers \
    fastapi \
    pydantic

# Copy gunicorn source
COPY . /app/gunicorn-src
RUN pip install /app/gunicorn-src

# Copy app
COPY examples/embedding_service /app/embedding_service

ENV PYTHONPATH=/app

EXPOSE 8000
CMD ["gunicorn", "embedding_service.main:app", "-c", "embedding_service/gunicorn_conf.py"]

133
examples/embedding_service/README.md
Normal file
@@ -0,0 +1,133 @@
# Embedding Service Example

A FastAPI-based text embedding service using sentence-transformers, powered by
gunicorn's dirty workers for efficient ML model management.

## Overview

This example demonstrates how to build a production-ready embedding API that:
- Keeps ML models loaded in memory across requests (dirty workers)
- Handles HTTP efficiently with async FastAPI (ASGI workers)
- Provides batch embedding for multiple texts
- Includes Docker-based deployment and testing

## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────────┐
│  HTTP Clients   │────►│  FastAPI (ASGI)  │────►│    DirtyWorker      │
│                 │     │  - /embed        │     │  - sentence-        │
│                 │◄────│  - /health       │◄────│    transformers     │
└─────────────────┘     └──────────────────┘     │  - Model in memory  │
                                                 └─────────────────────┘
```

**Why dirty workers?**
- ML models are expensive to load (several seconds)
- Dirty workers load the model once at startup
- HTTP workers remain lightweight and responsive
- Model stays in memory, serving many requests

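The load-once argument above can be illustrated without gunicorn or a real model. The sketch below is only an illustration under stated assumptions: `SlowModel`, `embed_per_request`, and `LoadOnceApp` are hypothetical stand-ins, with `LoadOnceApp` mirroring the `init`/`embed` layout of `EmbeddingApp`. It counts how often the "model" is constructed under each approach.

```python
class SlowModel:
    """Hypothetical stand-in for an expensive-to-load ML model."""

    loads = 0  # how many times a model has been constructed

    def __init__(self):
        SlowModel.loads += 1

    def encode(self, texts):
        # Fake "embedding": a single length value per text.
        return [[float(len(t))] for t in texts]


def embed_per_request(texts):
    """Anti-pattern: pay the load cost on every call."""
    return SlowModel().encode(texts)


class LoadOnceApp:
    """Load in init(), reuse in embed(), as a long-lived worker would."""

    def init(self):
        self.model = SlowModel()

    def embed(self, texts):
        return self.model.encode(texts)


if __name__ == "__main__":
    for _ in range(3):
        embed_per_request(["hello"])
    print("per-request loads:", SlowModel.loads)

    SlowModel.loads = 0
    app = LoadOnceApp()
    app.init()
    for _ in range(3):
        app.embed(["hello"])
    print("load-once loads:", SlowModel.loads)
```

Three requests cost three loads in the first variant and one load in the second; with a real model each load is the multi-second step the list refers to.
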
## Quick Start

### With Docker (recommended)

```bash
cd examples/embedding_service
docker compose up --build
```

### Local Development

```bash
# Install dependencies
pip install sentence-transformers fastapi pydantic

# Run with gunicorn
gunicorn examples.embedding_service.main:app \
    -c examples/embedding_service/gunicorn_conf.py
```

## API Reference

### POST /embed

Generate embeddings for a list of texts.

**Request:**
```json
{
  "texts": ["Hello world", "Another sentence"]
}
```

**Response:**
```json
{
  "embeddings": [
    [0.123, -0.456, ...],
    [0.789, -0.012, ...]
  ]
}
```

**Example:**
```bash
curl -X POST http://localhost:8000/embed \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Hello world"]}'
```

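The same `POST /embed` call can be made from Python with only the standard library. This is a minimal sketch, assuming the service is reachable at `http://localhost:8000` as in the curl example; `build_embed_request` and `embed` are hypothetical helper names, not part of the example's code.

```python
import json
import urllib.request


def build_embed_request(texts, base_url="http://localhost:8000"):
    """Build the POST /embed request with a JSON body of the documented shape."""
    body = json.dumps({"texts": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, base_url="http://localhost:8000", timeout=30):
    """Send the request and return the list of embedding vectors."""
    req = build_embed_request(texts, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["embeddings"]


if __name__ == "__main__":
    # Requires the service to be running, e.g. via `docker compose up`.
    vectors = embed(["Hello world", "Another sentence"])
    print(len(vectors), "embeddings of dimension", len(vectors[0]))
```

Splitting request construction from sending keeps the payload shape easy to inspect without a live server.
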
### GET /health

Health check endpoint.

**Response:**
```json
{"status": "ok"}
```

## Configuration

Edit `gunicorn_conf.py` to adjust:

| Setting | Default | Description |
|---------|---------|-------------|
| `workers` | 2 | Number of HTTP workers |
| `dirty_workers` | 1 | Number of ML model workers |
| `dirty_timeout` | 60 | Max seconds per inference |
| `bind` | 0.0.0.0:8000 | Listen address |

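Because `gunicorn_conf.py` is an ordinary Python module, the defaults in the table can also be overridden from the environment rather than by editing the file. The sketch below is one possible way to do that, not something the commit contains: the `EMBED_*` variable names and the `int_from_env` helper are hypothetical, while the setting names and defaults are taken from the table.

```python
import os


def int_from_env(name, default, environ=os.environ):
    """Read a positive integer setting from the environment, else use the default."""
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)
    if value < 1:
        raise ValueError(f"{name} must be >= 1, got {value}")
    return value


# Setting names and defaults follow the table above; the EMBED_* variable
# names are hypothetical and chosen only for this sketch.
bind = os.environ.get("EMBED_BIND", "0.0.0.0:8000")
workers = int_from_env("EMBED_WORKERS", 2)
dirty_workers = int_from_env("EMBED_DIRTY_WORKERS", 1)
dirty_timeout = int_from_env("EMBED_DIRTY_TIMEOUT", 60)
```

With this in place, something like `EMBED_DIRTY_WORKERS=2 docker compose up` could change the worker count, provided the variable is also passed through to the container.
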
## Model

Uses [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2):
- 384-dimensional embeddings
- Fast inference (~14K sentences/sec on GPU)
- Good quality for semantic search
- ~90MB download

To use a different model, edit `embedding_app.py`:
```python
self.model = SentenceTransformer('your-model-name')
```

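Semantic search over embeddings like these is commonly done by ranking candidates by cosine similarity to a query vector. The following pure-Python sketch shows that ranking step on made-up 2-dimensional vectors; real vectors from this model would have 384 entries. `cosine_similarity` and `rank_by_similarity` are hypothetical helpers and are not part of the example service.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate indices ordered from most to least similar to `query`."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = [1.0, 0.0]
    candidates = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
    print(rank_by_similarity(query, candidates))
```

In practice the query and candidate vectors would come from the `/embed` endpoint, and a vectorised library would replace the Python loops for large candidate sets.
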
## Testing

Run the integration tests:

```bash
# Start the service first
docker compose up -d

# Run tests
pip install requests numpy
python test_embedding.py
```

## Production Considerations

1. **GPU Support**: Add CUDA to the Dockerfile for faster inference
2. **Scaling**: Increase `dirty_workers` for more concurrent embeddings
3. **Caching**: Add Redis caching for repeated texts
4. **Rate Limiting**: Add FastAPI middleware for rate limiting
5. **Monitoring**: Add Prometheus metrics endpoint

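As an illustration of the caching item above, here is a small in-process LRU cache that only forwards unseen texts to the model. It deliberately swaps Redis for a plain `OrderedDict`, so it is not shared across workers and is a sketch of the idea rather than the suggested deployment. `CachedEmbedder` is a hypothetical name, and `embed_fn` stands for any callable that maps a list of texts to a list of vectors.

```python
from collections import OrderedDict


class CachedEmbedder:
    """Cache embeddings per text and send only cache misses to the model."""

    def __init__(self, embed_fn, max_size=1024):
        self._embed_fn = embed_fn    # callable: list[str] -> list[list[float]]
        self._max_size = max_size
        self._cache = OrderedDict()  # text -> embedding, least recently used first

    def embed(self, texts):
        # Collect unique misses, preserving first-seen order.
        misses = [t for t in dict.fromkeys(texts) if t not in self._cache]
        if misses:
            for text, vector in zip(misses, self._embed_fn(misses)):
                self._cache[text] = vector
        results = []
        for text in texts:
            self._cache.move_to_end(text)  # mark as recently used
            results.append(self._cache[text])
        # Evict least recently used entries beyond the size limit.
        while len(self._cache) > self._max_size:
            self._cache.popitem(last=False)
        return results
```

Batching the misses into one `embed_fn` call keeps the number of model invocations at one per request at most, which matters more than the cache lookup cost when inference dominates.
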
1
examples/embedding_service/__init__.py
Normal file
@@ -0,0 +1 @@
# Embedding service package

13
examples/embedding_service/docker-compose.yml
Normal file
@@ -0,0 +1,13 @@
services:
  embedding-service:
    build:
      context: ../..
      dockerfile: examples/embedding_service/Dockerfile
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/health', timeout=5)"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s  # Model loading time

14
examples/embedding_service/embedding_app.py
Normal file
@@ -0,0 +1,14 @@
from gunicorn.dirty.app import DirtyApp


class EmbeddingApp(DirtyApp):
    def init(self):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer('all-MiniLM-L6-v2')

    def embed(self, texts):
        embeddings = self.model.encode(texts)
        return embeddings.tolist()

    def close(self):
        del self.model

8
examples/embedding_service/gunicorn_conf.py
Normal file
@@ -0,0 +1,8 @@
bind = "0.0.0.0:8000"
workers = 2
worker_class = "asgi"

# Dirty worker config
dirty_apps = ["embedding_service.embedding_app:EmbeddingApp"]
dirty_workers = 1
dirty_timeout = 60

29
examples/embedding_service/main.py
Normal file
@@ -0,0 +1,29 @@
from fastapi import FastAPI
from pydantic import BaseModel
from gunicorn.dirty.client import get_dirty_client

app = FastAPI()


class EmbedRequest(BaseModel):
    texts: list[str]


class EmbedResponse(BaseModel):
    embeddings: list[list[float]]


@app.post("/embed", response_model=EmbedResponse)
async def embed(request: EmbedRequest):
    client = get_dirty_client()
    result = client.execute(
        "embedding_service.embedding_app:EmbeddingApp",
        "embed",
        request.texts
    )
    return EmbedResponse(embeddings=result)


@app.get("/health")
async def health():
    return {"status": "ok"}

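In `main.py` above, `client.execute(...)` is called without `await` inside an `async def` handler. If that call blocks until the dirty worker replies (an assumption; the diff does not show the client's internals), the event loop cannot serve other requests during inference. One general way to keep the loop responsive around a blocking call is `asyncio.to_thread`. The sketch below shows the pattern with a hypothetical stand-in, `blocking_embed`, rather than the real client.

```python
import asyncio
import time


def blocking_embed(texts, delay=0.05):
    """Hypothetical stand-in for a synchronous call to a model worker."""
    time.sleep(delay)  # simulates waiting for the worker's reply
    return [[float(len(t))] for t in texts]


async def embed_handler(texts):
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_embed, texts)


async def main():
    start = time.perf_counter()
    # Two handlers run concurrently; their waits overlap instead of adding up.
    results = await asyncio.gather(embed_handler(["a"]), embed_handler(["bb"]))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.3f}s")
```

`asyncio.to_thread` needs Python 3.9 or newer, which the `python:3.12-slim` base image satisfies. If the dirty client offers a native async interface, that would be the more direct choice.
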
5
examples/embedding_service/requirements.txt
Normal file
@@ -0,0 +1,5 @@
sentence-transformers
fastapi
pydantic
requests
numpy

33
examples/embedding_service/test_embedding.py
Normal file
@@ -0,0 +1,33 @@
import os
import requests
import numpy as np


def test_embedding_endpoint():
    base_url = os.environ.get("EMBEDDING_SERVICE_URL", "http://127.0.0.1:8000")
    url = f"{base_url}/embed"

    # Test single text
    response = requests.post(url, json={"texts": ["Hello world"]})
    assert response.status_code == 200
    data = response.json()
    assert len(data["embeddings"]) == 1
    assert len(data["embeddings"][0]) == 384  # MiniLM dimension

    # Test batch
    texts = ["First sentence", "Second sentence", "Third one"]
    response = requests.post(url, json={"texts": texts})
    assert response.status_code == 200
    data = response.json()
    assert len(data["embeddings"]) == 3

    # Test similarity (same text = same embedding)
    response = requests.post(url, json={"texts": ["test", "test"]})
    emb1, emb2 = response.json()["embeddings"]
    assert np.allclose(emb1, emb2, rtol=1e-5, atol=1e-6)

    print("All tests passed!")


if __name__ == "__main__":
    test_embedding_endpoint()