NORTH · Realtime Avatar API · v8.0
API documentation
Lip-sync avatar videos and live WebRTC streams through one async API. Submit, poll, download, or open a passthrough session and stream audio in for realtime face rendering.
Multi-Viewer Sessions
You can now issue view-only tokens for active realtime sessions, enabling multiple viewers to watch the same avatar stream with zero extra GPU cost. See the new endpoint →
OpenClaw (ClawHub)
The NORTH Realtime Avatar skill lets OpenClaw agents create passthrough realtime sessions (LiveKit) and offline generation jobs via the same HTTP API you use here. See the listing for install steps, env vars (NORTH_API_KEY), and security notes: Atlas Avatar on ClawHub →
Generate photorealistic lip-sync avatar videos through an async job queue. Bring your own speech audio, submit a job, poll for status, then download the result. Generation endpoints return 202 Accepted with a job ID, so you never wait on a synchronous response.
Pre-rendered Videos
Submit audio + image, get an MP4 back. Best for content creation, batch processing, and async workflows.
Live Avatars (WebRTC)
Realtime WebRTC stream. You send audio; we return lip-synced video. Best for agents, support, and live demos.
Realtime (passthrough)
| Rendering, you provide audio | |
|---|---|
| You provide | Face image + audio stream (your STT / LLM / TTS) |
| You get back | Lip-synced WebRTC video |
| GPU rendering | ✓ |
| One-shot, any face image | ✓ |
| No hard max duration | ✓ |
| No quality collapse | ✓ |
| React SDK | ✓ |
| Price | $7/hr (prorated per second) |
How It Works, 3-Step Flow
Submit
POST to a generation endpoint. Returns 202 with job_id
Poll
GET /v1/jobs/{id} until status is completed
Download
GET /v1/jobs/{id}/result for a presigned download URL
https://api.atlasv1.com
Live Examples
See the API in action: try avatar generation and more with working demos.
Authentication
Most API endpoints require an API key via the Authorization header. Public endpoints are GET /, /v1/health, /healthz, SCIM discovery endpoints, and the SSO login/callback flow.
Generate API keys from your dashboard after adding a payment method. For enterprise plans, email eric@northmodellabs.com or book a 30-min call ↗.
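As a minimal sketch of the header described above, the helper below builds (but does not send) an authenticated request. It assumes the base URL shown on this page (https://api.atlasv1.com) and the Bearer <api_key> format listed in the Headers table of the viewer endpoint; authed_request is a hypothetical helper name, not part of any SDK.

```python
import urllib.request

BASE_URL = "https://api.atlasv1.com"  # assumption: base URL shown on this page


def authed_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request carrying the API key as a Bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Example: prepare a call to GET /v1/me; pass req to urllib.request.urlopen to send it.
req = authed_request("/v1/me", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Keep the key on your server; the request object can be passed to urllib.request.urlopen when you are ready to send it.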
/
Returns API info and available endpoints. No authentication required.
/v1/generate
Submit a lip-sync avatar video generation job. Returns immediately with a job ID; poll GET /v1/jobs/{id} for status.
Content-Type: multipart/form-data
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| audio | file | yes | Audio file for lip-sync |
| image | file | yes | Reference face image |
Supported audio: wav, mp3, mpeg, ogg, webm
Supported images: png, jpeg, webp
Max upload: 50 MB combined
Works with any TTS provider. Generate speech audio with ElevenLabs, OpenAI TTS, Deepgram, or any other service, then pass the audio file to this endpoint.
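The request body can be sketched with the standard library alone. The encoder below packs the audio and image fields from the table above into multipart/form-data and rejects uploads over the combined limit before any bytes leave your machine. encode_multipart is a hypothetical helper, and reading "50 MB" as 50 * 1024 * 1024 bytes is an assumption.

```python
import uuid

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" means 50 * 1024 * 1024 bytes


def encode_multipart(files: dict) -> tuple:
    """Encode {field: (filename, data, mime)} as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    total = sum(len(data) for _name, data, _mime in files.values())
    if total > MAX_UPLOAD_BYTES:
        raise ValueError(f"combined upload is {total} bytes, over the 50 MB limit")
    boundary = uuid.uuid4().hex
    parts = []
    for field, (filename, data, mime) in files.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        ).encode()
        parts.append(head + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Placeholder bytes stand in for real file contents.
body, content_type = encode_multipart({
    "audio": ("speech.wav", b"RIFF....", "audio/wav"),
    "image": ("face.png", b"\x89PNG....", "image/png"),
})
```

POST the body to /v1/generate with the returned value as the Content-Type header and your Authorization header; the response should be 202 Accepted with a job ID.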
Offline generation is billed at $7/hour ($0.117/min · $0.0019/sec) of output video duration. See pricing.
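To make the proration concrete, here is a small estimator based on the $7/hour figure above. Rounding to the nearest cent is an assumption for display purposes; the source does not say how the billed amount is rounded, and estimate_cost is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE = Decimal("7")  # USD per hour of output video, from the pricing line above


def estimate_cost(duration_seconds: int) -> Decimal:
    """Estimate the charge for a clip, prorated per second.

    Rounding to cents is an assumption, not documented billing behavior.
    """
    cost = HOURLY_RATE * Decimal(duration_seconds) / Decimal(3600)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(90))    # 90-second clip
print(estimate_cost(3600))  # one hour of output
```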
Response 202 Accepted
Jobs
Every generation endpoint returns a job_id. Use these endpoints to track progress, list history, and download results. Jobs run concurrently on the server, so you can submit multiple jobs and poll them all in parallel.
/v1/jobs
List your recent jobs, newest first. Paginated.
Query Parameters
| Param | Type | Default | Description |
|---|---|---|---|
| limit | int | 20 | Number of results to return (max 100) |
| offset | int | 0 | Number of results to skip |
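
A short sketch of building the paginated URL from the two query parameters above. Clamping limit to 100 on the client is a convenience choice here; how the server treats larger values is not stated. The base URL and the jobs_page_url helper are assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.atlasv1.com"  # assumption: base URL shown on this page


def jobs_page_url(limit: int = 20, offset: int = 0) -> str:
    """Build the /v1/jobs URL, keeping limit within 1..100 and offset at 0 or above."""
    limit = max(1, min(limit, 100))
    offset = max(0, offset)
    return f"{BASE_URL}/v1/jobs?{urlencode({'limit': limit, 'offset': offset})}"


print(jobs_page_url())          # first page with the default size
print(jobs_page_url(100, 100))  # second page of 100
```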
/v1/jobs/{id}
Poll the status of a specific job. This is the core endpoint you call in a loop until the job completes or fails.
Job Statuses
| Status | Description |
|---|---|
| pending | Job is queued, waiting to be processed |
| processing | Job is actively being processed |
| completed | Job finished, output ready to download |
| failed | Job failed, check error field for details |
/v1/jobs/{id}/result
Get a time-limited presigned download URL for a completed job's output. The URL is valid for 24 hours and can be used directly in browsers, video players, or download links without authentication.
Important: Only call this after the job status is completed. Calling it on a pending or processing job returns 409 not_ready. The presigned URL is also included in the poll response (GET /v1/jobs/{id}) when the job is completed.
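The poll-then-download rule above can be sketched as a loop that stops only on a terminal status. The HTTP call is injected as fetch_status so the sketch runs offline; in practice it would wrap GET /v1/jobs/{id}. The interval and timeout defaults are arbitrary assumptions, and wait_for_job is a hypothetical helper.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}  # statuses from the Job Statuses table


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until the job is completed or failed, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a terminal status in time")
        sleep(interval)


# Demo with a stub in place of a real GET /v1/jobs/{id} call.
_responses = iter([{"status": "pending"}, {"status": "processing"}, {"status": "completed"}])
result = wait_for_job(lambda: next(_responses), sleep=lambda _seconds: None)
print(result["status"])  # completed
```

Request /v1/jobs/{id}/result only after this returns a job whose status is completed; a failed job carries details in its error field instead.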
Response 200 OK
| Field | Description |
|---|---|
| url | Presigned download URL, no auth needed, expires after expires_in seconds |
| content_type | MIME type of the output (video/mp4 for video jobs; audio/wav for WAV audio jobs) |
| expires_in | URL validity in seconds (default 24 hours) |
/v1/me
Check your API key status, current rate limit usage, and plan details.
/v1/status
Check system status. Requires authentication.
Each service reports available (capacity is free) or busy (all capacity is occupied).
/v1/health
Health check endpoint. No authentication required.
Realtime
Live interactive avatars over WebRTC. Unlike the Offline API (send files → get video back), the Realtime API streams audio and video like a video call with an AI face.
Integration uses the LiveKit client SDK. Request a session token from our API, then connect with the LiveKit SDK on the client side. Realtime sessions use passthrough (rendering): you publish audio, we stream lip-synced video. Billed at $7/hour, prorated to the second.
Realtime sessions have no fixed duration limit: they run as long as you keep the room alive. Billing is prorated to the second. Idle rooms close after the configured idle_timeout. Get your API key from the dashboard, or reach out to discuss your use case.
Passthrough (rendering)
Set mode: "passthrough" (or omit mode; it defaults to passthrough). You provide the audio stream; we render the face over WebRTC. Bring your own STT, LLM, and TTS; NORTH supplies the GPU lip-sync. Use the React SDK for less boilerplate.
Session Lifecycle
Create
Connect
Active
Disconnect
/v1/realtime/session
Create a realtime avatar session. Returns a LiveKit room token and connection URL for client-side WebRTC connection.
Content-Type: application/json or multipart/form-data
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| face_url | string | no | HTTPS URL of a reference face image. Must be HTTPS, max 2048 chars. |
| face | file | no | Face image file upload (PNG/JPEG/WebP, max 10 MB). Use with multipart/form-data. |
| mode | string | no | "passthrough" is the default rendering mode. "conversation" is available only to allowlisted API keys. |
| idle_timeout | integer | no | Seconds an empty/inactive room can remain open before cleanup. Default 300, clamped 30–3600. |
Rendering mode: Set mode: "passthrough" to use your own audio. Publish an audio track to the LiveKit room and the avatar will lip-sync to it in realtime. You handle speech recognition, AI, and voice generation; we handle the GPU rendering.
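A minimal sketch of a JSON request body that respects the constraints in the Request Body table: HTTPS-only face_url of at most 2048 characters, one of the two documented modes, and idle_timeout kept within 30 to 3600. The server clamps idle_timeout itself; mirroring that on the client is optional. build_session_body is a hypothetical helper.

```python
import json
from typing import Optional
from urllib.parse import urlparse


def build_session_body(
    face_url: Optional[str] = None,
    mode: str = "passthrough",
    idle_timeout: int = 300,
) -> str:
    """Return a JSON body for POST /v1/realtime/session, validated client-side."""
    if mode not in ("passthrough", "conversation"):
        raise ValueError('mode must be "passthrough" or "conversation"')
    body = {"mode": mode, "idle_timeout": max(30, min(idle_timeout, 3600))}
    if face_url is not None:
        if urlparse(face_url).scheme != "https" or len(face_url) > 2048:
            raise ValueError("face_url must use HTTPS and be at most 2048 chars")
        body["face_url"] = face_url
    return json.dumps(body)


print(build_session_body("https://example.com/face.png", idle_timeout=10))
```

Send the result with Content-Type: application/json from your backend, then hand the returned LiveKit token and URL to the client.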
Response 200 OK
Realtime passthrough: $7/hour, prorated to the second.
Rendering Mode, Quick Start
Error Responses
| Status | Error | Description |
|---|---|---|
| 400 | invalid_face_url | face_url must use HTTPS or exceeds 2048 chars |
| 400 | invalid_mode | mode must be "passthrough" or "conversation" |
| 401 | unauthorized | Missing or invalid Authorization header |
| 403 | forbidden | Invalid or revoked API key |
| 403 | conversation_disabled | Conversation mode is restricted. Use passthrough unless your key is allowlisted. |
| 429 | rate_limited | Too many requests, wait and retry |
| 503 | no_capacity | All GPU pods are busy. Retry after 30 seconds. |
/v1/realtime/session/{session_id}/viewer
Issue a view-only LiveKit token for an existing active session. Viewers can watch the avatar stream but cannot publish audio, video, or data. This suits multi-viewer scenarios where one user drives the avatar and others watch the same stream. No additional GPU is consumed.
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| session_id | string | The session ID returned from POST /v1/realtime/session |
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <api_key>, must be the same key that created the session |
Multi-Viewer Architecture
Each call returns a unique viewer token. You can issue as many viewer tokens as needed, because LiveKit rooms natively support many participants. Viewers subscribe to the avatar's video/audio tracks with zero additional GPU cost. To let viewers interact (text or voice), control that at your app layer by relaying messages through your backend.
GET /v1/realtime/session/{session_id}
Retrieve the status and details of a specific realtime session. Only accessible by the API key that created it.
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| session_id | string | The session ID returned from POST /v1/realtime/session |
PATCH /v1/realtime/session/{session_id}
Hot-swap the avatar face during an active session without disconnecting. Uses a secure file upload (arbitrary URLs are not accepted on this endpoint). Rate-limited per session.
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| session_id | string | The active session ID to update |
Request Body
Content-Type: multipart/form-data, field face (image file, PNG/JPEG/WebP, max 10 MB).
| Field | Type | Required | Description |
|---|---|---|---|
| face | file | yes | New face image file (PNG/JPEG/WebP, max 10 MB) |
HTTPS face URL: set face_url on POST /v1/realtime/session (JSON or multipart at create). To change the face mid-session, upload a new file with this PATCH.
Error Responses
| Status | Error | Description |
|---|---|---|
| 404 | not_found | Session not found |
| 409 | session_not_active | Session is ended or not active, cannot update face |
| 429 | rate_limited | Too many face swaps, wait before retrying |
DELETE /v1/realtime/session/{session_id}
End a realtime session. The LiveKit room is destroyed and billing duration is recorded. Returns the final cost.
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| session_id | string | The session ID to end |
SDK Integration
Connect to a realtime session from your frontend using the LiveKit client SDK. Your backend creates the session and passes the token to the client.
Recommended: Use @northmodellabs/atlas-react to replace all the boilerplate below with a single useAtlasSession() hook call. See React SDK →
npm install livekit-client
Billing: Realtime sessions are unlimited while active and billed per second. When all GPU pods are busy, new session requests return 503 with a retry_after_seconds field. Always call DELETE when done to stop billing.
Part of the full hosted API, all infrastructure managed by North Model Labs. The @northmodellabs/atlas-react package provides a single useAtlasSession() hook that handles LiveKit wiring, room lifecycle, avatar video subscription, optional mic helpers, realtime session state, and cleanup. For audio-driven lip-sync, use passthrough and the persistent audio track pattern.
npm install @northmodellabs/atlas-react livekit-client
Quick Start
Hook Options
| Option | Type | Default | Description |
|---|---|---|---|
| createSession | (face, faceUrl) => Promise | required | Creates a session on your backend, API key stays server-side |
| deleteSession | (sessionId) => Promise | - | Tears down the session on your backend |
| autoEnableMic | boolean | true | Auto-enable microphone after connecting |
| autoCleanup | boolean | true | Auto-disconnect on unmount / tab close |
Returned Session Object
| Field | Type | Description |
|---|---|---|
| status | "idle" \| "connecting" \| "connected" \| "disconnected" \| "error" | Connection state |
| error | string \| null | Error message if status is error |
| sessionId | string \| null | Active session ID |
| muted | boolean | Whether mic is muted |
| volume | number | Playback volume (0–100) |
| videoRef | RefObject<HTMLDivElement> | Attach to a <div>, video renders inside |
| connect(face?, faceUrl?) | () => Promise | Start a session |
| disconnect() | () => Promise | End the session |
| setMicEnabled(enabled) | (boolean) => void | Mute / unmute |
| setVolume(v) | (number) => void | Set playback volume 0–100 |
| room | Room \| null | Underlying LiveKit Room, for advanced scenarios |
| publishAudio(audio) | (string \| Blob \| ArrayBuffer) => Promise<AudioPlaybackHandle> | Publish audio to the room (passthrough mode). Recommended: use the persistent audio track pattern instead; see Passthrough Mode |
What the Hook Handles
- Creates a LiveKit Room with adaptiveStream and dynacast
- Subscribes to video and audio tracks, attaches them to the DOM
- Optional mic helpers after connecting (your STT / pipeline may use this)
- Tracks realtime session state for your UI
- Disconnects and cleans up on unmount and beforeunload
- Calls your deleteSession callback to tear down the server-side session
- Exposes room for passthrough mode; use the persistent audio track pattern for freeze-free lip-sync
Passthrough Mode
In passthrough mode, you bring your own LLM, TTS, and audio pipeline; Atlas provides the GPU compute and WebRTC video. Create your session with mode: "passthrough", then publish audio to the avatar for lip-sync.
Important: Persistent Audio Track Pattern
Use a persistent audio track that stays published for the entire session. The track feeds silence when idle (keeping the avatar animated) and TTS audio when speaking. Avoid calling publishAudio() directly: it tears down the track after each call, causing the avatar to freeze between messages.
How the Persistent Track Works
| State | What happens |
|---|---|
| Session connects | A single MediaStreamDestination is published as an audio track, outputs silence |
| Avatar idle | Silence frames flow through the track → GPU renders idle animation (avatar stays alive) |
| TTS plays | A BufferSource connects to the same destination → TTS audio flows through → avatar lip-syncs |
| TTS ends | BufferSource disconnects → back to silence → avatar returns to idle animation |
| Session disconnects | Track is unpublished and AudioContext is closed |
The key insight: one persistent track, never torn down. Silence when idle, TTS audio when speaking. No track re-publishing, no mic toggling, no avatar freeze.
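The state table above can be modeled without any browser APIs. The sketch below is a language-neutral illustration of the idea, not the Web Audio implementation: one frame source that lives for the whole session, yields silence while idle, and yields queued TTS audio while speaking. PersistentTrack, the frame size, and the PCM format are all hypothetical choices made for the example.

```python
from collections import deque
from typing import Deque, Iterator

FRAME_BYTES = 960  # hypothetical frame: 10 ms of 48 kHz mono 16-bit PCM
SILENCE = bytes(FRAME_BYTES)


class PersistentTrack:
    """One long-lived frame source: silence when idle, queued TTS audio when speaking."""

    def __init__(self) -> None:
        self._queue: Deque[bytes] = deque()

    def speak(self, pcm: bytes) -> None:
        """Queue TTS audio, split into fixed-size frames (last frame zero-padded)."""
        for start in range(0, len(pcm), FRAME_BYTES):
            frame = pcm[start:start + FRAME_BYTES]
            self._queue.append(frame.ljust(FRAME_BYTES, b"\x00"))

    def frames(self) -> Iterator[bytes]:
        """Yield frames forever; the source never ends, so the track is never torn down."""
        while True:
            yield self._queue.popleft() if self._queue else SILENCE


track = PersistentTrack()
source = track.frames()
idle_frame = next(source)          # silence while nothing is queued
track.speak(b"\x01" * 1000)        # stand-in for decoded TTS audio
speech_frames = [next(source), next(source)]
back_to_idle = next(source)        # silence again once the TTS audio is drained
```

The point of the design is that consumers only ever see one continuous stream, so there is no publish or unpublish step between utterances.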
Voice Input & Echo Cancellation
Use ElevenLabs Scribe v2 for speech-to-text instead of the browser's Web Speech API. The Web Speech API picks up the avatar's TTS audio from the speakers and feeds it back as user input, causing the avatar to talk to itself. ElevenLabs Scribe connects to the mic via getUserMedia with echoCancellation: true, so the browser's built-in Acoustic Echo Cancellation strips the speaker output at the hardware level before it reaches the STT model.
Install with npm install @elevenlabs/react, then see the example app for the full implementation with the useScribe hook and a server-side token endpoint.
Security: Your API key never touches the client. The createSession callback calls your own backend, which proxies to the Atlas API. See the full documentation on npm.
Webhooks
Get notified when a job completes or fails instead of polling. Pass a callback URL when submitting a job and Atlas can POST the result to your endpoint when webhook signing is configured. Polling remains the supported fallback for every job.
How to Use
For /v1/generate (multipart), send your callback as the X-Callback-URL header. Atlas validates the URL at submit time and again at delivery time.
| Rule | Detail |
|---|---|
| Protocol | HTTPS only, HTTP and localhost are rejected |
| Signing | Requires Atlas webhook signing to be configured; otherwise poll job status |
| Retries | 3 attempts with backoff (5s, 30s, 120s) |
| Timeout | 10 seconds per attempt |
| Success | Any 2xx response counts as delivered |
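
Since the callback URL is validated at submit time, a quick client-side check can catch obvious mistakes first. This sketch applies the Protocol rule from the table; treating 127.0.0.1 and ::1 as equivalent to localhost is an assumption, and the server's full validation may be stricter. is_acceptable_callback is a hypothetical helper.

```python
from urllib.parse import urlparse

# Assumption: loopback addresses are rejected the same way as the name "localhost".
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_acceptable_callback(url: str) -> bool:
    """Return True if the URL is HTTPS and does not point at a local host."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and bool(host) and host not in LOCAL_HOSTS


print(is_acceptable_callback("https://example.com/hooks/atlas"))
print(is_acceptable_callback("http://example.com/hooks/atlas"))
```

On the receiving side, answer with a 2xx status within the 10-second timeout and do any slow work afterwards, so a delivery is not retried unnecessarily.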
Payload, Completed
Payload, Failed
Verifying Signatures
Every webhook includes X-Atlas-Signature and X-Atlas-Timestamp headers. Verify them to confirm the request came from Atlas.
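The exact signing scheme is not reproduced in this section, so the following is only a sketch under stated assumptions: that X-Atlas-Signature is a hex HMAC-SHA256 of the timestamp, a dot, and the raw request body, keyed with a shared webhook secret. Confirm the real scheme before relying on it. What carries over regardless is the shape of the check: reject stale timestamps, recompute over the raw bytes, and compare in constant time.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(
    secret: str,
    timestamp: str,       # value of X-Atlas-Timestamp
    signature: str,       # value of X-Atlas-Signature
    raw_body: bytes,      # request body exactly as received
    tolerance: int = 300,  # assumption: accept timestamps within 5 minutes
    now: Optional[float] = None,
) -> bool:
    """ASSUMED scheme: hex HMAC-SHA256 over b"<timestamp>." + raw_body."""
    current = time.time() if now is None else now
    try:
        if abs(current - int(timestamp)) > tolerance:
            return False  # too old or too far in the future
    except ValueError:
        return False      # timestamp is not an integer
    message = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Always verify against the raw bytes of the request, before any JSON parsing or re-serialization, since re-encoding can change the payload and break the comparison.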
Error Responses
All errors return a consistent JSON format:
Error Codes
| Code | Error | Description |
|---|---|---|
| 400 | invalid_input | Empty or invalid audio/image file |
| 400 | invalid_mode | mode must be "passthrough" or "conversation" |
| 401 | unauthorized | Missing or malformed Authorization header |
| 403 | forbidden | Invalid API key |
| 403 | conversation_disabled | Conversation mode is restricted to allowlisted API keys |
| 404 | not_found | Endpoint or job does not exist |
| 404 | no_output | No stored output available for this job |
| 405 | method_not_allowed | Wrong HTTP method |
| 409 | not_ready | Downloading result before job completes |
| 409 | already_ended | Realtime session DELETE when session is already ended |
| 413 | payload_too_large | Upload exceeds 50 MB limit |
| 415 | unsupported_media_type | Format not supported |
| 422 | validation_error | Missing required fields |
| 429 | rate_limit_exceeded | Basic self-serve rate limit reached |
| 502 | generation_failed | Generation failed after retries |
| 503 | queue_unavailable | Job queue is temporarily unavailable |
| 503 | storage_unavailable | Output storage is temporarily unavailable |
| 503 | storage_error | Failed to store or retrieve job output |
| 503 | temporarily_unavailable | Service is temporarily unavailable |
| 500 | url_generation_failed | Failed to generate download URL |
| 500 | internal_error | Unexpected server error |
| 503 | auth_unavailable | Authentication service is temporarily unavailable |
Example Errors
TTS Integration
Atlas focuses on video generation, so bring your own TTS provider for speech audio. Generate audio with ElevenLabs, OpenAI TTS, Deepgram, or any other service, then pass the audio file to POST /v1/generate.
The same async flow applies: submit audio + face image, poll for status, download the video.
Provider Examples
Organizations, SSO, SCIM
Organization endpoints let teams manage members, configure WorkOS SSO, and provision users through SCIM. Organization management requires a user-linked dashboard API key; demo keys, fallback keys, and environment-only service keys cannot create organizations. SCIM user endpoints use an organization-scoped SCIM bearer token.
/v1/organizations
Create an organization. The API key must belong to a dashboard user, and that user becomes the owner.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Organization name, 2–100 characters |
/v1/organizations
List organizations the authenticated user belongs to.
/v1/organizations/{org_id}
Get organization details and members. The caller must be an active member.
Organization Members
| Endpoint | Role | Purpose |
|---|---|---|
| POST /v1/organizations/{org_id}/members | admin or owner | Add or reactivate a member |
| DELETE /v1/organizations/{org_id}/members/{member_id} | admin or owner | Deactivate a member and revoke their org-scoped API keys |
| DELETE /v1/organizations/{org_id} | owner | Soft-delete the org, deactivate members, revoke keys and SCIM tokens |
SAML SSO
SSO is configured through WorkOS. Owners create an Admin Portal setup link, enable SSO after a connection is configured, and users start login with an organization slug.
| Endpoint | Auth | Purpose |
|---|---|---|
| GET /v1/organizations/{org_id}/sso | API key, admin or owner | Read SSO configuration status |
| POST /v1/organizations/{org_id}/sso/setup | API key, owner | Generate a WorkOS Admin Portal setup link |
| POST /v1/organizations/{org_id}/sso/enable | API key, owner | Enable or disable SSO enforcement |
| GET /v1/auth/sso?org_slug=... | public | Start SSO login and receive a WorkOS authorization URL |
| GET /v1/auth/sso/callback | public callback | Exchange WorkOS code for an Atlas SSO session token |
| POST /v1/auth/sso/webhook | WorkOS signature | Receive WorkOS connection lifecycle events |
SCIM Tokens
Owners create organization-scoped SCIM bearer tokens for IdP provisioning. The raw token is returned only once; store it in your IdP immediately.
| Endpoint | Role | Purpose |
|---|---|---|
| POST /v1/organizations/{org_id}/scim-tokens | owner | Create a SCIM token |
| DELETE /v1/organizations/{org_id}/scim-tokens/{token_id} | owner | Revoke a SCIM token |
SCIM Discovery
SCIM discovery endpoints are public so IdPs can detect capabilities before sending a bearer token.
| Endpoint | Auth | Purpose |
|---|---|---|
| GET /scim/v2/ServiceProviderConfig | none | SCIM capabilities, patch supported, bulk disabled |
| GET /scim/v2/Schemas | none | Supported User schema |
| GET /scim/v2/ResourceTypes | none | Supported resource types |
SCIM Users
SCIM user endpoints require Authorization: Bearer scim_.... Deprovisioning a user revokes their organization-scoped API keys, ends active sessions, and revokes SSO sessions.
| Endpoint | Purpose |
|---|---|
| GET /scim/v2/Users | List users, supports userName eq and externalId eq filters |
| POST /scim/v2/Users | Provision a user |
| GET /scim/v2/Users/{id} | Get one user |
| PUT /scim/v2/Users/{id} | Replace a user |
| PATCH /scim/v2/Users/{id} | Patch active, email, name, or externalId |
| DELETE /scim/v2/Users/{id} | Deprovision a user |
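
As an illustration of the userName eq filter mentioned in the table, this sketch builds a lookup URL using the standard SCIM filter syntax with a quoted string value. It assumes the SCIM endpoints live on the same base URL shown on this page; scim_user_lookup_url is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.atlasv1.com"  # assumption: SCIM shares the API base URL


def scim_user_lookup_url(user_name: str) -> str:
    """Build GET /scim/v2/Users?filter=userName eq "<user_name>" with proper escaping."""
    # Escape backslashes and double quotes inside the quoted filter value.
    escaped = user_name.replace("\\", "\\\\").replace('"', '\\"')
    query = urlencode({"filter": f'userName eq "{escaped}"'}, quote_via=quote)
    return f"{BASE_URL}/scim/v2/Users?{query}"


print(scim_user_lookup_url("ada@example.com"))
```

Send the request with Authorization: Bearer scim_... using the organization-scoped SCIM token, not a regular API key.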
Rate Limits
Self-serve developer keys ship with a default limit so a single key can't saturate shared capacity. Production and enterprise contracts include custom concurrency, throughput, and dedicated GPU capacity, with no fixed RPM ceiling.
- Self-serve default: 30 requests per minute per API key, sliding window
- Production: high-volume throughput, burst, and concurrency, set per workload
- Enterprise: reserved GPU pool, dedicated capacity, no shared-tier rate limit
- Check your current usage and plan via GET /v1/me
- When exceeded, the response includes retry_after_seconds
- Higher limits / dedicated capacity: /enterprise or email eric@northmodellabs.com
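One way to honor retry_after_seconds is a small helper that decides how long to wait before the next attempt. It assumes the field sits at the top level of the JSON error body; the exponential fallback and the 60-second cap are choices made for this sketch, not documented behavior, and retry_delay is a hypothetical helper.

```python
from typing import Optional

RETRYABLE = {429, 503}  # rate limited, or capacity/service temporarily unavailable


def retry_delay(status: int, body: dict, attempt: int, cap: float = 60.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the error is not retryable."""
    if status not in RETRYABLE:
        return None
    hinted = body.get("retry_after_seconds")  # assumption: top-level field in the error body
    if isinstance(hinted, (int, float)) and hinted >= 0:
        return float(hinted)
    return min(cap, 2.0 ** attempt)  # fallback when no hint is present (assumption)


print(retry_delay(429, {"retry_after_seconds": 12}, attempt=0))
print(retry_delay(400, {}, attempt=0))
```

Treat None as a signal to surface the error to the caller rather than retry, since 4xx errors other than 429 will not succeed on a second attempt.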
Limits & Constraints
| Constraint | Value |
|---|---|
| Max upload size | 50 MB |
| Offline job processing timeout | Configurable, default 300s; can be set higher or disabled for dedicated deployments |
| Realtime processing timeout | None while the session is active; idle cleanup is controlled by idle_timeout |
| Max retries on failure | 3 |
| Rate limit | 30 RPM for basic self-serve keys; production and enterprise use custom/high-volume limits with no shared fixed RPM ceiling |
| Job result availability | 24 hours after completion |
| Video output format | MP4 |
| Realtime mode | Live WebRTC avatar streaming once connected |
See the API in Action
Working demos of avatar generation and the full pipeline, all interactive, no setup required.
View Examples →
Plugin, external LiveKit room
POST /v1/avatar/session is omitted from the GET / endpoint list. Use it when you already have a LiveKit room and want Atlas to join with the livekit-plugins-atlas SDK flow.
/v1/avatar/session
Multipart: livekit_url, livekit_token, room_name (required strings); optional avatar_image file.
Response Headers
Included in all responses for security and caching:
| Header | Value |
|---|---|
| X-Content-Type-Options | nosniff |
| X-Frame-Options | DENY |
| Cache-Control | no-store |
| Strict-Transport-Security | max-age=31536000; includeSubDomains |
| Referrer-Policy | strict-origin-when-cross-origin |
| Permissions-Policy | camera=(), microphone=(), geolocation=() |
| X-Permitted-Cross-Domain-Policies | none |