◆ Solutions · Tutors / EdTech

Tutors that look you in the eye.

Effective tutoring is about turn-taking, pacing, and visible attention. Voice-only AI tutors lose learners during long sessions; video tutors don't. NORTH renders the face so you can focus on pedagogy.

Book a deployment review · Email eric@northmodellabs.com · Read the docs →

◇ Problem

Why current solutions fall short.

Voice-only tutors can't model attention, expression, or visual demonstration
Existing avatar APIs cap session length below a real lesson
Studio products only generate static videos, not adaptive sessions
Per-session pricing kills the unit economics of language learning

Stack-level cost: every drop-off mid-lesson is a churned learner. Retention dominates LTV.

◆ How NORTH fits

A persistent tutor identity, sized to the lesson.

1:1 lesson with a specific tutor avatar tied to the learner
Cohort / classroom mode: one render stream into many viewers
Pronunciation feedback: the tutor visibly mouths the correct phonemes
Visible 'thinking' / pause cues so learners know it's their turn
Drop-in / drop-out: pause a lesson, resume same identity later

◆ Why now

Realtime renderers now make natural turn-taking possible for live tutoring.

Per-second billing fits drop-in / drop-out lesson patterns
No fixed session cap: run a full lesson without re-auth
Phoneme-level lip sync, language-agnostic
Single avatar per learner sustains identity continuity across many lessons
Multi-viewer mode for live classroom or cohort lessons

◆ Integration shape

What it actually looks like in your codebase.

01
Pair with your curriculum
Your lesson engine drives prompts and pacing; NORTH renders the face.
02
Wire STT/LLM/TTS
Use whatever stack already powers your tutor. NORTH is passthrough.
03
Persist tutor identity
One reference image per tutor; same identity across all sessions for that learner.
04
Ship into the app
WebRTC in the browser, native via WebView, or into Zoom for cohort classes.
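
The four steps above can be sketched as a single turn loop. This is a minimal sketch under stated assumptions, not NORTH's API: `AvatarRenderer`, `push_audio`, `tutor_turn`, and `RecordingRenderer` are hypothetical names introduced for illustration, and the STT, LLM, and TTS callables stand in for whatever stack you already run. The point it shows is the passthrough shape: your code decides what is said, and the renderer only receives audio for a fixed tutor identity.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class AvatarRenderer(Protocol):
    """Hypothetical stand-in for the rendering layer; not NORTH's real API."""

    def push_audio(self, tutor_id: str, audio: bytes) -> None: ...


@dataclass
class RecordingRenderer:
    """Test double that records what would have been sent to the renderer."""

    sent: list[tuple[str, bytes]] = field(default_factory=list)

    def push_audio(self, tutor_id: str, audio: bytes) -> None:
        self.sent.append((tutor_id, audio))


def tutor_turn(
    learner_audio: bytes,
    tutor_id: str,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
    renderer: AvatarRenderer,
) -> str:
    """Run one learner turn: transcribe, generate a reply, synthesize, render."""
    transcript = stt(learner_audio)         # step 02: your STT
    reply = llm(transcript)                 # step 01: your lesson engine / LLM
    speech = tts(reply)                     # step 02: your TTS
    renderer.push_audio(tutor_id, speech)   # step 03: same tutor_id every session
    return reply


# Usage with trivial stubs in place of real STT/LLM/TTS services.
renderer = RecordingRenderer()
reply = tutor_turn(
    learner_audio=b"hola",
    tutor_id="tutor-for-learner-42",
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"Repeat after me: {text}",
    tts=lambda text: text.encode("utf-8"),
    renderer=renderer,
)
```

Keeping the renderer behind a narrow interface like this makes it easy to swap STT, LLM, or TTS providers without touching the rendering call.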

◆ What to measure

The metrics that prove this is working.

Session completion rate

Compare against your voice-only or chat-only tutor baseline; lesson-completion is the dominant retention signal.
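
One way to make that comparison concrete is to compute the completion rate for each arm and the relative lift between them. This is a sketch only: the function names are hypothetical and the counts below are placeholder numbers chosen for illustration, not measured results.

```python
def completion_rate(completed: int, started: int) -> float:
    """Share of started lessons that reached the end."""
    if started <= 0:
        raise ValueError("started must be positive")
    if not 0 <= completed <= started:
        raise ValueError("completed must be between 0 and started")
    return completed / started


def relative_lift(variant_rate: float, baseline_rate: float) -> float:
    """Relative change of the variant over the baseline (0.10 means +10%)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (variant_rate - baseline_rate) / baseline_rate


# Placeholder counts: substitute your own voice-only baseline and video-tutor cohort.
baseline = completion_rate(completed=120, started=400)
video = completion_rate(completed=168, started=400)
lift = relative_lift(video, baseline)
```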

Realtime session quality

Tune your STT and TTS choices to keep turn-taking natural.

Render cost / minute

$7/hr prorated to the second. Drop-in / drop-out lessons charge only for active session time.
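
As a worked example of that proration, the sketch below converts active session seconds into a dollar amount at the $7/hr rate. The function name is hypothetical, and rounding to the nearest cent is an assumption made for display; the page does not state how invoices are rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE_USD = Decimal("7")  # rate quoted on this page
SECONDS_PER_HOUR = Decimal(3600)


def render_cost_usd(active_seconds: int, hourly_rate: Decimal = HOURLY_RATE_USD) -> Decimal:
    """Cost of active render time, prorated to the second.

    Rounding to cents is an assumption for illustration only.
    """
    if active_seconds < 0:
        raise ValueError("active_seconds must be non-negative")
    cost = hourly_rate * Decimal(active_seconds) / SECONDS_PER_HOUR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# A single 25-minute lesson.
full_lesson = render_cost_usd(25 * 60)

# A drop-in / drop-out lesson: 10 active minutes, a pause, then 8 more.
# Only the 18 active minutes count toward the total.
split_lesson = render_cost_usd(10 * 60 + 8 * 60)
```

Here `full_lesson` comes to 2.92 and `split_lesson` to 2.10, since 7 × 1500 / 3600 ≈ 2.9167 and 7 × 1080 / 3600 = 2.10.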

We don't publish customer numbers. The right comparison is against your existing baseline, measured with your STT, your LLM, your voice, and your traffic.

◇ FAQ for this use case

The questions that actually come up.

Does it work for non-English languages?

Yes. Lip-sync is phoneme-level and language-agnostic. Quality depends on the TTS you use, since the avatar follows the audio you send.

Can the same tutor 'remember' the learner?

Memory lives in your stack, not ours; that's a feature, not a bug. NORTH stays a thin rendering layer; you keep the learner record and feed it into your prompts.
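
A minimal sketch of what "feed it into your prompts" could look like on your side. Everything here is hypothetical: `LearnerRecord`, its fields, and `build_system_prompt` are illustrative names, and none of this touches the rendering layer.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerRecord:
    """Hypothetical learner record stored in your own database."""

    name: str
    target_language: str
    level: str
    weak_spots: list[str] = field(default_factory=list)


def build_system_prompt(record: LearnerRecord) -> str:
    """Turn the stored learner record into a system prompt for your LLM."""
    lines = [
        f"You are a {record.target_language} tutor for {record.name}.",
        f"Learner level: {record.level}.",
    ]
    if record.weak_spots:
        lines.append("Revisit these weak spots: " + ", ".join(record.weak_spots) + ".")
    return "\n".join(lines)


record = LearnerRecord(
    name="Ana",
    target_language="French",
    level="A2",
    weak_spots=["nasal vowels", "passé composé"],
)
prompt = build_system_prompt(record)
```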

Can we use it in classrooms?

Yes. Multi-viewer sessions let one rendered stream serve many viewers for one GPU's worth of cost. Useful for cohort-based tutoring or live classroom delivery.
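
To see how the cost amortizes across a cohort, the sketch below divides one stream's render cost by the number of viewers. It assumes the shared stream is billed at the same $7/hr rate quoted on this page regardless of viewer count; that is an assumption for illustration, and the function name is hypothetical.

```python
def cost_per_viewer_usd(
    session_minutes: float,
    viewers: int,
    hourly_rate_usd: float = 7.0,  # assumed: one stream billed at the page's quoted rate
) -> float:
    """Render cost of one shared stream, split evenly across its viewers."""
    if viewers < 1:
        raise ValueError("viewers must be at least 1")
    if session_minutes < 0:
        raise ValueError("session_minutes must be non-negative")
    stream_cost = hourly_rate_usd * session_minutes / 60
    return stream_cost / viewers


# A 45-minute class: one learner alone versus a cohort of 30 on one stream.
solo = cost_per_viewer_usd(session_minutes=45, viewers=1)
cohort = cost_per_viewer_usd(session_minutes=45, viewers=30)
```

Under that assumption the 45-minute stream costs 5.25 in total, which works out to 0.175 per viewer for a cohort of 30.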

What about kids' privacy?

WebRTC frames are not stored. For under-13 deployments, customers handle COPPA-compliant consent on their side. See /safety.

Ready to ship?

Email if you want async; book if you want a structured call. We reply within one business day.

Book 30 min ↗ · Email eric@northmodellabs.com · All use cases
