◆ Solutions · Tutors / EdTech
Tutors that look you in the eye.
Effective tutoring is about turn-taking, pacing, and visible attention. Voice-only AI tutors lose learners during long sessions; video tutors don't. NORTH renders the face so you can focus on pedagogy.
◇ Problem
Why current solutions fall short.
- Voice-only tutors can't model attention, expression, or visual demonstration
- Existing avatar APIs cap session length below the length of a real lesson
- Studio products only generate static videos, not adaptive sessions
- Per-session pricing kills the unit economics of language learning
Stack-level cost: every drop-off mid-lesson is a churned learner. Retention dominates LTV.
◆ How NORTH fits
A persistent tutor identity, sized to the lesson.
- 1:1 lesson with a specific tutor avatar tied to the learner
- Cohort / classroom mode: one render stream into many viewers
- Pronunciation feedback: the tutor visibly mouths the correct phonemes
- Visible 'thinking' / pause cues so learners know it's their turn
- Drop-in / drop-out: pause a lesson, resume same identity later
◆ Why now
Realtime renderers now make natural turn-taking possible for live tutoring.
- Per-second billing fits drop-in / drop-out lesson patterns
- No fixed session cap: run a full lesson without re-auth
- Phoneme-level lip sync, language-agnostic
- Single avatar per learner sustains identity continuity across many lessons
- Multi-viewer mode for live classroom or cohort lessons
◆ Integration shape
What it actually looks like in your codebase.
01
Pair with your curriculum
Your lesson engine drives prompts and pacing; NORTH renders the face.
02
Wire STT/LLM/TTS
Use whatever stack already powers your tutor. NORTH is a passthrough: the avatar follows the audio you send.
03
Persist tutor identity
One reference image per tutor; same identity across all sessions for that learner.
04
Ship into the app
WebRTC in the browser, native via WebView, or into Zoom for cohort classes.
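As a rough illustration of the four steps above, one conversational turn might be wired like this. Every interface and name here (`SpeechToText`, `LessonEngine`, `TextToSpeech`, `AvatarRenderer`, `runTurn`) is a hypothetical placeholder, not NORTH's actual SDK; the stubs exist only so the sketch runs without external services.

```typescript
// Hypothetical interfaces for illustration only; none of these are NORTH's real SDK.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface LessonEngine {
  nextTutorLine(learnerSaid: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}
interface AvatarRenderer {
  speak(audio: Uint8Array): Promise<void>;
}
interface TutorStack {
  stt: SpeechToText;
  lesson: LessonEngine;
  tts: TextToSpeech;
  avatar: AvatarRenderer;
}

// One conversational turn: learner audio in, rendered tutor speech out.
// The lesson engine decides what to say; the renderer only receives audio.
async function runTurn(stack: TutorStack, learnerAudio: Uint8Array): Promise<string> {
  const heard = await stack.stt.transcribe(learnerAudio);
  const reply = await stack.lesson.nextTutorLine(heard);
  const speech = await stack.tts.synthesize(reply);
  await stack.avatar.speak(speech);
  return reply;
}

// Stubs so the sketch runs end to end with no network calls.
const renderedChunks: Uint8Array[] = [];
const stubStack: TutorStack = {
  stt: { transcribe: async () => "hello tutor" },
  lesson: { nextTutorLine: async (said) => `Tutor reply to: ${said}` },
  tts: { synthesize: async (text) => new Uint8Array(text.length) },
  avatar: { speak: async (audio) => { renderedChunks.push(audio); } },
};

runTurn(stubStack, new Uint8Array(16)).then((reply) => console.log(reply));
```

Keeping the renderer behind a single audio-in call reflects the passthrough idea: swapping the STT, LLM, or TTS vendor should not touch the rendering step.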
◆ What to measure
The metrics that prove this is working.
Session completion rate
Compare against your voice-only or chat-only tutor baseline; lesson completion is the dominant retention signal.
Realtime session quality
Tune your STT and TTS choices to keep turn-taking natural.
Render cost / minute
$7/hr prorated to the second. Drop-in / drop-out lessons charge only for active session time.
We don't publish customer numbers. The right comparison is against your existing baseline, measured with your own STT, LLM, voice, and traffic.
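To make the render-cost metric concrete, here is a minimal sketch of the arithmetic for $7/hr prorated to the second, counting only active session time. The `ActiveSegment` shape and function names are assumptions for illustration, and any rounding applied on an actual invoice is not specified here.

```typescript
// Rate stated on this page: $7 per hour, prorated to the second.
const RATE_USD_PER_HOUR = 7;
const SECONDS_PER_HOUR = 3600;

// Hypothetical shape: one continuous stretch of active rendering.
interface ActiveSegment {
  startedAtSec: number;
  endedAtSec: number;
}

// Sum active seconds; paused time between segments is not counted.
function activeSeconds(segments: ActiveSegment[]): number {
  return segments.reduce(
    (total, s) => total + Math.max(0, s.endedAtSec - s.startedAtSec),
    0,
  );
}

// Unrounded cost in USD for a number of active seconds.
function renderCostUsd(seconds: number): number {
  return (seconds * RATE_USD_PER_HOUR) / SECONDS_PER_HOUR;
}

// A lesson with a 10-minute break: 15 min active, pause, then 10 min active.
const lesson: ActiveSegment[] = [
  { startedAtSec: 0, endedAtSec: 900 },
  { startedAtSec: 1500, endedAtSec: 2100 },
];

const billedSeconds = activeSeconds(lesson); // 1500 active seconds
console.log(billedSeconds, renderCostUsd(billedSeconds).toFixed(2)); // prints: 1500 2.92
```

Under these assumptions, a 25-minute active lesson costs about $2.92, and the 10-minute pause adds nothing.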
◇ FAQ for this use case
The questions that actually come up.
Does it work for non-English languages?
Yes. Lip-sync is phoneme-level and language-agnostic. Quality depends on the TTS you use, since the avatar follows the audio you send.
Can the same tutor 'remember' the learner?
Memory lives in your stack, not ours. That's a feature, not a bug: NORTH stays a thin rendering layer, and you keep the learner record and feed it into your prompts.
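One way to feed the learner record into your prompts could look like the sketch below. The `LearnerRecord` fields and the `buildSystemPrompt` helper are hypothetical; your own schema and prompt format will differ, and none of this passes through NORTH.

```typescript
// Hypothetical learner record kept in your own datastore, not in NORTH.
interface LearnerRecord {
  name: string;
  targetLanguage: string;
  level: string;
  recentMistakes: string[];
}

// Turn the stored record into a system prompt for your own LLM.
function buildSystemPrompt(learner: LearnerRecord): string {
  const mistakes =
    learner.recentMistakes.length > 0
      ? learner.recentMistakes.join("; ")
      : "none recorded";
  return [
    `You are a ${learner.targetLanguage} tutor for ${learner.name}.`,
    `Learner level: ${learner.level}.`,
    `Recent mistakes to revisit: ${mistakes}.`,
  ].join("\n");
}

const exampleLearner: LearnerRecord = {
  name: "Sam",
  targetLanguage: "Spanish",
  level: "A2",
  recentMistakes: ["ser vs. estar", "preterite of ir"],
};

console.log(buildSystemPrompt(exampleLearner));
```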
Can we use it in classrooms?
Yes. Multi-viewer sessions let one rendered stream serve many viewers for one GPU's worth of cost. Useful for cohort-based tutoring or live classroom delivery.
What about kids' privacy?
WebRTC frames are not stored. For under-13 deployments, customers handle COPPA-compliant consent on their side. See /safety.
Ready to ship?
Email us if you prefer async, or book a time if you want a structured call. We reply within one business day.