video-translation-pipeline

Use when localising a creator video into multiple languages — Whisper transcription, LLM translation, voice cloning per language, alignment to original timing, and burned-in subtitles in one pipeline.

One-line install
curl --create-dirs -fsSL https://skillmake.xyz/i/video-translation-pipeline -o ~/.claude/skills/video-translation-pipeline/SKILL.md

The hash above pins this exact content. The file we serve at /api/marketplace/video-translation-pipeline-2d498942/raw always matches sha:2d498942b25c35bf.

3,662 chars · ~916 tokens
---
name: video-translation-pipeline
description: Use when localising a creator video into multiple languages — Whisper transcription, LLM translation, voice cloning per language, alignment to original timing, and burned-in subtitles in one pipeline.
source: https://elevenlabs.io/docs/product/dubbing/overview
generated: 2026-05-07T21:43:06.132Z
category: concept
audience: creators
---

## Tutorials

- https://skillmake.xyz/v/video-translation-pipeline.mp4

## When to use

- Translating a YouTube video into 3–10 languages at once
- Voice-cloning the original creator into a target language so it still sounds 'like them'
- Producing a dubbed track aligned to the original video's mouth/cut timing
- Generating burned-in subtitles when a platform doesn't accept SRT (TikTok, Instagram)

## Key concepts

### transcribe-translate-synthesise loop

Three discrete stages: (1) Whisper transcribes the original; (2) LLM translates the transcript per target language, preserving names and technical terms; (3) ElevenLabs (or comparable) synthesises a cloned voice in each target. Stages stay decoupled so you can re-run any step independently.
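One way to keep the three stages decoupled is to cache each stage's output under a key built from that stage's input, so a re-run only repeats work whose input changed. The sketch below is a minimal illustration under assumptions: the `Stages` functions, `cached`, and `localise` are hypothetical names, and real stage implementations would call Whisper, an LLM, and a voice service.

```typescript
// Hypothetical stage signatures; real ones would call Whisper, an LLM, and a TTS service.
type Stages = {
  transcribe: (videoPath: string) => Promise<string>;
  translate: (transcript: string, lang: string) => Promise<string>;
  synthesise: (text: string, lang: string) => Promise<string>; // resolves to an audio path
};

// Return the cached value for `key`, or run the stage once and remember its result.
async function cached(
  cache: Map<string, string>,
  key: string,
  run: () => Promise<string>,
): Promise<string> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await run();
  cache.set(key, value);
  return value;
}

// Keys include each stage's input, so editing a translation by hand
// changes the audio key and forces only that language to re-synthesise.
async function localise(
  videoPath: string,
  langs: string[],
  stages: Stages,
  cache: Map<string, string> = new Map(),
): Promise<Record<string, string>> {
  const transcript = await cached(cache, `transcript|${videoPath}`, () =>
    stages.transcribe(videoPath),
  );
  const audioByLang: Record<string, string> = {};
  for (const lang of langs) {
    const text = await cached(cache, `translation|${lang}|${transcript}`, () =>
      stages.translate(transcript, lang),
    );
    audioByLang[lang] = await cached(cache, `audio|${lang}|${text}`, () =>
      stages.synthesise(text, lang),
    );
  }
  return audioByLang;
}
```

An in-memory `Map` is used here only to keep the sketch short; persisting the cache to disk would be needed for it to survive between runs.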

### timing alignment

Translated text is rarely the same length as the source. As a rough guide, German runs about 30% longer than English and Japanese about 15% shorter. Either time-stretch the synthesised audio (sox tempo, Rubber Band) to fit the original cuts, or re-cut the video per language. A tempo change within ±15% is generally hard to notice; beyond that, speech starts to sound robotic.
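The stretch decision comes down to one ratio: synthesised duration divided by the slot the original line occupied. The sketch below computes that tempo factor and checks it against the ±15% rule of thumb. `planStretch` and `MAX_STRETCH` are illustrative names, and the durations are assumed to be measured elsewhere.

```typescript
// Largest tempo change treated as acceptable, per the ±15% rule of thumb.
const MAX_STRETCH = 0.15;

interface StretchPlan {
  tempo: number;        // >1 speeds the audio up, <1 slows it down
  withinLimit: boolean; // false: re-cut the video or shorten the text instead
}

function planStretch(synthSeconds: number, slotSeconds: number): StretchPlan {
  if (synthSeconds <= 0 || slotSeconds <= 0) {
    throw new RangeError("durations must be positive");
  }
  // Playing at `tempo` makes the clip last synthSeconds / tempo = slotSeconds.
  const tempo = synthSeconds / slotSeconds;
  // Small epsilon so a ratio of exactly 1.15 is not rejected by float rounding.
  const withinLimit = Math.abs(tempo - 1) <= MAX_STRETCH + 1e-9;
  return { tempo, withinLimit };
}

// A line synthesised at 11 s that must fit a 10 s slot.
const plan = planStretch(11, 10);
console.log(
  plan.withinLimit
    ? `sox in.wav out.wav tempo ${plan.tempo.toFixed(3)}`
    : "outside the limit: re-cut or shorten the translation",
);
```

Applying the check per line rather than per video keeps one long sentence from forcing a global speed-up.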

### subtitle burn-in vs sidecar SRT

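YouTube and Vimeo accept .srt files as separate subtitle tracks, which is the better option because viewers can turn them off. TikTok and Instagram Reels need burned-in subtitles. For burn-in, use FFmpeg's subtitles filter with a font that has glyph coverage for the target language. Noto Sans is a safe default for Latin, Cyrillic, and Greek scripts; Chinese, Japanese, and Korean need the matching family (Noto Sans CJK).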
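Both the sidecar and burn-in routes start from the same SRT file. A minimal sketch of writing one from translated cues follows; the `Cue` shape and function names are assumptions for illustration, while the timestamp layout (HH:MM:SS,mmm with a comma before the milliseconds) is the standard SRT form.

```typescript
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm.
function toSrtTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(total % 1000, 3)}`;
}

// Each block is: 1-based index, time range, text, then a blank separator line.
function toSrt(cues: Cue[]): string {
  return cues
    .map(
      (cue, i) =>
        `${i + 1}\n${toSrtTimestamp(cue.startMs)} --> ${toSrtTimestamp(cue.endMs)}\n${cue.text}\n`,
    )
    .join("\n");
}
```

Save the result as UTF-8 so non-Latin target languages survive the round trip.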

## API reference

```
ElevenLabs Dubbing endpoint (one-shot)
```

Hosted pipeline that does transcribe + translate + synthesise + align in one call. Use this if the cost is fine; build the pipeline manually only when you need control over each stage.

```
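// Assumes Node 18+ (global fetch and FormData). The dubbing endpoint takes
// multipart/form-data, so let fetch set the content-type and boundary itself.
// Field names follow the original snippet; verify them against the API reference.
const form = new FormData();
form.append('source_url', 'https://yourbucket.com/source.mp4');
form.append('target_lang', 'es');
form.append('source_lang', 'en');
form.append('num_speakers', '1');
form.append('watermark', 'false');

const res = await fetch('https://api.elevenlabs.io/v1/dubbing', {
  method: 'POST',
  headers: { 'xi-api-key': process.env.ELEVEN_API_KEY! },
  body: form,
});
// Surface HTTP errors instead of destructuring an error payload.
if (!res.ok) throw new Error(`Dubbing request failed: ${res.status} ${await res.text()}`);
const { dubbing_id } = await res.json();
// poll GET /v1/dubbing/{dubbing_id} until status === 'dubbed'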
```
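The polling step mentioned in the closing comment of the snippet above could look roughly like this. It is a sketch under assumptions: `waitForDub` and `statusOf` are hypothetical helpers, the interval and attempt limits are arbitrary, and treating 'failed' as a terminal status is a guess that should be checked against the API reference.

```typescript
// getStatus is injected so the loop can be exercised without network access.
async function waitForDub(
  getStatus: () => Promise<string>,
  { intervalMs = 5000, maxAttempts = 120 } = {},
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === "dubbed") return;
    // Assumption: 'failed' is a terminal status; confirm in the API reference.
    if (status === "failed") throw new Error("dubbing failed");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}

// Example wiring against the status endpoint named in the snippet above.
const statusOf = (dubbingId: string, apiKey: string) => async (): Promise<string> => {
  const res = await fetch(`https://api.elevenlabs.io/v1/dubbing/${dubbingId}`, {
    headers: { "xi-api-key": apiKey },
  });
  if (!res.ok) throw new Error(`status check failed: ${res.status}`);
  return ((await res.json()) as { status: string }).status;
};
```

Usage would be along the lines of `await waitForDub(statusOf(dubbing_id, apiKey))`. Capping the attempts keeps a stuck job from polling forever.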

```
FFmpeg subtitle burn-in
```

Render the translated subtitles directly into the video frames. The subtitle file should be SRT or ASS; ASS gives more typographic control. The subtitles filter requires an FFmpeg build with libass.

```
ffmpeg -i source.mp4 -vf "subtitles=subs_es.srt:force_style='FontName=Noto Sans,FontSize=20,PrimaryColour=&H00FFFFFF,OutlineColour=&H00000000,Outline=2'" -c:v libx264 -crf 18 -c:a copy out_es.mp4
```
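When burning in several languages, building the argument list programmatically avoids shell-quoting mistakes around the force_style string. The sketch below reproduces the command above for a given language code. `burnInArgs` is a hypothetical helper, and the `subs_<lang>.srt` / `out_<lang>.mp4` naming simply follows the example.

```typescript
// Build the ffmpeg argument list for one language.
// Intended for child_process.spawn("ffmpeg", args), which bypasses the shell.
function burnInArgs(source: string, lang: string): string[] {
  const style = [
    "FontName=Noto Sans",
    "FontSize=20",
    "PrimaryColour=&H00FFFFFF",
    "OutlineColour=&H00000000",
    "Outline=2",
  ].join(",");
  return [
    "-i", source,
    // The single quotes are read by ffmpeg's filter parser, not by a shell.
    "-vf", `subtitles=subs_${lang}.srt:force_style='${style}'`,
    "-c:v", "libx264", "-crf", "18",
    "-c:a", "copy",
    `out_${lang}.mp4`,
  ];
}

// One argument list per target language.
const jobs = ["es", "de", "fr"].map((lang) => burnInArgs("source.mp4", lang));
console.log(jobs.length);
```

Language codes are interpolated into file names unescaped, so this assumes they come from a fixed list rather than user input.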

## Gotchas

- Don't translate idiom-heavy speech literally — instruct the LLM to localise meaning, not words. 'Bite the bullet' in Spanish is not 'morder la bala'.
- Voice clones trained on English can sound off in tonal languages (Mandarin, Vietnamese). Test before committing — sometimes a native preset voice in the target language wins.
- When translated subtitle text is longer than the source line, condense the wording rather than relying on time-stretching to make room. Viewers can't read a 14-word subtitle in 2 seconds anyway.
- Numbers, units, and brand names should be locked in a glossary the LLM consumes per call — otherwise '$5,000' becomes '5000 dólares' in one chunk and 'cinco mil dólares' in another.
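One way to enforce the glossary, beyond passing it in the prompt, is to mask locked terms with opaque tokens before translation and restore the approved rendering afterwards. The sketch below assumes the LLM is instructed to leave `[[n]]` tokens untouched; `protectTerms` and the token format are illustrative, not part of any library.

```typescript
interface Masked {
  masked: string;
  restore: (translated: string) => string;
}

// Swap each locked term for an opaque token, and return a function that
// swaps the tokens back for the glossary's approved target-language form.
function protectTerms(text: string, glossary: Record<string, string>): Masked {
  const slots: string[] = [];
  let masked = text;
  // Longest terms first so a longer term is not split by a shorter one it contains.
  const terms = Object.keys(glossary).sort((a, b) => b.length - a.length);
  for (const term of terms) {
    if (!masked.includes(term)) continue;
    const token = `[[${slots.length}]]`;
    masked = masked.split(term).join(token);
    slots.push(glossary[term] as string);
  }
  const restore = (translated: string): string =>
    slots.reduce((out, value, i) => out.split(`[[${i}]]`).join(value), translated);
  return { masked, restore };
}

// Glossary maps source terms to their fixed rendering in the target language.
const { masked, restore } = protectTerms("SkillMake costs $5,000", {
  "$5,000": "5000 dólares",
  SkillMake: "SkillMake",
});
console.log(masked); // text to send for translation
console.log(restore("[[0]] cuesta [[1]]"));
```

Because every chunk goes through the same glossary, a figure like '$5,000' gets one rendering throughout. If the model drops or rewrites a token, the restore step leaves a visible gap, which is easier to catch than a silently inconsistent number.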

---
Generated by SkillMake from https://elevenlabs.io/docs/product/dubbing/overview on 2026-05-07T21:43:06.132Z.
Verify against source before relying on details.
