Project 3 — GenAI Educational Media Pipeline

Status: 🟢 Operational. Real avatar (Prof. Hahne) integrated; Hallo2 diffusion pipeline producing production-quality output.
Institution: Hochschule Furtwangen (HFU), Faculty I: Computer Science & Applications
Project Lead: Prof. Dr. Uwe Hahne

Personal notes

Setup

Docker was used to host the Hermes agent
Because the agent needs GPU acceleration, additional packages for NVIDIA GPU support in Docker had to be installed
Setting up Discord as the communication channel was somewhat tedious in this setup, especially authorizing a specific account

Findings

To modify the agent on the VM, the Docker container has to be accessed
Changing settings through the Docker CLI combined with the Hermes CLI was not very intuitive
The GitHub classic token for pushing and opening PRs had to be re-entered repeatedly because it was not stored, probably for security reasons
Simple tasks can stall, and manual intervention is needed to retry them
The token allowance of the Ollama cloud plan was sufficient for the entire experiment
Most project instructions were given through the Discord channel

Project

The entire project was executed by the agent
Only slight manual adjustments had to be made
The agent downloaded the models, set up the pipeline and an HTTP tunnel, and wrote the scripts on its own
The agent hallucinated a few times, for example about the inference time required on the given hardware

Overall, I was quite impressed by the capabilities of open-source agents: despite these issues, the agent set up the entire project on its own.

VM Setup

Storage Configuration

Create and mount the additional storage disk:

sudo fdisk /dev/sdb
sudo mkfs.ext4 /dev/sdb1

sudo mkdir -p /mnt/storage
sudo mount /dev/sdb1 /mnt/storage
sudo chown -R lecture:lecture /mnt/storage

Persist mount configuration:

sudo blkid /dev/sdb1
sudo nano /etc/fstab
sudo mount -a

Example entry:

UUID=<UUID> /mnt/storage ext4 defaults,nofail 0 2
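
The entry above can also be derived from the `blkid` output instead of being typed by hand. The following is a minimal Python sketch, assuming `blkid` prints its usual `KEY="value"` pairs; the function name `fstab_entry` and the sample UUID are hypothetical and only serve as illustration.

```python
import re


def fstab_entry(blkid_line: str, mount_point: str = "/mnt/storage") -> str:
    """Build an /etc/fstab line from one line of `blkid` output."""
    # blkid prints e.g.: /dev/sdb1: UUID="..." TYPE="ext4" PARTUUID="..."
    fields = dict(re.findall(r'\b([A-Z_]+)="([^"]*)"', blkid_line))
    if "UUID" not in fields:
        raise ValueError("no UUID found in blkid output")
    fs_type = fields.get("TYPE", "ext4")
    # Same options as the example entry: defaults,nofail, no dump, fsck pass 2.
    return f"UUID={fields['UUID']} {mount_point} {fs_type} defaults,nofail 0 2"


if __name__ == "__main__":
    # Hypothetical sample line; on the VM, pass the output of `sudo blkid /dev/sdb1`.
    sample = '/dev/sdb1: UUID="0a1b2c3d-1111-2222-3333-444455556666" TYPE="ext4"'
    print(fstab_entry(sample))
```

The printed line still has to be appended to `/etc/fstab` manually and checked with `sudo mount -a`.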

SSH Configuration

Enable OpenSSH and configure key-based authentication:

sudo mkdir -p /run/sshd
sudo chmod 755 /run/sshd

sudo systemctl start ssh
sudo systemctl enable ssh

Generate an SSH key on the local machine:

ssh-keygen -t ed25519
cat ~/.ssh/id_ed25519.pub

Add the public key to the VM:

mkdir -p ~/.ssh
chmod 700 ~/.ssh
nano ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Docker Installation

Install Docker and required dependencies:

sudo apt update
sudo apt install -y ca-certificates curl gnupg git wget htop tmux unzip build-essential

Add the Docker apt repository as described in the official Docker installation guide, run `sudo apt update` again, then install Docker:

sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Enable Docker usage without sudo:

sudo usermod -aG docker $USER
newgrp docker

Verify installation:

docker run hello-world

NVIDIA Container Toolkit

Install and configure GPU support (this assumes the NVIDIA Container Toolkit apt repository has already been added):

sudo apt install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify GPU passthrough:

docker run --rm --gpus all \
  nvidia/cuda:12.4.1-runtime-ubuntu22.04 \
  nvidia-smi

Docker Data Migration

Create a dedicated storage location:

sudo mkdir -p /mnt/storage/docker

Configure Docker:

sudo nano /etc/docker/daemon.json

{
    "data-root": "/mnt/storage/docker",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
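
Editing the file by hand risks dropping the `runtimes` block that `nvidia-ctk runtime configure` wrote earlier. As a hedged alternative, the sketch below merges `data-root` into whatever configuration already exists. The helper name `set_data_root` is hypothetical, and writing the result back to `/etc/docker/daemon.json` (which needs root) is left out.

```python
import json


def set_data_root(config_text: str, data_root: str) -> str:
    """Return daemon.json text with "data-root" set, keeping all other keys."""
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(config_text) if config_text.strip() else {}
    if not isinstance(config, dict):
        raise ValueError("daemon.json must contain a JSON object")
    config["data-root"] = data_root
    return json.dumps(config, indent=4) + "\n"


if __name__ == "__main__":
    # Sample input mirroring what the NVIDIA toolkit writes.
    existing = '{"runtimes": {"nvidia": {"args": [], "path": "nvidia-container-runtime"}}}'
    print(set_data_root(existing, "/mnt/storage/docker"), end="")
```

Parsing the file first also catches JSON syntax errors before Docker is restarted.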

Restart Docker:

sudo systemctl restart docker

Verify configuration:

docker info | grep "Docker Root Dir"

Hermes Setup

Create persistent storage:

mkdir -p /mnt/storage/hermes

Run initial setup:

docker run -it --rm \
  --gpus all \
  -v /mnt/storage/hermes:/opt/data \
  nousresearch/hermes-agent setup

Configure environment variables:

nano /mnt/storage/hermes/.env

Example:

OLLAMA_API_KEY=...
DISCORD_ALLOWED_USERS=252538619787476993
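
A missing or empty variable is easy to overlook until the container is already running. The sketch below checks the file before the gateway is started. It assumes plain `KEY=value` lines and treats the two keys from the example as required; whether Hermes itself needs both is an assumption, and the function names are hypothetical.

```python
from pathlib import Path

# Keys taken from the example above; adjust to the actual deployment.
REQUIRED_KEYS = ("OLLAMA_API_KEY", "DISCORD_ALLOWED_USERS")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("/mnt/storage/hermes/.env")
    if env_path.exists():
        print("missing:", missing_keys(env_path.read_text()))
```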

Hermes Gateway Deployment

Configure the gateway:

docker run -it --rm \
  --gpus all \
  -v /mnt/storage/hermes:/opt/data \
  nousresearch/hermes-agent setup gateway

Start the persistent container:

docker run -d \
  --name hermes \
  --restart unless-stopped \
  --gpus all \
  -v /mnt/storage/hermes:/opt/data \
  -p 8642:8642 \
  nousresearch/hermes-agent gateway run

Verify operation:

docker logs hermes
docker ps

What is This?

An automated pipeline that generates educational lecture videos with a deepfake avatar of Prof. Dr. Uwe Hahne, TTS narration, and slide overlays. Designed for GenAI research and the Industrial Metaverse at HFU.

Live gallery: https://saved-carter-auditor-wanted.trycloudflare.com (ephemeral tunnel — may rotate)

Architecture

┌──────────────┐    ┌──────────────────────────────────────────────────┐
│   TTS Audio  │───→│ LivePortrait (pose-only) + Wav2Lip lip-sync      │  ← v4–v8
│  (edge-tts)  │    └──────────────────────────────────────────────────┘
└──────────────┘    ┌──────────────────────────────────────────────────┐
                    │ Hallo2 (audio-driven diffusion, native lip-sync) │  ← v9+ ✅
                    └──────────────────────────────────────────────────┘
                                             ↓
                                       FFmpeg Compose
                                             ↑
                                     Slides (Chromium)

Pipeline stages:

1. Slide rendering: Chromium headless renders HTML slides to 1920×1080 PNG
2. TTS generation: edge-tts (Microsoft Edge neural voice, no API key needed)
3. Avatar generation: Hallo2 (audio-guided diffusion, native lip-sync + natural head motion)
4. Composition: FFmpeg overlays 350px avatar on slides + burns HFU logo + mixes audio
5. Gallery: Auto-generated index.html with per-video detail pages
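
Stages 1, 2, and 4 each reduce to a single command-line call. The sketch below only builds the argument lists and is not the project's actual `pipeline.py`. Several details are assumptions: the `chromium` binary name, the bottom-right avatar position, the 40 px margin, and the `libx264`/`aac` codecs. The Hallo2 inference step (stage 3) is left out because its invocation is not documented here.

```python
from pathlib import Path


def slide_cmd(html: Path, png: Path) -> list[str]:
    """Stage 1: Chromium headless screenshot of one HTML slide at 1920x1080."""
    return [
        "chromium", "--headless", "--hide-scrollbars",
        "--window-size=1920,1080",
        f"--screenshot={png}",
        html.resolve().as_uri(),
    ]


def tts_cmd(text: str, mp3: Path, voice: str = "en-US-AriaNeural") -> list[str]:
    """Stage 2: edge-tts CLI call that writes the narration for one slide."""
    return ["edge-tts", "--voice", voice, "--text", text, "--write-media", str(mp3)]


def compose_cmd(slide: Path, avatar: Path, logo: Path, audio: Path, out: Path,
                avatar_px: int = 350, margin: int = 40) -> list[str]:
    """Stage 4: FFmpeg call with avatar bottom-right and logo top-right."""
    graph = (
        f"[1:v]scale={avatar_px}:-2[av];"                    # shrink avatar to 350 px wide
        f"[0:v][av]overlay=W-w-{margin}:H-h-{margin}[base];"  # avatar over the slide
        f"[base][2:v]overlay=W-w-{margin}:{margin}[v]"        # logo in the top-right corner
    )
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", str(slide),   # input 0: still slide, looped
        "-i", str(avatar),                # input 1: avatar video
        "-i", str(logo),                  # input 2: logo image
        "-i", str(audio),                 # input 3: narration
        "-filter_complex", graph,
        "-map", "[v]", "-map", "3:a",
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac",
        "-shortest", str(out),            # stop when the narration ends
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(compose_cmd(Path("slide_01.png"), Path("avatar_01.mp4"),
                               Path("hfu_logo.png"), Path("slide_01.mp3"),
                               Path("segment_01.mp4"))))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths exist. Keeping command construction separate from execution makes the commands easy to inspect and log.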

Generated Versions

Version	Approach	Duration	Quality	Status
v4–v6	LivePortrait (expression) + Wav2Lip	97s	Mouth wildly flapping, extreme motion	❌
v7	LivePortrait (`--animation-region pose`) + Wav2Lip	97s	Too static, no eye movement, blurry lips	⚠️
v8	LivePortrait (`d0.mp4` natural idle) + Wav2Lip-SD-NOGAN	97s	Good idle motion, sharper lips	✅
v9 (Hallo2)	Hallo2 diffusion (20 steps, audio-driven)	97s	Best: natural motion + native lip-sync	✅ Preferred

Quick Start

# Activate environment
source venvs/lp-env/bin/activate

# LivePortrait + Wav2Lip pipeline (legacy, v4–v8)
python scripts/pipeline.py \
  --presentation presentations/videoretalking_presentation.json \
  --output assets/output/videoretalking_presentation_v8.mp4

# Hallo2 diffusion pipeline (preferred, v9+)
# 1. Generate slide audio (TTS) → hallo_full.yaml → inference
# 2. Run scripts/build_hallo2_presentation.py

See SETUP.md for full environment reproduction including Hallo2 install.

Documentation

Page	Description
LOG →	Full build log — what was implemented and when
AUDIO API →	Voice Agent integration spec
IMAGES →	Avatar / Image Agent integration spec

Current State

Component	Status	Notes
Avatar	✅ Real photo (`prof_hahne_v4_512.jpg`)	HFU profile photo, 512×512
Voice	✅ `edge-tts`	en-US-AriaNeural — natural, no API key
Head motion	✅ Hallo2 diffusion	Natural idle + blinks, audio-conditioned
Lip-sync	✅ Native Hallo2	No separate Wav2Lip needed
Gallery	✅ Auto-generated	Cache-busting headers, detail pages
HFU branding	✅ Logo overlay	Top-right corner on all outputs

Tech Stack

Layer	Tool
Avatar engine	Hallo2 (diffusion, audio-guided)
Legacy avatar	LivePortrait + Wav2Lip-SD-NOGAN
TTS	edge-tts (Microsoft Edge neural)
Slide renderer	Chromium headless (1920×1080)
Video composer	FFmpeg (overlay, concat, logo burn)
GPU	NVIDIA RTX A6000 (48 GB)
Python	3.10.14 (via `uv`)
Gallery	Auto-generated static HTML with cache-busting

Directory Overview

project03/
├── README.md              ← Landing page (project03/)
├── SETUP.md               ← Full reproduction guide
├── docs/                  ← MkDocs pages for the website
│   └── project03/
│       ├── index.md       ← This page (overview + status)
│       ├── log.md         ← Build history
│       ├── audio_api.md   ← Voice API contract
│       └── images.md      ← Image/avatar API contract
├── scripts/
│   ├── pipeline.py              ← LivePortrait + Wav2Lip pipeline
│   ├── build_hallo2_presentation.py  ← Hallo2 + slide composition
│   ├── generate_index.py        ← Gallery + detail page generator
│   ├── start_server.sh          ← Local gallery server
│   └── start_tunnel.sh          ← Cloudflare tunnel
├── presentations/
│   └── videoretalking_presentation.json
├── assets/
│   ├── avatars/           ← prof_hahne_v4_512.jpg
│   ├── audio/             ← TTS outputs
│   ├── slides/            ← HTML sources + rendered PNGs
│   └── output/            ← Final MP4s + gallery index
├── LivePortrait/          ← Deepfake engine + weights
├── wav2lip/               ← Lip-sync engine + checkpoints
└── Hallo2/                ← Diffusion avatar engine (external)

Gallery

The pipeline auto-generates:

- index.html: dark-mode gallery with video cards + cache-busting meta tags
- detail/{video}.html: per-video pages showing slide list, voice used, metadata, raw JSON
- Filter rules: skips _looped.mp4 intermediates, skips videos without JSON manifests
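
The filter rules can be expressed in a few lines. The sketch below is an illustration, not the code in `generate_index.py`. It assumes each manifest sits next to its video under the same name with a `.json` suffix, and the function name `gallery_videos` is hypothetical.

```python
import tempfile
from pathlib import Path


def gallery_videos(output_dir: Path) -> list[Path]:
    """Return the MP4s that should appear in the gallery, sorted by name."""
    selected = []
    for video in sorted(output_dir.glob("*.mp4")):
        if video.name.endswith("_looped.mp4"):
            continue  # intermediate file, not a final render
        if not video.with_suffix(".json").exists():
            continue  # no manifest, so nothing to show on a detail page
        selected.append(video)
    return selected


if __name__ == "__main__":
    # Small demonstration with empty placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("v9.mp4", "v9.json", "v9_looped.mp4", "draft.mp4"):
            (root / name).touch()
        print([p.name for p in gallery_videos(root)])
```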

Gallery server runs on port 8888.
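
For reference, a static server that sends cache-busting headers fits into the Python standard library. This is a hedged sketch rather than the contents of `scripts/start_server.sh`. The class and function names are hypothetical, and binding to `127.0.0.1` assumes the tunnel runs on the same machine.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class NoCacheHandler(SimpleHTTPRequestHandler):
    """Static file handler that tells browsers not to cache responses."""

    def end_headers(self):
        # Added to every response so regenerated videos show up immediately.
        self.send_header("Cache-Control", "no-store, must-revalidate")
        self.send_header("Expires", "0")
        super().end_headers()


def make_server(directory: str, port: int = 8888,
                host: str = "127.0.0.1") -> ThreadingHTTPServer:
    """Create (but do not start) a gallery server for `directory`."""
    handler = partial(NoCacheHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)


if __name__ == "__main__":
    # Serves the gallery output directory until interrupted.
    make_server("assets/output").serve_forever()
```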

Prof. Dr. Uwe Hahne @ Hochschule Furtwangen | GenAI Educational Media Project