1.2 Control a Talking Head with Hand Movements (mini project)

TL;DR

In this tutorial, we combine multiple videos inside a single Max patch. We introduce layering, compositing, and more advanced interaction. We use a monitor screen to replace the performer’s head and we control the video with hand movements.

Get the Max patch HERE

If you prefer text, below you will find the video’s key concepts, with images.

Overview

The core idea evolves from controlling a single video to building a layered composition:

A main video (talking head) is controlled by movement
A second video (loading animation) is added on top
Both videos are combined in a 3D space
Each layer can be positioned, scaled, and blended
Interaction can differ between layers (e.g. inverted behavior)

This turns a simple interaction into a more complex visual system.

What we use

Max 9
ste.snips snippet package
Monitor screen + stand
Speakers
Lights
Gesticulating body

Key Max snippets:

ste.pixVideoLoop → loads and loops videos
ste.3dLayer → places videos in an output window
ste.inOut scale → maps movement values
ste.pixFxBlur → adds blur visual effects
ste.scenes → saves patch state

Videos

Preparing the Video Content

The main video is a looping talking-head clip. To make looping seamless:

The first and last frames should be visually similar
The video is slowed down to allow smoother interaction at low speed
A simple action (reading a recipe) helps maintain continuity

Lighting and setup:

Black background for clarity
Side lighting to isolate the subject
Portrait orientation for vertical screens

Adding a Second Video

A second video (loading animation) is introduced to enrich the composition.

Why add it?

Fill unused screen space
Create visual contrast
Add a second layer of interaction

This requires moving from a single output to a layered system.

More Patching

ste.3dLayer

Instead of sending video directly to the output, each video is placed inside a ste.3dLayer.

This allows:

Positioning (X/Y)
Scaling (size control)
Layer ordering (front/back)

Each video becomes a movable plane in a shared 3D space.
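To make the idea of a "movable plane" concrete, here is a minimal Python sketch of how a position and a scale determine where a video plane sits. The `Layer` class, its fields, and the convention that `scale` is the plane's height are assumptions made for illustration; they are not the ste.3dLayer interface.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for one video plane (not the ste.3dLayer API)."""
    x: float = 0.0       # horizontal centre of the plane
    y: float = 0.0       # vertical centre of the plane
    scale: float = 1.0   # plane height; width follows from the aspect ratio
    aspect: float = 1.0  # width / height of the video

    def corners(self):
        """Return the four corners, counter-clockwise from bottom-left."""
        half_w = self.scale * self.aspect / 2
        half_h = self.scale / 2
        return [
            (self.x - half_w, self.y - half_h),
            (self.x + half_w, self.y - half_h),
            (self.x + half_w, self.y + half_h),
            (self.x - half_w, self.y + half_h),
        ]


# A portrait clip (aspect 0.5), doubled in size and shifted to the right.
portrait = Layer(x=1.0, y=0.0, scale=2.0, aspect=0.5)
```

Changing `x` and `y` moves the plane without resizing it, and changing `scale` resizes it around its centre.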

Layer Order

Since ste.3dLayer ignores depth positioning (Z) by default, layering is controlled manually:

Higher layer value → appears in front
Lower layer value → appears behind

This ensures consistent visual stacking.
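The stacking rule can be sketched in Python as a back-to-front draw order: layers are sorted by their layer value and drawn in that order, so the highest value ends up on top. The layer names and values below are hypothetical, and this is only an illustration of the rule, not how ste.3dLayer is implemented.

```python
def draw_order(layers):
    """Return layer names back to front: the lowest layer value is drawn first."""
    return [name for name, value in sorted(layers.items(), key=lambda item: item[1])]


# Hypothetical layer values: the loading animation sits on top of the talking head.
layers = {"loading_animation": 2, "talking_head": 1}
order = draw_order(layers)
```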

Blend Modes

Layers can be visually combined using blend modes.
For example, the Add mode brightens the image and merges colors between layers.
This creates more integrated and dynamic visuals.
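As a rough illustration of what an additive blend does per pixel, the sketch below adds two RGB colors channel by channel and clamps the result. It assumes channels in the range 0.0 to 1.0; the function name `blend_add` is made up for this example.

```python
def blend_add(bottom, top):
    """Additive blend of two RGB pixels with channels in 0.0..1.0, clamped at 1.0."""
    return tuple(min(1.0, b + t) for b, t in zip(bottom, top))


mixed = blend_add((0.25, 0.5, 0.75), (0.5, 0.5, 0.5))
```

Because black contributes nothing to the sum, footage shot against a black background blends onto other layers without a visible rectangle around it.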

Interaction

Main Video

Controlled by movement speed
More movement → faster playback
No movement → video stops
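The mapping described above can be sketched in a few lines of Python. This assumes the movement amount has already been normalized to the range 0.0 to 1.0; `playback_speed` and `max_speed` are hypothetical names, not part of the patch.

```python
def playback_speed(movement, max_speed=1.0):
    """Map a normalized movement amount (0.0..1.0) to a playback rate."""
    movement = max(0.0, min(1.0, movement))  # keep out-of-range input in bounds
    return movement * max_speed


stopped = playback_speed(0.0)         # no movement: the video stops
halfway = playback_speed(0.5, 2.0)    # moderate movement: normal speed
```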

Second Video (Inverted)

The second video interaction is inverted:

Movement → slows down
Stillness → speeds up

This is achieved by duplicating the ste.outScale system and inverting the output values with the “invert” toggle. This contrast creates more variation in the interaction.
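One plausible reading of the "invert" toggle is that it flips the normalized value before it is scaled to the output range. The Python sketch below shows that behavior under this assumption; the function `scale_output` and its parameters are illustrative and do not describe the internals of the snippet.

```python
def scale_output(value, out_min, out_max, invert=False):
    """Scale a normalized value (0.0..1.0) to an output range, optionally inverted."""
    value = max(0.0, min(1.0, value))
    if invert:
        value = 1.0 - value  # stillness now yields the highest output
    return out_min + value * (out_max - out_min)


main_speed = scale_output(0.25, 0.0, 2.0)                  # follows movement
second_speed = scale_output(0.25, 0.0, 2.0, invert=True)   # opposes movement
```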

Fine-Tuning Visuals

Speed Scaling

Different videos require different speed ranges: slow-motion footage needs a higher maximum speed, while lower values are enough for normal footage.
Always adjust ste.outScale per video.
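The reason slow-motion footage needs a higher maximum comes down to simple arithmetic: a clip slowed to half speed must be played at rate 2.0 to look like the original action. A small Python sketch, with the hypothetical helper `rate_for_original_pace`:

```python
def rate_for_original_pace(slowdown):
    """Playback rate that restores the original pace of slowed footage.

    `slowdown` is the factor the clip was slowed to, e.g. 0.5 for half speed.
    """
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    return 1.0 / slowdown


half_speed_clip = rate_for_original_pace(0.5)
normal_clip = rate_for_original_pace(1.0)
```

A value like this can serve as a starting point for the maximum speed, to be adjusted by eye for each video.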

Blur Effect

Low-quality or compressed videos can appear pixelated.

Using ste.pixFxBlur:

Softens harsh pixels
Improves perceived quality
Adds aesthetic cohesion
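To show why blurring hides blocky pixels, here is a minimal box blur in plain Python that replaces each pixel with the average of its neighbourhood. The algorithm used by ste.pixFxBlur is not stated here, so the box blur is a stand-in chosen for simplicity, operating on a grayscale image stored as a list of rows.

```python
def box_blur(image, radius=1):
    """Blur a grayscale image (list of rows of floats) by neighbourhood averaging."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            total, count = 0.0, 0
            # Average over the window, shrinking it at the image borders.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            row.append(total / count)
        result.append(row)
    return result


# A single bright pixel is spread over its neighbours.
spot = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
blurred = box_blur(spot)
```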

Saving Your Patch State

Using ste.scenes, you can store the state of your patch:

Loaded videos
Parameter values
Interaction settings

This is especially useful if you can’t save the patch itself.

Workflow:

Store a scene
Write it to a file (.json)
Reload it later
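The store, write, and reload workflow can be sketched with Python's standard `json` module. The scene layout, key names, and file names below are hypothetical; they do not reflect the actual file format written by ste.scenes.

```python
import json
import os
import tempfile


def write_scene(path, scene):
    """Write a scene dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(scene, f, indent=2)


def read_scene(path):
    """Read a scene dictionary back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical scene: loaded videos, parameter values, interaction settings.
scene = {
    "videos": {"main": "talking_head.mov", "second": "loading.mov"},
    "parameters": {"blur": 0.2, "layer_main": 1, "layer_second": 2},
    "interaction": {"second_inverted": True},
}

path = os.path.join(tempfile.gettempdir(), "scene_example.json")
write_scene(path, scene)
restored = read_scene(path)
```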

Setup Considerations

Camera Calibration

Movement input depends on:

Camera distance
Angle
Lighting

Always recalibrate using inscale for each setup.
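Recalibration amounts to measuring the raw movement range in the current setup and mapping it to 0.0 to 1.0. The Python sketch below assumes a simple procedure: record a few readings while standing still (the noise floor) and a few while moving, then use their maxima as the input range. The function names and the procedure are illustrative assumptions, not a description of the snippet.

```python
def calibrate(samples_still, samples_moving):
    """Return (in_min, in_max): the noise floor and the largest observed movement."""
    return max(samples_still), max(samples_moving)


def normalize(raw, in_min, in_max):
    """Map a raw movement reading from the calibrated range to 0.0..1.0."""
    if in_max <= in_min:
        raise ValueError("in_max must be greater than in_min")
    return max(0.0, min(1.0, (raw - in_min) / (in_max - in_min)))


# Hypothetical readings taken in one particular room and camera position.
in_min, in_max = calibrate([0.5, 1.0, 0.75], [4.0, 5.0, 3.0])
level = normalize(3.0, in_min, in_max)
```

Readings below the noise floor map to 0.0, so camera noise in a still scene does not drive the video.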

Camera Placement

Avoid pointing the camera at the screen. This prevents feedback loops and keeps motion tracking clean.

Key Takeaway

The extended workflow becomes:

Camera → Movement → Normalize → Scale → Control Multiple Layers → Compose → Render

Once you understand this, you can:

Build multi-layered visual systems
Create contrasting interactions
Design more complex real-time visuals

NEXT: 1.3 Control a Video Mapping with Movement (mini project)

PLOC’ is realized with the support of MA7 (Cultural Department of the City of Vienna)

STEFANO D'ALESSIO