00 Technical Case Study

BodyScript

Video-to-pose pipeline built with MediaPipe and OpenCV.

BodyScript is a browser-based tool that turns uploaded video into structured pose data. The browser uploads a video and polls for progress while the backend extracts frames with OpenCV, runs MediaPipe Pose, writes landmark data to CSV, and returns a processed video with a skeleton overlay.

MediaPipe · OpenCV · FastAPI · Python · JavaScript · Docker · CSV Export

Processed video output — skeleton overlay generated from detected pose landmarks.

Personal project · Live demo · MediaPipe / Python / FastAPI

01 What It Does

What the project does

BodyScript takes an uploaded video and runs it through a pose estimation pipeline. The server extracts frames, detects skeletal landmarks using MediaPipe, and returns two outputs: a processed video with skeleton lines drawn over the original frames, and a CSV file containing landmark coordinates for each detected frame.

Input: Uploaded video

Processing: Frame-by-frame, server-side

Pose model: MediaPipe Pose (33 landmarks)

Landmarks: x/y/z + visibility per landmark, per frame

Visual output: Skeleton overlay video

Data output: CSV per processed file

02 Processing Pipeline

How video moves through the server

The browser sends a video file to the FastAPI backend. The job runs as a background task: the server accepts the upload, returns a job ID, and the browser polls for completion instead of holding the request open.

OpenCV extracts the video into frames, MediaPipe Pose detects landmarks for each frame, and the results are written into a CSV file. The overlay step then reads that CSV, maps normalized landmark coordinates back to pixel positions, and draws the skeleton onto the video frames.
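Because the overlay step works from the CSV rather than from live detections, its first job is to regroup rows by frame. The sketch below assumes one row per landmark with columns named `frame_id`, `landmark_id`, `x`, `y`, `z`, and `visibility`; the last four column names and the `load_landmarks` helper are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict


def load_landmarks(csv_file) -> dict[int, dict[int, tuple[float, float, float, float]]]:
    """Group rows as {frame_id: {landmark_id: (x, y, z, visibility)}}."""
    frames = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        frames[int(row["frame_id"])][int(row["landmark_id"])] = (
            float(row["x"]),
            float(row["y"]),
            float(row["z"]),
            float(row["visibility"]),
        )
    return dict(frames)


# Two made-up rows, only to show the shape of the data.
sample = io.StringIO(
    "frame_id,landmark_id,x,y,z,visibility\n"
    "0,11,0.40,0.30,-0.10,0.98\n"
    "0,12,0.60,0.30,-0.12,0.97\n"
)
frames = load_landmarks(sample)
```

Frames with no detected pose simply have no entry in the result, so the drawing loop can leave those frames untouched.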

Browser Upload: file input + fetch POST

↓

FastAPI Job: BackgroundTasks, non-blocking

↓

Video Prep: validate, store temp file

↓

OpenCV Frame Extraction: cv2.VideoCapture, per-frame loop

↓

MediaPipe Pose: mp.solutions.pose, 33 landmarks per frame

↓

Pose Data CSV: x, y, z, and visibility per landmark per frame

↓

Visual output: overlay video
Data output: CSV download

03 Overlay and Export

What the pipeline produces

Each processing job produces two outputs: a video with the skeleton overlay burned into the frames, and a CSV file containing the landmark data used to generate that overlay.

Overlay video

Original frames with skeleton lines and landmark dots drawn by OpenCV. Each detected landmark and skeleton connection is rendered from the coordinate data for that frame.
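One way to decide which skeleton lines to render for a frame is to keep only connections whose two endpoints are both reasonably visible. The sketch below uses a subset of the MediaPipe Pose landmark indices; the 0.5 threshold and the `drawable_connections` helper are assumptions, since the write-up does not say how BodyScript filters low-confidence points.

```python
# Subset of MediaPipe Pose landmark indices (shoulders, elbows, wrists, hips).
L_SHOULDER, R_SHOULDER = 11, 12
L_ELBOW, R_ELBOW = 13, 14
L_WRIST, R_WRIST = 15, 16
L_HIP, R_HIP = 23, 24

# Illustrative connection list; the full skeleton has more edges.
CONNECTIONS = [
    (L_SHOULDER, R_SHOULDER),
    (L_SHOULDER, L_ELBOW), (L_ELBOW, L_WRIST),
    (R_SHOULDER, R_ELBOW), (R_ELBOW, R_WRIST),
    (L_SHOULDER, L_HIP), (R_SHOULDER, R_HIP), (L_HIP, R_HIP),
]


def drawable_connections(
    visibility: dict[int, float], min_visibility: float = 0.5
) -> list[tuple[int, int]]:
    """Return the connections whose two endpoints are both visible enough."""
    return [
        (a, b)
        for a, b in CONNECTIONS
        if visibility.get(a, 0.0) >= min_visibility
        and visibility.get(b, 0.0) >= min_visibility
    ]
```

Each returned pair would then become one line segment between the two landmarks' pixel positions, with a dot drawn at each endpoint.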

Pose data CSV

Landmark data exported per detected frame, including frame_id, landmark_id, normalized x/y/z coordinates, visibility, and detection strategy.
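A row layout matching that description could be produced with the standard `csv` module, as in the sketch below. Only `frame_id` and `landmark_id` are named in the text above; the remaining column names, the `Landmark` stand-in type, and the `"default"` strategy value are assumptions.

```python
import csv
import io
from typing import Iterable, NamedTuple

# frame_id and landmark_id come from the write-up; the other names are assumed.
FIELDS = ["frame_id", "landmark_id", "x", "y", "z", "visibility", "strategy"]


class Landmark(NamedTuple):
    """Stand-in for a detected landmark (normalized x/y/z plus visibility)."""
    x: float
    y: float
    z: float
    visibility: float


def frame_to_csv(frame_id: int, landmarks: Iterable[Landmark], strategy: str) -> str:
    """Serialize one frame's landmarks, one CSV row per landmark, with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for landmark_id, lm in enumerate(landmarks):
        writer.writerow({
            "frame_id": frame_id,
            "landmark_id": landmark_id,
            "x": f"{lm.x:.6f}",
            "y": f"{lm.y:.6f}",
            "z": f"{lm.z:.6f}",
            "visibility": f"{lm.visibility:.6f}",
            "strategy": strategy,
        })
    return buffer.getvalue()


text = frame_to_csv(0, [Landmark(0.5, 0.25, -0.1, 0.99)], strategy="default")
```

A long format like this (one row per landmark) keeps the header fixed at seven columns regardless of how many landmarks the model reports.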

Browser display

Processed results are delivered back to the browser for playback and download. The browser plays the finished video; it does not draw the overlay with Canvas, SVG, or HTML.

04 Technical Decisions

Why these tools

Each tool in the stack has a narrow job. The choices reflect the scope of the project: a focused video-processing workflow rather than a broad platform.

MediaPipe Pose

Google's pre-trained pose model. No training data required. Detects 33 landmarks per frame, including positions for shoulders, elbows, wrists, hips, knees, and ankles.

OpenCV

Handles frame extraction and skeleton drawing. OpenCV reads the video, provides individual frames to MediaPipe, maps landmark coordinates back to pixel space, and writes the overlay frames.
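MediaPipe reports x and y normalized by the frame's width and height, so mapping back to pixel space is a multiplication per axis. The helper below is a minimal sketch; the name `to_pixel` and the choice to clamp out-of-frame points to the image edge are assumptions, not necessarily what BodyScript does.

```python
def to_pixel(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Map normalized [0, 1] coordinates to integer pixel coordinates.

    Values are clamped so points slightly outside the frame stay drawable.
    """
    px = min(max(int(x_norm * width), 0), width - 1)
    py = min(max(int(y_norm * height), 0), height - 1)
    return px, py
```

For a 1280x720 frame, `to_pixel(0.5, 0.25, 1280, 720)` gives `(640, 180)`. Integer tuples like this are the form OpenCV drawing calls such as `cv2.line` and `cv2.circle` expect for point arguments.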

FastAPI BackgroundTasks

Jobs run asynchronously without blocking the upload response. The browser submits the file, receives a job ID, and polls for completion while the backend processes the video.
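The client side of that pattern is a loop that checks status at a fixed interval until the job reaches a terminal state. BodyScript's real poller is browser JavaScript; the Python sketch below only illustrates the logic, and the status strings, interval, timeout, and `wait_for_job` name are all assumptions.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it reports a terminal state or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in ("done", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(interval)
```

Passing `fetch_status` and `sleep` in as callables keeps the loop independent of any HTTP client and makes it easy to exercise with canned statuses.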

Vanilla JavaScript

No framework on the frontend. The UI is a file picker, a status poller, and a video player. A framework would have added overhead without much benefit for this workflow.

Docker

Keeps the Python, OpenCV, MediaPipe, and ffmpeg runtime consistent across environments and reduces dependency drift during development and deployment.

05 Constraints and Tradeoffs

What this project does not do

BodyScript is intentionally scoped around one clear workflow: upload a short video, process it on the backend, then review and download the outputs.

Single-person detection. MediaPipe Pose detects one person per frame. Multi-person scenes are out of scope.

No real-time streaming. Video is uploaded and processed as a batch job, not streamed live.

Server-side only. All processing happens on the backend. No in-browser inference.

CSV is the only structured export. JSON and other formats were not implemented.

No authentication. The tool has no user accounts or saved project history.

06 Stack Summary

Stack

The stack is intentionally direct: a vanilla JavaScript browser client, a FastAPI backend, OpenCV and MediaPipe for video processing, ffmpeg for video preparation/output, and Docker for runtime consistency.

Frontend: JavaScript · HTML/CSS

Backend: Python · FastAPI · BackgroundTasks

Computer Vision: MediaPipe Pose

Video Processing: OpenCV · ffmpeg

Data: CSV Export · Normalized Landmarks · Visibility Scores

Runtime: Docker · Render

Try the processing flow

BodyScript is a personal project focused on turning uploaded video into pose data and skeleton-overlay output.
