BodyScript

Video-to-pose pipeline built with MediaPipe and OpenCV.

BodyScript is a browser-based tool that turns uploaded video into structured pose data. The browser uploads a video and polls for progress while the backend extracts frames with OpenCV, runs MediaPipe Pose, writes landmark data to CSV, and returns a processed video with a skeleton overlay.

MediaPipe · OpenCV · FastAPI · Python · JavaScript · Docker · CSV Export
Processed video output — skeleton overlay generated from detected pose landmarks.
Personal project · Live demo · MediaPipe / Python / FastAPI

What the project does

BodyScript takes an uploaded video and runs it through a pose estimation pipeline. The server extracts frames, detects skeletal landmarks using MediaPipe, and returns two outputs: a processed video with skeleton lines drawn over the original frames, and a CSV file containing landmark coordinates for each detected frame.

Input: Uploaded video
Processing: Frame-by-frame, server-side
Pose model: MediaPipe Pose (33 landmarks)
Landmarks: x/y/z + visibility per frame
Visual output: Skeleton overlay video
Data output: CSV per processed file

How video moves through the server

The browser sends a video file to the FastAPI backend. The job runs as a background task: the server accepts the upload, returns a job ID, and the browser polls for completion instead of holding the request open.
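The upload-then-poll pattern can be sketched without any framework to show the job lifecycle: submitting returns an ID immediately, the work happens in the background, and the client polls a status lookup until the job is done. This is a minimal sketch under stated assumptions, not BodyScript's code: `JOBS`, `submit_job`, `get_status`, `wait_for`, and the output file names are hypothetical, and a daemon thread stands in for FastAPI's `BackgroundTasks` so the example runs with the standard library alone.

```python
import threading
import time
import uuid

# Hypothetical in-memory job registry; not BodyScript's actual storage.
JOBS: dict[str, dict] = {}
_LOCK = threading.Lock()


def process_video(job_id: str, path: str) -> None:
    """Stand-in for the OpenCV + MediaPipe pipeline; it only simulates the work."""
    with _LOCK:
        JOBS[job_id]["status"] = "processing"
    time.sleep(0.05)  # placeholder for frame extraction, pose detection, and overlay
    with _LOCK:
        JOBS[job_id].update(status="done", csv=f"{path}.csv", video=f"{path}.overlay.mp4")


def submit_job(path: str) -> str:
    """Register a job, start it in the background, and return its ID right away."""
    job_id = uuid.uuid4().hex
    with _LOCK:
        JOBS[job_id] = {"status": "queued"}
    # In FastAPI, background_tasks.add_task(process_video, job_id, path) would schedule
    # this after the response is sent; a daemon thread stands in for it here.
    threading.Thread(target=process_video, args=(job_id, path), daemon=True).start()
    return job_id


def get_status(job_id: str) -> dict:
    """What a polling endpoint would return for one job ID."""
    with _LOCK:
        return dict(JOBS.get(job_id, {"status": "unknown"}))


def wait_for(job_id: str, timeout: float = 5.0, interval: float = 0.01) -> dict:
    """Client-side polling: request the status until the job is done or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status["status"] == "done":
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish in {timeout}s")
```

Usage: `job_id = submit_job("clip.mp4")` returns at once, and `wait_for(job_id)` blocks only the polling side until the status becomes `"done"`. Keeping state behind a single lock is the simplest way to make the status lookup safe while the worker updates it.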

OpenCV reads the video frame by frame, MediaPipe Pose detects landmarks in each frame, and the results are written to a CSV file. The overlay step then reads that CSV, maps the normalized landmark coordinates back to pixel positions, and draws the skeleton onto the video frames.
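The overlay step's coordinate mapping can be sketched as follows: MediaPipe reports x and y normalized to the frame size, so multiplying by width and height gives pixel positions. This is a hedged sketch, assuming the CSV uses headers named `frame_id`, `landmark_id`, `x`, and `y` (the exact header names are an assumption); `to_pixel` and `load_pixel_landmarks` are hypothetical helpers, and z is ignored because the overlay is 2D.

```python
import csv
from collections import defaultdict
from collections.abc import Iterable


def to_pixel(x: float, y: float, width: int, height: int) -> tuple[int, int]:
    """Map normalized [0, 1] landmark coordinates to integer pixel positions."""
    # Clamp so landmarks that fall slightly outside the frame still map to a valid pixel.
    px = min(max(int(x * width), 0), width - 1)
    py = min(max(int(y * height), 0), height - 1)
    return px, py


def load_pixel_landmarks(
    lines: Iterable[str], width: int, height: int
) -> dict[int, dict[int, tuple[int, int]]]:
    """Group CSV rows by frame_id and convert each landmark's x/y to pixels."""
    frames: dict[int, dict[int, tuple[int, int]]] = defaultdict(dict)
    for row in csv.DictReader(lines):
        frame_id = int(row["frame_id"])
        landmark_id = int(row["landmark_id"])
        frames[frame_id][landmark_id] = to_pixel(
            float(row["x"]), float(row["y"]), width, height
        )
    return dict(frames)
```

In a real pipeline, width and height would come from the video itself, for example `int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))` on the `cv2.VideoCapture`. Clamping is one design choice; skipping out-of-frame landmarks is an equally reasonable alternative.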

1. Browser upload: file input + fetch POST
2. FastAPI job: BackgroundTasks, non-blocking
3. Video prep: validate, store temp file
4. OpenCV frame extraction: cv2.VideoCapture, per-frame loop
5. MediaPipe Pose: mp.solutions.pose, 33 landmarks per frame
6. Pose data CSV: x, y, z, and visibility per landmark per frame
7. Outputs: overlay video (visual output) and CSV download (data output)
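The video prep step (validate, store temp file) might look roughly like this: reject unexpected file types and empty or oversized uploads, then write the bytes to a temporary file that OpenCV can open by path. This is a sketch only; `store_upload`, the allowed extensions, and the 100 MB limit are hypothetical, since the project's actual rules are not described here.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical limits; the project's accepted formats and size cap are not documented.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".avi", ".webm"}
MAX_BYTES = 100 * 1024 * 1024  # 100 MB


def store_upload(filename: str, data: bytes, workdir: str) -> Path:
    """Validate an uploaded video and write it to a temporary file for OpenCV to open."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    if not data:
        raise ValueError("empty upload")
    if len(data) > MAX_BYTES:
        raise ValueError("upload too large")
    # Keep the original extension; some tools use it as a hint for the container format.
    fd, path = tempfile.mkstemp(suffix=suffix, dir=workdir)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return Path(path)
```

The returned path is what a later step would hand to `cv2.VideoCapture`. Using `tempfile.mkstemp` avoids name collisions when several jobs run at once.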

What the pipeline produces

Each processing job produces two outputs: a video with the skeleton overlay burned into the frames, and a CSV file containing the landmark data used to generate that overlay.

Overlay video

Original frames with skeleton lines and landmark dots drawn by OpenCV. Each detected landmark and skeleton connection is rendered from the coordinate data for that frame.
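Rendering skeleton connections from per-frame coordinates amounts to turning pairs of landmark indices into line segments. A minimal sketch, assuming landmarks have already been mapped to pixel positions: `CONNECTIONS` is a subset of MediaPipe Pose's connection pairs covering arms, torso, and legs, while `skeleton_segments` and the 0.5 visibility threshold are hypothetical.

```python
# A subset of MediaPipe Pose's connections (pairs of landmark indices) for arms, torso, and legs.
CONNECTIONS: list[tuple[int, int]] = [
    (11, 12), (11, 13), (13, 15), (12, 14), (14, 16),  # shoulders and arms
    (11, 23), (12, 24), (23, 24),                       # torso
    (23, 25), (25, 27), (24, 26), (26, 28),             # legs
]

Point = tuple[int, int]


def skeleton_segments(
    points: dict[int, Point],
    visibility: dict[int, float],
    min_visibility: float = 0.5,
) -> list[tuple[Point, Point]]:
    """Return the (start, end) pixel pairs to draw for one frame.

    A connection is skipped when either endpoint is missing or its visibility
    score is below min_visibility (a hypothetical threshold).
    """
    segments = []
    for a, b in CONNECTIONS:
        if a not in points or b not in points:
            continue
        if min(visibility.get(a, 0.0), visibility.get(b, 0.0)) < min_visibility:
            continue
        segments.append((points[a], points[b]))
    return segments
```

Each segment could then be drawn with `cv2.line(frame, start, end, (0, 255, 0), 2)` and each landmark dot with `cv2.circle(frame, point, 4, (0, 0, 255), -1)`. The full connection set is available as `mp.solutions.pose.POSE_CONNECTIONS`.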

Pose data CSV

Landmark data exported per detected frame, including frame_id, landmark_id, normalized x/y/z coordinates, visibility, and detection strategy.
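A CSV with those fields can be written with one row per landmark per detected frame. This is a sketch under assumptions: the header names (including `strategy` for the detection strategy), the `"default"` strategy label, and the helpers `write_pose_csv` and `pose_csv_text` are hypothetical, not the project's actual schema.

```python
import csv
import io
from collections.abc import Iterable

# Hypothetical column names; the source lists these fields but not their exact headers.
FIELDS = ["frame_id", "landmark_id", "x", "y", "z", "visibility", "strategy"]

Landmark = tuple[float, float, float, float]  # (x, y, z, visibility)


def write_pose_csv(out, frames: Iterable[tuple[int, str, list[Landmark]]]) -> int:
    """Write one row per landmark per detected frame and return the row count.

    Each item in frames is (frame_id, strategy, landmarks); landmarks is ordered by
    landmark_id (0-32 for MediaPipe Pose). Frames without a detection are not passed
    in, so they produce no rows.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    rows = 0
    for frame_id, strategy, landmarks in frames:
        for landmark_id, (x, y, z, visibility) in enumerate(landmarks):
            writer.writerow({
                "frame_id": frame_id, "landmark_id": landmark_id,
                "x": x, "y": y, "z": z, "visibility": visibility,
                "strategy": strategy,
            })
            rows += 1
    return rows


def pose_csv_text(frames: Iterable[tuple[int, str, list[Landmark]]]) -> str:
    """Convenience wrapper that returns the CSV as a string instead of writing a file."""
    buf = io.StringIO()
    write_pose_csv(buf, frames)
    return buf.getvalue()
```

When writing to disk, open the file with `open("pose.csv", "w", newline="")`, as the `csv` module documentation recommends, so row endings are not doubled on Windows. A long format (one row per landmark) keeps the schema fixed regardless of how many frames contain a detection.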

Browser display

Processed results are delivered back to the browser for playback and download. Because the overlay is already burned into the video, the browser only plays the finished file; it does not draw anything with Canvas, SVG, or HTML.

Why these tools

Each tool in the stack has a narrow job. The choices reflect the scope of the project: a focused video-processing workflow rather than a broad platform.

MediaPipe Pose

Google's pre-trained pose model. No training data required. Detects 33 landmarks per frame, including positions for shoulders, elbows, wrists, hips, knees, and ankles.
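To work with those joints in exported data, it helps to know their indices in MediaPipe Pose's 33-landmark list. The indices below match MediaPipe's landmark numbering (also exposed as `mp.solutions.pose.PoseLandmark`, for example `PoseLandmark.LEFT_WRIST`); the `JOINTS` table and the `joint` helper are hypothetical conveniences for this sketch.

```python
# Indices of the joints named above in MediaPipe Pose's 33-landmark topology.
NUM_LANDMARKS = 33
JOINTS: dict[str, int] = {
    "left_shoulder": 11, "right_shoulder": 12,
    "left_elbow": 13, "right_elbow": 14,
    "left_wrist": 15, "right_wrist": 16,
    "left_hip": 23, "right_hip": 24,
    "left_knee": 25, "right_knee": 26,
    "left_ankle": 27, "right_ankle": 28,
}


def joint(landmarks: list, name: str):
    """Return one named joint from a frame's full list of 33 landmarks."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    return landmarks[JOINTS[name]]
```

Indices 0 to 10 cover face points and 17 to 22 and 29 to 32 cover hands and feet, which is why the body joints above are not contiguous.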

OpenCV

Handles frame extraction and skeleton drawing. OpenCV reads the video, provides individual frames to MediaPipe, maps landmark coordinates back to pixel space, and writes the overlay frames.

FastAPI BackgroundTasks

Jobs run without blocking the upload response: FastAPI executes a background task after the response has been sent. The browser submits the file, receives a job ID, and polls for completion while the backend processes the video.

Vanilla JavaScript

No framework on the frontend. The UI is a file picker, a status poller, and a video player. A framework would have added overhead without much benefit for this workflow.

Docker

Keeps the Python, OpenCV, MediaPipe, and ffmpeg runtime consistent across environments and reduces dependency drift during development and deployment.

What this project does not do

BodyScript is intentionally scoped around one clear workflow: upload a short video, process it on the backend, then review and download the outputs.

Single-person detection. MediaPipe Pose detects one person per frame. Multi-person scenes are out of scope.
No real-time streaming. Video is uploaded and processed as a batch job, not streamed live.
Server-side only. All processing happens on the backend. No in-browser inference.
CSV is the only structured export. JSON and other formats were not implemented.
No authentication. The tool has no user accounts or saved project history.

Stack

The stack is intentionally direct: a vanilla JavaScript browser client, a FastAPI backend, OpenCV and MediaPipe for video processing, ffmpeg for video preparation/output, and Docker for runtime consistency.

Frontend: JavaScript, HTML/CSS
Backend: Python, FastAPI, BackgroundTasks
Computer Vision: MediaPipe Pose
Video Processing: OpenCV, ffmpeg
Data: CSV Export, Normalized Landmarks, Visibility Scores
Runtime: Docker, Render

Try the processing flow

BodyScript is a personal project focused on turning uploaded video into pose data and skeleton-overlay output.