BodyScript
Video-to-pose pipeline built with MediaPipe and OpenCV.
BodyScript is a browser-based tool that turns uploaded video into structured pose data. The browser uploads a video and polls for progress while the backend extracts frames with OpenCV, runs MediaPipe Pose, writes landmark data to CSV, and returns a processed video with a skeleton overlay.
What the project does
BodyScript takes an uploaded video and runs it through a pose estimation pipeline. The server extracts frames, detects skeletal landmarks using MediaPipe, and returns two outputs: a processed video with skeleton lines drawn over the original frames, and a CSV file containing landmark coordinates for each detected frame.
How video moves through the server
The browser sends a video file to the FastAPI backend. The job runs as a background task: the server accepts the upload, returns a job ID, and the browser polls for completion instead of holding the request open.
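The accept-then-return-a-job-ID pattern can be sketched without any web framework. The example below is a minimal illustration using only the standard library, not the project's actual FastAPI code: the names `submit_job`, `get_status`, and `process_video`, the in-memory registry, and the output filenames are all hypothetical.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory job registry; a real deployment might persist this.
_jobs: dict[str, dict] = {}
_lock = threading.Lock()
_executor = ThreadPoolExecutor(max_workers=2)


def process_video(job_id: str, video_path: str) -> None:
    """Stand-in for the real pipeline (frame extraction, pose detection, overlay)."""
    try:
        time.sleep(0.05)  # placeholder for the actual processing work
        result = {
            "video": f"{job_id}_overlay.mp4",   # assumed output name
            "csv": f"{job_id}_landmarks.csv",   # assumed output name
        }
        with _lock:
            _jobs[job_id].update(status="done", result=result)
    except Exception as exc:
        with _lock:
            _jobs[job_id].update(status="failed", error=str(exc))


def submit_job(video_path: str) -> str:
    """Register the job, start it in the background, and return its ID at once."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"status": "processing", "result": None}
    _executor.submit(process_video, job_id, video_path)
    return job_id


def get_status(job_id: str) -> dict:
    """What a status endpoint could return each time the browser polls."""
    with _lock:
        job = _jobs.get(job_id)
        return dict(job) if job else {"status": "not_found"}
```

The job is registered before the worker starts, so a status request that arrives immediately after the upload response always finds an entry.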
OpenCV extracts frames from the video, MediaPipe Pose detects landmarks in each frame, and the results are written to a CSV file. The overlay step then reads that CSV, maps normalized landmark coordinates back to pixel positions, and draws the skeleton onto the video frames.
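The coordinate mapping in the overlay step is a small calculation: a normalized value is multiplied by the frame width or height. The sketch below shows one way to do it and to build the line segments to draw. The clamping, the `min_visibility` threshold of 0.5, and the function names are assumptions for illustration; the connection list is a subset of MediaPipe Pose landmark indices (11 and 12 are the shoulders, 23 and 24 the hips).

```python
# Subset of MediaPipe Pose connections: shoulders, arms, torso, legs.
CONNECTIONS = [
    (11, 12), (11, 13), (13, 15), (12, 14), (14, 16),
    (11, 23), (12, 24), (23, 24),
    (23, 25), (25, 27), (24, 26), (26, 28),
]


def to_pixel(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Scale normalized coordinates to pixel positions, clamped to the frame."""
    px = min(max(int(x_norm * width), 0), width - 1)
    py = min(max(int(y_norm * height), 0), height - 1)
    return px, py


def skeleton_segments(landmarks, width, height, min_visibility=0.5):
    """Return pixel line segments for connections whose endpoints are both visible.

    `landmarks` maps landmark_id -> (x, y, visibility) for a single frame.
    """
    segments = []
    for a, b in CONNECTIONS:
        if a not in landmarks or b not in landmarks:
            continue  # landmark missing from this frame's data
        ax, ay, a_vis = landmarks[a]
        bx, by, b_vis = landmarks[b]
        if a_vis < min_visibility or b_vis < min_visibility:
            continue  # skip low-confidence endpoints
        segments.append((to_pixel(ax, ay, width, height),
                         to_pixel(bx, by, width, height)))
    return segments
```

Each returned segment is a pair of integer points, which is the form OpenCV's `cv2.line` expects for its two endpoint arguments. Clamping matters because normalized coordinates can fall slightly outside the 0 to 1 range when a joint is near or beyond the frame edge.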
What the pipeline produces
Each processing job produces two outputs: a video with the skeleton overlay burned into the frames, and a CSV file containing the landmark data used to generate that overlay.
The overlay video contains the original frames with skeleton lines and landmark dots drawn by OpenCV. Each detected landmark and skeleton connection is rendered from the coordinate data for that frame.
The CSV file contains landmark data exported per detected frame, including frame_id, landmark_id, normalized x/y/z coordinates, visibility, and detection strategy.
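To make the CSV layout concrete, here is a minimal sketch of writing and reading rows with the fields listed above. The exact header spelling (in particular `strategy`), the `"full_frame"` strategy label, and the function names are assumptions; the real file may differ.

```python
import csv
import io

# Column names follow the fields described above; exact spelling is assumed.
FIELDNAMES = ["frame_id", "landmark_id", "x", "y", "z", "visibility", "strategy"]


def write_landmarks_csv(out, frames):
    """Write one row per landmark; frames without a detection are skipped.

    `frames` yields (frame_id, strategy, landmarks), where `landmarks` is a
    list of (x, y, z, visibility) tuples or None when nothing was detected.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for frame_id, strategy, landmarks in frames:
        if landmarks is None:
            continue
        for landmark_id, (x, y, z, visibility) in enumerate(landmarks):
            writer.writerow({
                "frame_id": frame_id, "landmark_id": landmark_id,
                "x": x, "y": y, "z": z,
                "visibility": visibility, "strategy": strategy,
            })


def read_landmarks_csv(src):
    """Group rows by frame: {frame_id: {landmark_id: (x, y, visibility)}}."""
    frames = {}
    for row in csv.DictReader(src):
        per_frame = frames.setdefault(int(row["frame_id"]), {})
        per_frame[int(row["landmark_id"])] = (
            float(row["x"]), float(row["y"]), float(row["visibility"])
        )
    return frames


# Usage: two landmarks in frame 0, no detection in frame 1.
buf = io.StringIO()
write_landmarks_csv(buf, [
    (0, "full_frame", [(0.5, 0.25, -0.1, 0.99), (0.6, 0.3, 0.0, 0.8)]),
    (1, "full_frame", None),
])
```

Storing one row per landmark keeps the file flat and easy to filter by `frame_id` or `landmark_id` in a spreadsheet or data-analysis tool.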
Processed results are delivered back to the browser for playback and download. The browser plays the finished video; it does not draw the overlay with Canvas, SVG, or HTML.
Why these tools
Each tool in the stack has a narrow job. The choices reflect the scope of the project: a focused video-processing workflow rather than a broad platform.
MediaPipe Pose is Google's pre-trained pose model, so no training data is required. It detects 33 landmarks per frame, including positions for shoulders, elbows, wrists, hips, knees, and ankles.
OpenCV handles frame extraction and skeleton drawing. It reads the video, provides individual frames to MediaPipe, maps landmark coordinates back to pixel space, and writes the overlay frames.
FastAPI runs jobs as background tasks without blocking the upload response. The browser submits the file, receives a job ID, and polls for completion while the backend processes the video.
Vanilla JavaScript runs the frontend, with no framework. The UI is a file picker, a status poller, and a video player. A framework would have added overhead without much benefit for this workflow.
Docker keeps the Python, OpenCV, MediaPipe, and ffmpeg runtime consistent across environments and reduces dependency drift during development and deployment.
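The client side of the job flow described above, where the browser receives a job ID and polls for completion, comes down to a short loop. The project's client is JavaScript; the sketch below expresses the same logic in Python for consistency with the backend. The function name, the status values `"done"` and `"failed"`, and the default interval and timeout are assumptions.

```python
import time


def poll_until_done(get_status, job_id, interval=1.0, timeout=300.0, sleep=time.sleep):
    """Call get_status(job_id) until the job finishes, fails, or the timeout passes.

    `get_status` is any callable that returns a dict with a "status" key,
    for example a thin wrapper around an HTTP request to a status endpoint.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status.get("status") in ("done", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout} seconds")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to test with a fake clock, and the timeout prevents a stalled job from leaving the client polling indefinitely.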
What this project does not do
BodyScript is intentionally scoped around one clear workflow: upload a short video, process it on the backend, then review and download the outputs.
Stack
The stack is intentionally direct: a vanilla JavaScript browser client, a FastAPI backend, OpenCV and MediaPipe for video processing, ffmpeg for video preparation/output, and Docker for runtime consistency.
Try the processing flow
BodyScript is a personal project focused on turning uploaded video into pose data and skeleton-overlay output.