Subtitle Attachment Cleanup
A Python script to automatically clean up unnecessary font attachments from MKV video files.
When you remove unwanted subtitle streams (such as foreign-language tracks) from an MKV with tools like MKVToolNix, the font files attached for those subtitles typically stay in the file and needlessly inflate its size. This script reads your MKV files, inspects the remaining SSA/ASS subtitle tracks to determine which fonts are actually used, and builds a new MKV file that omits any orphaned or unused font attachments.
Features
- Intelligently extracts and parses .ass/.ssa subtitle tracks from MKV containers.
- Identifies both top-level font declarations (in [V4+ Styles]) and inline font overrides (in [Events]).
- Uses deep metadata scanning (fonttools) to accurately match requested font families against attached font files, even if the attached files have cryptic filenames (e.g., arialbd.ED3587CD.ttf).
- Safely preserves all non-font attachments (like cover images).
- Automatically moves original MKVs to an original/ backup folder and places the cleaned files in a finished/ folder.
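The font discovery described above can be illustrated with a minimal sketch. It is not the script's actual code: the function name fonts_in_ass and the regular expression are assumptions made for illustration. It reads the Fontname column declared by the Format: line of the styles section and collects \fn override tags from the text column of Dialogue: lines.

```python
import re

# Inline override tag such as {\fnGeorgia}; the name runs until the next "\" or "}".
FN_TAG = re.compile(r"\\fn([^\\}]+)")


def fonts_in_ass(text):
    """Return the lower-cased font family names referenced by an ASS/SSA script."""
    fonts = set()
    section = ""
    fields = []  # column names from the most recent "Format:" line
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line.lower()
            fields = []
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip().lower()
        if key == "format":
            fields = [f.strip().lower() for f in value.split(",")]
        elif key == "style" and "styles" in section and "fontname" in fields:
            cols = value.split(",")
            idx = fields.index("fontname")
            if idx < len(cols):
                # A leading "@" marks vertical layout and is not part of the name.
                fonts.add(cols[idx].strip().lstrip("@").lower())
        elif key == "dialogue" and section == "[events]" and fields:
            # The text column is last and may itself contain commas.
            cols = value.split(",", len(fields) - 1)
            for block in re.findall(r"\{([^}]*)\}", cols[-1]):
                for name in FN_TAG.findall(block):
                    fonts.add(name.strip().lstrip("@").lower())
    fonts.discard("")
    return fonts
```

Calling fonts_in_ass on the text of an extracted subtitle track returns a set of family names, which can then be compared against the names stored in each attached font file.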
Prerequisites
Ensure the following tools and libraries are installed and accessible in your system's PATH:
- Python 3.x
- MKVToolNix (specifically mkvmerge and mkvextract)
- fonttools (Python library)
Install the required Python dependency:
For Windows or Ubuntu, you can use pip (recent Ubuntu releases also enforce PEP 668, so run pip inside a virtual environment there):
pip install fonttools
For Arch Linux (which enforces PEP 668), you should use pacman to install the system package:
sudo pacman -S python-fonttools
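To confirm that everything is reachable before a batch run, a small check along these lines may help. This is a hedged sketch, not part of the repository: the helper name missing_prerequisites is hypothetical. It relies only on the standard library and on the fact that fonttools is imported under the module name fontTools.

```python
import importlib.util
import shutil


def missing_prerequisites(tools=("mkvmerge", "mkvextract"), modules=("fontTools",)):
    """Return the names of required tools and modules that cannot be found."""
    # shutil.which searches PATH the same way the shell would.
    missing = [tool for tool in tools if shutil.which(tool) is None]
    # find_spec returns None when a top-level module is not importable.
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All prerequisites found." if not problems else f"Missing: {', '.join(problems)}")
```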
Usage
Simply place the script inside the directory containing the .mkv files you wish to process and run it. You can also place the script in your personal bin or PATH folder to run it from anywhere.
python subtitle_fonts_cleaner.py
# If in your PATH, simply execute: subtitle_fonts_cleaner.py
This is the main script and intended default workflow for batch cleanup.
Folder Structure
Upon execution, the script will create three folders in your working directory:
- temp_subs_fonts/ - A temporary directory used during processing (automatically deleted upon completion).
- original/ - Your original, unmodified .mkv files are safely moved here.
- finished/ - The new, lean .mkv files containing only the active ASS tracks, required font attachments, and original audio/video streams.
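The folder handling can be pictured with the following sketch. The folder names come from the list above; the function names (prepare_folders, archive_original, cleanup_temp) are hypothetical, and the real script may organise these steps differently.

```python
import shutil
from pathlib import Path


def prepare_folders(workdir):
    """Create the three working folders and return them as a dict of Paths."""
    workdir = Path(workdir)
    names = ("temp_subs_fonts", "original", "finished")
    folders = {name: workdir / name for name in names}
    for path in folders.values():
        path.mkdir(exist_ok=True)
    return folders


def archive_original(mkv_path, folders):
    """Move a source MKV into original/ and return its new location."""
    destination = folders["original"] / Path(mkv_path).name
    shutil.move(str(mkv_path), str(destination))
    return destination


def cleanup_temp(folders):
    """Delete the temporary extraction folder once processing is complete."""
    shutil.rmtree(folders["temp_subs_fonts"], ignore_errors=True)
```

In this sketch the original is moved rather than copied, so a cleaned file written to finished/ never overwrites the source.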
Supplemental Script: Font Scanner (Read-Only)
This repository also includes subtitle_fonts_scanner.py, a companion script for inspection and reporting.
Use the scanner when you want a dry-run style check before cleaning. It does not modify files and does not create output folders.
What the scanner reports
- Number of ASS/SSA subtitle tracks detected
- Number of embedded font attachments
- Which fonts are required by subtitle styles and inline \fn overrides
- Which required fonts are covered by current attachments
- Which fonts are missing
- Which embedded font attachments appear unused
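The coverage part of this report reduces to set comparisons. The sketch below is an illustration under stated assumptions, not the scanner's implementation: classify_fonts is a hypothetical name, and the attachments argument is assumed to map each attachment file name to the family names read from that font's metadata.

```python
def classify_fonts(required, attachments):
    """Compare required font families with the families each attachment provides.

    required: iterable of family names needed by the subtitles
    attachments: dict mapping attachment file name -> set of family names
    """
    required = {name.lower() for name in required}
    used, extra, covered = {}, [], set()
    for filename, families in sorted(attachments.items()):
        hits = sorted(required & {family.lower() for family in families})
        if hits:
            used[filename] = hits   # attachment covers at least one needed font
            covered.update(hits)
        else:
            extra.append(filename)  # attachment is not referenced by any subtitle
    return {
        "covered": sorted(covered),
        "missing": sorted(required - covered),
        "used": used,
        "extra": extra,
    }
```

Names are compared case-insensitively here because the sample report prints lower-cased family names; whether the real scripts normalise names in exactly this way is an assumption.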
Scanner usage
Run it against a single MKV file:
python subtitle_fonts_scanner.py "input.mkv"
# If in your PATH, simply execute: subtitle_fonts_scanner.py "input.mkv"
Sample output
Example (truncated):
Scanning: Example Episode 01.mkv
──────────────────────────────────────────────────────────────────────
ASS/SSA subtitle tracks : 2
Font attachments : 15
ASS tracks parsed:
Track 2 [eng]: 1 font(s) referenced
Track 3 [ger]: 3 font(s) referenced
FONTS NEEDED BY SUBTITLES (4 total)
──────────────────────────────────────────────────────────────────────
[OK] arial
[OK] gandhi sans
[MISSING] georgia bold
[OK] times new roman bold
FONTS EMBEDDED IN MKV (15 file(s))
──────────────────────────────────────────────────────────────────────
[USED] ARIALNB.TTF -> covers: arial
[EXTRA] AdobeArabic-Bold.otf
...
MISSING FONTS (1 font(s) not embedded)
──────────────────────────────────────────────────────────────────────
✘ georgia bold
EXTRA / UNUSED EMBEDDINGS (10 file(s) not needed by any subtitle)
──────────────────────────────────────────────────────────────────────
⚠ AdobeArabic-Bold.otf
⚠ comic.ttf
...
Typical workflow
- Run subtitle_fonts_scanner.py on a file to preview needed vs unused fonts.
- Run subtitle_fonts_cleaner.py to process all MKVs in the working directory.
- Optionally run the scanner again on a cleaned file to verify the result.
License
MIT License. See the LICENSE file for more details.