How in-browser PDF to Markdown conversion works

This converter does something most online tools do not: it never uploads your file. Everything — parsing, structure detection, and Markdown generation — happens in your browser using PDF.js, the same open-source PDF engine that ships inside Firefox. Here is what happens under the hood.

1. The PDF is parsed locally

When you drop a file, the browser reads it into memory and hands it to PDF.js. The heavy parsing runs in a PDF.js Web Worker, off the main thread, so the page stays responsive even on large documents. Your file is never sent over the network.

2. Text is extracted with position data

A PDF is not a structured document — it is a set of drawing instructions that place glyphs at specific (x, y) coordinates. We extract every text fragment along with its position, width, and font, because that geometry is the only way to recover structure.
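As a rough illustration of this step, the sketch below turns text items into position-aware fragments and groups them into visual lines. It assumes items shaped like the ones PDF.js returns from page.getTextContent() (the fields str, transform, width, and fontName). The Fragment type, the function names, and the 2-unit y tolerance are hypothetical choices for this example, not the converter's actual code.

```typescript
// Subset of the fields found on a PDF.js text item.
interface TextItemLike {
  str: string;
  transform: number[]; // [a, b, c, d, e, f]; e and f are the x/y translation
  width: number;
  fontName: string;
}

// Hypothetical normalized record passed to later stages.
interface Fragment {
  text: string;
  x: number;
  y: number;
  width: number;
  fontSize: number;
  fontName: string;
}

function toFragment(item: TextItemLike): Fragment {
  const [, , c, d, x, y] = item.transform;
  return {
    text: item.str,
    x,
    y,
    width: item.width,
    // Vertical scale of the text matrix; equals the font size for unrotated text.
    fontSize: Math.sqrt(c * c + d * d),
    fontName: item.fontName,
  };
}

// Group fragments into lines. PDF y grows upward, so a larger y is higher on the page.
function groupIntoLines(fragments: Fragment[], yTolerance = 2): Fragment[][] {
  const sorted = fragments.slice().sort((p, q) => q.y - p.y);
  const lines: Fragment[][] = [];
  for (const frag of sorted) {
    const current = lines[lines.length - 1];
    // Same line if the baseline is within tolerance of the line's first fragment.
    if (current && Math.abs(current[0].y - frag.y) <= yTolerance) {
      current.push(frag);
    } else {
      lines.push([frag]);
    }
  }
  // Within each line, read left to right.
  for (const line of lines) {
    line.sort((p, q) => p.x - q.x);
  }
  return lines;
}
```

Calling groupIntoLines(items.map(toFragment)) yields lines from top to bottom, each ordered left to right. Multi-column pages would need extra handling that this sketch does not attempt.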

3. Structure is reconstructed from geometry

The serializer turns coordinates back into meaning:

Headings — font sizes are clustered; the largest sizes become #, ##, and so on.
Lists — lines beginning with bullet glyphs or numbers become Markdown list items.
Tables — column edges are detected by aligning item x-positions across consecutive rows.
Inline styles — bold, italic, and monospace are inferred from font names.
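The heading, list, and inline-style rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the converter's implementation: it treats the font size carrying the most characters as body text, maps each larger size to a heading level, and assumes fontName has already been resolved to a descriptive name such as "Helvetica-Bold" (PDF.js may expose an internal identifier instead). All type and function names here are hypothetical, and table detection is left out.

```typescript
interface Line {
  text: string;
  fontSize: number;
  fontName: string; // assumed to be a resolved name, e.g. "Helvetica-Bold"
}

// Map each font size larger than the body size to a heading level (1 = largest).
function headingLevels(lines: Line[]): Record<string, number> {
  // Weight each rounded size by character count; the heaviest size is body text.
  const weight: Record<string, number> = {};
  for (const line of lines) {
    const key = String(Math.round(line.fontSize));
    weight[key] = (weight[key] || 0) + line.text.length;
  }
  const sizes = Object.keys(weight).map(Number);
  let body = sizes[0];
  for (const s of sizes) {
    if (weight[String(s)] > weight[String(body)]) body = s;
  }
  const levels: Record<string, number> = {};
  sizes
    .filter((s) => s > body)
    .sort((p, q) => q - p)
    .slice(0, 6) // Markdown has six heading levels
    .forEach((s, i) => {
      levels[String(s)] = i + 1;
    });
  return levels;
}

// Infer bold, italic, or monospace from the font name.
function applyInlineStyle(text: string, fontName: string): string {
  if (/mono|courier|consolas/i.test(fontName)) return "`" + text + "`";
  const bold = /bold/i.test(fontName);
  const italic = /italic|oblique/i.test(fontName);
  if (bold && italic) return "***" + text + "***";
  if (bold) return "**" + text + "**";
  if (italic) return "*" + text + "*";
  return text;
}

function lineToMarkdown(line: Line, levels: Record<string, number>): string {
  const level = levels[String(Math.round(line.fontSize))];
  if (level) return "######".slice(0, level) + " " + line.text.trim();

  // Bullet glyphs become "- " items.
  const bullet = /^\s*[•◦▪‣*-]\s+(.*)$/.exec(line.text);
  if (bullet) return "- " + applyInlineStyle(bullet[1], line.fontName);

  // "1." or "1)" prefixes become numbered items.
  const numbered = /^\s*(\d+)[.)]\s+(.*)$/.exec(line.text);
  if (numbered) {
    return numbered[1] + ". " + applyInlineStyle(numbered[2], line.fontName);
  }
  return applyInlineStyle(line.text.trim(), line.fontName);
}
```

A caller would compute headingLevels once per document and then map every line through lineToMarkdown. Heading text is deliberately left unstyled here, since heading fonts are often bold by default.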

4. You get clean Markdown

The result is standard GitHub Flavored Markdown you can preview, copy, or download as a .md file. Because none of it touches a server, the tool also works offline once the page has loaded.

What it does not do (yet)

Scanned PDFs are images of text with no text layer to extract. Rather than produce garbled output, the converter detects this and tells you clearly. Optical character recognition (OCR) via WebAssembly is on the roadmap, along with multi-column layout handling and an API for automated pipelines.
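The text above does not say how the check is implemented. One plausible heuristic, sketched below as an assumption, is to treat a document as scanned when no page yields any non-whitespace characters from its text layer. The function name, its input shape, and the minCharsPerPage threshold are hypothetical.

```typescript
// pageTexts[i] holds the strings extracted from page i
// (for example, the `str` of each text item on that page).
function looksScanned(pageTexts: string[][], minCharsPerPage = 1): boolean {
  // An empty document gives no evidence either way; do not flag it.
  if (pageTexts.length === 0) return false;

  let pagesWithText = 0;
  for (const items of pageTexts) {
    // Count only visible characters; whitespace-only items do not count as text.
    const chars = items.join("").replace(/\s+/g, "").length;
    if (chars >= minCharsPerPage) pagesWithText++;
  }
  // Flag the document only when no page has a usable text layer.
  return pagesWithText === 0;
}
```

A stricter variant could flag documents where only some pages lack text, which would catch PDFs that mix typed pages with scanned inserts.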

Ready to try it? Convert a PDF now — or read the FAQ.