Client-side JavaScript library for automatic document capture from a live camera feed or a static image. Designed as the capture front-end for OCR and identity-verification pipelines (Microblink, ID R&D, and similar).
| Capability | Detail |
|---|---|
| Zero dependencies | Pure vanilla JS, single self-contained UMD file (src/docusnap.js) |
| Module formats | window.DocuSnap (browser script tag), CommonJS, AMD |
| Document detection | Custom Hough-line corner detector with dual-zone Gaussian blur, directional morphology, and connected-component filtering for robust edge detection on textured backgrounds |
| Corner tracking | Kalman-smoothed bounding box (Q=0.01, R=12.0) with inter-frame velocity prediction for 60 fps rendering; spatial-continuity bonus and geometry-validated line tracking across frames |
| Perspective correction | Inverse homography (DLT) with bilinear interpolation; output at native detected dimensions (≤ 4K cap) with configurable margin |
| Quality gate | Real-time checks for sharpness (Laplacian variance), brightness, glare, document size, corner margin — all configurable via 0–100 thresholds |
| Best-frame selection | Rolling buffer of the last 10 full-resolution frames; on capture trigger, the frame with the highest composite score (sharpness 40%, edge confidence 30%, brightness 15%, low glare 15%) is selected |
| Face detection | Shape Detection API (native, where available) with pico.js cascade fallback loaded on demand from CDN |
| Multi-side scanning | 1 or 2 sides; each side independently configurable (document type, quality thresholds, face requirement) |
| Portrait + landscape | document and any types accept portrait-orientation documents (aspect ratio ≥ 0.55); id type enforces landscape only |
| Capture modes | smart (auto-detects camera availability), auto (fully automatic quality-gated), manual (user-triggered shutter), file (WebView / file-input fallback) |
| Output | Blob images (full-frame JPEG 0.95 + perspective-corrected crop), normalized quality scores 0–100, face presence + confidence, corner coordinates, capture metadata |
| Mobile-first | Portrait + landscape; requests up to 4K from camera; only downscales if source exceeds 4K |
| Overlay | Live canvas overlay rendered by the library; callers receive clean callbacks with no DOM coupling beyond the <video> and <canvas> elements |
| Debug visualization | Optional pipeline debug mode renders intermediate stages (Gaussian blur, Sobel gradients, Otsu threshold, Hough lines with tracked lines) on the preview canvas |
petermartis.github.io/docuSnap/demo/
Include src/docusnap.js directly — no build step required.
<script src="src/docusnap.js"></script>
<video id="video" playsinline muted></video>
<canvas id="canvas"></canvas>
const snap = new DocuSnap({
captureMode: 'smart', // auto-routes between camera and file fallback
sides: 1,
quality: { sharpness: 40, brightness: 40, glare: 18, size: 40 },
onCapture: (result) => {
// result.image — Blob, full-resolution JPEG frame
// result.documentImage — Blob, perspective-corrected crop
// result.corners — { topLeftCorner, topRightCorner, … }
// result.quality — { sharpness, brightness, glare, size, failing[] }
// result.face — { present, confidence, bounds } | null
console.log('Captured', result);
},
onFrame: (frame) => {
// frame.instructionCode — e.g. 'MOVE_CLOSER', 'HOLD_STILL'
// frame.hint — human-readable string
// frame.hintEscalated — true when same hint has repeated for ~0.5 s
// frame.quality — same shape as onCapture quality
// frame.state — 'searching' | 'confirming' | 'captured'
},
});
navigator.mediaDevices.getUserMedia({ video: { facingMode: 'environment' } })
.then((stream) => {
const video = document.getElementById('video');
video.srcObject = stream;
video.play();
snap.start(video, document.getElementById('canvas'));
});
For the full API reference see API.md.
Camera frame (up to 4K)
│
▼
┌───────────────────────────────────────────────────────┐
│ DETECTION (downscaled to 480 px wide internally) │
│ │
│ Grayscale │
│ → Dual-zone Gaussian blur │
│ • double-pass 5×5 outside center 55% ROI │
│ • single-pass 5×5 inside center │
│ → Sobel edge detection (Otsu auto-threshold × 0.65) │
│ → Directional morphology │
│ • horizontal (5×1) + vertical (1×5) line opening │
│ • union of both orientations │
│ → Connected-component filter (remove < 80 px blobs) │
│ → Hough line transform (180 angles) │
│ • line family classification (A / B clusters) │
│ • weaker family boost (2.5×) │
│ → Quad candidate search (top 10 H × top 10 V) │
│ • vanishing-point perspective check │
│ • convexity + diagonal-ratio guard │
│ • edge-support scoring (H-sides weighted 2×) │
│ • outer-line preference (wider pair separation) │
│ • spatial-continuity bonus (tracked line prox.) │
│ • aspect, area, center scoring │
│ • parallelogram corner correction │
│ → Line tracking with geometry validation │
│ • parallel pair < 15°, perpendicular > 50° │
│ • confidence gate ≥ 0.20, decay after 5 misses │
│ → Kalman smoothing (8 × KalmanFilter1D) │
│ • Q = 0.01, R = 12.0 │
│ • inter-frame velocity prediction at 60 fps │
│ → Quality gate │
│ • Laplacian variance (sharpness) │
│ • Mean luminance (brightness) │
│ • Bright-pixel ratio (glare) │
│ • Width coverage (document size) │
│ • Corner margin check │
└───────────────┬───────────────────────────────────────┘
│ corners + quality report
▼
┌───────────────────────────────────────────────────────┐
│ DISPLAY CANVAS (max 720 px longest side) │
│ drawImage video directly (no getImageData) │
│ spotlight overlay (evenodd clip path) │
│ 3-second initial suppression (no bbox on start) │
│ corners rendered with Kalman + velocity prediction │
└───────────────┬───────────────────────────────────────┘
│ state machine
│ DETECTING → STAY_STILL → CAPTURED
▼
┌───────────────────────────────────────────────────────┐
│ AUTOCAPTURE │
│ 10 consecutive quality-passing frames required │
│ Rolling buffer: last 10 full-res frames retained │
│ Best-frame selection on trigger: │
│ sharpness 40% + edge confidence 30% │
│ + brightness 15% + low glare 15% │
│ → extractPaper() on the winning frame │
│ inverse homography (DLT) + bilinear interpolation │
│ output = native detected dimensions (≤ 4K cap) │
│ → Blob (JPEG 0.95) × 2 (full frame + corrected crop)│
└───────────────────────────────────────────────────────┘
│
▼
onCapture(CaptureResult)
When a document is captured, its four detected corners are used to compute an inverse homography matrix (Direct Linear Transform). Each pixel in the output image is backward-mapped to the source frame using this matrix and sampled with bilinear interpolation. The result is a geometrically corrected, front-facing view of the document at its native resolution (up to a 4K cap). A 25% margin is added around the document region to preserve surrounding context.
Detection — The Hough-line detector runs at ~10–15 fps on a 480 px-wide downscaled frame. When a valid quad passes all geometric checks (aspect ratio, convexity, perspective consistency), its corners are smoothed through 8 independent Kalman filters (one per x/y coordinate).
Quality gate — Every frame with detected corners is scored for sharpness, brightness, glare, and document size. If all four checks pass, a consecutive-good-frames counter increments.
Stay-still countdown — After 10 consecutive passing frames, the library enters a brief hold-still phase to confirm stability.
Best-frame selection — Throughout detection, a rolling buffer retains the last 10 full-resolution camera frames along with their quality metrics. At capture time, the frame with the highest composite score (sharpness 40%, edge-support confidence 30%, brightness 15%, low glare 15%) is selected — not necessarily the trigger frame.
Extraction — The winning frame is perspective-corrected using extractPaper() and delivered as a high-quality JPEG Blob alongside the uncropped full-frame image.
The onFrame callback receives an instructionCode from DocuSnap.InstructionCode:
| Code | Meaning |
|---|---|
SEARCHING |
No document rectangle detected |
MOVE_CLOSER |
Document is too small in frame |
CENTER_DOCUMENT |
Corners touching or outside frame edges |
SHARPEN |
Laplacian variance below threshold (motion blur / out of focus) |
REDUCE_GLARE |
Too many bright pixels within document bounds |
IMPROVE_LIGHTING |
Mean luminance below threshold |
HOLD_STILL |
Quality passed; waiting for stay-still countdown |
CONFIRMED |
All checks passed |
const snap = new DocuSnap({
sides: 2,
sideConfig: [
{ documentType: 'id', quality: { sharpness: 50 }, face: { detect: true, requirePresent: true } },
{ documentType: 'id', quality: { sharpness: 50 } },
],
onCapture: (result) => {
console.log(`Side ${result.sideIndex + 1} of ${result.sidesTotal}`);
if (!result.isLastSide) snap.nextSide();
},
// ...
});
documentType |
Min aspect | Max aspect | Portrait allowed | Intended use |
|---|---|---|---|---|
'id' |
1.2 | 1.8 | No | ISO/IEC 7810 ID-1 cards, credit cards |
'document' |
0.55 | 1.6 | Yes | A4 / letter sheets (portrait 0.707 or landscape 1.414) |
'any' |
0.55 | 2.2 | Yes | General-purpose, accepts both orientations |
When portrait is allowed (aspect < 1.0), the rotation-consistency check is automatically skipped to avoid rejecting correctly detected portrait documents.
All values are 0–100. Internally the library maps them to raw metric ranges.
| Field | Default | What it gates |
|---|---|---|
sharpness |
40 | Laplacian variance (blur detection) |
brightness |
40 | Mean pixel luminance |
glare |
18 | Fraction of overexposed pixels within document quad |
size |
40 | Document width coverage as fraction of frame |
Internally, quad candidates are ranked by a weighted composite score:
| Component | Weight | Description |
|---|---|---|
| Edge support | 0.40 | Fraction of quad edges supported by actual Sobel edges (H-sides weighted 2×) |
| Aspect score | 0.20 | Proximity to known document aspect ratios |
| Outer score | 0.15 | Rewards wider line-pair separation (rejects inner photo borders) |
| Area score | 0.15 | Rewards larger document coverage (min(areaFrac / 0.45, 1.0)) |
| Center score | 0.10 | Proximity of quad center to frame center |
| Proximity bonus | +0.20 | Bonus for quads whose lines are within 70 px of previously tracked lines |
| Browser | Camera capture | File-input fallback | Face detection |
|---|---|---|---|
| Chrome / Edge 88+ | Yes | Yes | Shape Detection API (native) |
| Firefox 90+ | Yes | Yes | pico.js CDN fallback |
| Safari 15.4+ | Yes | Yes | pico.js CDN fallback |
| iOS Safari (real browser) | Yes | Yes | pico.js CDN fallback |
| Android WebView / in-app browser | No (auto-detected) | Yes | pico.js CDN fallback |
| iOS WKWebView / in-app browser | No (auto-detected) | Yes | pico.js CDN fallback |
DocuSnap.detectCapability() returns 'auto' or 'file' at runtime so you can branch before calling start().
Evaluation / Commercial — contact peter.martis@gmail.com for licensing terms.