Vol. 1 · Issue 01 · Research toolkit

Link integrity tools for research.

Verify that every URL in your bibliography, dataset, source list, or citation export actually resolves — and if it doesn't, find the closest archived snapshot. Built for academics, librarians, journalists, and UX researchers.

¹ Implements HEAD → GET fallback, follows up to 10 redirects, 8s timeout per URL, 8 concurrent workers. DOI, arXiv, PubMed, and Wayback Machine links are recognized automatically.

Batch size: ≤ 500 URLs / run
Timeout: 8 s per URL
Concurrency: 8 workers
Data retained: 0 bytes
§ 02

Who uses it

01

Academic researchers

Reference lists, supplementary materials, replication packages.

DOI rot and dead publisher links creep into manuscripts between drafts.

02

Librarians & editors

Subject guides, journal references, institutional repositories.

Quarterly link audits across thousands of curated URLs.

03

UX & market researchers

Source lists, survey panels, competitor URLs, research repos.

Stakeholders open the report a quarter later — half the citations 404.

04

Journalists & OSINT

Source verification, archived evidence, investigation links.

Adversaries delete content; you need to catch and archive it fast.

§ 03

Methodology

What happens when you submit a batch — start to finish, no black boxes.

  1. Ingest

    Paste URLs (one per line), upload a CSV/TXT, or drop in BibTeX/RIS — we extract the URL fields. Bare DOIs and arXiv IDs are auto-prefixed.

  2. Classify

    Each entry is tagged as URL, DOI, arXiv, PubMed, or Wayback via deterministic regex — no network calls, no LLM, no privacy footprint.

  3. Verify

    Server-side fetch: HEAD first, then GET with Range fallback when servers reject HEAD. Follows redirects, 8s timeout, browser-like UA, 8 concurrent workers.

  4. Recover

    Every failing URL ships with a one-click Wayback Machine snapshot lookup. Export the broken-link report as CSV or Markdown footnotes for supplementary materials.
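
The Classify step can be pictured as an ordered list of patterns tried one after another. The sketch below is illustrative only: the regexes, the `classify` name, and the `unknown` label are assumptions, not the tool's actual rules.

```python
import re

# Hypothetical patterns, checked in order; the first match wins.
PATTERNS = [
    ("wayback", re.compile(r"^https?://web\.archive\.org/web/", re.I)),
    ("doi", re.compile(r"^(?:https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.I)),
    ("arxiv", re.compile(r"^(?:https?://arxiv\.org/abs/|arxiv:)\d{4}\.\d{4,5}(?:v\d+)?$", re.I)),
    ("pubmed", re.compile(r"^https?://pubmed\.ncbi\.nlm\.nih\.gov/\d+/?$", re.I)),
    ("url", re.compile(r"^https?://\S+$", re.I)),
]


def classify(entry: str) -> str:
    """Tag one entry using regexes only; no network access is involved."""
    text = entry.strip()
    for label, pattern in PATTERNS:
        if pattern.match(text):
            return label
    return "unknown"
```

Because the generic `url` pattern comes last, the specific identifiers get a chance to match before anything falls through to it.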

¹ HEAD requests are tried first because they're bandwidth-cheap. Many CDN-fronted servers (Cloudflare and Akamai among them) answer HEAD with 403 or 405, in which case we retry with a GET that carries a 1 KB Range header. Soft-404 pages, which return 200 OK with an empty body or a "not found" template, are not yet detected; that is a known limitation, documented in the FAQ.
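
The HEAD-then-GET fallback can be sketched in a few lines. This is a minimal illustration using Python's standard library, not the tool's server code: the `check_url` and `urllib_fetch` names and the injectable `fetch` parameter are assumptions made so the logic can be exercised without a network, and the browser-like User-Agent is left out.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

TIMEOUT_SECONDS = 8                       # per-URL timeout stated above
RANGE_HEADER = {"Range": "bytes=0-1023"}  # ask for the first 1 KB only
RETRY_WITH_GET = {403, 405}               # HEAD rejections named in the footnote

Fetch = Callable[[str, str, Dict[str, str]], int]


def urllib_fetch(method: str, url: str, headers: Dict[str, str]) -> int:
    """Return the HTTP status code; urllib follows redirects by default."""
    request = urllib.request.Request(url, method=method, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses still carry a status


def check_url(url: str, fetch: Fetch = urllib_fetch) -> Optional[int]:
    """HEAD first; retry as a ranged GET only when HEAD is rejected."""
    try:
        status = fetch("HEAD", url, {})
        if status in RETRY_WITH_GET:
            status = fetch("GET", url, dict(RANGE_HEADER))
        return status
    except OSError:
        return None  # timeout, DNS failure, or refused connection
```

A server that honors the Range header answers the retry with 206 Partial Content, so a caller of this sketch would want to count 206 as a success alongside 200.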

§ 05

The toolkit

Six stateless tools covering the full reference-integrity workflow — from extraction to verification to archival recovery.

§ 01
● live

URL Checker

Verify every link in a reference list

Paste or upload up to 500 URLs, DOIs, or arXiv IDs. Each one is fetched server-side and classified by HTTP status, with Wayback fallback links for anything broken.

  • DOI, arXiv, PubMed detection
  • HEAD → GET, follows redirects
  • Wayback archive links on failure
§ 02
● live

Citation Extractor

Pull URLs from BibTeX or RIS

Drop a .bib or .ris file and extract every URL field into a clean, deduplicated list ready to feed into the URL Checker.

  • BibTeX & RIS parsing
  • DOI normalization
  • Deduplication
§ 03
● live

DOI Resolver Audit

Catch DOIs that resolve to a 404

DOIs can resolve successfully but land on a missing publisher page. We follow the full chain and flag the silent failures.

  • Full doi.org chain
  • Publisher 404 detection
  • Crossref metadata
§ 04
● live

Wayback Snapshot Finder

Best archive.org snapshot per URL

For any URL, find the closest Wayback Machine snapshot to a target date — useful for replacing rotted links in manuscripts.

  • Closest-to-date lookup
  • Bulk batch mode
  • Citation-ready URLs
§ 05
● live

Reference List Diff

Compare two reference lists

Diff two bibliographies and see what was added, removed, or changed between manuscript versions or review rounds.

  • DOI-aware matching
  • Side-by-side view
  • Export as table
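
One way to think about DOI-aware matching: two entries are treated as the same reference when they carry the same DOI, even if the surrounding citation text was reformatted. The sketch below is an assumption about how such a diff could work, not a description of this tool; `doi_key` and `diff_references` are hypothetical names.

```python
import re
from typing import Dict, List

DOI_IN_TEXT = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def doi_key(reference: str) -> str:
    """Identity of a reference: its lowercased DOI if present, else its normalized text."""
    match = DOI_IN_TEXT.search(reference)
    if match:
        # DOIs are case-insensitive; trailing punctuation belongs to the sentence
        return match.group(0).rstrip(".,;").lower()
    return " ".join(reference.split()).lower()


def diff_references(old: List[str], new: List[str]) -> Dict[str, List[str]]:
    """Report entries added, removed, or reworded between two lists."""
    old_by_key = {doi_key(r): r for r in old}
    new_by_key = {doi_key(r): r for r in new}
    return {
        "added": [r for k, r in new_by_key.items() if k not in old_by_key],
        "removed": [r for k, r in old_by_key.items() if k not in new_by_key],
        "changed": [r for k, r in new_by_key.items()
                    if k in old_by_key and old_by_key[k] != r],
    }
```

Under this scheme, an entry whose DOI is unchanged but whose author formatting was edited shows up as "changed" rather than as one removal plus one addition.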
§ 06
● live

Source Repository Mapper

Map sources to archives & repos

Group source URLs by hosting platform (GitHub, OSF, Zenodo, Dataverse, journal) so you can audit reproducibility coverage at a glance.

  • Platform classification
  • Reproducibility score
  • CSV export
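
Platform classification can be approximated by looking at the hostname of each URL. The mapping below is a hypothetical sketch: the domain table and the substring test for Dataverse installations are assumptions, journal detection is not attempted, and no reproducibility score is computed because the page does not define one.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical domain table; extend as needed.
PLATFORMS = {
    "github.com": "GitHub",
    "osf.io": "OSF",
    "zenodo.org": "Zenodo",
}


def platform(url: str) -> str:
    """Map one URL to a hosting platform label based on its hostname."""
    host = (urlparse(url).hostname or "").lower()
    for domain, name in PLATFORMS.items():
        if host == domain or host.endswith("." + domain):
            return name
    if "dataverse" in host:  # Dataverse installations live on many hostnames
        return "Dataverse"
    return "other"


def group_by_platform(urls: List[str]) -> Dict[str, List[str]]:
    """Group URLs under their platform label, keeping input order within each group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[platform(url)].append(url)
    return dict(groups)
```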
§ 04

Frequently asked

Q01. Do you store the URLs, DOIs, or citation lists I submit?

No. The URL Checker runs a stateless server function — your input is fetched, classified, and discarded once results are returned. Nothing is logged, persisted, or sent to third parties.

Q02. How do you handle DOIs and arXiv IDs?

Bare DOIs (10.xxxx/...) and arXiv IDs (e.g. arXiv:2301.01234) are auto-prefixed to https://doi.org/ and https://arxiv.org/abs/ respectively before checking. The full doi.org redirect chain is followed, so you see the final publisher URL and its status.
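
The auto-prefixing described above can be sketched as follows. The two prefixes come from the answer itself; the regexes and the `normalize` name are assumptions for illustration.

```python
import re

# Hypothetical patterns for bare identifiers.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_RE = re.compile(r"^arxiv:(\d{4}\.\d{4,5}(?:v\d+)?)$", re.IGNORECASE)


def normalize(entry: str) -> str:
    """Turn a bare DOI or arXiv ID into a checkable URL; leave other input alone."""
    text = entry.strip()
    if DOI_RE.match(text):
        return "https://doi.org/" + text
    match = ARXIV_RE.match(text)
    if match:
        return "https://arxiv.org/abs/" + match.group(1)
    return text
```

For example, `normalize("arXiv:2301.01234")` yields `https://arxiv.org/abs/2301.01234`, while an entry that is already a URL passes through unchanged.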

Q03. Will you find Wayback Machine snapshots for broken links?

Yes — every failing URL surfaces a one-click 'Find archive' link that opens the Wayback Machine's snapshot history for that URL in a new tab. We don't yet call the Wayback Availability API to embed a specific snapshot; that's planned.
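
For readers who want to script the lookup themselves, here is a small sketch of both approaches mentioned above: the snapshot-history link, and a query against the Wayback Availability API. The function names are hypothetical, and `closest_snapshot` only parses a response that the caller has already fetched and decoded from JSON.

```python
from typing import Optional
from urllib.parse import urlencode


def history_link(url: str) -> str:
    """Link to the Wayback Machine's snapshot history for a URL."""
    return "https://web.archive.org/web/*/" + url


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build an Availability API request; timestamp is YYYYMMDD or longer."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the closest snapshot URL out of a decoded API response, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

When no snapshot exists, the API returns an empty `archived_snapshots` object, which this sketch maps to `None`.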

Q04. Can I check more than 500 URLs at once?

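Not in a single request: 500 is the per-batch cap. For larger bibliographies, split the list into chunks of 500 or fewer and run them one after another. With 8 concurrent workers, a full batch of 500 typically finishes in well under a minute when most servers respond promptly; URLs that run into the 8s timeout lengthen the run.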
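
Splitting a long list and fanning each batch out over a fixed worker pool can be sketched like this. The cap of 500 and the 8 workers come from the answer above; `batches`, `check_batch`, and the `check` callable are hypothetical names, and the pool shown here is an illustration of the concurrency model rather than the tool's server code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Optional

BATCH_SIZE = 500  # per-batch cap stated above
WORKERS = 8       # worker count stated above


def batches(urls: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def check_batch(
    batch: List[str],
    check: Callable[[str], Optional[int]],
    workers: int = WORKERS,
) -> Dict[str, Optional[int]]:
    """Run `check` over one batch with a fixed-size thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its own result
        return dict(zip(batch, pool.map(check, batch)))
```

As a rough upper bound under these numbers: if every URL in a 500-URL batch took a single attempt that ran to the 8s timeout, 8 workers would need about 63 rounds, or roughly 504 seconds. Typical runs are far shorter because most servers answer quickly.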

Q05. What's the difference between this and a generic broken-link checker?

Two things. First, classification — we recognize DOIs, arXiv, PubMed, and Wayback URLs and treat them appropriately. Second, recovery — broken results come with archive lookups and Markdown-footnote exports ready to paste into a manuscript.

Q06. Does the tool detect soft-404s (pages that return 200 OK but are actually missing)?

Not currently. We only inspect HTTP status codes and redirect chains. A page that returns 200 with the publisher's 'article not found' template will be marked OK. Detecting soft-404s requires content scraping and is on the roadmap.

Q07. Can I upload BibTeX or RIS files?

BibTeX/RIS extraction is in the URL Checker today as a paste tab — it pulls URL fields with regex. A proper parser handling exotic entries (the dedicated Citation Extractor tool) is coming.
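
Regex-based field extraction of the kind described above might look like the following. This is a rough sketch under stated assumptions: the patterns are illustrative, they also pick up DOI fields (an addition beyond the URL fields mentioned above), and they will miss the exotic entries a real parser would handle, such as values with nested braces or values split across lines.

```python
import re
from typing import List

# BibTeX: url = {...} or doi = "..." at the start of a line (hypothetical pattern)
BIBTEX_FIELD = re.compile(r'^\s*(url|doi)\s*=\s*[{"]([^}"]+)[}"]', re.I | re.M)
# RIS: two-letter tag, two spaces, hyphen, space, value (UR = URL, DO = DOI)
RIS_FIELD = re.compile(r'^(UR|DO)\s{2}-\s(.+)$', re.M)


def extract_links(text: str) -> List[str]:
    """Collect URL and DOI field values from pasted BibTeX or RIS text."""
    found = []
    for pattern in (BIBTEX_FIELD, RIS_FIELD):
        for _tag, value in pattern.findall(text):
            found.append(value.strip())
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(found))
```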

Q08. Is there an API?

Not yet. If you have a reproducibility audit workflow that would benefit from one, get in touch via the about page.

§ 06 · Built for reproducibility

Stop shipping reference lists you haven't verified.

Paste your list, get a structured report in under a minute, and an archive link for everything broken.
