legal-sources

worldwidelaw/legal-sources


Summary

Search 18M+ legal documents worldwide: case law, legislation, and doctrine across 110+ countries.


World Wide Law

Open-source collection scripts for open legal data from 110+ countries.

Every country publishes its laws, court decisions, and regulations online -- but in different formats, behind different APIs, with different access rules. World Wide Law is building the open infrastructure to collect, normalize, and make all of it searchable.

All sources in this repository are open data -- publicly available legal information from official government portals, APIs, and bulk download endpoints. We always prefer API and bulk access over web extraction.

Live Dashboard & API

What's Here

This repository contains 960+ collection scripts across 110+ countries that download and normalize open legal data from government portals worldwide. Each script follows a standard interface so that any developer can run, test, or improve it. Some sources are marked as blocked (CAPTCHA, IP restrictions, etc.) -- their scripts are included so developers can review and potentially contribute fixes.

sources/
  FR/LegifranceCodes/     # French consolidated legal codes (API)
  DE/GesetzeImInternet/   # German federal laws (bulk XML)
  IT/NormattivaLegislation/ # Italian legislation (API)
  ES/BOE/                 # Spanish official gazette (API)
  ... (110+ countries)

Quick Start

# Clone the repo
git clone https://github.com/worldwidelaw/legal-sources.git
cd legal-sources

# Install dependencies
pip install -r requirements.txt

# Check project status
python runner.py status

# Test a specific source
python runner.py sample FR/LegifranceCodes

# See what needs work
python runner.py next

How It Works

Per-Source Structure

Every source lives in sources/{COUNTRY_CODE}/{SourceName}/ and contains:

| File | Purpose |
|------|---------|
| bootstrap.py | Collection script -- implements fetch_all(), fetch_updates(), normalize() |
| config.yaml | Source metadata, access method, rate limits, schema |
| sample/ | 10+ sample documents for validation |
| README.md | Documentation about the data source |
| .env.template | Required API keys or credentials (if any) |
| retrieve.py | Reference resolver (e.g., "article 1240 code civil" -> document) |
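As a rough illustration of the three-method interface named above, here is a minimal, self-contained sketch. It does not import the repository's real base class from common/base_scraper.py; the class name `ExampleSource`, the `since` parameter of `fetch_updates()`, the raw field names, and the `XX/ExampleSource` identifier are all assumptions made for the example.

```python
from datetime import date
from typing import Iterator


class ExampleSource:
    """Hypothetical source; NOT the repository's actual base class."""

    SOURCE_ID = "XX/ExampleSource"  # placeholder, not a real source

    # Stand-in for records an official API or bulk file would return.
    _RAW = [
        {"id": "law-1", "heading": "Example Act", "body": "Art. 1 (sample text)",
         "published": "2024-01-15", "link": "https://example.org/law-1"},
        {"id": "law-2", "heading": "Second Act", "body": "Art. 1 (sample text)",
         "published": "2024-03-02", "link": "https://example.org/law-2"},
    ]

    def fetch_all(self) -> Iterator[dict]:
        """Yield every raw record (full collection)."""
        yield from self._RAW

    def fetch_updates(self, since: date) -> Iterator[dict]:
        """Yield only raw records published on or after `since` (assumed signature)."""
        for raw in self._RAW:
            if date.fromisoformat(raw["published"]) >= since:
                yield raw

    def normalize(self, raw: dict) -> dict:
        """Map a raw record onto the common output schema fields."""
        return {
            "_id": f"{self.SOURCE_ID}/{raw['id']}",
            "_source": self.SOURCE_ID,
            "_type": "legislation",
            "title": raw["heading"],
            "text": raw["body"],
            "date": raw["published"],
            "url": raw["link"],
        }
```

A caller would typically chain the methods, for example `[src.normalize(r) for r in src.fetch_all()]`; the real scripts may wire this up differently.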

Two Data Models

Legislation (mutable): Laws get amended. Same ID, new content. Strategy: upsert with version tracking.

Case law (immutable): Court decisions don't change after publication. Strategy: append-only with dedup.
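The difference between the two strategies can be sketched with two small in-memory stores. This is only an illustration under stated assumptions: the class names, the `_hash` and `_version` bookkeeping fields, and the use of a SHA-256 content hash to detect amendments are hypothetical and not taken from common/storage.py.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint the text so an amendment can be told apart from a re-fetch."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class LegislationStore:
    """Upsert: one current record per _id, older versions kept in history."""

    def __init__(self) -> None:
        self.current: dict[str, dict] = {}
        self.history: dict[str, list[dict]] = {}

    def upsert(self, doc: dict) -> bool:
        """Return True if the document was new or its text changed."""
        digest = content_hash(doc["text"])
        old = self.current.get(doc["_id"])
        if old is not None and old["_hash"] == digest:
            return False  # same ID, same content: nothing to do
        if old is not None:
            self.history.setdefault(doc["_id"], []).append(old)
        version = old["_version"] + 1 if old else 1
        self.current[doc["_id"]] = {**doc, "_hash": digest, "_version": version}
        return True


class CaseLawStore:
    """Append-only: the first write for an _id wins, repeats are dropped."""

    def __init__(self) -> None:
        self.docs: list[dict] = []
        self._seen: set[str] = set()

    def append(self, doc: dict) -> bool:
        """Return True if the decision was stored, False if it was a duplicate."""
        if doc["_id"] in self._seen:
            return False
        self._seen.add(doc["_id"])
        self.docs.append(doc)
        return True
```

The legislation store overwrites in place and bumps a version counter, while the case-law store never rewrites an existing record.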

Standard Output Schema

Every script normalizes documents to a common schema:

  • _id -- Unique identifier
  • _source -- Source identifier (e.g., FR/LegifranceCodes)
  • _type -- legislation or case_law
  • title -- Document title
  • text -- Full text content
  • date -- Publication or decision date
  • url -- Link to the original source
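A check against the fields listed above might look like the sketch below. The `validate()` function, the non-empty-text rule, and the assumption that `date` is an ISO 8601 (YYYY-MM-DD) string are illustrative choices; the actual rules live in common/validators.py and may differ.

```python
from datetime import date

REQUIRED_FIELDS = ("_id", "_source", "_type", "title", "text", "date", "url")
ALLOWED_TYPES = {"legislation", "case_law"}


def validate(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    if doc.get("_type") not in ALLOWED_TYPES:
        problems.append(f"unexpected _type: {doc.get('_type')!r}")
    if not str(doc.get("text", "")).strip():
        problems.append("empty text")
    try:
        # Assumption: dates are ISO 8601 (YYYY-MM-DD) strings.
        date.fromisoformat(str(doc.get("date", "")))
    except ValueError:
        problems.append("date is not ISO 8601")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a sample document at once.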

Architecture

legal-sources/
  manifest.yaml          # Master inventory: all sources + status
  runner.py              # CLI: run, test, and manage collection scripts
  common/                # Shared libraries
    base_scraper.py        Base class all scripts inherit from
    http_client.py         HTTP client with retries + caching
    rate_limiter.py        Token bucket rate limiter
    storage.py             JSONL storage with deduplication
    validators.py          Schema validation
  templates/             # Templates for new sources
    scraper_template.py    Boilerplate for bootstrap.py
    config_template.yaml   Boilerplate for config.yaml
    retrieve_template.py   Boilerplate for retrieve.py
  sources/               # One directory per data source
    {CC}/{Source}/          (see per-source structure above)
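
The tree above describes rate_limiter.py as a token bucket rate limiter. For readers unfamiliar with the technique, here is a generic sketch of a token bucket; the class name, method names, and the injectable `clock` parameter are assumptions for illustration and do not reflect the repository's actual implementation.

```python
import time


class TokenBucket:
    """Generic token bucket: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the bucket can be tested without sleeping
        self.tokens = capacity      # start full, allowing an initial burst
        self.updated = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last update, capped at capacity."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False when the caller should wait."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A collection script would call `try_acquire()` before each request and sleep briefly when it returns False, which allows short bursts while keeping the long-run request rate at or below `rate`.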

Coverage

| Region | Countries | Sources |
|--------|-----------|---------|
| EU Member States | AT, BE, BG, CY, CZ, DE, DK, EE, ES, FI, FR, GR, HR, HU, IE, IT, LT, LU, LV, MT, NL, PL, PT, RO, SE, SI, SK | 130+ |
| EFTA / EEA | CH, NO, IS, LI | 10+ |
| Council of Europe | UK, TR, UA, GE, AM, AZ, MD | 20+ |
| Western Balkans | RS, BA, ME, AL, MK, XK | 15+ |
| Latin America | AR, BR, CL, CO, MX, PE | 25+ |
| Asia-Pacific | AU, JP, KR, NZ, SG, TW, IN | 30+ |
| Middle East & Africa | EG, MA, ZA, NG, KE, TN | 20+ |
| Other | US, CA, and more | 15+ |

Track live progress on the dashboard.

Contributing

We welcome contributions from developers, legal researchers, and especially governments who want their open legal data included.

Who can contribute?

| You are... | How you can help |
|---|---|
| Developer | Build or fix collection scripts, add retrieve scripts, improve tooling |
| Government official / jurisdiction lead | Tell us about your country's legal data portals -- no coding needed |
| Lawyer / legal researcher | Validate data quality, improve legal reference resolution, flag coverage gaps |
| Anyone | Report data quality issues, broken sources, or coverage gaps |

Submit a data source (no coding required):

Fix or improve a collection script:

Report a problem:

Good first issues: Browse label:good-first-issue for approachable starting points.

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). You are free to use, modify, and distribute this software, provided that any modified versions made available over a network also make their source code available under the same license.

Commercial Licensing: If you wish to use this software without the AGPL-3.0 obligations (e.g., in a proprietary product or SaaS), commercial licenses are available. Contact zacharie@goodlegal.fr for details.
