Legal Services

ALM Data Platform

Platform modernization for data products under delivery constraints

Overview

ALM engaged InGen to replace manual court-runner collection with a production-grade data platform that captures civil court filings at scale. The system powers a live, paid product and provides near real-time access to case data and documents across dozens of U.S. courts.

The Challenge

ALM needed a reliable way to collect court filings from heterogeneous public portals that provide no official APIs. Manual collection could not scale, and data freshness was insufficient for a competitive legal intelligence product.

Key constraints included:

  • No standardized court APIs or consistent portal behaviors
  • Fragile UIs with frequent markup changes
  • Captchas, rate limits, and paid document access
  • A requirement for repeatable, idempotent ingestion

The Solution

We built a distributed, court-specific scraping and ingestion platform designed for durability over novelty.

Court-Specific Scrapers

Each court has a dedicated .NET 6 scraper, packaged as a Docker container, that drives a Chrome browser through Selenium. Scrapers handle login, MFA, captchas, and document purchase flows while mimicking real user interaction.

Centralized Ingestion + Idempotency

Scrapers submit standardized payloads to a central ingestion API. The API enforces deterministic record keys to prevent duplicates and coordinates document purchases so the same filing is never purchased twice.
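
As a rough illustration of the idea, the sketch below derives a deterministic key from the fields that identify a filing and uses it both to deduplicate records and to make sure a document purchase is claimed only once. It is written in Python for brevity and is not the production .NET code; the field names (court_id, case_number, filing_id), the IngestionStore class, and the in-memory storage are all assumptions made for the example. A real deployment would more likely enforce the same rule with a unique constraint in Postgres.

```python
import hashlib
from dataclasses import dataclass, field


def record_key(court_id: str, case_number: str, filing_id: str) -> str:
    """Derive a deterministic key from the fields that identify a filing (hypothetical fields)."""
    # Normalize so whitespace or casing differences do not produce distinct keys.
    parts = [p.strip().upper() for p in (court_id, case_number, filing_id)]
    # A separator character avoids ambiguity between ("AB", "C") and ("A", "BC").
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


@dataclass
class IngestionStore:
    """In-memory stand-in for the database, for illustration only."""
    records: dict = field(default_factory=dict)
    purchased: set = field(default_factory=set)

    def upsert(self, court_id: str, case_number: str, filing_id: str, payload: dict) -> bool:
        """Store the payload under its key; return True only if the record is new."""
        key = record_key(court_id, case_number, filing_id)
        is_new = key not in self.records
        self.records[key] = payload
        return is_new

    def claim_purchase(self, court_id: str, case_number: str, filing_id: str) -> bool:
        """Return True exactly once per filing, so only one caller buys the document."""
        key = record_key(court_id, case_number, filing_id)
        if key in self.purchased:
            return False
        self.purchased.add(key)
        return True
```

Because the key depends only on the filing's identifying fields, a scraper that reruns after a failure resubmits the same key and the second submission becomes an update rather than a duplicate.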

Data Platform + Access

Court data is stored in Postgres (Aurora/RDS), documents live in S3, and ALM consumes updates via an OData API on a near real-time cadence.
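
To give a sense of what near real-time consumption over OData can look like, the sketch below builds an incremental polling URL using the standard $filter, $orderby, and $top query options. The base URL, the Filings entity set, and the UpdatedAt field are hypothetical names chosen for the example, and the date literal follows OData v4 conventions, which may not match the actual API.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def build_updates_url(base_url: str, entity_set: str, since: datetime, page_size: int = 100) -> str:
    """Build an OData query for records changed after `since` (a timezone-aware datetime)."""
    # OData v4 date-time literals are ISO 8601, with UTC written as "Z".
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = {
        "$filter": f"UpdatedAt gt {stamp}",  # UpdatedAt is an assumed field name
        "$orderby": "UpdatedAt asc",
        "$top": str(page_size),
    }
    # Percent-encode spaces but keep "$" and ":" readable in the query string.
    query = urlencode(params, quote_via=quote, safe="$:")
    return f"{base_url.rstrip('/')}/{entity_set}?{query}"
```

A consumer would typically remember the latest UpdatedAt value it has seen and pass it as `since` on the next poll, so each request returns only what changed in between.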

Outcomes

  • Expanded coverage from a handful of courts to roughly 150
  • Approximately 42 active scrapers in production
  • Near real-time availability of civil case data and PDFs
  • Ongoing operations model with predictable recovery windows

Technology Stack

  • .NET 6, C#
  • Selenium + Chrome (headless/headful)
  • Docker
  • AWS (EC2, Aurora/RDS Postgres, S3)
  • OData ingestion API with Swagger-defined contracts

Outcomes

  • Modernization approach: Foundations first, sequenced delivery
  • Risk posture: Reduced delivery surprises

Service Pillar: Cloud & Platform Engineering

Services

.NET, C#, Selenium, Docker, AWS, PostgreSQL, OData, Data Engineering