Legal Services
ALM Data Platform
Platform modernization for data products under delivery constraints
Overview
ALM engaged InGen to replace manual court-runner collection with a production-grade data platform that captures civil court filings at scale. The system powers a live, paid product and provides near real-time access to case data and documents across roughly 150 U.S. courts.
The Challenge
ALM needed a reliable way to collect court filings from heterogeneous public portals that provide no official APIs. Manual collection could not scale, and data freshness was insufficient for a competitive legal intelligence product.
Key constraints included:
- No standardized court APIs or consistent portal behaviors
- Fragile UIs with frequent markup changes
- Captchas, rate limits, and paid document access
- A requirement for repeatable, idempotent ingestion
The Solution
We built a distributed, court-specific scraping and ingestion platform designed for durability over novelty.
Court-Specific Scrapers
Each court has its own .NET 6 scraper, packaged as a Docker container and driving a Chrome browser through Selenium. Scrapers handle login, MFA, captchas, and document purchase flows while mimicking real user interaction.
Centralized Ingestion + Idempotency
Scrapers submit standardized payloads to a central ingestion API. The API enforces deterministic record keys to prevent duplicates and coordinates document purchases so the same filing is never purchased twice.
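The idempotency idea can be illustrated with a minimal sketch. It is written in Python for brevity rather than the platform's C#, and it is not the production implementation: the key fields (court_id, case_number, filing_id), the IngestionStore class, and its method names are all assumptions made for illustration, and the in-memory dictionary stands in for the real database.

```python
import hashlib
from dataclasses import dataclass, field


def record_key(court_id: str, case_number: str, filing_id: str) -> str:
    """Derive a deterministic key from the fields that identify a filing."""
    # Normalize so cosmetic differences (case, whitespace) do not create new keys.
    parts = [p.strip().lower() for p in (court_id, case_number, filing_id)]
    # A separator that cannot appear in the fields keeps ("ab", "") != ("a", "b").
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


@dataclass
class IngestionStore:
    """Hypothetical in-memory stand-in for the central ingestion API's storage."""

    records: dict = field(default_factory=dict)
    purchased: set = field(default_factory=set)

    def upsert(self, court_id: str, case_number: str, filing_id: str, payload: dict) -> bool:
        """Store the payload under its key; return True only if the record is new."""
        key = record_key(court_id, case_number, filing_id)
        is_new = key not in self.records
        self.records[key] = payload  # Re-submission overwrites instead of duplicating.
        return is_new

    def claim_purchase(self, court_id: str, case_number: str, filing_id: str) -> bool:
        """Return True only for the first caller that claims a document purchase."""
        key = record_key(court_id, case_number, filing_id)
        if key in self.purchased:
            return False
        self.purchased.add(key)
        return True
```

In this sketch a scraper would call claim_purchase before paying for a document and skip the purchase when it returns False. A multi-process deployment would need the same check to be atomic, for example through a unique constraint in the database; that detail is likewise an assumption, not something taken from the platform.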
Data Platform + Access
Court data is stored in Postgres (Aurora/RDS), documents live in S3, and ALM consumes updates via an OData API on a near real-time cadence.
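A consumer polling an OData API for recent changes typically asks for records modified after the last one it saw. The sketch below builds such a query with the standard OData options $filter, $orderby, and $top. It is in Python for brevity, and the host, the Filings entity set, and the ModifiedAt property are hypothetical names; the actual contract is not described here.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def build_updates_url(base_url: str, entity_set: str, since: datetime, page_size: int = 100) -> str:
    """Build an OData query URL for records modified after `since`."""
    # OData expects an ISO 8601 timestamp; convert to UTC and use the Z suffix.
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode(
        {
            "$filter": f"ModifiedAt gt {stamp}",  # ModifiedAt is an assumed property name.
            "$orderby": "ModifiedAt asc",         # Oldest first, so polling can resume.
            "$top": page_size,
        },
        quote_via=quote,  # Encode spaces as %20 rather than "+".
        safe="$:",        # Keep the "$" option prefix and timestamp colons readable.
    )
    return f"{base_url.rstrip('/')}/{entity_set}?{query}"


# Hypothetical host and entity set, for illustration only.
url = build_updates_url(
    "https://example.invalid/odata",
    "Filings",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Ordering by the modification timestamp lets the consumer store the last value it received and pass it as `since` on the next poll.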
Outcomes
- Expanded coverage from a handful of courts to roughly 150
- Approximately 42 active scrapers in production
- Near real-time availability of civil case data and PDFs
- Ongoing operations model with predictable recovery windows
Technology Stack
- .NET 6, C#
- Selenium + Chrome (headless/headful)
- Docker
- AWS (EC2, Aurora/RDS Postgres, S3)
- OData ingestion API with Swagger-defined contracts
Outcomes
- Modernization approach: foundations first, sequenced delivery
- Risk posture: reduced delivery surprises