About & contact

A focused team building industrial-grade data

French Corpus LLM is a young company built around one belief: regulated industries deserve training data that meets the same evidence standard as the regulations themselves. We ship data products, we engineer custom pipelines, and we advise on AI Act readiness.

What we believe

General-purpose web corpora are freely available, and they are excellent for breadth. They fall short when the model will advise a regulated business. We leave the breadth play to others. Our value lies in the editorial decision of what to leave out: no scraped marketing copy, no machine-translated dilution, no orphan documents whose licence cannot be cited.

We do not pretrain models, and we do not sell one. We sell the dataset, the pipeline that built it, and the legal chain that travels with it. Whether you are a foundation-model team, a regtech vendor, a tier-1 bank, an AI lab or a public agency, that is the kind of evidence your governance committee needs.

How we operate

Small & technical

A focused team of data and ML engineers. We do not run a sales floor and we do not put chatbots between you and the people who write the pipelines.

EU-hosted compute

All data ingestion, curation and storage runs on European Union infrastructure. No raw or intermediate data transits outside the EU.

Self-funded

We are not chasing the next funding round. Customer revenue funds the roadmap, which means we say no to scope that doesn’t serve our customers.

Get in touch

The fastest way to reach us is email. We answer within one business day — with a written first read, not a calendar link.

Sales & scoping

contact@frenchcorpus.com

Product licences, custom engagements, partnership briefs.

Technical & compliance

compliance@frenchcorpus.com

Pipeline questions, AI Act audits, dataset-card integration.