About & contact
A focused team building industrial-grade data
French Corpus LLM is a young company built around one belief: regulated industries deserve training data that meets the same evidence standard as the regulations themselves. We ship data products, we engineer custom pipelines, and we advise on AI Act readiness.
What we believe
General-purpose web corpora exist for free, and they are excellent for breadth. They are useless when the model is going to advise a regulated business. We leave the breadth play to others. Our value is the editorial decision of what not to include: no scraped marketing copy, no machine-translated dilution, no orphan documents whose licence cannot be cited.
We pretrain nothing. We do not sell a model. We sell the dataset, the pipeline that built it, and the legal chain that travels with it. If you are a foundation-model team, a regtech vendor, a tier-1 bank, an AI lab or a public agency, that combination is the kind of evidence your governance committee needs.
How we operate
Small & technical
A focused team of data and ML engineers. We do not run a sales floor and we do not put chatbots between you and the people who write the pipelines.
EU-hosted compute
All data ingestion, curation and storage runs on European Union infrastructure. No raw or intermediate data transits outside the EU.
Self-funded
We are not chasing the next funding round. Customer revenue funds the roadmap, which means we say no to scope that does not serve our customers.
Get in touch
The fastest way to reach us is email. We answer within one business day — with a written first read, not a calendar link.
Technical & compliance
Pipeline questions, AI Act audits, dataset-card integration.