Dataset Sales
Legal FR→JA Translation Dataset
Professional bilingual dataset (French → Japanese) specialized in legal and compliance documents.
Includes approximately 100 translation units extracted from the ISHIDA International Legal Corpus (FR→JA).
Ideal for LLM fine-tuning, research, or translation memory enrichment.
Dataset Specifications
| Field | Details |
|---|---|
| Language pair | French → Japanese |
| Domain | Legal / Compliance / Corporate Governance |
| Number of segments | Approx. 100 (sample dataset) |
| File format | TSV / TMX / JSON (available upon request) |
| Annotations | Professionally verified FR–JA translations with entity placeholders such as ORG_001, PERS_002, LOC_003 |
| License | CC BY-NC 4.0 (for research use) |
| Access | Public sample on Hugging Face / Full version on request |
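
As a rough illustration of working with the TSV delivery format, the sketch below parses tab-separated text into records and re-serializes it as JSON Lines. The column names `fr` and `ja`, the helper names, and the sample row are assumptions made for illustration only; the actual file layout may differ.

```python
import csv
import io
import json

# Hypothetical two-column TSV with a header row; the real column names
# and layout of the dataset may differ. The sample row is made up.
SAMPLE_TSV = (
    "fr\tja\n"
    "ORG_001 s'engage à respecter le présent contrat.\t"
    "ORG_001は本契約を遵守することを約束する。\n"
)


def tsv_to_records(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [dict(row) for row in reader]


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping non-ASCII text unescaped."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


records = tsv_to_records(SAMPLE_TSV)
print(to_jsonl(records))
```

Disabling quote handling (`csv.QUOTE_NONE`) keeps quotation marks inside legal text from being interpreted as field delimiters, which is usually the safer default for translation segments.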
Key Features
- Carefully curated legal corpus reflecting authentic contract and compliance terminology.
- Aligned and cleaned by a professional translator using SDL Trados QA validation.
- Entity anonymization with consistent placeholder IDs (e.g. ORG_001, PERS_002, LOC_003).
- Designed for fine-tuning multilingual models such as mBART, M2M100, and NLLB.
- Ideal for legal LLM research, domain adaptation, or corpus training.
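
Before fine-tuning, it can be useful to confirm that each placeholder ID appears the same number of times on both sides of a translation unit. The following is a minimal sketch of such a check, assuming placeholders follow the ORG_/PERS_/LOC_ plus three-digit pattern shown above; the regular expression and the function names are illustrative assumptions, not part of the dataset tooling.

```python
import re
from collections import Counter

# Assumed placeholder shape: ORG, PERS, or LOC, an underscore, then exactly
# three digits (e.g. ORG_001). Adjust if the dataset uses other tags.
PLACEHOLDER = re.compile(r"(?:ORG|PERS|LOC)_\d{3}(?!\d)")


def placeholders(text):
    """Count each placeholder ID found in a segment."""
    return Counter(PLACEHOLDER.findall(text))


def mismatched(source, target):
    """Return the placeholder IDs whose counts differ between the two sides."""
    src, tgt = placeholders(source), placeholders(target)
    return sorted(k for k in src.keys() | tgt.keys() if src[k] != tgt[k])
```

An empty list from `mismatched(fr_segment, ja_segment)` means the placeholder IDs are consistent for that pair; any returned ID points to a segment worth reviewing by hand.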