Hugging Face Blog | Dec 16, 2024, 12:00 AM

Hugging Face Launches Synthetic Data Generator: Build AI Training Datasets Easily with Natural Language

Original: Introducing the Synthetic Data Generator - Build Datasets with Natural Language

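Hugging Face has released the Synthetic Data Generator, a tool intended to lower the barrier to building training datasets for AI models. Users describe what they need in natural language, and the system uses the distilabel framework with open-source large models (such as Llama 3.1) to automatically generate high-quality datasets for instruction fine-tuning (SFT) or preference alignment (DPO). Generated data can be uploaded directly to the Hugging Face Hub, and export to Argilla is supported for human annotation and refinement.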

Hugging Face launched the Synthetic Data Generator in December 2024. It is a web-based, no-code tool that lets anyone create high-quality AI training datasets by describing what they need in natural language. The tool lowers the barrier for developers who lack high-quality data when fine-tuning large language models.
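For readers unfamiliar with the two dataset types the tool targets, SFT and DPO, the sketch below shows what a single record of each kind commonly looks like. The field names (`prompt`, `completion`, `chosen`, `rejected`) and the helper functions are illustrative assumptions, not the tool's confirmed output schema, and the example text is invented for demonstration.

```python
import json

# Hypothetical record layouts. Field names are assumptions for illustration,
# not the Synthetic Data Generator's documented schema.
sft_record = {
    "prompt": "Explain what a synthetic dataset is in one sentence.",
    "completion": "A synthetic dataset is training data produced by a model "
                  "rather than collected from people.",
}

dpo_record = {
    "prompt": "Explain what a synthetic dataset is in one sentence.",
    "chosen": "A synthetic dataset is training data produced by a model "
              "rather than collected from people.",
    "rejected": "It is data.",
}


def dataset_kind(record):
    """Classify a record as 'dpo', 'sft', or 'unknown' based on its keys."""
    keys = set(record)
    if {"prompt", "chosen", "rejected"} <= keys:
        return "dpo"
    if {"prompt", "completion"} <= keys:
        return "sft"
    return "unknown"


def to_jsonl(records):
    """Serialize a list of dicts as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    for rec in (sft_record, dpo_record):
        print(dataset_kind(rec), "->", sorted(rec))
    print(to_jsonl([sft_record, dpo_record]))
```

An SFT record pairs an instruction with one target answer, while a DPO record pairs it with a preferred and a dispreferred answer so that a model can learn the preference between them.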

llama open-source synthetic-data-generator distilabel #synthetic-data #distilabel #fine-tuning #dataset #no-code

Summaries are AI-generated; the original article is authoritative.