Hugging Face Blog, May 23, 2022, 12:00 AM

Efficient Table Pre-training without Real Data: An Introduction to TAPEX and Its Hugging Face Integration

Original: Efficient Table Pre-training without Real Data: An Introduction to TAPEX


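TAPEX (Table Pre-training via Execution), proposed by Microsoft, is a novel table pre-training method that is now integrated into Hugging Face. Instead of relying on real tables crawled from the web, it trains a Seq2Seq model on randomly generated SQL queries and the results of executing them (synthetic data). This "learning by execution" approach substantially improves the model's ability to reason over tabular data and achieves leading results on benchmarks such as WikiSQL and WikiTableQuestions.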

When working with structured data such as tables, traditional pre-trained models typically require crawling large amounts of real-world tables and related text from the web. This approach faces several challenges: the data is noisy, collecting it raises privacy concerns, and tables are hard to align with their accompanying text. TAPEX (Table Pre-training via Execution), proposed by Microsoft Research Asia, sidesteps these limitations. Its core idea is "table pre-training via execution": rather than depending on a crawled table-text corpus, it trains entirely on synthetic data, namely SQL queries paired with the results of executing them.
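To make "learning by execution" concrete, here is a minimal Python sketch of how one synthetic pre-training pair could be produced: sample a SQL query, run it over a table with an off-the-shelf engine (SQLite here, standing in for the executor), and pair the flattened query-plus-table string (model input) with the execution result (model target). The toy table, the query templates, and the helper names `build_db`, `execute`, `linearize` and `make_examples` are hypothetical, and the `col : ... row 1 : ...` flattening is only assumed to resemble TAPEX's input format; the actual sampling procedure and corpus are described in the original article.

```python
import random
import sqlite3

# Hypothetical toy table for illustration; not taken from the TAPEX corpus.
HEADER = ["city", "year", "athletes"]
ROWS = [
    ("athens", 1896, 241),
    ("paris", 1900, 997),
    ("london", 1908, 2008),
    ("beijing", 2008, 10942),
]

# Hypothetical SQL templates; placeholders are filled with values from the table,
# so every query returns at least one result row.
TEMPLATES = [
    "SELECT city FROM t WHERE year = {year}",
    "SELECT COUNT(*) FROM t WHERE athletes > {n}",
    "SELECT MAX(athletes) FROM t WHERE year <= {year}",
]


def build_db():
    """Load the toy table into an in-memory SQLite database that acts as the executor."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (city TEXT, year INTEGER, athletes INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    return conn


def execute(conn, sql):
    """Run the query and render the first column of each result row as the target string."""
    return ", ".join(str(row[0]) for row in conn.execute(sql).fetchall())


def linearize(sql, header, rows):
    """Flatten query + table into one source string (assumed, TAPEX-like layout)."""
    parts = [sql.lower(), "col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(str(v) for v in row))
    return " ".join(parts)


def make_examples(n, seed=0):
    """Sample n synthetic (source, target) pairs by executing randomly filled SQL templates."""
    rng = random.Random(seed)
    conn = build_db()
    examples = []
    for _ in range(n):
        sql = rng.choice(TEMPLATES).format(
            year=rng.choice([r[1] for r in ROWS]),
            n=rng.choice([r[2] for r in ROWS]),
        )
        examples.append((linearize(sql, HEADER, ROWS), execute(conn, sql)))
    conn.close()
    return examples


if __name__ == "__main__":
    for source, target in make_examples(3):
        print(source, "->", target)
```

Each pair would then serve as one training example for a Seq2Seq model that learns to generate the target from the source; TAPEX uses BART as its backbone. Because the labels come from actually executing the query, they are correct by construction, which is what lets this kind of pipeline avoid noisy crawled table-text alignments.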

Summaries are AI-generated; the original article is authoritative.