Mistral AI introduced Mistral OCR 3, a document extraction model focused on high-fidelity extraction of text and embedded images, with Markdown output and HTML table reconstruction. The company says it achieves a 74% overall win rate over Mistral OCR 2 across forms, scanned documents, complex tables, and handwriting. It is available through the API and the Document AI Playground in Mistral AI Studio, with pricing starting at $2 per 1,000 pages.
The well-known open-source OCR (Optical Character Recognition) toolkit PaddleOCR has long been celebrated for its high accuracy, lightweight models, and strong…
IBM has officially launched its new lightweight multimodal model on Hugging Face: Granite 4.0 3B Vision. With 3 billion (3B) parameters, this model is…
Traditional OCR systems (such as Tesseract) often struggle with complex layouts, multi-column tables, handwriting, and mathematical formulas, while using…
Hugging Face has recently released a major update for its innovative spreadsheet AI tool "AI Sheets," officially unlocking powerful image processing…
The Replicate platform has newly listed two powerful document and image parsing models developed by Datalab: "Datalab Marker" and "Datalab OCR." They are…
This technical article from Hugging Face introduces how to deploy a state-of-the-art (SOTA) optical character recognition (OCR) model called dots.ocr using…
Background: With the proliferation of vision-language models (VLMs), using VLMs for document OCR (e.g., converting PDFs to Markdown) has become mainstream…
The Language Technologies department (BSC-LT) of the Barcelona Supercomputing Center (BSC) recently released a new open-source multimodal model on Hugging Face…
Solving Real-World Document AI Pain Points: In the fields of Document AI and OCR (Optical Character Recognition), datasets used in academic research or…
Hugging Face has announced the launch of Idefics2, the next generation of its open-source Vision Language Model (VLM). With 8 billion (8B) parameters, this…
Hugging Face has announced the launch of a new multimodal benchmark and leaderboard called "ConTextual," aimed at addressing the shortcomings of existing…