ByteDance's Dolphin: 1.1K Stars for Doc-Parsing AI!
Dolphin by ByteDance: open-source AI for fast, accurate document parsing. A two-stage vision-language model (VLM) with Hugging Face support.
Dolphin (Document Image Parsing via Heterogeneous Anchor Prompts) is an open-source document image parsing model from ByteDance. It is designed to parse documents efficiently and accurately using a two-stage paradigm: analyze the page first, then parse its elements.
Core Mechanism
Dolphin works in two stages. The first stage performs page-level layout analysis and generates a sequence of elements in natural reading order. The second stage parses those document elements in parallel, using heterogeneous anchors and task-specific prompts. Because the elements are parsed in parallel rather than one after another, the approach improves parsing efficiency, and it is intended to improve accuracy as well.
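To make the two stages concrete, here is a minimal Python sketch of the control flow only. It does not use Dolphin's actual API: `Element`, `PROMPTS`, `analyze_layout`, `parse_element`, and `parse_page` are hypothetical names, the prompt strings are invented for illustration, and a thread pool stands in for whatever parallel mechanism the model uses. The "layout analysis" here is a simple top-to-bottom sort, where the real model would predict reading order from the page image.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Element:
    kind: str    # element type, e.g. "text", "table", "formula"
    bbox: tuple  # (x0, y0, x1, y1) region on the page, used as the anchor


# Hypothetical task-specific prompts, keyed by element type.
PROMPTS = {
    "text": "Read the text in the image.",
    "table": "Parse the table in the image.",
    "formula": "Read the formula in the image.",
}


def analyze_layout(page):
    """Stage 1 stand-in: return the page's elements in reading order."""
    # Assumption: sort by top edge, then left edge. A real model predicts this.
    return sorted(page, key=lambda e: (e.bbox[1], e.bbox[0]))


def parse_element(element):
    """Stage 2 stand-in: pair an element's anchor with its type-specific prompt."""
    prompt = PROMPTS[element.kind]
    return f"[{element.kind} @ {element.bbox}] {prompt}"


def parse_page(page):
    elements = analyze_layout(page)
    with ThreadPoolExecutor() as pool:
        # map() yields results in input order, so reading order is preserved
        # even though the elements are processed concurrently.
        return list(pool.map(parse_element, elements))


page = [
    Element("table", (50, 400, 500, 700)),
    Element("text", (50, 100, 500, 380)),
    Element("formula", (50, 720, 500, 780)),
]
results = parse_page(page)
for line in results:
    print(line)
```

The point of the sketch is the ordering guarantee: stage one fixes the sequence once, so stage two can process elements independently and still return them in reading order.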