Inventors:
Bodin Dresevic - Bellevue WA, US
Oren Trutner - Kirkland WA, US
Sasa Tomasevic - Belgrade, RS
Aleksandar Uzelac - Kruševac, RS
Dejan Lukacevic - Belgrade, RS
Assignee:
Microsoft Corporation - Redmond WA
International Classification:
G06F 17/00
US Classification:
715/249, 715/243, 715/248, 715/239, 382/190
Abstract:
Computer-readable media, systems, and methods for document layout extraction are described. In embodiments, textual data in an electronic format is received and converted from the electronic format to an independent interface format, the independent interface format including coordinates of one or more structural elements of the textual data. Further, in embodiments, a structure and layout analysis of the textual data is performed to generate a set of structure and layout information. Still further, in embodiments, the textual data and the set of structure and layout information are stored in an enriched interface format, the enriched interface format providing for search and navigation of the textual data.
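
The flow described in the abstract (convert to a coordinate-bearing format, analyze layout, store an enriched form that supports search and navigation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `Word`, `EnrichedDocument`, `to_interface_format`, `analyze_layout`, `build_enriched`, `line_text`, and `search`, the choice of words as the only structural element, and the simple rule of grouping words into lines by vertical position. It is not the claimed method or any format defined in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """Hypothetical structural element: a word plus its bounding-box coordinates."""
    text: str
    x: float
    y: float
    width: float
    height: float


@dataclass
class EnrichedDocument:
    """Hypothetical enriched form: the words plus derived layout information."""
    words: list
    lines: list = field(default_factory=list)  # each line is a list of word indices


def to_interface_format(raw_words):
    """Convert (text, x, y, width, height) tuples into Word records."""
    return [Word(*raw) for raw in raw_words]


def analyze_layout(words, y_tolerance=2.0):
    """Group words into lines by vertical position (an assumed, simplistic rule)."""
    order = sorted(range(len(words)), key=lambda i: (words[i].y, words[i].x))
    lines = []
    for i in order:
        # Compare against the first word of the current line.
        if lines and abs(words[lines[-1][0]].y - words[i].y) <= y_tolerance:
            lines[-1].append(i)
        else:
            lines.append([i])
    # Order the words within each line from left to right.
    for line in lines:
        line.sort(key=lambda i: words[i].x)
    return lines


def build_enriched(raw_words):
    """Run conversion and layout analysis, then bundle the results together."""
    words = to_interface_format(raw_words)
    return EnrichedDocument(words=words, lines=analyze_layout(words))


def line_text(doc, line_number):
    """Navigation: return the text of one line in reading order."""
    return " ".join(doc.words[i].text for i in doc.lines[line_number])


def search(doc, term):
    """Search: return (line_number, Word) for each case-insensitive exact match."""
    term = term.lower()
    return [
        (n, doc.words[i])
        for n, line in enumerate(doc.lines)
        for i in line
        if doc.words[i].text.lower() == term
    ]
```

Under these assumptions, a caller would pass a list of `(text, x, y, width, height)` tuples to `build_enriched`, read lines back with `line_text`, and use the coordinates on each `search` hit to locate the match on the page.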