Inventors:
Minos N. Garofalakis - Chatham Township NJ, US
Aristides Gionis - Stanford CA, US
Rajeev Rastogi - New Providence NJ, US
Srinivasan Seshadri - Basking Ridge NJ, US
Kyuseok Shim - Bedminster NJ, US
Assignee:
Lucent Technologies Inc. - Murray Hill NJ
International Classification:
G06F 15/00
Abstract:
The present invention discloses a document descriptor extraction method and system. The method and system create a document descriptor by generalizing input sequences within a document; factoring the input sequences and the generalized sequences; and selecting a document descriptor from the input sequences, generalized sequences, and factored sequences, preferably using minimum description length (MDL) principles. Novel algorithms are employed to perform the generalizing, factoring, and selecting.
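
The selection step named in the abstract can be illustrated with a minimal sketch. It is not the patented algorithm: the cost model (descriptor length in characters, plus one unit per sequence encoded through a generalized pattern), the brute-force search over subsets, and the names `encoding_cost`, `total_cost`, and `select_descriptor` are all assumptions made for illustration. Candidates are written as Python regular expressions, and the generalized candidate `(ab)*` is supplied by hand rather than produced by a generalizing step.

```python
import re
from itertools import combinations


def encoding_cost(candidate, seq):
    """Toy cost of writing `seq` given `candidate`; None if it does not match.

    Assumption: a literal candidate encodes its own sequence for free, while a
    generalized candidate needs one extra unit (e.g. a repetition count).
    """
    if re.fullmatch(candidate, seq) is None:
        return None
    return 0 if candidate == seq else 1


def total_cost(subset, sequences):
    """MDL-style cost: size of the descriptor plus size of the data given it."""
    cost = sum(len(c) for c in subset)  # descriptor size in characters
    for seq in sequences:
        costs = [
            c for c in (encoding_cost(cand, seq) for cand in subset)
            if c is not None
        ]
        if not costs:
            return None  # this subset cannot describe every input sequence
        cost += min(costs)  # encode each sequence with its cheapest candidate
    return cost


def select_descriptor(candidates, sequences):
    """Brute-force search for the cheapest covering subset (small inputs only)."""
    best = None
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            cost = total_cost(subset, sequences)
            if cost is not None and (best is None or cost < best[0]):
                best = (cost, subset)
    return best


sequences = ["ab", "abab", "ababab"]
candidates = sequences + ["(ab)*"]  # input sequences plus one generalization
print(select_descriptor(candidates, sequences))
```

Under this toy cost model, keeping the three literal sequences costs 12 units, while the single generalized pattern costs 5 units plus 1 unit per sequence, so the generalized pattern is selected. Exhaustive search is exponential in the number of candidates and is used here only to keep the sketch short.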