Inventors:
Jill Carrier - Dorchester MA, US
Alwin B. Carus - Waban MA, US
William F. Cote - Carlisle MA, US
John Dowd - Sudbury MA, US
Kathryn Del La Femina - Ashland MA, US
Alan Frankel - Framingham MA, US
Larissa Lapshina - Shirley MA, US
Bernardo Rechea - Belmont MA, US
Ana Santisteban - Somerville MA, US
Amy J. Uhrbach - Needham MA, US
Assignee:
Dictaphone Corporation - Stratford CT
International Classification:
G06F 17/27
G06F 17/20
Abstract:
The present invention pertains to a system and method for the tokenization of text. A tokenizer may include a featurizer configured to receive input text and convert the input text into tokens. According to one aspect of the invention, each token may include only one type of character, the characters selected from the group consisting of letters, numbers, and punctuation. The tokenizer may also include a classifier. The classifier may be configured to receive the tokens from the featurizer and to analyze them, using a preclassifier, to determine whether they may be input into a predetermined classification model. If one of the tokens passes the preclassifier, then the token is classified using the predetermined classification model. Additionally, according to a first aspect of the invention, the tokenizer may also include a finalizer. The finalizer may be configured to receive the tokens and to produce a final output.
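The three stages named in the abstract (featurizer, classifier with preclassifier, finalizer) can be illustrated with a minimal sketch. It is not the patented implementation. Everything specific in it is an assumption made for illustration: the names `Token`, `featurize`, `passes_preclassifier`, `toy_model`, `classify`, `finalize`, and `tokenize`; the restriction to ASCII letters and digits; the dropping of whitespace; the rule that only period tokens are sent to the model; and the toy rule standing in for the predetermined classification model.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: a token is a run of ASCII letters, a run of ASCII digits,
# or one punctuation character. Whitespace is dropped.
_TOKEN_RE = re.compile(
    r"(?P<letter>[A-Za-z]+)|(?P<number>[0-9]+)|(?P<punctuation>[^A-Za-z0-9\s])"
)


@dataclass
class Token:
    text: str
    kind: str                    # "letter", "number", or "punctuation"
    start: int                   # offset of the token in the input text
    label: Optional[str] = None  # filled in by the classifier, if at all


def featurize(text: str) -> List[Token]:
    """Split text into tokens that each hold a single character type."""
    return [Token(m.group(), m.lastgroup, m.start())
            for m in _TOKEN_RE.finditer(text)]


def passes_preclassifier(token: Token) -> bool:
    """Hypothetical gate: only period tokens are sent to the model."""
    return token.text == "."


def toy_model(tokens: List[Token], i: int) -> str:
    """Toy stand-in for the classification model (an assumption).

    A period directly between two number tokens is labeled a decimal
    point; any other period is labeled a sentence end.
    """
    if 0 < i < len(tokens) - 1:
        prev, cur, nxt = tokens[i - 1], tokens[i], tokens[i + 1]
        touching = (prev.start + len(prev.text) == cur.start
                    and cur.start + 1 == nxt.start)
        if prev.kind == "number" and nxt.kind == "number" and touching:
            return "decimal_point"
    return "sentence_end"


def classify(tokens: List[Token]) -> List[Token]:
    """Label only the tokens that pass the preclassifier."""
    for i, token in enumerate(tokens):
        if passes_preclassifier(token):
            token.label = toy_model(tokens, i)
    return tokens


def finalize(tokens: List[Token]) -> List[str]:
    """Produce the final output, rejoining numbers around decimal points."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.label == "decimal_point" and out and i + 1 < len(tokens):
            out[-1] = out[-1] + token.text + tokens[i + 1].text
            i += 2
        else:
            out.append(token.text)
            i += 1
    return out


def tokenize(text: str) -> List[str]:
    return finalize(classify(featurize(text)))


print(tokenize("Dose is 2.5 mg."))
```

In this sketch the featurizer deliberately over-splits, so "2.5" first becomes three tokens. The preclassifier keeps the model from being consulted for tokens it has no decision to make about, and the finalizer uses the resulting labels to reassemble the output.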