George Saon Phones & Addresses

  • Wilton, CT
  • Stamford, CT
  • Old Greenwich, CT
  • Yorktown Heights, NY
  • 10 Brookside Park, Greenwich, CT 06830 (203) 422-0570
  • Putnam Valley, NY
  • Yorktown Hts, NY
  • 10 Brookside Dr APT LD, Greenwich, CT 06830

Work

Company: IBM (Oct 1998) Position: Research Scientist

Education

Degree: High school graduate or higher

Industries

Information Technology And Services

Resumes

Research Scientist

Location:
New York, NY
Industry:
Information Technology And Services
Work:
IBM
Research Scientist

Publications

Us Patents

Methods And Apparatus For Performing Heteroscedastic Discriminant Analysis In Pattern Recognition Systems

US Patent:
6609093, Aug 19, 2003
Filed:
Jun 1, 2000
Appl. No.:
09/584871
Inventors:
Ramesh Ambat Gopinath - Millwood NY
Mukund Padmanabhan - White Plains NY
George Andrei Saon - Putnam Valley NY
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G10L 15/08
US Classification:
704236, 704243, 382190
Abstract:
The present invention provides a new approach to heteroscedastic linear discriminant analysis (HDA) by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions. Moreover, we present a link between discrimination and the likelihood of the projected samples and show that HDA can be viewed as a constrained maximum likelihood (ML) projection for a full covariance Gaussian model, the constraint being given by the maximization of the projected between-class scatter volume. The present invention also provides that, under diagonal covariance Gaussian modeling constraints, applying a diagonalizing linear transformation (e.g., MLLT, maximum likelihood linear transformation) to the HDA space results in an increased classification accuracy. In another embodiment, the heteroscedastic discriminant objective function assumes that models associated with the function have diagonal covariances, thereby resulting in a diagonal heteroscedastic discriminant objective function.
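The objective the abstract describes, projected between-class scatter weighed against each class's own projected covariance, can be sketched numerically. The following is a minimal illustration under common HDA formulations, not the patented implementation; `hda_objective` is a hypothetical helper name.

```python
import numpy as np

def hda_objective(theta, X, y):
    """Heteroscedastic discriminant objective (illustrative sketch).

    theta: (p, d) projection matrix, p < d.
    Scores theta by the log-determinant of the projected between-class
    scatter minus the log-determinant of each class's own projected
    covariance, weighted by class counts (classes keep their own
    covariances, hence "heteroscedastic").
    """
    classes = np.unique(y)
    mu = X.mean(axis=0)
    # between-class scatter: count-weighted outer products of mean offsets
    B = sum(np.sum(y == c) * np.outer(X[y == c].mean(0) - mu,
                                      X[y == c].mean(0) - mu)
            for c in classes)
    obj = 0.0
    for c in classes:
        Xc = X[y == c]
        Wc = np.cov(Xc.T, bias=True)   # class-specific covariance
        Nc = len(Xc)
        obj += Nc * (np.linalg.slogdet(theta @ B @ theta.T)[1]
                     - np.linalg.slogdet(theta @ Wc @ theta.T)[1])
    return obj
```

A projection maximizing this score could then be found by gradient ascent over `theta`; the patent's further step applies a diagonalizing transform (MLLT) in the resulting subspace.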

Lattice-Based Unsupervised Maximum Likelihood Linear Regression For Speaker Adaptation

US Patent:
7216077, May 8, 2007
Filed:
Sep 26, 2000
Appl. No.:
09/670251
Inventors:
Mukund Padmanabhan - White Plains NY, US
George A. Saon - Putnam Valley NY, US
Geoffrey G. Zweig - Greenwich CT, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G10L 15/06
G10L 15/14
US Classification:
704240, 704244, 704246
Abstract:
Methods and arrangements using lattice-based information for unsupervised speaker adaptation. By performing adaptation against a word lattice, correct models are more likely to be used in estimating a transform. Further, a particular type of lattice proposed herein enables the use of a natural confidence measure given by the posterior occupancy probability of a state, that is, the statistics of a particular state will be updated with the current frame only if the a posteriori probability of the state at that particular time is greater than a predetermined threshold.
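The confidence gate in the last sentence, updating a state's statistics only when its posterior occupancy exceeds a threshold, can be sketched as follows. This is an illustrative sketch, not the patented method; the lattice forward-backward pass that produces the `posteriors` array is assumed, not shown, and `accumulate_stats` is a hypothetical helper name.

```python
import numpy as np

def accumulate_stats(frames, posteriors, num_states, threshold=0.5):
    """Accumulate per-state adaptation statistics, skipping frames whose
    posterior state occupancy falls below a confidence threshold.

    frames: (T, d) acoustic feature vectors.
    posteriors: (T, num_states) occupancy probabilities from a lattice.
    Returns per-state weighted feature sums and occupancy counts, the
    raw material for estimating an adaptation transform.
    """
    d = frames.shape[1]
    sums = np.zeros((num_states, d))
    counts = np.zeros(num_states)
    for t, x in enumerate(frames):
        for s in range(num_states):
            gamma = posteriors[t, s]
            if gamma > threshold:       # only confident frames contribute
                sums[s] += gamma * x
                counts[s] += gamma
    return sums, counts
```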

Speech Recognition Utilizing Multitude Of Speech Features

US Patent:
7464031, Dec 9, 2008
Filed:
Nov 28, 2003
Appl. No.:
10/724536
Inventors:
Scott E. Axelrod - Mount Kisco NY, US
Sreeram Viswanath Balakrishnan - Los Altos CA, US
Stanley F. Chen - Yorktown Heights NY, US
Yuqing Gao - Mount Kisco NY, US
Ramesh A. Gopinath - Millwood NY, US
Benoit Maison - White Plains NY, US
David Nahamoo - White Plains NY, US
Michael Alan Picheny - White Plains NY, US
George A. Saon - Old Greenwich CT, US
Geoffrey G. Zweig - Ridgefield CT, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G10L 15/00
G10L 15/20
US Classification:
704236, 704240, 704251
Abstract:
In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
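The log-linear posterior model the abstract describes has a standard form: the posterior of a linguistic unit is an exponentiated weighted feature score, normalized over all units. A minimal sketch (function and variable names are illustrative, not from the patent):

```python
import numpy as np

def log_linear_posterior(features, weights):
    """Posterior over linguistic units under a log-linear model:
    p(u | x) = exp(w_u . f(x)) / sum_v exp(w_v . f(x)).

    features: (d,) feature vector f(x); the entries may come from
    asynchronous, overlapping, statistically non-independent streams,
    since the normalization does not require feature independence.
    weights: (num_units, d), one weight row per linguistic unit.
    """
    scores = weights @ features
    scores -= scores.max()          # subtract max for numerical stability
    p = np.exp(scores)
    return p / p.sum()
```

Because each feature only contributes through its weight, a feature seen in training but absent at recognition time can simply be zeroed out, consistent with the abstract's note that not all training features need appear in testing.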

Speech Recognition Utilizing Multitude Of Speech Features

US Patent:
20080312921, Dec 18, 2008
Filed:
Aug 20, 2008
Appl. No.:
12/195123
Inventors:
Scott E. Axelrod - Mount Kisco NY, US
Sreeram Viswanath Balakrishnan - Los Altos CA, US
Stanley F. Chen - Yorktown Heights NY, US
Yuqing Gao - Mount Kisco NY, US
Benoit Maison - White Plains NY, US
David Nahamoo - White Plains NY, US
Michael Alan Picheny - White Plains NY, US
George A. Saon - Old Greenwich CT, US
Geoffrey G. Zweig - Ridgefield CT, US
International Classification:
G10L 15/00
G10L 15/04
US Classification:
704240, 704251, 704E15001, 704E15004
Abstract:
In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

Methods And Apparatus For Forming Compound Words For Use In A Continuous Speech Recognition System

US Patent:
6385579, May 7, 2002
Filed:
Apr 29, 1999
Appl. No.:
09/302032
Inventors:
Mukund Padmanabhan - White Plains NY
George Andrei Saon - Putnam Valley NY
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G10L 15/06
US Classification:
704243, 704256
Abstract:
A method of forming an augmented textual training corpus with compound words for use with a speech recognition system includes computing a measure for a consecutive word pair in the training corpus. The measure is then compared to a threshold value. The consecutive word pair is replaced in the training corpus with a corresponding compound word depending on the result of the comparison between the measure and the threshold value. One or more measures may be employed. A first measure is an average of a direct bigram probability value and a reverse bigram probability value. A second measure is based on mutual information between the words in the pair. A third measure is based on a comparison of the number of times a co-articulated baseform for the pair is preferred over a concatenation of non-co-articulated individual baseforms of the words forming the pair. A fourth measure is based on a difference between an average phone recognition score for a particular compound word and a sum of respective average phone recognition scores of the words of the pair.
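The first measure, averaging the direct bigram probability P(w2|w1) with the reverse bigram probability P(w1|w2) and comparing against a threshold, can be sketched directly from counts. This is an unsmoothed toy version, not the patented implementation, and `compound_word_pairs` is a hypothetical name:

```python
from collections import Counter

def compound_word_pairs(corpus_tokens, threshold=0.5):
    """Find consecutive word pairs whose averaged direct/reverse bigram
    probability exceeds a threshold; such pairs are candidates for
    replacement by a single compound word in the training corpus.
    (Sketch only: maximum-likelihood estimates, no smoothing.)"""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    candidates = {}
    for (w1, w2), n12 in bigrams.items():
        direct = n12 / unigrams[w1]    # P(w2 | w1)
        reverse = n12 / unigrams[w2]   # P(w1 | w2)
        measure = 0.5 * (direct + reverse)
        if measure > threshold:
            candidates[(w1, w2)] = measure
    return candidates
```

A high score means each word of the pair strongly predicts the other in both directions, which is exactly the behavior expected of a lexicalized unit like "new_york".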

End To End Spoken Language Understanding Model

US Patent:
20220319494, Oct 6, 2022
Filed:
Mar 31, 2021
Appl. No.:
17/218618
Inventors:
- Armonk NY, US
George Andrei Saon - Stamford CT, US
Zoltan Tueske - White Plains NY, US
Brian E. D. Kingsbury - Cortlandt Manor NY, US
International Classification:
G10L 15/06
G06K 9/62
G10L 13/02
Abstract:
An approach to training an end-to-end spoken language understanding model may be provided. A pre-trained general automatic speech recognition model may be adapted to a domain-specific spoken language understanding model. The pre-trained general automatic speech recognition model may be a recurrent neural network transducer model. The adaptation may use transcription data annotated with spoken language understanding labels. During adaptation, audio data may also be provided in addition to verbatim transcripts annotated with spoken language understanding labels. The spoken language understanding labels may be entity and/or intent based, with values associated with each label.

Chunking And Overlap Decoding Strategy For Streaming Rnn Transducers For Speech Recognition

US Patent:
20220277734, Sep 1, 2022
Filed:
Feb 26, 2021
Appl. No.:
17/186167
Inventors:
- Armonk NY, US
George Andrei Saon - Stamford CT, US
International Classification:
G10L 15/16
G06N 3/04
G06N 3/08
Abstract:
A computer-implemented method is provided for improving recognition accuracy of digital speech. The method includes receiving the digital speech. The method further includes splitting the digital speech into overlapping chunks. The method also includes computing a bidirectional encoder embedding of each of the overlapping chunks to obtain bidirectional encoder embeddings. The method additionally includes combining the bidirectional encoder embeddings. The method further includes interpreting, by a speech recognition system, the digital speech using the combined bidirectional encoder embeddings.
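The chunk-split-encode-combine pipeline in the abstract can be sketched as below. This is an illustrative sketch under one plausible combination rule (averaging embeddings where chunks overlap); the patent does not specify this exact scheme, and `chunked_embeddings` is a hypothetical helper that takes any stand-in encoder function.

```python
import numpy as np

def chunked_embeddings(frames, encoder, chunk_len=8, overlap=2):
    """Overlap-chunked decoding sketch: split a frame sequence into
    overlapping chunks, run a (stand-in) bidirectional encoder on each
    chunk, then combine per-frame embeddings by averaging wherever
    chunks overlap. `encoder` maps (n, d) frames -> (n, e) embeddings.
    Requires overlap < chunk_len."""
    T, _ = frames.shape
    step = chunk_len - overlap
    emb_dim = encoder(frames[:1]).shape[1]
    out = np.zeros((T, emb_dim))
    weight = np.zeros(T)
    start = 0
    while start < T:
        end = min(start + chunk_len, T)
        emb = encoder(frames[start:end])   # bidirectional within the chunk
        out[start:end] += emb
        weight[start:end] += 1.0
        if end == T:
            break
        start += step
    return out / weight[:, None]           # average the overlapped regions
```

Chunking caps the lookahead at one chunk, which is what makes a bidirectional encoder usable in a streaming RNN transducer setting.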

Customization Of Recurrent Neural Network Transducers For Speech Recognition

US Patent:
20220208179, Jun 30, 2022
Filed:
Dec 29, 2020
Appl. No.:
17/136439
Inventors:
- Armonk NY, US
George Andrei Saon - Stamford CT, US
Brian E. D. Kingsbury - Cortlandt Manor NY, US
International Classification:
G10L 15/16
G10L 25/30
G10L 13/02
G06N 3/08
Abstract:
A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer-implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer-implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein the prediction network is updated using the synthesized second domain audio data and the second domain text data. The computer-implemented method further includes restoring the updated encoder to the initial condition.
George A Saon from Wilton, CT, age ~54