Luis Gravano from 771 End Ave, New York, NY 10025, age 58, Phone: (212) 932-8465

Method Of Building Multidimensional Workload-Aware Histograms

US Patent:

7007039, Feb 28, 2006

Filed:

Jun 14, 2001

Appl. No.:

09/881500

Inventors:

Surajit Chaudhuri - Redmond WA, US
Nicolas Bruno - New York NY, US
Luis Gravano - New York NY, US

Assignee:

Microsoft Corporation - Redmond WA

International Classification:

G06F 17/30

US Classification:

707/200, 707/5

Abstract:

In a database system, a method of maintaining a self-tuning histogram having a plurality of existing rectangular buckets arranged in a hierarchical manner, each defined by at least two bucket boundaries, a bucket volume, and a bucket frequency. At least one new bucket is created in response to a query on the database. Each new bucket is contained within at least one existing bucket; the new bucket becomes a child bucket and the existing bucket containing it becomes a parent bucket. The boundaries of each new bucket correspond to a region of the database accessed by the query, and the frequency of the new bucket is the number of data records returned by the query. Buckets may be merged based on a merge criterion, such as similar bucket density, when the total number of buckets exceeds a predetermined budget.
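The bucket hierarchy described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Bucket`, `add_query_bucket`, `count_buckets`, and `merge_candidate` are hypothetical, the new bucket is simply attached to the deepest existing bucket that fully contains the query region, and the merge step only picks the child whose density is closest to its parent's as a stand-in for a "similar bucket density" criterion.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Bucket:
    lo: tuple          # lower corner of the rectangular region
    hi: tuple          # upper corner of the rectangular region
    frequency: float   # number of records attributed to this bucket
    children: list = field(default_factory=list)

    def volume(self):
        return prod(h - l for l, h in zip(self.lo, self.hi))

    def density(self):
        return self.frequency / self.volume()

    def contains(self, lo, hi):
        # True when the box [lo, hi] lies entirely inside this bucket.
        return all(a <= c and d <= b
                   for a, b, c, d in zip(self.lo, self.hi, lo, hi))


def add_query_bucket(root, lo, hi, returned_rows):
    """Create a child bucket for a query region (assumed simplification)."""
    if not root.contains(lo, hi):
        raise ValueError("query region must lie inside the root bucket")
    parent = root
    while True:
        # Descend while some child still fully contains the query region.
        deeper = next((c for c in parent.children if c.contains(lo, hi)), None)
        if deeper is None:
            break
        parent = deeper
    child = Bucket(lo, hi, returned_rows)
    parent.children.append(child)
    return parent, child


def count_buckets(bucket):
    return 1 + sum(count_buckets(c) for c in bucket.children)


def merge_candidate(parent):
    """Child whose density is closest to the parent's (assumed criterion)."""
    return min(parent.children,
               key=lambda c: abs(c.density() - parent.density()))
```

A caller would compare `count_buckets(root)` against its bucket budget after each insertion and, when the budget is exceeded, fold the bucket returned by `merge_candidate` back into its parent. How frequencies are redistributed on a merge is left out here because the abstract does not specify it.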

Systems And Methods For Using Anchor Text As Parallel Corpora For Cross-Language Information Retrieval

US Patent:

7146358, Dec 5, 2006

Filed:

Aug 28, 2001

Appl. No.:

09/939661

Inventors:

Luis Gravano - New York NY, US
Monika H. Henzinger - Menlo Park CA, US

Assignee:

Google Inc. - Mountain View CA

International Classification:

G06F 17/30
G06F 7/00

US Classification:

707/4, 707/5

Abstract:

A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.

String Predicate Selectivity Estimation

US Patent:

7149735, Dec 12, 2006

Filed:

Jun 24, 2003

Appl. No.:

10/603035

Inventors:

Surajit Chaudhuri - Redmond WA, US
Venkatesh Ganti - Redmond WA, US
Luis Gravano - New York NY, US

Assignee:

Microsoft Corporation - Redmond WA

International Classification:

G06F 17/30

US Classification:

707/6

Abstract:

A method of estimating the selectivity of a given string predicate in a database query. In the method, selectivities of substrings of various lengths are estimated. For example, the selectivity of substrings from length 1 (or some constant q) up to the length of the given string predicate may be estimated. The method then selects a candidate substring for each substring length based on the estimated selectivities of the substrings. The estimated selectivities of the candidate substrings are combined. The combined estimated selectivity of the candidate substrings is returned as the estimated selectivity of the given string predicate.
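The per-length candidate selection can be sketched as follows. This is an illustration under stated assumptions, not the patented method: `estimate_selectivity` counts matches exactly over a small in-memory list where a real system would consult summary statistics, the candidate for each length is taken to be the least selective-looking (lowest-fraction) substring, and the combination rule (the minimum over candidates) is a placeholder, since the abstract does not say how the estimates are combined. All function names are hypothetical, and the pattern is assumed to be non-empty with `q` no larger than its length.

```python
def substrings(s, length):
    """All contiguous substrings of `s` with the given length."""
    return [s[i:i + length] for i in range(len(s) - length + 1)]


def estimate_selectivity(sub, rows):
    # Stand-in estimator: exact fraction of rows that contain `sub`.
    return sum(sub in r for r in rows) / len(rows)


def candidate_per_length(pattern, rows, q=1):
    """Map each length in q..len(pattern) to (selectivity, substring)."""
    candidates = {}
    for length in range(q, len(pattern) + 1):
        scored = [(estimate_selectivity(s, rows), s)
                  for s in substrings(pattern, length)]
        candidates[length] = min(scored)  # lowest estimated selectivity
    return candidates


def combined_selectivity(pattern, rows, q=1):
    # Assumed combination rule: the smallest candidate estimate.
    cands = candidate_per_length(pattern, rows, q)
    return min(sel for sel, _ in cands.values())
```

Taking the minimum is defensible only as a rough upper bound when the per-substring estimates are accurate, because every row matching the full pattern also matches each of its substrings.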

Systems And Methods For Using Anchor Text As Parallel Corpora For Cross-Language Information Retrieval

US Patent:

7814103, Oct 12, 2010

Filed:

Aug 30, 2006

Appl. No.:

11/468674

Inventors:

Luis Gravano - New York NY, US
Monika H. Henzinger - Corseaux, CH

Assignee:

Google Inc. - Mountain View CA

International Classification:

G06F 17/30

US Classification:

707/736, 707/760

Abstract:

A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.

Systems And Methods For Using Anchor Text As Parallel Corpora For Cross-Language Information Retrieval

US Patent:

8190608, May 29, 2012

Filed:

Jun 30, 2011

Appl. No.:

13/174209

Inventors:

Luis Gravano - New York NY, US
Monika H. Henzinger - Menlo Park CA, US

Assignee:

Google Inc. - Mountain View CA

International Classification:

G06F 17/30

US Classification:

707/736, 707/760

Abstract:

A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.

Systems And Methods For Using Anchor Text As Parallel Corpora For Cross-Language Information Retrieval

US Patent:

8631010, Jan 14, 2014

Filed:

May 18, 2012

Appl. No.:

13/474957

Inventors:

Luis Gravano - New York NY, US
Monika H. Henzinger - Menlo Park CA, US

Assignee:

Google Inc. - Mountain View CA

International Classification:

G06F 17/30

US Classification:

707/736, 707/760

Abstract:

A method may include obtaining, based on a content of a search query, one or more documents in a first language; identifying one or more documents in a second language that contain an anchor that links to the one or more documents in the first language, the second language being different than the first language; and translating one or more terms of the search query into the second language using content included in the one or more documents in the second language.
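The three steps of this method can be sketched over a toy in-memory corpus. Everything here is assumed for illustration: the `DOCS` collection, the `CANDIDATES` translation dictionary, and the rule of choosing the candidate translation that appears most often in the anchor text of the linking documents. None of it is taken from the patent itself.

```python
from collections import Counter

# Hypothetical corpus: each document has a language, body text, and a list of
# (anchor text, target document id) pairs.
DOCS = {
    "en1": {"lang": "en", "text": "bank of the river fishing guide",
            "anchors": []},
    "es1": {"lang": "es", "text": "guia de pesca",
            "anchors": [("orilla del rio", "en1")]},
    "es2": {"lang": "es", "text": "finanzas",
            "anchors": [("banco central", "en9")]},
}

# Hypothetical dictionary of possible translations per query term.
CANDIDATES = {"bank": ["banco", "orilla"], "river": ["rio"]}


def matching_docs(query, lang):
    """Step 1: documents in `lang` whose text contains every query term."""
    terms = query.lower().split()
    return {doc_id for doc_id, doc in DOCS.items()
            if doc["lang"] == lang
            and all(t in doc["text"].split() for t in terms)}


def linking_docs(targets, lang):
    """Step 2: documents in `lang` with an anchor pointing at a target."""
    return {doc_id for doc_id, doc in DOCS.items()
            if doc["lang"] == lang
            and any(target in targets for _, target in doc["anchors"])}


def translate(query, src, dst):
    """Step 3: choose translations supported by the linking anchor text."""
    targets = matching_docs(query, src)
    counts = Counter()
    for doc_id in linking_docs(targets, dst):
        for text, target in DOCS[doc_id]["anchors"]:
            if target in targets:
                counts.update(text.lower().split())
    return [max(CANDIDATES.get(term, [term]), key=lambda w: counts[w])
            for term in query.lower().split()]
```

In this toy corpus, the anchor text that links to the matching English page is what separates the riverbank sense of "bank" from the financial one. Terms with no dictionary entry are passed through unchanged.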

Systems And Methods For Using Anchor Text As Parallel Corpora For Cross-Language Information Retrieval

US Patent:

7996402, Aug 9, 2011

Filed:

Aug 31, 2010

Appl. No.:

12/872755

Inventors:

Luis Gravano - New York NY, US
Monika H. Henzinger - Menlo Park CA, US

Assignee:

Google Inc. - Mountain View CA

International Classification:

G06F 17/30

US Classification:

707/736, 707/760

Abstract:

A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.

Text Joins For Data Cleansing And Integration In A Relational Database Management System

US Patent:

20050027717, Feb 3, 2005

Filed:

Apr 21, 2004

Appl. No.:

10/828819

Inventors:

Nikolaos Koudas - New York NY, US
Divesh Srivastava - Summit NJ, US
Luis Gravano - New York NY, US

International Classification:

G06F 17/30

US Classification:

707100000

Abstract:

An organization's data records are often noisy because of transcription errors, incomplete information, and a lack of standard formats for textual data. A fundamental task during data cleansing and integration is matching strings, perhaps across multiple relations, that refer to the same entity (e.g., organization name or address). Furthermore, it is desirable to perform this matching within an RDBMS, which is where the data is likely to reside. In this paper, we adapt the widely used and established cosine similarity metric from the information retrieval field to the relational database context in order to identify potential string matches across relations. We then use this similarity metric to characterize this key aspect of data cleansing and integration as a join between relations on textual attributes, where the similarity of matches exceeds a specified threshold. Computing an exact answer to the text join can be expensive. For query processing efficiency, we propose an approximate, sampling-based approach to the join problem that can be easily and efficiently executed in a standard, unmodified RDBMS. Therefore, the present invention includes a system for string matching across multiple relations in a relational database management system comprising generating a set of strings from a set of characters, decomposing each string into a subset of tokens, establishing at least two relations within the strings, establishing a similarity threshold for the relations, sampling the at least two relations, correlating the relations for the similarity threshold, and returning all of the tokens that meet the criteria of the similarity threshold.
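The thresholded text join can be illustrated with a small Python sketch that computes the exact join in memory. It does not reproduce the sampling-based approximation or the in-RDBMS execution described in the abstract. The tokenization into character q-grams, the `log(1 + n/df)` weighting, and the function names `qgrams`, `tfidf_vectors`, and `text_join` are all assumptions made for this example.

```python
import math
from collections import Counter


def qgrams(s, q=3):
    """Lower-cased character q-grams; short strings yield one token."""
    s = s.lower()
    return [s[i:i + q] for i in range(max(len(s) - q + 1, 1))]


def tfidf_vectors(strings, q=3):
    """Unit-length tf-idf vectors, one dict of token weights per string."""
    docs = [Counter(qgrams(s, q)) for s in strings]
    df = Counter(tok for d in docs for tok in d)  # document frequency
    n = len(docs)
    vectors = []
    for d in docs:
        v = {t: tf * math.log(1 + n / df[t]) for t, tf in d.items()}
        norm = math.sqrt(sum(w * w for w in v.values()))
        vectors.append({t: w / norm for t, w in v.items()})
    return vectors


def text_join(left, right, threshold, q=3):
    """Pairs (l, r, similarity) whose cosine similarity meets the threshold."""
    vecs = tfidf_vectors(left + right, q)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    out = []
    for i, a in enumerate(lv):
        for j, b in enumerate(rv):
            # Vectors are normalized, so the dot product is the cosine.
            sim = sum(w * b.get(t, 0.0) for t, w in a.items())
            if sim >= threshold:
                out.append((left[i], right[j], sim))
    return out
```

The nested loop compares every pair, which is the expensive exact computation the abstract aims to avoid. A sampling-based variant would estimate the dot products from a subset of tokens instead.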
