US Patent:
20120191669, Jul 26, 2012
Inventors:
Jon Christopher Kennedy - Marlborough MA, US
Ronald Ray Trimble - Acton MA, US
Carey Jay McMaster - Stow MA, US
John Henry Petrangelo - Shrewsbury MA, US
Roland Leo Sorel - Westford MA, US
Patrick James Grinwald - Grafton MA, US
Assignee:
Sepaton, Inc. - Marlborough MA
International Classification:
G06F 17/30
Abstract:
Described are computer-based methods and apparatuses, including computer program products, for detecting and deduplicating backup sets that exhibit poor locality. A first set of summaries of a first data set is determined, each summary in the first set being indicative of a data pattern in the first data set. A second set of summaries of a second data set is determined, each summary in the second set being indicative of a data pattern in the second data set. A set of comparison metrics is calculated, each comparison metric being based on a first subset of summaries from the first set and a second subset of summaries from the second set. A locality metric, indicative of whether the first data set and the second data set exhibit poor locality, is then calculated from the set of comparison metrics.
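
The steps in the abstract can be illustrated with a minimal sketch. The abstract does not say how summaries, subsets, or metrics are computed, so everything concrete here is an assumption made for illustration: summaries are SHA-256 digests of fixed-size chunks, subsets are aligned fixed-size windows of summaries, each comparison metric is the Jaccard overlap of two windows, and the locality metric is the mean of those overlaps. The function names, chunk size, and window size are hypothetical and are not taken from the patent.

```python
import hashlib
from typing import List, Sequence


def summarize(data: bytes, chunk_size: int = 4) -> List[str]:
    """Return one summary per fixed-size chunk (assumed: SHA-256 hex digest)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Overlap of two summary subsets: |intersection| / |union|."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def comparison_metrics(
    first: Sequence[str], second: Sequence[str], window: int = 4
) -> List[float]:
    """Compare aligned windows (subsets) of summaries from the two sets."""
    n = min(len(first), len(second))
    return [
        jaccard(first[i:i + window], second[i:i + window])
        for i in range(0, n, window)
    ]


def locality_metric(metrics: Sequence[float]) -> float:
    """Assumed aggregate: mean window overlap; lower values suggest poorer locality."""
    return sum(metrics) / len(metrics) if metrics else 0.0


# Hypothetical data: the second set holds the same chunks in interleaved order.
first_data = b"AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH"
second_data = b"AAAAEEEEBBBBFFFFCCCCGGGGDDDDHHHH"

same = locality_metric(
    comparison_metrics(summarize(first_data), summarize(first_data))
)
scattered = locality_metric(
    comparison_metrics(summarize(first_data), summarize(second_data))
)
print(same, scattered)
```

In this sketch the two data sets share every chunk, yet the aligned windows overlap only partially, so the assumed locality metric drops well below the value for identical ordering. A real system would also need to distinguish scattered-but-shared data from data that is simply different, which this simplification does not attempt.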