Inventors:
- Mountain View CA, US
Weiqin Ma - San Jose CA, US
Weidong Zhang - San Jose CA, US
Liwen Zhang - San Carlos CA, US
Srihari R. Duddukuru - San Jose CA, US
SangHyun Park - Fremont CA, US
Yongzheng Zhang - San Jose CA, US
Yi Zheng - Cupertino CA, US
Hong Lu - Fremont CA, US
Yurong Shi - San Jose CA, US
Chi-Yi Kuan - Fremont CA, US
Assignee:
LinkedIn Corporation - Mountain View CA
International Classification:
G06F 17/30
G06F 17/24
Abstract:
The disclosed embodiments provide a system for processing data. During operation, the system obtains a first configuration for processing a first set of content items from a first data source and a second configuration for processing a second set of content items from a second data source. For each content item in the first set of content items, the system uses mappings from the first configuration to transform original fields from the content item into required fields in a record representing the content item. Next, the system generates, from the required fields, a document key for the content item. The system also performs deduplication of multiple records with the document key and stores a single record with the document key. Finally, the system uses the second configuration to generate, from the second set of content items, a set of records independently of processing the first set of content items.
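The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the configuration layout, the field names, the source names (`source_a`, `source_b`), and the use of SHA-256 over the required fields as the document key are all assumptions made for illustration.

```python
import hashlib
import json

# Hypothetical per-source configurations: each maps original field names
# to the required field names of the normalized record.
CONFIGS = {
    "source_a": {"mappings": {"headline": "title", "body_text": "text", "link": "url"}},
    "source_b": {"mappings": {"subject": "title", "comment": "text", "permalink": "url"}},
}


def transform(item, config):
    """Copy each mapped original field into its required field."""
    return {
        required: item.get(original)
        for original, required in config["mappings"].items()
    }


def document_key(record):
    """Derive a deterministic key from the required fields.

    Hashing a canonical JSON form with SHA-256 is an assumption; the
    abstract only says the key is generated from the required fields.
    """
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def process(items, config):
    """Transform items and keep a single record per document key."""
    store = {}
    for item in items:
        record = transform(item, config)
        # setdefault keeps the first record seen for a key, so later
        # duplicates with the same document key are dropped.
        store.setdefault(document_key(record), record)
    return store


# Each source is processed with its own configuration, independently.
items_a = [
    {"headline": "Hi", "body_text": "x", "link": "u1"},
    {"headline": "Hi", "body_text": "x", "link": "u1"},  # duplicate
]
items_b = [{"subject": "Hi", "comment": "y", "permalink": "u2"}]

records_a = process(items_a, CONFIGS["source_a"])
records_b = process(items_b, CONFIGS["source_b"])
```

Because each call to `process` receives only its own configuration and items, adding a new data source in this sketch means adding a configuration entry rather than changing the processing code.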