Nitin Garegrat Phones & Addresses

  • San Jose, CA
  • Chandler, AZ
  • Ann Arbor, MI

Work

Company: Intel, since Mar 2010. Position: Senior RTL Design Engineer

Education

Degree: Master's. School: University of Michigan, 2008 to 2009. Specialty: Computer Science and Engineering

Skills

SystemVerilog • Verilog • ASIC • SoC • RTL Design • Microarchitecture • Cadence Virtuoso • Computer Architecture • Project Management • Architecture • Open Verification Methodology • Circuit Design • Program Management • Perl • Scrum • System on a Chip • Very Large Scale Integration • Application-Specific Integrated Circuits • Processors

Industries

Semiconductors

Public records

Vehicle Records

Nitin Garegrat

Address:
900 N Rural Rd APT 2051, Chandler, AZ 85226
VIN:
1N4AL2AP9AN469035
Make:
NISSAN
Model:
ALTIMA
Year:
2010

Resumes

Senior Machine Learning Architect

Location:
900 N Rural Rd, Chandler, AZ 85226
Industry:
Semiconductors
Work:
Intel, since Mar 2010
Senior RTL Design Engineer

University of Michigan, Ann Arbor Sep 2009 - Dec 2009
Graduate Student Instructor (EECS 478-Logic Synthesis & Optimization)

University of Michigan, Ann Arbor May 2009 - Sep 2009
Graduate Student Research Assistant

University of Michigan 2008 - 2009
Student
Education:
University of Michigan 2008 - 2009
Masters, Computer Science and Engineering
University of Mumbai 2004 - 2008
BE, Electronics

Publications

Us Patents

Compressed Matrix With Sparsity Metadata

US Patent:
20220222319, Jul 14, 2022
Filed:
Jan 14, 2021
Appl. No.:
17/149643
Inventors:
- Redmond WA, US
Nitin Naresh GAREGRAT - San Jose CA, US
Assignee:
Microsoft Technology Licensing, LLC - Redmond WA
International Classification:
G06F 17/16
Abstract:
A computing device is provided, including one or more processing devices configured to receive a first matrix including a plurality of first matrix elements arranged in a plurality of submatrices. The one or more processing devices may be further configured to generate first matrix sparsity metadata indicating one or more zero submatrices and one or more nonzero submatrices of the plurality of submatrices. Each of the first matrix elements included in the one or more zero submatrices may be equal to zero. The one or more processing devices may be further configured to store, in memory, a compressed first matrix including the first matrix sparsity metadata and the one or more nonzero submatrices and not including the one or more zero submatrices.
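
As a rough illustration of the compression scheme this abstract describes, the sketch below splits a matrix into fixed-size submatrices, records per-submatrix sparsity metadata, and stores only the nonzero submatrices. The block size, list-based layout, and function names are illustrative assumptions, not details taken from the patent.

```python
def compress_block_sparse(matrix, block=2):
    """Compress a dense matrix (list of lists) into sparsity metadata
    (one flag per submatrix, row-major block order) plus the nonzero
    submatrices; all-zero submatrices are not stored."""
    rows, cols = len(matrix), len(matrix[0])
    meta, kept = [], []
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            sub = [row[j:j + block] for row in matrix[i:i + block]]
            nonzero = any(v != 0 for r in sub for v in r)
            meta.append(nonzero)
            if nonzero:
                kept.append(sub)
    return meta, kept

def decompress_block_sparse(meta, kept, rows, cols, block=2):
    """Rebuild the dense matrix from the metadata and the kept blocks."""
    out = [[0] * cols for _ in range(rows)]
    it = iter(kept)
    idx = 0
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            if meta[idx]:
                sub = next(it)
                for di in range(block):
                    for dj in range(block):
                        out[i + di][j + dj] = sub[di][dj]
            idx += 1
    return out
```

Round-tripping a matrix through the two functions recovers the original, while the compressed form carries only the nonzero blocks plus one flag per submatrix.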

Computing Dot Products At Hardware Accelerator

US Patent:
20220222575, Jul 14, 2022
Filed:
Jan 14, 2021
Appl. No.:
17/149602
Inventors:
- Redmond WA, US
Nitin Naresh GAREGRAT - San Jose CA, US
Viraj Sunil KHADYE - Sunnyvale CA, US
Yuxuan ZHANG - San Jose CA, US
Assignee:
Microsoft Technology Licensing, LLC - Redmond WA
International Classification:
G06N 20/00
Abstract:
A computing device, including a hardware accelerator configured to train a machine learning model by computing a first product matrix including a plurality of first dot products. Computing the first product matrix may include receiving a first matrix including a plurality of first vectors and a second matrix including a plurality of second vectors. Each first vector may include a first shared exponent and a plurality of first vector elements. Each second vector may include a second shared exponent and a plurality of second vector elements. For each first vector, computing the first product matrix may further include computing the first dot product of the first vector and a second vector. The first dot product may include a first dot product exponent, a first dot product sign, and a first dot product mantissa. Training the machine learning model may further include storing the first product matrix in memory.
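
The shared-exponent vectors in this abstract are essentially block floating point: a vector is one power-of-two exponent plus integer mantissas, so a dot product reduces to an integer multiply-accumulate scaled once by the summed exponents. Here is a minimal sketch under that assumption; the integer-mantissa encoding and function names are illustrative, not the patent's actual format.

```python
def bfp_dot(exp_a, mant_a, exp_b, mant_b):
    """Dot product of two block-floating-point vectors, each given as a
    shared exponent plus a list of integer mantissas. Returns the
    (sign, exponent, mantissa) triple described in the abstract."""
    # Integer dot product of the mantissas...
    acc = sum(a * b for a, b in zip(mant_a, mant_b))
    # ...scaled once by the sum of the two shared exponents.
    sign = -1 if acc < 0 else 1
    return sign, exp_a + exp_b, abs(acc)

def bfp_value(sign, exp, mant):
    """Decode a (sign, exponent, mantissa) triple to a float."""
    return sign * mant * (2.0 ** exp)
```

For example, the vectors [0.25, 0.5, 0.75] and [2.0, 2.5, 3.0] encode as (-2, [1, 2, 3]) and (-1, [4, 5, 6]), and the sketch yields their exact dot product, 4.0.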

Software Assisted Power Management

US Patent:
20190384370, Dec 19, 2019
Filed:
Aug 30, 2019
Appl. No.:
16/557657
Inventors:
Jason Seung-Min Kim - San Jose CA, US
Sundar Ramani - Santa Clara CA, US
Yogesh Bansal - Beaverton OR, US
Nitin N. Garegrat - San Jose CA, US
Olivia K. Wu - Los Altos CA, US
Mayank Kaushik - San Jose CA, US
Mrinal Iyer - Menlo Park CA, US
Tom Schebye - San Carlos CA, US
Andrew Yang - Cupertino CA, US
International Classification:
G06F 1/324
G06F 1/08
G06F 1/28
G06F 1/12
G06F 9/28
G06F 9/30
Abstract:
Embodiments include an apparatus comprising an execution unit coupled to a memory, a microcode controller, and a hardware controller. The microcode controller is to identify a global power and performance hint in an instruction stream that includes first and second instruction phases to be executed in parallel, identify a first local hint based on synchronization dependence in the first instruction phase, and use the first local hint to balance power consumption between the execution unit and the memory during parallel executions of the first and second instruction phases. The hardware controller is to use the global hint to determine an appropriate voltage level of a compute voltage and a frequency of a compute clock signal for the execution unit during the parallel executions of the first and second instruction phases. The first local hint includes a processing rate for the first instruction phase or an indication of the processing rate.

System To Perform Unary Functions Using Range-Specific Coefficient Sets

US Patent:
20190384575, Dec 19, 2019
Filed:
Aug 30, 2019
Appl. No.:
16/557959
Inventors:
- Santa Clara CA, US
Nitin N. Garegrat - San Jose CA, US
Maciej Urbanski - Gdansk, PL
Michael Rotzin - Santa Clara CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 7/552
G06F 7/499
G06F 7/483
Abstract:
A method comprising storing a plurality of entries, each entry of the plurality of entries associated with a portion of a range of input values, each entry of the plurality of entries comprising a set of coefficients defining a power series approximation; selecting a first entry of the plurality of entries based on a determination that a floating point input value is within a portion of the range of input values that is associated with the first entry; and calculating an output value by evaluating the power series approximation defined by the set of coefficients of the first entry at the floating point input value.
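
The method amounts to a table of range-specific polynomial coefficients plus a range lookup and a polynomial evaluation. The sketch below shows the idea for a unary function, using degree-2 Taylor expansions of exp(x) as the per-range coefficient sets; the ranges, degree, and the choice of exp are illustrative assumptions, not the patent's actual tables.

```python
import math

# One entry per sub-range of the input: (range, expansion point).
# The coefficients for each entry are the power series around that point.
TABLE = [
    ((0.0, 0.5), 0.25),
    ((0.5, 1.0), 0.75),
]

def coeffs_exp(center, degree=2):
    """Taylor coefficients of exp(x) around `center`."""
    return [math.exp(center) / math.factorial(k) for k in range(degree + 1)]

def approx_exp(x):
    """Select the entry whose range contains x, then evaluate that
    entry's power series at x via Horner's rule."""
    for (lo, hi), center in TABLE:
        if lo <= x < hi:
            acc = 0.0
            for c in reversed(coeffs_exp(center)):
                acc = acc * (x - center) + c
            return acc
    raise ValueError("x outside table range")
```

Because each polynomial only has to cover a narrow sub-range, a low degree already gives good accuracy, which is what makes this attractive in hardware.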

Proactive Di/Dt Voltage Droop Mitigation

US Patent:
20190384603, Dec 19, 2019
Filed:
Aug 30, 2019
Appl. No.:
16/557187
Inventors:
Jason Seung-Min Kim - San Jose CA, US
Nitin N. Garegrat - San Jose CA, US
Anitha Loke - Santa Clara CA, US
Nasima Parveen - San Jose CA, US
David Y. Fang - San Jose CA, US
Kursad Kiziloglu - Pleasanton CA, US
Dmitry Sergeyevich Lukiyanchenko - Beaverton OR, US
Fabrice Paillet - Portland OR, US
Andrew Yang - Cupertino CA, US
International Classification:
G06F 9/30
Abstract:
Embodiments include a method comprising identifying, by an instruction scheduler of a processor core, a first high power instruction in an instruction stream to be executed by an execution unit of the processor core. A pre-charge signal is asserted indicating that the first high power instruction is scheduled for execution. Subsequent to the pre-charge signal being asserted, a voltage boost signal is asserted to cause a supply voltage for the execution unit to be increased. A busy signal indicating that the first high power instruction is executing is received from the execution unit. Based at least in part on the busy signal being asserted, the voltage boost signal is de-asserted. More specific embodiments include decreasing the supply voltage for the execution unit subsequent to the de-asserting of the voltage boost signal. Further embodiments include delaying asserting the voltage boost signal based on a start delay time.
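
The signal sequencing in this abstract (pre-charge on schedule, boost the supply, drop the boost once the unit reports busy) can be modeled as a small state machine. The class below is a toy software model of that handshake; the names and the single-step voltage values are illustrative assumptions, not the patent's circuit.

```python
class DroopMitigator:
    """Toy model of proactive di/dt droop mitigation: raise the supply
    before a high-power instruction begins, then step back down once
    execution is confirmed under way."""

    def __init__(self, nominal_v=0.80, boost_v=0.85):
        self.nominal_v, self.boost_v = nominal_v, boost_v
        self.supply_v = nominal_v
        self.precharge = self.boost = self.busy = False

    def schedule_high_power(self):
        self.precharge = True           # high-power instruction scheduled
        self.boost = True               # assert boost: raise the supply
        self.supply_v = self.boost_v

    def on_busy(self):
        self.busy = True                # execution unit reports busy
        self.boost = False              # de-assert the boost signal...
        self.supply_v = self.nominal_v  # ...and let the supply settle back
```

The point of the ordering is that the supply is already elevated when the current spike hits, so the droop lands on a boosted rail instead of the nominal one.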

Apparatus And Method For Coherent, Accelerated Conversion Between Data Representations

US Patent:
20190042094, Feb 7, 2019
Filed:
Jun 30, 2018
Appl. No.:
16/024812
Inventors:
- Santa Clara CA, US
Andrew Yang - Cupertino CA, US
Nitin Garegrat - Chandler AZ, US
Tom Schebye - San Carlos CA, US
Tony Werner - Los Altos CA, US
International Classification:
G06F 3/06
G06F 9/30
Abstract:
An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the sets of one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.
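
To make the block-by-block conversion concrete, here is a sketch that walks a 2-D tensor in block order and converts each block from float32 to a bfloat16-like representation (keeping the top 16 bits of the float32 pattern), writing every converted block into the position dictated by the source layout so the destination stays coherent with it. The block size, the bfloat16 choice, and the function names are illustrative assumptions, not the patent's specified order or formats.

```python
import struct

def to_bfloat16_bits(x):
    """Truncate an IEEE-754 float32 to its bfloat16 bit pattern
    (the top 16 bits of the float32 encoding)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16

def from_bfloat16_bits(b):
    """Decode a bfloat16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

def convert_tensor_blocks(src, block=2):
    """Convert a 2-D tensor (list of lists of floats) block by block,
    storing each destination block at the position matching its source
    block to preserve the source's structural arrangement."""
    rows, cols = len(src), len(src[0])
    dst = [[0] * cols for _ in range(rows)]
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            for di in range(min(block, rows - i)):
                for dj in range(min(block, cols - j)):
                    dst[i + di][j + dj] = to_bfloat16_bits(src[i + di][j + dj])
    return dst
```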

Technologies For Inflight Packet Count Limiting In A Queue Manager Environment

US Patent:
20190007318, Jan 3, 2019
Filed:
Jun 30, 2017
Appl. No.:
15/638728
Inventors:
- Santa Clara CA, US
William Burroughs - Macungie PA, US
Nitin N. Garegrat - Chandler AZ, US
David P. Sonnier - Austin TX, US
International Classification:
H04L 12/803
H04L 12/859
H04L 12/801
H04L 12/805
H04L 12/863
Abstract:
Technologies for inflight packet count limiting include a network device. The network device is to receive a packet from a producer application. The packet is configured to be enqueued into a packet queue as a queue element to be consumed by a consumer application. The network device is also to increment, in response to receipt of the packet, an inflight count variable, determine whether a value of the inflight count variable satisfies an inflight count limit, and enqueue, in response to a determination that the value of the inflight count variable satisfies the inflight count limit, the packet.
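
The core mechanism in this abstract is a counter incremented on receipt and tested against a limit before the packet is enqueued. A toy software model of that check, assuming drop-on-overflow back-pressure (the class and method names are illustrative, not the patent's interfaces):

```python
from collections import deque

class QueueManager:
    """Toy inflight-count limiter: a packet is enqueued only while the
    number of inflight (enqueued but not yet consumed) packets stays
    within the configured limit."""

    def __init__(self, inflight_limit):
        self.inflight_limit = inflight_limit
        self.inflight_count = 0
        self.queue = deque()

    def enqueue(self, packet):
        # Increment on receipt, then test the count against the limit.
        self.inflight_count += 1
        if self.inflight_count <= self.inflight_limit:
            self.queue.append(packet)
            return True
        self.inflight_count -= 1  # over the limit: reject the packet
        return False

    def consume(self):
        # The consumer application pulls a queue element, freeing a slot.
        packet = self.queue.popleft()
        self.inflight_count -= 1
        return packet
```

Consuming a packet decrements the count, so a previously rejected producer can succeed on retry once the consumer catches up.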

Processing Engine Implementing Job Arbitration With Ordering Status

US Patent:
20140282579, Sep 18, 2014
Filed:
Mar 14, 2013
Appl. No.:
13/829118
Inventors:
David A. Smiley - Chandler AZ, US
Naveen Lakkakula - Chandler AZ, US
Weiqiang Ma - Chandler AZ, US
Justin B. Diether - Phoenix AZ, US
Nitin N. Garegrat - Chandler AZ, US
International Classification:
G06F 9/50
US Classification:
718104
Abstract:
A processing engine implementing job arbitration with ordering status is disclosed. A method of the disclosure includes receiving, by a job assigner communicably coupled to a plurality of processors, availability status from a plurality of job rings, availability status from the plurality of processors, and job entry completion status from an order manager, identifying, based on the received job entry completion status, a set of job rings from the plurality of job rings that do not exceed threshold conditions maintained by the job assigner, selecting, from the identified set of job rings, a job ring from which to pull a job entry for assignment, wherein the selecting is based on the received availability status of the plurality of job rings, and selecting, based on the received availability status of the plurality of processors, a processor to receive the assignment of the job entry for processing.
Nitin Naresh Garegrat from San Jose, CA, age ~38