Milind B Girkar

from Sunnyvale, CA
Age ~61

Milind Girkar Phones & Addresses

  • 1049 Olive Ave, Sunnyvale, CA 94086 (408) 738-5671
  • 1049 W Olive Ave APT 3, Sunnyvale, CA 94086 (408) 568-4413
  • Champaign, IL
  • Santa Rosa, CA

Work

Company: Intel (since Dec 1995)
Position: Senior Principal Engineer

Education

Degree: PhD
School / High School: University of Illinois at Urbana-Champaign, 1986 to 1991
Specialities: Computer Science

Skills

Compilers • Computer Architecture • Microarchitecture • Processor Architecture

Industries

Computer Software

Resumes

Fellow

Location:
19750 Northwest Phillips Rd, Hillsboro, OR 97124
Industry:
Computer Software
Work:
Intel since Dec 1995
Senior Principal Engineer

Sun Microsystems 1993 - 1995
Staff Engineer

Kubota Computers 1991 - 1993
Engineer
Education:
University of Illinois at Urbana-Champaign 1986 - 1991
PhD, Computer Science
Vanderbilt University 1984 - 1986
MS, Computer Science
Indian Institute of Technology, Bombay 1979 - 1984
BTech, Computer Science and Engineering
Skills:
Compilers
Computer Architecture
Microarchitecture
Processor Architecture

Publications

US Patents

Means And Method For Establishing Loop-Level Parallelism

US Patent:
6367070, Apr 2, 2002
Filed:
Jan 13, 1998
Appl. No.:
09/006321
Inventors:
Mohammad R. Haghighat - Cupertino CA
Milind Girkar - Sunnyvale CA
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/45
US Classification:
717 9, 717 4, 717 5, 717 6, 717 8
Abstract:
A method for recognizing a parallel executable region within a sequence of source code. The parallel executable region is identified by locating a loop structure within the sequence of source code. The loop structure includes a field controlled by an induction variable. Furthermore, the field is set in every iteration of the loop structure.

Apparatus And Method For Vectorization Of Detected Saturation And Clipping Operations In Serial Code Loops Of A Source Program

US Patent:
7020873, Mar 28, 2006
Filed:
Jun 21, 2002
Appl. No.:
10/176503
Inventors:
Aart J. C. Bik - Union City CA, US
Milind Girkar - Sunnyvale CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/45
US Classification:
717156
Abstract:
An apparatus and method for vectorization of detected saturation and clipping operations in serial code loops of a source program are described. In one embodiment, the method includes the analysis of source program code to identify source code utilizing conditional constructs to perform saturation/clipping operations. Once analysis is complete, identified source code is vectorized to implement identified saturation/clipping operations utilizing single instruction, multiple data (SIMD) saturation/clipping instructions. Accordingly, utilizing embodiments of the present invention, conditional statements utilized to implement saturation arithmetic, as well as clipping of data values, such as pixel values within graphics applications, are replaced with SIMD saturation arithmetic instructions, as well as clipping instructions.
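
The rewrite described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names are hypothetical, the 0 to 255 pixel range is an assumption, and Python does not emit SIMD instructions, so the branch-free `min`/`max` form only stands in for a saturating SIMD add.

```python
def add_clip_conditional(a, b, lo=0, hi=255):
    """Serial form: a conditional construct clips each sum into [lo, hi]."""
    out = []
    for x, y in zip(a, b):
        s = x + y
        if s > hi:
            s = hi
        elif s < lo:
            s = lo
        out.append(s)
    return out


def add_clip_saturating(a, b, lo=0, hi=255):
    """Rewritten form: one branch-free saturating add per element.

    A vectorizing compiler could map this pattern to a SIMD
    saturating-add instruction; here min/max only models that behavior.
    """
    return [min(max(x + y, lo), hi) for x, y in zip(a, b)]


pixels_a = [10, 200, 250, 0]
pixels_b = [20, 100, 10, -5]
print(add_clip_conditional(pixels_a, pixels_b))
print(add_clip_saturating(pixels_a, pixels_b))
```

Both functions return the same values; the point of the sketch is that the conditional pattern in the first is recognizable and replaceable by the single saturating operation in the second.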

Methods And Apparatus For Reducing Memory Latency In A Software Application

US Patent:
7328433, Feb 5, 2008
Filed:
Oct 2, 2003
Appl. No.:
10/677414
Inventors:
Xinmin Tian - Union City CA, US
Shih-wei Liao - San Jose CA, US
Hong Wang - Fremont CA, US
Milind Girkar - Sunnyvale CA, US
John Shen - San Jose CA, US
Perry Wang - San Jose CA, US
Grant Haab - Mahomet IL, US
Gerolf Hoflehner - Santa Clara CA, US
Daniel Lavery - Santa Clara CA, US
Hideki Saito - Sunnyvale CA, US
Sanjiv Shah - Champaign IL, US
Dongkeun Kim - San Jose CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/44
US Classification:
717149, 717161, 711123, 711126, 711204, 712204, 712207
Abstract:
Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.

Methods And Apparatuses For Thread Management Of Multi-Threading

US Patent:
7398521, Jul 8, 2008
Filed:
Feb 13, 2004
Appl. No.:
10/779193
Inventors:
Gerolf F. Hoflehner - Santa Clara CA, US
Shih-wei Liao - San Jose CA, US
Xinmin Tian - Union City CA, US
Hong Wang - Fremont CA, US
Daniel M. Lavery - Santa Clara CA, US
Perry Wang - San Jose CA, US
Dongkeun Kim - San Jose CA, US
Milind Girkar - Sunnyvale CA, US
John P. Shen - San Jose CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/45
US Classification:
717151
Abstract:
Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, an exemplary process includes selecting, during compilation of code having one or more threads executable in a data processing system, a current thread having the bottom-most order; determining the resources allocated to one or more child threads spawned from the current thread; and allocating resources for the current thread in consideration of the resources allocated to its child threads, to avoid resource conflicts between the current thread and those child threads. Other methods and apparatuses are also described.

Programmable Event Driven Yield Mechanism Which May Activate Other Threads

US Patent:
7487502, Feb 3, 2009
Filed:
Feb 19, 2003
Appl. No.:
10/370251
Inventors:
Hong Wang - Fremont CA, US
Per Hammarlund - Hillsboro OR, US
Xiang Zou - Beaverton OR, US
John Shen - San Jose CA, US
Xinmin Tian - Union City CA, US
Milind Girkar - Sunnyvale CA, US
Perry Wang - San Jose CA, US
Piyush Desai - Pleasanton CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/46
G06F 9/44
G06F 9/45
US Classification:
718102, 717127, 703 22
Abstract:
Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.

Apparatus, Systems, And Methods For Execution-Driven Loop Splitting And Load-Safe Code Hosting

US Patent:
7549146, Jun 16, 2009
Filed:
Jun 21, 2005
Appl. No.:
11/157441
Inventors:
Xinmin Tian - Union City CA, US
Milind B. Girkar - Sunnyvale CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/45
US Classification:
717150, 717148, 717151, 717152, 717160
Abstract:
Techniques for execution-driven loop splitting and load-safe code hosting are provided. Compiled code includes statements associated with an original loop and statements associated with an alternative loop. The alternative loop reproduces the original loop except for conditional load-safe invariant expressions that appeared in the original loop and that are separated out of the alternative loop. During processing, once the conditional load-safe invariant expressions are computed and referenced for a first time within the original loop, processing dynamically switches to the alternative loop where the conditional load-safe invariant expressions are computed outside of the alternative loop and referenced from within the alternative loop.
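
The control flow in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented technique itself: `sum_scaled`, `table`, and `key` are hypothetical names, and a dictionary lookup stands in for a load that is only safe to execute once some iteration has actually needed it.

```python
def sum_scaled(values, flags, table, key):
    """Sum values, scaling flagged entries by the invariant table[key]."""
    total = 0
    i = 0
    n = len(values)

    # Original loop: the invariant load table[key] sits behind a condition,
    # so it runs only if some iteration really needs it.
    while i < n:
        if flags[i]:
            scale = table[key]          # loop-invariant but conditional load
            total += values[i] * scale
            i += 1
            break                       # load proved safe: switch loops
        total += values[i]
        i += 1
    else:
        return total                    # condition never held; load never ran

    # Alternative loop: the invariant was computed outside this loop body
    # and is only referenced from within it.
    while i < n:
        total += values[i] * scale if flags[i] else values[i]
        i += 1
    return total


print(sum_scaled([1, 2, 3, 4], [False, True, False, True], {"k": 10}, "k"))
```

If no flag is ever set, the lookup is never executed, so a missing key raises no error; that is the "load-safe" property the sketch tries to preserve.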

Fast Lock-Free Post-Wait Synchronization For Exploiting Parallelism On Multi-Core Processors

US Patent:
7571301, Aug 4, 2009
Filed:
Mar 31, 2006
Appl. No.:
11/395841
Inventors:
Arun Kejariwal - Irvine CA, US
Hideki Saito - Sunnyvale CA, US
Xinmin Tian - Union City CA, US
Milind Girkar - Sunnyvale CA, US
Sanjiv Shah - Champaign IL, US
Wei Li - Redwood City CA, US
Utpal Banerjee - Fremont CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/45
G06F 9/52
US Classification:
712215, 717150
Abstract:
A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.
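
A post-wait structure for a DOACROSS loop can be sketched as below. Note the substitution: the patent describes lock-free counters and arrays, whereas this sketch uses one `threading.Event` per iteration, which is not lock-free. The function name, the round-robin iteration assignment, and the doubling step are all assumptions made for illustration.

```python
import threading


def doacross_prefix_sum(b, num_threads=4):
    """Compute a[i] = a[i-1] + 2*b[i] with iterations spread over threads."""
    n = len(b)
    a = [0] * n
    done = [threading.Event() for _ in range(n)]  # one post flag per iteration

    def worker(tid):
        # Iterations are dealt round-robin to threads.
        for i in range(tid, n, num_threads):
            local = b[i] * 2            # independent work, may overlap
            if i > 0:
                done[i - 1].wait()      # wait: iteration i-1 has posted
            a[i] = (a[i - 1] if i > 0 else 0) + local
            done[i].set()               # post: iteration i is complete

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a


print(doacross_prefix_sum([1, 2, 3, 4, 5]))
```

Each iteration waits only on its immediate predecessor, so the independent part of every iteration can run concurrently while the cross-iteration dependence is still honored in order.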

System, Method And Apparatus For Dependency Chain Processing

US Patent:
7603546, Oct 13, 2009
Filed:
Sep 28, 2004
Appl. No.:
10/950693
Inventors:
Satish Narayanasamy - La Jolla CA, US
Hong Wang - Santa Clara CA, US
John Shen - San Jose CA, US
Roni Rosner - Binyamina, IL
Yoav Almog - Haifa, IL
Naftali Schwartz - Yaakov, IL
Gerolf Hoflehner - Santa Clara CA, US
Daniel LaVery - Santa Clara CA, US
Wei Li - Redwood City CA, US
Xinmin Tian - Union City CA, US
Milind Girkar - Sunnyvale CA, US
Perry Wang - San Jose CA, US
Assignee:
Intel Corporation - Santa Clara CA
International Classification:
G06F 9/00
G06F 9/24
G06F 15/177
US Classification:
713 1, 717144, 717151, 717159
Abstract:
Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.