High Performance Computing with Accelerators

My research aims to understand how HPC applications can be mapped to newly emerging accelerator-based architectures, such as those that employ FPGAs, GPGPUs, Cell/B.E., and other types of processors. My short-term interests include helping computational scientists design and implement scientific codes for accelerator-based systems. My long-term goal is to develop a formal methodology, guidelines, and recipes that other researchers can use when implementing applications on similar accelerator-based architectures, and to devise a methodology for analyzing and characterizing existing applications with respect to their portability to novel computing architectures. This work is funded by NSF grants 0810563 and 0626354, NASA grant NNG06GH15G, and a NARA grant, and is part of the efforts of IACAT's Center for Extreme-scale Computing and NCSA's Innovative Systems Laboratory.

Publications
- J. Enos, C. Steffen, J. Fullop, M. Showerman, G. Shi, K. Esler, V. Kindratenko, J. Stone, J. Phillips, Quantifying the Impact of GPUs on Performance and Energy Efficiency in HPC Clusters, In Proc. Workshop on Work in Progress in Green Computing, First International Green Computing Conference, 2010 (paper)
- V. Kindratenko, J. Enos, G. Shi, M. Showerman, G. Arnold, J. Stone, J. Phillips, W. Hwu, GPU Clusters for High-Performance Computing, In Proc. Workshop on Parallel Programming on Accelerator Clusters - PPAC'09, 2009. (paper, slides)
- G. Shi, J. Enos, M. Showerman, V. Kindratenko, On testing GPU memory for hard and soft errors, In Proc. Symposium on Application Accelerators in High-Performance Computing – SAAHPC'09, 2009. (paper, slides)
- V. Kindratenko, Novel Computing Architectures, inaugural Novel Architectures department article, IEEE/AIP Computing in Science and Engineering, vol. 11, no. 3, pp. 54-57, May/June 2009 (paper)
- M. Showerman, J. Enos, A. Pant, V. Kindratenko, C. Steffen, R. Pennington, W. Hwu, QP: A Heterogeneous Multi-Accelerator Cluster, In Proc. 10th LCI International Conference on High-Performance Clustered Computing, 2009 (paper)
- A. Pant, H. Jafri, V. Kindratenko, Phoenix: A Runtime Environment for High Performance Computing on Chip Multiprocessors, In Proc. 17th Euromicro International Conference on Parallel, Distributed and Network-Based Processing - PDP'09, 2009, pp. 119-126 (paper)
- S. Lee, D. Raila, V. Kindratenko, LLVM-CHiMPS: compilation environment for FPGAs using LLVM compiler infrastructure and CHiMPS computational model, In Proc. 4th Annual Reconfigurable Systems Summer Institute - RSSI'08, 2008 (paper)
- T. El-Ghazawi, E. El-Araby, M. Huang, K. Gaj, V. Kindratenko, D. Buell, The Promise of High-Performance Reconfigurable Computing, Computer, vol. 41, no. 2, pp. 78-85, 2008. (paper)
- V. Kindratenko, C. Steffen, R. Brunner, Accelerating Scientific Applications with Reconfigurable Computing: Getting Started, Computing in Science and Engineering, vol. 9, no. 5, pp. 70-77, 2007 (paper)
- D. Buell, T. El-Ghazawi, K. Gaj, V. Kindratenko, High-Performance Reconfigurable Computing, Guest Editors' Introduction, Computer, March 2007, pp. 27-31 (paper)
- D. Meixner, V. Kindratenko, D. Pointer, On Using Simulink to Program SRC-6 Reconfigurable Computer, In Proc. 9th Military and Aerospace Programmable Logic Devices International Conference - MAPLD'06, 2006 (paper)
- D. Meixner, V. Kindratenko, D. Pointer, Running Simulink-based Designs on SRC-6, In Proc. 10th Annual Workshop on the High Performance Embedded Computing - HPEC'06, 2006 (paper)
- V. Kindratenko, Code Partitioning for Reconfigurable High-Performance Computing: A Case Study, In Proc. International Conference on Engineering of Reconfigurable Systems and Algorithms - ERSA'06, 2006, pp. 143-149 (paper)

Technical reports
- V. Kindratenko, R. Brunner, G. Shi, D. Roeh, A. Martinez, Investigating Application Analysis and Design Methodologies for Computational Accelerators, NCSA Technical Report, 2009 (report)
- M. Showerman, J. Enos, A. Pant, V. Kindratenko, C. Steffen, R. Pennington, W. Hwu, QP: A Heterogeneous Multi-Accelerator Cluster, NCSA Technical Report, 2008 (report)
- V. Kindratenko, D. Pointer, D. Raila, C. Steffen, Comparing CPU and FPGA Application Performance, Short Report, 2006 (report)
- V. Kindratenko, D. Pointer, D. Caliga, High-Performance Reconfigurable Computing Application Programming in C, NCSA Technical Report, 2006 (report)
- D. Meixner, V. Kindratenko, D. Pointer, Implementing Simulink Designs on SRC-6 System, NCSA Technical Report, 2006 (report)

Tutorials
- V. Kindratenko, Introduction to GPU Programming, US-Egypt Collaboration Follow-up meeting, December 2010, The American University in Cairo, Egypt (part I, part II, part III, part IV, source)
- V. Kindratenko, Introduction to GPU Programming, High Performance Computing Course, June 2010, Advanced Digital Sciences Center, Singapore (part I, part II, part III, part IV, source)
- V. Kindratenko, Introduction to GPU Programming, CRA-W/CDC Careers in High Performance Systems (CHiPS) Mentoring Workshop, July 2009, Urbana, IL (slides)
- T. El-Ghazawi, D. Buell, K. Gaj, V. Kindratenko, Reconfigurable Supercomputing Tutorial, Supercomputing 2007, November 2007, Reno, NV (slides)

Presentations
- V. Kindratenko, GPU HPC Clusters, Seminar, Department of Computer Science & Computer Engineering, University of Arkansas, December 2010, Fayetteville, AR (abstract)
- V. Kindratenko, High Performance Computing with Application Accelerators, High Performance Computing Symposium, June 2010, Advanced Digital Sciences Center, Singapore (presentation)
- V. Kindratenko, Overview of Hardware Accelerators, NSF US/Egypt Meeting on Software Development for Multicore and Heterogeneous Processing Technologies, June 2009, Cairo, Egypt (slides)
- V. Kindratenko, High Performance Computing with Accelerators, First workshop of the Joint Laboratory for Petascale Computing, June 2009, Paris, France (abstract, slides)
- V. Kindratenko, High Performance Computing on FPGAs: challenges and opportunities, Panel on Key Challenges presented by next generation hardware systems, Key Challenges in Modeling and Simulation Fall Creek Falls conference, September 2007, Nashville, TN (slides)
- V. Kindratenko, Accelerating Scientific Applications with Reconfigurable Computing, Seminar, Dept. of Computer and Information Sciences, University of Alabama at Birmingham, June 2007, Birmingham, AL (abstract)
- V. Kindratenko, First-hand experience on porting MATPHOT code to SRC platform, 1st Annual Reconfigurable Systems Summer Institute - RSSI, July 2005, Urbana, IL (presentation)

Codes
- CUDA wrapper library
- CUDA memory tester

NARA grant: Innovative Systems and Software: Applications to NARA Research Problems

We investigate the suitability of Graphics Processing Unit (GPU) technology for accelerating the image characterization algorithms used to find similarities between documents with embedded images. We have ported the image characterization algorithm used in doc2learn to GPUs, using both CUDA C targeting NVIDIA GPUs and OpenCL targeting NVIDIA and AMD architectures, and conducted an extensive study of the impact of GPU acceleration on documents with varying numbers of images and image sizes.

Publications

- G. Shi, V. Kindratenko, R. Kooper, P. Bajcsy, GPU Acceleration of an Image Characterization Algorithm for Document Similarity Analysis, In Proc. ACS/IEEE International Conference on Computer Systems and Applications (AICCSA), 2011 (paper)

Technical reports and presentations
- V. Kindratenko, G. Shi, Evaluation and Exploration of Next Generation Systems for Applicability and Performance, Technical report 1, October 2010 (report, presentation)
- V. Kindratenko, G. Shi, Evaluation and Exploration of Next Generation Systems for Applicability and Performance, Technical report 2, December 2010 (report, presentation)
- V. Kindratenko, G. Shi, Evaluation and Exploration of Next Generation Systems for Applicability and Performance, Technical report 3, March 2011 (report, presentation)
- V. Kindratenko, G. Shi, Evaluation and Exploration of Next Generation Systems for Applicability and Performance, Technical report 4, June 2011 (report, presentation)

Codes
- image analysis

IACAT Project: Implementation of MILC on computational accelerators

The MIMD Lattice Computation (MILC) code, a Quantum Chromodynamics (QCD) application used to simulate four-dimensional SU(3) lattice gauge theory, is one of the largest consumers of compute cycles at many supercomputing centers. We previously investigated how one of the MILC applications can be accelerated on the Cell Broadband Engine. We are now investigating how this code can take advantage of the newly emerging GPU computing architecture.

Publications

- S. Gottlieb, G. Shi, A. Torok, V. Kindratenko, QUDA programming for staggered quarks, In Proc. The XXVIII International Symposium on Lattice Field Theory - Lattice'10, 2010 (paper)
- G. Shi, S. Gottlieb, A. Torok, V. Kindratenko, Accelerating Quantum Chromodynamics Calculations with GPUs, In Proc. Symposium on Application Accelerators in High-Performance Computing - SAAHPC'10, 2010 (paper)
- G. Shi, V. Kindratenko, F. Pratas, P. Trancoso, M. Gschwind, Application Acceleration with the Cell Broadband Engine, IEEE/AIP Computing in Science and Engineering, vol. 12, no. 1, pp. 76-81, Jan./Feb. 2010 (paper)
- G. Shi, V. Kindratenko, S. Gottlieb, The bottom-up implementation of one MILC lattice QCD application on the Cell blade, International Journal of Parallel Programming, vol. 37, no. 5, pp. 488-507, 2009 (paper)
- G. Shi, V. Kindratenko, S. Gottlieb, Cell processor implementation of a MILC lattice QCD application, In Proc. The XXVI International Symposium on Lattice Field Theory - Lattice'08, 2008 (paper)

Technical reports and presentations
- G. Shi, S. Gottlieb, A. Torok, V. Kindratenko, Multi-GPU Implementation of MILC using QUDA Framework, SC10 poster, November 2010 (poster)
- G. Shi, S. Gottlieb, V. Kindratenko, MILC on GPUs, NCSA Technical Report, January 2010 (report)
- G. Shi, GPU Implementation of CG solver for MILC, Internal presentation, November 2009 (presentation)
- D. Roeh, J. Troup, G. Shi, V. Kindratenko, Porting MILC to GPU: Lessons learned, Workshop on using GPUs for LQCD, August 19-21 2009, Thomas Jefferson National Accelerator Facility, Newport News, Virginia (presentation)

Codes
- MILC GPU implementation

NASA grant NNG06GH15G: Advanced Astrophysical Algorithms to Novel Supercomputing Hardware

We consider a class of cosmology applications that are based on a common algorithm: multidimensional distance calculations. Examples of such applications include n-point correlation functions, instance-based learning algorithms, and power spectrum estimation, all of which are used in a number of different scientific and engineering domains. Specifically, in this investigation we focus on the 2-point angular correlation function (TPACF), which is used to characterize the clustering of sources on the celestial sphere. TPACF serves as one of the main tools for studying the distribution of matter in the Universe. Due to the large size of the datasets produced by modern astronomical instruments and the O(N^2) computational complexity of the algorithm, significant computing resources are required to perform the calculations. We have implemented a reference TPACF algorithm and ported it to the SRC-6 and SGI RC100 reconfigurable computers. We have also conducted a preliminary investigation of its amenability to GPU implementation. In addition, we implemented several instance-based learning algorithms and ported an artificial neural network based probability density functions code to the SRC-7 reconfigurable computer.
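
To make the O(N^2) cost of the distance calculations concrete, the following is a minimal Python sketch of the pair-counting step that underlies a 2-point angular correlation function. It is not the TPACF 1.0 reference implementation listed below; the function names (`unit_vector`, `pair_counts`), the degree-based bin edges, and the half-open binning convention are all assumptions made for illustration.

```python
import bisect
import math


def unit_vector(ra_deg, dec_deg):
    """Convert right ascension and declination (degrees) to a Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def pair_counts(a, b, bin_edges_deg, same_set=False):
    """Histogram the angular separations between points of a and points of b.

    bin_edges_deg must be increasing; bin k covers [edges[k], edges[k+1]).
    With same_set=True, a and b are assumed to be the same list and each
    unordered pair is counted once (hypothetical convention for this sketch).
    """
    counts = [0] * (len(bin_edges_deg) - 1)
    for i, p in enumerate(a):
        start = i + 1 if same_set else 0
        for q in b[start:]:
            # The dot product of two unit vectors is the cosine of their separation.
            dot = p[0] * q[0] + p[1] * q[1] + p[2] * q[2]
            dot = max(-1.0, min(1.0, dot))  # guard acos against rounding error
            theta = math.degrees(math.acos(dot))
            k = bisect.bisect_right(bin_edges_deg, theta) - 1
            if 0 <= k < len(counts):
                counts[k] += 1
    return counts
```

The doubly nested loop visits every pair of points, which is where the quadratic cost comes from. Each pair is independent of the others, so the inner loop is a natural candidate for offloading to an accelerator. A correlation estimator would then combine such histograms for data-data, data-random, and random-random pairs; that step is omitted here.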

Publications
- V. Kindratenko, A. Myers, R. Brunner, Implementation of the two-point angular correlation function on a high-performance reconfigurable computer, Scientific Programming, vol. 17, no. 3, pp. 247-259, 2009 (paper)
- V. Kindratenko, R. Brunner, A. Myers, Dynamic load-balancing on multi-FPGA systems: a case study, In Proc. 3rd Annual Reconfigurable Systems Summer Institute - RSSI'07, 2007 (paper, presentation)
- R. Brunner, V. Kindratenko, and A. Myers, Developing and Deploying Advanced Algorithms to Novel Supercomputing Hardware, In Proc. NASA Science Technology Conference - NSTC'07, 2007 (paper, presentation)
- V. Kindratenko, R. Brunner, A. Myers, Mitrion-C Application Development on SGI Altix 350/RC100, In Proc. IEEE Symposium on Field-Programmable Custom Computing Machines - FCCM'07, 2007 (paper, presentation)

Technical reports and presentations
- V. Kindratenko, Accelerating Cosmology Applications: from 80 MFLOPS to 8 GFLOPS in 4 steps, 13th SIAM Conference on Parallel Processing for Scientific Computing, 2008 (presentation)
- R. Brunner, Year Two Progress Report for NASA Grant NNG06GH15G, 2007 (report, presentation)
- Preliminary investigation of NVIDIA G80 GPU suitability for computing TPACF, ECE498AL class exercise, 2007 (problem statement, report, presentation)
- R. Brunner, Year One Progress Report for NASA Grant NNG06GH15G, 2006 (report)
- V. Kindratenko, Exploring Coarse-grain and Fine-grain Parallelism on SRC-6 Reconfigurable Computer, 2nd Annual Reconfigurable Systems Summer Institute - RSSI'06, 2006 (poster)

Codes
- TPACF 1.0 reference implementation
- TPACF 1.0 SRC-6 implementation
- TPACF 1.0 SGI RC100 implementation
- ANN PDFs SRC-7 implementation

NSF grant 0626354: Chemical computations on future high-end computers

Chemical simulations offer a computational approach to studying the behavior of molecules and atoms at atomic and sub-atomic levels of detail. Such simulations, however, are greatly limited in size and timescale by the complexity of the underlying mathematical models, which translates into computationally demanding and time-consuming algorithms. For example, in molecular dynamics (MD), the non-bonded force-field calculations are typically responsible for over 80% of the overall execution time of MD codes and are the main bottleneck in reaching microsecond timescales. In quantum chemistry, the calculation of two-electron repulsion integrals (ERIs) remains a bottleneck in many ab initio molecular orbital and density functional theory electronic structure codes. In direct self-consistent field (SCF) methods, many millions of ERIs are recomputed in every SCF iteration and account for the vast majority of the execution time. We are investigating the use of GPUs, FPGAs, and the Cell/B.E. to accelerate the execution of kernels used in chemistry codes, such as the two-electron repulsion integral calculations used in direct SCF codes and the non-bonded force-field calculations used in NAMD. We have implemented the Rys quadrature scheme for two-electron Coulomb repulsion integrals to evaluate primitive integrals [pq|rs] for Gaussian-type orbital (GTO) basis sets on the SRC-6/7 reconfigurable computers and the IBM Cell/B.E. blade system. We also implemented NAMD's non-bonded force-field kernel on the SRC-6 reconfigurable computer and the IBM Cell/B.E. processor. (See project website for more details.)
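
As a rough illustration of why ERIs dominate direct SCF run time, the sketch below counts the symmetry-unique integrals (pq|rs) for a given number of basis functions. It assumes real-valued basis functions, for which the integrals have 8-fold permutational symmetry, and it ignores integral screening and the distinction between primitive and contracted functions. The function names are hypothetical and are not taken from the ERI codes listed below.

```python
def unique_eri_count(n_basis):
    """Number of symmetry-unique integrals (pq|rs) for n_basis real basis functions."""
    n_pairs = n_basis * (n_basis + 1) // 2   # index pairs pq with p >= q
    return n_pairs * (n_pairs + 1) // 2      # pairs of pairs with pq >= rs


def enumerate_unique_eris(n_basis):
    """Yield canonical index quadruples (p, q, r, s) with p >= q, r >= s, pq >= rs."""
    for p in range(n_basis):
        for q in range(p + 1):
            pq = p * (p + 1) // 2 + q        # compound index of the pair (p, q)
            for r in range(p + 1):
                for s in range(r + 1):
                    rs = r * (r + 1) // 2 + s
                    if rs <= pq:
                        yield (p, q, r, s)
```

Under these assumptions, `unique_eri_count(100)` evaluates to 12,753,775, so even a modest basis set of 100 functions already requires more than twelve million integrals per SCF iteration, and the count grows roughly as N^4/8.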

Publications
- A. Titov, V. Kindratenko, I. Ufimtsev, T. Martinez, Generation of Kernels to Calculate Electron Repulsion Integrals of High Angular Momentum Functions on GPUs – Preliminary Results, In Proc. Symposium on Application Accelerators in High-Performance Computing - SAAHPC'10, 2010 (paper)
- G. Shi, I. Ufimtsev, V. Kindratenko, T. Martinez, Direct Self-Consistent Field Computations on GPU Clusters, In Proc. IEEE International Parallel and Distributed Processing Symposium – IPDPS, 2010 (paper)
- G. Shi, V. Kindratenko, I. Ufimtsev, T. Martinez, J. Phillips, S. Gottlieb, Implementation of scientific computing applications on the Cell Broadband Engine, Scientific Programming, vol. 17, no. 1-2, pp. 135-152, 2009 (paper)
- V. Kindratenko, I. Ufimtsev, T. Martínez, Evaluation of two-electron repulsion integrals over Gaussian basis functions on SRC-6 reconfigurable computer, In Proc. 4th Annual Reconfigurable Systems Summer Institute - RSSI'08, 2008 (paper, poster)
- G. Shi, V. Kindratenko, Implementation of NAMD molecular dynamics non-bonded force-field on the Cell Broadband Engine processor, In Proc. 9th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing - PDSEC, 2008 (paper, presentation)
- V. Kindratenko, D. Pointer, A case study in porting a production scientific supercomputing application to a reconfigurable computer, In Proc. IEEE Symposium on Field-Programmable Custom Computing Machines - FCCM'06, 2006, pp. 13-22 (paper, presentation)

Technical reports and presentations
- V. Kindratenko, Computational Chemistry Applications at NCSA/UIUC, US-China Workshop on High Performance Computing Application Acceleration, 2010 (presentation)
- G. Shi, V. Kindratenko, I. Ufimtsev, T. Martinez, Two-Electron Integral Evaluation on FPGA, Cell and GPU accelerators, Path to Petascale: Adapting GEO/CHEM/ASTRO Applications for Accelerators and Accelerator Clusters workshop, April 2009 (poster)
- G. Shi, V. Kindratenko, Implementation of Scientific Computing Applications on the Cell Broadband Engine processor, 2nd Annual Georgia Tech, Sony/Toshiba/IBM Workshop on Software and Applications for the Cell/B.E. processor, 2008 (presentation)
- G. Shi, V. Kindratenko, Implementation of NAMD molecular dynamics non-bonded force-field on the Cell Broadband Engine processor, Supercomputing, 2007 (poster)
- V. Kindratenko, Summary of Current and Future CyberChem Activities at ISL/NCSA, Internal project meeting, 2007 (presentation)

Codes
- NAMD SRC-6 implementation
- NAMD Cell/B.E. implementation
- ERIs reference implementation
- ERIs SRC-6 implementation
- ERIs Cell/B.E. implementation

NSF grant 0810563: Investigating Application Analysis and Design Methodologies for Computational Accelerators

The impact of computational accelerators on scientific applications, and the investment required to utilize these resources, are not fully understood in the scientific computing community. While accelerator-based computing architectures offer great potential performance, the execution models, software architectures, and development processes required to realize that potential currently differ dramatically from those of existing computational architectures. We are conducting an exploratory investigation to understand the impact of accelerator technologies on scientific and engineering codes and to quantify the effort and requirements necessary to implement these codes on the newly emerging accelerator technologies. We are also investigating formal methods in application analysis and cross-platform software engineering for accelerator technologies. Our approach is based on implementing a commonly used algorithm on several accelerator architectures and, in doing so, developing formal guidelines and recipes that other researchers can adopt when porting their own applications to similar accelerator-based architectures. Specifically, we use the 2-point angular correlation function (TPACF) algorithm (previously developed under the NASA grant and extended to allow for error estimation) as a testbed for cross-platform implementation.

Publications
- V. Kindratenko, R. Brunner, Accelerating Cosmological Data Analysis with FPGAs, In Proc. IEEE Symposium on Field-Programmable Custom Computing Machines - FCCM'09, 2009 (paper, presentation)
- D. Roeh, V. Kindratenko, R. Brunner, Accelerating Cosmological Data Analysis with Graphics Processors, In Proc. 2nd Workshop on General-Purpose Computation on Graphics Processing Units - GPGPU-2, 2009 (paper, presentation)

Technical reports and presentations
- V. Kindratenko, D. Roeh, G. Shi, R. Brunner, Accelerating Cosmology Codes, Path to Petascale: Adapting GEO/CHEM/ASTRO Applications for Accelerators and Accelerator Clusters workshop, April 2009 (poster)
- V. Kindratenko, R. Brunner, G. Shi, D. Roeh, A. Martinez, Investigating Application Analysis and Design Methodologies for Computational Accelerators, NCSA Technical Report, 2009 (report)
- V. Kindratenko, D. Roeh, Internal NCSA GPU programming tutorial, December 2008 (part 1, part 2)
- V. Kindratenko, C. Steffen, Introduction to reconfigurable computing, July 2008 (tutorial)

Codes
