Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. (2015). Paper

Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng. DeepSpeech: Scaling up end-to-end speech recognition. (2014). Paper

Haicheng Wu, Gregory Diamos, Tim Sheard, Molham Aref, and Sudhakar Yalamanchili. Red Fox: Acceleration and Execution of Relational Queries using GPUs. International Symposium on Code Generation and Optimization (CGO 2014). Paper

Gregory Diamos, Haicheng Wu, Jin Wang, Ashwin Lele, and Sudhakar Yalamanchili. Relational Algorithms for Multi-Bulk-Synchronous Processors. The 18th Symposium on Principles and Practice of Parallel Programming (PPoPP), 2013. Paper

Haicheng Wu, Gregory Diamos, Srihari Cadambi, and Sudhakar Yalamanchili. Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation. 45th International Symposium on Microarchitecture (MICRO 45), 2012. Paper (bibtex)

Nicolas Brunie, Sylvain Collange, and Gregory Diamos. Simultaneous Branch and Warp Interweaving for Sustained GPU Performance. The 39th International Symposium on Computer Architecture (ISCA 39), 2012. Paper (bibtex)

Gregory Diamos, Haicheng Wu, Ashwin Lele, Jin Wang, and Sudhakar Yalamanchili. Relational Algorithms for Multi-Bulk-Synchronous Processors. Tech Report. Paper (bibtex)

Naila Farooqui, Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili. Lynx: Dynamic Instrumentation System for Data-Parallel Applications on GPGPU-based Architectures. ISPASS April 2012. Paper (bibtex)

Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili. Dynamic Compilation of Data-Parallel Kernels for Vector Processors. Code Generation and Optimization (CGO 2012). Paper (bibtex)

Gregory Diamos, Benjamin Ashbaugh, Subramaniam Maiyuran, Andrew Kerr, Haicheng Wu, Sudhakar Yalamanchili. SIMD Re-convergence at Thread Frontiers. 44th International Symposium on Microarchitecture (MICRO 44). Paper (bibtex) (presentation)

Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili. GPU Application Development, Debugging, and Performance Tuning with GPU Ocelot. GPU Computing GEMS, vol. 1, 2011. Paper (bibtex)

Gregory Diamos. Harmony: An Execution Model For Heterogeneous Systems. PhD Thesis. December 2011. Paper (bibtex)

Haicheng Wu, Gregory Diamos, Si Li, and Sudhakar Yalamanchili. Characterization and Transformation of Unstructured Control Flow in GPU Applications. The First International Workshop on Characterizing Applications for Heterogeneous Exascale Systems. June 2011. Paper (bibtex)

Naila Farooqui, Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili, and Karsten Schwan. A Framework for Dynamically Instrumenting GPU Compute Applications within GPU Ocelot. Fourth Workshop on General-Purpose Computation on Graphics Processing Units. March 2011. Paper (bibtex)

Gregory Diamos. An Execution Model and Runtime for Heterogeneous Many-Core Systems. PhD Dissertation Proposal. January 2010. Proposal

Gregory Diamos, Andrew Kerr, Sudhakar Yalamanchili, and Nathan Clark. Ocelot: A Dynamic Compiler for Bulk-Synchronous Applications in Heterogeneous Systems. The Nineteenth International Conference on Parallel Architectures and Compilation Techniques. September 2010. Paper (bibtex)

Andrew Kerr, Gregory Diamos, and Sudhakar Yalamanchili. Modeling GPU-CPU Workloads and Systems. Third Workshop on General-Purpose Computation on Graphics Processing Units. March 2010. Paper (bibtex)

Sudnya Padalikar and Gregory Diamos. Exploring The Latency and Bandwidth Tolerance of CUDA Applications. NFinTes Tech Report. December 2009. Paper

Gregory Diamos. The Design and Implementation of Ocelot's Dynamic Binary Translator from PTX to Multi-Core x86. CERCS Tech Report. December 2009. Paper

Gregory Diamos and Sudhakar Yalamanchili. Speculative Execution On Multi-GPU Systems. IEEE International Parallel & Distributed Processing Symposium (IPDPS 2010). April 2010. Paper (bibtex)

Gregory Diamos and Sudhakar Yalamanchili. Speculative Execution On Multi-GPU Systems. CERCS Tech Report. September 2009. Paper

Gregory Diamos and Sudhakar Yalamanchili. Harmony: An Execution Model and Runtime for Heterogeneous Many-Core Processors. High Performance Distributed Computing (HPDC08). June 2008. Paper (bibtex)

Andrew Kerr, Gregory Diamos, and Sudhakar Yalamanchili. A Characterization and Analysis of PTX Kernels. IEEE International Symposium on Workload Characterization (IISWC). October 2009. Paper

Gregory Diamos, Andrew Kerr, Mukil Kesavan. Translating GPU Binaries to Tiered Many-Core Architectures with Ocelot. Tech Report. January 2009. Paper

Gregory Diamos and Sudhakar Yalamanchili. Harmony: An Execution Model and Runtime for Heterogeneous Many-Core Processors. Tech Report. December 2007. Paper (bibtex)

Gregory Diamos. State Explosion: An Obvious Limitation to Strong Scaling. Short Paper. September 2009. Paper

Gregory Diamos and Sudhakar Yalamanchili. STARS: A System for Tuning and Automatically Reconfiguring SoC Links. Design Automation and Test in Europe (NoC Workshop) (DATE08). April 2008. Paper