PACT 2024, October 13–16, 2024

Proceedings

Schedule

Sunday, October 13, 2024 (Room: Catalina)

Time What
07:15 Breakfast (provided)
08:00 Tutorial: QVT: The Quantum Visualization Toolkit
09:30-10:00 Break
12:00 Lunch (provided)
13:30 Workshop: The 4th International Workshop on Machine Learning for Software Hardware Co-Design (MLS/H)
15:00-15:30 Break
18:00 Reception and Posters

Monday, October 14, 2024 (Room: Catalina)

Time What
07:15 Breakfast (provided)
08:00 Opening
08:30 Keynote: Creating DSLs Made Easy, Saman Amarasinghe, Massachusetts Institute of Technology.
09:30 Break
10:00 Session 1: Machine Learning (4 papers)
Chair: Hyoungwook Nam, University of Illinois at Urbana-Champaign
  • GraNNDis: Fast Distributed Graph Neural Network Training Framework for Multi-Server Clusters
    Jaeyong Song, Hongsun Jang, Hunseong Lim, Jaewon Jung (Seoul National University); Youngsok Kim (Yonsei University); Jinho Lee (Seoul National University)
  • Activation Sequence Caching: High-Throughput and Memory-Efficient Generative Inference with a Single GPU
    Sowoong Kim, Eunyeong Sim (UNIST); Youngsam Shin, YeonGon Cho (Samsung Advanced Institute of Technology); Woongki Baek (UNIST)
  • Improving Throughput-oriented LLM Inference with CPU Computations
    Daon Park, Bernhard Egger (Seoul National University)
  • BOOM: Use your Desktop to Accurately Predict the Performance of Large Deep Neural Networks
    Qidong Su (University of Toronto / Vector Institute / CentML); Jiacheng Yang (University of Toronto / Vector Institute); Gennady Pekhimenko (University of Toronto / Vector Institute / CentML)
12:00 Lunch (provided)
13:30 Session 2: Architecture and Application Co-design (3 papers)
Chair: Bernhard Egger, Seoul National University
  • PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System
    Steve Rhyner, Haocong Luo (ETH Zurich); Juan Gómez-Luna (NVIDIA); Mohammad Sadrosadati (ETH Zurich); Jiawei Jiang (Wuhan University); Ataberk Olgun, Harshita Gupta (ETH Zurich); Ce Zhang (University of Chicago); Onur Mutlu (ETH Zurich)
  • ZeD: A Generalized Accelerator for Variably Sparse Matrix Computations in ML
    Pranav Dangi, Zhenyu Bai, Rohan Juneja, Dhananjaya Wijerathne, Tulika Mitra (National University of Singapore)
  • A Parallel Hash Table for Streaming Applications
    Magnus Östgren, Ioannis Sourdis (Chalmers University of Technology)
15:00 Break
15:30 Session 3: Parallelism (3 papers)
Chair: Ying Jing, University of Illinois at Urbana-Champaign
  • Leveraging Difference Recurrence Relations for High-Performance GPU Genome Alignment
    Alberto Zeni (Politecnico di Milano, Italy); Seth Onken (NVIDIA Corporation); Marco D. Santambrogio (Politecnico di Milano, Italy); Mehrzad Samadi (NVIDIA Corporation)
  • ACE: Efficient GPU Kernel Concurrency for Input-Dependent Irregular Computational Graphs
    Sankeerth Durvasula, Junan Zhao, Raymond Kiguru, Yushi Guan, Zhonghan Chen, Nandita Vijaykumar (University of Toronto)
  • Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search
    Jakob Hartmann, Guoliang He, Eiko Yoneki (University of Cambridge)
17:30 Business Meeting

Tuesday, October 15, 2024 (Room: Catalina)

Time What
07:15 Breakfast (provided)
08:30 Keynote: Every “Bit” Matters: Fostering Innovation in Deep Learning and Beyond, Andreas Moshovos, University of Toronto
09:30 Break
10:00 Session 4: Compilers (4 papers)
Chair: J. Nelson Amaral, University of Alberta
  • A Transducers-based Programming Framework for Efficient Data Transformation
    Tri Nguyen, Michela Becchi (North Carolina State University)
  • MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations
    Akash Dutta, Ali Jannesari (Iowa State University)
  • Parallel Loop Locality Analysis for Symbolic Thread Counts
    Fangzhou Liu, Yifan Zhu, Shaotong Sun, Chen Ding, Wesley Smith, Kaave Hosseini (University of Rochester)
  • PipeGen: Automated Transformation of a Single-Core Pipeline into a Multicore Pipeline for a Given Memory Consistency Model
    An Qi Zhang (University of Utah); Andrés Goens (University of Amsterdam); Nicolai Oswald (NVIDIA); Tobias Grosser (University of Cambridge); Daniel Sorin (Duke University); Vijay Nagarajan (University of Utah)
12:00 Lunch (provided)
13:30 Session 5: Security (4 papers)
Chair: Donald Yeung, University of Maryland
  • FriendlyFoe: Adversarial Machine Learning as a Practical Architectural Defense against Side Channel Attacks
    Hyoungwook Nam (University of Illinois at Urbana-Champaign); Raghavendra Pradyumna Pothukuchi (Yale University); Bo Li, Nam Sung Kim, Josep Torrellas (University of Illinois at Urbana-Champaign)
  • Toast: A Heterogeneous Memory Management System
    Maurice Bailleu (Huawei Research); Dimitrios Stavrakakis (TU Munich / The University of Edinburgh); Rodrigo Rocha (Huawei Research); Soham Chakraborty (TU Delft); Deepak Garg (Max Planck Institute for Software Systems (MPI-SWS)); Pramod Bhatotia (TU Munich / The University of Edinburgh)
  • BoostCom: Towards Efficient Universal Fully Homomorphic Encryption by Boosting the Word-wise Comparisons
    Ardhi Wiratama Baskara Yudha, Jiaqi Xue, Qian Lou (University of Central Florida); Huiyang Zhou (North Carolina State University); Yan Solihin (University of Central Florida)
  • SZKP: A Scalable Accelerator Architecture for Zero-Knowledge Proofs
    Alhad Daftardar, Brandon Reagen, Siddharth Garg (New York University)
15:30 Break
16:00 Session 6: Quantum & Neuromorphic (3 papers)
Chair: Qian Lou, University of Central Florida
  • Recompiling QAOA Circuits on Various Rotational Directions
    Enhyeok Jang, Dongho Ha, Seungwoo Choi, Youngmin Kim, Jaewon Kwon, Yongju Lee, Sungwoo Ahn, Hyungseok Kim, Won Woo Ro (Yonsei University)
  • Faster and More Reliable Quantum SWAPs via Native Gates
    Pranav Gokhale, Teague Tomesh (Infleqtion); Martin Suchara (Microsoft); Fred Chong (University of Chicago)
  • NavCim: Comprehensive Design Space Exploration for Analog Computing-in-Memory Architectures
    Juseong Park, Boseok Kim (Pohang University of Science and Technology); Hyojin Sung (Seoul National University)
17:30 Break
18:00 Conference Banquet (Room: Gallery)

Wednesday, October 16, 2024 (Room: Catalina)

Time What
07:15 Breakfast (provided)
08:00 Session 7: Memory (3 papers)
Chair: Ioannis Sourdis, Chalmers University of Technology
  • MORSE: Memory Overwrite Time Guided Soft Writes to Improve ReRAM Energy and Endurance
    Devesh Singh (University of Maryland, College Park); Donald Yeung (University of Maryland)
  • Trimma: Trimming Metadata Storage and Latency for Hybrid Memory Systems
    Yiwei Li, Boyu Tian, Mingyu Gao (Tsinghua University)
  • Chimera: Leveraging Hybrid Offsets for Efficient Data Prefetching
    Shuiyi He, Zicong Wang, Xuan Tang, Qiyao Sun, Dezun Dong (National University of Defense Technology)
09:30 Break
10:00 SRC poster winners presentations
11:00 Session 8: Address Translation, Coherence, and Communication (3 papers)
Chair: Devesh Singh, Samsung
  • Rethinking Page Table Structure for Fast Address Translation in GPUs: A Fixed-Size Hashed Page Table
    Sungbin Jang, Junhyeok Park, Osang Kwon, Yongho Lee, Seokin Hong (Sungkyunkwan University)
  • Mozart: Taming Taxes and Composing Accelerators with Shared-Memory
    Vignesh Suresh, Bakshree Mishra, Ying Jing, Zeran Zhu, Naiyin Jin, Charles Block (University of Illinois at Urbana-Champaign); Paolo Mantovani, Davide Giri, Joseph Zuckerman, Luca P. Carloni (Columbia University); Sarita Adve (University of Illinois at Urbana-Champaign)
  • vSPACE: Supporting Parallel Network Packet Processing in Virtualized Environments through Dynamic Core Management
    Gyeongseo Park, Minho Kim (DGIST); Ki-Dong Kang (DGIST/ETRI); Yunhyeong Jeon, Sungju Kim, Hyosang Kim (DGIST); Daehoon Kim (Yonsei University)
12:30 Closing


Keynotes

    

Monday, October 14, 2024: Creating DSLs Made Easy, Saman Amarasinghe

Today, applications that require high performance rely on libraries of hand-optimized kernels, with thousands available across various domains and architectures, while Domain-Specific Languages (DSLs) and their accompanying compilers remain relatively rare. A well-designed DSL can describe a much wider variety of programs within a given domain than even the most comprehensive library, while also unlocking powerful cross-function and global domain-specific optimizations that hand-optimized kernels cannot achieve. As Hennessy and Patterson emphasized in their Turing Award Lecture, the widespread adoption of Domain-Specific Accelerators depends on the availability of DSLs to fully harness these accelerators' high-performance capabilities. However, building high-performance DSLs is complex and time-consuming, often requiring compiler experts to devote years to development.

In this talk, I will introduce BuildIt, a C++ framework designed for the rapid prototyping of high-performance DSLs. BuildIt uses a multi-stage programming approach to combine the flexibility of libraries with the performance and specialization of code generation. With BuildIt, domain experts can transform existing libraries into efficient, specialized compilers simply by modifying the types of variables. Moreover, it allows them to implement analyses and transformations without needing to write traditional compiler code. Currently, BuildIt supports code generation for multi-core CPUs and GPUs, with FPGA support coming soon. I will also showcase three DSLs created with BuildIt to highlight its power and ease of use: a reimplementation of the GraphIt graph computing language, the BREeze DSL for regular expressions, and NetBlocks, a DSL for custom network protocol development.

Bio:

Saman Amarasinghe is a Professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), where he leads the Commit compiler group. Under his leadership, the Commit group has developed a wide range of innovative programming languages and compilers, including StreamIt, StreamJIT, PetaBricks, Halide, TACO, Finch, Systec, GraphIt, Simit, MILK, Cimple, BioStream, NetBlocks, BREeze, CoLa, Shim, AskIt, and Seq. Additionally, the group has created compiler and runtime frameworks such as DynamoRIO, Helium, Tiramisu, Codon, BuildIt, and D2X as well as tools for vectorization like Superword Level Parallelism (SLP), goSLP, and VeGen. Saman’s team also developed Ithemal, a machine-learning-based performance predictor, Program Shepherding to protect programs from external attacks, the OpenTuner extendable autotuner, and the Kendo deterministic execution system. He was also co-leader of the Raw architecture project. Outside academia, Saman has co-founded several companies, including Determina, Lanka Internet Services Ltd., Venti Technologies, DataCebo, and Exaloop. He earned his BS in Electrical Engineering and Computer Science from Cornell University in 1988, and his MSEE and Ph.D. from Stanford University in 1990 and 1997, respectively. He is also a Fellow of the ACM.

    

Tuesday, October 15, 2024: Every “Bit” Matters: Fostering Innovation in Deep Learning and Beyond, Andreas Moshovos

Computers are widely utilized across various disciplines. By developing increasingly powerful and efficient machines, we can foster innovation and have a broad impact across many other communities. This talk gives an overview of our perspective and approach to enabling more efficient hardware/software stacks for deep learning. It will focus on our most recent efforts to enable more efficient training and inference through datatype learning and the optimized encoding of information during data transfers. Rather than guessing what good datatypes can be (the current state-of-the-art practice), our methods learn them. Further, rather than storing data as-is, our methods efficiently pack it using fewer bits without loss of information. We will also comment on what challenges lie ahead in this specific field.

Moving beyond the specific work of our group, I will reflect on how dissemination and funding practices and perspectives impact academic innovation, our growth and wellbeing as researchers and practitioners, and their broader implications for future growth of our community.

Bio:

Andreas Moshovos along with his students has been answering the question “what is the best possible digital computation structure/software combination to solve problem X or to run application Y?” where “best” is a characteristic (or combination thereof) such as power, cost, complexity, etc. Much of his earlier work has been on high-performance processor and memory system design and it has influenced commercial designs. His more recent work has been on hardware/software acceleration methods for machine learning. He has been with the University of Toronto since 2000, but has also taught at Ecole Polytechnique Fédérale de Lausanne, Northwestern University, University of Athens, and the Hellenic Open University. He was the Scientific Director of the Canadian NSERC COHESA Research Network targeting machine learning optimizations, a consortium of 25+ research groups and industry partners.

Important Dates and Deadlines

Conference Papers:

  • Abstract submission deadline: Mar 22, 2024
    Extended to March 25, 2024
  • Paper submission deadline: Mar 27, 2024
    Extended to April 1, 2024
  • Rebuttal period: Jun 3-9, 2024
  • Author notification: Jul 1, 2024
  • Artifact submission: Jul 8, 2024
  • Camera ready papers: Aug 24, 2024

ACM SRC:

  • Abstract Registration Deadline: August 15, 2024
  • Abstract Submission Deadline: August 18, 2024

Conference: October 13–16, 2024

