Schedule

Sunday, October 13, 2024
Monday, October 14, 2024
Tuesday, October 15, 2024
Wednesday, October 16, 2024

Sunday, October 13, 2024 (Room: Catalina)

Time	What
07:15	Breakfast (provided)
08:00	Tutorial: QVT: The Quantum Visualization Toolkit
09:30-10:00	Break
12:00	Lunch (provided)
13:30	Workshop: The 4th International Workshop on Machine Learning for Software Hardware Co-Design (MLS/H)
15:00-15:30	Break
18:00	Reception and Posters

Monday, October 14, 2024 (Room: Catalina)

Time	What
07:15	Breakfast (provided)
08:00	Opening
08:30	Keynote: Creating DSLs Made Easy, Saman Amarasinghe, Massachusetts Institute of Technology
09:30	Break
10:00	Session 1: Machine Learning (4 papers)
	Chair: Hyoungwook Nam, University of Illinois at Urbana-Champaign
	GraNNDis: Fast Distributed Graph Neural Network Training Framework for Multi-Server Clusters
	Jaeyong Song, Hongsun Jang, Hunseong Lim, Jaewon Jung (Seoul National University); Youngsok Kim (Yonsei University); Jinho Lee (Seoul National University)
	Activation Sequence Caching: High-Throughput and Memory-Efficient Generative Inference with a Single GPU
	Sowoong Kim, Eunyeong Sim (UNIST); Youngsam Shin, YeonGon Cho (Samsung Advanced Institute of Technology); Woongki Baek (UNIST)
	Improving Throughput-oriented LLM Inference with CPU Computations
	Daon Park, Bernhard Egger (Seoul National University)
	BOOM: Use your Desktop to Accurately Predict the Performance of Large Deep Neural Networks
	Qidong Su (University of Toronto / Vector Institute / CentML); Jiacheng Yang (University of Toronto / Vector Institute); Gennady Pekhimenko (University of Toronto / Vector Institute / CentML)
12:00	Lunch (provided)
13:30	Session 2: Architecture and Application Co-design (3 papers)
	Chair: Bernhard Egger, Seoul National University
	PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System
	Steve Rhyner, Haocong Luo (ETH Zurich); Juan Gómez-Luna (NVIDIA); Mohammad Sadrosadati (ETH Zurich); Jiawei Jiang (Wuhan University); Ataberk Olgun, Harshita Gupta (ETH Zurich); Ce Zhang (University of Chicago); Onur Mutlu (ETH Zurich)
	ZeD: A Generalized Accelerator for Variably Sparse Matrix Computations in ML
	Pranav Dangi, Zhenyu Bai, Rohan Juneja, Dhananjaya Wijerathne, Tulika Mitra (National University of Singapore)
	A Parallel Hash Table for Streaming Applications
	Magnus Östgren, Ioannis Sourdis (Chalmers University of Technology)
15:00	Break
15:30	Session 3: Parallelism (3 papers)
	Chair: Ying Jing, University of Illinois at Urbana-Champaign
	Leveraging Difference Recurrence Relations for High-Performance GPU Genome Alignment
	Alberto Zeni (Politecnico di Milano, Italy); Seth Onken (NVIDIA Corporation); Marco D. Santambrogio (Politecnico di Milano, Italy); Mehrzad Samadi (NVIDIA Corporation)
	ACE: Efficient GPU Kernel Concurrency for Input-Dependent Irregular Computational Graphs
	Sankeerth Durvasula, Junan Zhao, Raymond Kiguru, Yushi Guan, Zhonghan Chen, Nandita Vijaykumar (University of Toronto)
	Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search
	Jakob Hartmann, Guoliang He, Eiko Yoneki (University of Cambridge)
17:30	Business Meeting

Tuesday, October 15, 2024 (Room: Catalina)

Time	What
07:15	Breakfast (provided)
08:30	Keynote: Every “Bit” Matters: Fostering Innovation in Deep Learning and Beyond, Andreas Moshovos, University of Toronto
09:30	Break
10:00	Session 4: Compilers (4 papers)
	Chair: J. Nelson Amaral, University of Alberta
	A Transducers-based Programming Framework for Efficient Data Transformation
	Tri Nguyen, Michela Becchi (North Carolina State University)
	MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations
	Akash Dutta, Ali Jannesari (Iowa State University)
	Parallel Loop Locality Analysis for Symbolic Thread Counts
	Fangzhou Liu, Yifan Zhu, Shaotong Sun, Chen Ding, Wesley Smith, Kaave Hosseini (University of Rochester)
	PipeGen: Automated Transformation of a Single-Core Pipeline into a Multicore Pipeline for a Given Memory Consistency Model
	An Qi Zhang (University of Utah); Andrés Goens (University of Amsterdam); Nicolai Oswald (NVIDIA); Tobias Grosser (University of Cambridge); Daniel Sorin (Duke University); Vijay Nagarajan (University of Utah)
12:00	Lunch (provided)
13:30	Session 5: Security (4 papers)
	Chair: Donald Yeung, University of Maryland
	FriendlyFoe: Adversarial Machine Learning as a Practical Architectural Defense against Side Channel Attacks
	Hyoungwook Nam (University of Illinois at Urbana-Champaign); Raghavendra Pradyumna Pothukuchi (Yale University); Bo Li, Nam Sung Kim (University of Illinois at Urbana-Champaign); Josep Torrellas (University of Illinois at Urbana-Champaign)
	Toast: A Heterogeneous Memory Management System
	Maurice Bailleu (Huawei Research); Dimitrios Stavrakakis (TU Munich / The University of Edinburgh); Rodrigo Rocha (Huawei Research); Soham Chakraborty (TU Delft); Deepak Garg (Max Planck Institute for Software Systems (MPI-SWS)); Pramod Bhatotia (TU Munich / The University of Edinburgh)
	BoostCom: Towards Efficient Universal Fully Homomorphic Encryption by Boosting the Word-wise Comparisons
	Ardhi Wiratama Baskara Yudha, Jiaqi Xue, Qian Lou (University of Central Florida); Huiyang Zhou (North Carolina State University); Yan Solihin (University of Central Florida)
	SZKP: A Scalable Accelerator Architecture for Zero-Knowledge Proofs
	Alhad Daftardar, Brandon Reagen, Siddharth Garg (New York University)
15:30	Break
16:00	Session 6: Quantum & Neuromorphic (3 papers)
	Chair: Qian Lou, University of Central Florida
	Recompiling QAOA Circuits on Various Rotational Directions
	Enhyeok Jang, Dongho Ha, Seungwoo Choi, Youngmin Kim, Jaewon Kwon, Yongju Lee, Sungwoo Ahn, Hyungseok Kim, Won Woo Ro (Yonsei University)
	Faster and More Reliable Quantum SWAPs via Native Gates
	Pranav Gokhale, Teague Tomesh (Infleqtion); Martin Suchara (Microsoft); Fred Chong (University of Chicago)
	NavCim: Comprehensive Design Space Exploration for Analog Computing-in-Memory Architectures
	Juseong Park, Boseok Kim (Pohang University of Science and Technology); Hyojin Sung (Seoul National University)
17:30	Break
18:00	Conference Banquet (Room: Gallery)

Wednesday, October 16, 2024 (Room: Catalina)

Time	What
07:15	Breakfast (provided)
08:00	Session 7: Memory (3 papers)
	Chair: Ioannis Sourdis, Chalmers University of Technology
	MORSE: Memory Overwrite Time Guided Soft Writes to Improve ReRAM Energy and Endurance
	Devesh Singh (University of Maryland, College Park); Donald Yeung (University of Maryland)
	Trimma: Trimming Metadata Storage and Latency for Hybrid Memory Systems
	Yiwei Li, Boyu Tian, Mingyu Gao (Tsinghua University)
	Chimera: Leveraging Hybrid Offsets for Efficient Data Prefetching
	Shuiyi He, Zicong Wang, Xuan Tang, Qiyao Sun, Dezun Dong (National University of Defense Technology)
09:30	Break
10:00	Presentations by SRC poster winners
11:00	Session 8: Address Translation, Coherence, and Communication (3 papers)
	Chair: Devesh Singh, Samsung
	Rethinking Page Table Structure for Fast Address Translation in GPUs: A Fixed-Size Hashed Page Table
	Sungbin Jang, Junhyeok Park, Osang Kwon, Yongho Lee, Seokin Hong (Sungkyunkwan University)
	Mozart: Taming Taxes and Composing Accelerators with Shared-Memory
	Vignesh Suresh, Bakshree Mishra, Ying Jing, Zeran Zhu, Naiyin Jin, Charles Block (University of Illinois at Urbana-Champaign); Paolo Mantovani, Davide Giri, Joseph Zuckerman, Luca P. Carloni (Columbia University); Sarita Adve (University of Illinois at Urbana-Champaign)
	vSPACE: Supporting Parallel Network Packet Processing in Virtualized Environments through Dynamic Core Management
	Gyeongseo Park, Minho Kim (DGIST); Ki-Dong Kang (DGIST/ETRI); Yunhyeong Jeon, Sungju Kim, Hyosang Kim (DGIST); Daehoon Kim (Yonsei University)
12:30	Closing

Keynotes

Monday, October 14, 2024: Creating DSLs Made Easy, Saman Amarasinghe

Today, high-performance applications rely on libraries of hand-optimized kernels, with thousands available across various domains and architectures, while Domain-Specific Languages (DSLs) and their accompanying compilers remain relatively rare. A well-designed DSL can describe a much wider variety of programs within a given domain than even the most comprehensive library, while also unlocking powerful cross-function and global domain-specific optimizations that hand-optimized kernels cannot achieve. As Hennessy and Patterson emphasized in their Turing Award Lecture, the widespread adoption of Domain-Specific Accelerators depends on the availability of DSLs that can fully harness these accelerators' performance. However, building high-performance DSLs is complex and time-consuming, often requiring compiler experts to devote years to development.

In this talk, I will introduce BuildIt, a C++ framework designed for the rapid prototyping of high-performance DSLs. BuildIt uses a multi-stage programming approach to combine the flexibility of libraries with the performance and specialization of code generation. With BuildIt, domain experts can transform existing libraries into efficient, specialized compilers simply by changing the types of their variables. Moreover, it lets them implement analyses and transformations without writing traditional compiler code. Currently, BuildIt supports code generation for multi-core CPUs and GPUs, with FPGA support coming soon. I will also showcase three DSLs created with BuildIt to highlight its power and ease of use: a reimplementation of the GraphIt graph computing language, the BREeze DSL for regular expressions, and NetBlocks, a DSL for custom network protocol development.

Bio:

Saman Amarasinghe is a Professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), where he leads the Commit compiler group. Under his leadership, the Commit group has developed a wide range of innovative programming languages and compilers, including StreamIt, StreamJIT, PetaBricks, Halide, TACO, Finch, Systec, GraphIt, Simit, MILK, Cimple, BioStream, NetBlocks, BREeze, CoLa, Shim, AskIt, and Seq. Additionally, the group has created compiler and runtime frameworks such as DynamoRIO, Helium, Tiramisu, Codon, BuildIt, and D2X, as well as vectorization tools such as Superword Level Parallelism (SLP), goSLP, and VeGen. Saman’s team also developed Ithemal, a machine-learning-based performance predictor; Program Shepherding, which protects programs from external attacks; the OpenTuner extensible autotuner; and the Kendo deterministic execution system. He was also co-leader of the Raw architecture project. Outside academia, Saman has co-founded several companies, including Determina, Lanka Internet Services Ltd., Venti Technologies, DataCebo, and Exaloop. He earned his BS in Electrical Engineering and Computer Science from Cornell University in 1988, and his MSEE and Ph.D. from Stanford University in 1990 and 1997, respectively. He is also a Fellow of the ACM.

Tuesday, October 15, 2024: Every “Bit” Matters: Fostering Innovation in Deep Learning and Beyond, Andreas Moshovos

Computers are used across many disciplines. By developing increasingly powerful and efficient machines, we can foster innovation and have a broad impact on many other communities. This talk presents our perspective and approach to enabling more efficient hardware/software stacks for deep learning. It will focus on our most recent efforts to enable more efficient training and inference through datatype learning and the optimized encoding of information during data transfers. Rather than guessing what good datatypes might be (the current state-of-the-art practice), our methods learn them. Further, rather than storing data as-is, our methods pack it efficiently into fewer bits without loss of information. We will also comment on the challenges that lie ahead in this field.

Moving beyond our group's specific work, I will reflect on how dissemination and funding practices and perspectives affect academic innovation and our growth and wellbeing as researchers and practitioners, and on their broader implications for the future growth of our community.

Bio:

Andreas Moshovos, along with his students, has been answering the question “what is the best possible digital computation structure/software combination to solve problem X or to run application Y?”, where “best” is a characteristic (or combination thereof) such as power, cost, or complexity. Much of his earlier work has been on high-performance processor and memory system design, and it has influenced commercial designs. His more recent work has been on hardware/software acceleration methods for machine learning. He has been with the University of Toronto since 2000, but has also taught at École Polytechnique Fédérale de Lausanne, Northwestern University, the University of Athens, and the Hellenic Open University. He was the Scientific Director of the Canadian NSERC COHESA Research Network, a consortium of 25+ research groups and industry partners targeting machine learning optimizations.

Important Dates and Deadlines

Conference Papers:

Abstract submission deadline: ~~Mar 22, 2024~~
~~Extended to March 25, 2024~~
Paper submission deadline: ~~Mar 27, 2024~~
~~Extended to April 1, 2024~~
~~Rebuttal period: Jun 3-9, 2024~~
~~Author notification: Jul 1, 2024~~
Artifact submission: Jul 8, 2024
Camera ready papers: Aug 24, 2024