Time | What |
---|---|
07:15 | Breakfast (provided) |
08:00 | Tutorial: QVT: The Quantum Visualization Toolkit |
09:30–10:00 | Break |
12:00 | Lunch (provided) |
13:30 | Workshop: The 4th International Workshop on Machine Learning for Software/Hardware Co-Design (MLS/H) |
15:00–15:30 | Break |
18:00 | Reception and Posters |
Time | What |
---|---|
07:15 | Breakfast (provided) |
08:00 | Opening |
08:00 | Keynote: Creating DSLs Made Easy, Saman Amarasinghe, Massachusetts Institute of Technology |
09:30 | Break |
10:00 | Session 1: Machine Learning (4 papers)<br>Chair: Hyoungwook Nam, University of Illinois at Urbana-Champaign |
12:00 | Lunch (provided) |
13:30 | Session 2: Architecture and Application Co-design (3 papers)<br>Chair: Bernhard Egger, Seoul National University |
15:00 | Break |
15:30 | Session 3: Parallelism (3 papers)<br>Chair: Ying Jing, University of Illinois at Urbana-Champaign |
17:30 | Business Meeting |
Time | What |
---|---|
07:15 | Breakfast (provided) |
08:30 | Keynote: Every “Bit” Matters: Fostering Innovation in Deep Learning and Beyond, Andreas Moshovos, University of Toronto |
09:30 | Break |
10:00 | Session 4: Compilers (4 papers)<br>Chair: J. Nelson Amaral, University of Alberta |
12:00 | Lunch (provided) |
13:30 | Session 5: Security (4 papers)<br>Chair: Donald Yeung, University of Maryland |
15:30 | Break |
16:00 | Session 6: Quantum & Neuromorphic (3 papers)<br>Chair: Qian Lou, University of Central Florida |
17:30 | Break |
18:00 | Conference Banquet (Room: Gallery) |
Time | What |
---|---|
07:15 | Breakfast (provided) |
08:00 | Session 7: Memory (3 papers)<br>Chair: Ioannis Sourdis, Chalmers University of Technology |
09:30 | Break |
10:00 | Presentations by SRC poster winners |
11:00 | Session 8: Address Translation, Coherence, and Communication (3 papers)<br>Chair: Devesh Singh, Samsung |
12:30 | Closing |
Today, applications that require high performance rely on libraries of hand-optimized kernels, with thousands available across various domains and architectures, while Domain-Specific Languages (DSLs) and their accompanying compilers remain relatively rare. A well-designed DSL can describe a much wider variety of programs within a given domain than even the most comprehensive library, while also unlocking powerful cross-function and global domain-specific optimizations that hand-optimized kernels cannot achieve. As Hennessy and Patterson emphasized in their Turing Award Lecture, the widespread adoption of Domain-Specific Accelerators depends on the availability of DSLs that can fully harness these accelerators' performance. However, building high-performance DSLs is complex and time-consuming, often requiring compiler experts to devote years to development.
In this talk, I will introduce BuildIt, a C++ framework designed for the rapid prototyping of high-performance DSLs. BuildIt uses a multi-stage programming approach to combine the flexibility of libraries with the performance and specialization of code generation. With BuildIt, domain experts can transform existing libraries into efficient, specialized compilers simply by changing the types of variables, and they can implement analyses and transformations without writing traditional compiler code. BuildIt currently generates code for multi-core CPUs and GPUs, with FPGA support coming soon. I will also showcase three DSLs created with BuildIt to highlight its power and ease of use: a reimplementation of the GraphIt graph computing language, the BREeze DSL for regular expressions, and NetBlocks, a DSL for custom network protocol development.

Saman Amarasinghe is a Professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), where he leads the Commit compiler group. Under his leadership, the Commit group has developed a wide range of innovative programming languages and compilers, including StreamIt, StreamJIT, PetaBricks, Halide, TACO, Finch, Systec, GraphIt, Simit, MILK, Cimple, BioStream, NetBlocks, BREeze, CoLa, Shim, AskIt, and Seq. The group has also created compiler and runtime frameworks such as DynamoRIO, Helium, Tiramisu, Codon, BuildIt, and D2X, as well as vectorization tools like Superword Level Parallelism (SLP), goSLP, and VeGen. Saman's team also developed Ithemal, a machine-learning-based performance predictor; Program Shepherding, which protects programs from external attacks; the OpenTuner extendable autotuner; and the Kendo deterministic execution system. He was also co-leader of the Raw architecture project. Outside academia, Saman has co-founded several companies, including Determina, Lanka Internet Services Ltd., Venti Technologies, DataCebo, and Exaloop. He earned his BS in Electrical Engineering and Computer Science from Cornell University in 1988, and his MSEE and Ph.D. from Stanford University in 1990 and 1997, respectively. He is also a Fellow of the ACM.
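To make the abstract's "change the types of variables" idea concrete, the sketch below is modeled on BuildIt's published power-function example. The headers and the `extract_function_ast` / `generate_code` entry points are reproduced from memory and should be treated as assumptions to check against the BuildIt repository; the point is that a `static_var` loop counter is evaluated at generation time (fully unrolling the loop), while `dyn_var` values survive into the generated code.

```cpp
// Sketch modeled on BuildIt's power-function example; headers and the
// extraction/codegen entry points are assumptions to verify against the
// BuildIt repository.
#include <iostream>

#include "blocks/c_code_generator.h"
#include "builder/builder_context.h"
#include "builder/dyn_var.h"
#include "builder/static_var.h"

using builder::dyn_var;
using builder::static_var;

// `base` is a second-stage (runtime) value, so it appears in the generated
// code; the loop counter is a first-stage (generation-time) value, so the
// loop disappears and only the multiplies remain.
static dyn_var<int> power_15(dyn_var<int> base) {
    dyn_var<int> result = 1;
    for (static_var<int> i = 0; i < 15; i++)
        result = result * base;
    return result;
}

int main() {
    builder::builder_context context;
    // Stage power_15 and emit a specialized C function. Swapping which
    // variables are dyn_var vs. static_var changes what gets specialized,
    // without writing any traditional compiler code.
    auto ast = context.extract_function_ast(power_15, "power_15");
    block::c_code_generator::generate_code(ast, std::cout, 0);
    return 0;
}
```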
Computers are widely used across disciplines. By developing increasingly powerful and efficient machines, we can foster innovation and have a broad impact across many other communities. This talk presents our perspective on, and approach to, enabling more efficient hardware/software stacks for deep learning. It focuses on our most recent efforts to make training and inference more efficient through datatype learning and the optimized encoding of information during data transfers. Rather than guessing what good datatypes might be (the current state-of-the-art practice), our methods learn them. Further, rather than storing data as-is, our methods pack it into fewer bits without loss of information. We will also comment on the challenges that lie ahead in this field.
Moving beyond the specific work of our group, I will reflect on how dissemination and funding practices and perspectives affect academic innovation and our growth and wellbeing as researchers and practitioners, and on the broader implications for the future growth of our community.
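As a rough illustration of the "fewer bits without loss of information" idea in the abstract above, the sketch below shows generic lossless bit-packing: each value in a block is stored in just enough bits to represent the block's maximum. This is a textbook technique standing in for the speaker's methods, which learn datatypes and encodings rather than applying a fixed rule; every name in the sketch is invented for illustration.

```cpp
// Generic lossless bit-packing illustration, not the speaker's method.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Smallest bit width that represents every value in vals (at least 1).
unsigned bits_needed(const std::vector<uint32_t>& vals) {
    uint32_t max = 0;
    for (uint32_t v : vals) max = std::max(max, v);
    unsigned bits = 1;
    while ((max >> bits) != 0) bits++;
    return bits;
}

// Pack each value into `bits` bits of a contiguous 64-bit word stream.
std::vector<uint64_t> pack(const std::vector<uint32_t>& vals, unsigned bits) {
    std::vector<uint64_t> out((vals.size() * bits + 63) / 64, 0);
    for (size_t i = 0; i < vals.size(); i++) {
        size_t pos = i * bits;  // bit offset of value i in the stream
        out[pos / 64] |= (uint64_t)vals[i] << (pos % 64);
        if (pos % 64 + bits > 64)  // value straddles a word boundary
            out[pos / 64 + 1] |= (uint64_t)vals[i] >> (64 - pos % 64);
    }
    return out;
}

// Recover value i exactly: the encoding is lossless.
uint32_t unpack(const std::vector<uint64_t>& packed, unsigned bits, size_t i) {
    size_t pos = i * bits;
    uint64_t v = packed[pos / 64] >> (pos % 64);
    if (pos % 64 + bits > 64)
        v |= packed[pos / 64 + 1] << (64 - pos % 64);
    return (uint32_t)(v & ((1ull << bits) - 1));
}

int main() {
    std::vector<uint32_t> vals = {3, 0, 9, 7, 1, 8, 2, 5};
    unsigned bits = bits_needed(vals);  // max is 9, so 4 bits per value
    auto packed = pack(vals, bits);
    std::cout << vals.size() * 32 << " raw bits -> "
              << vals.size() * bits << " packed bits\n";
    for (size_t i = 0; i < vals.size(); i++)
        std::cout << unpack(packed, bits, i) << " ";  // prints the originals
    std::cout << "\n";
    return 0;
}
```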
Andreas Moshovos, along with his students, has been answering the question “what is the best possible digital computation structure/software combination to solve problem X or to run application Y?”, where “best” is a characteristic such as power, cost, or complexity, or a combination thereof. Much of his earlier work was on high-performance processor and memory system design and has influenced commercial designs. His more recent work focuses on hardware/software acceleration methods for machine learning. He has been with the University of Toronto since 2000, and has also taught at École Polytechnique Fédérale de Lausanne, Northwestern University, the University of Athens, and the Hellenic Open University. He was the Scientific Director of the Canadian NSERC COHESA Research Network, a consortium of more than 25 research groups and industry partners targeting machine learning optimizations.
Conference: October 13–16, 2024