EECS 358 - Introduction to Parallel Computing
CATALOG DESCRIPTION: Introduction to parallel computing for scientists and engineers. Shared memory parallel architectures and programming, distributed memory, message-passing data-parallel architectures, and programming.
REQUIRED TEXTS: A. Grama, A. Gupta, G. Karypis, and Vipin Kumar, Introduction to Parallel Computing , Addison Wesley, 2 nd edition, 2003
• I. Foster, Designing and Building Parallel Programs, Addison Wesley 1995.
• B. Bauer, Practical Parallel Programming , Academic Press 1992.
• W. Gropp et al, Using MPI: Portable Parallel Programming with the Message Passing Interface , MIT Press 1994.
• C. Koelbel et al, The High-Performance Fortran Handbook , MIT Press 1994.
COURSE COORDINATOR: Prof. Gokhan Memik
COURSE GOALS: To provide an introduction to the field of parallel computing. The goals are to provide an overview of the three basic types of parallel computing: shared memory, distributed memory message-passing, and data parallel computing, with hands-on experience with real parallel programming on actual parallel machines.
PREREQUISITES : EECS 361 and EECS 230 or EECS 231 or EECS 211
PREREQUISITES BY TOPIC:
- An overview of computer architecture
- Basic concepts of processors, ALUs, memories, caches, input-output
- Basic to intermediate concepts on programming of serial computers using C or Fortran
- Simple concepts of data structures like arrays and link lists in programs
- Some knowledge of scientific and engineering applications
DETAILED COURSE TOPICS:
Week 1 : Introduction to parallel computing: motivation for parallel computing, options of parallel computing, economics of parallel computing, basic concepts of parallel algorithms. Introduction to parallel programming: data and task parallelism, coarse and fine grain parallelism, performance of parallel programs, load balancing and scheduling, analysis of simple parallel programs.
Week 2 : Overview of shared memory parallel architectures: memory organization, interconnect organization, cache coherence, case studies of machines such as SGI Challenge, IBM J-30, HP/Convex Exemplar. Introduction to shared memory parallel programming: shared memory model, process creation and destruction, mutual exclusion, locks, barriers.
Week 3: Explicit shared memory programming: loop scheduling, static and dynamic, loop parallelization strategies. Shared memory parallel programming: use of PTHREADS libraries, case studies of explicit parallel programming, computation of PI, matrix multiplication, solution of partial differential equations.
Week 4 : Implicit shared memory parallel programming: use of compiler directives for parallel programming, DOALL and DOACROSS and PRAGMA directives for loop level parallelism, parallel programming examples using directives.
Week 5 : Distributed memory multicomputer architectures: overview of distributed memory parallel machines, message passing schemes, store and forward versus wormhole routing, interconnection networks, case studies of parallel machines such as Intel Paragon, IBM SP-2, Thinking Machine CM-5. Global Communication operations in distributed memory machines: one-to-all broadcast, reduction, shift, scatter, gather operations, analysis of performance of above operations on various parallel architectures.
Week 6 : Introduction to message-passing programming: basics of message passing, global and local addresses, single-program multiple data (SPMD programs) introduction to Message Passing Interface (MPI). Intermediate concepts in message passing programming: global and local addresses, loop scheduling for parallel loops. Advanced message-passing concepts: topologies, and decompositions, case studies of example applications, matrix multiplication, solution of partial differential equations.
Week 7 : Introduction to SIMD parallel architectures: Single-instruction multiple data stream architectures, control and data units, interconnection networks, case studies of machines such as Thinking Machines CM-2, CM-5 and Masspar MP-2. Introduction to data parallel programming: Fortran-90, array sections, array operations, array intrinsic operations.
Week 8 : Introduction to High Performance Fortran (HPF): FORALL directives, INDEPENDENT directives, simple parallel programs. High Performance Fortran data distribution and alignment directives, simple parallel programming examples, matrix multiplication, solution of partial differential equations.
Week 9 : Methodology for Parallel Algorithm Design: concurrency, locality, scalability, modularity; partitioning, agglomeration, communication, mapping, performance analysis of parallel algorithms. Parallel Matrix Algorithms: matrix representations, parallel dense matrix operations, matrix-vector, matrix-matrix multiplication, solutions of linear system of equations.
Week 10 : Parallel Sparse Matrix Solvers: sparse matrix representations, parallel iterative methods, parallel direct methods. Parallel Search Algorithms: optimization methods, parallel best first search, parallel depth-first search, speedup anomalies.
COMPUTER USAGE: Students get hands-on parallel programming experience on 3 parallel machines at the Electrical and Computer Engineering Department, including a 16 processor IBM SP-2 distributed memory machine, an 8 processor IBM J-40 shared memory machine, and an 8-processor SGI Origin 2000 distributed shared memory multiprocessor. In addition, students will use the machines in the Wilkinson Lab as a cluster as well as access a dedicated cluster with 16 processors for the final project.
Homework 1: Design problems dealing with shared memory parallel programming, examples of program transformations to parallelize loops, use of explicit and implicit parallel programs using both the SGI directives and PTHREADS.
Homework 2: Design problems dealing with distributed memory message-passing parallel programming, use of MPI, analysis of communication patterns.
Homework 3: Design programs related to data parallel programming, use of High Performance Fortran, data layouts and alignments.
Homework 4: Design of parallel algorithms for various problems including matrix operations on dense and sparse matrices, analysis of parallel algorithms.
Lab 1: Development of a parallel program for solving a set of linear system of equations using Gaussian Elimination using both explicit parallel programs with PTHREADS and implicit parallel programs using SGI directions, experiments on SGI Origin 2000 and IBM J-40 shared memory multiprocessors.
Lab 2: Development of a parallel program for solving a set of linear system of equations using Gaussian Elimination using message-passing parallel programming using C or Fortran with MPI message passing, and experiments on IBM SP-2 distributed memory and SGI Origin 2000 multiprocessor for portability.
Lab 3: Development of a parallel program for solving a set of linear system of equations using Gaussian Elimination using data parallel programming with High Performance Fortran and experiments on the IBM SP-2 distributed memory multiprocessor.
- Four homeworks - 20 %
- Three labs - 20 %
- Midterm exam - 30%
- Final exam - 30%
COURSE OBJECTIVES: When a student completes this course, s/he should be able to:
- Solve a given problem using parallel computing. Analyze the problem for various ways of parallelization, and design the best parallel algorithm.
- Have a broad understanding of shared memory parallel architectures and programming.
- Design a shared memory parallel program for a given parallel algorithm using both explicit and implicit parallel programming, measure real speedups, identify bottlenecks, and devise improvements to the parallel program.
- Have a broad understanding of distributed memory parallel architectures and programming.
- Design a message-passing distributed memory parallel program for a given parallel algorithm using the portable Message-Passing Interface (MPI), measure real speedups, identify bottlenecks, and devise improvements to the parallel program.
- Have a broad understanding of data parallel architectures and programming.
- Design a data parallel program for a given parallel algorithm using High Performance Fortran (HPF), measure real speedups, identify bottlenecks, and devise improvements to the parallel program.
ABET CONTENT CATEGORY: 100% Engineering (Design component).