CATALOG DESCRIPTION: Introduction to parallel computing for scientists and engineers. Shared memory parallel architectures and programming, distributed memory, message-passing data-parallel architectures, and programming.

REQUIRED TEXTS: A. Grama, A. Gupta, G. Karypis, and Vipin Kumar, Introduction to Parallel Computing , Addison Wesley, 2 nd edition, 2003

REFERENCE TEXTS:

•  I. Foster, Designing and Building Parallel Programs, Addison Wesley 1995.

•  B. Bauer, Practical Parallel Programming , Academic Press 1992.

•  W. Gropp et al, Using MPI: Portable Parallel Programming with the Message Passing Interface , MIT Press 1994.

•  C. Koelbel et al, The High-Performance Fortran Handbook , MIT Press 1994.

COURSE COORDINATOR: Prof. Gokhan Memik

COURSE GOALS: To provide an introduction to the field of parallel computing. The goals are to provide an overview of the three basic types of parallel computing: shared memory, distributed memory message-passing, and data parallel computing, with hands-on experience with real parallel programming on actual parallel machines.

PREREQUISITES : EECS 361 and EECS 230 or EECS 231 or EECS 211

PREREQUISITES BY TOPIC:

  1. An overview of computer architecture
  2. Basic concepts of processors, ALUs, memories, caches, input-output
  3. Basic to intermediate concepts on programming of serial computers using C or Fortran
  4. Simple concepts of data structures like arrays and link lists in programs
  5. Some knowledge of scientific and engineering applications

DETAILED COURSE TOPICS:

Week 1 : Introduction to parallel computing: motivation for parallel computing, options of parallel computing, economics of parallel computing, basic concepts of parallel algorithms. Introduction to parallel programming: data and task parallelism, coarse and fine grain parallelism, performance of parallel programs, load balancing and scheduling, analysis of simple parallel programs.

Week 2 : Overview of shared memory parallel architectures: memory organization, interconnect organization, cache coherence, case studies of machines such as SGI Challenge, IBM J-30, HP/Convex Exemplar. Introduction to shared memory parallel programming: shared memory model, process creation and destruction, mutual exclusion, locks, barriers.

Week 3: Explicit shared memory programming: loop scheduling, static and dynamic, loop parallelization strategies. Shared memory parallel programming: use of PTHREADS libraries, case studies of explicit parallel programming, computation of PI, matrix multiplication, solution of partial differential equations.

Week 4 : Implicit shared memory parallel programming: use of compiler directives for parallel programming, DOALL and DOACROSS and PRAGMA directives for loop level parallelism, parallel programming examples using directives.

Week 5 : Distributed memory multicomputer architectures: overview of distributed memory parallel machines, message passing schemes, store and forward versus wormhole routing, interconnection networks, case studies of parallel machines such as Intel Paragon, IBM SP-2, Thinking Machine CM-5. Global Communication operations in distributed memory machines: one-to-all broadcast, reduction, shift, scatter, gather operations, analysis of performance of above operations on various parallel architectures.

Week 6 : Introduction to message-passing programming: basics of message passing, global and local addresses, single-program multiple data (SPMD programs) introduction to Message Passing Interface (MPI). Intermediate concepts in message passing programming: global and local addresses, loop scheduling for parallel loops. Advanced message-passing concepts: topologies, and decompositions, case studies of example applications, matrix multiplication, solution of partial differential equations.

Week 7 : Introduction to SIMD parallel architectures: Single-instruction multiple data stream architectures, control and data units, interconnection networks, case studies of machines such as Thinking Machines CM-2, CM-5 and Masspar MP-2. Introduction to data parallel programming: Fortran-90, array sections, array operations, array intrinsic operations.

Week 8 : Introduction to High Performance Fortran (HPF): FORALL directives, INDEPENDENT directives, simple parallel programs. High Performance Fortran data distribution and alignment directives, simple parallel programming examples, matrix multiplication, solution of partial differential equations.

Week 9 : Methodology for Parallel Algorithm Design: concurrency, locality, scalability, modularity; partitioning, agglomeration, communication, mapping, performance analysis of parallel algorithms. Parallel Matrix Algorithms: matrix representations, parallel dense matrix operations, matrix-vector, matrix-matrix multiplication, solutions of linear system of equations.

Week 10 : Parallel Sparse Matrix Solvers: sparse matrix representations, parallel iterative methods, parallel direct methods. Parallel Search Algorithms: optimization methods, parallel best first search, parallel depth-first search, speedup anomalies.

COMPUTER USAGE: Students get hands-on parallel programming experience on 3 parallel machines at the Electrical and Computer Engineering Department, including a 16 processor IBM SP-2 distributed memory machine, an 8 processor IBM J-40 shared memory machine, and an 8-processor SGI Origin 2000 distributed shared memory multiprocessor. In addition, students will use the machines in the Wilkinson Lab as a cluster as well as access a dedicated cluster with 16 processors for the final project.

HOMEWORK ASSIGNMENTS:

Homework 1: Design problems dealing with shared memory parallel programming, examples of program transformations to parallelize loops, use of explicit and implicit parallel programs using both the SGI directives and PTHREADS.

Homework 2: Design problems dealing with distributed memory message-passing parallel programming, use of MPI, analysis of communication patterns.

Homework 3: Design programs related to data parallel programming, use of High Performance Fortran, data layouts and alignments.

Homework 4: Design of parallel algorithms for various problems including matrix operations on dense and sparse matrices, analysis of parallel algorithms.

LABORATORY PROJECTS:

Lab 1: Development of a parallel program for solving a set of linear system of equations using Gaussian Elimination using both explicit parallel programs with PTHREADS and implicit parallel programs using SGI directions, experiments on SGI Origin 2000 and IBM J-40 shared memory multiprocessors.

Lab 2: Development of a parallel program for solving a set of linear system of equations using Gaussian Elimination using message-passing parallel programming using C or Fortran with MPI message passing, and experiments on IBM SP-2 distributed memory and SGI Origin 2000 multiprocessor for portability.

Lab 3: Development of a parallel program for solving a set of linear system of equations using Gaussian Elimination using data parallel programming with High Performance Fortran and experiments on the IBM SP-2 distributed memory multiprocessor.

GRADES:

  • Four homeworks - 20 %
  • Three labs - 20 %
  • Midterm exam - 30%
  • Final exam - 30%

COURSE OBJECTIVES: When a student completes this course, s/he should be able to:

  1. Solve a given problem using parallel computing. Analyze the problem for various ways of parallelization, and design the best parallel algorithm.
  2. Have a broad understanding of shared memory parallel architectures and programming.
  3. Design a shared memory parallel program for a given parallel algorithm using both explicit and implicit parallel programming, measure real speedups, identify bottlenecks, and devise improvements to the parallel program.
  4. Have a broad understanding of distributed memory parallel architectures and programming.
  5. Design a message-passing distributed memory parallel program for a given parallel algorithm using the portable Message-Passing Interface (MPI), measure real speedups, identify bottlenecks, and devise improvements to the parallel program.
  6. Have a broad understanding of data parallel architectures and programming.
  7. Design a data parallel program for a given parallel algorithm using High Performance Fortran (HPF), measure real speedups, identify bottlenecks, and devise improvements to the parallel program.

ABET CONTENT CATEGORY: 100% Engineering (Design component).