From WorkshopOnDeterministicMultiprocessingAndParallelProgramming

Rethinking Parallel Execution for Multicore Processors

Guri Sohi, University of Wisconsin

Parallel processing has finally come into the mainstream of computing, as almost every processing chip going forward will have multiple processing cores. Software, which is typically written as a sequential program intended to run on a uniprocessor, now has to be written to run correctly and efficiently on multiple processor cores. This is a decades-old problem, and several decades of research have been invested in trying to find solutions, but success has been very limited. New ways of achieving parallel execution are needed in order to easily make use of multicore processors.

The prevailing wisdom, based upon many decades of experience, says that a parallel program representation is required to achieve a parallel execution on multiple processors. This talk will challenge that decades-old wisdom and present a new paradigm for achieving parallel execution. This paradigm, which we call Dynamic Serialization, achieves parallel execution with a sequential program representation using an ordinary object-oriented programming language (C++) on stock commercial hardware using an ordinary stock compiler. Unlike other approaches that are being explored, such as thread-level speculation and transactional memory, dynamic serialization currently employs no speculation in either hardware or software. This talk will present experimental results gathered on real machines (AMD Barcelona, Intel Nehalem, Sun Niagara) showing that the new paradigm can achieve parallel execution speedups comparable to traditional parallel execution techniques, but can do so with a sequential program representation that does not suffer from the challenges and drawbacks of a parallel representation.

Simple Read/Write Effect Typing in C#

Joe Duffy, Microsoft

This talk presents a simple imperative type system extension that makes effects known to the Common Language Runtime (CLR)'s type system. The motivation for this work is enabling deterministic and provably race-free concurrent programming statically, by construction, with a model that spans data parallel and message passing programming styles. By fusing "effects" and "regions" using simple and natural object-oriented concepts, it has been possible to seamlessly add this support to an existing programming language and framework: namely, C# and the .NET Framework. And what's better, it allows parallelism to be extracted often without the programmer knowing it, in constructs like data comprehensions and library routines.

BIO: Joe Duffy is an architect in the technical strategy group at Microsoft, where his main focus is on concurrent languages, runtimes, and operating systems. Previously he led the creation of Parallel Language Integrated Query (PLINQ), Parallel Extensions to the .NET Framework, and software transactional memory on the Common Language Runtime (CLR) team. His latest book, Concurrent Programming on Windows (Addison-Wesley), was published in November 2008, and his next, Notation & Thought: A Brief History of Programming Languages, is well underway. When not working, Joe enjoys studying the history of mathematics, dabbling in various branches of abstract algebra, and, most of all, spending time and traveling with his fiancée Kim.

Disciplined Support for Deterministic and Non-deterministic Parallelism

Vikram Adve, University of Illinois

Simple Architectural Support for Determinism

Josep Torrellas, University of Illinois

Machine architecture will play a role in the push for determinism. It may take the more expensive approach of enforcing determinism, or may follow the cheaper approach of encouraging or even just checking for determinism. This talk will explore some alternatives, focusing on the latter approaches.

Grace: Safe Multithreaded Programming for C/C++

Emery Berger, University of Massachusetts

The shift from single to multiple core architectures means that programmers must write concurrent, multithreaded programs in order to increase application performance. Unfortunately, multithreaded applications are susceptible to numerous errors, including deadlocks, race conditions, atomicity violations, and order violations. These errors are notoriously difficult for programmers to debug.

This talk presents Grace, a software-only runtime system designed to enforce the correctness of a class of multithreaded programs: those based on fork-join parallelism. By turning threads into processes, leveraging virtual memory protection, and imposing a sequential commit protocol, Grace provides programmers with the appearance of deterministic, sequential execution, while taking advantage of available processing cores to run code concurrently and efficiently. Experimental results demonstrate Grace's effectiveness: with modest code changes (1-16 lines) across a suite of computationally-intensive benchmarks, Grace can achieve high scalability and performance while enforcing correct execution.

Deterministic Shared Memory Multiprocessing

Joseph Devietti, University of Washington

Current shared memory multicore and multiprocessor systems are nondeterministic. This frustrates debugging and limits the ability to properly test multithreaded code, becoming a major stumbling block to the much-needed widespread adoption of parallel programming. In this talk we make the case for fully deterministic shared memory multiprocessing (DMP). Previous approaches to coping with nondeterminism in multithreaded programs have focused on replay, a technique useful only for debugging. In contrast, while DMP systems are directly useful for debugging by offering repeatability by default, we argue that parallel programs should execute deterministically in the field as well. We show that determinism can be provided with little performance cost on future hardware.

CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution

Tom Bergan, University of Washington

This talk presents CoreDet, a software-only system that enforces full execution-level determinism of arbitrary multithreaded C and C++ programs. CoreDet builds on techniques developed by the deterministic multiprocessing (DMP) project, but unlike DMP, CoreDet does not require any special hardware. In addition, we have developed the first deterministic execution protocol that uses a relaxed (non-sequentially consistent) memory model. Our empirical evaluation shows that CoreDet imposes some overhead but can achieve scalability comparable to nondeterministic execution.

Concurrent Collections (CnC): a deterministic parallel language

Kathleen Knobe, Intel

Experience of the last few decades indicates that parallel programming is hard. The problem is only getting worse as the set of domain experts that must deal with parallelism is expanding rapidly. A common approach is to find a few parallelism constructs to add to a serial language. The hope is that these constructs are simple enough for the domain expert to understand and yet powerful enough to enable efficient execution. This compromise often results in a system that is not simple enough for the domain experts and at the same time may unnecessarily limit the flexibility for the tuning experts. Because the issues of mapping to the parallel architecture are intertwined with issues of application functionality, both activities suffer. Modification to the functionality must take care not to break the parallelism constructs and modifications to the parallelism constructs must understand the application code enough to ensure that the modifications are legal.

In Concurrent Collections (CnC) we view the problem as one of interface design. A good interface separates one level of concern from another and yet facilitates the necessary communication across that interface. In CnC, the domain expert expresses the semantics of the application as a CnC specification that has nothing to do with parallelism. The domain expert explicitly states the partial ordering requirements without common over-constraints. It is all about the application. The domain expert does not think about parallelism. The tuning expert, given these constraints, does not need to understand the domain of the application but just the ordering constraints. The two experts may be the same person at different times or in fact different people. The tuning expert might not be a person but rather a static analyzer or a runtime system. If both are people, the CnC specification can be used to raise the level of their discourse.

This talk will introduce Concurrent Collections with a focus on the deterministic aspects of the language.

Concurrency unit testing

Shaz Qadeer, Microsoft

In this talk, I will describe the progress we have made over the last three years towards addressing the problem of unit testing for concurrent software modules. I will present both dynamic and static approaches to concurrency unit testing, exemplified by the CHESS and STORM tools respectively.

The XMOS architecture and the XC language

David May, University of Bristol

I will outline the XMOS architecture for multi-threaded multi-core systems - along with the XC concurrent programming language. XC is a variant of C extended for concurrency, communication, timing and input-output. We compile concurrent XC programs directly to the instruction set of our XCore processors; the cost of communications, synchronisations, inputs and outputs is reduced to that of loads, stores, branches and arithmetic!

XC is based on CSP and occam; it supports deterministic multi-processing and uses guards to control non-deterministic event-handling. Beyond this, the XMOS architecture is deterministic in a timing sense - our tools can predict exactly how long paths through the program will take - so we can guarantee the correct behaviour of real-time programs.

I will also outline some ideas I have for further language, compiler and architecture enhancements especially in the representation and implementation of communication and timing.

Safe Parallel Programming in a Sequential Language

Vijay Saraswat, IBM Research

I will present a set of techniques powerful enough to establish compile-time safety for a large class of high-performance parallel programs, particularly those dealing with (mutable, distributed) arrays. A safe parallel program is one which has the same semantics as its serial elision -- hence is scheduler-determinate. This means that in fact the parallel program can be presented as a sequential program augmented with concurrency refactorings (e.g. @transform(f), where f is a unitary transform on a loop nest) which may be thought of as introducing structured concurrency constructs (finish, async) in a semantics-preserving way.

The set of techniques presented rely on a modular constraint-based static analysis which extends the constrained types framework for OO languages introduced in OOPSLA 08 to constrained effects.