Challenges and Cross-Paradigm Fertilization
in Automatic Parallelization

Manuel Hermenegildo
herme@fi.upm.es
http://www.clip.dia.fi.upm.es/~herme/
Computer Science Department
Technical University of Madrid (UPM), Spain

Multiprocessing hardware is already available which offers significant advantages in either performance or cost/performance over uniprocessors. For example, departmental servers using fast, inexpensive off-the-shelf processors, are being produced at a fraction of the cost of the mainframes they replace, and even multiprocessor workstations are now relatively commonplace. Faster and more ubiquitous networks increase the potential of exploiting distributed execution.

One of the recurring facts that hamper the progress of widespread use of parallelism is that in practice, beyond some manually parallelized high volume applications and scientific codes, still comparatively few programs are written or transformed to exploit parallelism. The traditional explanation that parallelization is a difficult and error-prone task (see, e.g., [14]) seems to remain valid, and still points to the necessity of improving the tools used in the process. This includes developing languages that offer better support for parallel programming, improved libraries for supporting parallel programming on conventional languages, and significant progress in support tools, from parallelizing compilers to performance analyzers.

While manual parallelization may of course always have a place, parallelizing compilers are interesting in that they at the minimum have the potential to dramatically lessen the parallelization burden. However, despite much progress, it appears that significant challenges still remain in the area of automatic parallelization, including dealing well with both regular and irregular computations, performing efficient partitioning for both types of computations, automatically changing data structures for more efficient exploitation of parallelism, and developing parallelization techniques for new, more complex programming paradigms.

The goal of developing effective parallelizing compilers is being sought after in parallel and somewhat independently in the context of different programming paradigms or even individual languages. One of the points of this paper is to argue that, despite the apparent differences among imperative, functional, logic, and object oriented languages, the fundamental issues being tackled are quite similar and progress towards more effective parallelizing compilers for all programming paradigms can be sped up by cross fertilization. In order to elaborate this point we will center on two well known programming paradigms on which extensive research on automatic parallelization has been performed: imperative programming (e.g., in the context of FORTRAN) and logic (and constraint) programming.

Some very significant progress has been made in parallelizing compilers for regular, numerical computations, generally based on the FORTRAN language (see, e.g., [2]). This research has resulted in well known concepts and techniques including a well understood notion of independence (based on the Bernstein conditions or more recent notions of ``semantic independence''), sophisticated syntactic loop transformations, other transformations based on polytope models, extensive work on partitioning and placement, etc. On the other hand, the applicability of these techniques has remained quite limited for irregular or symbolic computations, and few systems deal with parallelization of procedure calls. Also, the techniques often rely on the relative cleanliness of traditional FORTRAN as a programming language and additional work is needed in order to extend them to other common languages like C or C++. These languages include features such as complex data structures and pointer manipulation which complicate determining independence among statements or procedure calls.

In this context logic programming and constraint programming languages [13] offer a quite interesting case study for automatic parallelization. On the one hand, these programming paradigms pose significant challenges to the parallelization task, which appear to relate closely to the more difficult challenges faced in imperative language parallelization. Such challenges include highly irregular computations and dynamic control flow (due to the symbolic nature of many of their applications), non-trivial notions of (semantic) independence, meta-programming (higher order), and the presence of dynamically allocated, complex data structures containing (well behaved) pointers. In logic languages pointers are represented by logical variables, which allow creating, for example, incomplete data structures (which can be filled in at later points in the execution) or structures which include pointers to other structures.

On the other hand, due to their high-level nature these languages also facilitate the study of parallelization issues. For example, logical variables are actually ``well behaved'' pointers, in the sense that no castings or pointer arithmetic other than array indexing is allowed (in this way, pointers in these languages have similar behavior to those used, for example, in ``clean'' versions of C or, to a lesser extent, in Java). In addition, similarly to functional languages, logic and constraint languages encourage coding in a way which expresses computations (i.e., the intended algorithm, which can thus be efficiently implemented) while at the same time using formulations which remain close to specifications, and make the parallelism in the problem much more accessible to the compiler. The relatively clean semantics of these languages also makes it comparatively easy to use formal methods and prove the transformations performed by the parallelizing compiler both correct (in terms of computed outputs) and efficient (in terms of computational cost). Functional programming is another paradigm which also generally facilitates exploitation of parallelism. In fact, the lack of certain features, such as pointers and nondeterminism, makes the parallelization problem easier. On the other hand the absence of such features also precludes in principle studying some interesting problems.

Also quite significant progress has been made in the past decade in the area of automatic program parallelization for logic programs [6] and some of the challenges have been tackled quite effectively. As a result, robust, publicly available compilers for the Prolog language exist which automatically exploit parallelism in complex applications. The capabilities of such compilers include for example automatically exploiting the parallelism stemming from Prolog's built-in search (``or-parallelism'') [16, 1, 9]. While determining independence of tasks is trivial in this case, quite interesting dynamic scheduling and granularity control techniques have been developed for these systems. Perhaps more interestingly for our discussion, parallelizing compilers and run-time systems are available which exploit the parallelism arising from independent computations [12, 17]. This corresponds to traditional parallelism, and is called ``and-parallelism'' in this context. Interesting developments in this area include quite sophisticated, incremental inter-procedural analyzers for detecting sharing patterns among complex data structures containing pointers (see, for example [4]), based on the standard technique of abstract interpretation [7]. These analyzers also derive other important properties such as variable instantiation state, determinism, non-failure, and cardinality of answers, and the parallelizers that use this information have been able to effectively parallelize application class programs. The speed and robustness of these compilers has demonstrated that abstract interpretation provides a very adequate framework for developing provably correct, powerful, and efficient global analyzers and, consequently, parallelizers. In addition, significant progress has also been made in run-time systems including dynamic task allocation, distributed dynamic memory management, and parallel implementation of backtracking search.

While, as mentioned above, much progress has been made in the context of logic programming in the parallelization of irregular computations, the techniques developed are weaker than those developed in the context of numerical computing for regular problems. Logic programming parallelizers can discover the parallelism in complex recursive traversals of data structures, but do not handle well traversals that are based on integer (i.e., array subscript) arithmetic, for which much work exists in the area of imperative languages. It thus appears that it would be quite interesting to merge the complementary work done in these areas by the different communities. Some progress has been made in this direction in the context of ``data parallelism'' [3, 11], but it still seems like a very promising avenue for future research.

Another important area which has also received a significant amount of attention in the context of logic programming parallelization is granularity control (task partitioning). Controlling the granularity of tasks to be executed in parallel is simply an optimization in machines with a fast interconnection switch, such as shared memory multiprocessors, but a necessity when parallelizing for machines with slower interconnections. The problem is challenging because in the logic programming context the tasks being parallelized are often procedure calls whose computational cost greatly depends on dynamic characteristics of the input data. One of the solutions currently used in logic programming is to derive at compile time complexity cost functions which give upper and lower bounds on task execution time as a function of certain measures of input data [8, 15]. Such measures include procedure argument values as well as more complex ones such as length or depth of the structures being passed as arguments. Programs are transformed at compile-time into equivalent counterparts which automatically control granularity at run-time based on such functions. The performance improvements resulting from the incorporation of this type of grain size control have been shown to be quite good, specially for systems with medium to large parallel execution overheads. However, while current systems are reasonably good at dealing with tasks with dynamic costs, the techniques currently used are again comparatively weaker for the static case than the partitioning algorithms used in imperative program parallelization. Again, cross-fertilization looks like an interesting avenue for further progress. Ideally, a parallelizing compiler should perform optimal partitioning and placement for any kind of architecture, using dynamic techniques when unavoidable.

Finally, we would like to point to the parallelization of the very powerful framework of constraint programming as an interesting area for future work. The growing acceptance of this programming paradigm is underlined by significant industrial use of constraint logic programming systems [13] and also of libraries which provide constraint solving capabilities (through interfaces with constraint programming systems) in conventional languages such as C++. Constraint logic programming extends the high-level programming paradigm that logic programming offers in symbolic applications to numerical domains and offers a natural platform in which to study the merging of the parallelization techniques used in the numerical and symbolic programming fields. Such merging, for example, may entail studying new concepts of independence [10]. In fact, concurrent constraint programming languages already exist which can be used as a compilation target and which bring, among other things, a very simple and powerful synchronization mechanism based on constraint entailment [18]. Also, these systems, due to their combination of features, appear to be well suited as tools for the distributed/network-wide form of programming which may become mainstream in the coming years [5].

References

1
K. A. M. Ali and R. Karlsson. The Muse Or-Parallel Prolog Model and its Performance. In 1990 North American Conference on Logic Programming, pages 757-776. MIT Press, October 1990.

2
D. Bacon, S. Graham, and O. Sharp. Compiler Transformations for High-Performance Computing. Computing Surveys, 26(4):345-420, December 1994.

3
J. Bevemyr, T. Lindgren, and H. Millroth. Reform Prolog: the language and its implementation. In Proc. 10th Intl. Conf. Logic Programming, Cambridge, Mass., 1993. MIT Press.

4
F. Bueno, M. García de la Banda, and M. Hermenegildo. Effectiveness of Global Analysis in Strict Independence-Based Automatic Program Parallelization. In International Symposium on Logic Programming, pages 320-336. MIT Press, November 1994.

5
D. Cabeza and M. Hermenegildo. Distributed Concurrent Constraint Execution in the CIAO System. In Proc. of the 1995 COMPULOG-NET Workshop on Parallelism and Implementation Technologies, Utrecht, NL, September 1995. U. Utrecht / T.U. Madrid. Available from http://www.clip.dia.fi.upm.es/.

6
J. Chassin and P. Codognet. Parallel Logic Programming Systems. Computing Surveys, 26(3):295-336, September 1994.

7
P. Cousot and R. Cousot. Abstract Interpretation: a Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Fourth ACM Symposium on Principles of Programming Languages, pages 238-252, 1977.

8
S. K. Debray, N.-W. Lin, and M. Hermenegildo. Task Granularity Analysis in Logic Programs. In Proc. of the 1990 ACM Conf. on Programming Language Design and Implementation, pages 174-188. ACM Press, June 1990.

9
European Computer Research Center. Eclipse User's Guide, 1993.

10
M. García de la Banda, M. Hermenegildo, and K. Marriott. Independence in Constraint Logic Programs. In 1993 International Logic Programming Symposium, pages 130-146. MIT Press, Cambridge, MA, October 1993.

11
M. Hermenegildo and M. Carro. Relating Data-Parallelism and And-Parallelism in Logic Programs. In Proceedings of EURO-PAR'95, number 966 in LNCS, pages 27-42. Springer-Verlag, August 1995.

12
M. Hermenegildo and K. Greene. The &-Prolog System: Exploiting Independent And-Parallelism. New Generation Computing, 9(3,4):233-257, 1991.

13
J. Jaffar and M.J. Maher. Constraint Logic Programming: A Survey. Journal of Logic Programming, 13/20:503-581, 1994.

14
A.H. Karp and R.C. Babb. A Comparison of 12 Parallel Fortran Dialects. IEEE Software, September 1988.

15
P. López García, M. Hermenegildo, and S.K. Debray. A Methodology for Granularity Based Control of Parallelism in Logic Programs. Journal of Symbolic Computing, Special Issue on Parallel Symbolic Computation, 1996. In press.

16
E. Lusk et. al. The Aurora Or-Parallel Prolog System. New Generation Computing, 7(2,3), 1990.

17
E. Pontelli, G. Gupta, and M. Hermenegildo. &ACE: A High-Performance Parallel Prolog System. In International Parallel Processing Symposium. IEEE Computer Society Technical Committee on Parallel Processing, IEEE Computer Society, April 1995.

18
E. Tick. The Deevolution of Concurrent Logic Programming Languages. The Journal of Logic Programming, 23(1-3):89-125, 1995.