boost logo

Effects of Metaprogramming Style on Compilation Time

By David Abrahams and Carlos Pinto Coelho

Introduction

As C++ programmers begin to rely on template metaprogramming constructs[1] for high performance computing, they often find that the time required to compile their programs grows unacceptably. Because fast compiles are an important part of an effective code/debug/test cycle, it becomes important to optimize not only a program's runtime performance, but its compile-time performance as well. This can be especially true in a research environment, where it is important to be able to try out new ideas quickly.

Unfortunately, since the template instantiation mechanism of most compilers is undocumented, it is far from clear which constructs will result in faster compilation. Some things seem as though they should obviously improve things, for example, reducing the number of distinct types generated, and reducing the overall size of the names of those types. Beyond that, however, most metaprogrammers are ``flying blind''. Some will go out of their way to make ``optimizations'' which don't, in fact, produce the expected result. During a large metaprogramming project, the authors repeatedly found themselves faced with decisions about how to structure their program for compilation speed. This paper presents the results of a few experiments which were able to help.

Are Template Instantiations Cached?

The question which kept arising during this project and which motivated the experiments was this: does the compiler would repeat the work of deducing types each time a particular template instantiation is encountered, or is the template instantiation ``cached''? In particular, we were wondering whether it was costly to repeat the same O(N2) compile-time sort within a translation-unit, the alternative being to somehow ``store'' the sorted result in a typedef.

For a simpler example, the following contrived program reuses the same tuple[2,3] instantiation REPEATS times:

#include <utility>

// intlist<N> is a tuple of N ints.
template <int n>
struct intlist
{
    typedef std::pair<int,typename intlist<n-1>::type> type;
};
template <> // specialization for zero terminates recursion
struct intlist<0>
{
    typedef null type;
};

// repeat<T,N> is a tuple of N tuples of 30 ints
// T is unused
template <class T, int n>
struct repeat
{
    typedef repeat<T,n-1> prev;
    typedef typename intlist<(n == 0?0:30)>::type sublist;
    typedef std::pair<sublist,typename prev::list> list;
};

template <class T>
struct repeat<T,0>
{
    typedef null sublist;
    typedef std::pair<null,null> list;
};

int main()
{
    repeat<int,REPEATS>::list x;
    return sizeof(x);
}

The authors used the program above to run their first experiment. The strategy was to measure the compilation time with increasingly large values of REPEATS, and to graph the results. The presumption was that if the instantiation of intlist<30> were cached, compilation times should increase linearly with the number of REPEATS. The experiment was performed with three compilers running under Windows 2000: Comeau 4.2.45.2 (which uses the EDG front-end), GCC 3.0.1, built under Cygwin, and Metrowerks CodeWarrior Pro7. All experiments were performed with debug symbol generation disabled, and full optimization and inlining enabled (because the program generates almost no code, optimization and inlining should have little or no effect on compilation time). The results of this experiment are shown below.

graph#1

Clearly, at least the times for Comeau and GCC are not growing linearly with the number of repeats! This is true for Metrowerks as well, but the difference between Comeau and Metrowerks is so extreme that the Metrowerks curve doesn't show up well in this graph.

Comparing the Compilers

In fact, the most striking thing about the graph above is the extreme difference in speed among the compilers tested. Although we had experienced these differences in the form of annoyingly long compilations with GCC, and nearly interminable compilations with Comeau, seeing the numbers before us prompted us to graph the ratio between neighboring compiler speeds:

graph#2

graph#3

Interestingly, the ratio between times for GCC and times for Metrowerks seem to be asymptotically approaching a ratio of approximately 4:1. Of course, this may be an illusion due to insufficient data, but the trend does seem to be there. It was not possible to produce more data points because Metrowerks has a hard-coded template complexity limit and it stopped compiling our program successfully after REPEATS == 64. The Comeau compiler strangely stopped generating object files (with no diagnostic!) after REPEATS == 34, but the compilation times continued to increase smoothly so we believe the data is still valid for observing its asymptotic behavior.

A Better Test

These diversions aside, we returned to the problem of determining whether or not the template instantiation was cached. The problem with the previous test was that it didn't account account for factors internal to the compiler other than the one we were measuring. For example, it might be that the compilation time increases as the square of template nesting depth. To eliminate these effects from our measurements, we changed the program slightly. Instead of using a template instantiation inside the repeat<T,n> template, we would now use a non-template-dependent typedef naming the same type:

typedef intlist<30>::type sublist_t;
template <class T, int n>
struct repeat
{
    typedef repeat<T,n-1> prev;
    typedef sublist_t sublist;
    typedef std::pair<sublist,typename prev::list> list;
};

If we measure the difference in compilation time between our two programs, we should be able to isolate the ``reinstantiation overhead''. The graph of that difference appears below:

graph#4

The overhead for the Metrowerks compiler hovers roughly around zero, so it is safe to conclude that it is ``caching'' our template instantiations. What's really remarkable, however, is that GCC seems to do better with ``reinstantiation'' than it does with the typedef, and the gap continues to grow with the problem size! We have no explanation for this behavior, although a curious and dedicated person could undoubtedly find one by looking at the GCC source.

Reducing Symbol Size

Our final experiment was to measure the result of reducing the symbol size by ``cutting off'' the instantiation of complex types. Users of generic libraries ultimately end up writing concrete instantiations of the templates which the libraries provide. Often, especially if the template only has a role at compile-time, it is possible to declare the instantiation two ways. For example, if intlist is just being used as a type list[4], the following two declarations are equivalent for most practical purposes:

typedef intlist<10>::type my_list_1;
struct my_list_2 : intlist<10>::type {};
The actual type of my_list_1, however, looks something like this:
std::pair<int,std::pair<int,std::pair<int,std::pair<int,std::pair<int
   ,std::pair<int,std::pair<int,std::pair<int,std::pair<int,std::pair<int,
   intlist<0>::null> > > > > > > > > >;
whereas the type of my_list2 is just:
struct my_list2

To measure the effect of ``cutting off'' template instantiation, we made a further small change to our test program:

// check the savings due to cutting off symbol depth by using a derived struct
struct sublist_t : intlist<30>::type {};
The results of that experiment are shown below:

graph#5

The graph shows the expected improvement, indicating that where possible, it is useful to take advantage of this technique.

Conclusions

We have shown that, at least on the compilers tested, once a type which requires a complex template instantiation has been computed, there is no cost to ``computing it again''. However the results here are only a beginning: there are still many questions worth answering about metaprogram efficiency. More importantly, we believe we have shown some useful experimental techniques for answering those questions.


References

[1] T. Veldhuizen, "Using C++ template metaprograms," C++ Report Vol. 7 No. 4 (May 1995), pp. 36-43.

[2] Jaakko Järvi: ML-style Tuple Assignment in Standard C++ - Extending the Multiple Return Value Formalism, TUCS Technical Report No 267, April 1999.

[3] Jaakko Järvi: Tuples and Multiple Return Values in C++, TUCS Technical Report No 249, March 1999.

[4] Andrei Alexandrescu, Modern C++ Design Addison Wesley 2001


Revised 19 September 2001

© Copyright David Abrahams and Carlos Pinto Coelho, 2001