






                 Recursive Make Considered Harmful

                            _P_e_t_e_r _M_i_l_l_e_r
                      millerp@canb.auug.org.au



                              AABBSSTTRRAACCTT
         For large UNIX projects, the traditional method of
         building the project is to use recursive make. On
         some projects, this results in build times which
         are unacceptably large, when all you want to do is
         change one file. In examining the source of the
         overly long build times, it became evident that a
         number of apparently unrelated problems combine to
         produce the delay, but on analysis all have the
         same root cause.
         This paper explores a number of problems regarding
         the use of recursive make, and shows that they are
         all symptoms of the same problem: symptoms that
         the UNIX community have long accepted as a fact of
         life, but which need not be endured any longer.
         These problems include recursive makes which take
         ``forever'' to work out that they need to do
         nothing, recursive makes which do too much, or too
         little, and recursive makes which are overly
         sensitive to changes in the source code and require
         constant Makefile intervention to keep them working.
         The resolution of these problems can be found by
         looking at what make does, from first principles,
         and then analyzing the effects of introducing
         recursive make to this activity. The analysis
         shows that the problem stems from the artificial
         partitioning of the build into separate subsets.
         This, in turn, leads to the symptoms described.
         To avoid the symptoms, it is only necessary to
         avoid the separation: use a single make session
         to build the whole project, which is not quite the
         same as a single Makefile.
         This conclusion runs counter to much accumulated
         folk wisdom in building large projects on UNIX.
         Some of the main objections raised by this folk
         wisdom are examined and shown to be unfounded.
         The results of actual use are far more
         encouraging, with routine development builds
         running significantly faster than intuition may
         indicate, and without the intuitively expected
         compromise of modularity. The use of a whole
         project make is not as difficult to put into
         practice as it may first appear.






    Peter Miller           30 August 2001                 Page 1





    AUUGN'97                   Recursive Make Considered Harmful


    11..  IInnttrroodduuccttiioonn

    For large UNIX software development projects, the
    traditional methods of building the project use what has
    come to be known as ``recursive make.'' This refers to the
    use of a hierarchy of directories containing source files
    for the modules which make up the project, where each of the
    sub-directories contains a Makefile which describes the
    rules and instructions for the make program. The complete
    project build is done by arranging for the top-level
    Makefile to change directory into each of the
    sub-directories and recursively invoke make.

    This paper explores some significant problems encountered
    when developing software projects using the recursive make
    technique. A simple solution is offered, and some of the
    implications of that solution are explored.

    Recursive _m_a_k_e results in a directory tree which looks some-
    thing like this:
                          +++
                          ++-_P+_r+_o_j_e_c_t
                           ++++Mmaokdeufliel1e
                           |++-++Makefile
                           | +-++source1.c
                           | +-++_e_t_c_._._.
                           ++++m+o+dule2
                             +-++Makefile
                             +-++source2.c
                             +-++_e_t_c_._._.
                                |
    This hierarchy of modules can be nested arbitrarily deep.
    Real-world projects often use two- and three-level
    structures.

    11..11..  AAssssuummeedd KKnnoowwlleeddggee

    This paper assumes that the reader is familiar with
    developing software on UNIX, with the make program, and with
    the issues of C programming and include file dependencies.

    This paper assumes that you have installed GNU Make on your
    system and are moderately familiar with its features. Some
    features of make described below may not be available if you
    are using the limited version supplied by your vendor.

    22..  TThhee PPrroobblleemm

    There are numerous problems with recursive make, and they
    are usually observed daily in practice. Some of these
    problems include:
    -----------
    Copyright (C) 1997 Peter Miller; All rights reserved.

    - It is very hard to get the order of the recursion into the
      sub-directories correct. This order is very unstable and
      frequently needs to be manually ``tweaked.'' Increasing
      the number of directories, or increasing the depth in the
      directory tree, causes this order to be increasingly
      unstable.

    - It is often necessary to do more than one pass over the
      sub-directories to build the whole system. This,
      naturally, leads to extended build times.

    - Because the builds take so long, some dependency
      information is deliberately omitted; otherwise development
      builds take unreasonable lengths of time, and the
      developers are unproductive. This usually leads to things
      not being updated when they need to be, requiring frequent
      ``clean'' builds from scratch, to ensure everything has
      actually been built.

    - Because inter-directory dependencies are either omitted or
      too hard to express, the Makefiles are often written to
      build too much to ensure that nothing is left out.

    - The inaccuracy of the dependencies, or the simple lack of
      dependencies, can result in a product which is incapable
      of building cleanly, requiring the build process to be
      carefully watched by a human.

    Not all projects experience all of these problems. Those
    that do experience the problems may do so intermittently,
    and dismiss the problems as unexplained ``one off'' quirks.
    This paper attempts to bring together a range of symptoms
    observed over long practice, and presents a systematic
    analysis and solution.

    It must be emphasized that this paper does not suggest that
    make itself is the problem. This paper is working from the
    premise that make does NOT have a bug, that make does NOT
    have a design flaw. The problem is not in make at all, but
    rather in the input given to make - the way make is being
    used.

    33..  AAnnaallyyssiiss

    Before it is possible to address these seemingly unrelated
    problems, it is first necessary to understand what make does
    and how it does it. It is then possible to look at the
    effects recursive make has on how make behaves.

    33..11..  WWhhoollee MMaakkee

    _M_a_k_e is an expert system.  You give it a set  of  rules  for
    how  to  construct  things,  and a target to be constructed.
    The rules can be decomposed into pair-wise ordered dependen-
    cies between files.  _M_a_k_e takes the rules and determines how



    Peter Miller           30 August 2001                 Page 3





    AUUGN'97                   Recursive Make Considered Harmful


    to build the given target.  Once it has  determined  how  to
    construct the target, it proceeds to do so.

    _M_a_k_e  determines  how  to build the target by constructing a
    _d_i_r_e_c_t_e_d _a_c_y_c_l_i_c _g_r_a_p_h_, the DAG familiar  to  many  Computer
    Science  students.  The vertices of this graph are the files
    in the system, the edges of this graph  are  the  inter-file
    dependencies.   The  edges of the graph are directed because
    the pair-wise dependencies  are  ordered;  resulting  in  an
    _a_c_y_c_l_i_c graph - things which look like loops are resolved by
    the direction of the edges.

    This paper will use a small example project for its
    analysis. While the number of files in this example is
    small, there is sufficient complexity to demonstrate all of
    the above recursive make problems. First, however, the
    project is presented in a non-recursive form.
                           Project
                           +-- Makefile
                           +-- main.c
                           +-- parse.c
                           +-- parse.h

    The Makefile in this small project looks like this:

                    +--------------------------+
                    |OBJ = main.o parse.o      |
                    |prog: $(OBJ)              |
                    |  $(CC) -o $@ $(OBJ)      |
                    |main.o: main.c parse.h    |
                    |  $(CC) -c main.c         |
                    |parse.o: parse.c parse.h  |
                    |  $(CC) -c parse.c        |
                    +--------------------------+
    Some of the implicit rules of make are presented here
    explicitly, to assist the reader in converting the Makefile
    into its equivalent DAG.

    The above Makefile can be drawn as a DAG in the following
    form, with each edge directed down from a target to a file
    it depends on:

                                prog
                               /    \
                          main.o    parse.o
                          /    \    /    \
                     main.c   parse.h   parse.c

    This is an _a_c_y_c_l_i_c graph because of the arrows which express
    the ordering of the  relationship  between  the  files.   If
    there  _w_a_s a circular dependency according to the arrows, it
    would be an error.
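
    To make the acyclicity requirement concrete, here is a
    minimal Python sketch (not part of make itself). It records
    the example Makefile's rules as a mapping from target to
    prerequisites and searches for a directed cycle with a
    depth-first search. The names RULES and find_cycle are
    illustrative. The extra parse.h -> parse.o edge in the usage
    is a hypothetical mistake, added only to show a cycle being
    reported.

```python
# The example Makefile's rules: target -> list of prerequisites.
RULES = {
    "prog": ["main.o", "parse.o"],
    "main.o": ["main.c", "parse.h"],
    "parse.o": ["parse.c", "parse.h"],
}


def find_cycle(rules):
    """Return the files on a directed cycle, or None if the graph is a DAG."""
    on_path, finished, path = set(), set(), []

    def visit(node):
        on_path.add(node)
        path.append(node)
        for dep in rules.get(node, []):
            if dep in on_path:  # edge back to a file on the current path
                return path[path.index(dep):] + [dep]
            if dep not in finished:
                cycle = visit(dep)
                if cycle:
                    return cycle
        on_path.discard(node)
        path.pop()
        finished.add(node)
        return None

    for target in rules:
        if target not in finished:
            cycle = visit(target)
            if cycle:
                return cycle
    return None


print(find_cycle(RULES))  # None: the shared parse.h is not a loop
broken = {**RULES, "parse.h": ["parse.o"]}  # hypothetical bad rule
print(find_cycle(broken))  # ['parse.h', 'parse.o', 'parse.h']
```

    Only the direction of the edges matters. parse.h is reached
    twice in the real graph, but never from a file that is still
    on the current path.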

    Note that the object files (.o) are dependent on the include
    files (.h) even though it is the source files (.c) which do
    the including. This is because if an include file changes,
    it is the object files which are out-of-date, not the source
    files.

    The second part of what _m_a_k_e does it to perform a  _p_o_s_t_o_r_d_e_r
    traversal of the DAG.  That is, the dependencies are visited
    first.  The actual order of traversal is undefined, but most
    _m_a_k_e  implementations work down the graph from left to right
    for edges below the same vertex, and most  projects  implic-
    itly  rely on this behavior.  The last-time-modified of each
    file is examined, and higher files are determined to be out-
    of-date  if  any of the lower files on which they depend are
    younger.  Where a file is determined to be out-of-date,  the
    action  associated with the relevant graph edge is performed
    (in the above example, a compile or a link).
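
    The two phases can be sketched in a few lines of Python.
    This is a minimal model, not how any real make is
    implemented:

    - integer "timestamps" stand in for modification times;
    - next(clock) stands in for running an action;
    - each target is visited at most once.

    The timestamps in the usage are hypothetical: everything was
    built at time 10, then parse.h was edited at time 20.

```python
import itertools

RULES = {
    "prog": ["main.o", "parse.o"],
    "main.o": ["main.c", "parse.h"],
    "parse.o": ["parse.c", "parse.h"],
}


def make(target, rules, mtime, clock, done=None):
    """Postorder traversal: update prerequisites (left to right) before the
    target; return the targets whose actions ran, in order.
    Every source file is assumed to have an entry in mtime."""
    done = set() if done is None else done
    if target in done:
        return []
    done.add(target)
    deps = rules.get(target, [])
    rebuilt = []
    for dep in deps:
        rebuilt += make(dep, rules, mtime, clock, done)
    # A target is out-of-date if it is missing or any prerequisite is younger.
    if deps and any(mtime[d] > mtime.get(target, -1) for d in deps):
        mtime[target] = next(clock)  # "run" the compile or link action
        rebuilt.append(target)
    return rebuilt


mtime = {"main.c": 1, "parse.c": 1, "parse.h": 20,
         "main.o": 10, "parse.o": 10, "prog": 10}
clock = itertools.count(21)
print(make("prog", RULES, mtime, clock))  # ['main.o', 'parse.o', 'prog']
print(make("prog", RULES, mtime, clock))  # [] (now up to date)
```

    Because prerequisites are handled first, both objects are
    recompiled before the link. A second run finds nothing to
    do.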

    The use of recursive _m_a_k_e affects both phases of the  opera-
    tion of _m_a_k_e_: it causes _m_a_k_e to construct an inaccurate DAG,
    and it forces _m_a_k_e to traverse the DAG in  an  inappropriate
    order.

    33..22..  RReeccuurrssiivvee MMaakkee

    To examine the effects of recursive makes, the above example
    will be artificially segmented into two modules, each with
    its own Makefile, and a top-level Makefile used to invoke
    each of the module Makefiles.

    This example is intentionally artificial, and thoroughly so.
    However, all ``modularity'' of all projects is artificial,
    to some extent. Consider: for many projects, the linker
    flattens it all out again, right at the end.

    The directory structure is as follows:

                          Project
                          +-- Makefile
                          +-- ant
                          |   +-- Makefile
                          |   +-- main.c
                          +-- bee
                              +-- Makefile
                              +-- parse.c
                              +-- parse.h

    The top-level Makefile often looks a lot like a shell
    script:

                  +-------------------------------+
                  |MODULES = ant bee              |
                  |all:                           |
                  |  for dir in $(MODULES); do \  |
                  |    (cd $$dir; ${MAKE} all); \ |
                  |  done                         |
                  +-------------------------------+
    The ant/Makefile looks like this:

                  +------------------------------+
                  |all: main.o                   |
                  |main.o: main.c ../bee/parse.h |
                  |  $(CC) -I../bee -c main.c    |
                  +------------------------------+
    and the equivalent DAG looks like this:

                               main.o
                              /      \
                         main.c      parse.h

    The bee/Makefile looks like this:

                   +----------------------------+
                   |OBJ = ../ant/main.o parse.o |
                   |all: prog                   |
                   |prog: $(OBJ)                |
                   |  $(CC) -o $@ $(OBJ)        |
                   |parse.o: parse.c parse.h    |
                   |  $(CC) -c parse.c          |
                   +----------------------------+
    and the equivalent DAG looks like this:

                               prog
                              /    \
                         main.o    parse.o
                                   /    \
                              parse.h  parse.c

    Take a close look at the DAGs. Notice how neither is
    complete - there are vertices and edges (files and
    dependencies) missing from both DAGs. When the entire build
    is done from the top level, everything will work.
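
    One way to see what is missing is to write each DAG as a set
    of (target, prerequisite) edges and compare them. The Python
    sketch below assumes that paths such as ../bee/parse.h have
    been normalised to bare file names. The helper leaves() is
    illustrative. The sketch shows that bee's DAG treats main.o
    as a leaf, that is, as if it were a source file.

```python
# Whole-project DAG from section 3.1, as (target, prerequisite) edges.
WHOLE = {("prog", "main.o"), ("prog", "parse.o"),
         ("main.o", "main.c"), ("main.o", "parse.h"),
         ("parse.o", "parse.c"), ("parse.o", "parse.h")}

# Edges described by each module's Makefile (paths reduced to bare names).
ANT = {("main.o", "main.c"), ("main.o", "parse.h")}
BEE = {("prog", "main.o"), ("prog", "parse.o"),
       ("parse.o", "parse.c"), ("parse.o", "parse.h")}


def leaves(edges):
    """Files a DAG treats as sources: they appear but have no prerequisites."""
    targets = {t for t, _ in edges}
    return {p for _, p in edges} - targets


print(sorted(WHOLE - ANT))  # four edges ant/Makefile never sees
print(sorted(WHOLE - BEE))  # [('main.o', 'main.c'), ('main.o', 'parse.h')]
print(sorted(leaves(BEE)))  # ['main.o', 'parse.c', 'parse.h']
```

    Only the union of the two edge sets equals the whole-project
    DAG. No single make invocation in the recursive build ever
    sees that union.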

    But what happens when small changes occur? For example,
    what would happen if the parse.c and parse.h files were
    generated from a parse.y yacc grammar? This would add the
    following lines to the bee/Makefile:

                    +--------------------------+
                    |parse.c parse.h: parse.y  |
                    |  $(YACC) -d parse.y      |
                    |  mv y.tab.c parse.c      |
                    |  mv y.tab.h parse.h      |
                    +--------------------------+
    And the equivalent DAG changes to look like this:

                               prog
                              /    \
                         main.o    parse.o
                                   /    \
                              parse.h  parse.c
                                  \    /
                                  parse.y

    This change has a simple effect: if parse.y is edited,
    main.o will NOT be constructed correctly. This is because
    the DAG for ant knows about only some of the dependencies of
    main.o, and the DAG for bee knows none of them.

    To understand why this happens, it is necessary to look at
    the actions make will take from the top level. Assume that
    the project is in a self-consistent state. Now edit parse.y
    in such a way that the generated parse.h file will have
    non-trivial differences. When the top-level make is
    invoked, first ant and then bee is visited. But ant/main.o
    is not recompiled, because bee/parse.h has not yet been
    regenerated and thus does not yet indicate that main.o is
    out-of-date. It is not until bee is visited by the recursive
    make that parse.c and parse.h are reconstructed, followed by
    parse.o. When the program is linked, main.o and parse.o are
    non-trivially incompatible. That is, the program is wrong.

    33..33..  TTrraaddiittiioonnaall SSoolluuttiioonnss

    There are three traditional fixes for the above ``glitch.''

    33..33..11..  RReesshhuuffffllee

    The first is to manually tweak the order of the modules in
    the top-level Makefile. But why is this tweak required at
    all? Isn't make supposed to be an expert system? Is make
    somehow flawed, or did something else go wrong?

    To answer this question, it is necessary to look not at the
    graphs, but at the order of traversal of the graphs. In
    order to operate correctly, make needs to perform a
    postorder traversal, but in separating the DAG into two
    pieces, make has not been allowed to traverse the graph in
    the necessary order - instead the project has dictated an
    order of traversal. An order which, when you consider the
    original graph, is plain wrong. Tweaking the top-level
    Makefile corrects the order to one similar to that which
    make could have used. Until the next dependency is added...
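
    One way to see why tweaking is fragile is to ask whether any
    module order exists at all. The sketch below requires Python
    3.9 or later and uses the standard graphlib module. It builds
    a module-level graph from the cross-module edges of the
    example, where ant/main.o needs bee/parse.h and bee/prog
    needs ant/main.o, and asks for a topological order. In this
    particular example the module graph contains a cycle, so no
    single visiting order satisfies every cross-module edge.
    MODULE_DEPS and visiting_order are illustrative names.

```python
from graphlib import CycleError, TopologicalSorter

# module -> modules whose outputs it needs first (from the example Makefiles)
MODULE_DEPS = {
    "ant": {"bee"},  # ant/main.o depends on bee/parse.h
    "bee": {"ant"},  # bee/prog depends on ant/main.o
}


def visiting_order(module_deps):
    """Return a module order that respects every edge, or None if none exists."""
    try:
        return list(TopologicalSorter(module_deps).static_order())
    except CycleError:
        return None


print(visiting_order(MODULE_DEPS))                     # None
print(visiting_order({"ant": {"bee"}, "bee": set()}))  # ['bee', 'ant']
```

    Even when the module graph happens to be acyclic, the order
    found only holds until a new cross-module dependency is
    added.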

    33..33..22..  RReeppeettiittiioonn

    The second traditional solution is to make more than one
    pass in the top-level Makefile, something like this:

                  +-------------------------------+
                  |MODULES = ant bee              |
                  |all:                           |
                  |  for dir in $(MODULES); do \  |
                  |    (cd $$dir; ${MAKE} all); \ |
                  |  done                         |
                  |  for dir in $(MODULES); do \  |
                  |    (cd $$dir; ${MAKE} all); \ |
                  |  done                         |
                  +-------------------------------+

    This doubles the length of time it takes to perform the
    build. But that is not all: there is no guarantee that two
    passes are enough! The upper bound of the number of passes
    is not even proportional to the number of modules; it is
    instead proportional to the number of graph edges which
    cross module boundaries.
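
    The growth in passes can be illustrated with a deliberately
    simplified Python model. This is a sketch built on
    assumptions, not a measurement:

    - each top-level pass visits the modules in a fixed order;
    - each module builds whatever of its own targets are ready;
    - the hypothetical project is a chain in which each module's
      target needs the next module's target.

    Every edge then crosses a module boundary against the
    visiting order, and each one costs an extra pass.

```python
def passes_needed(owner, deps, order):
    """Count top-level passes until every target is built from scratch.

    owner: target -> module; deps: target -> prerequisite targets;
    order: the fixed order in which the top-level Makefile visits modules.
    """
    built, passes = set(), 0
    while len(built) < len(owner):
        passes += 1
        progress = False
        for module in order:
            # Within one module, make orders its own targets correctly,
            # so keep going until nothing else in this module is ready.
            while True:
                ready = [t for t, m in owner.items()
                         if m == module and t not in built
                         and all(d in built for d in deps.get(t, []))]
                if not ready:
                    break
                built.update(ready)
                progress = True
        if not progress:
            raise ValueError("no progress possible: dependency cycle")
    return passes


def chain(k):
    """Hypothetical project: t0 (in m0) needs t1 (in m1), ..., up to t(k-1)."""
    owner = {f"t{i}": f"m{i}" for i in range(k)}
    deps = {f"t{i}": [f"t{i + 1}"] for i in range(k - 1)}
    order = [f"m{i}" for i in range(k)]
    return owner, deps, order


for k in (2, 4, 8):
    owner, deps, order = chain(k)
    print(f"edges crossing modules: {k - 1} -> passes: "
          f"{passes_needed(owner, deps, order)} "
          f"(reverse order: {passes_needed(owner, deps, order[::-1])})")
```

    In this model a chain with k - 1 backward cross-module edges
    needs k passes, while the reverse visiting order needs only
    one. This is why no fixed number of repetitions is safe.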

    33..33..33..  OOvveerrkkiillll

    We have already seen an example of how recursive make can
    build too little, but another common problem is to build too
    much. The third traditional solution to the above glitch is
    to add even more lines to ant/Makefile:

                    +--------------------------+
                    |.PHONY: ../bee/parse.h    |
                    |../bee/parse.h:           |
                    |    cd ../bee; \          |
                    |    $(MAKE) clean; \      |
                    |    $(MAKE) all           |
                    +--------------------------+
    This means that whenever main.o is made, parse.h will always
    be considered to be out-of-date. All of bee will always be
    rebuilt including parse.h, and so main.o will always be
    rebuilt, even if everything was self-consistent.

    44..  PPrreevveennttiioonn

    The above analysis is based on one simple action: the DAG
    was artificially separated into incomplete pieces. This
    separation resulted in all of the problems familiar to
    recursive make builds.

    Did  _m_a_k_e  get it wrong?  No.  This is a case of the ancient
    GIGO principle: _G_a_r_b_a_g_e _I_n_, _G_a_r_b_a_g_e _O_u_t_.   Incomplete  Make-
    files are _w_r_o_n_g Makefiles.

    To avoid these problems, don't break the DAG into pieces;
    instead, use one Makefile for the entire project. It is not
    the recursion itself which is harmful, it is the crippled
    Makefiles which are used in the recursion which are wrong.
    It is not a deficiency of make itself that recursive make is
    broken; make does the best it can with the flawed input it
    is given.

         ``_B_u_t_,  _b_u_t_,  _b_u_t_._._.   _Y_o_u _c_a_n_'_t _d_o _t_h_a_t_!'' I hear
         you cry.  ``_A _s_i_n_g_l_e Makefile  _i_s  _t_o_o  _b_i_g_,  _i_t_'_s
         _u_n_m_a_i_n_t_a_i_n_a_b_l_e_,  _i_t_'_s _t_o_o _h_a_r_d _t_o _w_r_i_t_e _t_h_e _r_u_l_e_s_,
         _y_o_u_'_l_l _r_u_n _o_u_t _o_f _m_e_m_o_r_y_, _I _o_n_l_y _w_a_n_t _t_o _b_u_i_l_d  _m_y
         _l_i_t_t_l_e  _b_i_t_,  _t_h_e  _b_u_i_l_d _w_i_l_l _t_a_k_e _t_o_o _l_o_n_g_.  _I_t_'_s
         _j_u_s_t _n_o_t _p_r_a_c_t_i_c_a_l_.''

    These are valid concerns, and they frequently lead make
    users to the conclusion that re-working their build process
    does not have any short- or long-term benefits. This
    conclusion is based on ancient, enduring, false assumptions.

    The following sections will address each of these concerns
    in turn.

    44..11..  AA SSiinnggllee Makefile IIss TToooo BBiigg

    If the entire project build description were placed into a
    single Makefile this would certainly be true; however,
    modern make implementations have include statements. By
    including a relevant fragment from each module, the total
    size of the Makefile and its include files need be no larger
    than the total size of the Makefiles in the recursive case.

    44..22..  AA SSiinnggllee Makefile IIss UUnnmmaaiinnttaaiinnaabbllee

    Using a single top-level Makefile which includes a fragment
    from each module is no more complex than the recursive case.
    Because the DAG is not segmented, this form of Makefile
    becomes less complex, and thus more maintainable, simply
    because fewer ``tweaks'' are required to keep it working.

    Recursive Makefiles have a great deal of repetition. Many
    projects solve this by using include files. By using a
    single Makefile for the project, the need for the ``common''
    include files disappears - the single Makefile is the common
    part.

    44..33..  IItt''ss TToooo HHaarrdd TToo WWrriittee TThhee RRuulleess

    The only change required is to include the directory part in
    filenames in a number of places. This is because the make is
    performed from the top-level directory; the current
    directory is not the one in which the file appears. Where
    the output file is explicitly stated in a rule, this is not
    a problem.

    GCC allows a -o option in conjunction with the -c option,
    and GNU Make knows this. This results in the implicit
    compilation rule placing the output in the correct place.
    Older and dumber C compilers, however, may not allow the -o
    option with the -c option, and will leave the object file in
    the top-level directory (i.e. the wrong directory). There
    are three ways for you to fix this: get GNU Make and GCC,
    override the built-in rule with one which does the right
    thing, or complain to your vendor.

    Also, K&R C compilers will start the double-quote include
    path (#include "filename.h") from the current directory.
    This will not do what you want. ANSI C compliant C
    compilers, however, start the double-quote include path from
    the directory in which the source file appears; thus, no
    source changes are required. If you don't have an ANSI C
    compliant C compiler, you should consider installing GCC on
    your system as soon as possible.

    44..44..  II OOnnllyy WWaanntt TToo BBuuiilldd MMyy LLiittttllee BBiitt

    Most of the time, developers are deep within the project
    tree and they edit one or two files and then run make to
    compile their changes and try them out. They may do this
    dozens or hundreds of times a day. Being forced to do a
    full project build every time would be absurd.

    Developers always have the option of giving make a specific
    target. This has always been possible; it's just that we
    usually rely on the default target in the Makefile in the
    current directory to shorten the command line for us.
    Building ``my little bit'' can still be done with a whole
    project Makefile, simply by using a specific target, and an
    alias if the command line is too long.

    Is doing a full project build every time so absurd? If a
    change made in a module has repercussions in other modules,
    because there is a dependency the developer is unaware of
    (but the Makefile is aware of), isn't it better that the
    developer find out as early as possible? Dependencies like
    this will be found, because the DAG is more complete than in
    the recursive case.

    The developer is rarely a seasoned old salt who knows every
    one of the million lines of code in the product. More
    likely the developer is a short-term contractor or a junior.
    You don't want implications like these to blow up after the
    changes are integrated with the master source; you want them
    to blow up on the developer in some nice safe sand-box far
    away from the master source.

    If you want to make ``just your little bit'' because you are
    concerned that performing a full project build will corrupt
    the project master source, due to the directory structure
    used in your project, see the ``Projects versus Sand-Boxes''
    section below.

    44..55..  TThhee BBuuiilldd WWiillll TTaakkee TToooo LLoonngg

    This statement can be made from one of two perspectives.
    First, that a whole project make, even when everything is
    up-to-date, inevitably takes a long time to perform.
    Secondly, that these inevitable delays are unacceptable when
    a developer wants to quickly compile and link the one file
    that they have changed.

    44..55..11..  PPrroojjeecctt BBuuiillddss

    Consider a hypothetical project with 1000 source (.c) files,
    each of which has its calling interface defined in a
    corresponding include (.h) file with defines, type
    declarations and function prototypes. These 1000 source
    files include their own interface definition, plus the
    interface definitions of any other module they may call.
    These 1000 source files are compiled into 1000 object files
    which are then linked into an executable program. This
    system has some 3000 files which make must be told about,
    along with their include dependencies; make must also
    explore the possibility that implicit rules (.y -> .c, for
    example) may be necessary.

    In order to build the DAG, make must ``stat'' 3000 files,
    plus an additional 2000 files or so, depending on which
    implicit rules your make knows about and your Makefile has
    left enabled. On the author's humble 66MHz i486 this takes
    about 10 seconds; on native disk on faster platforms it goes
    even faster. With NFS over 10Mbit/s Ethernet it takes about
    10 seconds, no matter what the platform.

    This is an astonishing statistic! Imagine being able to do
    a single file compile, out of 1000 source files, in only 10
    seconds, plus the time for the compilation itself.
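
    The figures above are from 1997 hardware. The cost of the
    stat phase on a current machine can be estimated with a
    short Python sketch. It creates a hypothetical tree of 1000
    .c, .h and .o files, then stats those 3000 paths plus 2000
    non-existent .y and .l candidates, of the kind an implicit
    rule search might probe for. The file names and the choice
    of .y/.l probes are assumptions. The timing printed depends
    entirely on your system and disk.

```python
import os
import tempfile
import time


def time_stats(paths):
    """stat() each path once; return (number that exist, elapsed seconds)."""
    start = time.perf_counter()
    found = 0
    for path in paths:
        try:
            os.stat(path)
            found += 1
        except FileNotFoundError:
            pass  # a candidate source that does not exist: the probe fails
    return found, time.perf_counter() - start


with tempfile.TemporaryDirectory() as top:
    real = []
    for i in range(1000):
        for ext in (".c", ".h", ".o"):
            path = os.path.join(top, f"file{i}{ext}")
            open(path, "w").close()
            real.append(path)
    probes = [os.path.join(top, f"file{i}{ext}")
              for i in range(1000) for ext in (".y", ".l")]
    found, seconds = time_stats(real + probes)
    print(f"{found} of {len(real) + len(probes)} paths exist; "
          f"{seconds * 1000:.1f} ms for all stat() calls")
```

    The files are freshly created, so they are likely to be in
    the operating system's cache. Treat the result as a lower
    bound, particularly compared with a cold NFS mount.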

    Breaking the set of files up into 100 modules, and running
    it as a recursive make takes about 25 seconds. The repeated
    process creation for the subordinate make invocations takes
    quite a long time.
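
    The two timings above can be turned into rough
    per-operation costs. This is simple arithmetic on the
    author's figures. It assumes that the 5000 or so stat calls
    account for the whole-project time, and that the extra
    recursive time is spread evenly across the 100 subordinate
    make invocations:

```latex
t_{\mathrm{stat}} \approx \frac{10\,\mathrm{s}}{3000 + 2000} = 2\,\mathrm{ms}
\qquad
t_{\mathrm{submake}} \approx \frac{25\,\mathrm{s} - 10\,\mathrm{s}}{100} = 0.15\,\mathrm{s}
```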

    Hang on a minute! On real-world projects with fewer than
    1000 files, it takes an awful lot longer than 25 seconds for
    make to work out that it has nothing to do. For some
    projects, doing it in only 25 minutes would be an
    improvement! The above result tells us that it is not the
    number of files which is slowing us down (that only takes 10
    seconds), and it is not the repeated process creation for
    the subordinate make invocations (that only takes another 15
    seconds). So just what is taking so long?

    The traditional solutions to the problems introduced by
    recursive make often increase the number of subordinate make
    invocations beyond the minimum described here; e.g. to
    perform multiple repetitions (3.3.2), or to overkill
    cross-module dependencies (3.3.3). These can take a long
    time, particularly when combined, but do not account for
    some of the more spectacular build times; what else is
    taking so long?

    Complexity of the Makefile is what is taking so long. This
    is covered, below, in the Efficient Makefiles section.

    44..55..22..  DDeevveellooppmmeenntt BBuuiillddss

    If, as in the 1000 file example, it only takes 10 seconds to
    figure out which one of the files needs to be recompiled,
    there is no serious threat to the productivity of developers
    if they do a whole-project make as opposed to a
    module-specific make. The advantage for the project is that
    the module-centric developer is reminded at relevant times
    (and only relevant times) that their work has wider
    ramifications.

    Consistently using C include files which contain accurate
    interface definitions (including function prototypes) will
    produce compilation errors in many of the cases which would
    otherwise result in a defective product. By doing
    whole-project builds, developers discover such errors very
    early in the development process, and can fix the problems
    when they are least expensive.

    44..66..  YYoouu''llll RRuunn OOuutt OOff MMeemmoorryy

    This is the most interesting response. Once long ago, on a CPU
    far, far away, it may even have been true. When Feldman
    [feld78] first wrote make it was 1978 and he was using a PDP11.
    Unix processes were limited to 64KB of data.

    On such a computer, the above project with its 3000 files
    detailed in the whole-project Makefile would probably not allow
    the DAG and rule actions to fit in memory.

    Peter Miller           30 August 2001                Page 12

    AUUGN'97                   Recursive Make Considered Harmful

    But we are not using PDP11s any more. The physical memory of
    modern computers exceeds 10MB for small computers, and virtual
    memory often exceeds 100MB. It is going to take a project with
    hundreds of thousands of source files to exhaust virtual memory
    on a small modern computer. As the 1000 source file example
    takes less than 100KB of memory (try it, I did), it is unlikely
    that any project manageable in a single directory tree on a
    single disk will exhaust your computer's memory.

    44..77..  WWhhyy NNoott FFiixx TThhee DDAAGG IInn TThhee MMoodduulleess??

    It was shown in the above discussion that the problem with
    recursive make is that the DAGs are incomplete. It follows that
    by adding the missing portions, the problems would be resolved
    without abandoning the existing recursive make investment.
    However, this approach has problems of its own:

    * The developer needs to remember to do this. The problems
      will not affect the developer of the module; they will
      affect the developers of other modules. There is no trigger
      to remind the developer to do this, other than the ire of
      fellow developers.

    * It is difficult to work out where the changes need to be
      made. Potentially every Makefile in the entire project needs
      to be examined for possible modifications. Of course, you can
      wait for your fellow developers to find them for you.

    * The include dependencies will be recomputed unnecessarily,
      or will be interpreted incorrectly. This is because make is
      string based, and thus ``.'' and ``../ant'' are two different
      places, even when you are in the ant directory. This is of
      concern when include dependencies are automatically
      generated, as they are for all large projects.
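
    The string-versus-file distinction in the last point is easy to
    demonstrate. The following POSIX shell sketch is an illustration,
    not part of make: same_string compares two names the way make
    compares target names, while same_file resolves each name to a
    physical path first. The function names, and the ant directory
    used to try them, are assumptions made for the example.

```shell
#!/bin/sh
# Illustration only: make itself does none of this.

# same_string A B: succeed if A and B are identical strings. Target
# names in make are compared like this, so "main.h" and
# "../ant/main.h" count as different vertices.
same_string() {
    [ "$1" = "$2" ]
}

# resolve NAME: print the physical absolute path of an existing file
# by resolving its directory with cd and pwd -P. The body runs in a
# subshell, so the caller's working directory is unchanged.
resolve() (
    cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd -P)" "$(basename "$1")"
)

# same_file A B: succeed if A and B both exist and name the same file.
same_file() {
    [ -e "$1" ] && [ -e "$2" ] &&
        [ "$(resolve "$1")" = "$(resolve "$2")" ]
}
```

    Run from inside ant, main.h and ../ant/main.h fail the first
    test and pass the second; make, seeing only the strings, treats
    them as two unrelated vertices.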

    By making sure that each Makefile is complete, you arrive at
    the point where the Makefile for at least one module contains
    the equivalent of a whole-project Makefile (recall that these
    modules form a single project and are thus interconnected),
    and there is no need for the recursion any more.

    55..  EEffffiicciieenntt MMaakkeeffiilleess

    The central theme of this paper is the semantic side-effects of
    artificially separating a Makefile into the pieces necessary to
    perform a recursive make. However, once you have a large number
    of Makefiles, the speed at which make can interpret this
    multitude of files also becomes an issue.

    Builds can take ``forever'' for both these reasons: the
    traditional fixes for the separated DAG may be building too
    much, and your Makefile may be inefficient.

    5.1.  Deferred Evaluation

    The text in a Makefile must somehow be read from a text file
    and understood by make so that the DAG can be constructed, and
    the specified actions attached to the edges. This is all kept
    in memory.

    The input language for Makefiles is deceptively simple. A
    crucial distinction that often escapes novices and experts
    alike is that make's input language is text based, as opposed
    to token based, as is the case for C or AWK. Make does the very
    least possible to process input lines and stash them away in
    memory.

    As an example of this, consider the following assignment:

                    +--------------------------+
                    |OBJ = main.o parse.o      |
                    +--------------------------+
    Humans read this as the variable OBJ being assigned two
    filenames, ``main.o'' and ``parse.o''. But make does not see it
    that way. Instead, OBJ is assigned the string ``main.o
    parse.o''. It gets worse:

                    +--------------------------+
                    |SRC = main.c parse.c      |
                    |OBJ = $(SRC:.c=.o)        |
                    +--------------------------+
    In this case humans expect make to assign two filenames to OBJ,
    but make actually assigns the string ``$(SRC:.c=.o)''. This is
    because it is a macro language with deferred evaluation, as
    opposed to one with variables and immediate evaluation.

    If this does not seem too problematic, consider the following
    Makefile:

                   +-----------------------------+
                   |SRC = $(shell echo 'Ouch!' \ |
                   |  1>&2 ; echo *.[cy])        |
                   |OBJ = \                      |
                   |  $(patsubst %.c,%.o,\       |
                   |    $(filter %.c,$(SRC))) \  |
                   |  $(patsubst %.y,%.o,\       |
                   |    $(filter %.y,$(SRC)))    |
                   |test: $(OBJ)                 |
                   |  $(CC) -o $@ $(OBJ)         |
                   +-----------------------------+
    How many times will the shell command be executed? Ouch! It
    will be executed twice just to construct the DAG, and a further
    two times if the rule needs to be executed.

    If this shell command does anything complex or time consuming
    (and it usually does), it will take four times longer than you
    thought.

    But it is worth looking at the other portions of that OBJ
    macro. Each time it is named, a huge amount of processing is
    performed:

    +o The argument to _s_h_e_l_l is a single  string  (all  built-in-
      functions  take  a single string argument).  The string is
      executed in a sub-shell, and the standard output  of  this
      command is read back in, translating newlines into spaces.
      The result is a single string.

    +o The argument to _f_i_l_t_e_r is a single string.  This  argument
      is  broken into two strings at the first comma.  These two
      strings are then each broken into sub-strings separated by
      spaces.   The  first  set are the patterns, the second set
      are the filenames.  Then, for each  of  the  pattern  sub-
      strings,  if  a filename sub-string matches it, that file-
      name is included in the output.  Once all  of  the  output
      has  been  found,  it is re-assembled into a single space-
      separated string.

    +o The argument to _p_a_t_s_u_b_s_t is a single string.   This  argu-
      ment  is broken into three strings at the first and second
      commas.  The third string is then broken into  sub-strings
      separated  by  spaces, these are the filenames.  Then, for
      each of the filenames which match the first string  it  is
      substituted according to the second string.  If a filename
      does not match, it is passed through unchanged.  Once  all
      of  the output has been generated, it is re-assembled into
      a single space-separated string.
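
    To make the disassembly and re-assembly concrete, here is a
    rough POSIX shell imitation of filter and patsubst. It is only a
    sketch: mk_filter and mk_patsubst are hypothetical names, each
    takes a single pattern (and replacement) containing exactly one
    %, and GNU Make's real implementation differs in detail (its
    filter, for example, accepts several patterns).

```shell
#!/bin/sh
# match_stem PATTERN WORD: if WORD matches PATTERN (one "%" wildcard),
# print the part of WORD matched by "%" and succeed; otherwise fail.
match_stem() {
    pre=${1%%\%*}                  # text before the %
    suf=${1#*%}                    # text after the %
    case $2 in
        "$pre"*"$suf")
            stem=${2#"$pre"}
            printf '%s\n' "${stem%"$suf"}"
            return 0 ;;
    esac
    return 1
}

# mk_filter PATTERN LIST: split LIST on white space, keep the words
# matching PATTERN, and re-join them with single spaces.
mk_filter() (
    set -f                         # no pathname expansion of the words
    out=
    for w in $2; do
        if match_stem "$1" "$w" > /dev/null; then
            out="$out${out:+ }$w"
        fi
    done
    printf '%s\n' "$out"
)

# mk_patsubst PATTERN REPLACEMENT LIST: split LIST, rewrite each
# matching word by putting its stem into REPLACEMENT, pass the other
# words through unchanged, and re-join the result.
mk_patsubst() (
    set -f
    rpre=${2%%\%*}
    rsuf=${2#*%}
    out=
    for w in $3; do
        if stem=$(match_stem "$1" "$w"); then
            w=$rpre$stem$rsuf
        fi
        out="$out${out:+ }$w"
    done
    printf '%s\n' "$out"
)
```

    Nesting the calls, as in
    mk_patsubst '%.c' '%.o' "$(mk_filter '%.c' "$SRC")", mirrors one
    half of the OBJ macro above, and every call splits and re-joins
    the whole list again.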

    Notice how many times those strings are disassembled and
    re-assembled. Notice how many ways that happens. This is slow.
    The example here names just two files, but consider how
    inefficient this would be for 1000 files. Doing it four times
    becomes decidedly inefficient.

    If you are using a dumb make that has no substitutions and no
    built-in functions, this cannot bite you. But a modern make has
    lots of built-in functions and can even invoke shell commands
    on-the-fly. The semantics of make's text manipulation is such
    that string manipulation in make is very CPU intensive,
    compared to performing the same string manipulations in C or
    AWK.

    55..22..  IImmmmeeddiiaattee EEvvaalluuaattiioonn

    Modern _m_a_k_e implementations  have  an  immediate  evaluation
    ``:=''  assignment  operator.   The above example can be re-
    written as






    Peter Miller           30 August 2001                Page 15





    AUUGN'97                   Recursive Make Considered Harmful


                  +------------------------------+
                  |SRC := $(shell echo 'Ouch!' \ |
                  |  1>&2 ; echo *.[cy])         |
                  |OBJ := \                      |
                  |  $(patsubst %.c,%.o,\        |
                  |    $(filter %.c,$(SRC))) \   |
                  |  $(patsubst %.y,%.o,\        |
                  |    $(filter %.y,$(SRC)))     |
                  |test: $(OBJ)                  |
                  |  $(CC) -o $@ $(OBJ)          |
                  +------------------------------+
    Note that _b_o_t_h assignments are immediate evaluation  assign-
    ments.   If  the  first  were  not,  the shell command would
    always be executed twice.   If  the  second  were  not,  the
    expensive  substitutions  would  be performed at least twice
    and possibly four times.

    As a rule of thumb: always use immediate evaluation assignment
    unless you knowingly want deferred evaluation.
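
    The cost difference can be imitated in the shell. In this
    sketch, which is an analogy rather than a description of how
    make is implemented, the deferred form keeps the unevaluated
    text and runs eval on it at every reference, while the immediate
    form expands it once and reuses the result. The function
    expensive_src is a hypothetical stand-in for the $(shell ...)
    call, and a temporary file counts how often it runs.

```shell
#!/bin/sh
# Analogy only: shell eval standing in for make's macro expansion.
COUNT_FILE=$(mktemp)                  # one line per run of the command
trap 'rm -f "$COUNT_FILE"' EXIT

# expensive_src: hypothetical stand-in for $(shell echo *.[cy]);
# it logs each run, then prints a fixed file list.
expensive_src() {
    echo run >> "$COUNT_FILE"
    echo main.c parse.y
}

# runs: how many times expensive_src has run since the last reset.
runs() { wc -l < "$COUNT_FILE" | tr -d ' '; }

# Deferred (like "SRC = ..."): store the text, expand it at every use.
SRC_TEXT='$(expensive_src)'
deferred_ref() { eval "printf '%s\n' \"$SRC_TEXT\""; }

deferred_ref > /dev/null              # first reference (filter %.c)
deferred_ref > /dev/null              # second reference (filter %.y)
DEFERRED_RUNS=$(runs)

: > "$COUNT_FILE"                     # reset the counter

# Immediate (like "SRC := ..."): expand once, then reuse the value.
SRC_VALUE=$(expensive_src)
printf '%s\n' "$SRC_VALUE" > /dev/null
printf '%s\n' "$SRC_VALUE" > /dev/null
IMMEDIATE_RUNS=$(runs)

echo "deferred: $DEFERRED_RUNS runs, immediate: $IMMEDIATE_RUNS run"
```

    This prints "deferred: 2 runs, immediate: 1 run", mirroring the
    two references to SRC inside the OBJ macro.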

    55..33..  IInncclluuddee FFiilleess

    Many Makefiles perform the same text processing (the filters
    above, for example) for every single make run, but the results
    of the processing rarely change. Wherever practical, it is more
    efficient to record the results of the text processing into a
    file, and have the Makefile include this file.
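
    As a minimal sketch, assuming a single-directory project whose
    sources are .c and .y files, a small shell function can compute
    the lists once and write them out as immediate-evaluation
    assignments. The Makefile would then include the generated file
    (sources.mk here, a hypothetical name) rather than re-running
    the text processing on every make run; regenerating it when
    files are added or removed is left to a separate step.

```shell
#!/bin/sh
# write_sources_mk OUTFILE: record the .c and .y files in the current
# directory, and the object files derived from them, as ":=" lines
# that a Makefile can include instead of recomputing the lists.
write_sources_mk() {
    src=
    obj=
    for f in *.c *.y; do
        [ -e "$f" ] || continue       # an unmatched glob stays literal; skip it
        src="$src $f"
        obj="$obj ${f%.*}.o"          # drop the suffix, add .o
    done
    {
        printf 'SRC :=%s\n' "$src"
        printf 'OBJ :=%s\n' "$obj"
    } > "$1"
}
```

    Running write_sources_mk sources.mk in a directory holding
    main.c and parse.y writes the two lines SRC := main.c parse.y
    and OBJ := main.o parse.o.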

    55..44..  DDeeppeennddeenncciieess

    Don't be miserly with include files. They are relatively
    inexpensive to read, compared to $(shell), so more rather than
    less doesn't greatly affect efficiency.

    As an example of this, it is first necessary to describe a
    useful feature of GNU Make: once a Makefile has been read in,
    if any of its included files were out-of-date (or do not yet
    exist), they are re-built, and then make starts again, which
    has the result that make is now working with up-to-date include
    files. This feature can be exploited to obtain automatic
    include file dependency tracking for C sources. The obvious way
    to implement it, however, has a subtle flaw.

                    +--------------------------+
                    |SRC := $(wildcard *.c)    |
                    |OBJ := $(SRC:.c=.o)       |
                    |test: $(OBJ)              |
                    |  $(CC) -o $@ $(OBJ)      |
                    |include dependencies      |
                    |dependencies: $(SRC)      |
                    |  depend.sh $(CFLAGS) \   |
                    |    $(SRC) > $@           |
                    +--------------------------+
    The depend.sh script prints lines of the form

         _f_i_l_e.o: _f_i_l_e.c _i_n_c_l_u_d_e.h ...

    The simplest implementation of this is to use GCC, but you will
    need an equivalent awk script or C program if you have a
    different compiler:

                    +--------------------------+
                    |#!/bin/sh                 |
                    |gcc -MM -MG "$@"          |
                    +--------------------------+
    This implementation of tracking C include dependencies has
    several serious flaws, but the one most commonly discovered is
    that the dependencies file does not, itself, depend on the C
    include files. That is, it is not re-built if one of the
    include files changes. There is no edge in the DAG joining the
    dependencies vertex to any of the include file vertices. If an
    include file changes to include another file (a nested
    include), the dependencies will not be recalculated, and
    potentially the C file will not be recompiled, and thus the
    program will not be re-built correctly.

    This is a classic build-too-little problem, caused by giving
    make inadequate information, and thus causing it to build an
    inadequate DAG and reach the wrong conclusion.
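
    The missing edge can be demonstrated with a toy version of
    make's out-of-date test, written as a POSIX shell sketch.
    needs_rebuild is a hypothetical name; like make, it treats a
    target as stale if the target is missing or any listed
    prerequisite is newer. It assumes the shell's test command
    supports -nt, which bash, dash and ksh do.

```shell
#!/bin/sh
# needs_rebuild TARGET PREREQ...: succeed if TARGET does not exist or
# any PREREQ has a newer modification time than TARGET.
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0
    for p in "$@"; do
        [ "$p" -nt "$target" ] && return 0
    done
    return 1                          # every prerequisite is older
}
```

    Suppose the dependencies file is newer than main.c but older
    than an edited main.h. Then needs_rebuild dependencies main.c
    reports nothing to do, because the edge to main.h is missing,
    while needs_rebuild dependencies main.c main.h correctly reports
    that the file is stale.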

    The traditional solution is to build too much:

                    +--------------------------+
                    |SRC := $(wildcard *.c)    |
                    |OBJ := $(SRC:.c=.o)       |
                    |test: $(OBJ)              |
                    |  $(CC) -o $@ $(OBJ)      |
                    |include dependencies      |
                    |.PHONY: dependencies      |
                    |dependencies: $(SRC)      |
                    |  depend.sh $(CFLAGS) \   |
                    |    $(SRC) > $@           |
                    +--------------------------+
    Now, even if the project is completely up-to-date, the
    dependencies will be re-built. For a large project, this is
    very wasteful, and can be a major contributor to make taking
    ``forever'' to work out that nothing needs to be done.

    There is a second problem, and that is that if any one of the C
    files changes, all of the C files will be re-scanned for
    include dependencies. This is as inefficient as having a
    Makefile which reads

                    +--------------------------+
                    |prog: $(SRC)              |
                    |  $(CC) -o $@ $(SRC)      |
                    +--------------------------+
    What is needed, in exact analogy to the C case, is to have an
    intermediate form. This is usually given a ``.d'' suffix. By
    exploiting the fact that more than one file may be named in an
    include line, there is no need to ``link'' all of the ``.d''
    files together:

                  +------------------------------+
                  |SRC := $(wildcard *.c)        |
                  |OBJ := $(SRC:.c=.o)           |
                  |test: $(OBJ)                  |
                  |  $(CC) -o $@ $(OBJ)          |
                  |include $(OBJ:.o=.d)          |
                  |%.d: %.c                      |
                  |  depend.sh $(CFLAGS) $< > $@ |
                  +------------------------------+

    This has one more thing to fix: just as the object (.o) files
    depend on the source files and the include files, so do the
    dependency (.d) files.

         _f_i_l_e.d _f_i_l_e.o: _f_i_l_e.c _i_n_c_l_u_d_e.h

    This means tinkering with the depend.sh script again:

                +-----------------------------------+
                |#!/bin/sh                          |
                |gcc -MM -MG "$@" |                 |
                |sed -e 's@^\(.*\)\.o:@\1.d \1.o:@' |
                +-----------------------------------+
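
    The effect of the sed expression can be checked without a
    compiler by feeding it text of the form gcc -MM prints. In this
    sketch add_d_target is a hypothetical wrapper around the same
    sed command, and the sample dependency line is made up.

```shell
#!/bin/sh
# add_d_target: read "file.o: prerequisites..." lines on stdin and make
# file.d a target of the same rule, so the .d file is re-built
# whenever anything the .o file depends on changes.
add_d_target() {
    sed -e 's@^\(.*\)\.o:@\1.d \1.o:@'
}

printf '%s\n' 'main.o: main.c main.h' | add_d_target
```

    This prints main.d main.o: main.c main.h. Continuation lines,
    which begin with white space and name no target, pass through
    unchanged.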

    This method of determining include file dependencies results in
    the Makefile including more files than the original method, but
    opening files is less expensive than rebuilding all of the
    dependencies every time. Typically a developer will edit one or
    two files before re-building; this method will rebuild the
    exact dependency file affected (or more than one, if you edited
    an include file). On balance, this will use less CPU, and less
    time.

    In the case of a build where nothing needs to be done, make
    will actually do nothing, and will work this out very quickly.

    However, the above technique assumes your project fits entirely
    within the one directory. For large projects, this usually
    isn't the case. This means tinkering with the depend.sh script
    again:
           +---------------------------------------------+
           |#!/bin/sh                                    |
           |DIR="$1"                                     |
           |shift 1                                      |
           |case "$DIR" in                               |
           |"" | ".")                                    |
           |gcc -MM -MG "$@" |                           |
           |sed -e 's@^\(.*\)\.o:@\1.d \1.o:@'           |
           |;;                                           |
           |*)                                           |
           |gcc -MM -MG "$@" |                           |
           |sed -e "s@^\(.*\)\.o:@$DIR/\1.d $DIR/\1.o:@" |
           |;;                                           |
           |esac                                         |
           +---------------------------------------------+
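    And the rule needs to change, too, to pass the directory as the
    first argument, as the script expects: the %.d rule must call
    depend.sh with the directory of the source file (for example,
    as computed by dirname) ahead of $(CFLAGS) and the file name.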

    Note that the .d files will be relative to the top level
    directory. Writing them so that they can be used from any level
    is possible, but beyond the scope of this paper.
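
    The directory case can be exercised the same way. This sketch
    keeps the case statement of depend.sh above but reads the
    compiler's output from standard input, so only the rewriting is
    tested; prefix_deps is a hypothetical name and the sample input
    is made up.

```shell
#!/bin/sh
# prefix_deps DIR: read "file.o: ..." lines on stdin and rewrite the
# target as "DIR/file.d DIR/file.o:"; a DIR of "" or "." adds the .d
# target without any directory prefix.
prefix_deps() {
    case "$1" in
    "" | ".")
        sed -e 's@^\(.*\)\.o:@\1.d \1.o:@'
        ;;
    *)
        sed -e "s@^\(.*\)\.o:@$1/\1.d $1/\1.o:@"
        ;;
    esac
}

printf '%s\n' 'main.o: ant/main.c' | prefix_deps ant
```

    This prints ant/main.d ant/main.o: ant/main.c. The prefix is
    needed because gcc -MM names the target after the source file's
    base name, without its directory.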

    55..55..  MMuullttiipplliieerr

    All of the inefficiencies described in this section compound
    together. If you do 100 Makefile interpretations, once for each
    module, checking 1000 source files can take a very long time,
    if the interpretation requires complex processing or performs
    unnecessary work, or both. A whole-project make, on the other
    hand, only needs to interpret a single Makefile.

    66..  PPrroojjeeccttss _v_e_r_s_u_s SSaanndd--bbooxxeess

    The above discussion assumes that a project resides under a
    single directory tree, and this is often the ideal. However,
    the realities of working with large software projects often
    lead to weird and wonderful directory structures in order to
    have developers working on different sections of the project
    without taking complete copies and thereby wasting precious
    disk space.

    It is possible to see the whole-project make proposed here as
    impractical, because it does not match the evolved methods of
    your development process.

    The whole-project _m_a_k_e proposed here does have an effect  on
    development  methods:  it  can  give you cleaner and simpler
    build environments for your  developers.   By  using  _m_a_k_e's
    VPATH  feature,  it is possible to copy only those files you
    need to edit into your private work  area,  often  called  a
    _s_a_n_d_-_b_o_x_.




    Peter Miller           30 August 2001                Page 19





    AUUGN'97                   Recursive Make Considered Harmful


    The  simplest  explanation  of what VPATH does is to make an
    analogy with the include file search  path  specified  using
    -I_p_a_t_h  options  to  the  C  compiler.   This set of options
    describes where to look for files, just as VPATH tells  _m_a_k_e
    where to look for files.

    By using VPATH, it is possible to ``stack'' the sand-box on top
    of the project master source, so that files in the sand-box
    take precedence, but it is the union of all the files which
    make uses to perform the build.

              Sand-Box         Master Source      Combined View
              main.c           main.c             main.c
                               parse.y            parse.y
              variable.c                          variable.c
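
    The lookup order can be modelled with a few lines of POSIX
    shell. This is a rough sketch, not GNU Make's algorithm:
    vpath_find is a hypothetical name, and it simply returns the
    first directory in the list that contains the file, much as the
    C compiler searches its -I directories in order. GNU Make's real
    VPATH handling has further rules, for example about where
    targets that must be rebuilt are written.

```shell
#!/bin/sh
# vpath_find NAME DIR...: print DIR/NAME for the first DIR, in the
# order given, that contains NAME; fail if none of them does.
vpath_find() {
    name=$1
    shift
    for d in "$@"; do
        if [ -e "$d/$name" ]; then
            printf '%s\n' "$d/$name"
            return 0
        fi
    done
    return 1
}
```

    Searching the sand-box first and the master source second,
    main.c and variable.c are found in the sand-box and parse.y in
    the master source, which is the combined view.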


    In this environment, the sand-box has the same tree structure
    as the project master source. This allows developers to safely
    change things across separate modules, e.g. if they are
    changing a module interface. It also allows the sand-box to be
    physically separate, perhaps on a different disk, or under
    their home directory. It also allows the project master source
    to be read-only, if you have (or would like) a rigorous
    check-in procedure.

    Note: in addition to adding a VPATH line to your development
    Makefile, you will also need to add -I options to the CFLAGS
    macro, so that the C compiler uses the same path as make does.
    This is simply done with a 3-line Makefile in your work area:
    set a macro, set the VPATH, and then include the Makefile from
    the project master source.

    66..11..  VVPPAATTHH SSeemmaannttiiccss

    For the above discussion to apply, you need to use GNU Make
    3.76 or later. For versions of GNU Make earlier than 3.76, you
    will need Paul Smith's VPATH+ patch. This may be obtained from
    ftp://ftp.wellfleet.com/netman/psmith/gmake/.

    The POSIX semantics of VPATH are slightly brain-dead, so many
    other make implementations are too limited. You may want to
    consider installing GNU Make.

    77..  TThhee BBiigg PPiiccttuurree

    This section brings together all of the preceding discussion,
    and presents the example project with its separate modules, but
    with a whole-project Makefile. The directory structure is
    changed little from the recursive case, except that the deeper
    Makefiles are replaced by module specific include files:

            Project
            +-- Makefile
            +-- ant
            |   +-- module.mk
            |   +-- main.c
            +-- bee
            |   +-- module.mk
            |   +-- parse.y
            +-- depend.sh

    The Makefile looks like this:

                  +-------------------------------+
                  |MODULES := ant bee             |
                  |# look for include files in    |
                  |#   each of the modules        |
                  |CFLAGS += $(patsubst %,-I%,\   |
                  |  $(MODULES))                  |
                  |# extra libraries if required  |
                  |LIBS :=                        |
                  |# each module will add to this |
                  |SRC :=                         |
                  |# include the description for  |
                  |#   each module                |
                  |include $(patsubst %,\         |
                  |    %/module.mk,$(MODULES))    |
                  |# determine the object files   |
                  |OBJ :=                    \    |
                  |  $(patsubst %.c,%.o,     \    |
                  |    $(filter %.c,$(SRC))) \    |
                  |  $(patsubst %.y,%.o,     \    |
                  |    $(filter %.y,$(SRC)))      |
                  |# link the program             |
                  |prog: $(OBJ)                   |
                  |  $(CC) -o $@ $(OBJ) $(LIBS)   |
                  |# include the C include        |
                  |#   dependencies               |
                  |include $(OBJ:.o=.d)           |
                  |# calculate C include          |
                  |#   dependencies               |
                  |%.d: %.c                       |
                  |  depend.sh `dirname $<` \     |
                  |    $(CFLAGS) $< > $@          |
                  +-------------------------------+
    This looks absurdly large, but it has all of the common
    elements in the one place, so that each of the modules' make
    includes may be small.

    The ant/module.mk file looks like:

                    +--------------------------+
                    |SRC += ant/main.c         |
                    +--------------------------+

    The bee/module.mk file looks like:

                    +--------------------------+
                    |SRC += bee/parse.y        |
                    |LIBS += -ly               |
                    |%.c %.h: %.y              |
                    |  $(YACC) -d $*.y         |
                    |  mv y.tab.c $*.c         |
                    |  mv y.tab.h $*.h         |
                    +--------------------------+

    Notice that the built-in rules are used for the C files, but we
    need special yacc processing to get the generated .h file.

    The savings in this example look irrelevant, because the
    top-level Makefile is so large. But consider if there were 100
    modules, each with only a few non-comment lines, and those
    specifically relevant to the module. The savings soon add up to
    a total size often less than the recursive case, without loss
    of modularity.

    The equivalent DAG of the Makefile after all of the includes
    looks like this:

                                prog

                          main.o   parse.o
                            main.d   parse.d

                      main.c   parse.h  parse.c

                                   parse.y

    The vertices and edges for the include file dependency files
    are also present, as these are important for make to function
    correctly.

    77..11..  SSiiddee EEffffeeccttss

    There are a couple of desirable side-effects of using a single
    Makefile.

    * The GNU Make -j option, for parallel builds, works even better
      than before. It can find even more unrelated things to do at
      once.

    * The general make -k option, to continue as far as possible
      even in the face of errors, works even better than before. It
      can find even more things to continue with.

    88..  LLiitteerraattuurree SSuurrvveeyy

    How can it be possible that we have been misusing make for 20
    years? How can it be possible that behavior previously ascribed
    to make's limitations is in fact a result of misusing it?

    The author only started thinking about the ideas presented in
    this paper when faced with a number of ugly build problems on
    utterly different projects, but with common symptoms. By
    stepping back from the individual projects, and closely
    examining the thing they had in common, make, it became
    possible to see the larger pattern. Most of us are so caught up
    in the minutiae of just getting the rotten build to work that
    we don't have time to spare for the big picture, especially
    when the item in question ``obviously'' works, and has done so
    continuously for the last 20 years.

    It is interesting that the problems of recursive make are
    rarely mentioned in the very books Unix programmers rely on for
    accurate, practical advice.

    88..11..  TThhee OOrriiggiinnaall PPaappeerr

    The original _m_a_k_e paper [feld78] contains  no  reference  to
    recursive  _m_a_k_e_, let alone any discussion as to the relative
    merits of whole project _m_a_k_e over recursive _m_a_k_e_.

    It is hardly surprising that the original paper did not discuss
    recursive make: Unix projects at the time usually did fit into
    a single directory.

    It may be this which set the ``one Makefile in every
    directory'' concept so firmly in the collective Unix
    development mind-set.

    88..22..  GGNNUU MMaakkee

    The GNU Make manual [stal93] contains several pages of material
    concerning recursive make; however, its discussion of the
    merits or otherwise of the technique is limited to the brief
    statement that

         ``This technique is useful when you want to separate
         makefiles for various subsystems that compose a larger
         system.''

    No mention is made of the problems you may encounter.

    8.3.  Managing Projects with Make

    The Nutshell Make book [talb91] specifically promotes recursive
    make over whole-project make because

         ``The cleanest way to build is to put a separate
         description file in each directory, and tie them together
         through a master description file that invokes make
         recursively. While cumbersome, the technique is easier to
         maintain than a single, enormous file that covers multiple
         directories.'' (p. 65)

    This is despite the book's advice only two paragraphs earlier
    that

         ``_m_a_k_e is happiest when you keep all your files in
         a single directory.'' (p. 64)

    Yet the book fails to discuss the contradiction in these two
    statements, and goes on to describe one of the traditional ways
    of treating the symptoms of incomplete DAGs caused by recursive
    make.

    The book may give us a clue as to why recursive make has been
    used in this way for so many years. Notice how the above quotes
    confuse the concept of a directory with the concept of a
    Makefile.

    This paper suggests a simple change to the mind-set: directory
    trees, however deep, are places to store files; Makefiles are
    places to describe the relationships between those files,
    however many.

    88..44..  BBSSDD MMaakkee

    The tutorial for BSD Make [debo88] says nothing at all about
    recursive make, but it is one of the few which actually
    describes, however briefly, the relationship between a Makefile
    and a DAG (p. 30). There is also a wonderful quote

         ``If _m_a_k_e doesn't do what you expect it to, it's a
         good chance the makefile is wrong.'' (p. 10)

    This is a pithy summary of the thesis of this paper.

    99..  SSuummmmaarryy

    This paper presents a number of related problems, and
    demonstrates that they are not inherent limitations of make,
    as is commonly believed, but are the result of presenting
    incorrect information to make.  This is the ancient Garbage
    In, Garbage Out principle at work.  Because make can only
    operate correctly with a complete DAG, the error is in
    segmenting the Makefile into incomplete pieces.
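
    A toy model shows what segmenting the DAG does.  The Python
    sketch below is not how any real make is implemented.  It
    assumes integer timestamps and hypothetical file names (prog/,
    lib/, libfoo.a).  It models recursive make as the same
    out-of-date rule run over each directory's slice of the graph,
    one directory at a time, in a fixed order.

```python
# Toy model of make's out-of-date rule; not a real make implementation.
# All file names are hypothetical and timestamps are plain integers.
from typing import Dict, List

Graph = Dict[str, List[str]]  # target -> prerequisites

# The complete project DAG: prog/prog links against lib/libfoo.a.
WHOLE: Graph = {
    "prog/prog": ["prog/main.o", "lib/libfoo.a"],
    "prog/main.o": ["prog/main.c"],
    "lib/libfoo.a": ["lib/foo.o"],
    "lib/foo.o": ["lib/foo.c"],
}


def build(graph: Graph, goal: str, mtime: Dict[str, int],
          clock: List[int], rebuilt: List[str]) -> None:
    """Update `goal` the way make does: bring every prerequisite this
    graph knows about up to date first, then rebuild `goal` if it is
    missing or any prerequisite is newer."""
    prereqs = graph.get(goal)
    if prereqs is None:
        return  # not a target in this graph: treated as an ordinary file
    for p in prereqs:
        build(graph, p, mtime, clock, rebuilt)
    if goal not in mtime or any(mtime[p] > mtime[goal] for p in prereqs):
        clock[0] += 1
        mtime[goal] = clock[0]  # "run the recipe" by touching the target
        rebuilt.append(goal)


def edited_tree():
    """A fully built tree in which lib/foo.c has just been edited."""
    mtime = {"prog/main.c": 1, "lib/foo.c": 1, "prog/main.o": 2,
             "lib/foo.o": 2, "lib/libfoo.a": 3, "prog/prog": 4}
    clock = [5]
    mtime["lib/foo.c"] = clock[0]
    return mtime, clock


def whole_project() -> List[str]:
    """One make process working from the complete graph."""
    mtime, clock = edited_tree()
    rebuilt: List[str] = []
    build(WHOLE, "prog/prog", mtime, clock, rebuilt)
    return rebuilt


def recursive(order: List[str]) -> List[str]:
    """One make per directory, each seeing only its own directory's rules."""
    goals = {"prog": "prog/prog", "lib": "lib/libfoo.a"}
    mtime, clock = edited_tree()
    rebuilt: List[str] = []
    for d in order:
        piece = {t: ps for t, ps in WHOLE.items() if t.startswith(d + "/")}
        build(piece, goals[d], mtime, clock, rebuilt)
    return rebuilt


print(whole_project())             # ['lib/foo.o', 'lib/libfoo.a', 'prog/prog']
print(recursive(["prog", "lib"]))  # ['lib/foo.o', 'lib/libfoo.a']
```

    Here the directories are visited as prog, then lib.  The
    library is rebuilt, but prog/prog stays linked against the old
    one.  Reversing the order happens to work.  That is the kind
    of ordering dependency the complete DAG makes unnecessary.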

    This requires a shift in thinking: directory trees are
    simply a place to hold files, Makefiles are a place to remember
    relationships between files.  Do not confuse the two because
    it is as important to accurately represent the relationships
    between files in different directories as it is to represent
    the relationships between files in the same directory.  This
    has  the  implication that there should be exactly one Make-
    file for a project, but the magnitude of the description can
    be managed by using a make include file in each directory to
    describe the subset of the project files in that  directory.
    This  is just as modular as having a Makefile in each direc-
    tory.
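
    A top-level Makefile that includes one fragment per directory
    ends up with a single graph in which cross-directory edges are
    visible.  The Python sketch below models that merge.  The
    fragment contents are hypothetical and stand in for the include
    file in each directory.  Two checks are choices of this sketch,
    not rules make enforces: each directory may only define targets
    beneath itself, and no target may be defined twice.

```python
# Sketch: one project-wide graph assembled from per-directory fragments.
# Each fragment stands in for a hypothetical include file in that directory.
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # target -> prerequisites

FRAGMENTS: Dict[str, Graph] = {
    "lib": {
        "lib/libfoo.a": ["lib/foo.o"],
        "lib/foo.o": ["lib/foo.c", "lib/foo.h"],
    },
    "prog": {  # prerequisites may live in other directories
        "prog/prog": ["prog/main.o", "lib/libfoo.a"],
        "prog/main.o": ["prog/main.c", "lib/foo.h"],
    },
}


def include_all(fragments: Dict[str, Graph]) -> Graph:
    """Merge every fragment into a single graph.  Each directory may only
    define targets beneath itself, and no target may be defined twice."""
    whole: Graph = {}
    for directory, rules in fragments.items():
        for target, prereqs in rules.items():
            if not target.startswith(directory + "/"):
                raise ValueError(f"{directory} defines foreign target {target}")
            if target in whole:
                raise ValueError(f"{target} is defined twice")
            whole[target] = list(prereqs)
    return whole


def prerequisites_of(graph: Graph, goal: str) -> Set[str]:
    """Every file `goal` depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = [goal]
    while stack:
        for p in graph.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


whole = include_all(FRAGMENTS)
# The merged graph knows that prog/prog depends on lib/foo.c ...
print("lib/foo.c" in prerequisites_of(whole, "prog/prog"))              # True
# ... but prog's fragment on its own does not.
print("lib/foo.c" in prerequisites_of(FRAGMENTS["prog"], "prog/prog"))  # False
```

    Each directory's rules still live in that directory.  The
    description stays as modular as before; what changes is that
    a single make process sees all of it.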

    This paper has shown how a project build and  a  development
    build  can  be equally brief for a whole-project make.  Given
    this parity of time, the gains provided by  accurate  depen-
    dencies mean that this process will, in fact, be faster than
    the recursive _m_a_k_e case, and more accurate.

    99..11..  IInntteerr--ddeeppeennddeenntt PPrroojjeeccttss

    In organizations with a strong culture of re-use,
    implementing whole-project make can present challenges.  Rising to
    these challenges, however, may require looking at the bigger
    picture.

    * A  module  may  be shared between two programs because the
      programs are closely related.  Clearly, the  two  programs
      plus  the  shared  module  belong to the same project (the
      module may be self-contained, but the programs  are  not).
      The dependencies must be explicitly stated, and changes to
      the module must result in both programs  being  recompiled
      and  re-linked  as appropriate.  Combining them all into a
      single project means that whole-project make can
      accomplish this.

    * A  module  may be shared between two projects because they
      must inter-operate.  Possibly your project is bigger  than
      your  current  directory structure implies.  The dependen-
      cies must be explicitly stated, and changes to the  module
      must  result  in  both  projects  being recompiled and re-
      linked as appropriate.  Combining them all into  a  single
      project means that whole-project _m_a_k_e can accomplish this.

    * It is the normal case to omit the edges between your
      project and the operating system or other installed third
      party tools.  So normal that they are ignored in the Make-
      files  in this paper, and they are ignored in the built-in
      rules of _m_a_k_e programs.
      Modules shared between your projects may fall into a simi-
      lar  category:  if  they change, you will deliberately re-
      build to include their changes, or quietly  include  their
      changes  whenever  the  next  build may happen.  In either
      case, you do not explicitly state  the  dependencies,  and
      whole-project _m_a_k_e does not apply.

    * Re-use  may  be better served if the module were used as a
      template, and divergence between two projects is  seen  as
      normal.  Duplicating the module in each project allows the
      dependencies to be explicitly stated, but  requires  addi-
      tional  effort  if  maintenance  is required to the common
      portion.
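
    The first two cases above come down to one question: given a
    changed file, which targets must be rebuilt?  With every edge
    in one graph, the answer is the set of targets that reach the
    changed file through their prerequisites.  The Python sketch
    below computes that set by walking the edges backwards.  The
    programs ant and bee and the shared/ module are hypothetical.

```python
# Sketch: which targets must be rebuilt when one file changes?
# ant, bee and shared/ are hypothetical names used for illustration.
from collections import defaultdict
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # target -> prerequisites

PROJECT: Graph = {
    "ant/ant": ["ant/main.o", "shared/util.o"],
    "bee/bee": ["bee/main.o", "shared/util.o"],
    "ant/main.o": ["ant/main.c", "shared/util.h"],
    "bee/main.o": ["bee/main.c", "shared/util.h"],
    "shared/util.o": ["shared/util.c", "shared/util.h"],
}


def affected_by(graph: Graph, changed: str) -> Set[str]:
    """Every target that depends, directly or transitively, on `changed`."""
    # Invert the edges: file -> targets that list it as a prerequisite.
    users: Dict[str, List[str]] = defaultdict(list)
    for target, prereqs in graph.items():
        for p in prereqs:
            users[p].append(target)
    out: Set[str] = set()
    stack = [changed]
    while stack:
        for t in users[stack.pop()]:
            if t not in out:
                out.add(t)
                stack.append(t)
    return out


# Editing the shared header must recompile and relink both programs.
print(sorted(affected_by(PROJECT, "shared/util.h")))
# ['ant/ant', 'ant/main.o', 'bee/bee', 'bee/main.o', 'shared/util.o']
```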

    How to structure dependencies in a strong re-use environment
    thus  becomes  an  exercise in risk management.  What is the
    danger that omitting chunks of the DAG will harm  your  pro-
    jects?   How  vital  is  it  to rebuild if a module changes?
    What are the consequences of _n_o_t  rebuilding  automatically?
    How  can  you tell when a rebuild is necessary if the depen-
    dencies are not explicitly  stated?   What  are  the  conse-
    quences of forgetting to rebuild?

    99..22..  RReettuurrnn OOnn IInnvveessttmmeenntt

    Some  of the techniques presented in this paper will improve
    the speed of your builds, even if you continue to use
    recursive make.  These are not the focus of this paper, merely a
    useful detour.

    The focus of this paper is that you will get  more  accurate
    builds  of your project if you use whole-project make rather
    than recursive make.

    +o The time for _m_a_k_e to work out that  nothing  needs  to  be
      done will not be more, and will often be less.

    * The  size  and complexity of the total Makefile input will
      not be more, and will often be less.

    * The total Makefile input is no less modular than in the
      recursive case.

    * The  difficulty  of  maintaining  the total Makefile input
      will not be more, and will often be less.
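
    Counting files makes the first claim concrete.  Before
    concluding that nothing needs doing, each approach must
    examine some number of files.  The Python sketch below assumes
    that cost is proportional to the number of file names each
    make process checks.  It ignores the extra cost of starting one
    make per directory, which would only add to the recursive
    total.  The fragment contents are hypothetical.

```python
# Sketch: files examined by a "nothing to do" run, under the simplifying
# assumption that cost is proportional to the file names each make checks.
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # target -> prerequisites

# Hypothetical per-directory descriptions; lib/foo.h and lib/libfoo.a are
# named by both directories.
FRAGMENTS: Dict[str, Graph] = {
    "lib": {
        "lib/libfoo.a": ["lib/foo.o"],
        "lib/foo.o": ["lib/foo.c", "lib/foo.h"],
    },
    "prog": {
        "prog/prog": ["prog/main.o", "lib/libfoo.a"],
        "prog/main.o": ["prog/main.c", "lib/foo.h"],
    },
}


def files_named(graph: Graph) -> Set[str]:
    """Every target and prerequisite mentioned in `graph`."""
    names = set(graph)
    for prereqs in graph.values():
        names.update(prereqs)
    return names


def whole_project_checks(fragments: Dict[str, Graph]) -> int:
    """One make over the union of all fragments checks each file once."""
    union: Set[str] = set()
    for graph in fragments.values():
        union |= files_named(graph)
    return len(union)


def recursive_checks(fragments: Dict[str, Graph]) -> int:
    """One make per directory; a file named in several directories is
    checked once by each of them."""
    return sum(len(files_named(g)) for g in fragments.values())


print(whole_project_checks(FRAGMENTS), recursive_checks(FRAGMENTS))  # 7 9
```

    A union is never larger than the sum of its parts.  Under this
    model, the whole-project count therefore never exceeds the
    recursive count, and it is smaller whenever several directories
    name the same files.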

    The disadvantages of using recursive make rather than whole-project
    make are often unmeasured.  How much time is spent figuring
    out why make did something unexpected?  How much time is
    spent figuring out that make did something unexpected?  How
    much time is spent tinkering with the build process?   These
    activities  are  often  thought of as ``normal'' development
    overheads.

    Building your project is a fundamental activity.  If  it  is
    performing  poorly,  so are development, debugging and test-
    ing.  Building your project needs to be so simple the newest
    recruit  can  do  it  immediately with only a single page of
    instructions.  Building your project needs to be  so  simple
    that it rarely needs any development effort at all.  Is your
    build process this simple?

    1100..  RReeffeerreenncceess


         ddeebboo8888:: Adam de Boor (1988).  _P_M_a_k_e _- _A _T_u_t_o_r_i_a_l.  Uni-
    versity of California, Berkeley

         ffeelldd7788:: Stuart I. Feldman (1978).  _M_a_k_e _- _A _P_r_o_g_r_a_m _f_o_r
    _M_a_i_n_t_a_i_n_i_n_g _C_o_m_p_u_t_e_r _P_r_o_g_r_a_m_s.  Bell Laboratories  Computing
    Science Technical Report 57

         ssttaall9933::  Richard M. Stallman and Roland McGrath (1993).
    _G_N_U _M_a_k_e_: _A _P_r_o_g_r_a_m _f_o_r _D_i_r_e_c_t_i_n_g _R_e_c_o_m_p_i_l_a_t_i_o_n.  Free Soft-
    ware Foundation, Inc.

         ttaallbb9911::  Steve  Talbott (1991).  _M_a_n_a_g_i_n_g _P_r_o_j_e_c_t_s _w_i_t_h
    _M_a_k_e_, _2_n_d _E_d.  O'Reilly & Associates, Inc.

    1111..  AAbboouutt tthhee AAuutthhoorr

    Peter Miller has worked for many years in the  software  R&D
    industry,  principally  on UNIX systems. In that time he has
    written tools such as Aegis (a software configuration
    management system) and Cook (yet another make-oid), both of
    which are freely available on the Internet.  Supporting  the
    use  of  these  tools  at  many  Internet sites provided the
    insights which led to this paper.

    Please visit  http://www.canb.auug.org.au/~millerp/  if  you
    would like to look at some of the author's free software.
