Automated Builds Using Make

Introduction

Many languages must be compiled before programs written in them can be run, and all too often the command to do that looks something like:

gcc -c -Wall -ansi -I/pkg/chempak/include dat2csv.c

Having to remember that, let alone type it for every file in your program, is both tedious and error prone. And we haven't even begun to get into file and module dependencies.

The best way to tackle both tedium and errors is to use an automated build tool. This lesson will teach you how to automate your builds using Make. Make is a free build tool that is commonly associated with C and Unix, but it is available on all major platforms and can do much more than just compile and link C, as you will see throughout this lesson.

Lesson Example

As a running example for this lesson, we will be following Nigel, who is studying organic fullerene production.
  • Automated laboratory equipment runs experiments in batches to create files like this:
Time: 1.2271
Concentration: 0.0050
Yield: 11.41
Time: 2.5094
Concentration: 0.0055
Yield: 11.20

Time: 3.7440
Concentration: 0.0060
Yield: 10.90
  • Currently there are only 20 - 30 of those .dat files being produced, but there may one day be many thousands
  • Nigel needs to transform his .dat files to a CSV (Comma Separated Values) format using a tool called dat2csv

Hello, Make

The Make program gets all of its information from a Makefile. Here are the contents of Nigel's first Makefile, hello.mk.
hydroxyl_422.csv : hydroxyl_422.dat
	dat2csv hydroxyl_422.dat > hydroxyl_422.csv

Note: The second line is indented with a tab, not 8 spaces. It's not pretty, and it is rather hard to debug if you forget, but that's just the way it is.
Nigel next runs make -f hello.mk to run his build.
  • Make sees that the csv file is dependent on the dat file
  • Since the csv does not exist, Make executes the dat2csv program with the given arguments
If Nigel runs it again, Make will notice that the csv is newer than the dat and will skip the file creation. This is an example of how Make can save you time by only dealing with the things that it absolutely must.

Currently, Nigel's Makefile consists of a single rule. This rule, like every other Make rule, has 3 distinct components:

  • A target
  • Its prerequisites
  • An action -- this is what Make will do on your behalf in the rule
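
As a sketch of how these three parts map onto the text of a rule, here is Nigel's rule from hello.mk again with comments added. Lines starting with # are comments and are ignored by Make; the action line must begin with a tab.

```makefile
# Anatomy of a rule (Nigel's rule from hello.mk, annotated):
#
#   target : prerequisite
#   <tab>action
#
# The action runs when the target is missing or older than the prerequisite.
hydroxyl_422.csv : hydroxyl_422.dat
	dat2csv hydroxyl_422.dat > hydroxyl_422.csv
```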

Targets

A Makefile with only a single rule is likely overkill in most cases, and most Makefiles have more. Nigel's now does too, since he has added a new rule for converting the methyl files.
hydroxyl_422.csv : hydroxyl_422.dat
	dat2csv hydroxyl_422.dat > hydroxyl_422.csv

methyl_422.csv : methyl_422.dat
	dat2csv methyl_422.dat > methyl_422.csv
But when he runs make -f hello.mk, only the hydroxyl rule is executed. That is because the first rule in a Makefile is the one that is executed by default. In order to convert the methyl file he needs to name the target explicitly: make -f hello.mk methyl_422.csv.

Having to specify each rule by hand does not seem very automated, though. To really start to leverage the power of Make, Nigel needs a phony target. A phony target has prerequisites, but the target does not refer to an actual file on disk. Since that file never exists, Make can never consider the target up to date, so asking for it always makes Make check each of its prerequisites and rebuild any that are out of date.

Since Nigel wants to be able to convert 'all' his files at once, he creates the 'all' phony target.
all : hydroxyl_422.csv methyl_422.csv

hydroxyl_422.csv : hydroxyl_422.dat
	dat2csv hydroxyl_422.dat > hydroxyl_422.csv

methyl_422.csv : methyl_422.dat
	dat2csv methyl_422.dat > methyl_422.csv
make -f hello.mk all will now run both the hydroxyl_422.csv and methyl_422.csv rules.

Other than all, common phony targets are:
  • clean - removes all files produced by the build, both final and temporary
  • configure - do any necessary environment setup such as creating output directories and setting variables
  • install - copy produced artifacts to another location

Dependencies

As your Makefile grows in complexity, it becomes more important to understand how Make determines the dependencies within the file. In Nigel's dependency graph, the all target depends on hydroxyl_422.csv and methyl_422.csv, and each of those depends on (and must be newer than) its corresponding dat file.

Make starts at the top of the graph and figures out both the direct and indirect dependencies. Once it has these figured out, it executes them from the bottom up. Beyond the need to satisfy each target's prerequisites, the order in which rules are executed is not determined until runtime. Since the rules in Nigel's Makefile are completely independent, Make might run hydroxyl_422.csv or methyl_422.csv first, with exactly the same outcome as far as meeting all's dependencies.
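
If one output really does need to be built before another, the way to say so is with a prerequisite rather than by relying on the order of the rules in the file. The sketch below uses hypothetical targets (step_one.txt and step_two.txt) with echo and touch standing in for real tools.

```makefile
all : step_two.txt

# step_two.txt lists step_one.txt as a prerequisite, so Make always
# brings step_one.txt up to date before running this action.
step_two.txt : step_one.txt
	@echo "building step_two.txt"
	@touch step_two.txt

# Written last in the file, but built first because of the dependency.
step_one.txt :
	@echo "building step_one.txt"
	@touch step_one.txt
```

Running make the first time should build step_one.txt and then step_two.txt; running it again straight away should do no work, since both files are up to date.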

Automatic Variables

Make provides 2 types of variables to the user: automatic variables (which we discuss now) and macros (which are different enough that they get their own section later).

Automatic variables come built into Make and have local scope to each rule. That is, a fresh value is determined in each rule and the same variable will have different values depending on where in the execution it is.

The automatic variables available to you in Make are:
  • $@ - The rule's target
  • $< - The rule's first prerequisite
  • $? - All the out of date prerequisites
  • $^ - All prerequisites
Make's Unix origin shows in these variables: like shell variables, they are accessed with a leading $, and their terse names favour quick editing over clarity. Making use of automatic variables, and adding a clean phony target to remove all the converted csv files, Nigel now has the Makefile below. The @ at the start of each action tells Make not to echo the command before running it.
all : hydroxyl_422.csv methyl_422.csv

hydroxyl_422.csv : hydroxyl_422.dat
	@dat2csv $< > $@

methyl_422.csv : methyl_422.dat
	@dat2csv $< > $@

clean :
	@rm -f *.csv
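
To see what each automatic variable expands to, a throwaway rule that only echoes them can help. This sketch assumes three hypothetical, already existing files a.dat, b.dat and c.dat; report.txt is likewise a made-up target.

```makefile
report.txt : a.dat b.dat c.dat
	@echo "target:                $@"
	@echo "first prerequisite:    $<"
	@echo "all prerequisites:     $^"
	@echo "newer than the target: $?"
	@touch $@
```

On the first run, when report.txt does not exist yet, the last line should list all three dat files. After touch b.dat, a second run should list only b.dat there, while the other three lines stay the same.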

While he is at it, he renames his file from hello.mk to Makefile, which is the default name make looks for. That means make all is all he now has to type to convert his files. Hurray for less typing!

Pattern Matching

Nigel's Makefile is getting increasingly powerful, and increasingly complex, but it lets him convert his files by typing just two words in the console. But what if he added another compound to his experiment, hydroxyl_480.csv for example? He would have to edit the Makefile to create a new rule.

He can, however, avoid this by using Make's pattern matching capabilities. Because most projects manipulate similar types of files in a similar manner, Make has a single wildcard, %, which matches the stem portion of a file name.
all : hydroxyl_422.csv methyl_422.csv hydroxyl_480.csv

%.csv : %.dat
	dat2csv $< > $@

clean :
	rm -f *.csv
Here we see a single rule whose target matches any csv file, and which makes each csv depend on the dat file with the same stem.

Some things to note here are:
  • We have sacrificed another bit of clarity (individual rule) for power (a more powerful single rule)
  • Nigel still has to add new csv files to the all rule, but that is easier than writing a whole new rule
  • When using pattern matching, you must use automatic variables in the action. This is because Make does not know the actual name of either the target or prerequisite files until runtime.
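
The text matched by % is called the stem, and Make exposes it inside the action as one more automatic variable, $*. Nigel's conversion does not need it, but echoing it shows how the pattern was matched. A minimal sketch (the echo line is an addition for illustration only):

```makefile
all : hydroxyl_422.csv methyl_422.csv hydroxyl_480.csv

# For hydroxyl_422.csv the stem is hydroxyl_422, so the prerequisite
# becomes hydroxyl_422.dat.
%.csv : %.dat
	@echo "stem: $*  prerequisite: $<  target: $@"
	dat2csv $< > $@
```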

More Dependencies

Now that Nigel can easily convert his results into csv format, he wants to take those individual files and summarize his data. To do that, he will use a program he has developed called summarize.
all : hydroxyl_all.csv methyl_all.csv

%_all.csv : %_422.csv %_480.csv
	summarize $^ > $@

%.csv : %.dat
	dat2csv $< > $@

clean :
	@rm -f *.csv
Now when Nigel does a make all,
  • the %_all.csv rule is checked for any dependencies
  • the %.csv rule is triggered to run the dat2csv program and create new csv files
  • the %_all.csv action runs the summarize command, producing both hydroxyl_all.csv and methyl_all.csv
  • Make then deletes the intermediate csv files it created while producing the ones asked for (for example hydroxyl_422.csv, methyl_422.csv and hydroxyl_480.csv)
If Nigel ran make clean, it would delete every csv file left in the directory, summaries included, since the intermediate files have already been removed.
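
Deleting intermediate files is something GNU Make does for files it creates through a chain of pattern rules. If Nigel wanted to keep the per-run csv files, GNU Make offers the special target .SECONDARY. The lesson does not use this option; it is sketched here under the assumption that GNU Make is the implementation in use.

```makefile
# With no prerequisites, .SECONDARY tells GNU Make to treat every target
# as secondary, so no file is removed just because it is intermediate.
.SECONDARY :

all : hydroxyl_all.csv methyl_all.csv

%_all.csv : %_422.csv %_480.csv
	summarize $^ > $@

%.csv : %.dat
	dat2csv $< > $@

clean :
	@rm -f *.csv
```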

Macros

A lesson hard learned by lots of people is that something repeated in two or more places will eventually be wrong in at least one. To avoid this problem you set this sort of information in a macro. A macro is the other type of variable in Make and unlike automatic variables, macros are user defined.
INPUT_DIR = /lab/gamma2100
OUTPUT_DIR = /tmp

all : ${OUTPUT_DIR}/hydroxyl_all.csv ${OUTPUT_DIR}/methyl_all.csv

${OUTPUT_DIR}/%_all.csv : ${OUTPUT_DIR}/%_422.csv ${OUTPUT_DIR}/%_480.csv
	@summarize $^ > $@

${OUTPUT_DIR}/%.csv : ${INPUT_DIR}/%.dat
	@dat2csv $< > $@

# The csv files are now written to OUTPUT_DIR, so clean must look there.
clean :
	@rm -f ${OUTPUT_DIR}/*.csv
Now if Nigel changes where files are either read from or written to, he just needs to change a single line at the top of the file and the change propagates automatically throughout.

Like a lot of things in Make, the syntax betrays Make's Unix heritage, and there are some things you have to remember:
  • To set a value, you assign it with an =
  • To access a value, you reference the variable with a $ and put {} around its name. If you forget the {}, you will be accessing $O followed by the characters UTPUT_DIR instead of ${OUTPUT_DIR}
  • By convention, macro names are in all uppercase

Passing Values to Make

It was a good thing that Nigel added the INPUT_DIR macro to his Makefile as he has been told his lab will be moving and during the transition he wants to be able to run his experiments. He could create a duplicate copy of his Makefile with the new location, or he could pass in the new location as an argument to make and once the transition is over update the Makefile with the new location. Not wanting to have multiple files that do the same thing with only minor variation around, he (wisely) chooses the latter.

To pass a value into make, you add name=value pairs when calling make. The passed in value will override any default value set in the Makefile.

In order to use Nigel's new lab location, he would run his experiment with make INPUT_DIR=/newlab all. Alternatively, he could set INPUT_DIR as a shell environment variable. Environment variables are always available inside Makefiles and are referenced in the same way other macros are. Note, however, that an assignment inside the Makefile takes precedence over an environment variable of the same name unless make is run with the -e option.
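
For a macro whose default should be replaceable from the environment as well as from the command line, GNU Make has a conditional assignment operator, ?=, which assigns only when the macro is not already defined. A minimal sketch, assuming GNU Make, with show as a hypothetical helper target:

```makefile
# Use /lab/gamma2100 unless INPUT_DIR is already set, for example in the
# environment or on the make command line.
INPUT_DIR ?= /lab/gamma2100

show :
	@echo "reading .dat files from ${INPUT_DIR}"
```

Running make show uses the default, while INPUT_DIR=/newlab make show and make INPUT_DIR=/newlab show should both report /newlab.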

Functions

GNU Make includes a number of functions to automate common tasks. While these are not standard, GNU Make is by far the most widely used version of Make and will likely be available more often than not.

Back when Nigel added hydroxyl_480.dat to the files he wished to convert (see Pattern Matching), he still had to add it to the all target. By making use of functions, he can build the list of targets from a short list of compound names, so adding another compound means adding one word to a macro rather than editing the rules.
INPUT_DIR = /lab/gamma2100
OUTPUT_DIR = /tmp
CHEMICALS = hydroxyl methyl
SUMMARIES = $(addprefix ${OUTPUT_DIR}/,$(addsuffix _all.csv,${CHEMICALS}))

all : ${SUMMARIES}

${OUTPUT_DIR}/%_all.csv : ${OUTPUT_DIR}/%_422.csv ${OUTPUT_DIR}/%_480.csv
	@summarize $^ > $@

${OUTPUT_DIR}/%.csv : ${INPUT_DIR}/%.dat
	@dat2csv $< > $@

# The csv files are written to OUTPUT_DIR, so clean must look there.
clean :
	@rm -f ${OUTPUT_DIR}/*.csv
The SUMMARIES macro uses the addprefix and addsuffix functions to build a list of filenames, which is used by the all target. In this case, hydroxyl becomes /tmp/hydroxyl_all.csv and methyl becomes /tmp/methyl_all.csv.

A complete list of functions can be found in the GNU Make Documentation, but here are the most commonly used ones:
  • $(addprefix prefix,filenames) - Add a prefix to each filename in a list
  • $(addsuffix suffix,filenames) - Add a suffix to each filename in a list
  • $(dir filenames) - Extract the directory name portion of each filename in a list
  • $(filter pattern,text) - Keep words in text that match pattern
  • $(filter-out pattern,text) - Keep words in text that don't match pattern
  • $(patsubst pattern,replacement,text) - Replace each word in text that matches pattern with replacement
  • $(sort text) - Sort the words in text, removing duplicates
  • $(strip text) - Remove leading and trailing whitespace from text
  • $(subst from,to,text) - Replace from with to in text
  • $(wildcard pattern) - Create a list of filenames that match a pattern
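
As an illustration of combining these functions, the sketch below builds the list of csv files from whatever .dat files are present in INPUT_DIR, so a new data file is picked up without touching the Makefile. It is a variation on Nigel's Makefile, not part of it: it converts every .dat file and leaves out the summarize step, and DAT_FILES and CSV_FILES are names chosen for this example.

```makefile
INPUT_DIR = /lab/gamma2100
OUTPUT_DIR = /tmp

# Every .dat file currently in the input directory.
DAT_FILES = $(wildcard ${INPUT_DIR}/*.dat)
# The same names, moved to the output directory with a .csv extension.
CSV_FILES = $(patsubst ${INPUT_DIR}/%.dat,${OUTPUT_DIR}/%.csv,${DAT_FILES})

all : ${CSV_FILES}

${OUTPUT_DIR}/%.csv : ${INPUT_DIR}/%.dat
	@dat2csv $< > $@
```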

Summary

Make is an extremely powerful tool, but it has grown more and more complex since its invention in 1976. Despite this, it is the core of many projects' build systems, though there are newer alternatives out there. In the Java community, Ant, not Make, is the de facto standard build system, and even that is beginning to be replaced by Maven. The .NET world tends to use a port of Ant to its platform called NAnt, and Ruby has Rake.

Regardless of the system, though, the principles remain the same:
  • Automate repetitive tasks
  • Remove duplication within your Makefile, and
  • Build your Makefile incrementally from least to greatest complexity
