Introduction

Modus is a language for building Docker/OCI container images. Modus uses logic programming to express interactions among build parameters, specify complex build workflows, automatically parallelise and cache builds, help to reduce image size, and simplify maintenance. For more information, please follow the following links:

Modus uses semantic versioning; until version 1.0 is declared, breaking changes are possible. The current version, 0.1, is a preview release. We welcome bug reports and feature requests submitted through GitHub Issues.

Tutorial

Installation

Please follow the installation instructions on the website to download and install Modus.

Your First Modusfile

Modusfiles are our version of Dockerfiles. They are a collection of rules that specify how to build images. We now demonstrate how to use Modus to build a simple rust application.

my_app(profile) :-
  (
    from("rust:alpine")::set_workdir("/usr/src/app"),   # FROM rust:alpine; WORKDIR /usr/src/app
    copy(".", "."),                                     # COPY . .
    cargo_build(profile)                                # calling into another predicate
  )::set_entrypoint(f"./target/${profile}/my_app").     # ENTRYPOINT ["./target/release/my_app"]

cargo_build("debug") :- run("cargo build").             # RUN cargo build
cargo_build("release") :- run("cargo build --release"). # RUN cargo build --release

Assuming that you are familiar with Dockerfiles, the meaning of the above Modusfile should be mostly easy to guess. In particular:

  • The syntax is an extension of Datalog (which is itself a subset of Prolog), but you do not need to know either of those languages to write your own Modusfile.

  • Line-comments starts with #. In the above case, the equivalent instructions in Dockerfile have been written out for clarity using comments.

  • Modusfile consists of a series of rules of the form HEAD :- BODY., where HEAD is a single literal, and BODY is an expression involving other literals.

  • A literal has the form foo(arg1, arg2, ...) where foo is the name of the predicate, and arg1, arg2, etc. are arguments. Examples of literals in the above file are my_app("debug"), from("rust"). Literals can also have no parameters, in which case you omit the parenthesis, like my_app.

  • Expression uses , to denote logical "and", and ; to denote logical "or". Expressions can be nested with (), and can also have "::" operators that may change the behaviour of the modified expression in some way. A later section will discuss logic in Modus in more detail, but for now just think of a, b as "do a then b", and think of a; b as "do either of a or b, whichever works".

Note that instead of writing run("cargo build") directly in my_app, we used a custom rule cargo_build, which we defined later, and, when defining cargo_build, we have separate definition for when the argument is dev and when it is release. To make this clearer, consider the line

cargo_build("debug") :- run("cargo build").

What this means is that run("cargo build") logically implies cargo_build("debug"). Given this definition, whenever Modus sees cargo_build("debug"), Modus replaces it with run("cargo build").

The set_workdir operator takes in a path, and sets the working directory of its image operand. This changes subsequent resolution of relative paths, such as in the destination argument of copy. set_entrypoint simply overrides the entrypoint of an image.

To build a Modusfile, you just need to use the "modus build" command. The usage is fairly similar to docker build:

modus build [-f <Modusfile>] <CONTEXT> <GOAL>

CONTEXT is a directory containing any source file that you want to make available to Docker, just like the context directory in docker build. GOAL is a literal denoting what you want to build. You can use "-f <Modusfile>" to specify the Modusfile to build, and the default is Modusfile in the context directory.

In our case, we can use my_app("debug") as our goal in order to build a debug image. However, we can also specify unbounded variables in our goal. If we simply use my_app(X) as our goal, Modus will build two images in parallel for us, one being the debug image and the other being the release image. You can think of it as saying "For all X, as long as my_app(X) generates a valid image, build it". You can also go a step further and add parameters to select the rust channel, base distributions, etc. You can't specify a default for these parameters, but you can define versions of my_app that takes different numbers (including zero) of parameters, to simulate having a default. For example, by adding:

my_app :- my_app("release").

The goal my_app will now build the release version, while you can still use my_app("debug") to build the debug version.

The attentive reader will have noticed that our Modusfile builds both a debug and a release image. Consider how you would do this with Dockerfiles — you would either need two separate Dockerfiles, each building one version, or do something with build arguments. It may not be a problem if you only have debug and release images, but it quickly become hard to manage, especially if you need to take separate steps depending on some arguments.

Intermediate Build Stages

For our next step, we want to reduce the size of the final image by building the rust code in a separate stage, then starting a new image and copying the binary inside. This can be easily implemented in Modusfile as well. We will just need to add the following lines to our existing Modusfile, and use trimmed_app as our goal instead:

trimmed_app(profile) :-
  (
    from("alpine"),
    my_app(profile)::copy(f"target/${profile}/my_app", ".")
  )::set_entrypoint("./my_app").

image::copy(source, destination) is an operator that allows you to copy files from another image to the current one. The image here can actually be any expression generating an image, so you could also "inline" my_app and write something like:

trimmed_app(profile) :-
  (
    from("alpine"),
    (
      from("rust:alpine")::set_workdir("/usr/src/app"),
      copy(".", "."),
      cargo_build(profile)
    )::copy(f"target/${profile}/my_app", ".")
  )::set_entrypoint("./my_app").

Note that both source and destination can be relative paths. They will all be resolved sensibly based on respective the working directory.

Logic in Modus

Not all predicates has to be about building. Since Modus is based on a logic programming language, it goes without saying that you can write more complicated Modusfile, which can do things like figure out which version of compiler to use depeneding on constraints on parameters, or take additional steps for debug builds, etc. Here is a quick rundown of some Modus patterns:

  • Defining multiple rules for the same predicate with different parameters. We have already seen how this lets us "select" what cargo command to run.
  • Creating a dictionary by defining a set of constant rules for a predicate. For example:
target_cc_flags("debug", "-Og -fsanitize=address,undefined").
target_cc_flags("release", "-O3").
target_cc_flags("fuzz", "-Og -fsanitize=fuzzer,address,undefined -DFUZZING=1").

my_app(target) :-
  ...,
  target_cc_flags(target, flags),
  run(f"make CFLAGS='${flags}' CXXFLAGS='${flags}'"),
  ....

Note that rules with no body can be written as HEAD., and is always true.

  • Defining a predicate that "restricts" the set of input. This is necessary to make unbounded variables work. For example:
rust_channel(channel) :-
  channel = "stable"; channel = "nightly"; channel = "beta".

# or

rust_channel("stable").
rust_channel("nightly").
rust_channel("beta").

my_app(channel) :-
  from("rust:alpine"),
  rust_channel(channel),
  run(f"rustup run ${channel} cargo build").

Without rust_channel(channel), the goal my_app(X) would fail, because it is not possible to build an infinite set of images. With the predicate to limit the values of X, a query like my_app(X) will build 3 images, each with a different version of rust.

Where to go from here…

Now that you have learned the basics of Modus, you can go ahead and read the rest of the documentation, which dive deeper into how everything works exactly (groundness, predicate kinds, etc), as well as other built-in predicates and operators like number_{gt,lt,eq,geq,leq}, string parsing and manipulation, operators to set environment variables, temproarily changing the working directory with in_workdir, squashing image layers with ::merge, etc.

Grammar

A Modus build srcipt is a sequence of facts and rules. A fact is of the form

<fact> ::= <head> "."

A rule is of the form

<rule> ::= <head> ":-" <body> "."

The head of a fact or a rule is a <literal> of the form

<literal> ::= <identifier> "(" <arg_list> ")"
            | <identifier>

where <identifier> is a non-empty alpha-numeric string, not starting with a number, and possibly containing _; <arg_list> is a comma-separated list of terms.

A term is either a variable or a string literal. A variable is represented as an <identifier>; string literals are described in String Literals.

a("b", _c) and foo are examples of syntactically valid literals.

The body of a rule is an expression defined as follows:

<expression> ::= <literal>
               | <unification>
               | <expression> "::" <operator>
               | <expression> "," <expression>
               | <expression> ";" <expression>
               | "(" <expression> ")"
               | !<expression>

A unification asserts the equality of two terms (which could be variables):

<unification> ::= <term> "=" <term>
                | <term> "!=" <term>

Operators have the same grammar as literals.

The following are examples of syntactically valid expressions:

  • a("b", _c)
  • a = "foo"
  • a("b", _c)::b(_c)
  • a::b
  • a, b("c")
  • (a; b), c
  • !(a(X), b(X))
  • "1.0.1" = version

Full EBNF

<fact> ::= <head> "."
<rule> ::= <head> ":-" <body> "."
<head> ::= <literal>
<body> ::= <expression>
<literal> ::= <identifier> "(" <arg_list> ")"
            | <identifier>
<operator> ::= <literal>
<expression> ::= <literal>
               | <unification>
               | <expression> "::" <operator>
               | <expression> "," <expression>
               | <expression> ";" <expression>
               | "(" <expression> ")"
               | !<expression>
<unification> ::= <term> "=" <term>
                | <term> "!=" <term>

<arg_list> ::= <term> { "," <term> }
<term> ::= <variable> | '"' <char_string> '"' | <format_string_term>
<variable> ::= <identifier>
<format_string_term> ::= 'f"' { <char_string> | <interpolation> } '"'
<interpolation> ::= "${" <variable> "}"

<char_string> ::= { <char> }
<identifier> ::= ( <letter> | "_" ) [ <alphanumeric_with_underscore> ]
<alphanumeric_with_underscore> ::= { <letter> | <digit> | "_" }
<alphanumeric> ::= { <letter> | <digit> }
<letter> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G"
       | "H" | "I" | "J" | "K" | "L" | "M" | "N"
       | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
       | "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
       | "c" | "d" | "e" | "f" | "g" | "h" | "i"
       | "j" | "k" | "l" | "m" | "n" | "o" | "p"
       | "q" | "r" | "s" | "t" | "u" | "v" | "w"
       | "x" | "y" | "z"
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

String Literals

There exist two types of string literals:

  • Normal strings
  • Formatted strings

Normal strings are sequences of characters surrounded with double quotes, in which " needs to be escaped and \ is used as the escape characted. Specifically, the following characters are escaped: \", \\, \n, \r, \t, \\, \0. To break a single-line string to multiple lines, use a string continuation, a newline character preceded with \. Below are examples of valid string literals:

  • "abc"
  • "a\nb"
  • "\"abc\""
  • "hello,\
     world"
    

Formatted strings are just like normal strings, but they starts with the letter f before the first quote symbol, and $ can be escaped. Below are examples of valid string literals:

  • f"abc"
  • f"10\$"
  • f"hello, ${name}"

Semantics

Given a Modusfile and a goal, Modus performs two operations:

  1. computes the build DAG using a Datalog solver;
  2. executes the build in parallel using BuildKit.

The build DAG is a graph that describes dependencies between build operations. The DAG is computed by a Datalog solver as a minimal proof of the build target from true facts representing either existing images, or intrinsics like run which add build steps. The build DAG is constructed statically, i.e. the values of the variables do not depend on the results of build execution.

After the build DAG is computed, it is transformed into a BuildKit representation. Then, BuildKit executes build subtrees in parallel to construct the target images.

Predicate Kinds

In a literal such as a(b, "c"), we refer to a as the predicate name. There exist predicates of three and only three kinds:

  • image predicates;
  • layer predicates;
  • logic predicates.

An image predicate is one of the following:

  • builtin predicate from;
  • a predicate, in all definitions <head> :- <body> of which, the body is an image expression.

An image expression is defined as follows:

  • an image literal, which is an application of an image predicate, is an image expression;
  • an application of an operator to an image expression (<expression> "::" <operator>) is an image expression, except for the ::copy operator;
  • <expression1> "," <expression2> is an image expression iff
    • either <expression1> is an image expression, and <expression2> is not an image expression.
    • or <expression1> is a logic expression and <expression2> is an image expression.
  • <expression1> ";" <expression2> is an image expression iff both <expression1> and <expression2> are image expressions.

In the following example, a is an image predicate, and b is not:

a :-
    from("ubuntu"),
    run("apt-get update").
    
b(X) :- X = "a".

A layer predicate is one of the following:

  • the builtin predicate run;
  • the builtin predicate copy;
  • a predicate, in all definitions <head> :- <body> of which, the body is a layer expression.

A layer expression is defined as follows:

  • a layer literal, which is an application of a layer predicate, is a layer expression;
  • an application of an operator to a layer expression (<expression> "::" <operator>) is a layer expression;
  • <expression1> "," <expression2> is a layer expressions iff either:
    • both <expression1> and <expression2> are layer expressions.
    • or <expression1> is a logic expression and <expression2> is a layer expression.
  • <expression1> ";" <expression2> is a layer expression iff both <expression1> and <expression2> are layer expressions.
  • an application of the operator ::copy to an image literal is a layer expression;

In the following example, a is a layer predicate, and b is not:

a :- (from("ubuntu"), run("apt-get update"))::copy("/etc/hosts", ".").
    
b(X) :- X = "a".

A logic predicate is one of the following:

  • a buildin predicate, except for from, run and copy;
  • a predicate, in all definitions <head> :- <body> of which, the body is a logic expression.

A logic expression is defined as follows:

  • a logic literal, which is an application of a logic predicate, is a logic expression;
  • a unification is a logic expression;
  • an application of an operator to a logic expression (<literal> "::" <operator>) is a logic expression.
  • <expression1> "," <expression2> and <expression1> ";" <expression2> are logic expressions iff both <expression1> and <expression2> are logic expressions.
  • the negation of a logic expression

In the following example, a and a_prime are logic predicates, but b is not:

a(Y) :- semver_geq(Y, "1.2.3").
a_prime(Y) :- !a(Y).
    
b(X) :- X = "a", from("ubuntu").

Negation

Expressions can be negated, which acts as a logical check. Here are some examples of expressions using negation:

  • !(a(X), b)
  • !(foo(X) ; bar(Y) ; !baz(X)), lit(X) Note that X is constrained with all of foo(X), the nested negation baz(X) and the positive literal lit(X).

A negated expression asserts that the positive expression fails to be proved, i.e. we cannot construct a proof tree of it. The variables of a negated expression must be grounded (e.g. through some other literal) or must be an anonymous variable.

This is an example of a Modusfile using negation:

app(X) :-
    (
        windows(X),
        from("jturolla/failing-container")
    ;
        !windows(X),
        from(X),
        run("echo hello-world")
    ).
    
windows("...").

Usually, negation can be thought of as falsehood, however, there is an important distinction between proving falsehood and Negation as Failure (NAF). If the fact windows(c1). is not present (for some constant c1), we say !windows(c1) succeeds. So we are not truly inferring !windows(c1).

Predicates

Predicate are used to encapsulate build logic. Predicates in Modus belong to one of the three kinds: image predicates, layer predicates and logic predicates. Image predicates create new images, e.g. the predicate from, as well as any predicates that adds further layers upon a from. Layer predicates adds new layers on tops of an image, and can not itself be built, e.g. run or copy. Logic predicates does neither, can be used anywhere, but can not be built on its own. The kind of a predicate determines where it can be used. For example, a rule body can not contain more than one image predicates, nor can layer predicates appear before image predicates. The kind of a user-defined predicate is inferred automatically, while the kind of a builtin predicate is pre-defined.

Builtin predicates have Prolog-like signatures that specify which parameters have to be initialised:

predicate(?Variable1, +Variable2, -Variable3)

where

  • ? means: This variable can be either instantiated or not. Both ways are possible.
  • + means: This variable is an input to the predicate. As such it must be instantiated.
  • - means: This variable is an output to the predicate. It is usually non-instantiated, but may be if you want to check for a specific "return value".

For example, run has the signature run(+cmdline), which means that the argument has to be initialised.

PredicateKindDescription
fromImageRefer to existing local/registry image by its name
runLayerExecute shell command
copyLayerCopy local file/directory
number_eqLogic= for numbers
number_gtLogic> for numbers
number_ltLogic< for numbers
number_geqLogic>= for numbers
number_leqLogic<= for numbers
string_concatLogicConcatenate strings
string_lengthLogicCompute string length
semver_exactLogicEquality with compatibility check for SemVer versions
semver_gtLogic> for SemVer versions
semver_ltLogic< for SemVer versions
semver_geqLogic>= for SemVer versions
semver_leqLogic<= for SemVer versions

Image

  • from(+imageref): Refer to existing local/registry image by its name.

    imageref follows standard Docker image reference syntax, so it can contain a registry, a tag, or it can be a digest.

    This is equivalent to FROM in Dockerfile.

Layer

  • run(+command): Run a shell command.

    command should be a string containing a shell script. At runtime, the script will be executed with /bin/sh -c.

    This is equivalent to RUN in Dockerfile.

  • copy(+src, +dst): Copy local file/directory.

    Existing files already in the image will be overwritten in the new layer, and directories will be copied recursively. Any non-existant parent directories will be created.

    This is equivalent to COPY in Dockerfile without a --from option. Following Docker's behavior, if src is a directory, the content of src is copied into dst, but not src itself. This means that copy("dir", "/") basically means cp -r dir/* /.

    If dst is a relative path, it is resolved based on the current image's working directory. src must be a relative path, and must not be outside the build context.

Number

  • number_OP(+a, +b): compare numbers, resolves if a OP b is true. Valid OPs are:
    • eq: a = b
    • gt: a > b
    • lt: a < b
    • geq: a >= b
    • leq: a <= b

String

  • string_concat(?a, ?b, ?res): Concatenate two strings.

    At least two arguments must be grouneded. Resolves if a + b = res. If one of the argument is ungrounded, this predicate constrains it so that the statement is true, by either concating the two strings or removing a prefix or suffix from res.

    f-strings act as a syntatic sugar for this predicate: X = f"${a}(content of b)", X = f"(content of a)${b} or X = f"${a}${b}" is equivalent to string_concat(a, b, X).

    Due to the limitation of our logic resolution algorithm, this predicate can not be part of a recursion. For example, a(X) :- string_concat("a", Y, X), a(Y) is not allowed.

  • string_length(+str, -len): Get the length of a string.

SemVer

  • semver_OP(+a, +b): compare semver versions, resolves if a OP b is true. Valid OPs are the same as those for number_OP, except there is no semver_eq.

  • semver_exact(+a, +b):  =I.J.K means exactly the version I.J.K; =I.J means equivalent to >=I.J.0, <I.(J+1).0; =I means equivalent to >=I.0.0, <(I+1).0.0.

Operators

Operators begin with ::; they can be applied to expressions to control the build process.

OperatorFromToDescription
::copyImageLayerCopy file/directory to current image
::set_envImageImageSet environment variable
::set_entrypointImageImageSet entrypoint
::set_cmdImageImageSet image "CMD"
::set_workdirImageImageSet current working directory
::set_userImageImageSet user
::append_pathImageImageAppend PATH variable
::in_workdirLayerLayerSpecify working directory for layer commands
::in_envLayerLayerSpecify environment variables for layer commands
::mergeLayerLayerMerge layers

Image

  • ::copy(+src, +dst): Copy file/directory from another image.

    Existing files already in the current image will be overwritten in the new layer, and directories will be copied recursively. Any non-existant parent directories will be created.

    This is equivalent to COPY --from=src_image in Dockerfile. Following Docker's behavior, if src is a directory, the content of src is copied into dst, but not src itself. This means that copy("dir", "/") is equivalent to cp -r dir/* /, except that hidden files are also copied.

    If src is a relative path, it is resolved based on the source image's working directory. If dst is a relative path, it is resolved based on the current image's working directory.

  • ::append_path(+path): append string to the PATH environment variable.

  • ::set_env(+key, +value), ::set_entrypoint(+str_or_array), ::set_cmd(+array), ::set_workdir(+dir), ::set_user(+username), ::set_label(+key, +value): Set image properties.

    ::set_workdir also allows specifying a relative path based on the input image's working directory. This will be resolved to an absolute path.

    ::set_entrypoint will clear any command the image has, to be consistent with the ENTRYPOINT command in Dockerfile.

    Example:

    app :-
      (
        from("alpine"),
        copy("./app", "/"),
        copy("./entrypoint.sh", "/")
      )::set_entrypoint("/entrypoint.sh")
       ::set_cmd(["start"])
       ::set_label("org.opencontainers.image.vendor", "Modus").
    

Layer

  • ::in_workdir(+dir): Temporary change the image's working directory, affecting any operations inside the operator.

  • ::in_env(+key, +value): Temporary set environment variable, affecting any operations inside the operator.

  • ::merge: Squash anything inside into one image layer.

    The only allowed operations inside are run, copy and ::copy. This can improve image size if operations overwrite each other.

Foundations

Container images are built on top of base images by copying information from other images and intacting with the environment. Modus uses decidable logic programming to concisely capture and efficiently resolve dependencies between images and to encapluate environmental interactions with predicates and operators. This section discusses the logic programming foundations of Modus. You do not need to understand these foundations to use, and profit from, Modus.

Design Principles

  • Maintainability: make build definitions maintainable by enabling modularity, code reuse and dependency resolution.
  • Efficiency: make builds efficient by automatic parallelisation and fine-grained caching; provide tools for optimising the image size.
  • Expressiveness: enable the user to express complex build workflows.
  • Simplicity: provide minimalistic syntax and well-defined, non-Turing-complete semantics for build definitions.

Container Image Build Model

A Docker/OSI container image consists of a set of layers combined using a union mount filesystem. To build an image, the user specifies the parent image and defines operations that are executed on top of the parent image to form new layers. The most common operations are copying local files into the container and executing shell commands.

Another important operation which enables multi-stage builds is copying files from another image. A multi-stage build can be visualised as the following graph. Each instruction in this graph creates a new filesystem layer. Instructions are specified using Dockerfile's notation: FROM defines the parent image, COPY copies local files, RUN executes shell command, and COPY --from copies files from another container.

Container Image Build Model

The key insight of Modus is that the resolution of dependencies between images in this build model maps to Horn clauses, logical formulas in the form \( u \leftarrow (p \wedge q\ \wedge ... \wedge\ t) \). Particularly, container images correspond to logical facts, build rules are logical rules that derive new facts from existing facts, and the build graph is a minimal proof of the fact representing the build target from the facts representing existing images.

Consider the following recursive build script:

a(mode) :-
    (
        mode = "production",
        from("alpine"),
        a("development")::copy("/app", "/app")
    ;
        mode = "development",
        from("gcc"),
        copy(".", "/app"),
        run("cd /app && make")
    )::set_workdir("/app").

For the goal a(X), where X is a variable, Modus computes the following build trees:

a("production")
╞══ from("alpine")
└── a("development")::copy("/app", "/app")
    ╞══ from("gcc")
    ├── copy(".", "/app")
    └── run("cd /app && make")
a("development")
╞══ from("gcc")
├── copy(".", "/app")
└── run("cd /app && make")

The subtree a("development") of the two trees is shared; it is only built once.

Datalog

Datalog is a decidable fragment of Horn clauses. A good overview of Datalog is given in the following article:

What You Always Wanted to Know About Datalog (And Never Dared to Ask)
Stefano Ceri, Georg Gottlob, Letizia Tanca
IEEE TKDE 1989

Modus uses Datalog because it is:

  • declarative, the success of solving does not depend on the ordering of clauses;
  • expressive;
  • decidable, so it can always generate a minimal proof.

Thus, Datalog presents a sweet spot between expressiveness and computatibility, which is important for a build system. Standard Datalog, despite its suitability for addressing their core dependency resolution problem, does not capture a container build environmental dependencies. To solve this problem, Modus extends Datalog with build-specific predicates and build parameters via ungrounded variables. Modus supports two Datalog extensions, namely builtin predicates described in Predicates and non-grounded variables. To realise these extensions, Modus uses a custom top-down Datalog solver for generating proofs.

Builtin Predicates

Dependency resolution is only part of the story. Container builds must also interact with the environment. Predicates elegantly capture these interactions. We have equipped Modus with a library of built-in predicates to handle these interactions. For example, the semver_* predicates define software version comparison as per the SemVer specification. For example, the following fact is true: semver_lt("1.0.3","1.1.0"), where lt means <.

The key difficulty of adding built-in predicates to Datalog is that built-in predicates such as semver_lt are infinite relations, which require special handling to retain Datalog's decidability. We support built-in predicates by deferring the evaluation of a built-in predicate until the arguments of this predicate are bound to constants.

Since not all arguments of a predicate have to be bound to a constant during evaluation, e.g. if an argument depends on the other arguments, builtin predicates in Modus have Prolog-like signatures that specify which parameters have to be initialised (see Predicates).

Non-Grounded Variables

Build systems often use parameters that are passed by the user when they launch a build. Dockerfiles allow users to specify arbitrary parameters when launching a target's build using the --build-arg option. For example, a user may want to parameretise app's image build with the parameter cflags to control complation flags:

app(cflags) :-
  from("gcc:latest"),
  copy(".", "."),
  run(f"gcc ${cflags} test.c -o test").

In this rule, the variable cflags is not grounded, as it only appears as an argument of a built-in predicate run that expects an instantiated variable. Thus, this is an invalid Datalog program, since it violates standard Datalog's safety conditions that do not permit non-grounded variables. A workaround of this problem is to define possible compilation flags using a dedicated predicate:

supported_flags("-g").
supported_flags("").

a(cflags) :-
    supported_flags(cflags),
    from("gcc:latest"),
    copy(".", "."),
    run(f"gcc ${cflags} test.c -o test").

However, GCC accepts a large number of flags, so it would impractical to list all accepted flags in the Modus program. Instead, it would be natural to allow users to use this program to build the goal app("-g"), since the argument of run can then be inferred from the goal.

To enable such usage scenarios, we relaxed the safety conditions by allowing user-defined predicates with non-grounded variables. At the same time, we introduced the new restriction that, during evaluation, we defer the evaluation of these predicates until all of their arguments are bound to constants. Doing so enabled us to support the usage scenario above without sacrificing Datalog's safety.

For user-defined predicates, variables which do not appear in the body have to be initialised before the rule can be applied. For example, for the script:

a(X) :-
  from("alpine"),
  run("echo Hello").

A goal like a("foo"), where "foo" is a arbitrary string constant, would build the same image, but the query a(X) is not allowed. This ensures that each resulting image is mapped to a constant literal like a("foo").

Proof Optimality

A Datalog fact can be inferred from other facts in multiple ways and therefore multiple proof trees may exists. Thus, an image defined in Modus may also be built using different build DAGs. Modus searches for an optimal proof, that is a proof with a minimal cost. The cost of a proof is the number of layers in the build DAG.

To illustrate proof optimality, consider again this build script:

a(mode) :-
    (
        mode = "production",
        from("alpine"),
        a("development")::copy("/app", "/app")
    ;
        mode = "development",
        from("gcc"),
        copy(".", "/app"),
        run("cd /app && make")
    )::set_workdir("/app").

For the goal a("production"), Modus will compute the following build trees:

a("production")
╞══ from("alpine")
└── a("development")::copy("/app", "/app")
    ╞══ from("gcc")
    ├── copy(".", "/app")
    └── run("cd /app && make")

However, if we add a new rule that uses a cached development image from a registry

a("development") :- from("myregistry.domain.com/app:1.1-dev").

then the result of the goal a("production") will become

a("production")
╞══ from("alpine")
└── a("development")::copy("/app", "/app")
    ╘══ from("myregistry.domain.com/app:1.1-dev")

because this tree involves fewer layer operations than the original one.

There may be situations when several minimal proofs of the same cost exist. In this case, Modus chooses one non-deterministically. Specifically, for a given program and a goal, Modus will always choose the same proof, but reordering rules may cause Modus to generate a different proof. To avoid non-determinism, one should avoid rules with uncontrolled choice:

a :- b; c.

Instead, this rule can be re-written with an auxiliary parameter to control the choice:

a("left") :- b.
a("right") :- c.

Modus produces a warning when the build graph construction is non-deterministic.

Command Line Tool

The following screencast demonstrates a simple but complete workflow of proving, building, tagging and running images. asciicast

modus build

Much like docker build, the modus build command builds a Modusfile.

Syntax: modus build [options] <context> <goal>

context is a directory containing all files (including the Modusfile) to send to Docker for building. .dockerignore can be used to exclude files in the same way as docker build.

goal is the target literal to build.

Options:

-f <modusfile>: Allows building something other than ./Modusfile.

--json or --json=<file>: Outputs the build results as JSON.

In the first case, the output are written to stdout. In the second case, the output are written to the file specified.

An example JSON output looks like this:

[
  {
    "predicate": "all",
    "args": [
      "frontend",
      "release"
    ],
    "digest": "sha256:595a66d80d1ef681f821e0ae92fd785b437ad2fbd5c564da093f588e37dfd65f"
  }
]

There will be one entry for every final image built, with the applied arguments for the predicate stored in the args array. This could be useful for scripting in CI routines. For example, the following command runs modus build, then tags the produced images with modus/ + the valuation of X:

modus build . 'all(X, "release")' --json | jq -r '.[] | .digest, ("modus/" + .args[0])' | xargs -L 2 docker tag

--no-cache: Ignore all existing build cache. This effectively forces a complete rebuild of the image. Existing base images used in from clauses are not re-downloaded. To update those, use docker pull.

-v or --verbose: Print all the build output. This is implemented by passing --progress=plain to docker build, which turns off the interactive progress report and instead print all the stages with numbering and their output.

Developer options: useful for debugging Modus itself.

--docker-flags=<flags>: Pass additional raw flags to docker build.

--custom-buildkit-frontend=<frontend>: Use a custom buildkit frontend. The default is ghcr.io/modus-continens/modus-buildkit-frontend: + commit hash of the Modus program (embedded in binary at build time).

--image-export-concurrency <integer>, --image-resolve-concurrency <number>: Controls the concurrency used during image export and image resolution respectively.

Image resolution is done at the beginning of the build, and may involves fetching metadata (and downloading the image if necessary) from the repository. Image export is the very last step of any build process, and is usually disk IO-bound, although delay may be introduced by docker build.

Defaults: --image-export-concurrency=8, --image-resolve-concurrency=3.

modus proof

modus proof prints out proof trees based on some Modus facts/rules, and a provided goal.

Syntax: modus proof [options] <context> <goal>

  • context is a directory that should contain the Modusfile that contains the facts/rules. This is chosen to match the interface of modus build.
  • goal specifies the goal to prove.

Options:

  • --compact: Changes the output of proof trees, omitting logical rule resolution.
  • -e, --explain: Prints out a structured 'explanation' of the steps taken to prove <goal> during SLD resolution. (See -g for a graphical version of this.) This may be verbose.
  • -f <modusfile>: Use the facts and rules of this Modusfile, instead of <context>/Modusfile.
  • -g, --graph: Outputs DOT source for a graph of the SLD tree. This can given to dot -Tpng to produce a PNG of the graph. Recommended over -e for larger Modusfiles.