Continuous Integration¶

GitLab CI¶

GitLab provides a convenient framework for running commands in response to Git pushes. We use it to test merge requests (MRs) before merging them (pre-merge testing), as well as post-merge testing, for everything that hits main (this is necessary because we still allow commits to be pushed outside of MRs, and even then the MR CI runs in the forked repository, which might have been modified and thus is unreliable).

The CI runs a number of tests, from trivial build-testing to complex GPU rendering:

Build testing for a number of configurations and platforms
Sanity checks (meson test)
Most drivers are also tested using several test suites, such as the Vulkan/GL/GLES conformance test suite, Piglit, and others.
Replay of application traces

A typical run takes between 20 and 30 minutes, although it can go up very quickly if the GitLab runners are overwhelmed, which happens sometimes. When it does happen, not much can be done besides waiting it out, or cancel it.

It is a good practice to check the Marge queue to evaluate if it is the right moment to trigger some testing jobs. Marge is configured to pick the MRs by assignment time, and this sort option is not available in the Web UI. The marge_queue CLI tool provides the list sorted by assignment time. The recommended way to manage the trigger of those jobs is using our crnm cli tool.

Due to limited resources, we currently do not run the CI automatically on every push; instead, we only run it automatically once the MR has been assigned to Marge, our merge bot.

If you’re interested in the details, the main configuration file is .gitlab-ci.yml, and it references a number of other files in .gitlab-ci/.

If the GitLab CI doesn’t seem to be running on your fork (or MRs, as they run in the context of your fork), you should check the “Settings” of your fork. Under “CI / CD” → “General pipelines”, make sure “Custom CI config path” is empty (or set to the default .gitlab-ci.yml), and that the “Project-based pipeline visibility” box is checked.

If a specific CI farm is failing for reasons unrelated to your changes, make an MR to disable the farm following the farm management instructions.

If you’re having other issues with the GitLab CI, your best bet is to ask about it on #freedesktop on OFTC and tag Daniel Stone (daniels on IRC) or Eric Engestrom (eric_engestrom on IRC).

The three GitLab CI systems currently integrated are:

Farm management¶

Note

Never mix disabling/re-enabling a farm with any change that can affect a job that runs in another farm!

When the farm starts failing for any reason (power, network, out-of-space), it needs to be disabled by pushing separate MR with

git mv .ci-farms{,-disabled}/$farm_name

Find the GitLab handle of the farm’s admin in .gitlab-ci/farm-rules.yml and ping them on the MR. MRs to disable farms do not need to go through review, and can be assigned to Marge directly.

After farm restore functionality can be enabled by pushing a new merge request, which contains

git mv .ci-farms{-disabled,}/$farm_name

Warning

Pushing (git push) directly to main is forbidden; this change must be sent as a Merge Request.

Application traces replay¶

The CI replays application traces with various drivers in two different jobs. The first job replays traces listed in src/<driver>/ci/traces-<driver>.yml files and if any of those traces fail the pipeline fails as well. The second job replays traces listed in src/<driver>/ci/restricted-traces-<driver>.yml and it is allowed to fail. This second job is only created when the pipeline is triggered by marge-bot or any other user that has been granted access to these traces.

A traces YAML file also includes a download-url pointing to a MinIO instance where to download the traces from. While the first job should always work with publicly accessible traces, the second job could point to an URL with restricted access.

Restricted traces are those that have been made available to Mesa developers without a license to redistribute at will, and thus should not be exposed to the public. Failing to access that URL would not prevent the pipeline to pass, therefore forks made by contributors without permissions to download non-redistributable traces can be merged without friction.

As an aside, only maintainers of such non-redistributable traces are responsible for ensuring that replays are successful, since other contributors would not be able to download and test them by themselves.

Those Mesa contributors that believe they could have permission to access such non-redistributable traces can request permission to Daniel Stone <daniels@collabora.com>.

gitlab.freedesktop.org accounts that are to be granted access to these traces will be added to the OPA policy for the MinIO repository as per https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/commit/a3cd632743019f68ac8a829267deb262d9670958 .

So the jobs are created in personal repositories, the name of the user’s account needs to be added to the rules attribute of the GitLab CI job that accesses the restricted accounts.

Running traces on a local machine

Intel CI¶

The Intel CI is not yet integrated into the GitLab CI. For now, special access must be manually given (file a issue in the Intel CI configuration repo if you think you or Mesa would benefit from you having access to the Intel CI). Results can be seen on mesa-ci.01.org if you are not an Intel employee, but if you are you can access a better interface on mesa-ci-results.jf.intel.com.

The Intel CI runs a much larger array of tests, on a number of generations of Intel hardware and on multiple platforms (X11, Wayland, DRM & Android), with the purpose of detecting regressions. Tests include Crucible, VK-GL-CTS, dEQP, Piglit, Skia, VkRunner, WebGL, and a few other tools. A typical run takes between 30 minutes and an hour.

If you’re having issues with the Intel CI, your best bet is to ask about it on #dri-devel on OFTC and tag Nico Cortes (ngcortes on IRC).

CI job user expectations¶

To make sure that testing of one vendor’s drivers doesn’t block unrelated work by other vendors, we require that a given driver’s test farm produces a spurious failure no more than once a week. If every driver had CI and failed once a week, we would be seeing someone’s code getting blocked on a spurious failure daily, which is an unacceptable cost to the project.

To ensure that, driver maintainers with CI enabled should watch the Flakes panel of the CI flakes dashboard, particularly the “Flake jobs” pane, to inspect jobs in their driver where the automatic retry of a failing job produced a success a second time. Additionally, most CI reports test-level flakes to an IRC channel, and flakes reported as NEW are not expected and could cause spurious failures in jobs. Please track the NEW reports in jobs and add them as appropriate to the -flakes.txt file for your driver.

Additionally, the test farm needs to be able to provide a short enough turnaround time that we can get our MRs through marge-bot without the pipeline backing up. As a result, we require that the test farm be able to handle a whole pipeline’s worth of jobs in less than 15 minutes (to compare, the build stage is about 10 minutes). Given boot times and intermittent network delays, this generally means that the test runtime as reported by deqp-runner should be kept to 10 minutes.

If a test farm is short the HW to provide these guarantees, consider dropping tests to reduce runtime. dEQP job logs print the slowest tests at the end of the run, and Piglit logs the runtime of tests in the results.json.bz2 in the artifacts. Or, you can add the following to your job to only run some fraction (in this case, 1/10th) of the dEQP tests.

variables:
   DEQP_FRACTION: 10

to just run 1/10th of the test list.

For Collabora’s LAVA farm, the device types page can tell you how many boards of a specific tag are currently available by adding the “Idle” and “Busy” columns. For bare-metal, a gitlab admin can look at the runners page. A pipeline should probably not create more jobs for a board type than there are boards, unless you clearly have some short-runtime jobs.

If a HW CI farm goes offline (network dies and all CI pipelines end up stalled) or its runners are consistently spuriously failing (disk full?), and the maintainer is not immediately available to fix the issue, please push through an MR disabling that farm’s jobs according to the Farm Management instructions.

Personal runners¶

Mesa’s CI is currently run primarily on packet.net’s m1xlarge nodes (2.2Ghz Sandy Bridge), with each job getting 8 cores allocated. You can speed up your personal CI builds (and marge-bot merges) by using a faster personal machine as a runner. You can find the gitlab-runner package in Debian, or use GitLab’s own builds.

To do so, follow GitLab’s instructions to register your personal GitLab runner in your Mesa fork. Then, tell Mesa how many jobs it should serve (concurrent=) and how many cores those jobs should use (FDO_CI_CONCURRENT=) by editing these lines in /etc/gitlab-runner/config.toml, for example:

concurrent = 2

[[runners]]
  environment = ["FDO_CI_CONCURRENT=16"]

Docker caching¶

The CI system uses Docker images extensively to cache infrequently-updated build content like the CTS. The freedesktop.org CI templates help us manage the building of the images to reduce how frequently rebuilds happen, and trim down the images (stripping out manpages, cleaning the apt cache, and other such common pitfalls of building Docker images).

When running a container job, the templates will look for an existing build of that image in the container registry under MESA_IMAGE_TAG. If it’s found it will be reused, and if not, the associated .gitlab-ci/containers/<jobname>.sh will be run to build it. So, when developing any change to container build scripts, you need to update the associated MESA_IMAGE_TAG to a new unique string. We recommend using the current date plus some string related to your branch (so that if you rebase on someone else’s container update from the same day, you will get a Git conflict instead of silently reusing their container)

When developing a given change to your Docker image, you would have to bump the tag on each git commit --amend to your development branch, which can get tedious. Instead, you can navigate to the container registry for your repository and delete the tag to force a rebuild. When your code is eventually merged to main, a full image rebuild will occur again (forks inherit images from the main repo, but MRs don’t propagate images from the fork into the main repo’s registry).

Building locally using CI docker images¶

It can be frustrating to debug build failures on an environment you don’t personally have. If you’re experiencing this with the CI builds, you can use Docker to use their build environment locally. Go to your job log, and at the top you’ll see a line like:

Pulling docker image registry.freedesktop.org/anholt/mesa/debian/android_build:2020-09-11

We’ll use a volume mount to make our current Mesa tree be what the Docker container uses, so they’ll share everything (their build will go in _build, according to meson-build.sh). We’re going to be using the image non-interactively so we use run --rm $IMAGE command instead of run -it $IMAGE bash (which you may also find useful for debug). Extract your build setup variables from .gitlab-ci.yml and run the CI meson build script:

IMAGE=registry.freedesktop.org/anholt/mesa/debian/android_build:2020-09-11
sudo docker pull $IMAGE
sudo docker run --rm -v `pwd`:/mesa -w /mesa $IMAGE env PKG_CONFIG_PATH=/usr/local/lib/aarch64-linux-android/pkgconfig/:/android-ndk-r21d/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android/pkgconfig/ GALLIUM_DRIVERS=freedreno UNWIND=disabled EXTRA_OPTION="-D android-stub=true -D llvm=disabled" DRI_LOADERS="-D glx=disabled -D gbm=disabled -D egl=enabled -D platforms=android" CROSS=aarch64-linux-android ./.gitlab-ci/meson-build.sh

All you have left over from the build is its output, and a _build directory. You can hack on mesa and iterate testing the build with:

sudo docker run --rm -v `pwd`:/mesa $IMAGE meson compile -C /mesa/_build

Running specific CI jobs¶

You can use bin/ci/ci_run_n_monitor.py to run specific CI jobs. It will automatically take care of running all the jobs yours depends on, and cancel the rest to avoid wasting resources.

See bin/ci/ci_run_n_monitor.py --help for all the options.

Target jobs

The --target argument takes a regex that you can use to select the jobs names you want to run, e.g. --target 'zink.*' will run all the Zink jobs, leaving the other drivers’ jobs free for others to use.

Note that in fork pipelines, GitLab only adds the jobs for the files that have changed since the last push, so you might not get the jobs you expect. You can work around that by adding a dummy change in a file core to what you’re working on and then making a new push with that change, and removing that change before you create the MR.

GitLab token

The --token argument is used to provide a GitLab token with rights to interact with the pipeline. Using the argument, one can provide the value or the name of the file having the value. If the argument is not provided, then it checks if the value of $XDG_CONFIG_HOME has a valid directory (if not, then uses $HOME/.config), and there is a file called gitlab-token that contains a token. The token required to work with this tool needs api scope permissions.

Note

To create that token, refer to create-a-personal-access-token and select the api scope. The token will only be shown once after creation, so make sure you store it securely.

Marge queue¶

You can use bin/ci/marge_queue.sh to check how long the Marge queue is. As mentioned, the merge flow is to assign MR to the Marge bot, to serialize the verification and merge. Looking at the merge requests assigned to Marge you can evaluate the size of the queue, since the marge_queue tool provides sorted and summarized information about those MR in queue.

The tool requires a GitLab token as described in the crnm section. It outputs the current queue sorted by the assigned at to Marge. It can also be used as an active wait for another action in a pipe, using the --wait until the queue is empty. The return code corresponds to the number of MRs in the queue, so when it returns 0, one can, for example, start the crnm tool on a certain pipeline.

Conformance Tests¶

Some conformance tests require a special treatment to be maintained on GitLab CI. This section lists their documentation pages.

SkQP

Updating GitLab CI Linux Kernel¶

GitLab CI usually runs a bleeding-edge kernel. The following documentation has instructions on how to uprev Linux Kernel in the GitLab CI ecosystem.

Upreving Linux Kernel

Structured tagging¶

Some build scripts can be tagged with a deterministic tag to allow for testing and validation of the build output. This section lists the documentation pages for the structured tagging feature.

Structured Tagging for CI Builds

Reusing CI scripts for other projects¶

The CI scripts in .gitlab-ci/ can be reused for other projects, to facilitate reuse of the infrastructure, our scripts can be used as tools to create containers and run tests on the available farms.

EXTRA_LOCAL_PACKAGES¶: Define extra Debian packages to be installed in the container.