YAML tip: Using anchors for shared steps & jobs
<p><a href="https://www.linkedin.com/in/sheridanrawlins/">Sheridan Rawlins</a>, Architect, <a href="https://www.verizonmedia.com">Verizon Media</a></p>
<h1>Overview</h1>
<p>Occasionally, a pipeline needs several similar but different jobs. When these jobs are specific to a single pipeline, it would not make much sense to create a <a href="https://docs.screwdriver.cd/user-guide/templates">Screwdriver template</a>. The tips in this post reduce copy/paste errors and make it easier to share jobs and steps within a single YAML file; we hope they are as helpful to you as they were to us.</p>
<p>Below is a condensed example showcasing some techniques or patterns that can be used for sharing steps.</p>
<h2>Example of desired use</h2>
<pre><code>jobs:
  deploy-prod:
    template: sd/noop
  deploy-location1:
    <<: *deploy-job
    requires: [deploy-prod]
    environment:
      LOCATION: location1
  deploy-location2:
    <<: *deploy-job
    requires: [deploy-prod]
    environment:
      LOCATION: location2
</code></pre>
<p>Complete working example at the end of this post.</p>
<h2>Defining shared steps</h2>
<h3>What is a step?</h3>
<p>First, let us define a step.</p>
<p>Steps of a job look like the following: each step is an array element holding an object with exactly one key and its value. The key is the step name and the value is the command to run. More details can be found in the <a href="https://docs.screwdriver.cd/user-guide/configuration/jobconfiguration#steps">SD Guide</a>.</p>
<pre><code>jobs:
  job1:
    steps:
      - step1: echo "do step 1"
      - step2: echo "do step 2"
</code></pre>
<h3>What are anchors and aliases?</h3>
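<p>Second, let me describe <a href="https://yaml.org/spec/1.2/spec.html#id2765878">YAML anchors and aliases</a>. An <em>anchor</em> (<code>&name</code>) marks a value so that it can be reused; in this post, every anchor is placed between an object key and its value. An <em>alias</em> (<code>*name</code>) refers back to the anchored value, either to copy it as-is or to merge it into another object.</p>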
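<p>As a minimal sketch of the two terms, the fragment below anchors an object and then reuses it through an alias. The key names (<code>defaults</code>, <code>copy</code>, <code>retries</code>) and the image name are hypothetical and are not Screwdriver settings.</p>
<pre><code># Hypothetical keys, for illustration only.
defaults: &defaults        # anchor: marks the object that follows
  image: example-image
  retries: 2
copy: *defaults            # alias: resolves to the same object as defaults
</code></pre>
<p>After parsing, <code>copy</code> holds the same two keys as <code>defaults</code>.</p>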
<h3>Recommendation for defining shared steps and jobs</h3>
<p>While an anchor can be defined anywhere in a YAML file, defining shared things in the <a href="https://docs.screwdriver.cd/user-guide/configuration/jobconfiguration.html#shared">shared</a> section makes intuitive sense. Because <code>annotations</code> can contain freeform objects in addition to <a href="https://docs.screwdriver.cd/user-guide/configuration/annotations">documented ones</a>, we recommend defining shared steps and jobs under <code>annotations</code> in the “shared” section.</p>
<p>Now, I’ll show an example and explain the details of how it works:</p>
<pre><code>shared:
  environment:
    ANOTHER_ARG: another_arg_value
  annotations:
    steps:
      - .: &set-dryrun
          set-dryrun: |
            DRYRUN=false
            if [[ -n $SD_PULL_REQUEST ]]; then
              DRYRUN=true
            fi
      - .: &deploy
          deploy: |
            CMD=(
              my big deploy tool
              --dry-run="${DRYRUN:?}"
              --location "${LOCATION:?}"
              --another-arg "${ANOTHER_ARG:?}"
            )
            "${CMD[@]}"
</code></pre>
<h3>Explanation of how the step anchor declaration patterns work:</h3>
<p>To reduce redundancy, a configuration defined once under annotations with an anchor can be referenced multiple times through an “alias”, such as <code>*some-step</code> in the following example, used by job1 and job2.</p>
<pre><code>jobs:
  job1:
    steps:
      - *some-step
  job2:
    steps:
      - *some-step
</code></pre>
<p>To use the alias, the anchor <code>&some-step</code> must resolve to an object with a single key (also <code>some-step</code>) whose value is the shell code to execute.</p>
<p>Because we declare each anchor between a key and its value, we use an array whose elements are objects with the single key <code>.</code> (short to type). The array allows us to use <code>.</code> again without conflict. In an object, every key would have to be unique, so we might need to repeat some-step three times, such as:</p>
<pre><code># Anti-pattern: do not use as it is too redundant.
some-step: &some-step
  some-step: |
    # shell instructions
</code></pre>
<p>The following is an example of a reasonably short pattern for defining steps, where the only redundancy is writing the name twice: once as the anchor and once as the step name:</p>
<pre><code>shared:
  annotations:
    steps:
      - .: &some-step
          some-step: |
            echo "do some step"
</code></pre>
<p>When you use <code>*some-step</code>, the alias resolves to the anchored value: an object with the single key <code>some-step</code> and the value <code>echo "do some step"</code>, which is exactly what a step needs to be.</p>
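<p>As a sketch of what that means in practice, once the aliases are expanded, <code>job1</code> and <code>job2</code> from the earlier example would be equivalent to writing the step out by hand in each job. This is only an illustration of the resolved structure, not output captured from Screwdriver.</p>
<pre><code># Equivalent structure after the *some-step aliases are expanded.
jobs:
  job1:
    steps:
      - some-step: |
          echo "do some step"
  job2:
    steps:
      - some-step: |
          echo "do some step"
</code></pre>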
<h3>FAQ</h3>
<h4>Why the <code>|</code> character after <code>some-step:</code>?</h4>
<p>While you could just write <code>some-step: echo "do some step"</code>, I prefer to use the <code>|</code> notation for describing shell code because it allows you to do multiline shell scripting. Even for one-liners, you don’t have to reason about the escape rules - as long as the commands are indented properly, they will be passed to the <a href="https://docs.screwdriver.cd/user-guide/environment-variables"><code>$USER_SHELL_BIN</code></a> correctly, allowing your shell to deal with escaping naturally.</p>
<pre><code>set-dryrun: |
  DRYRUN=false
  if [[ -n $SD_PULL_REQUEST ]]; then
    DRYRUN=true
  fi
</code></pre>
<h4>Why that syntax for environment variables?</h4>
<ol><li>Using environment variables in shared steps lets each job that invokes those steps supply or alter the values.</li>
<li>The syntax <code>"${VARIABLE:?}"</code> is useful for a step that needs a value - it will cause an error if the variable is undefined or empty.</li>
</ol><h4>Why split CMD into array assignment and invocation?</h4>
<p>The style of defining an array and then invoking it helps readability by putting each logical flag on its own line.
It can be digested by a human very easily and also copy/pasted to other commands or deleted with ease as a single line.
Assigning to an array allows multiple lines as bash will not complete the statement until the closing parenthesis.</p>
<h4>Why does one flag have <code>--flag=value</code> and another have <code>--flag value</code>?</h4>
<p>Most CLI parsers treat boolean flags as a flag without an expected value: omission of the flag is false, existence is true. However, many CLI parsers also accept the <code>--flag=value</code> syntax for boolean flags and, in my opinion, it is far easier to debug and reason about an explicit value (such as <code>false</code>) than to remember that the flag exists and defaults to false when not provided.</p>
<h2>Defining shared jobs</h2>
<h3>What is a job?</h3>
<p>A job in Screwdriver is an object with many fields, described in the <a href="https://docs.screwdriver.cd/user-guide/configuration/jobconfiguration">SD Guide</a>.</p>
<h3>Job anchor declaration patterns</h3>
<p>To use a shared job effectively, it is helpful to use a feature of YAML that is documented outside of the <a href="https://yaml.org/spec/1.2/spec.html">YAML 1.2 Spec</a> called <a href="https://yaml.org/type/merge.html">Merge Key</a>.</p>
<p>The syntax <code><<: *some-object-anchor</code> lets you merge in keys of an anchor that has an object as its value into another object and then add or override keys as necessary.</p>
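<p>A minimal sketch of the Merge Key, assuming a parser that supports it. The keys <code>base</code>, <code>derived</code>, <code>timeout</code>, and <code>extra</code> are hypothetical names chosen for illustration.</p>
<pre><code># Hypothetical keys, for illustration only.
base: &base
  image: example-image
  timeout: 10
derived:
  <<: *base        # merge in image and timeout from base
  timeout: 30      # override a merged key
  extra: true      # add a new key
# derived is equivalent to:
#   image: example-image
#   timeout: 30
#   extra: true
</code></pre>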
<h3>Recommendation for defining shared jobs</h3>
<pre><code>shared:
  annotations:
    jobs:
      deploy-job: &deploy-job
        image: the-deploy-image
        steps:
          - *set-dryrun
          - *deploy
</code></pre>
<p>If you browse back to the previous <em>example of desired use</em> (also copied here), you can see the use of <code><<: *deploy-job</code> to start with the deploy-job keys/values, followed by <code>requires</code> and <code>environment</code> keys that customize each concrete instance of the deploy job.</p>
<pre><code>jobs:
  deploy-prod:
    template: sd/noop
  deploy-location1:
    <<: *deploy-job
    requires: [deploy-prod]
    environment:
      LOCATION: location1
  deploy-location2:
    <<: *deploy-job
    requires: [deploy-prod]
    environment:
      LOCATION: location2
</code></pre>
<h3>FAQ</h3>
<h4>Why is environment put in the shared section and not included with the shared job?</h4>
<p>The answer to that is quite subtle. The Merge Key merges only top-level keys; if you were to put default <code>environment</code> values in a shared job, a job that sets its own <code>environment:</code> would replace the whole object and clobber all of the provided defaults.
However, Screwdriver follows up the YAML parsing phase with its own logic to merge things from the shared section at the appropriate depth.</p>
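<p>A minimal sketch of the clobbering described above, using a hypothetical anchor named <code>deploy-job-with-env</code> that (unlike the recommended pattern) carries its own <code>environment</code> defaults:</p>
<pre><code># Anti-pattern sketch: environment defaults inside the shared job.
shared:
  annotations:
    jobs:
      deploy-job-with-env: &deploy-job-with-env   # hypothetical anchor
        image: the-deploy-image
        environment:
          ANOTHER_ARG: another_arg_value
jobs:
  deploy-location1:
    <<: *deploy-job-with-env
    environment:            # replaces the merged environment object entirely
      LOCATION: location1
# After YAML parsing, deploy-location1.environment contains only LOCATION;
# ANOTHER_ARG from the shared job is lost.
</code></pre>
<p>Keeping <code>ANOTHER_ARG</code> under <code>shared.environment</code> instead avoids this, because that merge is performed by Screwdriver rather than by the Merge Key.</p>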
<h4>Why not just use shared.steps?</h4>
<p>As noted above, Screwdriver does additional work to merge annotations, environment, and steps into each job after the YAML parsing phase. The logic for steps goes like this:</p>
<ol><li>If a job has NO steps key, then it inherits <em>ALL</em> shared steps.</li>
<li>If a job has at least one step, then only matching wrapping steps (steps starting with <code>pre</code> or <code>post</code>) are copied in at the right place (before or after steps that the job provides matching the remainder of the step name after <code>pre</code> or <code>post</code>).</li>
</ol><p>While the above pattern might be useful for some pipelines, complex pipelines typically have a few job types and may want to share some but not all steps.</p>
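<p>The two rules above can be sketched as follows. This is an illustration of the behavior as described in this post, not verified output; the job names, image name, and step commands are hypothetical.</p>
<pre><code># Hypothetical pipeline illustrating the shared.steps rules described above.
shared:
  image: example-image
  steps:
    - preinstall: echo "before install"
    - install: echo "shared install"
    - postinstall: echo "after install"
jobs:
  job-a:
    # Rule 1: no steps key, so job-a inherits ALL shared steps.
    requires: [~commit]
  job-b:
    requires: [~commit]
    steps:
      # Rule 2: job-b provides its own install step, so only the wrapping
      # steps (preinstall, postinstall) are copied in around it.
      - install: echo "job-b install"
</code></pre>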
<h2>Complete Example</h2>
<p>Copy and paste the following into the <a href="https://cd.screwdriver.cd/validator">validator</a>.</p>
<pre><code>shared:
  environment:
    ANOTHER_ARG: another_arg_value
  annotations:
    steps:
      - .: &set-dryrun
          set-dryrun: |
            DRYRUN=false
            if [[ -n $SD_PULL_REQUEST ]]; then
              DRYRUN=true
            fi
      - .: &deploy
          deploy: |
            CMD=(
              my big deploy tool
              --dry-run="${DRYRUN:?}"
              --location "${LOCATION:?}"
              --another-arg "${ANOTHER_ARG:?}"
            )
            "${CMD[@]}"
    jobs:
      deploy-job: &deploy-job
        image: the-deploy-image
        steps:
          - *set-dryrun
          - *deploy
jobs:
  deploy-prod:
    template: sd/noop
  deploy-location1:
    <<: *deploy-job
    requires: [deploy-prod]
    environment:
      LOCATION: location1
  deploy-location2:
    <<: *deploy-job
    requires: [deploy-prod]
    environment:
      LOCATION: location2
</code></pre>