Trading Fish: The blog of Hector Castro


Creating Go Application Releases with GoReleaser

A few weeks ago, I set out to upgrade the version of Go (from 1.6 to 1.15) used to build an old command-line utility I developed named Heimdall. Heimdall provides a way to wrap an executable program inside an exclusive lock, coordinated by a central PostgreSQL instance via pg_try_advisory_lock.
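For context, here is a minimal sketch of the advisory-lock pattern Heimdall wraps. It is not Heimdall's actual implementation (Heimdall is written in Go); the psycopg2 driver, connection string, lock key, and wrapped command below are all placeholders for illustration.

# Acquire a PostgreSQL advisory lock, and run the wrapped command only if
# the lock was actually acquired; otherwise another instance holds it.
import subprocess

import psycopg2  # assumed driver; any PostgreSQL client works

conn = psycopg2.connect("postgresql://localhost/locks")  # placeholder DSN
with conn.cursor() as cur:
    cur.execute("SELECT pg_try_advisory_lock(%s)", (42,))  # arbitrary lock key
    (acquired,) = cur.fetchone()
    if acquired:
        try:
            subprocess.run(["./backup.sh"], check=True)  # the wrapped executable
        finally:
            cur.execute("SELECT pg_advisory_unlock(%s)", (42,))
conn.close()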

Now, Heimdall is a nice little utility and all (if you’re intrigued, check out the README), but the most interesting part of the upgrade process came after I got everything working and started to think about how to create a new release. That’s when I came across GoReleaser.

GoReleaser

GoReleaser is a release automation tool specifically for Go projects. With a few bits of YAML configuration, GoReleaser provided me with:

  • Hooks into the Go module system for managing library dependencies
  • The ability to easily produce a set of build artifacts for multiple operating systems and computer architectures
  • Checksums for each of the build artifacts
  • Easy integration with GitHub Actions to automate publishing releases on tagged commits

If you are responsible for Go applications in need of a uniform release process, GoReleaser is really hard to beat.

Validating Data in Python with Cerberus

This year was my first time participating in Advent of Code, and I’m glad I did, because solving one of the challenges exposed me to an excellent data validation library for Python named Cerberus.

What’s in a valid passport

Below are some excerpts from the challenge, along with its field-level validation rules:

You arrive at the airport only to realize that you grabbed your North Pole Credentials instead of your passport. While these documents are extremely similar, North Pole Credentials aren’t issued by a country and therefore aren’t actually valid documentation for travel in most of the world.

It seems like you’re not the only one having problems, though; a very long line has formed for the automatic passport scanners, and the delay could upset your travel itinerary.

The line is moving more quickly now, but you overhear airport security talking about how passports with invalid data are getting through. Better add some data validation, quick!

You can continue to ignore the cid field, but each other field has strict rules about what values are valid for automatic validation:

  • byr (Birth Year) - four digits; at least 1920 and at most 2002.
  • iyr (Issue Year) - four digits; at least 2010 and at most 2020.
  • eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
  • hgt (Height) - a number followed by either cm or in:
    • If cm, the number must be at least 150 and at most 193.
    • If in, the number must be at least 59 and at most 76.
  • hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
  • ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
  • pid (Passport ID) - a nine-digit number, including leading zeroes.
  • cid (Country ID) - ignored, missing or not.

Your job is to count the passports where all required fields are both present and valid according to the above rules.

For completeness, here are some invalid passports (delimited by \n\n):

eyr:1972 cid:100
hcl:#18171d ecl:amb hgt:170 pid:186cm iyr:2018 byr:1926

iyr:2019
hcl:#602927 eyr:1967 hgt:170cm
ecl:grn pid:012533040 byr:1946

hcl:dab227 iyr:2012
ecl:brn hgt:182cm pid:021572410 eyr:2020 byr:1992 cid:277

And, some valid passports:

pid:087499704 hgt:74in ecl:grn iyr:2012 eyr:2030 byr:1980
hcl:#623a2f

eyr:2029 ecl:blu cid:129 byr:1989
iyr:2014 pid:896056539 hcl:#a97842 hgt:165cm

hcl:#888785
hgt:164cm byr:2001 iyr:2015 cid:88
pid:545766238 ecl:hzl
eyr:2022

Most of the validation rules look straightforward in isolation, but less so when you think about composing them all together.

Validating passports with Cerberus

Step one involved getting familiar with Cerberus validation rules. The library supports rules like the following:

  • contains - This rule validates that a container object contains all of the defined items.
>>> from cerberus import Validator
>>> v = Validator()

>>> document = {"states": ["peace", "love", "inity"]}

>>> schema = {"states": {"contains": "peace"}}
>>> v.validate(document, schema)
True
  • regex - The validation will fail if the field’s value does not match the provided regular expression.
>>> schema = {
...     "email": {
...        "type": "string",
...        "regex": r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
...     }
... }
>>> document = {"email": "john@example.com"}
>>> v.validate(document, schema)
True
  • required - If True the field is mandatory. Validation will fail when it is missing.
>>> v.schema = {"name": {"required": True, "type": "string"}, "age": {"type": "integer"}}
>>> document = {"age": 10}
>>> v.validate(document)
False

Step two involved converting the passports into Cerberus documents. This was mostly an exercise in parsing the challenge’s loosely structured text into Python dictionaries.

import re

# Here, batch_file is an open file handle for the puzzle input.
# Split the batch file records by double newline.
for record in batch_file.read().split("\n\n"):
    # Split the fields within a record on any whitespace (space or newline).
    record_field_list = [
        tuple(field.split(":")) for field in re.compile(r"\s").split(record.strip())
    ]

That leaves record_field_list looking like:

>>> record_field_list
[('ecl', 'gry'),
 ('pid', '860033327'),
 ('eyr', '2020'),
 ('hcl', '#fffffd'),
 ('byr', '1937'),
 ('iyr', '2017'),
 ('cid', '147'),
 ('hgt', '183cm')]

From there, dict converts the list of tuples into a proper Cerberus document:

>>> document = dict(record_field_list)
>>> document
{'byr': '1937',
 'cid': '147',
 'ecl': 'gry',
 'eyr': '2020',
 'hcl': '#fffffd',
 'hgt': '183cm',
 'iyr': '2017',
 'pid': '860033327'}
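Applying that conversion to every record yields a list of Cerberus documents, one per passport. Here is a minimal end-to-end sketch of the parsing step (assuming the puzzle input lives in a file named input.txt; documents is just a throwaway name):

import re

# Parse the batch file into a list of Cerberus documents (one per passport).
documents = []
with open("input.txt") as batch_file:  # assumed filename for the puzzle input
    for record in batch_file.read().split("\n\n"):
        record_field_list = [
            tuple(field.split(":")) for field in re.compile(r"\s").split(record.strip())
        ]
        documents.append(dict(record_field_list))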

Putting it all together

Equipped with a better understanding of what’s possible with Cerberus, and with a list of Python dictionaries representing passports, I put together the schema below to enforce the passport validation rules of the challenge. Only one of the rules (hgt) required a custom function (compare_hgt_with_units).

SCHEMA = {
    "byr": {"min": "1920", "max": "2002"},
    "iyr": {"min": "2010", "max": "2020"},
    "eyr": {"min": "2020", "max": "2030"},
    "hgt": {
        "anyof": [
            {"allof": [{"regex": "[0-9]+cm"}, {"check_with": compare_hgt_with_units}]},
            {"allof": [{"regex": "[0-9]+in"}, {"check_with": compare_hgt_with_units}]},
        ]
    },
    "hcl": {"regex": "#[0-9a-f]{6}"},
    "ecl": {"allowed": ["amb", "blu", "brn", "gry", "grn", "hzl", "oth"]},
    "pid": {"regex": "[0-9]{9}"},
    "cid": {"required": False},
}

from typing import Callable

# Provide a custom field validation function for a height with units.
def compare_hgt_with_units(field: str, value: str, error: Callable[..., str]) -> None:
    if value.endswith("cm"):
        if not (150 <= int(value.rstrip("cm")) <= 193):
            error(field, "out of range")
    elif value.endswith("in"):
        if not (59 <= int(value.rstrip("in")) <= 76):
            error(field, "out of range")
    else:
        error(field, "missing units")

With a schema in place, all that’s left to do is instantiate a Validator and validate each document:

>>> v = Validator(SCHEMA, require_all=True)
>>> v.validate(document)
True
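Finally, the challenge asks for the number of passports that are both complete and valid. Since validate returns a boolean, counting them is a one-liner (a sketch, assuming documents is the list of parsed passports from earlier):

# Count the passports where all required fields are present and valid.
valid_count = sum(v.validate(document) for document in documents)
print(valid_count)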

Thanks, Cerberus!

Centralized Scala Steward with GitHub Actions

Keeping project dependencies up-to-date is a challenging problem. Services like GitHub’s automated dependency updating system, Dependabot, go a long way toward making things easier, but only if your package manager’s ecosystem is supported. In the case of Scala-based projects, it is not.

Enter Scala Steward.

Scala Steward provides a similar, low-effort way to keep project dependencies up-to-date. You simply open a pull request against the Scala Steward repository that adds a reference to your project’s GitHub repository inside a specially designated Markdown file. After that, Scala Steward (which manifests itself as a robot user on GitHub) keeps your project dependencies up-to-date via pull requests.

Unfortunately, this easy-mode option requires that your repository be publicly accessible. There are options for running Scala Steward as a service for yourself, but that path is less trodden and requires a bit more effort.

Scala Steward and GitHub Actions

So what other options do you have if your Scala project is inside a private repository? Well, if your project is on GitHub, then you likely have access to its workflow automation service, GitHub Actions. Scala Steward’s maintainers created a GitHub Action that lowers the barrier to adding Scala Steward support to a project via the GitHub Actions execution model.

By default, the Action supports dependency detection through a workflow defined inside your project’s repository. This approach makes it easy to simulate the public instance of Scala Steward on a per-repository basis. But there is also a centralized mode that allows you to mimic the way the centrally managed instance of Scala Steward works.

This centralized mode gives us an opportunity to have the best of both worlds: a low-effort way to keep multiple project dependencies up-to-date (similar to the public instance of Scala Steward), and the ability to do so across both public and private repositories!

Putting things together

First, create a GitHub repository for your instance of Scala Steward and put a file in it at .github/workflows/scala-steward.yml with the following contents:

name: Scala Steward

on:
  schedule:
    # Run every Sunday at 00:00 UTC (midnight). Replace this with
    # whatever schedule seems appropriate to you.
    - cron: "0 0 * * 0"
  # Provide support for manually triggering the workflow via GitHub.
  workflow_dispatch:

jobs:
  scala-steward:
    name: scala-steward
    runs-on: ubuntu-latest
    steps:
      # This is necessary to ensure that the most up-to-date version of
      # REPOSITORIES.md is used.
      - uses: actions/checkout@v2

      - name: Execute Scala Steward
        uses: scala-steward-org/scala-steward-action@vX.Y.Z
        with:
          # A GitHub personal access token tied to a user that will create
          # pull requests against your projects to update dependencies. More
          # on this under the YAML snippet.
          github-token: ${{ secrets.SCALA_STEWARD_GITHUB_TOKEN }}
          # A Markdown file with a literal Markdown list of repositories
          # Scala Steward should monitor.
          repos-file: REPOSITORIES.md
          author-email: scala-steward@users.noreply.github.com
          author-name: Scala Steward

Hopefully, the inline comments help minimize any ambiguity in the GitHub Actions workflow configuration file. For completeness, here is an example of the REPOSITORIES.md file as well:

- organization/repository1
- organization/repository2
- organization/repository3

The last step is to ensure that the user associated with the GitHub personal access token is added as a collaborator, with the Write role, on any private repositories. Also, to slightly improve usability and maintainability, consider the following suggestions:

  • Add Dependabot support to your Scala Steward repository to keep the Scala Steward GitHub Action up-to-date.
  • Avoid tying Scala Steward to an individual user’s GitHub account. Consider creating a bot account first, then create a personal access token with it to use with Scala Steward.
  • Create a custom Scala Steward team (e.g., @organization/scala-steward) and add the bot account above to it. Now, instead of remembering to add the bot account to your Scala project repository as a collaborator, you can add the more intuitive Scala Steward team.