Updating Container Images the Flux GitOps way
My homelab has been great for helping me apply Infrastructure-as-Code (IaC) and GitOps concepts in a practical way. However, it does take a considerable amount of time and energy to maintain. That's why today, I'm super excited to finally automate another part of my homelab: Kubernetes container image updates with Flux v2.
Installing Flux image automation components
Flux's official guide currently only describes how to use the flux bootstrap command to install the necessary extra image automation components. While it may be the simplest way to get up and running, flux bootstrap assumes the Git branch you want Flux to commit changes to is also the branch you want the cluster to reconcile with. That's not always the case, and it doesn't work if your main branch is protected.
To do it the more predictable, explicit way, you need to manually update the flux-system manifests to include all the components that flux bootstrap would have installed. The best way I found to do this is to run flux install with the --export option and use its output to replace your gotk-components.yaml file. Once the changes are committed, pushed, reviewed, and merged, Flux will reconcile them as usual.
Here are the commands I used to update and push the files for both of my Flux-managed clusters:
git checkout -b flux-image-automation
flux install \
 --components-extra=image-reflector-controller,image-automation-controller \
 --export | tee staging/flux-system/gotk-components.yaml production/flux-system/gotk-components.yaml > /dev/null
git commit -am "Add Flux image automation components"
git push --set-upstream origin flux-image-automation
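Before opening the pull request, it can be worth confirming that the exported file really contains both extra controllers. The following is a minimal sketch, not part of my original workflow: check_components is a hypothetical helper, and the demo runs against a stand-in file rather than the real gotk-components.yaml.

```shell
set -eu

# Report whether an exported components file defines both image automation controllers.
check_components() {
  file="$1"
  for controller in image-reflector-controller image-automation-controller; do
    if ! grep -q "name: $controller" "$file"; then
      echo "missing $controller in $file"
      return 1
    fi
  done
  echo "ok: $file"
}

# Demo against a stand-in file; in the repository you would pass
# staging/flux-system/gotk-components.yaml and the production copy instead.
sample="$(mktemp)"
printf 'metadata:\n  name: image-reflector-controller\n---\nmetadata:\n  name: image-automation-controller\n' > "$sample"
result="$(check_components "$sample")"
echo "$result"
```

The check only greps for the controller names, so it says nothing about whether the manifests are otherwise valid.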
Configuring image updates
With the Flux image automation components now installed and running, I was ready to create the manifests used to configure the automations. In short, there are three types of resources to define:
- ImageRepository resources tell Flux where to scan for new images
- ImagePolicy resources tell Flux which image tags should be considered
- ImageUpdateAutomation resources tell Flux where changes should be committed
Additionally, a comment must be added within the manifest where the image is configured to tell Flux which policy it should reference when updating the tag.
Directory structure
My strategy was to keep the ImageRepository manifests as close as possible to the base manifests where my image tags are configured. I also wanted Flux to commit to a new branch for each application that it would be updating so that each change could be in a separate GitHub Pull Request.
After some testing, I realized that I wouldn't be able to use my current directory structure because I set the namespaces on my resources using Kustomize's namespace transformer, and there's no simple way for resources in the same directory to be applied in different namespaces. And, while I could have put these resources in the same namespace as the application, keeping them in flux-system made some sense. In addition, I didn't want the image update automations running in multiple clusters and making updates to the same repositories, so I would need new subdirectories anyway. Ultimately, this is the directory structure I settled on:
apps/
  base/
    my-app/
      resources/
        deployments.yaml
        kustomization.yaml
        services.yaml
      updates/
        kustomization.yaml
        updateautomations.yaml
  production/
    my-app/
      resources/
        kustomization.yaml
        patches.yaml
      updates/
        kustomization.yaml
      kustomization.yaml
  staging/
    my-app/
      resources/
        kustomization.yaml
        patches.yaml
      kustomization.yaml
The namespaces are set by the kustomization.yaml files within production/my-app/resources/ and production/my-app/updates/ which merely reference the corresponding folder in base/.
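To make that concrete, here is a rough sketch of what the production overlay for the updates folder could look like. The flux-system namespace and the relative path are assumptions based on the layout above, and the script writes into a scratch directory so it can't clobber a real repository.

```shell
set -eu

# Work in a scratch directory so nothing in a real repository is overwritten.
workdir="$(mktemp -d)"
overlay="$workdir/apps/production/my-app/updates"
mkdir -p "$overlay"

# Hypothetical overlay: pulls in the base "updates" manifests and lets the
# Kustomize namespace transformer place them in flux-system (assumed here).
cat > "$overlay/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: flux-system
resources:
  - ../../../base/my-app/updates
EOF

cat "$overlay/kustomization.yaml"
```

The resources/ overlay would follow the same pattern with the application's own namespace in place of flux-system.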
Generating the image automation manifests
To generate the initial manifests for my first app, I used the following flux commands:
flux create image repository pi-hole \
 --image=pihole/pihole \
 --interval=1h \
 --namespace= \
 --export | tee apps/base/pi-hole/updates/updateautomations.yaml
flux create image policy pi-hole \
 --image-ref=pi-hole \
 --filter-regex='^(?P<YYYY>\d{4})\.(?P<MM>\d{2})\.(?P<RELEASE>\d+)$' \
 --filter-extract='$YYYY.$MM.$RELEASE' \
 --select-alpha=asc \
 --namespace= \
 --export | tee -a apps/base/pi-hole/updates/updateautomations.yaml
flux create image update pi-hole \
 --git-repo-ref=flux-system \
 --git-repo-path="/software-layer/k8s/apps/base/pi-hole/" \
 --checkout-branch=main \
 --push-branch=fluxcdbot/updates/pi-hole \
 --author-name=fluxcdbot \
 --author-email=fluxcdbot@users.noreply.github.com \
 --commit-template="{{range .Updated.Images}}{{println .}}{{end}}" \
 --namespace= \
 --export | tee -a apps/base/pi-hole/updates/updateautomations.yaml
It took a bit to figure out the filters for selecting and sorting the image tags because this image uses CalVer-style releases. Still, the filter isn't perfect because the "release" part of the version isn't a fixed length. When sorted alphabetically, 2023.05.9 would be considered newer than 2023.05.11, for example. Exceeding 9 releases in a given month seems to be very rare for this project though, so I was fine with it as is. If I absolutely need a later release within a given month, I can always temporarily suspend the automation and configure the version manually.
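The ordering problem is easy to reproduce outside of Flux. This small sketch uses plain sort on the two example tags; it only illustrates the difference between alphabetical and per-field numeric ordering and is not something Flux runs.

```shell
set -eu

# Two tags from the same month, as in the example above.
tags='2023.05.9
2023.05.11'

# Alphabetical order compares character by character, so "9" beats "1".
newest_alpha="$(printf '%s\n' "$tags" | LC_ALL=C sort | tail -n 1)"

# Numeric order on each dot-separated field treats 11 as larger than 9.
newest_numeric="$(printf '%s\n' "$tags" | sort -t . -k 1,1n -k 2,2n -k 3,3n | tail -n 1)"

echo "alphabetical: $newest_alpha"
echo "numeric:      $newest_numeric"
```

The alphabetical result is 2023.05.9, while the per-field numeric sort picks 2023.05.11.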
I should also mention here that unattended image updates are generally not a good practice. That's especially true for images that don't use SemVer, since there's no real indication of whether a release includes breaking changes. However, since I planned to gate all changes behind a pull request anyway, nothing will actually be updated in production without first being tested and approved. That's possible because each application will have its ImageUpdateAutomation scoped to the specific base application manifest path, and a unique push-branch.
Modifying the deployment manifest
Next up was adding the comment to the image: line in my deployment manifest as shown below. The comment simply tells Flux which ImagePolicy should be applied when evaluating images to update. This may be easier to understand for SemVer releases where you might set an ImagePolicy matching versions 2.x.x to prevent automatically updating to newer major releases.
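apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:2023.03.1 #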
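For reference, Flux's documented marker takes the form # {"$imagepolicy": "<policy namespace>:<policy name>"}. The sketch below writes a sample manifest carrying such a marker and then lists the marked lines. The flux-system:pi-hole value is an assumption based on the policy name and namespace used in this post, so adjust it to match your own ImagePolicy.

```shell
set -eu

workdir="$(mktemp -d)"
manifest="$workdir/deployments.yaml"

# Sample manifest with a Flux image policy marker on the image line.
# The "flux-system:pi-hole" value is an assumption: <policy namespace>:<policy name>.
cat > "$manifest" <<'EOF'
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:2023.03.1 # {"$imagepolicy": "flux-system:pi-hole"}
EOF

# List every line that carries a marker, with its line number.
marked_lines="$(grep -nF '$imagepolicy' "$manifest")"
echo "$marked_lines"
```

Running the same grep recursively over apps/base/ is a quick way to see which images are under automation.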
Rotating my deploy-key with write permissions
After pushing and merging the above changes, I expected to see a new commit in a new branch. Instead, I noticed the error below within the ImageUpdateAutomation events.
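Events:
  Type     Reason  Age                  From                         Message
  Warning  error   52s (x9 over 4m15s)  image-automation-controller  unknown error: ERROR: The key you are authenticating with has been marked as read only.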
I had missed the --read-write-key option in the flux bootstrap command shown in Flux's image automation guide (https://fluxcd.io/flux/guides/image-update/). By default, Flux creates GitHub deploy keys as read-only. Since there's no way to modify a deploy-key once created, I needed to create a new one with write access. No problem. Following the first method at https://fluxcd.io/flux/installation/#deploy-key-rotation, I rotated my flux-system secret and manually created the new deploy-key within GitHub.
Automating pull request creation
Immediately after I fixed my deploy-key issue, Flux created a new branch with an image update. The last step was turning that commit into a new pull request automatically. Flux provides an example leveraging GitHub Actions (https://fluxcd.io/flux/use-cases/gh-actions-auto-pr/) to do just that, but it's a bit outdated. After some trial and error, here's the final GitHub workflow I ended up with.
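name: Flux Image Update Auto-PR
on:
  create:
    branches:
      - fluxcdbot/updates/**
permissions:
  pull-requests: write
jobs:
  pull-request:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        name: checkout
        with:
          fetch-depth: 0
      - name: pull-request
        run: |
          gh pr create --base "main" --fill
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}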
After merging in the new workflow, I deleted the branch that Flux created so it would recreate it and trigger the workflow. It didn't take long before I had a new pull request to update my image!
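The --fill option takes the pull request title and body from the commit, so the result is only as readable as the commit message. One possible refinement, sketched here as an idea rather than something I've deployed, is to derive the title from the branch name. pr_title_for_branch is a hypothetical helper and assumes the branch ends with the app name.

```shell
set -eu

# Derive a readable pull request title from a Flux push branch name.
# Assumes the branch ends with the app name, e.g. fluxcdbot/updates/pi-hole.
pr_title_for_branch() {
  branch="$1"
  app="${branch##*/}"   # strip everything up to the last slash
  echo "Update $app container image"
}

title="$(pr_title_for_branch "fluxcdbot/updates/pi-hole")"
echo "$title"

# In the workflow step, this could replace --fill (not run here):
#   gh pr create --base main --title "$title" --body "Automated image update from Flux."
```

In a workflow run, the branch name would have to come from the triggering ref instead of the hard-coded example used here.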
Wrapping up
With the automation now working, my next steps were to add additional apps and polish up some of the rough edges like signing commits and using better commit messages.
Speaking of rough edges, while I'm impressed with Flux's Image Automation overall, setting this up was considerably more involved than I expected, mostly because configuring Flux to make feature-branch-like changes isn't as simple or well documented as it should be. The happy path in the examples seems to be committing all the updates to one branch so that changes are automatically deployed from that branch. This probably works well if you have a single app on your cluster and want it deployed to a staging cluster before promoting to production, but with multiple apps you wouldn't want to be promoting multiple changes at once or cherry-picking. I also wish there was better support for non-SemVer versioning. Using regex filters with sorting doesn't work in all cases. To actually fix the filter in my example, you'd need to be able to sort the date and release numbers separately.
That being said, Flux in general is extremely flexible, and it's no exception here. I plan to continue using Flux v2's Image Update Automation for the foreseeable future, so go check out my progress and see how I'm using it in my homelab (https://github.com/bcbrookman/homelab) today!