Search Results: "Sergio Talens-Oliag"

25 December 2023

Sergio Talens-Oliag: GitLab CI/CD Tips: Automatic Versioning Using semantic-release

This post describes how I m using semantic-release on gitlab-ci to manage versioning automatically for different kinds of projects following a simple workflow (a develop branch where changes are added or merged to test new versions, a temporary release/#.#.# to generate the release candidate versions and a main branch where the final versions are published).

What is semantic-releaseIt is a Node.js application designed to manage project versioning information on Git Repositories using a Continuous integration system (in this post we will use gitlab-ci)

How does it workBy default semantic-release uses semver for versioning (release versions use the format MAJOR.MINOR.PATCH) and commit messages are parsed to determine the next version number to publish. If after analyzing the commits the version number has to be changed, the command updates the files we tell it to (i.e. the package.json file for nodejs projects and possibly a CHANGELOG.md file), creates a new commit with the changed files, creates a tag with the new version and pushes the changes to the repository. When running on a CI/CD system we usually generate the artifacts related to a release (a package, a container image, etc.) from the tag, as it includes the right version number and usually has passed all the required tests (it is a good idea to run the tests again in any case, as someone could create a tag manually or we could run extra jobs when building the final assets if they fail it is not a big issue anyway, numbers are cheap and infinite, so we can skip releases if needed).

Commit messages and versioningThe commit messages must follow a known format, the default module used to analyze them uses the angular git commit guidelines, but I prefer the conventional commits one, mainly because it s a lot easier to use when you want to update the MAJOR version. The commit message format used must be:
<type>(optional scope): <description>
[optional body]
[optional footer(s)]
The system supports three types of branches: release, maintenance and pre-release, but for now I m not using maintenance ones. The branches I use and their types are:
  • main as release branch (final versions are published from there)
  • develop as pre release branch (used to publish development and testing versions with the format #.#.#-SNAPSHOT.#)
  • release/#.#.# as pre release branches (they are created from develop to publish release candidate versions with the format #.#.#-rc.# and once they are merged with main they are deleted)
On the release branch (main) the version number is updated as follows:
  1. The MAJOR number is incremented if a commit with a BREAKING CHANGE: footer or an exclamation (!) after the type/scope is found in the list of commits found since the last version change (it looks for tags on the same branch).
  2. The MINOR number is incremented if the MAJOR number is not going to be changed and there is a commit with type feat in the commits found since the last version change.
  3. The PATCH number is incremented if neither the MAJOR nor the MINOR numbers are going to be changed and there is a commit with type fix in the the commits found since the last version change.
On the pre release branches (develop and release/#.#.#) the version and pre release numbers are always calculated from the last published version available on the branch (i. e. if we published version 1.3.2 on main we need to have the commit with that tag on the develop or release/#.#.# branch to get right what will be the next version). The version number is updated as follows:
  1. The MAJOR number is incremented if a commit with a BREAKING CHANGE: footer or an exclamation (!) after the type/scope is found in the list of commits found since the last released version.In our example it was 1.3.2 and the version is updated to 2.0.0-SNAPSHOT.1 or 2.0.0-rc.1 depending on the branch.
  2. The MINOR number is incremented if the MAJOR number is not going to be changed and there is a commit with type feat in the commits found since the last released version.In our example the release was 1.3.2 and the version is updated to 1.4.0-SNAPSHOT.1 or 1.4.0-rc.1 depending on the branch.
  3. The PATCH number is incremented if neither the MAJOR nor the MINOR numbers are going to be changed and there is a commit with type fix in the the commits found since the last version change.In our example the release was 1.3.2 and the version is updated to 1.3.3-SNAPSHOT.1 or 1.3.3-rc.1 depending on the branch.
  4. The pre release number is incremented if the MAJOR, MINOR and PATCH numbers are not going to be changed but there is a commit that would otherwise update the version (i.e. a fix on 1.3.3-SNAPSHOT.1 will set the version to 1.3.3-SNAPSHOT.2, a fix or feat on 1.4.0-rc.1 will set the version to 1.4.0-rc.2 an so on).

How do we manage its configurationAlthough the system is designed to work with nodejs projects, it can be used with multiple programming languages and project types. For nodejs projects the usual place to put the configuration is the project s package.json, but I prefer to use the .releaserc file instead. As I use a common set of CI templates, instead of using a .releaserc on each project I generate it on the fly on the jobs that need it, replacing values related to the project type and the current branch on a template using the tmpl command (lately I use a branch of my own fork while I wait for some feedback from upstream, as you will see on the Dockerfile).

Container used to run itAs we run the command on a gitlab-ci job we use the image built from the following Dockerfile:
Dockerfile
# Semantic release image
FROM golang:alpine AS tmpl-builder
#RUN go install github.com/krakozaure/tmpl@v0.4.0
RUN go install github.com/sto/tmpl@v0.4.0-sto.2
FROM node:lts-alpine
COPY --from=tmpl-builder /go/bin/tmpl /usr/local/bin/tmpl
RUN apk update &&\
  apk upgrade &&\
  apk add curl git jq openssh-keygen yq zip &&\
  npm install --location=global\
    conventional-changelog-conventionalcommits@6.1.0\
    @qiwi/multi-semantic-release@7.0.0\
    semantic-release@21.0.7\
    @semantic-release/changelog@6.0.3\
    semantic-release-export-data@1.0.1\
    @semantic-release/git@10.0.1\
    @semantic-release/gitlab@9.5.1\
    @semantic-release/release-notes-generator@11.0.4\
    semantic-release-replace-plugin@1.2.7\
    semver@7.5.4\
  &&\
  rm -rf /var/cache/apk/*
CMD ["/bin/sh"]

How and when is it executedThe job that runs semantic-release is executed when new commits are added to the develop, release/#.#.# or main branches (basically when something is merged or pushed) and after all tests have passed (we don t want to create a new version that does not compile or passes at least the unit tests). The job is something like the following:
semantic_release:
  image: $SEMANTIC_RELEASE_IMAGE
  rules:
    - if: '$CI_COMMIT_BRANCH =~ /^(develop main release\/\d+.\d+.\d+)$/'
      when: always
  stage: release
  before_script:
    - echo "Loading scripts.sh"
    - . $ASSETS_DIR/scripts.sh
  script:
    - sr_gen_releaserc_json
    - git_push_setup
    - semantic-release
Where the SEMANTIC_RELEASE_IMAGE variable contains the URI of the image built using the Dockerfile above and the sr_gen_releaserc_json and git_push_setup are functions defined on the $ASSETS_DIR/scripts.sh file:
  • The sr_gen_releaserc_json function generates the .releaserc.json file using the tmpl command.
  • The git_push_setup function configures git to allow pushing changes to the repository with the semantic-release command, optionally signing them with a SSH key.

The sr_gen_releaserc_json functionThe code for the sr_gen_releaserc_json function is the following:
sr_gen_releaserc_json()
 
  # Use nodejs as default project_type
  project_type="$ PROJECT_TYPE:-nodejs "
  # REGEX to match the rc_branch name
  rc_branch_regex='^release\/[0-9]\+\.[0-9]\+\.[0-9]\+$'
  # PATHS on the local ASSETS_DIR
  assets_dir="$ CI_PROJECT_DIR /$ ASSETS_DIR "
  sr_local_plugin="$ assets_dir /local-plugin.cjs"
  releaserc_tmpl="$ assets_dir /releaserc.json.tmpl"
  pipeline_runtime_values_yaml="/tmp/releaserc_values.yaml"
  pipeline_values_yaml="$ assets_dir /values_$ project_type _project.yaml"
  # Destination PATH
  releaserc_json=".releaserc.json"
  # Create an empty pipeline_values_yaml if missing
  test -f "$pipeline_values_yaml"   : >"$pipeline_values_yaml"
  # Create the pipeline_runtime_values_yaml file
  echo "branch: $ CI_COMMIT_BRANCH " >"$pipeline_runtime_values_yaml"
  echo "gitlab_url: $ CI_SERVER_URL " >"$pipeline_runtime_values_yaml"
  # Add the rc_branch name if we are on an rc_branch
  if [ "$(echo "$CI_COMMIT_BRANCH"   sed -ne "/$rc_branch_regex/ p ")" ]; then
    echo "rc_branch: $ CI_COMMIT_BRANCH " >>"$pipeline_runtime_values_yaml"
  elif [ "$(echo "$CI_MERGE_REQUEST_SOURCE_BRANCH_NAME"  
      sed -ne "/$rc_branch_regex/ p ")" ]; then
    echo "rc_branch: $ CI_MERGE_REQUEST_SOURCE_BRANCH_NAME " \
      >>"$pipeline_runtime_values_yaml"
  fi
  echo "sr_local_plugin: $ sr_local_plugin " >>"$pipeline_runtime_values_yaml"
  # Create the releaserc_json file
  tmpl -f "$pipeline_runtime_values_yaml" -f "$pipeline_values_yaml" \
    "$releaserc_tmpl"   jq . >"$releaserc_json"
  # Remove the pipeline_runtime_values_yaml file
  rm -f "$pipeline_runtime_values_yaml"
  # Print the releaserc_json file
  print_file_collapsed "$releaserc_json"
  # --*-- BEG: NOTE --*--
  # Rename the package.json to ignore it when calling semantic release.
  # The idea is that the local-plugin renames it back on the first step of the
  # semantic-release process.
  # --*-- END: NOTE --*--
  if [ -f "package.json" ]; then
    echo "Renaming 'package.json' to 'package.json_disabled'"
    mv "package.json" "package.json_disabled"
  fi
 
Almost all the variables used on the function are defined by gitlab except the ASSETS_DIR and PROJECT_TYPE; in the complete pipelines the ASSETS_DIR is defined on a common file included by all the pipelines and the project type is defined on the .gitlab-ci.yml file of each project. If you review the code you will see that the file processed by the tmpl command is named releaserc.json.tmpl, its contents are shown here:
 
  "plugins": [
     - if .sr_local_plugin  
    "  .sr_local_plugin  ",
     - end  
    [
      "@semantic-release/commit-analyzer",
       
        "preset": "conventionalcommits",
        "releaseRules": [
            "breaking": true, "release": "major"  ,
            "revert": true, "release": "patch"  ,
            "type": "feat", "release": "minor"  ,
            "type": "fix", "release": "patch"  ,
            "type": "perf", "release": "patch"  
        ]
       
    ],
     - if .replacements  
    [
      "semantic-release-replace-plugin",
        "replacements":   .replacements   toJson    
    ],
     - end  
    "@semantic-release/release-notes-generator",
     - if eq .branch "main"  
    [
      "@semantic-release/changelog",
        "changelogFile": "CHANGELOG.md", "changelogTitle": "# Changelog"  
    ],
     - end  
    [
      "@semantic-release/git",
       
        "assets":   if .assets   .assets   toJson   else  []  end  ,
        "message": "ci(release): v$ nextRelease.version \n\n$ nextRelease.notes "
       
    ],
    [
      "@semantic-release/gitlab",
        "gitlabUrl": "  .gitlab_url  ", "successComment": false  
    ]
  ],
  "branches": [
      "name": "develop", "prerelease": "SNAPSHOT"  ,
     - if .rc_branch  
      "name": "  .rc_branch  ", "prerelease": "rc"  ,
     - end  
    "main"
  ]
 
The values used to process the template are defined on a file built on the fly (releaserc_values.yaml) that includes the following keys and values:
  • branch: the name of the current branch
  • gitlab_url: the URL of the gitlab server (the value is taken from the CI_SERVER_URL variable)
  • rc_branch: the name of the current rc branch; we only set the value if we are processing one because semantic-release only allows one branch to match the rc prefix and if we use a wildcard (i.e. release/*) but the users keep more than one release/#.#.# branch open at the same time the calls to semantic-release will fail for sure.
  • sr_local_plugin: the path to the local plugin we use (shown later)
The template also uses a values_$ project_type _project.yaml file that includes settings specific to the project type, the one for nodejs is as follows:
replacements:
  - files:
      - "package.json"
    from: "\"version\": \".*\""
    to: "\"version\": \"$ nextRelease.version \""
assets:
  - "CHANGELOG.md"
  - "package.json"
The replacements section is used to update the version field on the relevant files of the project (in our case the package.json file) and the assets section includes the files that will be committed to the repository when the release is published (looking at the template you can see that the CHANGELOG.md is only updated for the main branch, we do it this way because if we update the file on other branches it creates a merge nightmare and we are only interested on it for released versions anyway). The local plugin adds code to rename the package.json_disabled file to package.json if present and prints the last and next versions on the logs with a format that can be easily parsed using sed:
local-plugin.cjs
// Minimal plugin to:
// - rename the package.json_disabled file to package.json if present
// - log the semantic-release last & next versions
function verifyConditions(pluginConfig, context)  
  var fs = require('fs');
  if (fs.existsSync('package.json_disabled'))  
    fs.renameSync('package.json_disabled', 'package.json');
    context.logger.log( verifyConditions: renamed 'package.json_disabled' to 'package.json' );
   
 
function analyzeCommits(pluginConfig, context)  
  if (context.lastRelease && context.lastRelease.version)  
    context.logger.log( analyzeCommits: LAST_VERSION=$ context.lastRelease.version  );
   
 
function verifyRelease(pluginConfig, context)  
  if (context.nextRelease && context.nextRelease.version)  
    context.logger.log( verifyRelease: NEXT_VERSION=$ context.nextRelease.version  );
   
 
module.exports =  
  verifyConditions,
  analyzeCommits,
  verifyRelease
 

The git_push_setup functionThe code for the git_push_setup function is the following:
git_push_setup()
 
  # Update global credentials to allow git clone & push for all the group repos
  git config --global credential.helper store
  cat >"$HOME/.git-credentials" <<EOF
https://fake-user:$ GITLAB_REPOSITORY_TOKEN @gitlab.com
EOF
  # Define user name, mail and signing key for semantic-release
  user_name="$SR_USER_NAME"
  user_email="$SR_USER_EMAIL"
  ssh_signing_key="$SSH_SIGNING_KEY"
  # Export git user variables
  export GIT_AUTHOR_NAME="$user_name"
  export GIT_AUTHOR_EMAIL="$user_email"
  export GIT_COMMITTER_NAME="$user_name"
  export GIT_COMMITTER_EMAIL="$user_email"
  # Sign commits with ssh if there is a SSH_SIGNING_KEY variable
  if [ "$ssh_signing_key" ]; then
    echo "Configuring GIT to sign commits with SSH"
    ssh_keyfile="/tmp/.ssh-id"
    : >"$ssh_keyfile"
    chmod 0400 "$ssh_keyfile"
    echo "$ssh_signing_key"   tr -d '\r' >"$ssh_keyfile"
    git config gpg.format ssh
    git config user.signingkey "$ssh_keyfile"
    git config commit.gpgsign true
  fi
 
The function assumes that the GITLAB_REPOSITORY_TOKEN variable (set on the CI/CD variables section of the project or group we want) contains a token with read_repository and write_repository permissions on all the projects we are going to use this function. The SR_USER_NAME and SR_USER_EMAIL variables can be defined on a common file or the CI/CD variables section of the project or group we want to work with and the script assumes that the optional SSH_SIGNING_KEY is exported as a CI/CD default value of type variable (that is why the keyfile is created on the fly) and git is configured to use it if the variable is not empty.
Warning: Keep in mind that the variables GITLAB_REPOSITORY_TOKEN and SSH_SIGNING_KEY contain secrets, so probably is a good idea to make them protected (if you do that you have to make the develop, main and release/* branches protected too).
Warning: The semantic-release user has to be able to push to all the projects on those protected branches, it is a good idea to create a dedicated user and add it as a MAINTAINER for the projects we want (the MAINTAINERS need to be able to push to the branches), or, if you are using a Gitlab with a Premium license you can use the api to allow the semantic-release user to push to the protected branches without allowing it for any other user.

The semantic-release commandOnce we have the .releaserc file and the git configuration ready we run the semantic-release command. If the branch we are working with has one or more commits that will increment the version, the tool does the following (note that the steps are described are the ones executed if we use the configuration we have generated):
  1. It detects the commits that will increment the version and calculates the next version number.
  2. Generates the release notes for the version.
  3. Applies the replacements defined on the configuration (in our example updates the version field on the package.json file).
  4. Updates the CHANGELOG.md file adding the release notes if we are going to publish the file (when we are on the main branch).
  5. Creates a commit if all or some of the files listed on the assets key have changed and uses the commit message we have defined, replacing the variables for their current values.
  6. Creates a tag with the new version number and the release notes.
  7. As we are using the gitlab plugin after tagging it also creates a release on the project with the tag name and the release notes.

Notes about the git workflows and merges between branchesIt is very important to remember that semantic-release looks at the commits of a given branch when calculating the next version to publish, that has two important implications:
  1. On pre release branches we need to have the commit that includes the tag with the released version, if we don t have it the next version is not calculated correctly.
  2. It is a bad idea to squash commits when merging a branch to another one, if we do that we will lose the information semantic-release needs to calculate the next version and even if we use the right prefix for the squashed commit (fix, feat, ) we miss all the messages that would otherwise go to the CHANGELOG.md file.
To make sure that we have the right commits on the pre release branches we should merge the main branch changes into the develop one after each release tag is created; in my pipelines the fist job that processes a release tag creates a branch from the tag and an MR to merge it to develop. The important thing about that MR is that is must not be squashed, if we do that the tag commit will probably be lost, so we need to be careful. To merge the changes directly we can run the following code:
# Set the SR_TAG variable to the tag you want to process
SR_TAG="v1.3.2"
# Fetch all the changes
git fetch --all --prune
# Switch to the main branch
git switch main
# Pull all the changes
git pull
# Switch to the development branch
git switch develop
# Pull all the changes
git pull
# Create followup branch from tag
git switch -c "followup/$SR_TAG" "$SR_TAG"
# Change files manually & commit the changed files
git commit -a --untracked-files=no -m "ci(followup): $SR_TAG to develop"
# Switch to the development branch
git switch develop
# Merge the followup branch into the development one using the --no-ff option
git merge --no-ff "followup/$SR_TAG"
# Remove the followup branch
git branch -d "followup/$SR_TAG"
# Push the changes
git push
If we can t push directly to develop we can create a MR pushing the followup branch after committing the changes, but we have to make sure that we don t squash the commits when merging or it will not work as we want.

23 September 2023

Sergio Talens-Oliag: GitLab CI/CD Tips: Using Rule Templates

This post describes how to define and use rule templates with semantic names using extends or !reference tags, how to define manual jobs using the same templates and how to use gitlab-ci inputs as macros to give names to regular expressions used by rules.

Basic rule templatesI keep my templates in a rules.yml file stored on a common repository used from different projects as I mentioned on my previous post, but they can be defined anywhere, the important thing is that the files that need them include their definition somehow. The first version of my rules.yml file was as follows:
.rules_common:
  # Common rules; we include them from others instead of forcing a workflow
  rules:
    # Disable branch pipelines while there is an open merge request from it
    - if: >-
        $CI_COMMIT_BRANCH &&
        $CI_OPEN_MERGE_REQUESTS &&
        $CI_PIPELINE_SOURCE != "merge_request_event"
      when: never
.rules_default:
  # Default rules, we need to add the when: on_success to make things work
  rules:
    - !reference [.rules_common, rules]
    - when: on_success
The main idea is that .rules_common defines a rule section to disable jobs as we can do on a workflow definition; in our case common rules only have if rules that apply to all jobs and are used to disable them. The example includes one that avoids creating duplicated jobs when we push to a branch that is the source of an open MR as explained here. To use the rules in a job we have two options, use the extends keyword (we do that when we want to use the rule as is) or declare a rules section and add a !reference to the template we want to use as described here (we do that when we want to add additional rules to disable a job before evaluating the template conditions). As an example, with the following definitions both jobs use the same rules:
job_1:
  extends:
    - .rules_default
  [...]
job_2:
  rules:
    - !reference [.rules_default, rules]
  [...]

Manual jobs and rule templatesTo make the jobs manual we have two options, create a version of the job that includes when: manual and defines if we want it to be optional or not (allow_failure: true makes the job optional, if we don t add that to the rule the job is blocking) or add the when: manual and the allow_failure value to the job (if we work at the job level the default value for allow_failure is false for when: manual, so it is optional by default, we have to add an explicit allow_failure = true it to make it blocking). The following example shows how we define blocking or optional manual jobs using rules with when conditions:
.rules_default_manual_blocking:
  # Default rules for optional manual jobs
  rules:
    - !reference [.rules_common, rules]
    - when: manual
      # allow_failure: false is implicit
.rules_default_manual_optional:
  # Default rules for optional manual jobs
  rules:
    - !reference [.rules_common, rules]
    - when: manual
      allow_failure: true
manual_blocking_job:
  extends:
    - .rules_default_manual_blocking
  [...]
manual_optional_job:
  extends:
    - .rules_default_manual_optional
  [...]
The problem here is that we have to create new versions of the same rule template to add the conditions, but we can avoid it using the keywords at the job level with the original rules to get the same effect; the following definitions create jobs equivalent to the ones defined earlier without creating additional templates:
manual_blocking_job:
  extends:
    - .rules_default
  when: manual
  allow_failure: false
  [...]
manual_optional_job:
  extends:
    - .rules_default
  when: manual
  # allow_failure: true is implicit
  [...]
As you can imagine, that is my preferred way of doing it, as it keeps the rules.yml file smaller and I see that the job is manual in its definition without problem.

Rules with allow_failure, changes, exists, needs or variablesUnluckily for us, for now there is no way to avoid creating additional templates as we did on the when: manual case when a rule is similar to an existing one but adds changes, exists, needs or variables to it. So, for now, if a rule needs to add any of those fields we have to copy the original rule and add the keyword section. Some notes, though:
  • we only need to add allow_failure if we want to change its value for a given condition, in other cases we can set the value at the job level.
  • if we are adding changes to the rule it is important to make sure that they are going to be evaluated as explained here.
  • when we add a needs value to a rule for a specific condition and it matches it replaces the job needs section; when using templates I would use two different job names with different conditions instead of adding a needs on a single job.

Defining rule templates with semantic namesI started to use rule templates to avoid repetition when defining jobs that needed the same rules and soon I noticed that giving them names with a semantic meaning they where easier to use and understand (we provide a name that tells us when we are going to execute the job, while the details of the variables names or values used on the rules are an implementation detail of the templates). We are not going to define real jobs on this post, but as an example we are going to define a set of rules that can be useful if we plan to follow a scaled trunk based development workflow when developing, that is, we are going to put the releasable code on the main branch and use short-lived branches to test and complete changes before pushing things to main. Using this approach we can define an initial set of rule templates with semantic names:
.rules_mr_to_main:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == 'main'
.rules_mr_or_push_to_main:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == 'main'
    - if: >-
        $CI_COMMIT_BRANCH == 'main'
        &&
        $CI_PIPELINE_SOURCE != 'merge_request_event'
.rules_push_to_main:
  rules:
    - !reference [.rules_common, rules]
    - if: >-
        $CI_COMMIT_BRANCH == 'main'
        &&
        $CI_PIPELINE_SOURCE != 'merge_request_event'
.rules_push_to_branch:
  rules:
    - !reference [.rules_common, rules]
    - if: >-
        $CI_COMMIT_BRANCH != 'main'
        &&
        $CI_PIPELINE_SOURCE != 'merge_request_event'
.rules_push_to_branch_or_mr_to_main:
  rules:
    - !reference [.rules_push_to_branch, rules]
    - if: >-
         $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME != 'main'
         &&
         $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == 'main'
.rules_release_tag:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_COMMIT_TAG =~ /^([0-9a-zA-Z_.-]+-)?v\d+.\d+.\d+$/
.rules_non_release_tag:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_COMMIT_TAG !~ /^([0-9a-zA-Z_.-]+-)?v\d+.\d+.\d+$/
With those names it is clear when a job is going to be executed and when using the templates on real jobs we can add additional restrictions and make the execution manual if needed as described earlier.

Using inputs as macrosOn the previous rules we have used a regular expression to identify the release tag format and assumed that the general branches are the ones with a name different than main; if we want to force a format for those branch names we can replace the condition != 'main' by a regex comparison (=~ if we look for matches, !~ if we want to define valid branch names removing the invalid ones). When testing the new gitlab-ci inputs my colleague Jorge noticed that if you keep their default value they basically work as macros. The variables declared as inputs can t hold YAML values, the truth is that their value is always a string that is replaced by the value assigned to them when including the file (if given) or by their default value, if defined. If you don t assign a value to an input variable when including the file that declares it its occurrences are replaced by its default value, making them work basically as macros; this is useful for us when working with strings that can t managed as variables, like the regular expressions used inside if conditions. With those two ideas we can add the following prefix to the rules.yaml defining inputs for both regular expressions and replace the rules that can use them by the ones shown here:
spec:
  inputs:
    # Regular expression for branches; the prefix matches the type of changes
    # we plan to work on inside the branch (we use conventional commit types as
    # the branch prefix)
    branch_regex:
      default: '/^(build ci chore docs feat fix perf refactor style test)\/.+$/'
    # Regular expression for tags
    release_tag_regex:
      default: '/^([0-9a-zA-Z_.-]+-)?v\d+.\d+.\d+$/'
---
[...]
.rules_push_to_changes_branch:
  rules:
    - !reference [.rules_common, rules]
    - if: >-
        $CI_COMMIT_BRANCH =~ $[[ inputs.branch_regex ]]
        &&
        $CI_PIPELINE_SOURCE != 'merge_request_event'
.rules_push_to_branch_or_mr_to_main:
  rules:
    - !reference [.rules_push_to_branch, rules]
    - if: >-
         $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME =~ $[[ inputs.branch_regex ]]
         &&
         $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == 'main'
.rules_release_tag:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_COMMIT_TAG =~ $[[ inputs.release_tag_regex ]]
.rules_non_release_tag:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_COMMIT_TAG !~ $[[ inputs.release_tag_regex ]]

Creating rules reusing existing onesI m going to finish this post with a comment about how I avoid defining extra rule templates in some common cases. The idea is simple, we can use !reference tags to fine tune rules when we need to add conditions to disable them simply adding conditions with when: never before referencing the template. As an example, in some projects I m using different job definitions depending on the DEPLOY_ENVIRONMENT value to make the job manual or automatic; as we just said we can define different jobs referencing the same rule adding a condition to check if the environment is the one we are interested in:
deploy_job_auto:
  rules:
    # Only deploy automatically if the environment is 'dev' by skipping this job
    # for other values of the DEPLOY_ENVIRONMENT variable
    - if: $DEPLOY_ENVIRONMENT != "dev"
      when: never
    - !reference [.rules_release_tag, rules]
  [...]
deploy_job_manually:
  rules:
    # Disable this job if the environment is 'dev'
    - if: $DEPLOY_ENVIRONMENT == "dev"
      when: never
    - !reference [.rules_release_tag, rules]
  when: manual
  # Change this to  false  to make the deployment job blocking
  allow_failure: true
  [...]
If you think about it the idea of adding negative conditions is what we do with the .rules_common template; we add conditions to disable the job before evaluating the real rules. The difference in that case is that we reference them at the beginning because we want those negative conditions on all jobs and that is also why we have a .rules_default condition with an when: on_success for the jobs that only need to respect the default workflow (we need the last condition to make sure that they are executed if the negative rules don t match).

16 September 2023

Sergio Talens-Oliag: GitLab CI/CD Tips: Using a Common CI Repository with Assets

This post describes how to handle files that are used as assets by jobs and pipelines defined on a common gitlab-ci repository when we include those definitions from a different project.

Problem descriptionWhen a .giltlab-ci.yml file includes files from a different repository its contents are expanded and the resulting code is the same as the one generated when the included files are local to the repository. In fact, even when the remote files include other files everything works right, as they are also expanded (see the description of how included files are merged for a complete explanation), allowing us to organise the common repository as we want. As an example, suppose that we have the following script on the assets/ folder of the common repository:
dumb.sh
#!/bin/sh
echo "The script arguments are: '$@'"
If we run the following job on the common repository:
job:
  script:
    - $CI_PROJECT_DIR/assets/dumb.sh ARG1 ARG2
the output will be:
The script arguments are: 'ARG1 ARG2'
But if we run the same job from a different project that includes the same job definition the output will be different:
/scripts-23-19051/step_script: eval: line 138: d./assets/dumb.sh: not found
The problem here is that we include and expand the YAML files, but if a script wants to use other files from the common repository as an asset (configuration file, shell script, template, etc.), the execution fails if the files are not available on the project that includes the remote job definition.

SolutionsWe can solve the issue using multiple approaches, I ll describe two of them:
  • Create files using scripts
  • Download files from the common repository

Create files using scriptsOne way to dodge the issue is to generate the non YAML files from scripts included on the pipelines using HERE documents. The problem with this approach is that we have to put the content of the files inside a script on a YAML file and if it uses characters that can be replaced by the shell (remember, we are using HERE documents) we have to escape them (error prone) or encode the whole file into base64 or something similar, making maintenance harder. As an example, imagine that we want to use the dumb.sh script presented on the previous section and we want to call it from the same PATH of the main project (on the examples we are using the same folder, in practice we can create a hidden folder inside the project directory or use a PATH like /tmp/assets-$CI_JOB_ID to leave things outside the project folder and make sure that there will be no collisions if two jobs are executed on the same place (i.e. when using a ssh runner). To create the file we will use hidden jobs to write our script template and reference tags to add it to the scripts when we want to use them. Here we have a snippet that creates the file with cat:
.file_scripts:
  create_dumb_sh:
    -  
      # Create dumb.sh script
      mkdir -p "$ CI_PROJECT_DIR /assets"
      cat >"$ CI_PROJECT_DIR /assets/dumb.sh" <<EOF
      #!/bin/sh
      echo "The script arguments are: '\$@'"
      EOF
      chmod +x "$ CI_PROJECT_DIR /assets/dumb.sh"
Note that to make things work we ve added 6 spaces before the script code and escaped the dollar sign. To do the same using base64 we replace the previous snippet by this:
.file_scripts:
  create_dumb_sh:
    -  
      # Create dumb.sh script
      mkdir -p "$ CI_PROJECT_DIR /assets"
      base64 -d >"$ CI_PROJECT_DIR /assets/dumb.sh" <<EOF
      IyEvYmluL3NoCmVjaG8gIlRoZSBzY3JpcHQgYXJndW1lbnRzIGFyZTogJyRAJyIK
      EOF
      chmod +x "$ CI_PROJECT_DIR /assets/dumb.sh"
Again, we have to indent the base64 version of the file using 6 spaces (all lines of the base64 output have to be indented) and to make changes we have to decode and re-code the file manually, making it harder to maintain. With either version we just need to add a !reference before using the script, if we add the call on the first lines of the before_script we can use the downloaded file in the before_script, script or after_script sections of the job without problems:
job:
  before_script:
    - !reference [.file_scripts, create_dumb_sh]
  script:
    - $ CI_PROJECT_DIR /assets/dumb.sh ARG1 ARG2
The output of a pipeline that uses this job will be the same as the one shown in the original example:
The script arguments are: 'ARG1 ARG2'

Download the files from the common repositoryAs we ve seen the previous solution works but is not ideal as it makes the files harder to read, maintain and use. An alternative approach is to keep the assets on a directory of the common repository (in our examples we will name it assets) and prepare a YAML file that declares some variables (i.e. the URL of the templates project and the PATH where we want to download the files) and defines a script fragment to download the complete folder. Once we have the YAML file we just need to include it and add a reference to the script fragment at the beginning of the before_script of the jobs that use files from the assets directory and they will be available when needed. The following file is an example of the YAML file we just mentioned:
bootstrap.yml
variables:
  CI_TMPL_API_V4_URL: "$ CI_API_V4_URL /projects/common%2Fci-templates"
  CI_TMPL_ARCHIVE_URL: "$ CI_TMPL_API_V4_URL /repository/archive"
  CI_TMPL_ASSETS_DIR: "/tmp/assets-$ CI_JOB_ID "
.scripts_common:
  bootstrap_ci_templates:
    -  
      # Downloading assets
      echo "Downloading assets"
      mkdir -p "$CI_TMPL_ASSETS_DIR"
      wget -q -O - --header="PRIVATE-TOKEN: $CI_TMPL_READ_TOKEN" \
        "$CI_TMPL_ARCHIVE_URL?path=assets&sha=$ CI_TMPL_REF:-main "  
        tar --strip-components 2 -C "$ASSETS_DIR" -xzf -
The file defines the following variables:
  • CI_TMPL_API_V4_URL: URL of the common project, in our case we are using the project ci-templates inside the common group (note that the slash between the group and the project is escaped, that is needed to reference the project by name, if we don t like that approach we can replace the url encoded path by the project id, i.e. we could use a value like $ CI_API_V4_URL /projects/31)
  • CI_TMPL_ARCHIVE_URL: Base URL to use the gitlab API to download files from a repository, we will add the arguments path and sha to select which sub path to download and from which commit, branch or tag (we will explain later why we use the CI_TMPL_REF, for now just keep in mind that if it is not defined we will download the version of the files available on the main branch when the job is executed).
  • CI_TMPL_ASSETS_DIR: Destination of the downloaded files.
And uses variables defined in other places:
  • CI_TMPL_READ_TOKEN: token that includes the read_api scope for the common project, we need it because the tokens created by the CI/CD pipelines of other projects can t be used to access the api of the common one.We define the variable on the gitlab CI/CD variables section to be able to change it if needed (i.e. if it expires)
  • CI_TMPL_REF: branch or tag of the common repo from which to get the files (we need that to make sure we are using the right version of the files, i.e. when testing we will use a branch and on production pipelines we can use fixed tags to make sure that the assets don t change between executions unless we change the reference).We will set the value on the .gitlab-ci.yml file of the remote projects and will use the same reference when including the files to make sure that everything is coherent.
This is an example YAML file that defines a pipeline with a job that uses the script from the common repository:
pipeline.yml
include:
  - /bootstrap.yaml
stages:
  - test
dumb_job:
  stage: test
  before_script:
    - !reference [.bootstrap_ci_templates, create_dumb_sh]
  script:
    - $ CI_TMPL_ASSETS_DIR /dumb.sh ARG1 ARG2
To use it from an external project we will use the following gitlab ci configuration:
gitlab-ci.yml
include:
  - project: 'common/ci-templates'
    ref: &ciTmplRef 'main'
    file: '/pipeline.yml'
variables:
  CI_TMPL_REF: *ciTmplRef
Where we use a YAML anchor to ensure that we use the same reference when including and when assigning the value to the CI_TMPL_REF variable (as far as I know we have to pass the ref value explicitly to know which reference was used when including the file, the anchor allows us to make sure that the value is always the same in both places). The reference we use is quite important for the reproducibility of the jobs, if we don t use fixed tags or commit hashes as references each time a job that downloads the files is executed we can get different versions of them. For that reason is not a bad idea to create tags on our common repo and use them as reference on the projects or branches that we want to behave as if their CI/CD configuration was local (if we point to a fixed version of the common repo the way everything is going to work is almost the same as having the pipelines directly in our repo). But while developing pipelines using branches as references is a really useful option; it allows us to re-run the jobs that we want to test and they will download the latest versions of the asset files on the branch, speeding up the testing process. However keep in mind that the trick only works with the asset files, if we change a job or a pipeline on the YAML files restarting the job is not enough to test the new version as the restart uses the same job created with the current pipeline. To try the updated jobs we have to create a new pipeline using a new action against the repository or executing the pipeline manually.

ConclusionFor now I m using the second solution and as it is working well my guess is that I ll keep using that approach unless giltab itself provides a better or simpler way of doing the same thing.

18 July 2023

Sergio Talens-Oliag: Testing cilium with k3d and kind

This post describes how to deploy cilium (and hubble) using docker on a Linux system with k3d or kind to test it as CNI and Service Mesh. I wrote some scripts to do a local installation and evaluate cilium to use it at work (in fact we are using cilium on an EKS cluster now), but I thought it would be a good idea to share my original scripts in this blog just in case they are useful to somebody, at least for playing a little with the technology.

InstallationFor each platform we are going to deploy two clusters on the same docker network; I ve chosen this model because it allows the containers to see the addresses managed by metallb from both clusters (the idea is to use those addresses for load balancers and treat them as if they were public). The installation(s) use cilium as CNI, metallb for BGP (I tested the cilium options, but I wasn t able to configure them right) and nginx as the ingress controller (again, I tried to use cilium but something didn t work either). To be able to use the previous components some default options have been disabled on k3d and kind and, in the case of k3d, a lot of k3s options (traefik, servicelb, kubeproxy, network-policy, ) have also been disabled to avoid conflicts. To use the scripts we need to install cilium, docker, helm, hubble, k3d, kind, kubectl and tmpl in our system. After cloning the repository, the sbin/tools.sh script can be used to do that on a linux-amd64 system:
$ git clone https://gitea.mixinet.net/blogops/cilium-docker.git
$ cd cilium-docker
$ ./sbin/tools.sh apps
Once we have the tools, to install everything on k3d (for kind replace k3d by kind) we can use the sbin/cilium-install.sh script as follows:
$ # Deploy first k3d cluster with cilium & cluster-mesh
$ ./sbin/cilium-install.sh k3d 1 full
[...]
$ # Deploy second k3d cluster with cilium & cluster-mesh
$ ./sbin/cilium-install.sh k3d 2 full
[...]
$ # The 2nd cluster-mesh installation connects the clusters
If we run the command cilium status after the installation we should get an output similar to the one seen on the following screenshot:
cilium status
The installation script uses the following templates:
Once we have finished our tests we can remove the installation using the sbin/cilium-remove.sh script.

Some notes about the configuration
  • As noted on the documentation, the cilium deployment needs to mount the bpffs on /sys/fs/bpf and cgroupv2 on /run/cilium/cgroupv2; that is done automatically on kind, but fails on k3d because the image does not include bash (see this issue).To fix it we mount a script on all the k3d containers that is executed each time they are started (the script is mounted as /bin/k3d-entrypoint-cilium.sh because the /bin/k3d-entrypoint.sh script executes the scripts that follow the pattern /bin/k3d-entrypoint-*.sh before launching the k3s daemon). The source code of the script is available here.
  • When testing the multi-cluster deployment with k3d we have found issues with open files, looks like they are related to inotify (see this page on the kind documentation); adding the following to the /etc/sysctl.conf file fixed the issue:
    # fix inotify issues with docker & k3d
    fs.inotify.max_user_watches = 524288
    fs.inotify.max_user_instances = 512
  • Although the deployment theoretically supports it, we are not using cilium as the cluster ingress yet (it did not work, so it is no longer enabled) and we are also ignoring the gateway-api for now.
  • The documentation uses the cilium cli to do all the installations, but I noticed that following that route the current version does not work right with hubble (it messes up the TLS support, there are some notes about the problems on this cilium issue), so we are deploying with helm right now.The problem with the helm approach is that there is no official documentation on how to install the cluster mesh with it (there is a request for documentation here), so we are using the cilium cli for now and it looks that it does not break the hubble configuration.

TestsTo test cilium we have used some scripts & additional config files that are available on the test sub directory of the repository:
  • cilium-connectivity.sh: a script that runs the cilium connectivity test for one cluster or in multi cluster mode (for mesh testing).If we export the variable HUBBLE_PF=true the script executes the command cilium hubble port-forward before launching the tests.
  • http-sw.sh: Simple tests for cilium policies from the cilium demo; the script deploys the Star Wars demo application and allows us to add the L3/L4 policy or the L3/L4/L7 policy, test the connectivity and view the policies.
  • ingress-basic.sh: This test is for checking the ingress controller, it is prepared to work against cilium and nginx, but as explained before the use of cilium as an ingress controller is not working as expected, so the idea is to call it with nginx always as the first argument for now.
  • mesh-test.sh: Tool to deploy a global service on two clusters, change the service affinity to local or remote, enable or disable if the service is shared and test how the tools respond.

Running the testsThe cilium-connectivity.sh executes the standard cilium tests:
$ ./test/cilium-connectivity.sh k3d 12
   Monitor aggregation detected, will skip some flow validation
steps
  [k3d-cilium1] Creating namespace cilium-test for connectivity
check...
  [k3d-cilium2] Creating namespace cilium-test for connectivity
check...
[...]
  All 33 tests (248 actions) successful, 2 tests skipped,
0 scenarios skipped.
To test how the cilium policies work use the http-sw.sh script:
kubectx k3d-cilium2 # (just in case)
# Create test namespace and services
./test/http-sw.sh create
# Test without policies (exaust-port fails by design)
./test/http-sw.sh test
# Create and view L3/L4 CiliumNetworkPolicy
./test/http-sw.sh policy-l34
# Test policy (no access from xwing, exaust-port fails)
./test/http-sw.sh test
# Create and view L7 CiliumNetworkPolicy
./test/http-sw.sh policy-l7
# Test policy (no access from xwing, exaust-port returns 403)
./test/http-sw.sh test
# Delete http-sw test
./test/http-sw.sh delete
And to see how the service mesh works use the mesh-test.sh script:
# Create services on both clusters and test
./test/mesh-test.sh k3d create
./test/mesh-test.sh k3d test
# Disable service sharing from cluster 1 and test
./test/mesh-test.sh k3d svc-shared-false
./test/mesh-test.sh k3d test
# Restore sharing, set local affinity and test
./test/mesh-test.sh k3d svc-shared-default
./test/mesh-test.sh k3d svc-affinity-local
./test/mesh-test.sh k3d test
# Delete deployment from cluster 1 and test
./test/mesh-test.sh k3d delete-deployment
./test/mesh-test.sh k3d test
# Delete test
./test/mesh-test.sh k3d delete

9 October 2022

Sergio Talens-Oliag: Shared networking for Virtual Machines and Containers

This entry explains how I have configured a linux bridge, dnsmasq and iptables to be able to run and communicate different virtualization systems and containers on laptops running Debian GNU/Linux. I ve used different variations of this setup for a long time with VirtualBox and KVM for the Virtual Machines and Linux-VServer, OpenVZ, LXC and lately Docker or Podman for the Containers.

Required packagesI m running Debian Sid with systemd and network-manager to configure the WiFi and Ethernet interfaces, but for the bridge I use bridge-utils with ifupdown (as I said this setup is old, I guess ifupdow2 and ifupdown-ng will work too). To start and stop the DNS and DHCP services and add NAT rules when the bridge is brought up or down I execute a script that uses:
  • ip from iproute2 to get the network information,
  • dnsmasq to provide the DNS and DHCP services (currently only the dnsmasq-base package is needed and it is recommended by network-manager, so it is probably installed),
  • iptables to configure NAT (for now docker kind of forces me to keep using iptables, but at some point I d like to move to nftables).
To make sure you have everything installed you can run the following command:
sudo apt install bridge-utils dnsmasq-base ifupdown iproute2 iptables

Bridge configurationThe bridge configuration for ifupdow is available on the file /etc/network/interfaces.d/vmbr0:
# Virtual servers NAT Bridge
auto vmbr0
iface vmbr0 inet static
    address         10.0.4.1
    network         10.0.4.0
    netmask         255.255.255.0
    broadcast       10.0.4.255
    bridge_ports    none
    bridge_maxwait  0
    up              /usr/local/sbin/vmbridge $ IFACE  start nat
    pre-down        /usr/local/sbin/vmbridge $ IFACE  stop nat
Warning: To use a separate file with ifupdown make sure that /etc/network/interfaces contains the line:
source /etc/network/interfaces.d/*
or add its contents to /etc/network/interfaces directly, if you prefer.
This configuration creates a bridge with the address 10.0.4.1 and assumes that the machines connected to it will use the 10.0.4.0/24 network; you can change the network address if you want, as long as you use a private range and it does not collide with networks used in your Virtual Machines all should be OK. The vmbridge script is used to start the dnsmasq server and setup the NAT rules when the interface is brought up and remove the firewall rules and stop the dnsmasq server when it is brought down.

The vmbridge scriptThe vmbridge script launches an instance of dnsmasq that binds to the bridge interface (vmbr0 in our case) that is used as DNS and DHCP server. The DNS server reads the /etc/hosts file to publish local DNS names and forwards all the other requests to the the dnsmasq server launched by NetworkManager that is listening on the loopback interface. As this server already does catching we disable it for our server, with the added advantage that, if we change networks, new requests go to the new resolvers because the DNS server handled by NetworkManager gets restarted and flushes its cache (this is useful if we connect to a new network that has internal DNS servers that are configured to do split DNS for internal services; if we use this model all requests get the internal address as soon as the DNS server is queried again). The DHCP server is configured to provide IPs to unknown hosts for a sub range of the addresses on the bridge network and use fixed IPs if the /etc/ethers file has a MAC with a matching hostname on the /etc/hosts file. To make things work with old DHCP clients the script also adds checksums to the DHCP packets using iptables (when the interface is not linked to a physical device the kernel does not add checksums, but we can fix it adding a rule on the mangle table). If we want external connectivity we can pass the nat argument and then the script creates a MASQUERADE rule for the bridge network and enables IP forwarding. The script source code is the following:
/usr/local/sbin/vmbridge
#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
LOCAL_DOMAIN="vmnet"
MIN_IP_LEASE="192"
MAX_IP_LEASE="223"
# ---------
# FUNCTIONS
# ---------
get_net()  
  NET="$(
    ip a ls "$ BRIDGE " 2>/dev/null   sed -ne 's/^.*inet \(.*\) brd.*$/\1/p'
  )"
  [ "$NET" ]   return 1
 
checksum_fix_start()  
  iptables -t mangle -A POSTROUTING -o "$ BRIDGE " -p udp --dport 68 \
    -j CHECKSUM --checksum-fill 2>/dev/null   true
 
checksum_fix_stop()  
  iptables -t mangle -D POSTROUTING -o "$ BRIDGE " -p udp --dport 68 \
    -j CHECKSUM --checksum-fill 2>/dev/null   true
 
nat_start()  
  [ "$NAT" = "yes" ]   return 0
  # Configure NAT
  iptables -t nat -A POSTROUTING -s "$ NET " ! -d "$ NET " -j MASQUERADE
  # Enable forwarding (just in case)
  echo 1 >/proc/sys/net/ipv4/ip_forward
 
nat_stop()  
  [ "$NAT" = "yes" ]   return 0
  iptables -t nat -D POSTROUTING -s "$ NET " ! -d "$ NET " \
    -j MASQUERADE 2>/dev/null   true
 
do_start()  
  # Bridge address
  _addr="$ NET%%/* "
  # DNS leases (between .MIN_IP_LEASE and .MAX_IP_LEASE)
  _dhcp_range="$ _addr%.* .$ MIN_IP_LEASE ,$ _addr%.* .$ MAX_IP_LEASE "
  # Bridge mtu
  _mtu="$(
    ip link show dev "$ BRIDGE "  
      sed -n -e '/mtu/   s/^.*mtu \([0-9]\+\).*$/\1/p  '
  )"
  # Compute extra dnsmasq options
  dnsmasq_extra_opts=""
  # Disable gateway when not using NAT
  if [ "$NAT" != "yes" ]; then
    dnsmasq_extra_opts="$dnsmasq_extra_opts --dhcp-option=3"
  fi
  # Adjust MTU size if needed
  if [ -n "$_mtu" ] && [ "$_mtu" -ne "1500" ]; then
    dnsmasq_extra_opts="$dnsmasq_extra_opts --dhcp-option=26,$_mtu"
  fi
  # shellcheck disable=SC2086
  dnsmasq --bind-interfaces \
    --cache-size="0" \
    --conf-file="/dev/null" \
    --dhcp-authoritative \
    --dhcp-leasefile="/var/lib/misc/dnsmasq.$ BRIDGE .leases" \
    --dhcp-no-override \
    --dhcp-range "$ _dhcp_range " \
    --domain="$ LOCAL_DOMAIN " \
    --except-interface="lo" \
    --expand-hosts \
    --interface="$ BRIDGE " \
    --listen-address "$ _addr " \
    --no-resolv \
    --pid-file="$ PIDF " \
    --read-ethers \
    --server="127.0.0.1" \
    $dnsmasq_extra_opts
  checksum_fix_start
  nat_start
 
do_stop()  
  nat_stop
  checksum_fix_stop
  if [ -f "$ PIDF " ]; then
    kill "$(cat "$ PIDF ")"   true
    rm -f "$ PIDF "
  fi
 
do_status()  
  if [ -f "$ PIDF " ] && kill -HUP "$(cat "$ PIDF ")"; then
    echo "dnsmasq RUNNING"
  else
    echo "dnsmasq NOT running"
  fi
 
do_reload()  
  [ -f "$ PIDF " ] && kill -HUP "$(cat "$ PIDF ")"
 
usage()  
  echo "Uso: $0 BRIDGE (start stop [nat]) status reload"
  exit 1
 
# ----
# MAIN
# ----
[ "$#" -ge "2" ]   usage
BRIDGE="$1"
OPTION="$2"
shift 2
NAT="no"
for arg in "$@"; do
  case "$arg" in
  nat) NAT="yes" ;;
  *) echo "Unknown arg '$arg'" && exit 1 ;;
  esac
done
PIDF="/var/run/vmbridge-$ BRIDGE -dnsmasq.pid"
case "$OPTION" in
start) get_net && do_start ;;
stop) get_net && do_stop ;;
status) do_status ;;
reload) get_net && do_reload ;;
*) echo "Unknown command '$OPTION'" && exit 1 ;;
esac
# vim: ts=2:sw=2:et:ai:sts=2

NetworkManager ConfigurationThe default /etc/NetworkManager/NetworkManager.conf file has the following contents:
[main]
plugins=ifupdown,keyfile
[ifupdown]
managed=false
Which means that it will leave interfaces managed by ifupdown alone and, by default, will send the connection DNS configuration to systemd-resolved if it is installed. As we want to use dnsmasq for DNS resolution, but we don t want NetworkManager to modify our /etc/resolv.conf we are going to add the following file (/etc/NetworkManager/conf.d/dnsmasq.conf) to our system:
/etc/NetworkManager/conf.d/dnsmasq.conf
[main]
dns=dnsmasq
rc-manager=unmanaged
and restart the NetworkManager service:
$ sudo systemctl restart NetworkManager.service
From now on the NetworkManager will start a dnsmasq service that queries the servers provided by the DHCP servers we connect to on 127.0.0.1:53 but will not touch our /etc/resolv.conf file.

Configuring systemd-resolvedIf we start using our own name server but our system has systemd-resolved installed we will no longer need or use the DNS stub; programs using it will use our dnsmasq server directly now, but we keep running systemd-resolved for the host programs that use its native api or access it through /etc/nsswitch.conf (when libnss-resolve is installed). To disable the stub we add a /etc/systemd/resolved.conf.d/disable-stub.conf file to our machine with the following content:
# Disable the DNS Stub Listener, we use our own dnsmasq
[Resolve]
DNSStubListener=no
and restart the systemd-resolved to make sure that the stub is stopped:
$ sudo systemctl restart systemd-resolved.service

Adjusting /etc/resolv.confFirst we remove the existing /etc/resolv.conf file (it does not matter if it is a link or a regular file) and then create a new one that contains at least the following line (we can add a search line if is useful for us):
nameserver 10.0.4.1
From now on we will be using the dnsmasq server launched when we bring up the vmbr0 for multiple systems:
  • as our main DNS server from the host (if we use the standard /etc/nsswitch.conf and libnss-resolve is installed it is queried first, but the systemd-resolved uses it as forwarder by default if needed),
  • as the DNS server of the Virtual Machines or containers that use DHCP for network configuration and attach their virtual interfaces to our bridge,
  • as the DNS server of docker containers that get the DNS information from /etc/resolv.conf (if we have entries that use loopback addresses the containers that don t use the host network tend to fail, as those addresses inside the running containers are not linked to the loopback device of the host).

TestingAfter all the configuration files and scripts are in place we just need to bring up the bridge interface and check that everything works:
$ # Bring interface up
$ sudo ifup vmbr0
$ # Check that it is available
$ ip a ls dev vmbr0
4: vmbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
          group default qlen 1000
    link/ether 0a:b8:ef:b8:07:6c brd ff:ff:ff:ff:ff:ff
    inet 10.0.4.1/24 brd 10.0.4.255 scope global vmbr0
       valid_lft forever preferred_lft forever
$ # View the listening ports used by our dnsmasq servers
$ sudo ss -tulpan   grep dnsmasq
udp UNCONN 0 0  127.0.0.1:53     0.0.0.0:* users:(("dnsmasq",pid=1733930,fd=4))
udp UNCONN 0 0  10.0.4.1:53      0.0.0.0:* users:(("dnsmasq",pid=1705267,fd=6))
udp UNCONN 0 0  0.0.0.0%vmbr0:67 0.0.0.0:* users:(("dnsmasq",pid=1705267,fd=4))
tcp LISTEN 0 32 10.0.4.1:53      0.0.0.0:* users:(("dnsmasq",pid=1705267,fd=7))
tcp LISTEN 0 32 127.0.0.1:53     0.0.0.0:* users:(("dnsmasq",pid=1733930,fd=5))
$ # Verify that the DNS server works on the vmbr0 address
$ host www.debian.org 10.0.4.1
Name: 10.0.4.1
Address: 10.0.4.1#53
Aliases:
www.debian.org has address 130.89.148.77
www.debian.org has IPv6 address 2001:67c:2564:a119::77

Managing running systemsIf we want to update DNS entries and/or MAC addresses we can edit the /etc/hosts and /etc/ethers files and reload the dnsmasq configuration using the vmbridge script:
$ sudo /usr/local/sbin/vmbridge vmbr0 reload
That call sends a signal to the running dnsmasq server and it reloads the files; after that we can refresh the DHCP addresses from the client machines or start using the new DNS names immediately.

25 September 2022

Sergio Talens-Oliag: Kubernetes Static Content Server

This post describes how I ve put together a simple static content server for kubernetes clusters using a Pod with a persistent volume and multiple containers: an sftp server to manage contents, a web server to publish them with optional access control and another one to run scripts which need access to the volume filesystem. The sftp server runs using MySecureShell, the web server is nginx and the script runner uses the webhook tool to publish endpoints to call them (the calls will come from other Pods that run backend servers or are executed from Jobs or CronJobs).

HistoryThe system was developed because we had a NodeJS API with endpoints to upload files and store them on S3 compatible services that were later accessed via HTTPS, but the requirements changed and we needed to be able to publish folders instead of individual files using their original names and apply access restrictions using our API. Thinking about our requirements the use of a regular filesystem to keep the files and folders was a good option, as uploading and serving files is simple. For the upload I decided to use the sftp protocol, mainly because I already had an sftp container image based on mysecureshell prepared; once we settled on that we added sftp support to the API server and configured it to upload the files to our server instead of using S3 buckets. To publish the files we added a nginx container configured to work as a reverse proxy that uses the ngx_http_auth_request_module to validate access to the files (the sub request is configurable, in our deployment we have configured it to call our API to check if the user can access a given URL). Finally we added a third container when we needed to execute some tasks directly on the filesystem (using kubectl exec with the existing containers did not seem a good idea, as that is not supported by CronJobs objects, for example). The solution we found avoiding the NIH Syndrome (i.e. write our own tool) was to use the webhook tool to provide the endpoints to call the scripts; for now we have three:
  • one to get the disc usage of a PATH,
  • one to hardlink all the files that are identical on the filesystem,
  • one to copy files and folders from S3 buckets to our filesystem.

Container definitions

mysecureshellThe mysecureshell container can be used to provide an sftp service with multiple users (although the files are owned by the same UID and GID) using standalone containers (launched with docker or podman) or in an orchestration system like kubernetes, as we are going to do here. The image is generated using the following Dockerfile:
ARG ALPINE_VERSION=3.16.2
FROM alpine:$ALPINE_VERSION as builder
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
RUN apk update &&\
 apk add --no-cache alpine-sdk git musl-dev &&\
 git clone https://github.com/sto/mysecureshell.git &&\
 cd mysecureshell &&\
 ./configure --prefix=/usr --sysconfdir=/etc --mandir=/usr/share/man\
 --localstatedir=/var --with-shutfile=/var/lib/misc/sftp.shut --with-debug=2 &&\
 make all && make install &&\
 rm -rf /var/cache/apk/*
FROM alpine:$ALPINE_VERSION
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
COPY --from=builder /usr/bin/mysecureshell /usr/bin/mysecureshell
COPY --from=builder /usr/bin/sftp-* /usr/bin/
RUN apk update &&\
 apk add --no-cache openssh shadow pwgen &&\
 sed -i -e "s ^.*\(AuthorizedKeysFile\).*$ \1 /etc/ssh/auth_keys/%u "\
 /etc/ssh/sshd_config &&\
 mkdir /etc/ssh/auth_keys &&\
 cat /dev/null > /etc/motd &&\
 add-shell '/usr/bin/mysecureshell' &&\
 rm -rf /var/cache/apk/*
COPY bin/* /usr/local/bin/
COPY etc/sftp_config /etc/ssh/
COPY entrypoint.sh /
EXPOSE 22
VOLUME /sftp
ENTRYPOINT ["/entrypoint.sh"]
CMD ["server"]
The /etc/sftp_config file is used to configure the mysecureshell server to have all the user homes under /sftp/data, only allow them to see the files under their home directories as if it were at the root of the server and close idle connections after 5m of inactivity:
etc/sftp_config
# Default mysecureshell configuration
<Default>
   # All users will have access their home directory under /sftp/data
   Home /sftp/data/$USER
   # Log to a file inside /sftp/logs/ (only works when the directory exists)
   LogFile /sftp/logs/mysecureshell.log
   # Force users to stay in their home directory
   StayAtHome true
   # Hide Home PATH, it will be shown as /
   VirtualChroot true
   # Hide real file/directory owner (just change displayed permissions)
   DirFakeUser true
   # Hide real file/directory group (just change displayed permissions)
   DirFakeGroup true
   # We do not want users to keep forever their idle connection
   IdleTimeOut 5m
</Default>
# vim: ts=2:sw=2:et
The entrypoint.sh script is the one responsible to prepare the container for the users included on the /secrets/user_pass.txt file (creates the users with their HOME directories under /sftp/data and a /bin/false shell and creates the key files from /secrets/user_keys.txt if available). The script expects a couple of environment variables:
  • SFTP_UID: UID used to run the daemon and for all the files, it has to be different than 0 (all the files managed by this daemon are going to be owned by the same user and group, even if the remote users are different).
  • SFTP_GID: GID used to run the daemon and for all the files, it has to be different than 0.
And can use the SSH_PORT and SSH_PARAMS values if present. It also requires the following files (they can be mounted as secrets in kubernetes):
  • /secrets/host_keys.txt: Text file containing the ssh server keys in mime format; the file is processed using the reformime utility (the one included on busybox) and can be generated using the gen-host-keys script included on the container (it uses ssh-keygen and makemime).
  • /secrets/user_pass.txt: Text file containing lines of the form username:password_in_clear_text (only the users included on this file are available on the sftp server, in fact in our deployment we use only the scs user for everything).
And optionally can use another one:
  • /secrets/user_keys.txt: Text file that contains lines of the form username:public_ssh_ed25519_or_rsa_key; the public keys are installed on the server and can be used to log into the sftp server if the username exists on the user_pass.txt file.
The contents of the entrypoint.sh script are:
entrypoint.sh
#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
# Expects SSH_UID & SSH_GID on the environment and uses the value of the
# SSH_PORT & SSH_PARAMS variables if present
# SSH_PARAMS
SSH_PARAMS="-D -e -p $ SSH_PORT:=22  $ SSH_PARAMS "
# Fixed values
# DIRECTORIES
HOME_DIR="/sftp/data"
CONF_FILES_DIR="/secrets"
AUTH_KEYS_PATH="/etc/ssh/auth_keys"
# FILES
HOST_KEYS="$CONF_FILES_DIR/host_keys.txt"
USER_KEYS="$CONF_FILES_DIR/user_keys.txt"
USER_PASS="$CONF_FILES_DIR/user_pass.txt"
USER_SHELL_CMD="/usr/bin/mysecureshell"
# TYPES
HOST_KEY_TYPES="dsa ecdsa ed25519 rsa"
# ---------
# FUNCTIONS
# ---------
# Validate HOST_KEYS, USER_PASS, SFTP_UID and SFTP_GID
_check_environment()  
  # Check the ssh server keys ... we don't boot if we don't have them
  if [ ! -f "$HOST_KEYS" ]; then
    cat <<EOF
We need the host keys on the '$HOST_KEYS' file to proceed.
Call the 'gen-host-keys' script to create and export them on a mime file.
EOF
    exit 1
  fi
  # Check that we have users ... if we don't we can't continue
  if [ ! -f "$USER_PASS" ]; then
    cat <<EOF
We need at least the '$USER_PASS' file to provision users.
Call the 'gen-users-tar' script to create a tar file to create an archive that
contains public and private keys for users, a 'user_keys.txt' with the public
keys of the users and a 'user_pass.txt' file with random passwords for them 
(pass the list of usernames to it).
EOF
    exit 1
  fi
  # Check SFTP_UID
  if [ -z "$SFTP_UID" ]; then
    echo "The 'SFTP_UID' can't be empty, pass a 'GID'."
    exit 1
  fi
  if [ "$SFTP_UID" -eq "0" ]; then
    echo "The 'SFTP_UID' can't be 0, use a different 'UID'"
    exit 1
  fi
  # Check SFTP_GID
  if [ -z "$SFTP_GID" ]; then
    echo "The 'SFTP_GID' can't be empty, pass a 'GID'."
    exit 1
  fi
  if [ "$SFTP_GID" -eq "0" ]; then
    echo "The 'SFTP_GID' can't be 0, use a different 'GID'"
    exit 1
  fi
 
# Adjust ssh host keys
_setup_host_keys()  
  opwd="$(pwd)"
  tmpdir="$(mktemp -d)"
  cd "$tmpdir"
  ret="0"
  reformime <"$HOST_KEYS"   ret="1"
  for kt in $HOST_KEY_TYPES; do
    key="ssh_host_$ kt _key"
    pub="ssh_host_$ kt _key.pub"
    if [ ! -f "$key" ]; then
      echo "Missing '$key' file"
      ret="1"
    fi
    if [ ! -f "$pub" ]; then
      echo "Missing '$pub' file"
      ret="1"
    fi
    if [ "$ret" -ne "0" ]; then
      continue
    fi
    cat "$key" >"/etc/ssh/$key"
    chmod 0600 "/etc/ssh/$key"
    chown root:root "/etc/ssh/$key"
    cat "$pub" >"/etc/ssh/$pub"
    chmod 0600 "/etc/ssh/$pub"
    chown root:root "/etc/ssh/$pub"
  done
  cd "$opwd"
  rm -rf "$tmpdir"
  return "$ret"
 
# Create users
_setup_user_pass()  
  opwd="$(pwd)"
  tmpdir="$(mktemp -d)"
  cd "$tmpdir"
  ret="0"
  [ -d "$HOME_DIR" ]   mkdir "$HOME_DIR"
  # Make sure the data dir can be managed by the sftp user
  chown "$SFTP_UID:$SFTP_GID" "$HOME_DIR"
  # Allow the user (and root) to create directories inside the $HOME_DIR, if
  # we don't allow it the directory creation fails on EFS (AWS)
  chmod 0755 "$HOME_DIR"
  # Create users
  echo "sftp:sftp:$SFTP_UID:$SFTP_GID:::/bin/false" >"newusers.txt"
  sed -n "/^[^#]/   s/:/ /p  " "$USER_PASS"   while read -r _u _p; do
    echo "$_u:$_p:$SFTP_UID:$SFTP_GID::$HOME_DIR/$_u:$USER_SHELL_CMD"
  done >>"newusers.txt"
  newusers --badnames newusers.txt
  # Disable write permission on the directory to forbid remote sftp users to
  # remove their own root dir (they have already done it); we adjust that
  # here to avoid issues with EFS (see before)
  chmod 0555 "$HOME_DIR"
  # Clean up the tmpdir
  cd "$opwd"
  rm -rf "$tmpdir"
  return "$ret"
 
# Adjust user keys
_setup_user_keys()  
  if [ -f "$USER_KEYS" ]; then
    sed -n "/^[^#]/   s/:/ /p  " "$USER_KEYS"   while read -r _u _k; do
      echo "$_k" >>"$AUTH_KEYS_PATH/$_u"
    done
  fi
 
# Main function
exec_sshd()  
  _check_environment
  _setup_host_keys
  _setup_user_pass
  _setup_user_keys
  echo "Running: /usr/sbin/sshd $SSH_PARAMS"
  # shellcheck disable=SC2086
  exec /usr/sbin/sshd -D $SSH_PARAMS
 
# ----
# MAIN
# ----
case "$1" in
"server") exec_sshd ;;
*) exec "$@" ;;
esac
# vim: ts=2:sw=2:et
The container also includes a couple of auxiliary scripts, the first one can be used to generate the host_keys.txt file as follows:
$ docker run --rm stodh/mysecureshell gen-host-keys > host_keys.txt
Where the script is as simple as:
bin/gen-host-keys
#!/bin/sh
set -e
# Generate new host keys
ssh-keygen -A >/dev/null
# Replace hostname
sed -i -e 's/@.*$/@mysecureshell/' /etc/ssh/ssh_host_*_key.pub
# Print in mime format (stdout)
makemime /etc/ssh/ssh_host_*
# vim: ts=2:sw=2:et
And there is another script to generate a .tar file that contains auth data for the list of usernames passed to it (the file contains a user_pass.txt file with random passwords for the users, public and private ssh keys for them and the user_keys.txt file that matches the generated keys). To generate a tar file for the user scs we can execute the following:
$ docker run --rm stodh/mysecureshell gen-users-tar scs > /tmp/scs-users.tar
To see the contents and the text inside the user_pass.txt file we can do:
$ tar tvf /tmp/scs-users.tar
-rw-r--r-- root/root        21 2022-09-11 15:55 user_pass.txt
-rw-r--r-- root/root       822 2022-09-11 15:55 user_keys.txt
-rw------- root/root       387 2022-09-11 15:55 id_ed25519-scs
-rw-r--r-- root/root        85 2022-09-11 15:55 id_ed25519-scs.pub
-rw------- root/root      3357 2022-09-11 15:55 id_rsa-scs
-rw------- root/root      3243 2022-09-11 15:55 id_rsa-scs.pem
-rw-r--r-- root/root       729 2022-09-11 15:55 id_rsa-scs.pub
$ tar xfO /tmp/scs-users.tar user_pass.txt
scs:20JertRSX2Eaar4x
The source of the script is:
bin/gen-users-tar
#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
USER_KEYS_FILE="user_keys.txt"
USER_PASS_FILE="user_pass.txt"
# ---------
# MAIN CODE
# ---------
# Generate user passwords and keys, return 1 if no username is received
if [ "$#" -eq "0" ]; then
  return 1
fi
opwd="$(pwd)"
tmpdir="$(mktemp -d)"
cd "$tmpdir"
for u in "$@"; do
  ssh-keygen -q -a 100 -t ed25519 -f "id_ed25519-$u" -C "$u" -N ""
  ssh-keygen -q -a 100 -b 4096 -t rsa -f "id_rsa-$u" -C "$u" -N ""
  # Legacy RSA private key format
  cp -a "id_rsa-$u" "id_rsa-$u.pem"
  ssh-keygen -q -p -m pem -f "id_rsa-$u.pem" -N "" -P "" >/dev/null
  chmod 0600 "id_rsa-$u.pem"
  echo "$u:$(pwgen -s 16 1)" >>"$USER_PASS_FILE"
  echo "$u:$(cat "id_ed25519-$u.pub")" >>"$USER_KEYS_FILE"
  echo "$u:$(cat "id_rsa-$u.pub")" >>"$USER_KEYS_FILE"
done
tar cf - "$USER_PASS_FILE" "$USER_KEYS_FILE" id_* 2>/dev/null
cd "$opwd"
rm -rf "$tmpdir"
# vim: ts=2:sw=2:et

nginx-scsThe nginx-scs container is generated using the following Dockerfile:
ARG NGINX_VERSION=1.23.1
FROM nginx:$NGINX_VERSION
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
RUN rm -f /docker-entrypoint.d/*
COPY docker-entrypoint.d/* /docker-entrypoint.d/
Basically we are removing the existing docker-entrypoint.d scripts from the standard image and adding a new one that configures the web server as we want using a couple of environment variables:
  • AUTH_REQUEST_URI: URL to use for the auth_request, if the variable is not found on the environment auth_request is not used.
  • HTML_ROOT: Base directory of the web server, if not passed the default /usr/share/nginx/html is used.
Note that if we don t pass the variables everything works as if we were using the original nginx image. The contents of the configuration script are:
docker-entrypoint.d/10-update-default-conf.sh
#!/bin/sh
# Replace the default.conf nginx file by our own version.
set -e
if [ -z "$HTML_ROOT" ]; then
  HTML_ROOT="/usr/share/nginx/html"
fi
if [ "$AUTH_REQUEST_URI" ]; then
  cat >/etc/nginx/conf.d/default.conf <<EOF
server  
  listen       80;
  server_name  localhost;
  location /  
    auth_request /.auth;
    root  $HTML_ROOT;
    index index.html index.htm;
   
  location /.auth  
    internal;
    proxy_pass $AUTH_REQUEST_URI;
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Original-URI \$request_uri;
   
  error_page   500 502 503 504  /50x.html;
  location = /50x.html  
    root /usr/share/nginx/html;
   
 
EOF
else
  cat >/etc/nginx/conf.d/default.conf <<EOF
server  
  listen       80;
  server_name  localhost;
  location /  
    root  $HTML_ROOT;
    index index.html index.htm;
   
  error_page   500 502 503 504  /50x.html;
  location = /50x.html  
    root /usr/share/nginx/html;
   
 
EOF
fi
# vim: ts=2:sw=2:et
As we will see later the idea is to use the /sftp/data or /sftp/data/scs folder as the root of the web published by this container and create an Ingress object to provide access to it outside of our kubernetes cluster.

webhook-scsThe webhook-scs container is generated using the following Dockerfile:
ARG ALPINE_VERSION=3.16.2
ARG GOLANG_VERSION=alpine3.16
FROM golang:$GOLANG_VERSION AS builder
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
ENV WEBHOOK_VERSION 2.8.0
ENV WEBHOOK_PR 549
ENV S3FS_VERSION v1.91
WORKDIR /go/src/github.com/adnanh/webhook
RUN apk update &&\
 apk add --no-cache -t build-deps curl libc-dev gcc libgcc patch
RUN curl -L --silent -o webhook.tar.gz\
 https://github.com/adnanh/webhook/archive/$ WEBHOOK_VERSION .tar.gz &&\
 tar xzf webhook.tar.gz --strip 1 &&\
 curl -L --silent -o $ WEBHOOK_PR .patch\
 https://patch-diff.githubusercontent.com/raw/adnanh/webhook/pull/$ WEBHOOK_PR .patch &&\
 patch -p1 < $ WEBHOOK_PR .patch &&\
 go get -d && \
 go build -o /usr/local/bin/webhook
WORKDIR /src/s3fs-fuse
RUN apk update &&\
 apk add ca-certificates build-base alpine-sdk libcurl automake autoconf\
 libxml2-dev libressl-dev mailcap fuse-dev curl-dev
RUN curl -L --silent -o s3fs.tar.gz\
 https://github.com/s3fs-fuse/s3fs-fuse/archive/refs/tags/$S3FS_VERSION.tar.gz &&\
 tar xzf s3fs.tar.gz --strip 1 &&\
 ./autogen.sh &&\
 ./configure --prefix=/usr/local &&\
 make -j && \
 make install
FROM alpine:$ALPINE_VERSION
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
WORKDIR /webhook
RUN apk update &&\
 apk add --no-cache ca-certificates mailcap fuse libxml2 libcurl libgcc\
 libstdc++ rsync util-linux-misc &&\
 rm -rf /var/cache/apk/*
COPY --from=builder /usr/local/bin/webhook /usr/local/bin/webhook
COPY --from=builder /usr/local/bin/s3fs /usr/local/bin/s3fs
COPY entrypoint.sh /
COPY hooks/* ./hooks/
EXPOSE 9000
ENTRYPOINT ["/entrypoint.sh"]
CMD ["server"]
Again, we use a multi-stage build because in production we wanted to support a functionality that is not already on the official versions (streaming the command output as a response instead of waiting until the execution ends); this time we build the image applying the PATCH included on this pull request against a released version of the source instead of creating a fork. The entrypoint.sh script is used to generate the webhook configuration file for the existing hooks using environment variables (basically the WEBHOOK_WORKDIR and the *_TOKEN variables) and launch the webhook service:
entrypoint.sh
#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
WEBHOOK_BIN="$ WEBHOOK_BIN:-/webhook/hooks "
WEBHOOK_YML="$ WEBHOOK_YML:-/webhook/scs.yml "
WEBHOOK_OPTS="$ WEBHOOK_OPTS:--verbose "
# ---------
# FUNCTIONS
# ---------
print_du_yml()  
  cat <<EOF
- id: du
  execute-command: '$WEBHOOK_BIN/du.sh'
  command-working-directory: '$WORKDIR'
  response-headers:
  - name: 'Content-Type'
    value: 'application/json'
  http-methods: ['GET']
  include-command-output-in-response: true
  include-command-output-in-response-on-error: true
  pass-arguments-to-command:
  - source: 'url'
    name: 'path'
  pass-environment-to-command:
  - source: 'string'
    envname: 'OUTPUT_FORMAT'
    name: 'json'
EOF
 
print_hardlink_yml()  
  cat <<EOF
- id: hardlink
  execute-command: '$WEBHOOK_BIN/hardlink.sh'
  command-working-directory: '$WORKDIR'
  http-methods: ['GET']
  include-command-output-in-response: true
  include-command-output-in-response-on-error: true
EOF
 
print_s3sync_yml()  
  cat <<EOF
- id: s3sync
  execute-command: '$WEBHOOK_BIN/s3sync.sh'
  command-working-directory: '$WORKDIR'
  http-methods: ['POST']
  include-command-output-in-response: true
  include-command-output-in-response-on-error: true
  pass-environment-to-command:
  - source: 'payload'
    envname: 'AWS_KEY'
    name: 'aws.key'
  - source: 'payload'
    envname: 'AWS_SECRET_KEY'
    name: 'aws.secret_key'
  - source: 'payload'
    envname: 'S3_BUCKET'
    name: 's3.bucket'
  - source: 'payload'
    envname: 'S3_REGION'
    name: 's3.region'
  - source: 'payload'
    envname: 'S3_PATH'
    name: 's3.path'
  - source: 'payload'
    envname: 'SCS_PATH'
    name: 'scs.path'
  stream-command-output: true
EOF
 
print_token_yml()  
  if [ "$1" ]; then
    cat << EOF
  trigger-rule:
    match:
      type: 'value'
      value: '$1'
      parameter:
        source: 'header'
        name: 'X-Webhook-Token'
EOF
  fi
 
exec_webhook()  
  # Validate WORKDIR
  if [ -z "$WEBHOOK_WORKDIR" ]; then
    echo "Must define the WEBHOOK_WORKDIR variable!" >&2
    exit 1
  fi
  WORKDIR="$(realpath "$WEBHOOK_WORKDIR" 2>/dev/null)"   true
  if [ ! -d "$WORKDIR" ]; then
    echo "The WEBHOOK_WORKDIR '$WEBHOOK_WORKDIR' is not a directory!" >&2
    exit 1
  fi
  # Get TOKENS, if the DU_TOKEN or HARDLINK_TOKEN is defined that is used, if
  # not if the COMMON_TOKEN that is used and in other case no token is checked
  # (that is the default)
  DU_TOKEN="$ DU_TOKEN:-$COMMON_TOKEN "
  HARDLINK_TOKEN="$ HARDLINK_TOKEN:-$COMMON_TOKEN "
  S3_TOKEN="$ S3_TOKEN:-$COMMON_TOKEN "
  # Create webhook configuration
    
    print_du_yml
    print_token_yml "$DU_TOKEN"
    echo ""
    print_hardlink_yml
    print_token_yml "$HARDLINK_TOKEN"
    echo ""
    print_s3sync_yml
    print_token_yml "$S3_TOKEN"
   >"$WEBHOOK_YML"
  # Run the webhook command
  # shellcheck disable=SC2086
  exec webhook -hooks "$WEBHOOK_YML" $WEBHOOK_OPTS
 
# ----
# MAIN
# ----
case "$1" in
"server") exec_webhook ;;
*) exec "$@" ;;
esac
The entrypoint.sh script generates the configuration file for the webhook server calling functions that print a yaml section for each hook and optionally adds rules to validate access to them comparing the value of a X-Webhook-Token header against predefined values. The expected token values are taken from environment variables, we can define a token variable for each hook (DU_TOKEN, HARDLINK_TOKEN or S3_TOKEN) and a fallback value (COMMON_TOKEN); if no token variable is defined for a hook no check is done and everybody can call it. The Hook Definition documentation explains the options you can use for each hook, the ones we have right now do the following:
  • du: runs on the $WORKDIR directory, passes as first argument to the script the value of the path query parameter and sets the variable OUTPUT_FORMAT to the fixed value json (we use that to print the output of the script in JSON format instead of text).
  • hardlink: runs on the $WORKDIR directory and takes no parameters.
  • s3sync: runs on the $WORKDIR directory and sets a lot of environment variables from values read from the JSON encoded payload sent by the caller (all the values must be sent by the caller even if they are assigned an empty value, if they are missing the hook fails without calling the script); we also set the stream-command-output value to true to make the script show its output as it is working (we patched the webhook source to be able to use this option).

The du hook scriptThe du hook script code checks if the argument passed is a directory, computes its size using the du command and prints the results in text format or as a JSON dictionary:
hooks/du.sh
#!/bin/sh
set -e
# Script to print disk usage for a PATH inside the scs folder
# ---------
# FUNCTIONS
# ---------
print_error()  
  if [ "$OUTPUT_FORMAT" = "json" ]; then
    echo " \"error\":\"$*\" "
  else
    echo "$*" >&2
  fi
  exit 1
 
usage()  
  if [ "$OUTPUT_FORMAT" = "json" ]; then
    echo " \"error\":\"Pass arguments as '?path=XXX\" "
  else
    echo "Usage: $(basename "$0") PATH" >&2
  fi
  exit 1
 
# ----
# MAIN
# ----
if [ "$#" -eq "0" ]   [ -z "$1" ]; then
  usage
fi
if [ "$1" = "." ]; then
  DU_PATH="./"
else
  DU_PATH="$(find . -name "$1" -mindepth 1 -maxdepth 1)"   true
fi
if [ -z "$DU_PATH" ]   [ ! -d "$DU_PATH/." ]; then
  print_error "The provided PATH ('$1') is not a directory"
fi
# Print disk usage in bytes for the given PATH
OUTPUT="$(du -b -s "$DU_PATH")"
if [ "$OUTPUT_FORMAT" = "json" ]; then
  # Format output as  "path":"PATH","bytes":"BYTES" 
  echo "$OUTPUT"  
    sed -e "s%^\(.*\)\t.*/\(.*\)$% \"path\":\"\2\",\"bytes\":\"\1\" %"  
    tr -d '\n'
else
  # Print du output as is
  echo "$OUTPUT"
fi
# vim: ts=2:sw=2:et:ai:sts=2

The s3sync hook scriptThe s3sync hook script uses the s3fs tool to mount a bucket and synchronise data between a folder inside the bucket and a directory on the filesystem using rsync; all values needed to execute the task are taken from environment variables:
hooks/s3sync.sh
#!/bin/ash
set -euo pipefail
set -o errexit
set -o errtrace
# Functions
finish()  
  ret="$1"
  echo ""
  echo "Script exit code: $ret"
  exit "$ret"
 
# Check variables
if [ -z "$AWS_KEY" ]   [ -z "$AWS_SECRET_KEY" ]   [ -z "$S3_BUCKET" ]  
  [ -z "$S3_PATH" ]   [ -z "$SCS_PATH" ]; then
  [ "$AWS_KEY" ]   echo "Set the AWS_KEY environment variable"
  [ "$AWS_SECRET_KEY" ]   echo "Set the AWS_SECRET_KEY environment variable"
  [ "$S3_BUCKET" ]   echo "Set the S3_BUCKET environment variable"
  [ "$S3_PATH" ]   echo "Set the S3_PATH environment variable"
  [ "$SCS_PATH" ]   echo "Set the SCS_PATH environment variable"
  finish 1
fi
if [ "$S3_REGION" ] && [ "$S3_REGION" != "us-east-1" ]; then
  EP_URL="endpoint=$S3_REGION,url=https://s3.$S3_REGION.amazonaws.com"
else
  EP_URL="endpoint=us-east-1"
fi
# Prepare working directory
WORK_DIR="$(mktemp -p "$HOME" -d)"
MNT_POINT="$WORK_DIR/s3data"
PASSWD_S3FS="$WORK_DIR/.passwd-s3fs"
# Check the moutpoint
if [ ! -d "$MNT_POINT" ]; then
  mkdir -p "$MNT_POINT"
elif mountpoint "$MNT_POINT"; then
  echo "There is already something mounted on '$MNT_POINT', aborting!"
  finish 1
fi
# Create password file
touch "$PASSWD_S3FS"
chmod 0400 "$PASSWD_S3FS"
echo "$AWS_KEY:$AWS_SECRET_KEY" >"$PASSWD_S3FS"
# Mount s3 bucket as a filesystem
s3fs -o dbglevel=info,retries=5 -o "$EP_URL" -o "passwd_file=$PASSWD_S3FS" \
  "$S3_BUCKET" "$MNT_POINT"
echo "Mounted bucket '$S3_BUCKET' on '$MNT_POINT'"
# Remove the password file, just in case
rm -f "$PASSWD_S3FS"
# Check source PATH
ret="0"
SRC_PATH="$MNT_POINT/$S3_PATH"
if [ ! -d "$SRC_PATH" ]; then
  echo "The S3_PATH '$S3_PATH' can't be found!"
  ret=1
fi
# Compute SCS_UID & SCS_GID (by default based on the working directory owner)
SCS_UID="$ SCS_UID:=$(stat -c "%u" "." 2>/dev/null) "   true
SCS_GID="$ SCS_GID:=$(stat -c "%g" "." 2>/dev/null) "   true
# Check destination PATH
DST_PATH="./$SCS_PATH"
if [ "$ret" -eq "0" ] && [ -d "$DST_PATH" ]; then
  mkdir -p "$DST_PATH"   ret="$?"
fi
# Copy using rsync
if [ "$ret" -eq "0" ]; then
  rsync -rlptv --chown="$SCS_UID:$SCS_GID" --delete --stats \
    "$SRC_PATH/" "$DST_PATH/"   ret="$?"
fi
# Unmount the S3 bucket
umount -f "$MNT_POINT"
echo "Called umount for '$MNT_POINT'"
# Remove mount point dir
rmdir "$MNT_POINT"
# Remove WORK_DIR
rmdir "$WORK_DIR"
# We are done
finish "$ret"
# vim: ts=2:sw=2:et:ai:sts=2

Deployment objectsThe system is deployed as a StatefulSet with one replica. Our production deployment is done on AWS and to be able to scale we use EFS for our PersistenVolume; the idea is that the volume has no size limit, its AccessMode can be set to ReadWriteMany and we can mount it from multiple instances of the Pod without issues, even if they are in different availability zones. For development we use k3d and we are also able to scale the StatefulSet for testing because we use a ReadWriteOnce PVC, but it points to a hostPath that is backed up by a folder that is mounted on all the compute nodes, so in reality Pods in different k3d nodes use the same folder on the host.

secrets.yamlThe secrets file contains the files used by the mysecureshell container that can be generated using kubernetes pods as follows (we are only creating the scs user):
$ kubectl run "mysecureshell" --restart='Never' --quiet --rm --stdin \
  --image "stodh/mysecureshell:latest" -- gen-host-keys >"./host_keys.txt"
$ kubectl run "mysecureshell" --restart='Never' --quiet --rm --stdin \
  --image "stodh/mysecureshell:latest" -- gen-users-tar scs >"./users.tar"
Once we have the files we can generate the secrets.yaml file as follows:
$ tar xf ./users.tar user_keys.txt user_pass.txt
$ kubectl --dry-run=client -o yaml create secret generic "scs-secret" \
  --from-file="host_keys.txt=host_keys.txt" \
  --from-file="user_keys.txt=user_keys.txt" \
  --from-file="user_pass.txt=user_pass.txt" > ./secrets.yaml
The resulting secrets.yaml will look like the following file (the base64 would match the content of the files, of course):
secrets.yaml
apiVersion: v1
data:
  host_keys.txt: TWlt...
  user_keys.txt: c2Nz...
  user_pass.txt: c2Nz...
kind: Secret
metadata:
  creationTimestamp: null
  name: scs-secret

pvc.yamlThe persistent volume claim for a simple deployment (one with only one instance of the statefulSet) can be as simple as this:
pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scs-pvc
  labels:
    app.kubernetes.io/name: scs
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
On this definition we don t set the storageClassName to use the default one.

Volumes in our development environment (k3d)In our development deployment we create the following PersistentVolume as required by the Local Persistence Volume Static Provisioner (note that the /volumes/scs-pv has to be created by hand, in our k3d system we mount the same host directory on the /volumes path of all the nodes and create the scs-pv directory by hand before deploying the persistent volume):
k3d-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: scs-pv
  labels:
    app.kubernetes.io/name: scs
spec:
  capacity:
    storage: 8Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  claimRef:
    name: scs-pvc
  storageClassName: local-storage
  local:
    path: /volumes/scs-pv
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
          - k3s
And to make sure that everything works as expected we update the PVC definition to add the right storageClassName:
k3d-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scs-pvc
  labels:
    app.kubernetes.io/name: scs
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: local-storage

Volumes in our production environment (aws)In the production deployment we don t create the PersistentVolume (we are using the aws-efs-csi-driver which supports Dynamic Provisioning) but we add the storageClassName (we set it to the one mapped to the EFS driver, i.e. efs-sc) and set ReadWriteMany as the accessMode:
efs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scs-pvc
  labels:
    app.kubernetes.io/name: scs
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 8Gi
  storageClassName: efs-sc

statefulset.yamlThe definition of the statefulSet is as follows:
statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: scs
  labels:
    app.kubernetes.io/name: scs
spec:
  serviceName: scs
  replicas: 1
  selector:
    matchLabels:
      app: scs
  template:
    metadata:
      labels:
        app: scs
    spec:
      containers:
      - name: nginx
        image: stodh/nginx-scs:latest
        ports:
        - containerPort: 80
          name: http
        env:
        - name: AUTH_REQUEST_URI
          value: ""
        - name: HTML_ROOT
          value: /sftp/data
        volumeMounts:
        - mountPath: /sftp
          name: scs-datadir
      - name: mysecureshell
        image: stodh/mysecureshell:latest
        ports:
        - containerPort: 22
          name: ssh
        securityContext:
          capabilities:
            add:
            - IPC_OWNER
        env:
        - name: SFTP_UID
          value: '2020'
        - name: SFTP_GID
          value: '2020'
        volumeMounts:
        - mountPath: /secrets
          name: scs-file-secrets
          readOnly: true
        - mountPath: /sftp
          name: scs-datadir
      - name: webhook
        image: stodh/webhook-scs:latest
        securityContext:
          privileged: true
        ports:
        - containerPort: 9000
          name: webhook-http
        env:
        - name: WEBHOOK_WORKDIR
          value: /sftp/data/scs
        volumeMounts:
        - name: devfuse
          mountPath: /dev/fuse
        - mountPath: /sftp
          name: scs-datadir
      volumes:
      - name: devfuse
        hostPath:
          path: /dev/fuse
      - name: scs-file-secrets
        secret:
          secretName: scs-secrets
      - name: scs-datadir
        persistentVolumeClaim:
          claimName: scs-pvc
Notes about the containers:
  • nginx: As this is an example the web server is not using an AUTH_REQUEST_URI and uses the /sftp/data directory as the root of the web (to get to the files uploaded for the scs user we will need to use /scs/ as a prefix on the URLs).
  • mysecureshell: We are adding the IPC_OWNER capability to the container to be able to use some of the sftp-* commands inside it, but they are not really needed, so adding the capability is optional.
  • webhook: We are launching this container in privileged mode to be able to use the s3fs-fuse, as it will not work otherwise for now (see this kubernetes issue); if the functionality is not needed the container can be executed with regular privileges; besides, as we are not enabling public access to this service we don t define *_TOKEN variables (if required the values should be read from a Secret object).
Notes about the volumes:
  • the devfuse volume is only needed if we plan to use the s3fs command on the webhook container, if not we can remove the volume definition and its mounts.

service.yamlTo be able to access the different services on the statefulset we publish the relevant ports using the following Service object:
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: scs-svc
  labels:
    app.kubernetes.io/name: scs
spec:
  ports:
  - name: ssh
    port: 22
    protocol: TCP
    targetPort: 22
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: webhook-http
    port: 9000
    protocol: TCP
    targetPort: 9000
  selector:
    app: scs

ingress.yamlTo download the scs files from the outside we can add an ingress object like the following (the definition is for testing using the localhost name):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: scs-ingress
  labels:
    app.kubernetes.io/name: scs
spec:
  ingressClassName: nginx
  rules:
  - host: 'localhost'
    http:
      paths:
      - path: /scs
        pathType: Prefix
        backend:
          service:
            name: scs-svc
            port:
              number: 80

DeploymentTo deploy the statefulSet we create a namespace and apply the object definitions shown before:
$ kubectl create namespace scs-demo
namespace/scs-demo created
$ kubectl -n scs-demo apply -f secrets.yaml
secret/scs-secrets created
$ kubectl -n scs-demo apply -f pvc.yaml
persistentvolumeclaim/scs-pvc created
$ kubectl -n scs-demo apply -f statefulset.yaml
statefulset.apps/scs created
$ kubectl -n scs-demo apply -f service.yaml
service/scs-svc created
$ kubectl -n scs-demo apply -f ingress.yaml
ingress.networking.k8s.io/scs-ingress created
Once the objects are deployed we can check that all is working using kubectl:
$ kubectl  -n scs-demo get all,secrets,ingress
NAME        READY   STATUS    RESTARTS   AGE
pod/scs-0   3/3     Running   0          24s
NAME            TYPE       CLUSTER-IP  EXTERNAL-IP  PORT(S)                  AGE
service/scs-svc ClusterIP  10.43.0.47  <none>       22/TCP,80/TCP,9000/TCP   21s

NAME                   READY   AGE
statefulset.apps/scs   1/1     24s
NAME                         TYPE                                  DATA   AGE
secret/default-token-mwcd7   kubernetes.io/service-account-token   3      53s
secret/scs-secrets           Opaque                                3      39s
NAME                                   CLASS  HOSTS      ADDRESS     PORTS   AGE
ingress.networking.k8s.io/scs-ingress  nginx  localhost  172.21.0.5  80      17s
At this point we are ready to use the system.

Usage examples

File uploadsAs previously mentioned in our system the idea is to use the sftp server from other Pods, but to test the system we are going to do a kubectl port-forward and connect to the server using our host client and the password we have generated (it is on the user_pass.txt file, inside the users.tar archive):
$ kubectl -n scs-demo port-forward service/scs-svc 2020:22 &
Forwarding from 127.0.0.1:2020 -> 22
Forwarding from [::1]:2020 -> 22
$ PF_PID=$!
$ sftp -P 2020 scs@127.0.0.1                                                 1
Handling connection for 2020
The authenticity of host '[127.0.0.1]:2020 ([127.0.0.1]:2020)' can't be \
  established.
ED25519 key fingerprint is SHA256:eHNwCnyLcSSuVXXiLKeGraw0FT/4Bb/yjfqTstt+088.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[127.0.0.1]:2020' (ED25519) to the list of known \
  hosts.
scs@127.0.0.1's password: **********
Connected to 127.0.0.1.
sftp> ls -la
drwxr-xr-x    2 sftp     sftp         4096 Sep 25 14:47 .
dr-xr-xr-x    3 sftp     sftp         4096 Sep 25 14:36 ..
sftp> !date -R > /tmp/date.txt                                               2
sftp> put /tmp/date.txt .
Uploading /tmp/date.txt to /date.txt
date.txt                                      100%   32    27.8KB/s   00:00
sftp> ls -l
-rw-r--r--    1 sftp     sftp           32 Sep 25 15:21 date.txt
sftp> ln date.txt date.txt.1                                                 3
sftp> ls -l
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt.1
sftp> put /tmp/date.txt date.txt.2                                           4
Uploading /tmp/date.txt to /date.txt.2
date.txt                                      100%   32    27.8KB/s   00:00
sftp> ls -l                                                                  5
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt.1
-rw-r--r--    1 sftp     sftp           32 Sep 25 15:21 date.txt.2
sftp> exit
$ kill "$PF_PID"
[1]  + terminated  kubectl -n scs-demo port-forward service/scs-svc 2020:22
  1. We connect to the sftp service on the forwarded port with the scs user.
  2. We put a file we have created on the host on the directory.
  3. We do a hard link of the uploaded file.
  4. We put a second copy of the file we created locally.
  5. On the file list we can see that the two first files have two hardlinks

File retrievalsIf our ingress is configured right we can download the date.txt file from the URL http://localhost/scs/date.txt:
$ curl -s http://localhost/scs/date.txt
Sun, 25 Sep 2022 17:21:51 +0200

Use of the webhook containerTo finish this post we are going to show how we can call the hooks directly, from a CronJob and from a Job.

Direct script call (du)In our deployment the direct calls are done from other Pods, to simulate it we are going to do a port-forward and call the script with an existing PATH (the root directory) and a bad one:
$ kubectl -n scs-demo port-forward service/scs-svc 9000:9000 >/dev/null &
$ PF_PID=$!
$ JSON="$(curl -s "http://localhost:9000/hooks/du?path=.")"
$ echo $JSON
 "path":"","bytes":"4160" 
$ JSON="$(curl -s "http://localhost:9000/hooks/du?path=foo")"
$ echo $JSON
 "error":"The provided PATH ('foo') is not a directory" 
$ kill $PF_PID
As we only have files on the base directory we print the disk usage of the . PATH and the output is in json format because we export OUTPUT_FORMAT with the value json on the webhook configuration.

Jobs (s3sync)The following job can be used to synchronise the contents of a directory in a S3 bucket with the SCS Filesystem:
job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: s3sync
  labels:
    cronjob: 's3sync'
spec:
  template:
    metadata:
      labels:
        cronjob: 's3sync'
    spec:
      containers:
      - name: s3sync-job
        image: alpine:latest
        command: 
        - "wget"
        - "-q"
        - "--header"
        - "Content-Type: application/json"
        - "--post-file"
        - "/secrets/s3sync.json"
        - "-O-"
        - "http://scs-svc:9000/hooks/s3sync"
        volumeMounts:
        - mountPath: /secrets
          name: job-secrets
          readOnly: true
      restartPolicy: Never
      volumes:
      - name: job-secrets
        secret:
          secretName: webhook-job-secrets
The file with parameters for the script must be something like this:
s3sync.json
 
  "aws":  
    "key": "********************",
    "secret_key": "****************************************"
   ,
  "s3":  
    "region": "eu-north-1",
    "bucket": "blogops-test",
    "path": "test"
   ,
  "scs":  
    "path": "test"
   
 
Once we have both files we can run the Job as follows:
$ kubectl -n scs-demo create secret generic webhook-job-secrets \            1
  --from-file="s3sync.json=s3sync.json"
secret/webhook-job-secrets created
$ kubectl -n scs-demo apply -f webhook-job.yaml                              2
job.batch/s3sync created
$ kubectl -n scs-demo get pods -l "cronjob=s3sync"                           3
NAME           READY   STATUS      RESTARTS   AGE
s3sync-zx2cj   0/1     Completed   0          12s
$ kubectl -n scs-demo logs s3sync-zx2cj                                      4
Mounted bucket 's3fs-test' on '/root/tmp.jiOjaF/s3data'
sending incremental file list
created directory ./test
./
kyso.png
Number of files: 2 (reg: 1, dir: 1)
Number of created files: 2 (reg: 1, dir: 1)
Number of deleted files: 0
Number of regular files transferred: 1
Total file size: 15,075 bytes
Total transferred file size: 15,075 bytes
Literal data: 15,075 bytes
Matched data: 0 bytes
File list size: 0
File list generation time: 0.147 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 15,183
Total bytes received: 74
sent 15,183 bytes  received 74 bytes  30,514.00 bytes/sec
total size is 15,075  speedup is 0.99
Called umount for '/root/tmp.jiOjaF/s3data'
Script exit code: 0
$ kubectl -n scs-demo delete -f webhook-job.yaml                             5
job.batch "s3sync" deleted
$ kubectl -n scs-demo delete secrets webhook-job-secrets                     6
secret "webhook-job-secrets" deleted
  1. Here we create the webhook-job-secrets secret that contains the s3sync.json file.
  2. This command runs the job.
  3. Checking the label cronjob=s3sync we get the Pods executed by the job.
  4. Here we print the logs of the completed job.
  5. Once we are finished we remove the Job.
  6. And also the secret.

Final remarksThis post has been longer than I expected, but I believe it can be useful for someone; in any case, next time I ll try to explain something shorter or will split it into multiple entries.

1 August 2022

Sergio Talens-Oliag: Using Git Server Hooks on GitLab CE to Validate Tags

Since a long time ago I ve been a gitlab-ce user, in fact I ve set it up on three of the last four companies I ve worked for (initially I installed it using the omnibus packages on a debian server but on the last two places I moved to the docker based installation, as it is easy to maintain and we don t need a big installation as the teams using it are small). On the company I work for now (kyso) we are using it to host all our internal repositories and to do all the CI/CD work (the automatic deployments are triggered by web hooks in some cases, but the rest is all done using gitlab-ci). The majority of projects are using nodejs as programming language and we have automated the publication of npm packages on our gitlab instance npm registry and even the publication into the npmjs registry. To publish the packages we have added rules to the gitlab-ci configuration of the relevant repositories and we publish them when a tag is created. As the we are lazy by definition, I configured the system to use the tag as the package version; I tested if the contents of the package.json where in sync with the expected version and if it was not I updated it and did a force push of the tag with the updated file using the following code on the script that publishes the package:
# Update package version & add it to the .build-args
INITIAL_PACKAGE_VERSION="$(npm pkg get version tr -d '"')"
npm version --allow-same --no-commit-hooks --no-git-tag-version \
  "$CI_COMMIT_TAG"
UPDATED_PACKAGE_VERSION="$(npm pkg get version tr -d '"')"
echo "UPDATED_PACKAGE_VERSION=$UPDATED_PACKAGE_VERSION" >> .build-args
# Update tag if the version was updated or abort
if [ "$INITIAL_PACKAGE_VERSION" != "$UPDATED_PACKAGE_VERSION" ]; then
  if [ -n "$CI_GIT_USER" ] && [ -n "$CI_GIT_TOKEN" ]; then
    git commit -m "Updated version from tag $CI_COMMIT_TAG" package.json
    git tag -f "$CI_COMMIT_TAG" -m "Updated version from tag"
    git push -f -o ci.skip origin "$CI_COMMIT_TAG"
  else
    echo "!!! ERROR !!!"
    echo "The updated tag could not be uploaded."
    echo "Set CI_GIT_USER and CI_GIT_TOKEN or fix the 'package.json' file"
    echo "!!! ERROR !!!"
    exit 1
  fi
fi
This feels a little dirty (we are leaving commits on the tag but not updating the original branch); I thought about trying to find the branch using the tag and update it, but I drop the idea pretty soon as there were multiple issues to consider (i.e. we can have tags pointing to commits present in multiple branches and even if it only points to one the tag does not have to be the HEAD of the branch making the inclusion difficult). In any case this system was working, so we left it until we started to publish to the NPM Registry; as we are using a token to push the packages that we don t want all developers to have access to (right now it would not matter, but when the team grows it will) I started to use gitlab protected branches on the projects that need it and adjusting the .npmrc file using protected variables. The problem then was that we can no longer do a standard force push for a branch (that is the main point of the protected branches feature) unless we use the gitlab api, so the tags with the wrong version started to fail. As the way things were being done seemed dirty anyway I thought that the best way of fixing things was to forbid users to push a tag that includes a version that does not match the package.json version. After thinking about it we decided to use githooks on the gitlab server for the repositories that need it, as we are only interested in tags we are going to use the update hook; it is executed once for each ref to be updated, and takes three parameters:
  • the name of the ref being updated,
  • the old object name stored in the ref,
  • and the new object name to be stored in the ref.
To install our hook we have found the gitaly relative path of each repo and located it on the server filesystem (as I said we are using docker and the gitlab s data directory is on /srv/gitlab/data, so the path to the repo has the form /srv/gitlab/data/git-data/repositories/@hashed/xx/yy/hash.git). Once we have the directory we need to:
  • create a custom_hooks sub directory inside it,
  • add the update script (as we only need one script we used that instead of creating an update.d directory, the good thing is that this will also work with a standard git server renaming the base directory to hooks instead of custom_hooks),
  • make it executable, and
  • change the directory and file ownership to make sure it can be read and executed from the gitlab container
On a console session:
$ cd /srv/gitlab/data/git-data/repositories/@hashed/xx/yy/hash.git
$ mkdir custom_hooks
$ edit_or_copy custom_hooks/update
$ chmod 0755 custom_hooks/update
$ chown --reference=. -R custom_hooks
The update script we are using is as follows:
#!/bin/sh
set -e
# kyso update hook
#
# Right now it checks version.txt or package.json versions against the tag name
# (it supports a 'v' prefix on the tag)
# Arguments
ref_name="$1"
old_rev="$2"
new_rev="$3"
# Initial test
if [ -z "$ref_name" ]    [ -z "$old_rev" ]   [ -z "$new_rev" ]; then
  echo "usage: $0 <ref> <oldrev> <newrev>" >&2
  exit 1
fi
# Get the tag short name
tag_name="$ ref_name##refs/tags/ "
# Exit if the update is not for a tag
if [ "$tag_name" = "$ref_name" ]; then
  exit 0
fi
# Get the null rev value (string of zeros)
zero=$(git hash-object --stdin </dev/null   tr '0-9a-f' '0')
# Get if the tag is new or not
if [ "$old_rev" = "$zero" ]; then
  new_tag="true"
else
  new_tag="false"
fi
# Get the type of revision:
# - delete: if the new_rev is zero
# - commit: annotated tag
# - tag: un-annotated tag
if [ "$new_rev" = "$zero" ]; then
  new_rev_type="delete"
else
  new_rev_type="$(git cat-file -t "$new_rev")"
fi
# Exit if we are deleting a tag (nothing to check here)
if [ "$new_rev_type" = "delete" ]; then
  exit 0
fi
# Check the version against the tag (supports version.txt & package.json)
if git cat-file -e "$new_rev:version.txt" >/dev/null 2>&1; then
  version="$(git cat-file -p "$new_rev:version.txt")"
  if [ "$version" = "$tag_name" ]   [ "$version" = "$ tag_name#v " ]; then
    exit 0
  else
    EMSG="tag '$tag_name' and 'version.txt' contents '$version' don't match"
    echo "GL-HOOK-ERR: $EMSG"
    exit 1
  fi
elif git cat-file -e "$new_rev:package.json" >/dev/null 2>&1; then
  version="$(
    git cat-file -p "$new_rev:package.json"   jsonpath version   tr -d '\[\]"'
  )"
  if [ "$version" = "$tag_name" ]   [ "$version" = "$ tag_name#v " ]; then
    exit 0
  else
    EMSG="tag '$tag_name' and 'package.json' version '$version' don't match"
    echo "GL-HOOK-ERR: $EMSG"
    exit 1
  fi
else
  # No version.txt or package.json file found
  exit 0
fi
Some comments about it:
  • we are only looking for tags, if the ref_name does not have the prefix refs/tags/ the script does an exit 0,
  • although we are checking if the tag is new or not we are not using the value (in gitlab that is handled by the protected tag feature),
  • if we are deleting a tag the script does an exit 0, we don t need to check anything in that case,
  • we are ignoring if the tag is annotated or not (we set the new_rev_type to tag or commit, but we don t use the value),
  • we test first the version.txt file and if it does not exist we check the package.json file, if it does not exist either we do an exit 0, as there is no version to check against and we allow that on a tag,
  • we add the GL-HOOK-ERR: prefix to the messages to show them on the gitlab web interface (can be tested creating a tag from it),
  • to get the version on the package.json file we use the jsonpath binary (it is installed by the jsonpath ruby gem) because it is available on the gitlab container (initially I used sed to get the value, but a real JSON parser is always a better option).
Once the hook is installed when a user tries to push a tag to a repository that has a version.txt file or package.json file and the tag does not match the version (if version.txt is present it takes precedence) the push fails. If the tag matches or the files are not present the tag is added if the user has permission to add it in gitlab (our hook is only executed if the user is allowed to create or update the tag).

26 May 2022

Sergio Talens-Oliag: New Blog Config

As promised, on this post I m going to explain how I ve configured this blog using hugo, asciidoctor and the papermod theme, how I publish it using nginx, how I ve integrated the remark42 comment system and how I ve automated its publication using gitea and json2file-go. It is a long post, but I hope that at least parts of it can be interesting for some, feel free to ignore it if that is not your case

Hugo Configuration

Theme settingsThe site is using the PaperMod theme and as I m using asciidoctor to publish my content I ve adjusted the settings to improve how things are shown with it. The current config.yml file is the one shown below (probably some of the settings are not required nor being used right now, but I m including the current file, so this post will have always the latest version of it):
config.yml
baseURL: https://blogops.mixinet.net/
title: Mixinet BlogOps
paginate: 5
theme: PaperMod
destination: public/
enableInlineShortcodes: true
enableRobotsTXT: true
buildDrafts: false
buildFuture: false
buildExpired: false
enableEmoji: true
pygmentsUseClasses: true
minify:
  disableXML: true
  minifyOutput: true
languages:
  en:
    languageName: "English"
    description: "Mixinet BlogOps - https://blogops.mixinet.net/"
    author: "Sergio Talens-Oliag"
    weight: 1
    title: Mixinet BlogOps
    homeInfoParams:
      Title: "Sergio Talens-Oliag Technical Blog"
      Content: >
        ![Mixinet BlogOps](/images/mixinet-blogops.png)
    taxonomies:
      category: categories
      tag: tags
      series: series
    menu:
      main:
        - name: Archive
          url: archives
          weight: 5
        - name: Categories
          url: categories/
          weight: 10
        - name: Tags
          url: tags/
          weight: 10
        - name: Search
          url: search/
          weight: 15
outputs:
  home:
    - HTML
    - RSS
    - JSON
params:
  env: production
  defaultTheme: light
  disableThemeToggle: false
  ShowShareButtons: true
  ShowReadingTime: true
  disableSpecial1stPost: true
  disableHLJS: true
  displayFullLangName: true
  ShowPostNavLinks: true
  ShowBreadCrumbs: true
  ShowCodeCopyButtons: true
  ShowRssButtonInSectionTermList: true
  ShowFullTextinRSS: true
  ShowToc: true
  TocOpen: false
  comments: true
  remark42SiteID: "blogops"
  remark42Url: "/remark42"
  profileMode:
    enabled: false
    title: Sergio Talens-Oliag Technical Blog
    imageUrl: "/images/mixinet-blogops.png"
    imageTitle: Mixinet BlogOps
    buttons:
      - name: Archives
        url: archives
      - name: Categories
        url: categories
      - name: Tags
        url: tags
  socialIcons:
    - name: CV
      url: "https://www.uv.es/~sto/cv/"
    - name: Debian
      url: "https://people.debian.org/~sto/"
    - name: GitHub
      url: "https://github.com/sto/"
    - name: GitLab
      url: "https://gitlab.com/stalens/"
    - name: Linkedin
      url: "https://www.linkedin.com/in/sergio-talens-oliag/"
    - name: RSS
      url: "index.xml"
  assets:
    disableHLJS: true
    favicon: "/favicon.ico"
    favicon16x16:  "/favicon-16x16.png"
    favicon32x32:  "/favicon-32x32.png"
    apple_touch_icon:  "/apple-touch-icon.png"
    safari_pinned_tab:  "/safari-pinned-tab.svg"
  fuseOpts:
    isCaseSensitive: false
    shouldSort: true
    location: 0
    distance: 1000
    threshold: 0.4
    minMatchCharLength: 0
    keys: ["title", "permalink", "summary", "content"]
markup:
  asciidocExt:
    attributes:  
    backend: html5s
    extensions: ['asciidoctor-html5s','asciidoctor-diagram']
    failureLevel: fatal
    noHeaderOrFooter: true
    preserveTOC: false
    safeMode: unsafe
    sectionNumbers: false
    trace: false
    verbose: false
    workingFolderCurrent: true
privacy:
  vimeo:
    disabled: false
    simple: true
  twitter:
    disabled: false
    enableDNT: true
    simple: true
  instagram:
    disabled: false
    simple: true
  youtube:
    disabled: false
    privacyEnhanced: true
services:
  instagram:
    disableInlineCSS: true
  twitter:
    disableInlineCSS: true
security:
  exec:
    allow:
      - '^asciidoctor$'
      - '^dart-sass-embedded$'
      - '^go$'
      - '^npx$'
      - '^postcss$'
Some notes about the settings:
  • disableHLJS and assets.disableHLJS are set to true; we plan to use rouge on adoc and the inclusion of the hljs assets adds styles that collide with the ones used by rouge.
  • ShowToc is set to true and the TocOpen setting is set to false to make the ToC appear collapsed initially. My plan was to use the asciidoctor ToC, but after trying I believe that the theme one looks nice and I don t need to adjust styles, although it has some issues with the html5s processor (the admonition titles use <h6> and they are shown on the ToC, which is weird), to fix it I ve copied the layouts/partial/toc.html to my site repository and replaced the range of headings to end at 5 instead of 6 (in fact 5 still seems a lot, but as I don t think I ll use that heading level on the posts it doesn t really matter).
  • params.profileMode values are adjusted, but for now I ve left it disabled setting params.profileMode.enabled to false and I ve set the homeInfoParams to show more or less the same content with the latest posts under it (I ve added some styles to my custom.css style sheet to center the text and image of the first post to match the look and feel of the profile).
  • On the asciidocExt section I ve adjusted the backend to use html5s, I ve added the asciidoctor-html5s and asciidoctor-diagram extensions to asciidoctor and adjusted the workingFolderCurrent to true to make asciidoctor-diagram work right (haven t tested it yet).

Theme customisationsTo write in asciidoctor using the html5s processor I ve added some files to the assets/css/extended directory:
  1. As said before, I ve added the file assets/css/extended/custom.css to make the homeInfoParams look like the profile page and I ve also changed a little bit some theme styles to make things look better with the html5s output:
    custom.css
    /* Fix first entry alignment to make it look like the profile */
    .first-entry   text-align: center;  
    .first-entry img   display: inline;  
    /**
     * Remove margin for .post-content code and reduce padding to make it look
     * better with the asciidoctor html5s output.
     **/
    .post-content code   margin: auto 0; padding: 4px;  
  2. I ve also added the file assets/css/extended/adoc.css with some styles taken from the asciidoctor-default.css, see this blog post about the original file; mine is the same after formatting it with css-beautify and editing it to use variables for the colors to support light and dark themes:
    adoc.css
    /* AsciiDoctor*/
    table  
        border-collapse: collapse;
        border-spacing: 0
     
    .admonitionblock>table  
        border-collapse: separate;
        border: 0;
        background: none;
        width: 100%
     
    .admonitionblock>table td.icon  
        text-align: center;
        width: 80px
     
    .admonitionblock>table td.icon img  
        max-width: none
     
    .admonitionblock>table td.icon .title  
        font-weight: bold;
        font-family: "Open Sans", "DejaVu Sans", sans-serif;
        text-transform: uppercase
     
    .admonitionblock>table td.content  
        padding-left: 1.125em;
        padding-right: 1.25em;
        border-left: 1px solid #ddddd8;
        color: var(--primary)
     
    .admonitionblock>table td.content>:last-child>:last-child  
        margin-bottom: 0
     
    .admonitionblock td.icon [class^="fa icon-"]  
        font-size: 2.5em;
        text-shadow: 1px 1px 2px var(--secondary);
        cursor: default
     
    .admonitionblock td.icon .icon-note::before  
        content: "\f05a";
        color: var(--icon-note-color)
     
    .admonitionblock td.icon .icon-tip::before  
        content: "\f0eb";
        color: var(--icon-tip-color)
     
    .admonitionblock td.icon .icon-warning::before  
        content: "\f071";
        color: var(--icon-warning-color)
     
    .admonitionblock td.icon .icon-caution::before  
        content: "\f06d";
        color: var(--icon-caution-color)
     
    .admonitionblock td.icon .icon-important::before  
        content: "\f06a";
        color: var(--icon-important-color)
     
    .conum[data-value]  
        display: inline-block;
        color: #fff !important;
        background-color: rgba(100, 100, 0, .8);
        -webkit-border-radius: 100px;
        border-radius: 100px;
        text-align: center;
        font-size: .75em;
        width: 1.67em;
        height: 1.67em;
        line-height: 1.67em;
        font-family: "Open Sans", "DejaVu Sans", sans-serif;
        font-style: normal;
        font-weight: bold
     
    .conum[data-value] *  
        color: #fff !important
     
    .conum[data-value]+b  
        display: none
     
    .conum[data-value]::after  
        content: attr(data-value)
     
    pre .conum[data-value]  
        position: relative;
        top: -.125em
     
    b.conum *  
        color: inherit !important
     
    .conum:not([data-value]):empty  
        display: none
     
  3. The previous file uses variables from a partial copy of the theme-vars.css file that changes the highlighted code background color and adds the color definitions used by the admonitions:
    theme-vars.css
    :root  
        /* Solarized base2 */
        /* --hljs-bg: rgb(238, 232, 213); */
        /* Solarized base3 */
        /* --hljs-bg: rgb(253, 246, 227); */
        /* Solarized base02 */
        --hljs-bg: rgb(7, 54, 66);
        /* Solarized base03 */
        /* --hljs-bg: rgb(0, 43, 54); */
        /* Default asciidoctor theme colors */
        --icon-note-color: #19407c;
        --icon-tip-color: var(--primary);
        --icon-warning-color: #bf6900;
        --icon-caution-color: #bf3400;
        --icon-important-color: #bf0000
     
    .dark  
        --hljs-bg: rgb(7, 54, 66);
        /* Asciidoctor theme colors with tint for dark background */
        --icon-note-color: #3e7bd7;
        --icon-tip-color: var(--primary);
        --icon-warning-color: #ff8d03;
        --icon-caution-color: #ff7847;
        --icon-important-color: #ff3030
     
  4. The previous styles use font-awesome, so I ve downloaded its resources for version 4.7.0 (the one used by asciidoctor) storing the font-awesome.css into on the assets/css/extended dir (that way it is merged with the rest of .css files) and copying the fonts to the static/assets/fonts/ dir (will be served directly):
    FA_BASE_URL="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0"
    curl "$FA_BASE_URL/css/font-awesome.css" \
      > assets/css/extended/font-awesome.css
    for f in FontAwesome.otf fontawesome-webfont.eot \
      fontawesome-webfont.svg fontawesome-webfont.ttf \
      fontawesome-webfont.woff fontawesome-webfont.woff2; do
        curl "$FA_BASE_URL/fonts/$f" > "static/assets/fonts/$f"
    done
  5. As already said the default highlighter is disabled (it provided a css compatible with rouge) so we need a css to do the highlight styling; as rouge provides a way to export them, I ve created the assets/css/extended/rouge.css file with the thankful_eyes theme:
    rougify style thankful_eyes > assets/css/extended/rouge.css
  6. To support the use of the html5s backend with admonitions I ve added a variation of the example found on this blog post to assets/js/adoc-admonitions.js:
    adoc-admonitions.js
    // replace the default admonitions block with a table that uses a format
    // similar to the standard asciidoctor ... as we are using fa-icons here there
    // is no need to add the icons: font entry on the document.
    window.addEventListener('load', function ()  
      const admonitions = document.getElementsByClassName('admonition-block')
      for (let i = admonitions.length - 1; i >= 0; i--)  
        const elm = admonitions[i]
        const type = elm.classList[1]
        const title = elm.getElementsByClassName('block-title')[0];
    	const label = title.getElementsByClassName('title-label')[0]
    		.innerHTML.slice(0, -1);
        elm.removeChild(elm.getElementsByClassName('block-title')[0]);
        const text = elm.innerHTML
        const parent = elm.parentNode
        const tempDiv = document.createElement('div')
        tempDiv.innerHTML =  <div class="admonitionblock $ type ">
        <table>
          <tbody>
            <tr>
              <td class="icon">
                <i class="fa icon-$ type " title="$ label "></i>
              </td>
              <td class="content">
                $ text 
              </td>
            </tr>
          </tbody>
        </table>
      </div> 
        const input = tempDiv.childNodes[0]
        parent.replaceChild(input, elm)
       
     )
    and enabled its minified use on the layouts/partials/extend_footer.html file adding the following lines to it:
     - $admonitions := slice (resources.Get "js/adoc-admonitions.js")
        resources.Concat "assets/js/adoc-admonitions.js"   minify   fingerprint  
    <script defer crossorigin="anonymous" src="  $admonitions.RelPermalink  "
      integrity="  $admonitions.Data.Integrity  "></script>

Remark42 configurationTo integrate Remark42 with the PaperMod theme I ve created the file layouts/partials/comments.html with the following content based on the remark42 documentation, including extra code to sync the dark/light setting with the one set on the site:
comments.html
<div id="remark42"></div>
<script>
  var remark_config =  
    host:   .Site.Params.remark42Url  ,
    site_id:   .Site.Params.remark42SiteID  ,
    url:   .Permalink  ,
    locale:   .Site.Language.Lang  
   ;
  (function(c)  
    /* Adjust the theme using the local-storage pref-theme if set */
    if (localStorage.getItem("pref-theme") === "dark")  
      remark_config.theme = "dark";
      else if (localStorage.getItem("pref-theme") === "light")  
      remark_config.theme = "light";
     
    /* Add remark42 widget */
    for(var i = 0; i < c.length; i++) 
      var d = document, s = d.createElement('script');
      s.src = remark_config.host + '/web/' + c[i] +'.js';
      s.defer = true;
      (d.head   d.body).appendChild(s);
     
   )(remark_config.components   ['embed']);
</script>
In development I use it with anonymous comments enabled, but to avoid SPAM the production site uses social logins (for now I ve only enabled Github & Google, if someone requests additional services I ll check them, but those were the easy ones for me initially). To support theme switching with remark42 I ve also added the following inside the layouts/partials/extend_footer.html file:
 - if (not site.Params.disableThemeToggle)  
<script>
/* Function to change theme when the toggle button is pressed */
document.getElementById("theme-toggle").addEventListener("click", () =>  
  if (typeof window.REMARK42 != "undefined")  
    if (document.body.className.includes('dark'))  
      window.REMARK42.changeTheme('light');
      else  
      window.REMARK42.changeTheme('dark');
     
   
 );
</script>
 - end  
With this code if the theme-toggle button is pressed we change the remark42 theme before the PaperMod one (that s needed here only, on page loads the remark42 theme is synced with the main one using the code from the layouts/partials/comments.html shown earlier).

Development setupTo preview the site on my laptop I m using docker-compose with the following configuration:
docker-compose.yaml
version: "2"
services:
  hugo:
    build:
      context: ./docker/hugo-adoc
      dockerfile: ./Dockerfile
    image: sto/hugo-adoc
    container_name: hugo-adoc-blogops
    restart: always
    volumes:
      - .:/documents
    command: server --bind 0.0.0.0 -D -F
    user: $ APP_UID :$ APP_GID 
  nginx:
    image: nginx:latest
    container_name: nginx-blogops
    restart: always
    volumes:
      - ./nginx/default.conf:/etc/nginx/conf.d/default.conf
    ports:
      -  1313:1313
  remark42:
    build:
      context: ./docker/remark42
      dockerfile: ./Dockerfile
    image: sto/remark42
    container_name: remark42-blogops
    restart: always
    env_file:
      - ./.env
      - ./remark42/env.dev
    volumes:
      - ./remark42/var.dev:/srv/var
To run it properly we have to create the .env file with the current user ID and GID on the variables APP_UID and APP_GID (if we don t do it the files can end up being owned by a user that is not the same as the one running the services):
$ echo "APP_UID=$(id -u)\nAPP_GID=$(id -g)" > .env
The Dockerfile used to generate the sto/hugo-adoc is:
Dockerfile
FROM asciidoctor/docker-asciidoctor:latest
RUN gem install --no-document asciidoctor-html5s &&\
 apk update && apk add --no-cache curl libc6-compat &&\
 repo_path="gohugoio/hugo" &&\
 api_url="https://api.github.com/repos/$repo_path/releases/latest" &&\
 download_url="$(\
  curl -sL "$api_url"  \
  sed -n "s/^.*download_url\": \"\\(.*.extended.*Linux-64bit.tar.gz\)\"/\1/p"\
 )" &&\
 curl -sL "$download_url" -o /tmp/hugo.tgz &&\
 tar xf /tmp/hugo.tgz hugo &&\
 install hugo /usr/bin/ &&\
 rm -f hugo /tmp/hugo.tgz &&\
 /usr/bin/hugo version &&\
 apk del curl && rm -rf /var/cache/apk/*
# Expose port for live server
EXPOSE 1313
ENTRYPOINT ["/usr/bin/hugo"]
CMD [""]
If you review it you will see that I m using the docker-asciidoctor image as the base; the idea is that this image has all I need to work with asciidoctor and to use hugo I only need to download the binary from their latest release at github (as we are using an image based on alpine we also need to install the libc6-compat package, but once that is done things are working fine for me so far). The image does not launch the server by default because I don t want it to; in fact I use the same docker-compose.yml file to publish the site in production simply calling the container without the arguments passed on the docker-compose.yml file (see later). When running the containers with docker-compose up (or docker compose up if you have the docker-compose-plugin package installed) we also launch a nginx container and the remark42 service so we can test everything together. The Dockerfile for the remark42 image is the original one with an updated version of the init.sh script:
Dockerfile
FROM umputun/remark42:latest
COPY init.sh /init.sh
The updated init.sh is similar to the original, but allows us to use an APP_GID variable and updates the /etc/group file of the container so the files get the right user and group (with the original script the group is always 1001):
init.sh
#!/sbin/dinit /bin/sh
uid="$(id -u)"
if [ "$ uid " -eq "0" ]; then
  echo "init container"
  # set container's time zone
  cp "/usr/share/zoneinfo/$ TIME_ZONE " /etc/localtime
  echo "$ TIME_ZONE " >/etc/timezone
  echo "set timezone $ TIME_ZONE  ($(date))"
  # set UID & GID for the app
  if [ "$ APP_UID " ]   [ "$ APP_GID " ]; then
    [ "$ APP_UID " ]   APP_UID="1001"
    [ "$ APP_GID " ]   APP_GID="$ APP_UID "
    echo "set custom APP_UID=$ APP_UID  & APP_GID=$ APP_GID "
    sed -i "s/^app:x:1001:1001:/app:x:$ APP_UID :$ APP_GID :/" /etc/passwd
    sed -i "s/^app:x:1001:/app:x:$ APP_GID :/" /etc/group
  else
    echo "custom APP_UID and/or APP_GID not defined, using 1001:1001"
  fi
  chown -R app:app /srv /home/app
fi
echo "prepare environment"
# replace  % REMARK_URL %  by content of REMARK_URL variable
find /srv -regex '.*\.\(html\ js\ mjs\)$' -print \
  -exec sed -i "s % REMARK_URL % $ REMARK_URL  g"   \;
if [ -n "$ SITE_ID " ]; then
  #replace "site_id: 'remark'" by SITE_ID
  sed -i "s 'remark' '$ SITE_ID ' g" /srv/web/*.html
fi
echo "execute \"$*\""
if [ "$ uid " -eq "0" ]; then
  exec su-exec app "$@"
else
  exec "$@"
fi
The environment file used with remark42 for development is quite minimal:
env.dev
TIME_ZONE=Europe/Madrid
REMARK_URL=http://localhost:1313/remark42
SITE=blogops
SECRET=123456
ADMIN_SHARED_ID=sto
AUTH_ANON=true
EMOJI=true
And the nginx/default.conf file used to publish the service locally is simple too:
default.conf
server   
 listen 1313;
 server_name localhost;
 location /  
    proxy_pass http://hugo:1313;
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
  
 location /remark42/  
    rewrite /remark42/(.*) /$1 break;
    proxy_pass http://remark42:8080/;
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
   
 

Production setupThe VM where I m publishing the blog runs Debian GNU/Linux and uses binaries from local packages and applications packaged inside containers. To run the containers I m using docker-ce (I could have used podman instead, but I already had it installed on the machine, so I stayed with it). The binaries used on this project are included on the following packages from the main Debian repository:
  • git to clone & pull the repository,
  • jq to parse json files from shell scripts,
  • json2file-go to save the webhook messages to files,
  • inotify-tools to detect when new files are stored by json2file-go and launch scripts to process them,
  • nginx to publish the site using HTTPS and work as proxy for json2file-go and remark42 (I run it using a container),
  • task-spool to queue the scripts that update the deployment.
And I m using docker and docker compose from the debian packages on the docker repository:
  • docker-ce to run the containers,
  • docker-compose-plugin to run docker compose (it is a plugin, so no - in the name).

Repository checkoutTo manage the git repository I ve created a deploy key, added it to gitea and cloned the project on the /srv/blogops PATH (that route is owned by a regular user that has permissions to run docker, as I said before).

Compiling the site with hugoTo compile the site we are using the docker-compose.yml file seen before, to be able to run it first we build the container images and once we have them we launch hugo using docker compose run:
$ cd /srv/blogops
$ git pull
$ docker compose build
$ if [ -d "./public" ]; then rm -rf ./public; fi
$ docker compose run hugo --
The compilation leaves the static HTML on /srv/blogops/public (we remove the directory first because hugo does not clean the destination folder as jekyll does). The deploy script re-generates the site as described and moves the public directory to its final place for publishing.

Running remark42 with dockerOn the /srv/blogops/remark42 folder I have the following docker-compose.yml:
docker-compose.yml
version: "2"
services:
  remark42:
    build:
      context: ../docker/remark42
      dockerfile: ./Dockerfile
    image: sto/remark42
    env_file:
      - ../.env
      - ./env.prod
    container_name: remark42
    restart: always
    volumes:
      - ./var.prod:/srv/var
    ports:
      - 127.0.0.1:8042:8080
The ../.env file is loaded to get the APP_UID and APP_GID variables that are used by my version of the init.sh script to adjust file permissions and the env.prod file contains the rest of the settings for remark42, including the social network tokens (see the remark42 documentation for the available parameters, I don t include my configuration here because some of them are secrets).

Nginx configurationThe nginx configuration for the blogops.mixinet.net site is as simple as:
server  
  listen 443 ssl http2;
  server_name blogops.mixinet.net;
  ssl_certificate /etc/letsencrypt/live/blogops.mixinet.net/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/blogops.mixinet.net/privkey.pem;
  include /etc/letsencrypt/options-ssl-nginx.conf;
  ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
  access_log /var/log/nginx/blogops.mixinet.net-443.access.log;
  error_log  /var/log/nginx/blogops.mixinet.net-443.error.log;
  root /srv/blogops/nginx/public_html;
  location /  
    try_files $uri $uri/ =404;
   
  include /srv/blogops/nginx/remark42.conf;
 
server  
  listen 80 ;
  listen [::]:80 ;
  server_name blogops.mixinet.net;
  access_log /var/log/nginx/blogops.mixinet.net-80.access.log;
  error_log  /var/log/nginx/blogops.mixinet.net-80.error.log;
  if ($host = blogops.mixinet.net)  
    return 301 https://$host$request_uri;
   
  return 404;
 
On this configuration the certificates are managed by certbot and the server root directory is on /srv/blogops/nginx/public_html and not on /srv/blogops/public; the reason for that is that I want to be able to compile without affecting the running site, the deployment script generates the site on /srv/blogops/public and if all works well we rename folders to do the switch, making the change feel almost atomic.

json2file-go configurationAs I have a working WireGuard VPN between the machine running gitea at my home and the VM where the blog is served, I m going to configure the json2file-go to listen for connections on a high port using a self signed certificate and listening on IP addresses only reachable through the VPN. To do it we create a systemd socket to run json2file-go and adjust its configuration to listen on a private IP (we use the FreeBind option on its definition to be able to launch the service even when the IP is not available, that is, when the VPN is down). The following script can be used to set up the json2file-go configuration:
setup-json2file.sh
#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
BASE_DIR="/srv/blogops/webhook"
J2F_DIR="$BASE_DIR/json2file"
TLS_DIR="$BASE_DIR/tls"
J2F_SERVICE_NAME="json2file-go"
J2F_SERVICE_DIR="/etc/systemd/system/json2file-go.service.d"
J2F_SERVICE_OVERRIDE="$J2F_SERVICE_DIR/override.conf"
J2F_SOCKET_DIR="/etc/systemd/system/json2file-go.socket.d"
J2F_SOCKET_OVERRIDE="$J2F_SOCKET_DIR/override.conf"
J2F_BASEDIR_FILE="/etc/json2file-go/basedir"
J2F_DIRLIST_FILE="/etc/json2file-go/dirlist"
J2F_CRT_FILE="/etc/json2file-go/certfile"
J2F_KEY_FILE="/etc/json2file-go/keyfile"
J2F_CRT_PATH="$TLS_DIR/crt.pem"
J2F_KEY_PATH="$TLS_DIR/key.pem"
# ----
# MAIN
# ----
# Install packages used with json2file for the blogops site
sudo apt update
sudo apt install -y json2file-go uuid
if [ -z "$(type mkcert)" ]; then
  sudo apt install -y mkcert
fi
sudo apt clean
# Configuration file values
J2F_USER="$(id -u)"
J2F_GROUP="$(id -g)"
J2F_DIRLIST="blogops:$(uuid)"
J2F_LISTEN_STREAM="172.31.31.1:4443"
# Configure json2file
[ -d "$J2F_DIR" ]   mkdir "$J2F_DIR"
sudo sh -c "echo '$J2F_DIR' >'$J2F_BASEDIR_FILE'"
[ -d "$TLS_DIR" ]   mkdir "$TLS_DIR"
if [ ! -f "$J2F_CRT_PATH" ]   [ ! -f "$J2F_KEY_PATH" ]; then
  mkcert -cert-file "$J2F_CRT_PATH" -key-file "$J2F_KEY_PATH" "$(hostname -f)"
fi
sudo sh -c "echo '$J2F_CRT_PATH' >'$J2F_CRT_FILE'"
sudo sh -c "echo '$J2F_KEY_PATH' >'$J2F_KEY_FILE'"
sudo sh -c "cat >'$J2F_DIRLIST_FILE'" <<EOF
$(echo "$J2F_DIRLIST"   tr ';' '\n')
EOF
# Service override
[ -d "$J2F_SERVICE_DIR" ]   sudo mkdir "$J2F_SERVICE_DIR"
sudo sh -c "cat >'$J2F_SERVICE_OVERRIDE'" <<EOF
[Service]
User=$J2F_USER
Group=$J2F_GROUP
EOF
# Socket override
[ -d "$J2F_SOCKET_DIR" ]   sudo mkdir "$J2F_SOCKET_DIR"
sudo sh -c "cat >'$J2F_SOCKET_OVERRIDE'" <<EOF
[Socket]
# Set FreeBind to listen on missing addresses (the VPN can be down sometimes)
FreeBind=true
# Set ListenStream to nothing to clear its value and add the new value later
ListenStream=
ListenStream=$J2F_LISTEN_STREAM
EOF
# Restart and enable service
sudo systemctl daemon-reload
sudo systemctl stop "$J2F_SERVICE_NAME"
sudo systemctl start "$J2F_SERVICE_NAME"
sudo systemctl enable "$J2F_SERVICE_NAME"
# ----
# vim: ts=2:sw=2:et:ai:sts=2
Warning: The script uses mkcert to create the temporary certificates, to install the package on bullseye the backports repository must be available.

Gitea configurationTo make gitea use our json2file-go server we go to the project and enter into the hooks/gitea/new page, once there we create a new webhook of type gitea and set the target URL to https://172.31.31.1:4443/blogops and on the secret field we put the token generated with uuid by the setup script:
sed -n -e 's/blogops://p' /etc/json2file-go/dirlist
The rest of the settings can be left as they are:
  • Trigger on: Push events
  • Branch filter: *
Warning: We are using an internal IP and a self signed certificate, that means that we have to review that the webhook section of the app.ini of our gitea server allows us to call the IP and skips the TLS verification (you can see the available options on the gitea documentation). The [webhook] section of my server looks like this:
[webhook]
ALLOWED_HOST_LIST=private
SKIP_TLS_VERIFY=true
Once we have the webhook configured we can try it and if it works our json2file server will store the file on the /srv/blogops/webhook/json2file/blogops/ folder.

The json2file spooler scriptWith the previous configuration our system is ready to receive webhook calls from gitea and store the messages on files, but we have to do something to process those files once they are saved in our machine. An option could be to use a cronjob to look for new files, but we can do better on Linux using inotify we will use the inotifywait command from inotify-tools to watch the json2file output directory and execute a script each time a new file is moved inside it or closed after writing (IN_CLOSE_WRITE and IN_MOVED_TO events). To avoid concurrency problems we are going to use task-spooler to launch the scripts that process the webhooks using a queue of length 1, so they are executed one by one in a FIFO queue. The spooler script is this:
blogops-spooler.sh
#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
BASE_DIR="/srv/blogops/webhook"
BIN_DIR="$BASE_DIR/bin"
TSP_DIR="$BASE_DIR/tsp"
WEBHOOK_COMMAND="$BIN_DIR/blogops-webhook.sh"
# ---------
# FUNCTIONS
# ---------
queue_job()  
  echo "Queuing job to process file '$1'"
  TMPDIR="$TSP_DIR" TS_SLOTS="1" TS_MAXFINISHED="10" \
    tsp -n "$WEBHOOK_COMMAND" "$1"
 
# ----
# MAIN
# ----
INPUT_DIR="$1"
if [ ! -d "$INPUT_DIR" ]; then
  echo "Input directory '$INPUT_DIR' does not exist, aborting!"
  exit 1
fi
[ -d "$TSP_DIR" ]   mkdir "$TSP_DIR"
echo "Processing existing files under '$INPUT_DIR'"
find "$INPUT_DIR" -type f   sort   while read -r _filename; do
  queue_job "$_filename"
done
# Use inotifywatch to process new files
echo "Watching for new files under '$INPUT_DIR'"
inotifywait -q -m -e close_write,moved_to --format "%w%f" -r "$INPUT_DIR"  
  while read -r _filename; do
    queue_job "$_filename"
  done
# ----
# vim: ts=2:sw=2:et:ai:sts=2
To run it as a daemon we install it as a systemd service using the following script:
setup-spooler.sh
#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
BASE_DIR="/srv/blogops/webhook"
BIN_DIR="$BASE_DIR/bin"
J2F_DIR="$BASE_DIR/json2file"
SPOOLER_COMMAND="$BIN_DIR/blogops-spooler.sh '$J2F_DIR'"
SPOOLER_SERVICE_NAME="blogops-j2f-spooler"
SPOOLER_SERVICE_FILE="/etc/systemd/system/$SPOOLER_SERVICE_NAME.service"
# Configuration file values
J2F_USER="$(id -u)"
J2F_GROUP="$(id -g)"
# ----
# MAIN
# ----
# Install packages used with the webhook processor
sudo apt update
sudo apt install -y inotify-tools jq task-spooler
sudo apt clean
# Configure process service
sudo sh -c "cat > $SPOOLER_SERVICE_FILE" <<EOF
[Install]
WantedBy=multi-user.target
[Unit]
Description=json2file processor for $J2F_USER
After=docker.service
[Service]
Type=simple
User=$J2F_USER
Group=$J2F_GROUP
ExecStart=$SPOOLER_COMMAND
EOF
# Restart and enable service
sudo systemctl daemon-reload
sudo systemctl stop "$SPOOLER_SERVICE_NAME"   true
sudo systemctl start "$SPOOLER_SERVICE_NAME"
sudo systemctl enable "$SPOOLER_SERVICE_NAME"
# ----
# vim: ts=2:sw=2:et:ai:sts=2

The gitea webhook processorFinally, the script that processes the JSON files does the following:
  1. First, it checks if the repository and branch are right,
  2. Then, it fetches and checks out the commit referenced on the JSON file,
  3. Once the files are updated, compiles the site using hugo with docker compose,
  4. If the compilation succeeds the script renames directories to swap the old version of the site by the new one.
If there is a failure the script aborts but before doing it or if the swap succeeded the system sends an email to the configured address and/or the user that pushed updates to the repository with a log of what happened. The current script is this one:
blogops-webhook.sh
#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
# Values
REPO_REF="refs/heads/main"
REPO_CLONE_URL="https://gitea.mixinet.net/mixinet/blogops.git"
MAIL_PREFIX="[BLOGOPS-WEBHOOK] "
# Address that gets all messages, leave it empty if not wanted
MAIL_TO_ADDR="blogops@mixinet.net"
# If the following variable is set to 'true' the pusher gets mail on failures
MAIL_ERRFILE="false"
# If the following variable is set to 'true' the pusher gets mail on success
MAIL_LOGFILE="false"
# gitea's conf/app.ini value of NO_REPLY_ADDRESS, it is used for email domains
# when the KeepEmailPrivate option is enabled for a user
NO_REPLY_ADDRESS="noreply.example.org"
# Directories
BASE_DIR="/srv/blogops"
PUBLIC_DIR="$BASE_DIR/public"
NGINX_BASE_DIR="$BASE_DIR/nginx"
PUBLIC_HTML_DIR="$NGINX_BASE_DIR/public_html"
WEBHOOK_BASE_DIR="$BASE_DIR/webhook"
WEBHOOK_SPOOL_DIR="$WEBHOOK_BASE_DIR/spool"
WEBHOOK_ACCEPTED="$WEBHOOK_SPOOL_DIR/accepted"
WEBHOOK_DEPLOYED="$WEBHOOK_SPOOL_DIR/deployed"
WEBHOOK_REJECTED="$WEBHOOK_SPOOL_DIR/rejected"
WEBHOOK_TROUBLED="$WEBHOOK_SPOOL_DIR/troubled"
WEBHOOK_LOG_DIR="$WEBHOOK_SPOOL_DIR/log"
# Files
TODAY="$(date +%Y%m%d)"
OUTPUT_BASENAME="$(date +%Y%m%d-%H%M%S.%N)"
WEBHOOK_LOGFILE_PATH="$WEBHOOK_LOG_DIR/$OUTPUT_BASENAME.log"
WEBHOOK_ACCEPTED_JSON="$WEBHOOK_ACCEPTED/$OUTPUT_BASENAME.json"
WEBHOOK_ACCEPTED_LOGF="$WEBHOOK_ACCEPTED/$OUTPUT_BASENAME.log"
WEBHOOK_REJECTED_TODAY="$WEBHOOK_REJECTED/$TODAY"
WEBHOOK_REJECTED_JSON="$WEBHOOK_REJECTED_TODAY/$OUTPUT_BASENAME.json"
WEBHOOK_REJECTED_LOGF="$WEBHOOK_REJECTED_TODAY/$OUTPUT_BASENAME.log"
WEBHOOK_DEPLOYED_TODAY="$WEBHOOK_DEPLOYED/$TODAY"
WEBHOOK_DEPLOYED_JSON="$WEBHOOK_DEPLOYED_TODAY/$OUTPUT_BASENAME.json"
WEBHOOK_DEPLOYED_LOGF="$WEBHOOK_DEPLOYED_TODAY/$OUTPUT_BASENAME.log"
WEBHOOK_TROUBLED_TODAY="$WEBHOOK_TROUBLED/$TODAY"
WEBHOOK_TROUBLED_JSON="$WEBHOOK_TROUBLED_TODAY/$OUTPUT_BASENAME.json"
WEBHOOK_TROUBLED_LOGF="$WEBHOOK_TROUBLED_TODAY/$OUTPUT_BASENAME.log"
# Query to get variables from a gitea webhook json
ENV_VARS_QUERY="$(
  printf "%s" \
    '(.             @sh "gt_ref=\(.ref);"),' \
    '(.             @sh "gt_after=\(.after);"),' \
    '(.repository   @sh "gt_repo_clone_url=\(.clone_url);"),' \
    '(.repository   @sh "gt_repo_name=\(.name);"),' \
    '(.pusher       @sh "gt_pusher_full_name=\(.full_name);"),' \
    '(.pusher       @sh "gt_pusher_email=\(.email);")'
)"
# ---------
# Functions
# ---------
webhook_log()  
  echo "$(date -R) $*" >>"$WEBHOOK_LOGFILE_PATH"
 
webhook_check_directories()  
  for _d in "$WEBHOOK_SPOOL_DIR" "$WEBHOOK_ACCEPTED" "$WEBHOOK_DEPLOYED" \
    "$WEBHOOK_REJECTED" "$WEBHOOK_TROUBLED" "$WEBHOOK_LOG_DIR"; do
    [ -d "$_d" ]   mkdir "$_d"
  done
 
webhook_clean_directories()  
  # Try to remove empty dirs
  for _d in "$WEBHOOK_ACCEPTED" "$WEBHOOK_DEPLOYED" "$WEBHOOK_REJECTED" \
    "$WEBHOOK_TROUBLED" "$WEBHOOK_LOG_DIR" "$WEBHOOK_SPOOL_DIR"; do
    if [ -d "$_d" ]; then
      rmdir "$_d" 2>/dev/null   true
    fi
  done
 
webhook_accept()  
  webhook_log "Accepted: $*"
  mv "$WEBHOOK_JSON_INPUT_FILE" "$WEBHOOK_ACCEPTED_JSON"
  mv "$WEBHOOK_LOGFILE_PATH" "$WEBHOOK_ACCEPTED_LOGF"
  WEBHOOK_LOGFILE_PATH="$WEBHOOK_ACCEPTED_LOGF"
 
webhook_reject()  
  [ -d "$WEBHOOK_REJECTED_TODAY" ]   mkdir "$WEBHOOK_REJECTED_TODAY"
  webhook_log "Rejected: $*"
  if [ -f "$WEBHOOK_JSON_INPUT_FILE" ]; then
    mv "$WEBHOOK_JSON_INPUT_FILE" "$WEBHOOK_REJECTED_JSON"
  fi
  mv "$WEBHOOK_LOGFILE_PATH" "$WEBHOOK_REJECTED_LOGF"
  exit 0
 
webhook_deployed()  
  [ -d "$WEBHOOK_DEPLOYED_TODAY" ]   mkdir "$WEBHOOK_DEPLOYED_TODAY"
  webhook_log "Deployed: $*"
  mv "$WEBHOOK_ACCEPTED_JSON" "$WEBHOOK_DEPLOYED_JSON"
  mv "$WEBHOOK_ACCEPTED_LOGF" "$WEBHOOK_DEPLOYED_LOGF"
  WEBHOOK_LOGFILE_PATH="$WEBHOOK_DEPLOYED_LOGF"
 
webhook_troubled()  
  [ -d "$WEBHOOK_TROUBLED_TODAY" ]   mkdir "$WEBHOOK_TROUBLED_TODAY"
  webhook_log "Troubled: $*"
  mv "$WEBHOOK_ACCEPTED_JSON" "$WEBHOOK_TROUBLED_JSON"
  mv "$WEBHOOK_ACCEPTED_LOGF" "$WEBHOOK_TROUBLED_LOGF"
  WEBHOOK_LOGFILE_PATH="$WEBHOOK_TROUBLED_LOGF"
 
print_mailto()  
  _addr="$1"
  _user_email=""
  # Add the pusher email address unless it is from the domain NO_REPLY_ADDRESS,
  # which should match the value of that variable on the gitea 'app.ini' (it
  # is the domain used for emails when the user hides it).
  # shellcheck disable=SC2154
  if [ -n "$ gt_pusher_email##*@"$ NO_REPLY_ADDRESS " " ] &&
    [ -z "$ gt_pusher_email##*@* " ]; then
    _user_email="\"$gt_pusher_full_name <$gt_pusher_email>\""
  fi
  if [ "$_addr" ] && [ "$_user_email" ]; then
    echo "$_addr,$_user_email"
  elif [ "$_user_email" ]; then
    echo "$_user_email"
  elif [ "$_addr" ]; then
    echo "$_addr"
  fi
 
mail_success()  
  to_addr="$MAIL_TO_ADDR"
  if [ "$MAIL_LOGFILE" = "true" ]; then
    to_addr="$(print_mailto "$to_addr")"
  fi
  if [ "$to_addr" ]; then
    # shellcheck disable=SC2154
    subject="OK - $gt_repo_name updated to commit '$gt_after'"
    mail -s "$ MAIL_PREFIX $ subject " "$to_addr" \
      <"$WEBHOOK_LOGFILE_PATH"
  fi
 
mail_failure()  
  to_addr="$MAIL_TO_ADDR"
  if [ "$MAIL_ERRFILE" = true ]; then
    to_addr="$(print_mailto "$to_addr")"
  fi
  if [ "$to_addr" ]; then
    # shellcheck disable=SC2154
    subject="KO - $gt_repo_name update FAILED for commit '$gt_after'"
    mail -s "$ MAIL_PREFIX $ subject " "$to_addr" \
      <"$WEBHOOK_LOGFILE_PATH"
  fi
 
# ----
# MAIN
# ----
# Check directories
webhook_check_directories
# Go to the base directory
cd "$BASE_DIR"
# Check if the file exists
WEBHOOK_JSON_INPUT_FILE="$1"
if [ ! -f "$WEBHOOK_JSON_INPUT_FILE" ]; then
  webhook_reject "Input arg '$1' is not a file, aborting"
fi
# Parse the file
webhook_log "Processing file '$WEBHOOK_JSON_INPUT_FILE'"
eval "$(jq -r "$ENV_VARS_QUERY" "$WEBHOOK_JSON_INPUT_FILE")"
# Check that the repository clone url is right
# shellcheck disable=SC2154
if [ "$gt_repo_clone_url" != "$REPO_CLONE_URL" ]; then
  webhook_reject "Wrong repository: '$gt_clone_url'"
fi
# Check that the branch is the right one
# shellcheck disable=SC2154
if [ "$gt_ref" != "$REPO_REF" ]; then
  webhook_reject "Wrong repository ref: '$gt_ref'"
fi
# Accept the file
# shellcheck disable=SC2154
webhook_accept "Processing '$gt_repo_name'"
# Update the checkout
ret="0"
git fetch >>"$WEBHOOK_LOGFILE_PATH" 2>&1   ret="$?"
if [ "$ret" -ne "0" ]; then
  webhook_troubled "Repository fetch failed"
  mail_failure
fi
# shellcheck disable=SC2154
git checkout "$gt_after" >>"$WEBHOOK_LOGFILE_PATH" 2>&1   ret="$?"
if [ "$ret" -ne "0" ]; then
  webhook_troubled "Repository checkout failed"
  mail_failure
fi
# Remove the build dir if present
if [ -d "$PUBLIC_DIR" ]; then
  rm -rf "$PUBLIC_DIR"
fi
# Build site
docker compose run hugo -- >>"$WEBHOOK_LOGFILE_PATH" 2>&1   ret="$?"
# go back to the main branch
git switch main && git pull
# Fail if public dir was missing
if [ "$ret" -ne "0" ]   [ ! -d "$PUBLIC_DIR" ]; then
  webhook_troubled "Site build failed"
  mail_failure
fi
# Remove old public_html copies
webhook_log 'Removing old site versions, if present'
find $NGINX_BASE_DIR -mindepth 1 -maxdepth 1 -name 'public_html-*' -type d \
  -exec rm -rf   \; >>"$WEBHOOK_LOGFILE_PATH" 2>&1   ret="$?"
if [ "$ret" -ne "0" ]; then
  webhook_troubled "Removal of old site versions failed"
  mail_failure
fi
# Switch site directory
TS="$(date +%Y%m%d-%H%M%S)"
if [ -d "$PUBLIC_HTML_DIR" ]; then
  webhook_log "Moving '$PUBLIC_HTML_DIR' to '$PUBLIC_HTML_DIR-$TS'"
  mv "$PUBLIC_HTML_DIR" "$PUBLIC_HTML_DIR-$TS" >>"$WEBHOOK_LOGFILE_PATH" 2>&1  
    ret="$?"
fi
if [ "$ret" -eq "0" ]; then
  webhook_log "Moving '$PUBLIC_DIR' to '$PUBLIC_HTML_DIR'"
  mv "$PUBLIC_DIR" "$PUBLIC_HTML_DIR" >>"$WEBHOOK_LOGFILE_PATH" 2>&1  
    ret="$?"
fi
if [ "$ret" -ne "0" ]; then
  webhook_troubled "Site switch failed"
  mail_failure
else
  webhook_deployed "Site deployed successfully"
  mail_success
fi
# ----
# vim: ts=2:sw=2:et:ai:sts=2

22 May 2022

Sergio Talens-Oliag: New Blog

Welcome to my new Blog for Technical Stuff. For a long time I was planning to start publishing technical articles again but to do it I wanted to replace my old blog based on ikiwiki by something more modern. I ve used Jekyll with GitLab Pages to build the Intranet of the ITI and to generate internal documentation sites on Agile Content, but, as happened with ikiwiki, I felt that things were kind of slow and not as easy to maintain as I would like. So on Kyso (the Company I work for right now) I switched to Hugo as the Static Site Generator (I still use GitLab Pages to automate the deployment, though), but the contents are written using the Markdown format, while my personal preference is the Asciidoc format. One thing I liked about Jekyll was that it was possible to use Asciidoctor to generate the HTML simply by using the Jekyll Asciidoc plugin (I even configured my site to generate PDF documents from .adoc files using the Asciidoctor PDF converter) and, luckily for me, that is also possible with Hugo, so that is what I plan to use on this blog, in fact this post is written in .adoc. My plan is to start publishing articles about things I m working on to keep them documented for myself and maybe be useful to someone else. The general intention is to write about Container Orchestration (mainly Kubernetes), CI/CD tools (currently I m using GitLab CE for that), System Administration (with Debian GNU/Linux as my preferred OS) and that sort of things. My next post will be about how I build, publish and update the Blog, but probably I will not finish it until next week, once the site is fully operational and the publishing system is tested.
Spoiler Alert: This is a personal site, so I m using Gitea to host the code instead of GitLab. To handle the deployment I ve configured json2file-go to save the data sent by the hook calls and process it asynchronously using inotify-tools. When a new file is detected a script parses the JSON file using jq and builds and updates the site if appropriate.

15 February 2015

Sergio Talens-Oliag: Retooling

I haven't blogged for a long time, but I've decided that I'm going to try to write again, at least about technical stuff. My plan was to blog about the projects I've been working on lately, the main one being the setup of the latest version of Kolab with the systems we already have at work, but I'll do that on the next days. Today I'm just going to make a list of the tools I use on a daily basis and my plans to start using additional ones in the near future. Shells, Terminals and Text Editors I do almost all my work on Z Shell sessions running inside tmux; for terminal emulation I use gnome-terminal on X, VX ConnectBot on Android systems and iTerm2 on Mac OS X. For text editing I've been using Vim for a long time (even on Mobile devices) and while I'm aware I don't know half of the things it can do, what I know is good enough for my day to day needs. In the past I also used Emacs as a programming editor and my main tool to write HTML, SGML and XML, but since I haven't really needed an IDE for a long time and I mainly use Lightweight Markup Languages I haven't used it for a long time (I briefly tried to use Org mode, but for some reason I ended up leaving it). Documentation formats and tools Since a long time ago I've been an advocate of Lightweight Markup Languages; I started to use LaTeX and Lout, then moved to SGML/XML formats (LinuxDoc and DocBook) and finally moved to plain text based formats. I started using Wiki formats (parsewiki) and soon moved to reStructuredText; I also use other markup languages like Markdown (for this blog, aka ikiwiki) and tried MultiMarkdown to replace reStructuredText for general use, but as I never liked Markdown syntax I didn't liked an extended version of it. While I've been using ReStructuredText for a long time, I recently found Asciidoctor and the Asciidoc format and I guess I'll be using it instead of rst whenever I can (I still need to try the slide backends and conversions to ODT, but if that works I guess I'll write all my new documents using Asciidoc). Programming languages I'm not a developer, but I read and patch a lot of free software code written on a lot of different programming languages (I wouldn't be able to write whole programs on most of them, but thanks to Stack Overflow I'm usually able to fix what I need). Anyway, I'm able to program in some languages; I write a lot of shell scripts and I go for Python and C when I need something more complicated. On the near future I plan to read about javascript programming and nodejs (I'll probably need it at work) and I already started looking at Haskell (I guess it was time to learn about functional programming and after reading about it, it looks like haskell is the way to go for me). Version Control For a long time I've been a Subversion user, at least for my own projects, but seems that everything has moved to git now and I finally started to use it (I even opened a github account) and plan to move all my personal subversion repositories at home and at work to git, including the move of all my debian packages from svn-buildpackage to git-buildpackage. Further Reading With the previous plans in mind, I've started reading a couple of interesting books: Now I just need to get enough time to finish reading them ... ;)

1 October 2011

Sergio Talens-Oliag: Static website generators

The last month I was supposed to work on a OpenStack related project, but for administrative reasons it has been delayed and I've tried to do small tasks to be able to finish them quickly and start the work on the main project when the issues get solved. As the delay has been longer than expected last Wednesday I've realized than on the last weeks I did a lot of small system administration tasks:
  • With a co-worker I started to work on a GNU/Linux version of our firewall based on Shorewall to handle the rules and conntrackd and keepalived to make it highly available (I had to stop my work on the Debian GNU/kFreeBSD based firewall a long time ago, and this summer the old firewalls' hardware started to fail, so a migration from Linux to Linux makes sense now, as it will be faster and a future migration will be simpler, as we will have a cleaner set of rules and better documentation),
  • I installed and configured an instance of a web based File Exchange server (F*EX),
  • I installed and configured an instance of a pastebin clone,
  • I installed and configured an instance of ProFTPD that works only as a SFTP server using virtual users (without shell access),
  • I also installed an instance of a web based event management system called indico that is being used to manage a conference and probably will be used for other events in the future,
  • I installed and patched some plugins in our Trac servers,
  • I tried a groupware system called SOGo that we will probable deploy in a week or two,
  • And updated and fixed configurations of some other services,
With all the changes I did I noticed that I had to do something with our Intranet server; it is just a reverse proxy for a lot of different web services and its main page was one static HTML page with links to them, nothing else. In the long term maybe we will replace it with something based on Drupal or Lifeay, but for now I just wanted something to be able to organize the links and provide some information about the services for the new users without having to write HTML (I really like Agile Documentation Tools that let me focus on the content and forget about the markup), and started to look at some of them. My first idea was to use ikiwiki, as it has all the features I was looking for: I can use Markdown or reStructuredText to write the contents, the source pages are easily handled on a Version Control System, it supports the use of templates for the HTML, etc., but it seemed to me that using ikiwiki was like killing flies with a cannon (that's a Spanish say, I guess it's easy to understand it in English, no?) and I decided to review other tools to build static web sites. To make a long story short, I selected some tools that met my requirements and looked nice on their demo sites; after my first review I thought that Hyde was going to be my bet, as it uses technologies I'm already familiar with, but after trying it I saw that I was going to have a problem with documentation (the current Hyde version lacks it) and it was going to be more complicated that using ikiwiki. Before giving up I decided to review simpler tools, just in case, and after looking some of them I ended up using poole, a simple python script (the source is just one file and it only requires python-markdown to work). Before moving to the content I tried to adapt a couple of free themes to be used by the tool, but I didn't liked the result, so I went back to the plain style provided by the tool and added a logo and a background. With that simple look and feel I started to work with the content, splitting it into eight markdown files and a python macro to include a file that has all the links used on the site. While trying to make the main page look good I noticed how little I know about CSS, but using search engines I was able to build a two column block into the main page and publish the contents and with the help of some CSS enabled co-workers I changed the look and feel of the site in about 30 minutes. In summary, if you want a really simple website, you know a little bit of python and don't want to spend much time learning how to use a website generator then Poole is a good option. If you want something more complex I still think that ikiwiki is a good option, but YMMV.

30 November 2010

Sergio Talens-Oliag: freebsd-utils-8.1-2 to 2.4.patch

Sergio Talens-Oliag: The FreakyWall (Part 3: Packages)

In this post I'll describe the changes made to the kernel and some of the Squeeze packages for the Freaky Wall. The plan is to submit whishlist bugs to the BTS on the hope of having all what is needed for this project available on Debian after the Squeeze release, as my feeling is that a freeze is not the right time to push this changes... ;) I'm giving access here to all the changes made to the source packages, but if anyone wants the binary packages (amd64 only) send me an email and I'll give you the URL of an apt repository that contains all the modified packages (it's the one at work, that contains other modified packages) or, if there is interest, I can put them on people.debian.org. Kernel To be able to build the firewall we need a kFreeBSD kernel with some options not compiled on the version distributed with Debian. To compile the kernel I've followed the procedure described on the following debian-bsd mailing list post: http://lists.debian.org/debian-bsd/2010/09/msg00023.html Basically I've done the following:
    apt-get build-dep kfreebsd-8
    apt-get source kfreebsd-8
    cd kfreebsd-8-8.1
    cat >> debian/arch/amd64/amd64.config << EOF
    # Add pflog, pfsync, ALTQ and CARP support
    # ----------------------------------------
    # http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/firewalls-pf.html
    device      pf
    device      pflog
    device      pfsync
    options         ALTQ
    options         ALTQ_CBQ        # Class Bases Queuing (CBQ)
    options         ALTQ_RED        # Random Early Detection (RED)
    options         ALTQ_RIO        # RED In/Out
    options         ALTQ_HFSC       # Hierarchical Packet Scheduler (HFSC)
    options         ALTQ_PRIQ       # Priority Queuing (PRIQ)
    options         ALTQ_NOPCC      # Required for SMP build
    # http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/carp.html
    device      carp
    EOF
    vi debian/changelog 
    dpkg-buildpackage -B -uc
Once the package was built I installed the new kernel package and rebooted the machine. Utilities To be able to use some utilities related to pf I have built patched versions of three packages:
  • freebsd-utils: I have included pflogd and ftp-proxy on the package and have added some options to allow the use of additional interface types with ifconfig (carp, pfsync, lagg, bridges, ...). There were a lot of changes needed:
    1. The pflogd and ftp-proxy Makefiles are missing from the original tarball, I modified the get-orig-source of the debian/rules, but I build my packages against the original tarball, with the extra files included directly on the package .diff.gz.
    2. The pflogd daemon needs a _pflogd user and group and a /var/run/_pflogd directory, so I've added the directory and the creation of the user and group to the package post-install.
    3. The ftp-proxy daemon uses the proxy user when dropping privileges and I've modified the code to make it use the /var/run/ftp-proxy directory for the chroot.
    4. Some C header files that are not available on -dev packages were missing and I put them on the packages' debian directory. I've created a debian/include directory and moved there the original debian/net directory and added the headers debian/include/net/if_bridgevar.h and debian/include/net/if_lagg.h to add bridging support to ifconfig and the debian/include/pcap-config.h and debian/include/pcap-int.h libpcap private headers to be able to compile the pflogd binary.
    As I'm not familiar with the way people changes code for kFreeBSD some of the patches are a little bit dirty, but at least things work; besides, probably I should also have had to include init.d scripts for pf, pflogd and ftp-proxy, but I have not done it at the package level yet, as what I got was enough to work with the tools. The debdiff against the freebsd-utils-8.1-2 source package is available here or here.
  • libpcap: A test had to be removed in order to be able to support pflog on the library; the debdiff against the libpcap-1.1.1-2 package is available here or here.
  • tcpdump: The package also had to be modified to include the code to print the pflogd entries on the pcap file; the debdiff against tcpdump_4.1.1-1 is available here or here.
On the next post I'll describe how I've configured the system, the network interfaces and the different utilities patched and compiled on this post.

Sergio Talens-Oliag: libpcap-1.1.1-2 to 2.1.patch

Sergio Talens-Oliag: tcpdump 4.1.1-1 to 1.1.patch

24 November 2010

Sergio Talens-Oliag: The Freaky Wall (Part 2: Initial Installation)

For the Freaky Wall I have installed a Debian GNU/kFreeBSD system using the installer with ZFS support announced on: http://robertmh.wordpress.com/2010/09/06/debian-installer-with-zfs/ I used the mini.iso found on: http://people.debian.org/~rmh/zfs/kfreebsd-amd64/monolithic/mini.iso the 12th of October of 2010; as I had some problems and reported them to Robert is possible that the current image solves part of them. Installation plan I did a standard installation on a machine with two hard disks, but only used the first one from the installer. The plan was to use ZFS with RAID-1, but current versions of grub do not support booting from a ZFS + RAID file system, so I had to use the same technique used for Linux for a long time; three partitions: a swap partition, a small /boot partition and a big partition for /; / and /boot were formated to use ZFS. First reboot After the installation the system failed to boot because of a bug when building the /boot/grub/grub.cfg (some routes were missing a //@ prefix); to be able to boot Iwe edited the config on the grub prompt and later fixed the file:
    --- grub.cfg.orig       2010-10-13 16:40:39.000000000 +0200
    +++ grub.cfg    2010-10-13 18:38:47.535436766 +0200
    @@ -64,7 +64,7 @@
            set root='(hd0,1)'
            search --no-floppy --fs-uuid --set a371979bb836d1fe
            echo                    'Loading kernel of FreeBSD 8.1-1-amd64 ...'
    -       kfreebsd                /kfreebsd-8.1-1-amd64.gz
    +       kfreebsd                //@/kfreebsd-8.1-1-amd64.gz
            insmod part_msdos
            insmod zfs
            set root='(hd0,3)'
    @@ -75,7 +75,7 @@
            insmod zfs
            set root='(hd0,1)'
            search --no-floppy --fs-uuid --set a371979bb836d1fe
    -       kfreebsd_module         /zfs/zpool.cache type=/boot/zfs/zpool.cache
    +       kfreebsd_module         //@/zfs/zpool.cache type=/boot/zfs/zpool.cache
            set kFreeBSD.vfs.root.mountfrom=zfs:dkfbf1-ad4s3
            set kFreeBSD.vfs.root.mountfrom.options=rw
      
I haven't tested the installer since that day, but I believe that the current ZFS installer was fixed by Robert to deal with that problem. Once the system was booted I had to fix a couple of things:
  • The keyboard configuration was wrong, but it was easy to fix the Debian Way:
    dpkg-reconfigure kbdcontrol
    
  • The /boot partition was mounted on /target/boot, as that was what was recorded on the ZFS file system; to fix it I executed the following commands:
     # zfs set mountpoint=/     dkfbf1-ad4s3
     # zfs set mountpoint=/boot dkfbf1-ad4s1
    
    Where dkfbf1-ad4s3 is the root file system and dkfbf1-ad4s1 is the original /boot. I reported that to Robert also and I believe it is fixed on the ZFS installer now.
Adjusting ZFS to do RAID-1 On the second disk I created the same partitions as the ones on the first disk using parted; the final result was:
    # parted -l     
    Model: ST3250620NS/3BKS (ide)
    Disk /dev/ad6: 250GB
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos
    Number  Start   End     Size    Type     File system  Flags
     1      1049kB  256MB   255MB   primary
     2      256MB   4256MB  4000MB  primary
     3      4256MB  250GB   246GB   primary
    Model: ST3250620NS/3BKS (ide)
    Disk /dev/ad4: 250GB
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos
    Number  Start   End     Size    Type     File system     Flags
      1      1049kB  256MB   255MB   primary
      2      256MB   4256MB  4000MB  primary  linux-swap(v1)
      3      4256MB  250GB   246GB   primary
To use the second partition of both disks as swap I added the following to /etc/fstab:
    /dev/ad4s2      none            swap    sw              0       0
    /dev/ad6s2      none            swap    sw              0       0
To configure the mirroring for the root file system I did the following:
    zpool attach dkfbf1-ad4s3 ad4s3 ad6s3
As the /boot can't work as a replica I adjusted it to make two copies of every file:
    zfs   set copies=2     dkfbf1-ad4s1
Leaving the second disk copy alone, although my plan is to configure it to hold a copy of the /boot partition synchronized with rsync each night. After all those changes the system didn't boot, as the grub-pc generates a buggy /boot/grub/grub.cfg; the problem is on the /etc/grub.d/10_kfreebsd section:
    ### BEGIN /etc/grub.d/10_kfreebsd ###
    menuentry 'Debian GNU/kFreeBSD, with kFreeBSD 8.1-1-amd64' --class debian \
      --class gnu-kfreebsd --class gnu --class os  
          insmod part_msdos
          insmod zfs
          set root='(hd0,1)'
          search --no-floppy --fs-uuid --set a371979bb836d1fe
          echo 'Loading kernel of FreeBSD 8.1-1-amd64 ...'
          kfreebsd /kfreebsd-8.1-1-amd64.gz
          set kFreeBSD.vfs.root.mountfrom=unknown:/dev/ad4s3
          set kFreeBSD.vfs.root.mountfrom.options=rw
     
    ### END /etc/grub.d/10_kfreebsd ###
To fix it there has to be a copy of the modules for ZFS on the boot partition (in my case I moved the /lib/modules directory to /boot and created a link on the root partition to the new directory):
    cd /boot
    mkdir lib
    mv /lib/modules lib
    cd /lib
    ln -s ../boot/lib/modules
And instead of fixing the /etc/grub.d/10_kfreebsd code I wrote a new script (/etc/grub.d/09_zfs_kfreebsd) that creates the right config for my current configuration on the grub.cfg file:
    #!/bin/sh
    prefix=/usr
    exec_prefix=$ prefix 
    bindir=$ exec_prefix /bin
    libdir=$ exec_prefix /lib
    . $ libdir /grub/grub-mkconfig_lib
    prepare_boot_cache="$(prepare_grub_to_access_device $ GRUB_DEVICE_BOOT    sed -e "s/^/\t/")"
    kfreebsd_versions="$(ls /lib/modules/)"
    zfs_root_device="$(zfs list   awk '/\/$/   print $1  '   head -1)"
    for kversion in $kfreebsd_versions; do
      cat << EOF
    # Entry when using ZFS (we have issues with /etc/grub.d/10_kfreebsd)
    menuentry 'Debian GNU/kFreeBSD, with kFreeBSD $kversion and ZFS' --class debian --class gnu-kfreebsd --class gnu --class os  
    $ prepare_boot_cache 
          echo                    'Loading kernel of FreeBSD $kversion ...'
          kfreebsd                //@/kfreebsd-$kversion.gz
          kfreebsd_module_elf     //@/lib/modules/$kversion/opensolaris.ko
          kfreebsd_module_elf     //@/lib/modules/$kversion/zfs.ko
          kfreebsd_module         //@/zfs/zpool.cache type=/boot/zfs/zpool.cache
          set kFreeBSD.vfs.root.mountfrom=zfs:$zfs_root_device
          set kFreeBSD.vfs.root.mountfrom.options=rw
     
    EOF
    done
I solved the problem this way to have a working solution that does not break with squeeze upgrades, assuming that a future grub-pc package will deal well with my config and I'll be able to remove this script, but I guess I'll have to install it from backports. The entry generated by the script when called from update-grub will be similar to:
    ### BEGIN /etc/grub.d/09_zfs-kfreebsd ###
    # Entry when using ZFS (we have issues with /etc/grub.d/10_kfreebsd)
    menuentry 'Debian GNU/kFreeBSD, with kFreeBSD 8.1-1-amd64 @ ITI' --class debian --class gnu-kfreebsd --class gnu --class os  
          insmod part_msdos
          insmod zfs
          set root='(hd0,1)'
          search --no-floppy --fs-uuid --set a371979bb836d1fe
          echo                    'Loading kernel of FreeBSD 8.1-1-amd64 ...'
          kfreebsd                //@/kfreebsd-8.1-1-amd64.gz
          kfreebsd_module_elf     //@/lib/modules/8.1-1-amd64/opensolaris.ko
          kfreebsd_module_elf     //@/lib/modules/8.1-1-amd64/zfs.ko
          kfreebsd_module         //@/zfs/zpool.cache type=/boot/zfs/zpool.cache
          set kFreeBSD.vfs.root.mountfrom=zfs:dkfbf1-ad4s3
          set kFreeBSD.vfs.root.mountfrom.options=rw
     
    ### END /etc/grub.d/10_iti-kfreebsd ###
And after rebooting the machine with this new configuration the system boots OK. On my next post I'll continue explaining how to compile a kernel that supports the use of the OpenBSD Packet Filter and related technologies (CARP, pflog, etc.).

22 November 2010

Sergio Talens-Oliag: The Freaky Wall (Part 1: Why?)

This post and the next to come are about a project I'm doing at work that I've called The Freaky Wall. The project has its origins on the idea that using multiples technologies is better for security; almost all the servers I use are running Debian GNU/Linux and use iptables locally, so when I decided that we had to build new firewalls at work I thought it was a good idea to look at different technologies, that is, a different kernel and firewalling tools. As I wanted to avoid iptables and the Linux kernel my first idea was to go after the free BSD systems (FreeBSD, OpenBSD or NetBSD), and soon realized that pf (the OpenBSD Packet Filter) was the way to go; it has a clean syntax and includes advanced features like CARP and pfsync that allow me to build redundant firewalls. Before going after the standard systems I looked at pfSense a firewall appliance built on top of FreeBSD that uses a PHP interface to do everything. At first it seemed that it was going to be a good option, but soon I felt that I wasn't in control of what the system was doing and I had to change the PHP code to do trivial things (I wanted to configure IP aliases on a CARP interface and it was not possible with the web interface, while it is trivial to do using the standard system configuration files), so I left the idea of using it. The second option was to use OpenBSD directly, as it is the system were pf has been developed. Soon I saw that I was going to be able to do what I wanted with the system, but I missed the Debian's way of installing and upgrading the system and the list of packages available. For different reasons the firewall project was left in a limbo for a little while and when I went back to it I already had to upgrade my test systems to a new OpenBSD release; after reading a little bit about how to upgrade and not liking the idea of doing it I remembered that jordi suggested that if we only want the kernel and the firewall tools the Debian GNU/kFreeBSD port could be an option instead of OpenBSD or FreeBSD. Before trying to install the Debian GNU/kFreeBSD system I saw Robert's post about a Debian installer with ZFS support and I decided to start with it, as the use of ZFS will allow us to use software RAID-1 and snapshots, something we have on almost all our Linux servers (we use software RAID for redundancy and LVM snapshots to be able to do our backups at any time of the day with consistent data, but that is for another post). On my next post I'll explain how I did the initial installation with ZFS, and after that I'll explain the changes I did to the kernel and some of the packages to be able to build a firewall as described on the Firewalling with PF document (that is, I needed pfctl, pflogd, a tcpdump with pflog support, the pf's ftp-proxy, etc.) and on the last document I'll explain how I've configured the firewalls The Debian Way .

26 October 2010

Sergio Talens-Oliag: Debian Squeeze, PowerPC and the Linux Containers

Two kids, their really busy mother and my paid job leave me without much time to blog or do Debian related work lately (well, at least on my free time, I do Debian related things at work, but mostly as a user, not as a developer). Anyway, a couple of weeks ago I decided it was time to upgrade my home servers to Squeeze and I did it, but it was harder than expected. At home I'm using two old laptops as servers, an old Aluminium PowerBook and an Asus EeePC; the Asus was installed to replace an older PowerBook (a really old one, BTW) that I was using as home server since my father gave it to me. The plan was to use OpenVZ on the Asus to move all the PowerPC services to a couple of Virtual Environments, but as I wanted to migrate and change almost all the services I never got enough free time to finish the job and when the old PowerBook hardware failed I replaced it with another PowerBook that I wasn't using anymore, but instead of reinstalling the machine I did a clean Lenny install using a kernel with support for linux-vserver (OpenVZ does not work on PowerPC) and transformed the old machine installation (it was an Etch installation at the time) into a Virtual Private Server that run on the new hardware. Having both systems running I upgraded the VPS to Lenny and, as usually happens, left the things as they were without consolidating the services into only one machine, as I initially planned. With this state of affairs I upgraded the Asus to Squeeze without much trouble (in fact I installed a kernel without OpenVZ support, as the services I use from this laptop were running on the host and not on a VE) and did the same with the PowerPC host, but to my surprise the linux-vserver VPS failed to start with a message that seemed to imply that the VServer support was not enabled. I should have filled a bug on the BTS then, but as I looked into how to solve the issue I found bugs saying that the meaning of the message was that I had no support for linux-vserver and I needed to start the VPS ASAP, as it was the machine that runs my SMTP server. Before doing a restore of my last backup I did some digging and found a lot of messages recommending to move OpenVZ and Linux-VServer virtual machines to LXC and decided to give it a try. First I built a container on the Asus and it worked OK, after that I did the same on the PowerPC, but the script failed; luckily the patch was trivial, the problem was on the /usr/lib/lxc/templates/lxc-debian script; it uses arch to get the Debian architecture, but for powerpc it gives ppc instead of powerpc, so it needs to be fixed on the script (Note to self: I have to submit bug + patch to the lxc package to fix it). After creating this container and trying it I tried to boot my old VPS with a LXC configuration:
  • For the network I used a veth device attached to a bridge (I was already using a local bridge and the /etc/network/interfaces file on the container was right, as it was the one that I copied from the old real machine).
  • I also reviewed the containers' /etc/rc*.d contents, disabling the hardware related services (to do that I just followed the template script actions).
After a couple of tries I noticed that the system was not booting because it was missing the devices files needed; to fix it I copied the /dev directory of my first LXC test and using a chroot I also removed the udev packages from the container. After that last changes the machine booted as expected and all services were running OK. To summarize, I decided to do the move to LXC and fixed the configuration to boot the virtual machines on each restart:
  • First I moved the machines to /var/lib/lxc/, putting each container in a sub directory that includes the machine's config file and its rootfs.
  • Once I had that I linked the /var/lib/lxc/$CONTAINER/config files of the machines I want to boot on each host restart with names of the form /etc/lxc/$CONTAINER.conf and adjusted the /etc/default/lxc file accordingly:
  • To try everything I rebooted the machine and all worked fine.
I know that LXC is still missing some functionality (I hate the way the container stop function kills everything instead of doing a run-level change, I guess I'll be using hacks until I move to a newer kernel with the proper support enters into Debian), but having the code on the mainline kernel is a great bonus and the user level utilities are good enough for my home needs... and I hope they'll arrive to a point where we'll be able to migrate the OpenVZ containers at work (we are using Proxmox and the support of the OpenVZ patchset is starting to worry us). On my next post: The Freak Firewall or The Story of a HA Firewall based on OpenBSD's pf running on Debian GNU/kFreeBSD hosts.

16 September 2009

Sergio Talens-Oliag: Hugo meets Marc

As promised a photo of the first meeting between Hugo and Marc. Hugo meets Marc With luck this afternoon the whole family will be at home.

Sergio Talens-Oliag: Redmine

I've been using Subversion and Trac for some years now, and I have encouraged its use at work since the last couple of years, with the undesired effect of having to maintain four different Trac installations with different database systems (SQLite3 and PostgreSQL), plugins (more than 15 on the big servers), authentication systems (htpass files, LDAP and a database based system) and tons of projects published (two internal servers have 64 and 16 projects, one of the client system has 33 projects and there is only one single project installation, but it is living at a client's system). Yesterday night, while reading Planet Debian I found a post from John Goerzen about tools to replace Trac, including the option to use Git as the project VCS. In the post he talks about different options, mainly projects that I would categorize as issue tracking systems (mantis, roundup, etc.), but it also talks about Redmine, a project management system implemented using the Ruby on Rails framework that is similar to Trac. As it looked interesting I downloaded, installed and executed an instance in about 15 minutes (I love the systems that support sqlite3 for this quick tests, not having to touch real database servers speeds up simple tests a lot). I played a little bit with the system and I believe that I will spend some more time testing it at work next week, as it looks quite promising; the standard version has almost all the features I'm interested in without the need to install additional plugins and it can do most of the things I was missing from Trac to do lightweight project management. I evaluated ]project-open[ to use it together with Trac for our internal project management tasks, mainly because we miss important features from Trac, like having clean systems to view the tasks of a user in all projects or a clean way to do the project planning using tickets and gantt charts. Of course there are ways to do it, but the plugins I've tried are not as good and simple as I would like. The problem with the use of ]project-open[ is that I don't really like it for us, as it has tons of features that I feel we don't need nor will use and, on a first try, the system seemed difficult to deploy and maintain, probably because my lack of knowledge about OpenACS and TCL. In fact we still don't have ]po[ running at work because I was unable to to integrate the authentication system with our LDAP server on my first tries and have had no time to investigate further since then. The good thing about trying Redmine is that if we don't end up using it at least I can take the most of this opportunity by looking at Ruby on Rails and the Ruby Programming Language, at least from the administration side, as I have never looked at it seriously.

Next.