ivanch.me/content/posts/automated-changelogs-gitlab.md

---
title: "Automated Changelogs on GitLab"
date: 2023-05-15T22:38:55-03:00
draft: false
summary: "Changelog automation on GitLab CI"
---

Changelogs are useful, especially when you need to keep track of what changed in a release. But they can be a pain to write, especially if you have a lot of commits, several people working on the same project, lots of tasks, and so on. That makes them a good spot to put some **automation**.

There are a couple of ways to build an automated changelog system. We will focus on one that uses GitLab CI and the commit messages from the project. We will also assume that *releases are made through git tags*.

For this, we will start with a few requirements:
* Agree on a commit message pattern, for example: "[TASK-200] Fixing something for a task on Jira";
* Generate the release notes/changelog at a specific part of the pipeline (for example, the production release);
* The release notes generation will take place when a tag is created.

We will take advantage of these two commands:
1. `git log --pretty=format:"%s" --no-merges <tag>..HEAD` - This will give us the commit messages from the last tag to the HEAD;
2. `git describe --abbrev=0 --tags` - This will give us the latest tag.

## Creating a basic pipeline

Let's start by creating a basic pipeline that will run on the production release.

```yaml
run:
  script:
    - echo "Running the pipeline"

.generateChangelog:
  image: python:latest
  stage: test
  script:
    - echo "Generating changelog..."
    # Generate changelog here
  artifacts:
    name: changelog.txt
    paths:
      - changelog.txt
    when: always
    expire_in: 1 week

deploy:
  stage: deploy
  extends:
    - .generateChangelog
  rules:
    - if: $CI_COMMIT_TAG
      when: manual
  environment: production
```

We will output the changelog into a file named `changelog.txt` and then we will use the `artifacts` keyword to save it.

## Generating the changelog

Note that we set the image to `python:latest` on the `.generateChangelog` job because we will use a Python script to generate the changelog. Inside the script we will define two functions: one that returns the latest tag, and another that gets the commits between the latest tag and the HEAD.

To call commands on the OS we will use the `subprocess` module, and to get the output from a command we will use the `communicate()` method of `Popen`. Error handling is left out for now (more on this later).

```python
import subprocess as sp

def get_last_tag():
    pipe = sp.Popen('git describe --abbrev=0 --tags', shell=True, stdout=sp.PIPE, stderr=sp.PIPE)
    prev_tag, err = pipe.communicate()

    # A return code of 0 means the command was successful
    if pipe.returncode == 0:
        return prev_tag.strip()
    # Otherwise None is returned (see the error handling section)

def get_commits():
    prev_tag = get_last_tag().decode('utf-8')

    print('Previous tag: ' + prev_tag)

    # One subject line per commit; append --no-merges to skip merge commits here
    pipe = sp.Popen('git log --pretty=format:%s ' + prev_tag + '..HEAD', shell=True, stdout=sp.PIPE, stderr=sp.PIPE)

    commits, err = pipe.communicate()

    # Only dealing with 0 for now; anything else yields an empty list
    if pipe.returncode == 0:
        return commits.decode('utf-8').splitlines()

    return []
```

Calling `get_commits()` now returns a list of strings with every commit message since the last tag. Some of those are commits that we don't want to show on the changelog, for example: `Merge branch 'master' into 'develop'`. **This is where having a pattern will help.**

```python
def get_formatted_commits():
    commits = get_commits()

    formatted_commits = []

    for commit in commits:
        if commit.startswith('[TASK-') or commit.startswith('[BUG-'):
            formatted_commits.append(commit)

    return formatted_commits
```

This will give us only the commit messages that follow the pattern we want. We can further improve this by adding a regex, turning `formatted_commits` into a `set` of task numbers, doing some parsing, making API calls, whatever we want. For now, we will keep it simple and stick to the basics.
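As a rough sketch of the regex idea, assuming the `[TASK-200]` / `[BUG-17]` style prefixes used in this post, the commit messages could be reduced to a sorted list of unique task numbers. The `extract_task_ids` name and the exact pattern are illustrative assumptions, not part of the original script.

```python
import re

# Assumed prefix format: "[TASK-<number>]" or "[BUG-<number>]" at the start of the message.
TASK_PATTERN = re.compile(r'^\[((?:TASK|BUG)-\d+)\]')


def extract_task_ids(commits):
    """Return the unique task IDs found at the start of each commit message."""
    task_ids = set()
    for commit in commits:
        match = TASK_PATTERN.match(commit)
        if match:
            task_ids.add(match.group(1))
    # Sort by prefix, then numerically, so TASK-9 comes before TASK-10.
    return sorted(task_ids, key=lambda t: (t.split('-')[0], int(t.split('-')[1])))


if __name__ == '__main__':
    messages = [
        '[TASK-200] Fixing something for a task on Jira',
        '[TASK-200] Follow-up fix',
        '[BUG-17] Handle empty tag list',
        "Merge branch 'master' into 'develop'",
    ]
    print(extract_task_ids(messages))
```

Working with a set of IDs instead of raw messages means several commits for the same task collapse into one changelog entry, which is a convenient starting point for looking the tasks up elsewhere.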

## Writing the changelog

Now that we have the commits that we want, we can write them to a file. We will use the `open` function to open the file and write the commits to it.

```python
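def write_changelog():
    commits = get_formatted_commits()

    # One commit message per line
    with open('changelog.txt', 'w') as f:
        for commit in commits:
            f.write(commit + '\n')

# Entry point, so that `python changelog.py` actually writes the file
if __name__ == '__main__':
    write_changelog()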
```
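If the changelog needs a bit more structure, the formatting can be separated from the file writing. The sketch below is one possible layout and not part of the original script: `render_changelog`, `write_changelog_file`, the `tag` parameter, and the bullet format are all assumptions made for illustration.

```python
def render_changelog(tag, commits):
    """Build the changelog text: a header line followed by one bullet per commit."""
    lines = ['Changelog for ' + tag, '']
    if commits:
        lines.extend('* ' + commit for commit in commits)
    else:
        lines.append('No tracked changes.')
    return '\n'.join(lines) + '\n'


def write_changelog_file(path, tag, commits):
    """Write the rendered changelog to `path`, using UTF-8 explicitly."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(render_changelog(tag, commits))
```

Keeping the rendering in its own function makes it possible to check the output format without touching the file system or git.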

## Putting it all together on the pipeline yaml file

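Now that we have everything we need, we can put it all together in the pipeline YAML file. The Python code above is assumed to be saved as `changelog.py` in the repository root, since that is what the `script` section calls.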

```yaml
run:
  script:
    - echo "Running the pipeline"

.generateChangelog:
    image: python:latest
    stage: test
    script:
        - echo "Generating changelog..."
        - git tag -d $(git describe --abbrev=0 --tags) || true
        - python changelog.py
    artifacts:
        name: changelog.txt
        paths:
            - changelog.txt
        when: always
        expire_in: 1 week

deploy:
    stage: deploy
    extends:
        - .generateChangelog
    rules:
        - if: $CI_COMMIT_TAG
          when: manual
    environment: production
```

Note that we had to add the `git tag -d $(git describe --abbrev=0 --tags)` command to delete the latest tag from the job's local clone. The pipeline runs on the tag that was just created, so `git describe` would return that same tag, the range `<tag>..HEAD` would contain no commits, and the changelog would be empty. With that tag deleted locally, `git describe` returns the previous tag instead. The `|| true` is there to make sure that the pipeline doesn't fail if a tag doesn't exist.
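An alternative to deleting the tag locally would be to list the tags and pick the one right before the tag that triggered the pipeline. The sketch below assumes the tag names come from `git tag --sort=-creatordate` (newest first) and that the current tag is read from GitLab's predefined `CI_COMMIT_TAG` variable. `pick_previous_tag` is a hypothetical helper, not part of the script above.

```python
import os


def pick_previous_tag(tags_newest_first, current_tag):
    """Return the tag that comes right after `current_tag` in a newest-first list.

    Returns None when `current_tag` is the oldest tag or is not in the list.
    """
    if current_tag not in tags_newest_first:
        return None
    index = tags_newest_first.index(current_tag)
    if index + 1 < len(tags_newest_first):
        return tags_newest_first[index + 1]
    return None


if __name__ == '__main__':
    # Example data standing in for the output of `git tag --sort=-creatordate`.
    tags = ['v1.2.0', 'v1.1.0', 'v1.0.0']
    current = os.environ.get('CI_COMMIT_TAG', 'v1.2.0')
    print(pick_previous_tag(tags, current))
```

This leaves the local tags untouched, at the cost of relying on the tag creation dates being in release order.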

## Error handling

We can further improve this by adding some error handling. For example, if we don't have any tags, we can fall back to the hash of the first commit (the start of the git history).

```python
import subprocess as sp
import sys

def get_last_tag():
    pipe = sp.Popen('git describe --abbrev=0 --tags', shell=True, stdout=sp.PIPE, stderr=sp.PIPE)
    prev_tag, err = pipe.communicate()

    # If it's successful, we return the tag name
    if pipe.returncode == 0:
        return prev_tag.strip()

    # Otherwise, we fall back to the first commit hash
    # (the range <hash>..HEAD does not include that first commit itself)
    pipe = sp.Popen('git rev-list --max-parents=0 HEAD', shell=True, stdout=sp.PIPE, stderr=sp.PIPE)
    first_commit, err = pipe.communicate()

    # If it's successful, we return the first commit hash
    if pipe.returncode == 0:
        return first_commit.strip()

    # If that fails too, something else is wrong: print the error and exit
    print('Error: Could not get the first commit hash')
    print(err.strip().decode('utf-8'))
    sys.exit(1)
```
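The same "try one command, fall back to another" idea can also be written with `subprocess.run`. This is only a sketch under a few assumptions: `run_command`, `first_successful`, and `resolve_changelog_start` are hypothetical helper names, commands are passed as argument lists instead of shell strings, and the output is decoded as text.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments.

    Returns the stripped stdout on success, or None if the command
    cannot be started or exits with a non-zero code.
    """
    try:
        result = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    except FileNotFoundError:
        # The executable itself is missing (for example, git is not installed).
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def first_successful(commands):
    """Return the output of the first command that succeeds, or None."""
    for args in commands:
        output = run_command(args)
        if output is not None:
            return output
    return None


def resolve_changelog_start():
    """Latest tag if there is one, otherwise the first commit hash; exits on failure."""
    start = first_successful([
        ['git', 'describe', '--abbrev=0', '--tags'],
        ['git', 'rev-list', '--max-parents=0', 'HEAD'],
    ])
    if start is None:
        print('Error: could not find a starting point for the changelog')
        sys.exit(1)
    return start
```

Unlike `get_last_tag()` above, `resolve_changelog_start()` returns a `str`, so a caller would not need the extra `.decode('utf-8')`.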

Further error handling and improvements are possible; this is just a proof of concept. On another note, the code hasn't been tested *as is*, so there might be some errors.