MLOps Basics [Week 6]: CI/CD - GitHub Actions

What is CI/CD?

CI/CD is a coding philosophy and set of practices with which you can continuously build, test, and deploy iterative code changes.

This iterative process helps reduce the chance that you develop new code on top of buggy or failed previous versions. With this method, you strive for less human intervention, or even none at all, from the development of new code until its deployment.

A simple flow looks like:

[Figure: a simple CI/CD flow]

And a complex flow looks like:

[Figure: a complex CI/CD flow]

Let's stick to the basic flow for now.

There are many tools with which we can perform CI/CD.

I will be using GitHub Actions.

In this post, I will be going through the following topics:

  • Basics of GitHub Actions
  • First GitHub Action
  • Creating Google Service Account
  • Giving access to service account
  • Configuring DVC to use Google Service account
  • Configuring GitHub Action

Note: MLOps also includes model development as part of the cycle. This post covers only the DevOps part.

Basics of GitHub Actions



Since we are using GitHub as the version control system, we can use GitHub Actions right off the bat without setting up another tool. (This might not be the case if you are using a different version control system.)

GitHub Actions are just a set of instructions declared in YAML files.

These files need to be in a specific folder, .github/workflows, which has to be in the root directory of the repository (where the .git folder is present).

There are 5 main concepts in GitHub Actions:

  • Events: An event is a trigger for a workflow.

  • Jobs: A job defines the steps to run when a workflow is triggered. A workflow can contain multiple jobs.

  • Runners: A runner defines where the code runs. By default, GitHub runs the code on its own servers.

  • Steps: A step contains the actions to run. Each job can contain multiple steps.

  • Actions: An action contains the actual commands to run, like installing dependencies, testing code, etc.

🥇 First GitHub Action

Let's create the folder using the command:

mkdir -p .github/workflows

Now let's create a basic workflow file.

name: GitHub Actions Basic Flow
on: [push]
jobs:
  Basic-workflow:
    runs-on: ubuntu-latest
    steps:
      - name: Basic Information
        run: |
          echo "🎬 The job was automatically triggered by a ${{ github.event_name }} event."
          echo "💻 This job is now running on a ${{ runner.os }} server hosted by GitHub!"
          echo "🎋 Workflow is running on the branch ${{ github.ref }}"
      - name: Checking out the repository
        uses: actions/checkout@v2
      - name: Information after checking out
        run: |
          echo "💡 The ${{ github.repository }} repository has been cloned to the runner."
          echo "🖥️ The workflow is now ready to test your code on the runner."
      - name: List files in the repository
        run: |
          ls ${{ github.workspace }}
      - run: echo "🍏 This job's status is ${{ job.status }}."

Let's understand what's happening here:

  • We created a CI/CD workflow named GitHub Actions Basic Flow.

  • on defines the event that triggers the workflow. Here it is the push event: whenever a push happens on the repository, the workflow is triggered. There are more than 30 events that can trigger a workflow. Refer to the GitHub Actions documentation for more information.

  • The workflow contains a single job called Basic-workflow, running on ubuntu-latest.

  • The Basic-workflow job contains multiple steps (Basic Information, Checking out the repository, Information after checking out, List files in the repository).

  • The Basic Information step runs a few echo commands.

  • The Checking out the repository step checks out the repository. Here we are using actions/checkout@v2, which is an open-source action. Other available actions can be found on the GitHub Marketplace.

  • The Information after checking out step echoes some information about the repository and the runner.

  • The List files in the repository step lists the contents of the repository.
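Since on: [push] is only one of the available triggers, here is a minimal sketch of a workflow that reacts to a few other events. The branch name main and the job name show-trigger are assumptions made for illustration; adjust them to your repository.

```yaml
name: Trigger Examples

on:
  # Run only for pushes to the main branch (branch name is an assumption).
  push:
    branches: [main]
  # Run for pull requests that target the main branch.
  pull_request:
    branches: [main]
  # Allow starting the workflow manually from the Actions tab.
  workflow_dispatch:

jobs:
  show-trigger:
    runs-on: ubuntu-latest
    steps:
      # Print which of the events above started this run.
      - run: echo "Triggered by a ${{ github.event_name }} event."
```

Restricting push to specific branches keeps the workflow from running on every experimental branch, while workflow_dispatch is handy for re-running it by hand.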

Commit and push the file. Let's see how it looks on GitHub.

  • On GitHub, navigate to the main page of the repository.

  • Under your repository name, click Actions.

  • In the left sidebar, you can see the workflows.

  • You can see that a workflow has been triggered.

  • Select the latest run of the required workflow.

  • Select the job.

  • The job lists all the steps it ran.

  • The logs show the execution of the steps in the job.

  • The icon indicates whether the workflow ran successfully or not.

Now that we understand how to configure a basic workflow, let's configure a workflow for the project.

In the previous posts, we have seen how to push the model to DVC, pull the model from DVC, create a Docker container, and test it locally. Let's see how to do it using GitHub Actions.

βš™οΈ Creating Google Service Account

I have used Google Drive as the remote storage for the trained model. Pulling the model from Google Drive requires authentication. Normally, the first time we push to or pull from the remote, a link is shown; we open it, sign in, and copy and paste the code it returns back into the terminal.

That interactive step is difficult to automate. To be able to download the model and test it automatically in CI/CD, a service account can be used.

What are service accounts?

A service account is a Google account associated with your GCP project, and not a specific user. They are intended for scenarios where your code needs to access data on its own, e.g. running inside a Compute Engine, automatic CI/CD, etc. No interactive user OAuth authentication is needed.

  • Go to the GCP console and create a project.

  • To create a service account, navigate to IAM & Admin in the left sidebar and select Service Accounts. Click + CREATE SERVICE ACCOUNT. On the next screen, enter a service account name, e.g. "MLOps", and click Create.

  • Provide a name for the service account, such as model, and click Done.

  • Go to the Keys tab and create a new key. When prompted, choose JSON as the key type.

A JSON file will be downloaded.

⚠️ Be careful about sharing the key file with others.

  • Enable the Google Drive API for the project. Search for Google Drive API in the search bar and enable it for the project.

🤝 Giving access to service account

Now that the service account is created, give it access to the remote storage (Google Drive).

Go to Google Drive, navigate to the remote storage folder (MLOps), and add the service account's email address in the sharing permissions.

βš™οΈ Configuring DVC to use Google Service account

Now let's modify the dvc to use service account instead of acutal google account.

This can be done via

dvc remote add -d storage gdrive://19JK5AFbqOBlrFVwDHjTrf9uvQFtS0954
dvc remote modify storage gdrive_use_service_account true
dvc remote modify storage gdrive_service_account_json_file_path creds.json

Let's understand the commands here:

  • The first command creates a default remote named storage that points to gdrive://19JK5AFbqOBlrFVwDHjTrf9uvQFtS0954.

  • The second command configures the remote to use a service account.

  • The third command provides the credentials (the JSON key file created earlier, referenced here as creds.json).

We can test it by trying to pull the model using the command:

cd dvcfiles
dvc pull trained_model.dvc

Configuring GitHub Action


Before creating a new workflow, let's modify the Dockerfile to accommodate all the changes.

FROM huggingface/transformers-pytorch-cpu:latest

COPY ./ /app
WORKDIR /app

# install requirements
RUN pip install "dvc[gdrive]"
RUN pip install -r requirements_inference.txt

# initialise dvc
RUN dvc init --no-scm
# configuring remote server in dvc
RUN dvc remote add -d storage gdrive://19JK5AFbqOBlrFVwDHjTrf9uvQFtS0954
RUN dvc remote modify storage gdrive_use_service_account true
RUN dvc remote modify storage gdrive_service_account_json_file_path creds.json

# pulling the trained model
RUN dvc pull dvcfiles/trained_model.dvc

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

# running the application
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

NOTE: Do not share the credentials json file publicly.

Now let's create a new GitHub Actions workflow file in the .github/workflows folder, named build_docker_image.yaml.

name: Create Docker Container

on: [push]

jobs:
  mlops-container:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./week_6_github_actions
    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          ref: ${{ github.ref }}
      - name: Build container
        run: |
          docker network create data
          docker build --tag inference:latest .
          docker run -d -p 8000:8000 --network data --name inference_container inference:latest

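The file is pretty self-explanatory: on every push, it checks out the repository, builds the Docker image from the week_6_github_actions directory, and starts a container from that image.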
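The Dockerfile expects creds.json to be present in the build context, which would normally mean committing the key. One possible alternative, sketched below, is to store the key's contents as an encrypted repository secret and write it to creds.json just before the build. The secret name GDRIVE_SA_KEY and the workflow name are hypothetical; the secret would have to be created in the repository settings first.

```yaml
name: Create Docker Container With Secret

on: [push]

jobs:
  mlops-container:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./week_6_github_actions
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Write service account key
        env:
          # GDRIVE_SA_KEY is a hypothetical secret holding the JSON key contents.
          GDRIVE_SA_KEY: ${{ secrets.GDRIVE_SA_KEY }}
        # Written into the build context so the Dockerfile's COPY picks it up.
        run: printf '%s' "$GDRIVE_SA_KEY" > creds.json
      - name: Build container
        run: docker build --tag inference:latest .
```

This keeps the key out of the Git history, but the file is still copied into the image by COPY ./ /app, so the resulting image should be treated as sensitive as well.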

On GitHub, this will look like this:

[Screenshot: the workflow run on GitHub]
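Because docker run -d returns as soon as the container is started, a green build does not prove that the application stayed up. As a rough sketch, a step like the following could be appended to the job to check it; the 5-second wait is an arbitrary assumption, and the container name matches the one used in the workflow above.

```yaml
# Append under `steps:` of the mlops-container job, after "Build container".
- name: Check that the container is running
  run: |
    # Give the application a moment to start (arbitrary wait).
    sleep 5
    # Show the container's output so startup errors appear in the job logs.
    docker logs inference_container
    # Fail the step if the container has already exited.
    test "$(docker inspect -f '{{.State.Running}}' inference_container)" = "true"
```

If the application crashes on startup, the test command returns a non-zero exit code and the workflow run is marked as failed.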

🔚

This concludes the post. We have seen how to automatically create a Docker image using GitHub Actions. But the problem here is that the resulting container is not accessible from outside the runner. Also, configuring credentials through a JSON file is not good practice. In the next post, I will explore AWS S3 as the remote storage and AWS ECR for storing the Docker images.

The complete code for this post can also be found on GitHub.
