Publishing a Customized Docker Image with Apache Hop, Workflows, and Pipelines

Introduction

In my previous blog post on Docker container customization
(https://saikatsrecentworks.blogspot.com/2026/03/custom-hop-docker.html), I discussed how to make the customized environment easily accessible. This post completes that effort by sharing the fully prepared Docker image.

Docker Image Contents

This Docker image is built on Alpine Linux and comes with Java and Apache Hop pre-installed. In addition, it includes all components required for my project: automated information ingestion from Microsoft Office 365 using Microsoft Graph APIs (https://saikatsrecentworks.blogspot.com/2026/03/apache-hop-office365.html), powered by Python, Selenium, and Apache Hop.

The purpose and implementation details of these components are explained in my earlier articles.

Included components:

  • Python 3
  • pip (py3-pip)
  • wget
  • dpkg
  • Chromium and ChromeDriver (chromium-chromedriver)
  • Git
  • Selenium and BeautifulSoup (bs4) for Python

Accessing the Docker Image

The complete image is available on GitHub Container Registry:

https://github.com/users/basusk/packages/container/package/hop_customized_docker

You can pull the image using:

    docker pull ghcr.io/basusk/hop_customized_docker:latest

After pulling, start the container using the standard docker run command.
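The exact `docker run` options depend on how you plan to use the container. As a minimal sketch, the shell function below starts an interactive, throwaway container from the published image. The container name hop-office365 and the function name start_hop_container are hypothetical. Because this post does not describe the image's entry point, the sketch overrides it with /bin/sh. That override is an assumption; Alpine provides /bin/sh through BusyBox.

```shell
#!/bin/sh
# Image reference, as used in the docker pull command above.
HOP_IMAGE="ghcr.io/basusk/hop_customized_docker:latest"
# Hypothetical container name; override by setting HOP_CONTAINER beforehand.
HOP_CONTAINER="${HOP_CONTAINER:-hop-office365}"

# Start an interactive (-it), throwaway (--rm) container and open a shell.
# Assumption: --entrypoint /bin/sh replaces whatever entry point the image
# defines; Alpine provides /bin/sh through BusyBox.
# Any arguments are passed through as extra docker run options.
start_hop_container() {
  docker run -it --rm --name "$HOP_CONTAINER" "$@" \
    --entrypoint /bin/sh "$HOP_IMAGE"
}
```

Source the file, then call start_hop_container. You can pass extra `docker run` options as arguments, for example `start_hop_container -e TZ=UTC`. If you drop `--rm` and keep the container running, you can open a second shell in it with `docker exec -it hop-office365 /bin/sh`.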

If you encounter any issues, feel free to reach out at basusk@comcat.net.

Apache Hop Setup

Apache Hop (version 2.13) is installed inside the container at: 

    /opt/hop

Note: This is an older version. You can upgrade it in place if needed.

All workflows and pipelines are located at: 

    /home/hop/EmailHandling
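To confirm this layout from inside a running container, a quick check like the following can help. This is a sketch. check_hop_layout is a hypothetical helper, and its optional root-prefix argument exists only so the check can be tried against a scratch directory. The helper also lists the .hwf files it finds, which is useful for confirming the exact file name of the main workflow.

```shell
#!/bin/sh
# Hypothetical helper: verify the Hop layout described in this post.
# $1 is an optional root prefix (default: none, i.e. the real filesystem root),
# so the check can also be exercised against a scratch directory.
check_hop_layout() {
  root="${1:-}"
  status=0

  # hop-run.sh is Hop's command-line runner in the installation directory.
  if [ ! -x "$root/opt/hop/hop-run.sh" ]; then
    echo "not found or not executable: $root/opt/hop/hop-run.sh" >&2
    status=1
  fi

  # Directory holding the workflows and pipelines; print each .hwf file name.
  dir="$root/home/hop/EmailHandling"
  if [ -d "$dir" ]; then
    for wf in "$dir"/*.hwf; do
      # With no match the glob stays literal, so check the file really exists.
      [ -e "$wf" ] && basename "$wf"
    done
  else
    echo "not found: $dir" >&2
    status=1
  fi

  return "$status"
}
```

Inside the container, run check_hop_layout with no arguments. A non-zero exit status means one of the expected paths is missing.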

Hop Workflows and Pipelines

The main (parent) workflow is: 

    office365-workflow-main.hwf

(https://github.com/basusk/hop/blob/main/office365_workflow_main.hwf)

All related child workflows, pipelines, shell scripts, and Python scripts are available inside the Docker container as well as in this repository:

👉 https://github.com/basusk/hop

Implementation Overview

As described in my earlier blog post
(https://saikatsrecentworks.blogspot.com/2026/03/apache-hop-office365.html), the solution automates Office 365 data ingestion using a combination of Hop workflows, pipelines, and supporting scripts.

The implementation is structured in multiple stages.

Some workflows also have server-specific versions, adapted for execution on Linux servers or in Docker environments.

Getting Started

To use this setup:

  1. Pull and run the Docker image.
  2. Log in to the container and execute the main workflow via the Hop command line.
  3. Alternatively, clone the workflows repository locally (https://github.com/basusk/hop).
  4. Configure the required Azure and Office 365 credentials.
  5. Substitute the correct values in the email-attachment-download.properties file.
  6. Execute the main workflow to test the end-to-end job.
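For running the main workflow from the Hop command line inside the container, Hop's runner script hop-run.sh sits in the installation directory (/opt/hop here). The sketch below makes a few assumptions:

  • A workflow run configuration named local is defined in the active Hop setup.
  • The main workflow has the file name given above. The repository link spells it office365_workflow_main.hwf, so check the actual name in the container and set MAIN_WF if it differs.
  • The variable names and the run_main_workflow function are hypothetical.

```shell
#!/bin/sh
# Paths from this post; each can be overridden via the environment.
HOP_DIR="${HOP_DIR:-/opt/hop}"
MAIN_WF="${MAIN_WF:-/home/hop/EmailHandling/office365-workflow-main.hwf}"
# Assumption: a workflow run configuration named "local" is defined.
RUN_CONFIG="${RUN_CONFIG:-local}"

# Launch the main (parent) workflow with Hop's command-line runner.
# Any extra arguments are passed through unchanged to hop-run.sh.
run_main_workflow() {
  "$HOP_DIR/hop-run.sh" --file="$MAIN_WF" --runconfig="$RUN_CONFIG" "$@"
}
```

Complete steps 4 and 5 first, then call run_main_workflow inside the container. If your Hop setup needs further hop-run options, such as a specific project or log level, append them to the call.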

Final Notes

This setup is designed to reduce manual effort and enable end-to-end automation of Office 365 data ingestion workflows. Feedback and suggestions are always welcome.
