Publishing a Customized Docker Image with Apache Hop, Workflows, and Pipelines
Introduction
In my previous post (https://saikatsrecentworks.blogspot.com/2026/03/custom-hop-docker.html), I discussed making the customized environment easily accessible. This post completes that effort by sharing the fully prepared Docker image.
Docker Image Contents
The purpose and implementation details of these components are explained in my earlier articles.
Included components:
- Python 3
- pip (py3-pip)
- wget
- dpkg
- Chromium & Chromium Chromedriver
- Git
- Selenium and BeautifulSoup (bs4) for Python
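To confirm that the components listed above are present, a small check can be run from a shell inside the container. This is only a sketch: the function name check_tools is hypothetical, and the executable names in the usage comment (pip3, chromium, chromedriver) are assumptions that may differ in the image, for example chromium-browser instead of chromium.

```shell
# Report whether each named tool is on PATH.
# Returns 1 if at least one tool is missing, 0 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%-8s %s\n' "ok" "$tool"
        else
            printf '%-8s %s\n' "missing" "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage inside the container (executable names are assumptions):
#   check_tools python3 pip3 wget dpkg chromium chromedriver git
#   python3 -c "import selenium, bs4"   # fails if either module is absent
```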
Accessing the Docker Image
The complete image is available via GitHub Container Registry:
https://github.com/users/basusk/packages/container/package/hop_customized_docker
You can pull and run the image using:
docker pull ghcr.io/basusk/hop_customized_docker:latest
After pulling, start the container using the standard docker run command.
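As a minimal sketch of the pull-and-run step, the function below pulls the image and opens an interactive shell in a throwaway container. The function name hop_shell is hypothetical. Overriding the entrypoint with /bin/bash is an assumption: it presumes that bash exists in the image and that bypassing the image's default entrypoint is acceptable for interactive use. If it does not fit, drop the --entrypoint option.

```shell
IMAGE="ghcr.io/basusk/hop_customized_docker:latest"

# Pull the image, then start an interactive shell in a container
# that is removed on exit (--rm).
# ASSUMPTION: /bin/bash is available in the image.
hop_shell() {
    docker pull "$IMAGE" &&
        docker run --rm -it --entrypoint /bin/bash "$IMAGE"
}

# Usage: hop_shell
```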
If you encounter any issues, feel free to reach out at basusk@comcat.net
Apache Hop Setup
Apache Hop (version 2.13) is installed inside the container at:
/opt/hop
Note: This is an older version. You can upgrade it in place if needed.
All workflows and pipelines are located at:
/home/hop/EmailHandling
Hop Workflows and Pipelines
The main (parent) workflow is:
office365-workflow-main.hwf (https://github.com/basusk/hop/blob/main/office365_workflow_main.hwf)
All related child workflows, pipelines, shell scripts, and Python scripts are available inside the Docker container as well as in this repository:
👉 https://github.com/basusk/hop
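From a shell inside the container, the main workflow can be started with Hop's hop-run.sh launcher. The sketch below rests on several assumptions: the file name is taken from the repository URL (with underscores), the file is assumed to sit directly under /home/hop/EmailHandling, and a run configuration named local is assumed to exist. The function name run_main_workflow is hypothetical. Listing the directory first with ls is a quick way to confirm the actual file name.

```shell
HOP_HOME="/opt/hop"
# ASSUMPTION: file name as it appears in the repository URL.
MAIN_WORKFLOW="/home/hop/EmailHandling/office365_workflow_main.hwf"

# Run the main workflow with hop-run.sh.
# -f names the workflow file; -r names the run configuration.
# ASSUMPTION: a run configuration called "local" is defined.
run_main_workflow() {
    "$HOP_HOME/hop-run.sh" -f "$MAIN_WORKFLOW" -r local
}

# Usage inside the container: run_main_workflow
```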
Implementation Overview
As described in my earlier blog post
(https://saikatsrecentworks.blogspot.com/2026/03/apache-hop-office365.html), the solution automates Office 365 data ingestion using a combination of Hop workflows, pipelines, and supporting scripts.
The implementation is structured in multiple stages.
Some workflows also include server-specific versions, adapted for execution on Linux servers or in Docker environments.
Getting Started
To use this setup:
- Pull and run the Docker image
- Log in to the container and execute the main workflow with the Hop command line
- Alternatively, clone the workflows repository locally (https://github.com/basusk/hop)
- Configure the required Azure and Office 365 credentials
- Substitute the correct values in the email-attachment-download.properties file
- Execute the main workflow to test the end-to-end job
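Before the first run, it can help to check that no entry in the properties file was left unset. The sketch below lists every property whose value is empty. It assumes that unset entries are blank; if the file uses placeholder text instead, the pattern needs adapting. The function name list_empty_properties is hypothetical, and no specific property keys are assumed.

```shell
# Print "line-number:entry" for each property with an empty value.
# Comment lines (starting with # or !) and blank lines are skipped.
list_empty_properties() {
    grep -nE '^[[:space:]]*[^#![:space:]][^=:]*[=:][[:space:]]*$' "$1"
}

# Usage:
#   list_empty_properties email-attachment-download.properties
# No output (exit status 1 from grep) means no empty values were found.
```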
Final Notes
This setup is designed to reduce manual effort and enable end-to-end automation of Office 365 data ingestion workflows. Feedback and suggestions are always welcome.