Kyligence Cloud – A Self-Managed, Production-Ready Platform
Architecting a Production-Ready Monitoring System For Kyligence Cloud
Introduction
In this article, we discuss the components of the Kyligence Cloud platform that make it a production-ready, self-managed distributed computing system.
While the core Kyligence engine is built upon Apache Kylin—a query accelerator offering sub-second response times at petabyte scale—this article focuses on the value-added features that make the solution robust, cost-effective, and enterprise-grade.
Kyligence Cloud Architecture
The following figure shows the architecture of Kyligence Cloud including the monitoring system.
Detailed information about the Kyligence Cloud architecture and each of its components is available on the Kyligence website and in existing blog posts.
Monitoring and Alerting System
Kyligence Cloud provides out-of-the-box system health and performance monitoring. The monitoring tools include customizable alert mechanisms that help prevent performance degradation or system outages in mission-critical environments.
The Role of InfluxDB
Kyligence uses InfluxDB, a low-footprint time-series database, to store transactional events. It is deployed as an embedded component inside a Docker container on the manager node.
Configuration Details: Default connection definitions for the InfluxDB server are stored in the cloud.properties file on the manager node.
Location: /data1/kyligence_cloud/conf/cloud.properties
The file is pre-populated with the default IP address and port number of the InfluxDB server.
Pro Tip: To use your own database server as a single, company-wide central monitoring system, update the IP address and port parameters in this file to match your environment.
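As a minimal sketch of how to inspect these settings, the Python helper below parses a Java-style properties file and keeps only the entries whose key mentions "influx". The exact key names in cloud.properties are not given in this article, so the substring filter is an assumption, as are the function names.

```python
from pathlib import Path


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def influx_settings(props: dict) -> dict:
    """Keep entries whose key contains 'influx' (assumed naming, case-insensitive)."""
    return {k: v for k, v in props.items() if "influx" in k.lower()}


def load_influx_settings(path: str = "/data1/kyligence_cloud/conf/cloud.properties") -> dict:
    """Read the file at the location named in the article and filter it."""
    return influx_settings(parse_properties(Path(path).read_text()))
```

Calling load_influx_settings() on the manager node should list the connection entries to review before pointing Kyligence Cloud at an external database.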
High Availability (HA) Considerations
Kyligence Cloud also offers an HA (High Availability/Fail-Safe) deployment option. In this mode there are two manager nodes, each hosting a Docker instance with an InfluxDB server.
Active Node: Records and stores all cluster network events.
Failover: If the standby manager node takes control, ensure the standby InfluxDB server becomes active simultaneously to maintain data continuity.
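One way to confirm after a failover that an InfluxDB endpoint is accepting connections is a plain TCP check against both manager nodes. The sketch below is illustrative only: the helper names are hypothetical, the host and port values must come from your own deployment, and a successful TCP connection shows that the port is open, not that Kyligence Cloud is writing to that server.

```python
import socket
from typing import Iterable, Optional, Tuple

Endpoint = Tuple[str, int]


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(endpoints: Iterable[Endpoint], timeout: float = 2.0) -> Optional[Endpoint]:
    """Return the first endpoint that accepts a connection, or None if none do."""
    for host, port in endpoints:
        if is_reachable(host, port, timeout):
            return (host, port)
    return None
```

For example, first_reachable([(active_ip, port), (standby_ip, port)]) returns the first manager node whose InfluxDB port responds, which can be compared with the address configured in cloud.properties.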
InfluxDB Health Check
To verify that the out-of-the-box InfluxDB is functioning correctly (running and with populated tables), open a shell inside the InfluxDB Docker instance and run your verification commands. For configuration purposes, you can also verify the IP address and port number of the active InfluxDB server directly on the manager node.
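The same check can also be scripted from outside the container. The sketch below assumes the embedded server exposes the InfluxDB 1.x HTTP API (the /query endpoint, commonly on port 8086) without authentication. The article does not state the InfluxDB version or port, so treat both as assumptions and adjust them to your deployment.

```python
import json
import urllib.parse
import urllib.request


def query_url(host: str, port: int, influxql: str) -> str:
    """Build an InfluxDB 1.x /query URL for a read-only InfluxQL statement."""
    return f"http://{host}:{port}/query?" + urllib.parse.urlencode({"q": influxql})


def first_column(payload: dict) -> list:
    """Collect the first column of every series in a /query JSON response."""
    names = []
    for result in payload.get("results", []):
        for series in result.get("series", []):
            names.extend(row[0] for row in series.get("values", []))
    return names


def list_databases(host: str, port: int = 8086, timeout: float = 5.0) -> list:
    """Return the database names; raises URLError if the server is unreachable."""
    url = query_url(host, port, "SHOW DATABASES")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return first_column(json.load(resp))
```

A non-empty result from list_databases suggests the server is running and holds data. The same first_column helper works for a SHOW MEASUREMENTS response once a database is selected.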
Grafana Visualization Dashboard
To complete the monitoring stack, users can utilize the built-in Grafana server or point to an existing instance.
Quick Setup Guide:
Deployment: You can run Grafana in a dedicated Docker instance.
Networking: Ensure VPC/firewall rules allow traffic on the Grafana port (default: 3000 for HTTP).
Default Credentials: User: abc; Password: xyz
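Once the port is open, a quick way to confirm that Grafana is up is its /api/health endpoint, which returns a small JSON document that includes a "database" field. The sketch below is a minimal check under the assumption that Grafana is served over plain HTTP on the default port. The function names are illustrative.

```python
import json
import urllib.request


def health_url(host: str, port: int = 3000) -> str:
    """Build the URL of Grafana's health endpoint."""
    return f"http://{host}:{port}/api/health"


def database_ok(payload: dict) -> bool:
    """Grafana reports "database": "ok" when its own backing store is healthy."""
    return payload.get("database") == "ok"


def grafana_is_healthy(host: str, port: int = 3000, timeout: float = 5.0) -> bool:
    """Return True only if the endpoint answers 200 with a healthy payload."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            return resp.status == 200 and database_ok(json.load(resp))
    except (OSError, ValueError):
        # Network errors, HTTP errors, and malformed JSON all count as unhealthy.
        return False
```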
Built-in Dashboards
Kyligence provides ready-to-use JSON dashboards covering:
Cluster health and query execution.
Model building job monitoring.
Query latencies and usage statistics.
Metrics for Zookeeper, Azure, and AWS.
Actionable Alerts: A Practical Example
Users can define specific thresholds using the Grafana function library.
Scenario: Cluster Overload Monitoring
Condition: Check every 30 minutes.
Threshold: Transactions exceed 600 QPM (Queries Per Minute) when observed over a 5-minute window.
Action: Automatically trigger an alert.
Grafana supports dozens of channels, including Email and Slack. When the platform experiences an unexpected load, the system administrator is alerted immediately with a customized message.
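The overload rule in the scenario above can be sketched in a few lines of Python. This is one interpretation, not Grafana's own evaluation logic: it assumes one QPM sample per minute and fires only if every sample in the last 5 minutes exceeds 600. Grafana itself typically applies a reducer such as an average over the window, so results may differ.

```python
from typing import Sequence


def overloaded(qpm_samples: Sequence[float], threshold: float = 600, window: int = 5) -> bool:
    """Return True if each of the last `window` per-minute samples exceeds threshold."""
    if window <= 0 or len(qpm_samples) < window:
        # Not enough data to judge a sustained overload.
        return False
    return all(sample > threshold for sample in qpm_samples[-window:])
```

Run on the 30-minute schedule from the scenario, a True result would correspond to the point at which the alert message is sent to the configured channel.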
Summary
By following the guidelines in this article, you can build a robust, real-time monitoring and alerting application using Kyligence Cloud’s built-in components.
Key Benefits:
Set up real-time monitoring and alerting easily, using only components that ship with Kyligence Cloud.
Avoid the high cost of building custom monitoring via REST APIs.
Essential for large organizations using Kyligence for business-critical production platforms.