Self-Monitoring
Self-Monitoring is a module that includes a set of dashboards for centralized cluster health monitoring. It enables timely detection and resolution of cluster and data collection layer anomalies before failures occur.
Conventions
- SM_INSTALLER - Directory where the Smart Monitor installation package is extracted
- USER - System administrator user, typically admin
- OS_HOME - OpenSearch home directory, typically /app/opensearch/
- OSD_HOME - OpenSearch Dashboards home directory, typically /app/opensearch-dashboards/
- LOGSTASH_HOME - Logstash home directory, typically /app/logstash/
- SBM_HOME - Smart Beat Manager home directory, typically /app/smartBeatManager/
- SB_HOME - Smart Beat home directory, typically /app/smartBeat/
Useful Links
- Installing SM Master Node
- Installing SM Data Storage
- Installing SM Data Collector
- Installing Smart Beat Manager
- Installing Smart Beat for Linux
- Configuring Cross Cluster Search
Self-Monitoring Usage
The module consists of dashboards displaying various critical metrics of the target (monitored) cluster, enabling rapid response to any changes or issues.
Monitored components include:
- SM Data Storage: Tracks all processes related to cluster operations
- SM Data Collector: Monitors event ingestion and proper data collection
- SM Master Node: Monitors control components status and cluster coordination
Architecture
Self-monitoring can be deployed using two distinct architecture types:
- Deployment on a dedicated monitoring cluster
- Deployment within the target cluster
Target cluster - the cluster being monitored for metrics.
Type I Architecture (Dedicated Monitoring Cluster)
We recommend deploying all self-monitoring server components on a single node, particularly for small to medium-sized solutions - this simplifies deployment and maintenance.
Minimum self-monitoring server specifications:
- CPU: 4 cores or more
- RAM: 16 GB or more
- Storage: 100 GB or more
Storage capacity depends on the volume of collected data, the number of monitored clusters, and the data retention period. Parameters can be scaled up as needed.
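Since storage needs scale with these three factors, a rough estimate can be sketched as simple arithmetic. The figures below are hypothetical examples for illustration, not product requirements:

```python
# Rough storage estimate for the self-monitoring server.
# All numbers here are hypothetical examples, not product requirements.

def estimate_storage_gb(daily_gb_per_cluster: float,
                        clusters: int,
                        retention_days: int,
                        overhead: float = 1.2) -> float:
    """Return an estimated storage need in GB, with an overhead
    factor for replicas, reindexing, and growth headroom."""
    return daily_gb_per_cluster * clusters * retention_days * overhead

# e.g. 2 GB/day per cluster, 3 monitored clusters, 14-day retention
print(round(estimate_storage_gb(2, 3, 14), 1))  # 100.8
```

If the estimate approaches the minimum 100 GB specification, plan for a larger volume or a shorter retention period from the start.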
In Type I architecture, the self-monitoring server is a fully isolated cluster comprising:
- SM Data Storage - Handles data storage and full-text search
- SM Dashboards - Provides data visualization
- SM Data Collector - Receives data from target cluster agents and polls via REST API
- Smart Beat Manager - Manages agents installed on target clusters
Advantages
Deploying the self-monitoring server on a dedicated cluster offers key benefits:
- Availability during target cluster outages - the independent self-monitoring server remains operational even if the target cluster fails
- Reduced target cluster load - offloading monitoring eliminates resource contention; monitoring processes are resource-intensive by nature, and separate infrastructure ensures stable production cluster performance
Disadvantages
While this is the recommended default configuration, consider these factors:
- Additional infrastructure resources - deploying a separate cluster requires a dedicated server or additional computing capacity, which demands more resources and may lead to additional costs
- Increased configuration complexity - to get self-monitoring data from the target cluster, you need to configure the cross-cluster search mechanism (for configuration information, see Useful Links). This adds steps to the configuration process and requires special attention when maintaining the system
Type II Architecture (Primary Cluster)
In the second architecture type, all previously described operations occur within a single cluster where self-monitoring is deployed.
Advantages
- No need for additional servers or their configuration - eliminates the requirement to allocate and configure extra resources for self-monitoring servers, potentially reducing infrastructure costs and simplifying deployment
- Immediate access to self-monitoring data on the primary cluster - self-monitoring data becomes instantly available within the target cluster, facilitating easier access and analysis for system operators and administrators
Disadvantages
- Self-monitoring unavailability during cluster failures - if the primary cluster experiences a critical failure, the self-monitoring server also becomes unavailable, potentially complicating monitoring of system status and performance
- Additional load on the primary cluster - running self-monitoring on the primary cluster may impose extra resource demands, which could impact application or service performance
Data Collection
The data collected by self-monitoring can be categorized into two types:
- Log files
- Metrics and other statistics
Log Collection
Log collection is performed from all SM Data Storage, SM Data Collector, and SM Master Node nodes in the target cluster. Smart Beat agents are installed and configured on the target cluster hosts (installation instructions can be found in the Useful Links section). During configuration, the address of the self-monitoring server's Smart Beat Manager is specified; it holds the configurations for the data collection and transmission agents, which are then installed and run on the cluster hosts. For log collection, Filebeat is configured to collect the following logs:
For SM Data Storage hosts:
- Cluster logs
- sme logs
- sme-re logs
- job scheduler logs
For SM Data Collector hosts:
- SM Data Collector logs
- SM Data Collector pipeline logs
Metrics Collection
For collecting metrics from SM Data Storage, the http_poller input plugin in SM Data Collector is used. This plugin periodically polls specified REST endpoints.
In the self-monitoring pipeline templates, you can see that the http_poller plugin can send identical requests to all master nodes of the target cluster and then use the throttle filter plugin to filter duplicate responses. This ensures data retrieval even if one or more master nodes fail, as requests will be executed to the remaining operational nodes.
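The deduplication idea behind this design can be illustrated conceptually. The sketch below is plain Python, not the actual Logstash throttle filter: identical responses arriving within a time window are dropped, so polling every master node yields each metric document only once.

```python
class Deduplicator:
    """Conceptual sketch of the throttle-filter idea: identical
    events within a time window are dropped, so polling several
    master nodes yields each metric document only once."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.seen = {}  # key -> timestamp of last accepted event

    def accept(self, key: str, now: float) -> bool:
        last = self.seen.get(key)
        if last is not None and now - last < self.window:
            return False  # duplicate within the window: drop it
        self.seen[key] = now
        return True

dedup = Deduplicator(window_seconds=30)
# Three master nodes return the same cluster-health snapshot:
print(dedup.accept("cluster_health@12:00:00", now=0.0))  # True
print(dedup.accept("cluster_health@12:00:00", now=0.5))  # False
print(dedup.accept("cluster_health@12:00:00", now=1.0))  # False
```

Only the first response passes; the copies from the other master nodes are discarded, yet if the first node were down, one of the others would supply the accepted event.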
For collecting metrics from SM Data Collector, Metricbeat is used.
Configuration
The self-monitoring package includes scripts for pipeline generation, agent configurations, and automation of other necessary operations:
-
generate_pipelines.py: This script generates pipelines and agent configurations. It automatically creates the required elements for data collection and agent setup -
generate_opensearch_configs.py: This script generates Index State Management (ISM) policies, creates index templates, and copies dashboards. It can also connect to SM Data Storage and create corresponding policies, index templates, and indices when needed -
import_certs.sh: This script adds host certificates to the truststore, which is essential for establishing secure TLS connections when SM Data Collector accesses target hosts via API
Configuration file
All of the scripts mentioned above read their settings from the config.ini configuration file.
The fields of the configuration file are described in the table below:
| Field | Description |
|---|---|
| rest | Data for loading ISM policies, index templates, and index creation in SM Data Storage (used by the generate_opensearch_configs.py script) |
| ism | Data for configuring the ISM policy for indices with self-monitoring data |
| index_templates | Data for creating index templates for self-monitoring data |
| poller | Data for the http_poller input plugin |
| logstash | Configuration parameters of the SM Data Collector on the self-monitoring server |
| beats | Agent configuration settings |
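Because config.ini is a standard INI file, the scripts can read it with stock parsing tools. The sketch below shows the general pattern using Python's configparser; the section names follow the table above, but the individual keys (host, port, user, interval) are illustrative assumptions, not the product's actual key names:

```python
import configparser

# Hypothetical config.ini fragment; section names match the table
# above, but the keys shown are assumptions for illustration only.
sample = """
[rest]
host = 127.0.0.1
port = 9200
user = admin

[poller]
interval = 60
"""

config = configparser.ConfigParser()
config.read_string(sample)

print(config["rest"]["host"])               # 127.0.0.1
print(config.getint("poller", "interval"))  # 60
```

Always check the actual key names in the config.ini shipped with your installer package.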
Installation
The SM Master Node, SM Data Storage, SM Data Collector, and Smart Beat Manager components (installation information can be found in the Useful Links section) must be properly installed and configured before proceeding with this guide.
The selfmonitoring component is included in the base Smart package and is located in the ${SM_INSTALLER}/utils/selfmonitoring/ directory. We recommend using the Python interpreter that is also included in the installer package, located at ${SM_INSTALLER}/utils/python/bin/python3.
Configuration Setup
Populate the config.ini configuration file with actual parameters.
Most parameters related to paths, polling settings, ISM, and index templates can remain unchanged. Required modifications typically involve IP addresses only. However, we strongly recommend carefully reviewing each parameter before use to ensure all values are correct and meet your system's current requirements.
Running Scripts
Generating Files for SM Data Storage
When executed without arguments, the script will generate ISM policies, index templates, and dashboards, saving them to the ${SM_INSTALLER}/utils/selfmonitoring/generated_data directory:
${SM_INSTALLER}/utils/python/bin/python3 ${SM_INSTALLER}/utils/selfmonitoring/generate_opensearch_configs.py
When executed with the --upload argument, the script will both generate files and upload the content to the self-monitoring SM Data Storage (also creating the actual indices):
${SM_INSTALLER}/utils/python/bin/python3 ${SM_INSTALLER}/utils/selfmonitoring/generate_opensearch_configs.py --upload
You may first run the script without the --upload flag to review the output, then rerun with --upload to upload to SM Data Storage after verifying the results.
Generating Files for SM Data Collector and Smart Beat Manager
Execute the ${SM_INSTALLER}/utils/selfmonitoring/generate_pipelines.py:
${SM_INSTALLER}/utils/python/bin/python3 ${SM_INSTALLER}/utils/selfmonitoring/generate_pipelines.py
After execution, the ${SM_INSTALLER}/utils/selfmonitoring/generated_data directory will contain new subdirectories with data for SM Data Collector and Smart Beat Manager.
Certificate Import
Execute the ${SM_INSTALLER}/utils/selfmonitoring/import_certs.sh:
cd ${SM_INSTALLER}/utils/selfmonitoring/ && chmod +x import_certs.sh
sudo -u logstash ./import_certs.sh
The script must be executed under the logstash user account.
Script execution flow:
- Creating a new
truststore: When launched, the script will prompt twice for a new password for the certificate store. Remember this password as it will be required in subsequent steps - Certificate retrieval: The script will connect to each host and retrieve certificates. When prompted
Trust this certificate? [no]:, answeryes - Password storage: The script will request passwords for
ts_pwdandos_pwdtokens. Forts_pwd: Enter the password created in step 1. Foros_pwd: Enter the SM Data Storage user password specified inconfig.ini
If authentication credentials differ between target and monitoring clusters, and a different token is specified in the logstash.pwd_token configuration field, add it manually on the monitoring cluster's SM Data Collector server:
sudo -u logstash ${LOGSTASH_HOME}/bin/logstash-keystore add <TOKEN_NAME>
- If a truststore already exists on the SM Data Collector at the path specified in the configuration when the script launches, the script will ask once for its current password
- If no keystore exists when running the script, you'll see: The keystore password is not set... Continue without password protection on the keystore? [y/N]. Answer y. For an already initialized keystore: if it has a password, the script will prompt for it; if not, no additional input is required
Deploying Generated Files
After completing the previous steps, all required files (build artifacts) will be available in ${SM_INSTALLER}/utils/selfmonitoring/generated_data. Directory references below are relative to this location unless absolute paths are specified.
SM Data Storage Configuration
Create ISM policies and index templates using files in the ism and index_templates directories respectively. Then create indices.
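A manual upload targets the standard OpenSearch REST endpoints for ISM policies (_plugins/_ism/policies) and index templates (_index_template). The sketch below only builds the PUT requests; the host, policy and template names, and request bodies are placeholders, and a real run would also need authentication and TLS settings:

```python
import json
import urllib.request

# Placeholder host; in practice this is your SM Data Storage address.
BASE = "https://sm-data-storage:9200"

def build_put(path: str, body: dict) -> urllib.request.Request:
    """Build a PUT request with a JSON body for an OpenSearch endpoint."""
    return urllib.request.Request(
        url=f"{BASE}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Hypothetical names and minimal bodies, for illustration only; use the
# generated files from the ism and index_templates directories instead.
policy_req = build_put("/_plugins/_ism/policies/selfmon_policy",
                       {"policy": {"description": "example"}})
template_req = build_put("/_index_template/selfmon_template",
                         {"index_patterns": ["selfmon-*"]})

print(policy_req.get_method(), policy_req.full_url)
```

The same endpoints can equally be driven with curl; the point is one PUT per generated policy or template file.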
Skip this step if you automatically uploaded SM Data Storage configurations (by running generate_opensearch_configs.py with the --upload flag).
SM Data Collector Configuration
- Copy pipelines from ${SM_INSTALLER}/utils/selfmonitoring/generated_data/logstash/pipelines/ to ${LOGSTASH_HOME}/config/conf.d/
- Transfer scripts from ${SM_INSTALLER}/utils/selfmonitoring/generated_data/logstash/scripts/ to the directory specified in the logstash.scripts_path configuration parameter (default: ${LOGSTASH_HOME}/config/conf.d/scripts/)
- Append the contents of pipelines.yml to ${LOGSTASH_HOME}/config/pipelines.yml
- Modify ownership of the transferred files
After completing the steps, restart the SM Data Collector service:
sudo chown -R logstash:logstash ${LOGSTASH_HOME}/
sudo systemctl restart logstash.service
Configuring Smart Beat Manager
- Copy the contents of ${SM_INSTALLER}/utils/selfmonitoring/generated_data/sbm to ${SBM_HOME}/apps/
- Download the required binaries, filebeat and metricbeat (contact technical support if download issues occur), and place them in ${SBM_HOME}/binaries/
- Edit ${SBM_HOME}/etc/serverclasses.yml to include the following groups: linux_selfmon and linux_selfmon_logstash:
groups:
  - name: linux_selfmon
    filters:
      - "<IP 1st node opensearch>"
      - "<IP 2nd node opensearch>"
      ...
      - "<IP N-th node opensearch>"
    apps:
      - filebeat_selfmon
    binaries:
      - filebeat-oss-8.9.2-linux-x86_64.tar.gz
  - name: linux_selfmon_logstash
    filters:
      - "<IP 1st logstash>"
      - "<IP 2nd logstash>"
      ...
      - "<IP N-th logstash>"
    apps:
      - filebeat_logstash
      - metricbeat_logstash
    binaries:
      - filebeat-oss-8.9.2-linux-x86_64.tar.gz
      - metricbeat-oss-8.9.2-linux-x86_64.tar.gz
Pay special attention to the filters and binaries sections. In filters, specify the IP addresses of all target cluster nodes for the first group, and the IP addresses of all Logstash instances in the target cluster for the second group. In binaries, verify that the archive names match those you downloaded to ${SBM_HOME}/binaries/.
After completing these steps, restart the Smart Beat Manager service:
sudo systemctl restart smartBeatManager.service
Dashboard Deployment
Smart Beat must be installed on all SM Data Collector, SM Data Storage, and SM Master Node instances in the target cluster (installation instructions available in Useful Links). Configure Smart Beat to use the monitoring cluster's Smart Beat Manager server.
- In the monitoring web interface, navigate to Dashboards (Main menu - General - Dashboards) and create new dashboards using the JSON files from the dashboards directory
- Update node selection filters in the dashboards with current cluster data
- In the Module Settings section, go to Index Templates (Main Menu - System Settings - Module Settings - Index Settings Templates) and create templates similar to index aliases (clusterstats, clusterhealth, etc.)
The self-monitoring installation is now complete.
For dedicated monitoring cluster architectures, configure cross-cluster search (configuration details available in Useful Links) to enable queries from the target cluster.