Version: 6.0

Basics of Smart Monitor Language (SML)

Smart Monitor Language (SML) is a specialized query language for the Smart Monitor platform, designed for searching, processing, and analyzing machine data.

SML allows describing analytical queries as a sequence of commands combined into a single data processing pipeline. This approach ensures query clarity, reusability of analytical scenarios, and a unified way of working with various data sources.

The purpose of this article is to provide an overview of Smart Monitor Language syntax and basic principles of working with it, as well as to show how to write correct and optimized SML queries. The article covers basic language constructs, typical commands and query patterns, common usage scenarios, and practical recommendations for performance improvement and avoiding typical errors.

Architectural Model and SML Operation Principles

Smart Monitor Language is built on a pipeline data processing model. An SML query represents a sequence of commands separated by the vertical bar symbol (|), where each command takes a set of events as input and passes the result of its processing to the next command. This architecture ensures declarative description of analytical scenarios and allows combining search, filtering, transformation, and data aggregation operations in a single query.

SML Command Structure

Each command in SML has a strict structure: an operation name followed by its arguments and/or parameters.

General Command Template

<command> [positional arguments] [parameters] [clauses/expressions]

Key elements:

  • command — operation name (source, search, peval, aggs, join, etc.)

  • positional arguments — mandatory or optional values that follow immediately after the command name.
        Example: source nginx-1 — here the source name nginx-1 is a positional argument

  • named parameters — settings in key=value format that define command behavior.
        Example: source nginx-1 qsize=5000 — parameter qsize limits the sample size

  • keywords and clauses — syntax constructs specific to particular commands.
        Example: by in aggs count by field, on in join ... on a=b, span in timeaggs span=1h

  • expressions — logical conditions or mathematical formulas.
        Example: search status >= 500 in the search command or count * 100 in eval

  • subqueries — nested pipelines enclosed in square brackets [...]. Used in data matching commands such as join or append
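As a rough illustration, the elements listed above can appear together in a single query. The sketch below reuses names from the examples in this list (nginx-1, qsize, status); the host field is an assumption about the data, not a fixed schema.

```text
source nginx-1 qsize=5000    /* command + positional argument + named parameter */
| search status >= 500       /* command + expression */
| aggs count by host         /* command + function + "by" clause (host is a hypothetical field) */
```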

Writing Rules and Data Types

To ensure correct interpretation of the query by the SME (Smart Monitor Engine) analytical engine, the following rules must be observed:

  1. String values containing spaces or special characters must be enclosed in single (' ') or double (" ") quotes. Otherwise, quotes are optional

  2. Numeric values (100, 0.5) and boolean constants (true, false) are written without quotes

  3. Dot notation is used to access nested structures (e.g., user.agent.browser)

  4. Command names are case-insensitive, but it is recommended to use lowercase for query text consistency
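A minimal sketch of these rules in one query. The source name nginx-1 and the field user.agent.browser come from the examples above; the value "Mozilla Firefox" and the status field are hypothetical.

```text
source nginx-1
| search user.agent.browser="Mozilla Firefox"   /* dot notation; value with a space needs quotes */
| search status >= 500                          /* numeric value is written without quotes */
```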

Basic Syntax and Query Structure

Any SML query is built according to strict rules that determine how data will be extracted and processed. Understanding query structure helps avoid syntax errors and write readable query text.

Query Structure

A query always starts with defining the data source (generation or loading command), followed by a pipeline chain.

The structure of a typical SML query is presented below:

source log-index qsize=5000  /* 1. Data loading */
| search log_level="error" /* 2. Filtering */
| peval host=lower(host) /* 3. Transformation/Calculation */
| aggs count by host /* 4. Aggregation */

This query loads data from the log-index source, filters errors, converts the host field value to lowercase, and performs aggregation of event counts by each host.

info

Smart Monitor does not strictly require a new pipeline stage (|) for each command of the same type: for example, peval can list multiple calculations separated by commas. For better readability, however, it is recommended to place each command on a separate line.
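For instance, two calculations might be combined in one peval stage as sketched below. The exact comma syntax is inferred from the note above, and the field names are taken from the earlier example, so treat this as an assumption rather than a reference form.

```text
source log-index
| peval host=lower(host), log_level=lower(log_level)   /* two calculations in one peval */
| aggs count by host
```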

Comments

SML supports multi-line comments in C programming language style. They are ignored during query execution and are extremely useful for documenting complex logic.

/* This block filters only successful requests
and aggregates event count by response code
*/
| search status < 400
| aggs count by status

Data Sources

Smart Monitor, thanks to the Search Anywhere concept, allows searching in various sources. Currently, four types of storage are supported:

  • OpenSearch
  • Elasticsearch
  • ClickHouse
  • Hadoop

Regardless of whether a search, analytical, or distributed storage is used, the basic SML query structure remains unchanged. This allows transferring analytical scenarios between data sources without changing the overall query logic.

To extract from OpenSearch and Elasticsearch, the source command is used, followed by the index name or index pattern, for example:

source employee_list-0073
source employee_list

To get data from ClickHouse, the clicksource or source command is used. Data access follows the scheme db_name.table_name, for example:

clicksource 'hr.employee_list'
| search status="Dismissed"

To extract from Hadoop, the hdhsource or source command is used. The data access scheme is similar to ClickHouse:

hdhsource 'hr.employee_list'
| search status="Dismissed"

When working with OpenSearch, Smart Monitor allows combining data from multiple sources in one query. The data merging mode when querying multiple sources is configured by the optional append parameter after the source command:

source sm_cs_iam_indexes, sm_servers_indexes append = true

tip

To escape special characters in a source name, enclose the name in single quotes, for example:

source '*-antifraud-event-*'

tip

The * symbol in the source name works as a mask and allows querying multiple indexes at once. For example:

source .smos_internal-*

This query will return data from all indexes of the form .smos_internal-2026.12, .smos_internal-2026.13, etc.

More information about the source command can be found in the documentation article.

After specifying the data source, data search, filtering, and aggregation are performed.

Working with Dictionaries

Smart Monitor provides a lookup mechanism that, in addition to enriching events with additional information from other sources, can be used as a standalone data source.

To load data from a dictionary in SML, the inputlookup command is used. It allows reading the contents of a dictionary and including it in the data processing pipeline.

Example of loading and using data from a dictionary:

| inputlookup whitelist_lookup 
| aggs count(user_ip) as ip

Generating Test Data

The makeresults command is used to create an artificial set of events directly in an SML query, without accessing external data sources. It generates one or more empty events that can then be supplemented with fields using calculation and transformation commands.

| makeresults
| eval numbers = {"5", "9", "7"}
| stats max(numbers) as maximum

This example will return the maximum number from the multivalue field numbers.

Time Parameters

Search results in Smart Monitor are aggregated over the time interval specified in the filter on the search page, or in the earliest and latest parameters of the source command. Results are sorted by timestamp value in descending order. Based on the obtained data, a histogram is built for the specified period.

Note!

The histogram is displayed only if an index template is configured for the index. If the index template is not configured but there is a need to get time-distributed events, then immediately after source the name of the field with the timestamp must be explicitly specified — using the timefield argument, for example:

source nginx-logs-* timefield=@timestamp 
| timeaggs count

If the timestamp field is not specified and the template is not configured, Smart Monitor will display the number of found documents, but the events themselves will not be displayed. For example, the query source .smos_incident-* will show N documents were found, but the event list will remain empty. To display events, the correct time field must be specified: source .smos_incident-* timefield=search_params.earliest_time.

An example of executing a search query and building a histogram is shown in the image below.

Detailed information about working with time ranges in Smart Monitor is contained in the corresponding documentation section.

qsize parameter

An important parameter often used when executing search queries is the qsize parameter, which limits the volume of data processed in memory when executing a query. For example, an SML query of the following form:

source internal_audit-* qsize=2000

will limit the search result to the first 2000 events.

Attention!

Increasing the qsize parameter increases the load on the system, particularly increasing RAM consumption.


Command Execution Levels in Smart Monitor

Based on the execution level, commands in Smart Monitor are divided into two main groups:

  1. Commands that perform operations on data using internal storage mechanisms

  2. Commands executed "in memory" using the SME (Smart Monitor Engine) data processing engine

The first type of commands has an advantage over the second in execution speed due to the use of internal storage mechanisms. Storage-level commands include source, search, peval, timeaggs, and aggs. However, their application is limited: such commands are allowed in a query only if no command executed "in memory" precedes them in the pipeline.

Note!

When using commands executed "in memory", query results are limited to the first thousand events. The volume of processed data can be increased by configuring the qsize parameter for the source command.
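The ordering constraint can be sketched as follows. The index and field names are hypothetical. The where command is not in the storage-level list above, so it is assumed here to run in memory, and the name of the field produced by aggs count (count) is likewise an assumption.

```text
source nginx-logs-* qsize=5000   /* qsize raises the limit for in-memory stages */
| search status >= 500           /* storage level */
| peval host = lower(host)       /* storage level */
| aggs count by host             /* storage level */
| where count > 5                /* in memory: placed after all storage-level commands */
```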


Query Construction Features for text and keyword Data Types

In OpenSearch and Elasticsearch data sources, string fields are often indexed in two ways simultaneously. This is called the multi-field mechanism. To work effectively in SML, it is important to distinguish their purpose.

| Field Type | Processing Mechanism | Recommended SML Usage |
|---|---|---|
| text (base field) | Split into individual tokens (words). | Full-text search, substring search, and partial matches. |
| keyword (suffix .keyword) | Stored as a single unbreakable string. | Exact match, aggregation, grouping. |

The choice between the base field name and its version with the .keyword suffix directly determines the strictness of data filtering when using the search command. In addition, .keyword fields create less search load: unlike text, they do not undergo full-text analysis and are processed as a single string.

Note!

If a field is already defined in the mapping with type keyword, the .keyword suffix should not be added — such a sub-field does not exist, and the query will return an empty result or an error. The .keyword suffix is relevant only for text fields that have a multi-field.

For example, a query of the following form will return documents containing the word memory in the event.original field:

source linux-logs-*
| search event.original="memory"

At the same time, the following query will return documents where the event.original field contains only the word memory and nothing else:

source linux-logs-*
| search event.original.keyword="memory"

Macros

Macros are reusable SML query fragments that simplify writing complex queries and standardize analytical logic. In cases where the same long expression or complex filter is frequently used, they can be saved as a macro and called by a short name.

To insert a macro into a query, the backtick symbol is used. When executing a query, Smart Monitor automatically replaces the macro name with its full content.

source nginx-*
| `filter_errors` /* Calling a macro containing filtering logic */
| aggs count by host

Filtering and Logical Conditions

The search command in SML is used to search and filter data. Search conditions are specified in the field=value scheme.

Example of executing a search:

source internal_audit-* 
| search log_level="info"

SML supports AND, OR, NOT operators. By default, the AND operator acts between conditions in search.

The Smart Monitor query language supports wildcards. In eval and where expressions (for example, in the like() function), the % symbol is used; in the search command, the * symbol is used. Regular expression search via regex and subnet mask search (cidr) are also supported.
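The difference between the two wildcard symbols might look as follows. The host field and its values are hypothetical, and the (field, pattern) argument order of like() is an assumption that should be checked against the function reference.

```text
/* search: * is the wildcard */
source internal_audit-*
| search host="web-*"

/* where: % is the wildcard inside like() (argument order assumed) */
source internal_audit-*
| where like(host, "web-%")
```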

In addition to field search, Smart Monitor also provides full-text search. For example, the query:

source internal_audit-* 
| search "warning"

will perform a search for the keyword warning in all fields of all documents in the internal_audit-* index.

The prefix ~ before quotes (~"phrase") searches for individual words without full matching. For example, the following query:

source internal_audit-* 
| search ~"getting access"

will search for all documents containing the words getting and/or access in any of the text fields of the documents.

More information about the full-text search mechanism in Smart Monitor can be found in a separate article in the documentation.

where

The where command is also used to filter data in search queries. Conditions are written in comparison format similar to most programming languages, for example, where count>5 or where user=="admin".

Note!

In SML, where uses double equality == unlike single = in search.

The where command supports functions such as like(), match(), cidrmatch(), in(), and others.

Operator Priority

In search, the AND operator is executed before OR; in where, the order is the opposite. For complex conditions it is therefore recommended to use parentheses, for example:

| search host="mail" AND NOT (code="4625" OR code="4624")
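Because AND is applied before OR in search, a condition that mixes the two without parentheses may group differently than intended. A short sketch with hypothetical host values:

```text
/* Without parentheses, AND binds first: host="mail" OR (host="proxy" AND code="4625") */
| search host="mail" OR host="proxy" AND code="4625"

/* Parentheses make the intended grouping explicit */
| search (host="mail" OR host="proxy") AND code="4625"
```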

Comparison Operators and Quotes

SML supports standard operators: =, !=, <, >, <=, >=. Numbers are compared arithmetically; strings are compared by full value match. To search for a substring in a search query, wildcard symbols should be used. Search values must be enclosed in double quotes (") if they contain separators or special characters. The backslash (\) in search queries must be escaped with a second backslash, for example:

| search app_path="C:\\Windows\\cmd.exe"

tip

Detailed information about building expressions in SML queries is contained in a special documentation section Expressions.


Aggregation and Statistics

aggs and stats

Smart Monitor uses two commands for aggregate calculations: aggs and stats. Because aggs operates on data using internal storage mechanisms, it has an advantage over stats in speed and in the volume of data it can process. aggs implements most of the functions available in stats, including mathematical operations (max, min, sum, perc), operations for working with array data (count, values, earliest), and other functions. To process an unlimited number of unique values, the composite=true parameter can be used.

timeaggs and timechart

The timechart and timeaggs commands perform the same function: data aggregation over time. As with aggs and stats, they are executed at different levels: timeaggs at the storage level and timechart in memory, so timeaggs executes faster and is preferable in most scenarios. The timeaggs command offers functionality similar to timechart: it requires an aggregation function (count, avg, dc, etc.) and accepts optional arguments (span, timefield, limit, etc.).
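A minimal sketch of a time aggregation with an explicit interval, combining the timeaggs count and span=1h forms that appear elsewhere in this article. The index name is hypothetical, and the position of span relative to the function is an assumption.

```text
source nginx-logs-* timefield=@timestamp
| timeaggs count span=1h   /* event count per one-hour bucket; argument order assumed */
```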

eventstats and streamstats

Smart Monitor also provides the eventstats and streamstats commands for performing statistical operations. eventstats saves its results in a new field, whereas streamstats works with data in a streaming manner.


Field Transformation and Calculation

peval and eval

The peval command is an analog of the eval command, but works using internal storage mechanisms.

eval in Smart Monitor implements a large number of functions for transforming and calculating data, including working with time, cryptographic, mathematical, and other operations.

Example of using peval/eval commands:

| peval agent = agent.keyword + port 

In the above example, a new field agent will be created as a concatenation of the fields agent.keyword and port.

Note!

An important feature of SML is that eval cannot be used inside statistical expressions such as | stats count(eval(status="404")) AS error_count. In Smart Monitor, peval and eval are standalone commands and cannot be embedded as arguments of other commands; the only exception is subqueries in square brackets [...].
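One possible way to obtain a comparable count without embedding eval is to filter first and aggregate afterwards. The sketch assumes a hypothetical nginx-logs-* source with a status field, and assumes that aggs accepts "count as <name>" in the same way as the count(user_ip) as ip form used in the dictionary example.

```text
source nginx-logs-*
| search status="404"         /* the condition moves into a separate filtering stage */
| aggs count as error_count   /* alias syntax assumed from the inputlookup example */
```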

Other Processing Commands

In addition to the listed ones, Smart Monitor implements other data processing commands. Among them, for example, the following commands:

  • fields – keeps/excludes fields

  • dedup – removes duplicates by fields

  • sort – performs data sorting

  • head – returns the first N query results

  • mvexpand – expands multivalue fields into multiple events

  • rename – renames fields


SML Query Optimization

The performance of SML queries in Smart Monitor largely depends on the order of commands and the volume of data processed at each stage of the pipeline. The main optimization principle is maximum early data filtering — search conditions should be specified immediately after indicating the source to reduce the number of events before performing subsequent operations.

The second important factor in query optimization is the use of commands executed at the storage level (search, peval, aggs and timeaggs), as they use internal data source mechanisms and work significantly faster than commands executed in the analytical engine memory. This is especially important for aggregations and working with large volumes of data.
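Both principles can be illustrated by comparing two versions of the same hypothetical query. The index and field names are assumptions, and stats count by is assumed to mirror the aggs count by form.

```text
/* Less efficient: transformation, filtering, and aggregation all run in memory */
source nginx-logs-* qsize=5000
| eval host_lc = lower(host)
| where status >= 500
| stats count by host_lc

/* Preferable: early filtering and storage-level commands only */
source nginx-logs-*
| search status >= 500
| peval host_lc = lower(host)
| aggs count by host_lc
```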

tip

Detailed recommendations and examples of SML query optimization are provided in the article SML Optimization Tips.


Diagnostics and Debugging

Effective use of Smart Monitor Language assumes mastery of diagnostic tools and query debugging methods that allow localizing errors in pipeline chains and minimizing excessive load on platform resources.

Typical SML Query Errors

When executing queries in Smart Monitor, errors can occur at different levels: syntactic (query parsing), semantic (condition logic), or performance (system resources).

Syntactic Errors

  1. Absence of vertical bar (|) for a new command

Example:

source winevents
search agent.id = "5436647"

  2. Unclosed parentheses in the query. If a parenthesis is opened but not closed in the query, a syntactic error will occur

Example:

source sm_users
| peval user.name=mvindex(split(user.name, "@", 0)

  3. Incorrect use of quotes or special characters. Strings with spaces or special characters must be in quotes (" " or ' '). The backslash (\) requires escaping (\\).

Example:

source *-internal*
| search user_name = Ivan Ivanov

  4. Typos in command or parameter names
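For comparison, corrected forms of the three erroneous examples above could look like this. The position of the restored closing parenthesis in the second query is an assumption about the intended expression.

```text
/* 1. Vertical bar added before the second command */
source winevents
| search agent.id = "5436647"

/* 2. Parenthesis of split() closed (placement assumed) */
source sm_users
| peval user.name=mvindex(split(user.name, "@"), 0)

/* 3. Value with a space enclosed in quotes */
source *-internal*
| search user_name = "Ivan Ivanov"
```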

Semantic Errors

  1. Using the wrong comparison operator. The most common error is using the = operator in the where command

Example:

| where status = 500

  2. Incorrect expectations from full-text search. Using the search command without specifying a field performs full-text search on all text fields, which can lead to a large number of irrelevant results

Example:

| search "error"

In most cases, it is preferable to explicitly specify the search field.
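Corrected variants of the two fragments above might look as follows. The log_level field is borrowed from earlier examples in this article and is an assumption about the data.

```text
/* where uses == for comparison */
| where status == 500

/* the search field is specified explicitly instead of a bare "error" */
| search log_level="error"
```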

Another class of errors occurs when field types are used incorrectly, especially when working with OpenSearch and Elasticsearch.

  1. Aggregation by text type fields

Fields of type text are designed for full-text search and are not suitable for aggregations.

Example:

| aggs count by message

For aggregations, a field with the .keyword suffix should be used.

Note!

However, this approach is justified only for fields with a limited set of values (status, type, host, etc.). Fields such as message or event.original, which contain arbitrary text, are better split into specific fields at the parser level: aggregation by them via .keyword can lead to performance problems.
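A form of the aggregation that follows this recommendation is sketched below. It assumes a hypothetical status field that is mapped as text with a .keyword multi-field and holds a limited set of values.

```text
| aggs count by status.keyword   /* grouping by the keyword sub-field (hypothetical field) */
```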

  2. Comparing strings and numbers without type conversion

Comparing string values with numeric conditions can lead to incorrect results or an empty selection.

Smart Monitor executes commands at different levels — at the storage level or in the analytical engine memory. Violating the sequence of applying commands of different execution levels leads to errors and incorrect results.

  1. Using storage-level commands after in-memory commands

If a command working in memory is used before a storage-level command, the operation will not be executed:

| eval host_lc = lower(host)
| aggs count by host_lc

In this case, using the eval command excludes the use of the aggs command working at the storage level. The optimal option is to use peval instead of eval. An alternative, but not the best option, is to use stats after eval.
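Both options described above can be sketched as follows. The stats count by form is assumed to mirror aggs count by.

```text
/* Preferred: peval keeps the whole chain at the storage level */
| peval host_lc = lower(host)
| aggs count by host_lc

/* Alternative: remain in memory after eval and aggregate with stats */
| eval host_lc = lower(host)
| stats count by host_lc
```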

  2. Problems with data volume limitation

When using in-memory commands, query results are limited to the first thousand events by default, unless the qsize parameter is explicitly specified in the query.

Errors Due to Performance Degradation

When executing heavy queries with an excessively high value of returned documents in the qsize parameter and complex aggregations, the following errors may occur:

  • Failed to execute phase [fetch] — error at the stage of obtaining documents from OpenSearch/Elasticsearch shards when trying to load a large number of results into memory

  • Command RAM limit exceeded — the query exceeded the permissible RAM limit for execution

  • Request failed with status code 502 — gateway error, may mean that the server did not have time to process a heavy query or the connection was broken due to overload or timeout

  • Search cancelled - search timeout exceeded — query execution was interrupted due to exceeding the permissible execution time

Another cause of performance degradation not directly related to the query is mapping explosion: a situation where an index contains too many dynamically created fields. In this case, slow query performance is a consequence of a problem at the index level, not the SML query.

Step-by-Step Debugging

Since an SML query is executed sequentially, the error is easiest to find by checking each pipeline stage separately. If a written query does not work or works differently than expected, it is worth using the sequential logic building method.

  1. Source Validation

The initial diagnostic stage involves checking the availability of data in the selected source and time interval. For this, a basic source command is executed without additional filters and transformations:

source nginx-logs-*

If there are no results, the correctness of the index name (pattern) and search time range settings should be checked.

  2. Sequential Pipeline Building Method

Error localization is achieved by sequentially adding commands to the query. Intermediate results are checked after each new stage:

  • filtering and initial selection (search, where). The accuracy of event inclusion in the sample is checked. Errors at this level lead to an empty result or data redundancy

  • extracting data from unstructured fields (rex, spath). When working with logs or JSON, the correctness of regular expressions and parsers is checked. If a field is not extracted at this stage, all subsequent commands will not be able to access it

  • field transformation and calculation (eval / peval, rename). The correctness of mathematical operations, string concatenation, type conversion, etc. is checked

  • data enrichment (lookup, join, append). The correctness of matching with external dictionaries is checked. At this stage, mismatches in field names and types in the main stream and dictionary are identified

  • aggregation operations and statistics calculation (aggs / stats, timeaggs / timechart). The correctness of grouping and function accuracy is checked. Errors are often related to incorrect function selection (for example, count instead of dc)

  • data set manipulation (dedup, sort, mvexpand). The correctness of changing the event set structure is checked: removing duplicates, sorting, or expanding multivalue fields

  • final formatting (table, fields). The composition of returned data is checked at the final stage

If the query stops working correctly at a certain stage, this allows unambiguously determining the problematic command.
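In practice, the method amounts to running a series of progressively longer queries and checking the output of each. A sketch with a hypothetical index and fields:

```text
/* Step 1: source only, confirm that events exist */
source nginx-logs-*

/* Step 2: add filtering, confirm that the expected events remain */
source nginx-logs-*
| search status >= 500

/* Step 3: add aggregation, confirm grouping and counts */
source nginx-logs-*
| search status >= 500
| aggs count by host
```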

  3. Using Comments to Isolate Blocks

To temporarily exclude parts of a query without deleting them, multi-line comments /* ... */ are used. This allows limiting query execution to certain commands:

source internal_audit-* 
| search status="error"
/* | peval error_code = upper(code)
| aggs count by error_code
*/

In this example, execution will end after the search command, which will allow seeing the intermediate set of events.

  4. Limiting Query Results

When events contain many fields but only a few of them need to be checked, it is recommended to limit the field set with the fields or table commands.

source users
| peval full_name = first_name + " " + last_name
| table @timestamp, first_name, last_name, full_name

Query Statistics

Smart Monitor allows analyzing statistics of executed queries. To display query statistics, the statistics display mode (INFO or DEBUG) must be selected and the line with summary information about the number of found documents and query execution time must be clicked.

Query statistics in DEBUG mode provides information about command execution duration, command type, number of documents in processing, size of processed documents in megabytes, and the amount of RAM allocated for command execution.

This information helps understand at which stage query execution slows down and resource usage increases.

tip

The INFO mode is used for a general assessment of search progress, providing a lower degree of data detail compared to the extended DEBUG mode.

Query Performance Monitoring

The Self-Monitoring module provides a "Query Performance Monitoring" dashboard with informative visualizations about query execution phases and efficiency. Based on this data, query performance in Smart Monitor can be evaluated.