lookup
Description
Allows retrieving data from a predefined lookup table. The command compares specified fields in the event and the lookup table. Upon a full match, the event will be enriched with the specified fields from the lookup table.
Syntax
lookup [dedup=<boolean>] [<execution-type>] [system=<boolean>] <lookup-name> ( <lookup-field> [AS <event-field>] )... [ OUTPUT | OUTPUTNEW (<lookup-destfield> [AS <event-destfield>] )... ]
Required Arguments
| Parameter | Syntax | Description |
|---|---|---|
lookup-name | <lookup-name> | The name of the predefined lookup. |
Optional Arguments
| Parameter | Syntax | Default | Description |
|---|---|---|---|
dedup | dedup=<boolean> | false | Indicates whether duplicates should be removed from the data compared to the lookup. |
<execution-type> | type=ELK [packsize=<integer>] | FAST | DEFAULT | Specifies the execution type of the command. The ELK type indicates that the command is executed by Elasticsearch/OpenSearch, with a default packsize of 10,000. The FAST type represents an optimized, faster execution version of the command in DEFAULT mode. |
system | system=<boolean> | false | When set to true, a system lookup is performed; otherwise, a user lookup is used. |
lookup-field | <string> | Several fields can be specified, separated by spaces. | |
event-field | <string> | The name of the field in the main result for comparison with the lookup-field. This is used when lookup-field and event-field have different names. | |
lookup-destfield | <string> | The name of the resulting field with obtained data. | |
OUTPUT | OUTPUTNEW | OUTPUT | OUTPUTNEW | OUTPUT | The OUTPUT indicator means that existing data in the source query field will be overwritten by the lookup data. The OUTPUTNEW indicator means that data will be overwritten only if the source query field is empty but has corresponding data in the lookup. |
Execution Modes
DEFAULT- The default mode. Recommended for use only with a small incoming event stream and small lookupsFAST- A version of the standard algorithm optimized for speed. Recommended for use in most casesERGO- A version of the standard algorithm optimized for RAM usage. Recommended for use when comparing across multiple fieldsSOURCE- Performs matching using the storage engine's capabilities. Additionally, you can specify thepacksizevalue, which determines the batch size for queries to the storage
When loading lookups into RAM in FAST and ERGO modes, the user-defined Data Request Limit is applied. If the lookup size exceeds this limit, you can increase it or use the SOURCE mode.
SOURCE ModeFor the SOURCE execution type, it is recommended not to set a high max_initial_query_len. The packsize parameter indicates how many events will be processed by the engine per query in each iteration. For example, if max_initial_query_len = 10,000 and packsize = 5,000, then during one iteration of 10,000 events, two queries (max_initial_query_len/packsize) will be sent. If max_initial_query_len is greater than 20,000, it is recommended to run the command with a packsize parameter two or three times smaller than max_initial_query_len.
Working with Multivalue
Multivalue is the ability for an attribute to have multiple values instead of a single one. A multivalue can be thought of as a list of values. Examples of multivalue include a list of user roles.
When the lookup command is executed, both fields from search results (multivalue on the left) and fields from lookup tables (multivalue on the right) can be of the multivalue type. Thus, there are three cases where the lookup command involves multivalue:
- multivalue on the left
- multivalue on the right
- multivalue on both left and right
Multivalue Value on the left
A value from the lookup table is matched with the search result if the value from the lookup table is contained within the multivalue value of the search result.
| Left Value | Right Value | Value Intersection | Match |
|---|---|---|---|
[1, 2, 3, 4, 5] | 3 | 3 | Yes |
[1, 2, 3, 4, 5] | 6 | - | No |
Multivalue Value on the Right
A value from the lookup table is matched with the search result if the search result is contained within the multivalue value from the lookup table.
| Left Value | Right Value | Value Intersection | Match |
|---|---|---|---|
3 | [1, 2, 3, 4, 5] | 3 | Yes |
6 | [1, 2, 3, 4, 5] | - | No |
Multivalue Value on Both Left and Right
A value from the lookup table is matched with the search result if the intersection of the multivalue values on the left and right is not empty.
| Left Value | Right Value | Value Intersection | Match |
|---|---|---|---|
[1, 2, 3, 4, 5] | [3] | [3] | Yes |
[1, 2, 3, 4, 5] | [1, 2, 5] | [1, 2, 5] | Yes |
[1, 2, 3, 4, 5] | [1, 2, 6] | [1, 2] | Yes |
[1, 2, 3] | [4, 5, 6] | - | No |
[1, 2, 3] | [4] | - | No |
Query Examples
source tables
| search database = "hr"
| lookup my_click_db name as database OUTPUT engine
In the first example, for each incoming event, the field database from the event is matched against the field name in each row of the reference table. If these values match, a field engine will be added to the event. Its value will be taken from the corresponding row of the reference table. If the event already had an engine field, it will be overwritten with the new value.
source tables
| search database = "hr"
| lookup type=SOURCE packsize=5000 my_click_db name as database OUTPUTNEW engine
In the second example, a similar matching is performed in SOURCE mode. This means the matching will be executed by the storage engine. The parameter packsize=5000 specifies that queries to the storage will be sent in batches of 5000. The OUTPUTNEW option specifies that the engine field will not be overwritten if it already exists in the event.