Azure Data Explorer Storage




Azure Data Explorer is a fast and highly scalable data exploration service for log and telemetry data. It offers ingestion (data loading) from Event Hubs, IoT Hubs, and blobs written to blob containers.

Azure Data Explorer also integrates with other major services to provide an end-to-end solution that includes data collection, ingestion, storage, indexing, querying, and visualization. It plays a pivotal role in the data warehousing flow by executing the EXPLORE step of the flow on terabytes of diverse raw data.


In this article, you learn how to ingest blobs from your storage account into Azure Data Explorer using an Event Grid data connection. You'll create an Event Grid data connection that sets up an Azure Event Grid subscription. The Event Grid subscription routes events from your storage account to Azure Data Explorer via an Azure Event Hub. Then you'll see an example of the data flow throughout the system.

For general information about ingesting into Azure Data Explorer from Event Grid, see Connect to Event Grid. To create resources manually in the Azure portal, see Manually create resources for Event Grid ingestion.

Prerequisites

  • An Azure subscription. Create a free Azure account.
  • A cluster and database.
  • A storage account.
    • An Event Grid notification subscription can be set on Azure Storage accounts of kind BlobStorage, StorageV2, or Data Lake Storage Gen2.

Create a target table in Azure Data Explorer

Create a table in Azure Data Explorer where Event Hubs will send data. Create the table in the cluster and database prepared in the prerequisites.

  1. In the Azure portal, under your cluster, select Query.

  2. Copy the following command into the window and select Run to create the table (TestTable) that will receive the ingested data.

  3. Copy the following command into the window and select Run to map the incoming JSON data to the column names and data types of the table (TestTable). Example sketches of both commands appear after these steps.
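
For reference, the two commands might look like the following minimal sketch, which assumes a simple three-column table and a JSON mapping named TestMapping; adjust the columns and JSON paths to match your own data:

    .create table TestTable (TimeStamp: datetime, Value: string, Source: string)

    .create table TestTable ingestion json mapping 'TestMapping' '[{"column":"TimeStamp","Properties":{"Path":"$.TimeStamp"}},{"column":"Value","Properties":{"Path":"$.Value"}},{"column":"Source","Properties":{"Path":"$.Source"}}]'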

Create an Event Grid data connection in Azure Data Explorer

Now connect the storage account to Azure Data Explorer, so that data flowing into the storage is streamed to the test table.

  1. Under the cluster you created, select Databases > TestDatabase.

  2. Select Data ingestion > Add data connection.

Data connection - Basics tab

  1. Select the connection type: Blob storage.

  2. Fill out the form with the following information:

    Setting | Suggested value | Field description
    Data connection name | test-grid-connection | The name of the connection that you want to create in Azure Data Explorer.
    Storage account subscription | Your subscription ID | The subscription ID where your storage account is located.
    Storage account | gridteststorage1 | The name of the storage account that you created previously.
    Event type | Blob created or Blob renamed | The type of event that triggers ingestion. Blob renamed is supported only for ADLSv2 storage. Supported types are Microsoft.Storage.BlobCreated and Microsoft.Storage.BlobRenamed.
    Resources creation | Automatic | Defines whether Azure Data Explorer creates an Event Grid subscription, an Event Hub namespace, and an Event Hub for you. To create resources manually, see Manually create resources for Event Grid ingestion.
  3. Select Filter settings if you want to track specific subjects. Set the filters for the notifications as follows:

    • Prefix field is the literal prefix of the subject. As the pattern applied is startswith, it can span multiple containers, folders, or blobs. No wildcards are allowed.
      • To define a filter on the blob container, the field must be set as follows: /blobServices/default/containers/[container prefix].
      • To define a filter on a blob prefix (or a folder in Azure Data Lake Gen2), the field must be set as follows: /blobServices/default/containers/[container name]/blobs/[folder/blob prefix].
    • Suffix field is the literal suffix of the blob. No wildcards are allowed.
    • Case-Sensitive field indicates whether the prefix and suffix filters are case-sensitive
    • For more information about filtering events, see Blob storage events.
  4. Select Next: Ingest properties.

Data connection - Ingest properties tab

  1. Fill out the form with the following information. Table and mapping names are case-sensitive:

    Ingest properties:

    Setting | Suggested value | Field description
    Table name | TestTable | The table you created in TestDatabase.
    Data format | JSON | Supported formats are Avro, CSV, JSON, MULTILINE JSON, ORC, PARQUET, PSV, SCSV, SOHSV, TSV, TXT, TSVE, APACHEAVRO, RAW, and W3CLOG. Supported compression options are Zip and GZip.
    Mapping | TestMapping | The mapping you created in TestDatabase, which maps incoming JSON data to the column names and data types of TestTable.
    Advanced settings | My data has headers | Ignores headers. Supported for *SV type files.

    Note

    You don't have to specify all Default routing settings. Partial settings are also accepted.

  2. Select Next: Review + Create

Data connection - Review + Create tab

  1. Review the resources that were auto created for you and select Create.

Deployment

Wait until the deployment is completed. If your deployment failed, select Operation details next to the failed stage to get more information about the failure. Select Redeploy to try to deploy the resources again. You can alter the parameters before deployment.

Generate sample data

Now that Azure Data Explorer and the storage account are connected, you can create sample data.

Upload blob to the storage container

We'll work with a small shell script that issues a few basic Azure CLI commands to interact with Azure Storage resources. This script does the following actions:

  1. Creates a new container in your storage account.
  2. Uploads an existing file (as a blob) to that container.
  3. Lists the blobs in the container.

You can use Azure Cloud Shell to execute the script directly in the portal.

Save the data into a file and upload it with this script:
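
A minimal sketch of such a script, assuming Azure Cloud Shell and placeholder names (gridteststorage1, testcontainer, and testdata.json are assumptions; substitute your own values). It also sets the rawSizeBytes metadata described in the note that follows:

    #!/bin/bash
    # Placeholder values - replace with your own storage account, container, and data file.
    storageAccount="gridteststorage1"
    container="testcontainer"
    file="testdata.json"

    # 1. Create a new container in the storage account.
    az storage container create --name "$container" --account-name "$storageAccount" --auth-mode login

    # 2. Upload the file as a blob, recording its uncompressed size in the rawSizeBytes metadata.
    az storage blob upload --container-name "$container" --account-name "$storageAccount" \
      --name "$file" --file "$file" --metadata rawSizeBytes=$(wc -c < "$file") --auth-mode login

    # 3. List the blobs in the container to confirm the upload.
    az storage blob list --container-name "$container" --account-name "$storageAccount" \
      --auth-mode login --output table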

Note

To achieve the best ingestion performance, Azure Data Explorer needs to know the uncompressed size of compressed blobs submitted for ingestion. Because Event Grid notifications contain only basic details, the size information isn't included automatically. Provide it by setting the rawSizeBytes property in the blob metadata to the uncompressed data size in bytes.

Rename blob

If you are ingesting data from ADLSv2 storage and have defined Blob renamed as the event type for the data connection, blob renaming is what triggers ingestion. To rename a blob, navigate to the blob in the Azure portal, right-click the blob, and select Rename.

Ingestion properties

You can specify the ingestion properties of the blob ingestion via the blob metadata.

Note

Azure Data Explorer won't delete the blobs after ingestion. Retain the blobs for three to five days, and use Azure Blob storage lifecycle management to handle blob deletion.

Review the data flow

Note

Azure Data Explorer has an aggregation (batching) policy for data ingestion, designed to optimize the ingestion process. By default, the policy is configured to a 5-minute batching window. You can alter the policy later if needed. In this article, expect a latency of a few minutes.
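
For example, shortening the batching window for the test table might look like the following sketch (the 30-second value and the other limits are only illustrative):

    .alter table TestTable policy ingestionbatching @'{"MaximumBatchingTimeSpan":"00:00:30", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'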

  1. In the Azure portal, under your event grid, you see the spike in activity while the script is running.

  2. To check how many messages have made it to the database so far, run the following query in your test database.

  3. To see the content of the messages, run the following query in your test database.

    The result set shows the content of the ingested messages.
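
Minimal sketches of the queries for steps 2 and 3 (TestTable is the table created earlier; take 100 simply returns a sample of rows):

    TestTable
    | count

    TestTable
    | take 100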

Clean up resources

If you don't plan to use your event grid again, clean up the Event Grid Subscription, Event Hub namespace, and Event Hub that were auto-created for you, to avoid incurring costs.

  1. In Azure portal, go to the left menu and select All resources.

  2. Search for your Event Hub Namespace and select Delete to delete it:

  3. In the Delete resources form, confirm the deletion to delete the Event Hub Namespace and Event Hub resources.

  4. Go to your storage account. In the left menu, select Events:

  5. Below the graph, select your Event Grid Subscription and then select Delete to delete it.

  6. To delete your Event Grid data connection, go to your Azure Data Explorer cluster. On the left menu, select Databases.

  7. Select your database TestDatabase:

  8. On the left menu, select Data ingestion:

  9. Select your data connection test-grid-connection and then select Delete to delete it.


By default, logs ingested into Azure Sentinel are stored in Azure Monitor Log Analytics. This article explains how to reduce retention costs in Azure Sentinel by sending those logs to Azure Data Explorer (ADX) for long-term retention.

Storing logs in ADX reduces costs while retaining your ability to query your data, and is especially useful as your data grows. For example, while security data may lose value over time, you may be required to retain logs for regulatory requirements or to run periodic investigations on older data.

About Azure Data Explorer

ADX is a big data analytics platform that is highly optimized for log and data analytics. Because ADX uses the same Kusto Query Language (KQL) as Azure Sentinel, it's a good alternative for long-term Azure Sentinel data storage. Using ADX for your data storage enables you to run cross-platform queries and visualize data across both ADX and Azure Sentinel.

For more information, see the ADX documentation and blog.

When to integrate with ADX

Azure Sentinel provides full SIEM and SOAR capabilities, quick deployment and configuration, as well as advanced, built-in security features for SOC teams. However, the value of storing security data in Azure Sentinel may drop after a few months, once SOC users don't need to access it as often as they access newer data.

If you only need to access specific tables occasionally, such as for periodic investigations or audits, you may consider that retaining your data in Azure Sentinel is no longer cost-effective. At this point, we recommend storing data in ADX, which costs less, but still enables you to explore using the same KQL queries that you run in Azure Sentinel.

You can access the data in ADX directly from Azure Sentinel using the Log Analytics ADX proxy feature. To do so, use cross cluster queries in your log search or workbooks.
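
For example, a cross-cluster query from Log Analytics or Azure Sentinel through the ADX proxy might look like the following sketch (the cluster URL, database, and table names are placeholders):

    adx('https://adxcluster.westeurope.kusto.windows.net/SentinelArchiveDB').SecurityEvent
    | where TimeGenerated > ago(365d)
    | summarize count() by Computer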

Important

Core SIEM capabilities, including Analytic rules, UEBA, and the investigation graph, do not support data stored in ADX.

Note

Integrating with ADX also gives you more control and granularity over your data. For more information, see Design considerations.

Send data directly to Azure Sentinel and ADX in parallel

You may want to retain any data with security value in Azure Sentinel to use in detections, incident investigations, threat hunting, UEBA, and so on. Keeping this data in Azure Sentinel mainly benefits Security Operations Center (SOC) users, where typically, 3-12 months of storage are enough.

You can also configure all of your data, regardless of its security value, to be sent to ADX at the same time, where you can store it for longer. While sending data to both Azure Sentinel and ADX at the same time results in some duplication, the cost savings can be significant as you reduce the retention costs in Azure Sentinel.

Tip

This option also enables you to correlate data spread across data stores, such as to enrich the security data stored in Azure Sentinel with operational or long-term data stored in ADX. For more information, see Cross-resource query Azure Data Explorer by using Azure Monitor.
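
As a hedged illustration (all table, column, and cluster names are placeholders), such an enrichment query run from the Azure Sentinel workspace could join recent alerts with long-term data kept in ADX:

    // Enrich the last week of alerts with the first time each host was seen in the ADX archive.
    SecurityAlert
    | where TimeGenerated > ago(7d)
    | join kind=leftouter (
        adx('https://adxcluster.westeurope.kusto.windows.net/SentinelArchiveDB').CommonSecurityLog
        | where TimeGenerated > ago(180d)
        | summarize FirstSeen = min(TimeGenerated) by DestinationHostName
    ) on $left.CompromisedEntity == $right.DestinationHostName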

The following image shows how you can retain all of your data in ADX, while sending only your security data to Azure Sentinel for daily use.

For more information about implementing this architecture option, see Azure Data Explorer monitoring.

Export data from Log Analytics into ADX

Instead of sending your data directly to ADX, you can choose to export your data from Log Analytics into ADX via an Azure Event Hub or Azure Data Factory.

Data export architecture

The following image shows a sample flow of exported data through the Azure Monitor ingestion pipeline. Your data is directed to Log Analytics by default, but you can also configure it to export to an Azure Storage Account or Event Hub.

When configuring the data export rules, select the types of logs you want to export. Once configured, new data arriving at the Log Analytics ingestion endpoint, and targeted to your workspace for the selected tables, is exported to your Storage Account or Event hub.

When configuring data for export, note the following considerations:

Consideration | Details
Scope of data exported | Once export is configured for a specific table, all data sent to that table is exported, with no exception. Exporting a filtered subset of your data, or limiting the export to specific events, is not supported.
Location requirements | Both the Azure Monitor / Azure Sentinel workspace and the destination location (an Azure Storage Account or Event Hub) must be located in the same geographical region.
Supported tables | Not all tables are supported for export; for example, custom log tables are not supported.
For more information, see Log Analytics workspace data export in Azure Monitor and the list of supported tables.

Data export methods and procedures

Use one of the following procedures to export data from Azure Sentinel into ADX:

  • Via an Azure Event Hub. Export data from Log Analytics into an Event Hub, where you can ingest it into ADX. This method stores some data (the first X months) in both Azure Sentinel and ADX.

  • Via Azure Storage and Azure Data Factory. Export your data from Log Analytics into Azure Blob Storage; Azure Data Factory then runs a periodic copy job to further export the data into ADX. This method enables you to copy data only when it nears its retention limit in Azure Sentinel / Log Analytics, avoiding duplication.

This section describes how to export Azure Sentinel data from Log Analytics into an Event Hub, where you can ingest it into ADX. Similar to sending data directly to Azure Sentinel and ADX in parallel, this method includes some data duplication as the data is streamed into ADX as it arrives in Log Analytics.

The following image shows a sample flow of exported data into an Event Hub, from where it's ingested into ADX.

The architecture shown in the previous image provides the full Azure Sentinel SIEM experience, including incident management, visual investigations, threat hunting, advanced visualizations, UEBA, and more, for data that must be accessed frequently during its first X months. At the same time, this architecture also enables you to query long-term data by accessing it directly in ADX, or via Azure Sentinel thanks to the ADX proxy feature. Queries to long-term data storage in ADX can be ported without any changes from Azure Sentinel to ADX.

Note

When exporting multiple data tables into ADX via Event Hub, keep in mind that Log Analytics data export has limitations on the maximum number of Event Hubs per namespace. For more information about data export, see Log Analytics workspace data export in Azure Monitor.

For most customers, we recommend using the Event Hub Standard tier. Depending on the number of tables you need to export and the amount of traffic to those tables, you may need to use the Event Hub Dedicated tier. For more information, see the Event Hub documentation.

Tip

For more information about this procedure, see Tutorial: Ingest and query monitoring data in Azure Data Explorer.

To export data into ADX via an Event Hub:

  1. Configure the Log Analytics data export to an Event Hub. For more information, see Log Analytics workspace data export in Azure Monitor.

  2. Create an ADX cluster and database. For more information, see:

  3. Create target tables. The raw data is first ingested to an intermediate table, where the raw data is stored, manipulated, and expanded.

    An update policy, which is similar to a function applied to all new data, is used to ingest the expanded data into the final table, which has the same schema as the original table in Azure Sentinel. (A combined sketch of steps 3 through 5 appears after this procedure.)

    Set the retention on the raw table to 0 days. The data is stored only in the properly formatted table, and deleted in the raw table as soon as it's transformed.

    For more information, see Ingest and query monitoring data in Azure Data Explorer.

  4. Create table mapping. Map the JSON tables to define how records land in the raw events table as they come in from an Event Hub. For more information, see Create the update policy for metric and log data.

  5. Create an update policy and attach it to the raw records table. In this step, create a function, called an update policy, and attach it to the destination table so that the data is transformed at ingestion time.

    Note

    This step is required only when you want to have data tables in ADX with the same schema and format as in Azure Sentinel.

    For more information, see Connect an Event Hub to Azure Data Explorer.

  6. Create a data connection between the Event Hub and the raw data table in ADX. Configure ADX with details of how to export the data into the Event Hub.

    Use the instructions in the Azure Data Explorer documentation and specify the following details:

    • Target. Specify the specific table with the raw data.
    • Format. Specify .json as the table format.
    • Mapping to be applied. Specify the mapping table created in step 4 above.
  7. Modify retention for the target table. The default Azure Data Explorer retention policy may be far longer than you need.

    Use the following command to update the retention policy to one year:
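
    For example, a minimal sketch, assuming the target table is named SecurityEvent (substitute your own table name):

        .alter-merge table SecurityEvent policy retention softdelete = 365d

Taken together, steps 3 through 5 might look like the following minimal sketch. The table names (SecurityEventRaw and SecurityEventExpanded), the three example columns, and the $.records JSON path are illustrative assumptions rather than the exact schema of any exported Azure Sentinel table:

    // Step 3: intermediate (raw) table where exported records land as-is.
    .create table SecurityEventRaw (Records: dynamic)

    // Step 4: JSON mapping used by the Event Hub data connection.
    .create table SecurityEventRaw ingestion json mapping 'SecurityEventRawMapping' '[{"column":"Records","Properties":{"Path":"$.records"}}]'

    // Step 3: final table with an illustrative subset of columns.
    .create table SecurityEventExpanded (TimeGenerated: datetime, Computer: string, EventID: int)

    // Step 5: function that expands each exported record into the final schema.
    .create function SecurityEventExpand() {
        SecurityEventRaw
        | mv-expand events = Records
        | project
            TimeGenerated = todatetime(events.TimeGenerated),
            Computer = tostring(events.Computer),
            EventID = toint(events.EventID)
    }

    // Step 5: update policy that runs the function on every new batch landing in the raw table.
    .alter table SecurityEventExpanded policy update @'[{"Source": "SecurityEventRaw", "Query": "SecurityEventExpand()", "IsEnabled": "True"}]'

    // Step 3: keep nothing in the raw table once the data has been transformed.
    .alter-merge table SecurityEventRaw policy retention softdelete = 0d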

This section describes how to export Azure Sentinel data from Log Analytics into Azure Storage, where Azure Data Factory can run a regular job to export the data into ADX.

Using Azure Storage and Azure Data Factory enables you to copy data from Azure Storage only when it's close to the retention limit in Azure Sentinel / Log Analytics. There is no data duplication, and ADX is used only to access data that's older than the retention limit in Azure Sentinel.


Tip

While the architecture for using Azure Storage and Azure Data Factory for your legacy data is more complex, this method can offer larger cost savings overall.

The following image shows a sample flow of exported data into Azure Storage, from which Azure Data Factory runs a regular job to further export it into ADX.

To export data into ADX via an Azure Storage and Azure Data Factory:


  1. Configure the Log Analytics data export to an Azure Storage Account. For more information, see Log Analytics workspace data export in Azure Monitor.

  2. Create an ADX cluster and database. For more information, see:

  3. Create target tables. The raw data is first ingested to an intermediate table, where the raw data is stored, manipulated, and expanded.

    An update policy, which is similar to a function applied to all new data, is used to ingest the expanded data into the final table, which has the same schema as the original table in Azure Sentinel.

    Set the retention on the raw table to 0 days. The data is stored only in the properly formatted table, and deleted in the raw table as soon as it's transformed.

    For more information, see Ingest and query monitoring data in Azure Data Explorer.

  4. Create table mapping. Map the JSON tables to define how records land in the raw events table as they come in from an Event Hub. For more information, see Create the update policy for metric and log data.

  5. Create an update policy and attach it to the raw records table. In this step, create a function, called an update policy, and attach it to the destination table so that the data is transformed at ingestion time.

    Note

    This step is required only when you want to have data tables in ADX with the same schema and format as in Azure Sentinel.

    For more information, see Connect an Event Hub to Azure Data Explorer.

  6. Set up the Azure Data Factory pipeline:

    • Create linked services for Azure Storage and Azure Data Explorer. For more information, see:

      • Copy data to or from Azure Data Explorer by using Azure Data Factory.
    • Create a dataset from Azure Storage. For more information, see Datasets in Azure Data Factory.

    • Create a data pipeline with a copy operation, based on the LastModifiedDate properties.

      For more information, see Copy new and changed files by LastModifiedDate with Azure Data Factory.

Azure Data Explorer Table Storage

Design considerations

When storing your Azure Sentinel data in ADX, consider the following elements:

• Cluster size and SKU: Plan carefully for the number of nodes and the VM SKU in your cluster. These factors determine the amount of processing power and the size of your hot cache (SSD and memory). The bigger the cache, the more data you can query at higher performance. We encourage you to visit the ADX sizing calculator, where you can play with different configurations and see the resulting cost. ADX also has an autoscale capability that makes intelligent decisions to add or remove nodes as needed, based on cluster load. For more information, see Manage cluster horizontal scaling (scale out) in Azure Data Explorer to accommodate changing demand.
• Hot/cold cache: ADX provides control over which data tables are in hot cache and therefore return results faster. If you have large amounts of data in your ADX cluster, consider breaking down tables by month, so that you have greater granularity over the data present in your hot cache. For more information, see Cache policy (hot and cold cache).
• Retention: In ADX, you can configure when data is removed from a database or an individual table, which is also an important part of limiting storage costs. For more information, see Retention policy.
• Security: Several ADX settings can help you protect your data, such as identity management, encryption, and so on. Specifically for role-based access control (RBAC), ADX can be configured to restrict access to databases, tables, or even rows within a table. For more information, see Security in Azure Data Explorer and Row level security.
• Data sharing: ADX allows you to make pieces of data available to other parties, such as partners or vendors, and even to buy data from other parties. For more information, see Use Azure Data Share to share data with Azure Data Explorer.
• Other cost components: Consider the following additional cost components for each method.
  Exporting data via an Azure Event Hub:
  - Log Analytics data export costs, charged per exported GB.
  - Event Hub costs, charged by throughput unit.
  Exporting data via Azure Storage and Azure Data Factory:
  - Log Analytics data export, charged per exported GB.
  - Azure Storage, charged by GB stored.
  - Azure Data Factory, charged per copy activity run.
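
For example, the cache and retention knobs described above can be tuned per table; a minimal sketch (the table name and time spans are placeholders):

    // Keep the most recent 30 days in hot (SSD) cache; retain data for one year overall.
    .alter table SecurityEvent policy caching hot = 30d
    .alter-merge table SecurityEvent policy retention softdelete = 365d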

Next steps

Regardless of where you store your data, continue hunting and investigating using Azure Sentinel.

For more information, see: