Tutorial: Write to a Delta table stored in Azure Data Lake Storage Gen2

This tutorial shows how to create a Stream Analytics job that writes to a Delta table in Azure Data Lake Storage Gen2. In this tutorial, you learn how to:

  • Deploy an event generator that sends sample data to your event hub
  • Create a Stream Analytics job
  • Configure Azure Data Lake Storage Gen2 with a Delta table
  • Run the Stream Analytics job

Prerequisites

Before you start, make sure that you have an Azure subscription and that you've deployed the TollApp event generator template, which creates the event hub that this tutorial uses as input.

Create a Stream Analytics job

  1. Sign in to the Azure portal.

  2. Select All services in the left menu.

  3. Move the mouse over Stream Analytics jobs in the Analytics section, and select + (plus).

    Screenshot that shows the selection of Stream Analytics jobs in the All services page.

  4. Select Create a resource in the upper left corner of the Azure portal.

  5. Select Analytics > Stream Analytics job from the results list.

  6. On New Stream Analytics job, follow these steps:

    1. For Subscription, select your Azure subscription.
    2. For Resource group, select the same resource group that you used earlier in the TollApp deployment.
    3. For Name, enter a name for the job. The name can contain only alphanumeric characters, hyphens, and underscores, and it must be between 3 and 63 characters long.
    4. For Hosting environment, confirm that Cloud is selected.
    5. For Streaming units, select 1. Streaming units represent the computing resources required to run a job. To learn about scaling streaming units, see Understand and adjust streaming units.

    Screenshot that shows the Create Stream Analytics job page.

  7. Select Review + create at the bottom of the page.

  8. On Review + create, review settings, and select Create to create a Stream Analytics job.

  9. On the deployment page, select Go to resource to go to the Stream Analytics job page.
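The naming rule from step 6 can be expressed as a quick check. This is an illustrative sketch only (the helper name and the regular expression are mine, derived from the rule stated above, not from an Azure SDK):

```python
import re

# Illustrative only: encodes the naming rule stated in the tutorial --
# a Stream Analytics job name may contain only alphanumeric characters,
# hyphens, and underscores, and must be 3 to 63 characters long.
JOB_NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]{3,63}$")

def is_valid_job_name(name: str) -> bool:
    """Return True if the name satisfies the stated naming rule."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None

print(is_valid_job_name("tollapp-delta-job"))  # True
print(is_valid_job_name("ab"))                 # False: shorter than 3 characters
print(is_valid_job_name("bad name"))           # False: spaces aren't allowed
```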

Configure job input

The next step is to define an input source from which the job reads data: the event hub created in the TollApp deployment.

  1. Find the Stream Analytics job created in the previous section.

  2. In the Job Topology section of the Stream Analytics job, select Inputs.

  3. Select + Add input and Event hub.

    Screenshot that shows the Inputs page.

  4. Fill out the input form with the following values created through TollApp Azure Template:

    1. For Input alias, enter entrystream.

    2. Choose Select Event Hub from your subscriptions.

    3. For Subscription, select your Azure subscription.

    4. For Event Hub namespace, select the event hub namespace you created in the previous section.

    5. Use default options on the remaining settings and select Save.

      Screenshot that shows the selection of the input event hub.

Configure job output

The next step is to define an output sink where the job can write data. In this tutorial, you write output to a Delta table in Azure Data Lake Storage Gen2.

  1. In the Job Topology section of the Stream Analytics job, select the Outputs option.

  2. Select + Add output > Blob storage/ADLS Gen2.

    Screenshot that shows the Outputs page.

  3. Fill the output form with the following details and select Save:

    1. For Output alias, enter DeltaOutput.

    2. Choose Select Blob storage/ADLS Gen2 from your subscriptions.

    3. For Subscription, select your Azure subscription.

    4. For Storage account, choose the ADLS Gen2 account (the one that starts with tollapp) you created.

    5. For container, select Create new and provide a unique container name.

    6. For Event Serialization Format, select Delta Lake. Although Delta Lake is listed as a serialization format here, it isn't a data format itself; Delta Lake uses versioned Parquet files to store your data. To learn more, see Delta Lake.

    7. For Delta table path, enter tutorial folder/delta table.

    8. Use default options on the remaining settings and select Save.

      Screenshot that shows configuration of the output.
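The note in step 6 about the Delta Lake format can be made concrete: a Delta table is a folder of Parquet data files plus a `_delta_log` subfolder of numbered JSON commit files. The sketch below builds a minimal commit entry using only the standard library. It is illustrative, not what Stream Analytics emits byte for byte; the file name and schema fields are assumptions based on the query output in this tutorial:

```python
import json

# Illustrative only: each Delta commit is a JSON file of actions, such as
# metaData (the table schema) and add (a new Parquet data file). Field
# names follow the Delta transaction-log convention, but real commits
# written by Stream Analytics contain additional detail.
commit_actions = [
    {"metaData": {
        "schemaString": json.dumps({
            "type": "struct",
            "fields": [
                {"name": "State", "type": "string", "nullable": True, "metadata": {}},
                {"name": "Make", "type": "string", "nullable": True, "metadata": {}},
                {"name": "TollAmount", "type": "double", "nullable": True, "metadata": {}},
            ],
        }),
        "format": {"provider": "parquet"},
    }},
    {"add": {"path": "part-00000.snappy.parquet", "dataChange": True}},
]

# Commit files are named by a zero-padded version number, starting at 0.
commit_name = f"_delta_log/{0:020d}.json"
commit_body = "\n".join(json.dumps(a) for a in commit_actions)
print(commit_name)  # _delta_log/00000000000000000000.json
```

Because commits are strictly ordered by version number, readers can reconstruct the table's state at any version, which is what makes the Parquet files "versioned."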

Create queries

At this point, you have a Stream Analytics job set up to read an incoming data stream. The next step is to create a query that analyzes the data in real time. The queries use a SQL-like language that has some extensions specific to Stream Analytics.

  1. Select Query under Job topology in the left menu.

  2. Enter the following query into the query window. In this example, the query reads the data from Event Hubs and copies selected values to a Delta table in ADLS Gen2.

     SELECT State, CarModel.Make, TollAmount
     INTO DeltaOutput
     FROM EntryStream TIMESTAMP BY EntryTime
    
  3. Select Save query on the toolbar.

    Screenshot that shows query for the job.
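The query above is a simple projection: it selects two top-level fields and one nested field from each event. As a rough illustration of what that projection does (plain Python, not the Stream Analytics runtime; the sample event shape is an assumption based on the fields the query references):

```python
# Rough Python equivalent of the projection performed by the query:
#   SELECT State, CarModel.Make, TollAmount FROM EntryStream
# The event shape below is an assumption, inferred from the fields used.
sample_events = [
    {"State": "WA", "CarModel": {"Make": "Honda"}, "TollAmount": 4.0,
     "EntryTime": "2023-01-01T00:00:00Z"},
    {"State": "NY", "CarModel": {"Make": "Toyota"}, "TollAmount": 5.5,
     "EntryTime": "2023-01-01T00:00:05Z"},
]

def project(event: dict) -> dict:
    """Select State, CarModel.Make, and TollAmount from one event."""
    return {
        "State": event["State"],
        "Make": event["CarModel"]["Make"],  # nested field, like CarModel.Make
        "TollAmount": event["TollAmount"],
    }

rows = [project(e) for e in sample_events]
print(rows[0])  # {'State': 'WA', 'Make': 'Honda', 'TollAmount': 4.0}
```

In the real job, `TIMESTAMP BY EntryTime` additionally tells Stream Analytics to use the event's `EntryTime` field, rather than its arrival time, as the event timestamp.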

Start the Stream Analytics job and check the output

  1. Return to the job overview page in the Azure portal, and select Start.

    Screenshot that shows the selection of Start job button on the Overview page.

  2. On the Start job page, confirm that Now is selected for Job output start time, and then select Start at the bottom of the page.

    Screenshot that shows the selection of Start job page.

  3. The job takes a few minutes to start for the first time. After it starts, it continues to run as the data arrives. After a few minutes, in the portal, find the storage account and the container that you configured as output for the job. You can now see the Delta table in the folder specified in the container.

    Screenshot that shows output data files in the container.

Clean up resources

When you no longer need the resources, delete the resource group, the Stream Analytics job, and all related resources. Deleting the job stops billing for the streaming units it consumes. If you plan to use the job in the future, you can stop it and restart it later when you need it. If you aren't going to continue to use this job, delete all resources that you created in this tutorial by using the following steps:

  1. From the left-hand menu in the Azure portal, select Resource groups and then select the name of the resource group you created.
  2. On your resource group page, select Delete, type the name of the resource group in the text box, and then select Delete.

Next steps

In this tutorial, you created a simple Stream Analytics job, filtered the incoming data, and wrote the results to a Delta table in an ADLS Gen2 account. To learn more about Stream Analytics jobs, see: