Exercise - Bring your own SDOH datasets into the lakehouse

Completed

In this exercise, you learn how to prepare a dataset for ingestion into the lakehouse as a social determinant dataset.

Bring your own SDOH data

To bring your own social determinants of health (SDOH) data, follow these steps:

  1. Open the SDOH_State_Level_DATASET_Starter.xlsx file located in the lab resource folder.

  2. Confirm that the Sensitivity setting on the spreadsheet isn't marked as Confidential. If it’s marked as Confidential, the spreadsheet fails during the ingestion process.

  3. Edit the worksheet name by replacing the spaces with an underscore (_). This step ensures that the name is properly formatted for use during ingestion.

    The following image shows the correct format for the worksheet name.

    Screenshot of a new sheet in the SDOH dataset Excel file.

  4. Select New sheet and then create three new worksheets with the following names:

    • Layout
    • LocationConfiguration
    • DataSetMetadata

    Note

    If you paste these names, make sure that they don't have an extra trailing space at the end.

    You now have four worksheets, as shown in the following screenshot.

    Screenshot of the four new Excel sheets with SDOH dataset.

  5. Select the Layout worksheet and then add the following headers and rows.

    Category SubCategory SocialDeterminantDescription SocialDeterminantName Units HarmonizationKey
    Socioeconomic Schools % drop out of school PercentDropout Percentile Socioeconomic
    Socioeconomic Schools % graduated PercentGraduated Percentile Socioeconomic
    Socioeconomic Schools % financial aid PercentFinancialAid Percentile Socioeconomic

    The Layout worksheet should resemble the following image.

    Screenshot of the new layout worksheet with the SDOH dataset.

  6. Select the LocationConfiguration worksheet and then add the header and rows as follows.

    ColumnName StandardColumnName AssociatedWithSDOHValue
    SchoolDistrict SCHOOL_DISTRICT TRUE
    City CITY FALSE
    State STATE FALSE

    The LocationConfiguration worksheet should resemble the following image.

    Screenshot of the new layout sheet with the SDOH dataset.

  7. Select the DataSetMetadata worksheet and then add the header and rows as follows.

    DataSetName PublisherName PublishedDate ValidUntil
    Training State School Data Fabrikam Statistics 4/27/2023

    The DataSetMetadata worksheet should resemble the following screenshot.

    Screenshot of the new dataset-metadata sheet with the SDOH dataset.

  8. Save your changes.

  9. Go to https://app.powerbi.com/ and sign in.

  10. Select the FL HC Cloud workspace that you created. Replace FL with your initials.

  11. Locate and select the FLHeathcare_msft_bronze lakehouse. Replace FL with your initials.

  12. Expand Files > Ingest > SDOH > XLSX and then select the ellipsis (...) on the node. Select New subfolder.

  13. Enter Fabrikam as the folder name and then select Create. In this example, you conform to the unified folder structure and create a new publisher.

  14. Select the ellipsis (...) button on the Fabrikam folder and then select Upload > Upload files.

    Screenshot of selecting Upload files.

  15. Select the folder icon. Locate and select the Excel file that you edited. Select Upload.

  16. After the upload completes, close the Upload files pane.

  17. Select the workspace that you created. Locate and select the FLHealthcare_msft_sdoh_ingest data pipeline.

  18. Review the pipeline activities and then select Run.

    Screenshot of the pipeline with notebooks and the Run command selected.

  19. Wait for the pipeline run to complete.

    Screenshot of the pipeline run result.

Query the silver lakehouse

To query the silver lakehouse, follow these steps:

  1. Select Lakehouse in the upper-right corner and then switch to SQL analytics endpoint.

    Screenshot of the SQL analytics endpoint selected.

  2. Select New SQL query.

  3. Paste the query as follows.

    SELECT
        sd.SocialDeterminantId,
        sd.SocialDeterminantName,
        sd.SocialDeterminantValue,
        sd.HarmonizationKey,
        loc.SocialDeterminantCategoryName AS Category,
        sub.SocialDeterminantSubCategoryName AS SubCategory,
        uom.UnitOfMeasureName AS MeasurementUnit
    FROM SocialDeterminant sd
    JOIN SocialDeterminantSubCategory sub ON sd.SocialDeterminantSubCategoryId = sub.SocialDeterminantSubCategoryId
    JOIN SocialDeterminantCategory loc ON sub.SocialDeterminantCategoryId = loc.SocialDeterminantCategoryId
    JOIN UnitOfMeasure uom ON sd.UnitOfMeasureId = uom.UnitOfMeasureId
    WHERE sd.HarmonizationKey = 'Socioeconomic'
    
  4. Select Run.

    Screenshot showing the SQL query with the Run command selected.

    The query results are as follows.

    Screenshot of the SQL query results.