Exercise - Explore SDOH data in lakehouses

Completed

In this exercise, you explore the data that you loaded into OneLake. You can use data in the silver lakehouse to understand the extent to which the food environment and social economic makeup of an area impacts the number of patients who have diabetes.

Analyze data from the silver lakehouse

Now, you can query the fast-food restaurants and median household income from an area to view relevant metrics.

  1. Select the FL HC Cloud workspace that you created.

  2. Locate and open the FLHealthcare_msft_silver lakehouse.

  3. Select Open notebook > New notebook.

    Screenshot of the New notebook option selected.

  4. Paste the following query. Replace FL with your initials.

    %%sql
    SELECT
      sd.SocialDeterminantName,
      sd.SocialDeterminantValue,
      sd.SocialDeterminantDescription,
      parsedJson.CountyName AS CountyName,
      parsedJson.CountyFIPS AS CountyFIPS,
      parsedJson.StateName AS StateName
    FROM
      FLHealthcare_msft_silver.SocialDeterminant sd
    LATERAL VIEW json_tuple(sd.LocationJson, 'STATENAME', 'COUNTYNAME', 'COUNTYFIPS') 
      parsedJson AS StateName, CountyName, CountyFIPS
    WHERE
      sd.SocialDeterminantName IN ('GROC16', 'FFR16', 'ACS_MEDIAN_HH_INC')
    AND sd.SocialDeterminantValue IS NOT NULL;
    
  5. Select Run cell.

    Screenshot of the SQL query with the run option.

  6. The results display as follows. Select + New chart.

    Screenshot of the New chart option selected.

    The first suggested chart resembles the following image.

    Screenshot of the new suggested pie chart for the query results.

    Now, you add the patient into the query and correlate the patient records to the social determinants of health (SDOH) data to get a summary at the state level.

  7. Select Add code cell.

    Screenshot of the Add code cell option highlighted.

  8. Paste the following query. Replace FL with your initials.

    %%sql
      WITH base_query AS (
      SELECT
          sd.SocialDeterminantName,
          CAST(sd.SocialDeterminantValue AS DOUBLE) AS SocialDeterminantValue, 
        -- Ensure Numeric Data
          UPPER(sd.StateName) AS State, 
        -- Standardized State Format
          sd.CountyName AS County,
          sd.CountyFIPS
      FROM (
          SELECT
              SocialDeterminantName,
              SocialDeterminantValue,
              SocialDeterminantDescription,
              parsedJson.CountyName AS CountyName,
              parsedJson.CountyFIPS AS CountyFIPS,
              UPPER(parsedJson.StateName) AS StateName 
          -- Standardized Case
          FROM
              FLHealthcare_msft_silver.SocialDeterminant sd
          LATERAL VIEW json_tuple(sd.LocationJson, 'STATENAME', 'COUNTYNAME', 'COUNTYFIPS') 
              parsedJson AS StateName, CountyName, CountyFIPS
      ) sd
          WHERE sd.SocialDeterminantName IN ('GROC16', 'FFR16', 'ACS_MEDIAN_HH_INC', 'SAIPE_PCT_POV')
          AND sd.SocialDeterminantValue IS NOT NULL
      ),
      social_determinants AS (
          SELECT
              State,
              MAX(CASE WHEN SocialDeterminantName = 'GROC16' THEN SocialDeterminantValue END) AS Grocery_stores_2016,
              MAX(CASE WHEN SocialDeterminantName = 'FFR16' THEN SocialDeterminantValue END) AS Fast_food_restaurants_2016,
              MAX(CASE WHEN SocialDeterminantName = 'ACS_MEDIAN_HH_INC' THEN SocialDeterminantValue END) AS Median_household_income,
              MAX(CASE WHEN SocialDeterminantName = 'SAIPE_PCT_POV' THEN SocialDeterminantValue END) AS Estimated_percentage_of_people_of_all_ages_in_poverty
          FROM base_query
          GROUP BY State
          ),
          ExpandedPatients AS (
          SELECT
              p.idOrig AS id_orig,
              address_item.postalCode AS postalCode,
              UPPER(address_item.state) AS state -- Standardized Case
          FROM
              FLHealthcare_msft_silver.Patient p
          LATERAL VIEW explode(p.address) exploded_address AS address_item
          ),
      patient_conditions AS (
          SELECT
              ep.state AS State,
              COUNT(DISTINCT CASE WHEN c.code.text LIKE '%Asthma%' THEN ep.id_orig END) AS Total_Asthma_Patients,
              COUNT(DISTINCT CASE WHEN c.code.text LIKE '%Diabetes%' THEN ep.id_orig END) AS Total_Diabetes_Patients,
              COUNT(DISTINCT CASE WHEN c.code.text LIKE '%Hypertension%' THEN ep.id_orig END) AS Total_Hypertension_Patients
          FROM
              ExpandedPatients ep
          JOIN FLHealthcare_msft_silver.Condition c 
              ON ep.id_orig = c.subject.idOrig
          GROUP BY ep.state
          )
          SELECT
              sd.State,
              COALESCE(sd.Grocery_stores_2016, 0) AS Grocery_stores_2016,
              COALESCE(sd.Fast_food_restaurants_2016, 0) AS Fast_food_restaurants_2016,
              COALESCE(sd.Median_household_income, 0) AS Median_household_income, -- Ensure No NULLs
              COALESCE(sd.Estimated_percentage_of_people_of_all_ages_in_poverty, 0) AS Estimated_percentage_of_people_of_all_ages_in_poverty,
              COALESCE(pc.Total_Asthma_Patients, 0) AS Total_Asthma_Patients,
              COALESCE(pc.Total_Diabetes_Patients, 0) AS Total_Diabetes_Patients,
              COALESCE(pc.Total_Hypertension_Patients, 0) AS Total_Hypertension_Patients
          FROM
              social_determinants sd
          LEFT JOIN
              patient_conditions pc
          ON
              sd.State = pc.State;
    
  9. Select Run cell.

  10. The following results display. Select + New chart.

    Screenshot of the new chart for the query results selected.

    The first suggested chart displays as follows.

    Screenshot of the second chart for the query results selected.

  11. From the Suggested charts area, select Avg of Fast_food_restaurants_2016.

    Screenshot of the new suggested chart for the query results.

    The chart displays as follows.

    Screenshot of the new suggested average chart for the query results.

  12. Select the Median_household_income by Total_Asthma_Patients chart. This chart is a good example of combining the patient condition with the social determinants data.

    Screenshot of the new Median household income by Total Asthma Patients chart.