When using Microsoft Purview with Azure Databricks Unity Catalog, the synchronization of column-level tags operates as follows:
- Automatic Synchronization: Unity Catalog column-level tags are not synchronized to Microsoft Purview on their own. Updates to column-level tags in Databricks Unity Catalog are reflected in Purview only when a scan runs, and whether they are picked up depends on the type of scan you perform.
- Scan Type: An incremental scan is enough to reflect updated tags, because it captures changes made since the last scan, including tag updates. A full scan is needed only if you want to capture all metadata again.
- Mapping of Tags: Unity Catalog tags are not directly mapped to Purview classifications or custom metadata attributes. They are treated as separate metadata fields within Purview. Therefore, you should manage and review these tags independently within both systems.
- Limitations and Prerequisites: Column-level tag updates propagate correctly only if the necessary permissions are set up in both Azure Databricks and Microsoft Purview. If you are using private endpoints, make sure they are configured to allow communication between the two services, including access from Purview to the internal DBFS storage location of the Azure Databricks workspace being scanned.
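Because the tags have to be reviewed independently on each side, it can help to keep the Databricks-side statements in one place. Below is a minimal Python sketch that only builds the SQL text; it does not connect to a workspace. The helper names (`quote_ident`, `set_column_tags_sql`, `list_column_tags_sql`) are hypothetical, and the sketch assumes the Databricks SQL `ALTER TABLE ... ALTER COLUMN ... SET TAGS` syntax and the `system.information_schema.column_tags` view are available in your workspace, so check both against the Databricks documentation for your runtime.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(value: str) -> str:
    """Single-quote a string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def set_column_tags_sql(catalog: str, schema: str, table: str,
                        column: str, tags: dict) -> str:
    """Build an ALTER TABLE statement that sets tags on one column.

    Assumption: Databricks SQL accepts
    ALTER TABLE <t> ALTER COLUMN <c> SET TAGS ('k' = 'v', ...).
    """
    if not tags:
        raise ValueError("at least one tag is required")
    full_name = ".".join(quote_ident(p) for p in (catalog, schema, table))
    pairs = ", ".join(
        f"{quote_literal(k)} = {quote_literal(v)}" for k, v in tags.items()
    )
    return (f"ALTER TABLE {full_name} ALTER COLUMN {quote_ident(column)} "
            f"SET TAGS ({pairs})")


def list_column_tags_sql(catalog: str, schema: str, table: str) -> str:
    """Build a query listing the column tags Unity Catalog holds for a table.

    Assumption: the view system.information_schema.column_tags exposes
    catalog_name, schema_name, table_name, column_name, tag_name, tag_value.
    """
    return (
        "SELECT column_name, tag_name, tag_value "
        "FROM system.information_schema.column_tags "
        f"WHERE catalog_name = {quote_literal(catalog)} "
        f"AND schema_name = {quote_literal(schema)} "
        f"AND table_name = {quote_literal(table)}"
    )


if __name__ == "__main__":
    # Hypothetical object names, for illustration only.
    print(set_column_tags_sql("main", "sales", "orders", "email", {"pii": "true"}))
    print(list_column_tags_sql("main", "sales", "orders"))
```

Running the listing query before and after a Purview scan gives you a Databricks-side baseline to compare against what Purview shows for the same columns.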
As a best practice, schedule incremental scans regularly to keep the metadata in sync, and review permissions and configurations periodically to avoid issues with tag propagation.
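If you want to trigger a scan run on demand rather than wait for the schedule, the request can be prepared programmatically. The sketch below only builds the HTTP method and URL; it sends nothing and handles no authentication. It assumes the Purview Scanning data-plane "run scan" operation takes the form `PUT https://{account}.purview.azure.com/scan/datasources/{dataSource}/scans/{scan}/runs/{runId}` with `scanLevel` and `api-version` query parameters. The API version string, the helper name `build_scan_run_request`, and the account, data source, and scan names are all assumptions for illustration, so verify the exact shape against the Purview Scanning REST API reference for your account before using it.

```python
import uuid
from urllib.parse import quote, urlencode

# Assumption: a preview API version of the Purview Scanning data plane.
API_VERSION = "2022-02-01-preview"

# Assumption: the two scan levels accepted by the run-scan operation.
VALID_SCAN_LEVELS = ("Full", "Incremental")


def build_scan_run_request(account: str, data_source: str, scan_name: str,
                           scan_level: str = "Incremental",
                           run_id: str = None):
    """Return (method, url) for starting a scan run. Nothing is sent."""
    if scan_level not in VALID_SCAN_LEVELS:
        raise ValueError(f"scan_level must be one of {VALID_SCAN_LEVELS}")
    # A fresh GUID identifies the run unless the caller supplies one.
    run_id = run_id or str(uuid.uuid4())
    base = f"https://{account}.purview.azure.com/scan"
    path = "/datasources/{}/scans/{}/runs/{}".format(
        quote(data_source, safe=""),
        quote(scan_name, safe=""),
        quote(run_id, safe=""),
    )
    query = urlencode({"scanLevel": scan_level, "api-version": API_VERSION})
    return "PUT", f"{base}{path}?{query}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    method, url = build_scan_run_request("contoso", "adb-uc", "weekly")
    print(method, url)
```

The resulting request would still need an Azure AD bearer token for the Purview resource and an identity with permission to run scans on the collection.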