Important
This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.
This page shows how to use a registered community connector to ingest data from a supported source into Azure Databricks. To create a custom connector for a source that isn't supported yet, see Create a custom connector.
Requirements
- An Azure Databricks workspace with Unity Catalog enabled
- A connection for the source you want to ingest, or permissions to create a connection
- Write access to a catalog and schema for the ingested tables
Create an ingestion pipeline
To use a registered community connector:
1. In the sidebar of your Azure Databricks workspace, click +New > Add or upload data, then select the source under Community connectors.
2. Click + Create connection or select an existing connection, then click Next.
3. For Pipeline name, enter a name for the pipeline.
4. For Event log location, enter a catalog name and a schema name. Azure Databricks stores the pipeline event log here. Ingested tables are also written here by default.
5. For Root path, enter your workspace path (for example, `/Workspace/Users/<your-email>/connectors`). Azure Databricks clones and stores the connector source code here.
6. Click Create pipeline.
7. In the pipeline editor, open `ingest.py` and update the `objects` field to include the tables you want to ingest. For example:

   ```python
   from databricks.labs.community_connector.pipeline import ingest

   pipeline_spec = {
       "connection_name": "my_stripe_connection",  # Required: UC connection name
       "objects": [
           {"table": {"source_table": "charges"}},
           {"table": {"source_table": "customers", "destination_table": "stripe_customers"}},
       ],
   }

   ingest(spark, pipeline_spec)
   ```

8. Run the pipeline manually or schedule it. If you prefer to trigger runs from code, see the sketch after these steps.
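Besides the pipeline UI, you can start an on-demand run with the Databricks SDK for Python. This is a minimal sketch, not part of the connector walkthrough itself; the pipeline ID is a placeholder you would copy from your pipeline's settings.

```python
from databricks.sdk import WorkspaceClient

# Authenticates with your configured Databricks credentials
# (environment variables, a config profile, or notebook context).
w = WorkspaceClient()

# Placeholder: replace with the ID of the ingestion pipeline you created.
pipeline_id = "<your-pipeline-id>"

# Start an on-demand update of the pipeline and print the update ID.
update = w.pipelines.start_update(pipeline_id=pipeline_id)
print(f"Started pipeline update {update.update_id}")
```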
Pipeline configuration options
You can configure the following options in `ingest.py`:

| Option | Description |
|---|---|
| `connection_name` | Required. The name of the connection that stores authentication credentials for the source. |
| `objects` | Required. A list of tables to ingest. Each entry has the format `{"table": {"source_table": "..."}}`. You can also specify an optional `destination_table` inside the table object. |
| `destination_catalog` | The catalog where ingested tables are written. Defaults to the catalog set during pipeline creation. |
| `destination_schema` | The schema where ingested tables are written. Defaults to the schema set during pipeline creation. |
| `scd_type` | The slowly changing dimension strategy: `SCD_TYPE_1`, `SCD_TYPE_2`, or `APPEND_ONLY`. Defaults to `SCD_TYPE_1`. |
| `primary_keys` | Override the default primary keys for a table. Provide a list of column names. |
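To see these options together, here is a sketch of a fuller `pipeline_spec`, assuming the same `ingest` entry point shown earlier. The connection, catalog, schema, table, and column names are placeholders, and the exact nesting of the per-table `primary_keys` override inside the table object is an assumption based on the descriptions above.

```python
from databricks.labs.community_connector.pipeline import ingest

# A sketch combining the configuration options above; all names are placeholders.
pipeline_spec = {
    "connection_name": "my_stripe_connection",  # Required: UC connection name
    "destination_catalog": "main",              # Optional: overrides the pipeline default
    "destination_schema": "stripe_raw",         # Optional: overrides the pipeline default
    "scd_type": "SCD_TYPE_2",                   # Optional: keep full change history
    "objects": [
        {"table": {"source_table": "charges"}},
        {
            "table": {
                "source_table": "customers",
                "destination_table": "stripe_customers",
                "primary_keys": ["id"],         # Assumed per-table key override
            }
        },
    ],
}

ingest(spark, pipeline_spec)
```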