read (DataSourceStreamReader)

Generates data for a given partition and returns an iterator of tuples or rows.

This method is invoked once per partition to read the data. Implementing this method is required for stream readers. You can initialize any non-serializable resources required for reading data from the data source within this method.

Added in Databricks Runtime 15.2

Syntax

read(partition: InputPartition)

Parameters

Parameter | Type | Description
partition | InputPartition | The partition to read. It must be one of the partition values returned by partitions().

Returns

Iterator[Tuple] or Iterator[RecordBatch]

An iterator of tuples or rows. Each tuple or row will be converted to a row in the final DataFrame. It can also return an iterator of PyArrow RecordBatch objects if the data source supports it.

Notes

This method is static and stateless. Do not access mutable class members or keep in-memory state between different invocations of read().
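A minimal sketch of a read() that follows this rule is shown below: everything the method needs travels in the partition object, and nothing is written to the reader instance. CounterStreamReader and RangePartition are hypothetical names used only for illustration, the offset dictionary shape is an assumption, and initialOffset() and latestOffset() are left out for brevity. The fallback stub classes exist only so the sketch can run without PySpark installed.

```python
from typing import Iterator, List, Tuple

try:
    from pyspark.sql.datasource import DataSourceStreamReader, InputPartition
except ImportError:
    # Stand-ins (assumption) so the sketch runs without PySpark installed.
    class InputPartition:
        def __init__(self, value=None):
            self.value = value

    class DataSourceStreamReader:
        pass


class RangePartition(InputPartition):
    """Hypothetical partition covering the half-open range [start, end)."""

    def __init__(self, start: int, end: int):
        self.start = start
        self.end = end


class CounterStreamReader(DataSourceStreamReader):
    """Hypothetical reader that emits consecutive integers."""

    def partitions(self, start: dict, end: dict) -> List[RangePartition]:
        # Split the offset range into two halves, one partition each.
        lo, hi = start["offset"], end["offset"]
        mid = (lo + hi) // 2
        return [RangePartition(lo, mid), RangePartition(mid, hi)]

    def read(self, partition: InputPartition) -> Iterator[Tuple]:
        # All inputs come from the partition argument; no attribute on
        # self is read or modified, so repeated calls are independent.
        for value in range(partition.start, partition.end):
            yield (value,)


# Local demonstration: call read() once per partition, as Spark would.
reader = CounterStreamReader()
parts = reader.partitions({"offset": 0}, {"offset": 4})
rows = [row for p in parts for row in reader.read(p)]
```

Because read() depends only on its argument, calling it again with the same partition yields the same rows, which keeps the reader safe to re-run for a partition.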