partitions (DataSourceReader)

Returns a sequence of partitions for this data source.

Partitions split a read operation into parallel tasks. If this method returns N partitions, the query planner creates N tasks. Each task calls read() with its own partition value, and the tasks run in parallel.

This method is called once during query planning. By default, it returns a single partition with the value None. Subclasses can override this method to return multiple partitions.

It's recommended to override this method for better performance when reading large datasets.
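The relationship between partitions() and read() can be sketched as follows. This is a minimal illustration, not a complete data source: CounterReader and its row values are hypothetical, the final loop only imitates on a single machine what the planner does with one task per partition, and the fallback classes exist only so the sketch runs where PySpark is not installed.

```python
try:
    from pyspark.sql.datasource import DataSourceReader, InputPartition
except ImportError:
    # Assumption: minimal stand-ins so the sketch runs without PySpark.
    class InputPartition:
        def __init__(self, value):
            self.value = value

    class DataSourceReader:
        pass


class RangeInputPartition(InputPartition):
    """Describes the half-open range [start, end) that one task reads."""

    def __init__(self, start, end):
        self.start = start
        self.end = end


class CounterReader(DataSourceReader):  # hypothetical reader for illustration
    def partitions(self):
        # Two partitions, so the planner would create two tasks.
        return [RangeInputPartition(0, 3), RangeInputPartition(3, 6)]

    def read(self, partition):
        # Each task receives exactly one partition and yields its rows.
        for i in range(partition.start, partition.end):
            yield (i,)


# Local imitation of the planner: call read() once per partition.
reader = CounterReader()
rows = [row for p in reader.partitions() for row in reader.read(p)]
```

In a real query the read() calls run in separate tasks, so the order of rows across partitions is not guaranteed the way it is in this sequential loop.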

Syntax

partitions()

Returns

Sequence[InputPartition]

A sequence of partitions for this data source. Each partition value must be an instance of InputPartition or a subclass of it.

Notes

All partition values must be picklable objects.
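One way to check this requirement before running a query is a pickle round trip. The sketch below uses the standard pickle module as an approximation; the serializer Spark actually uses may behave differently. The is_picklable helper and LockHoldingPartition class are hypothetical names introduced for illustration, and the fallback InputPartition exists only so the sketch runs where PySpark is not installed.

```python
import pickle
import threading

try:
    from pyspark.sql.datasource import InputPartition
except ImportError:
    # Assumption: minimal stand-in so the sketch runs without PySpark.
    class InputPartition:
        def __init__(self, value):
            self.value = value


class RangeInputPartition(InputPartition):
    """Holds only plain integers, so it pickles cleanly."""

    def __init__(self, start, end):
        self.start = start
        self.end = end


class LockHoldingPartition(InputPartition):  # hypothetical bad example
    """Holds a thread lock, which cannot be pickled."""

    def __init__(self):
        self.lock = threading.Lock()


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


good = is_picklable(RangeInputPartition(1, 3))
bad = is_picklable(LockHoldingPartition())

# A round trip preserves the fields that read() will later rely on.
restored = pickle.loads(pickle.dumps(RangeInputPartition(1, 3)))
```

Partitions that carry only plain data (numbers, strings, tuples) avoid this class of problem; resources such as locks, open files, or connections are better created inside read().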

Examples

Returns a list of partitions that wrap integer values:

from pyspark.sql.datasource import InputPartition

def partitions(self):
    return [InputPartition(1), InputPartition(2), InputPartition(3)]

Returns a list of partitions that wrap string values:

def partitions(self):
    return [InputPartition("a"), InputPartition("b"), InputPartition("c")]

Returns a list of range partitions, using a custom subclass of InputPartition:

from pyspark.sql.datasource import InputPartition

class RangeInputPartition(InputPartition):
    def __init__(self, start, end):
        self.start = start
        self.end = end

def partitions(self):
    return [RangeInputPartition(1, 3), RangeInputPartition(5, 10)]