Returns a sequence of partitions for this data source.
Partitions split a read into parallel tasks. If this method returns N partitions, the query planner creates N tasks. Each task calls read() with its own partition, and the tasks run in parallel.
This method is called once during query planning. By default, it returns a single partition with the value None. Subclasses can override this method to return multiple partitions.
It's recommended to override this method for better performance when reading large datasets.
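The relationship between partitions() and read() can be sketched without a Spark session. The sketch below is an illustration only: InputPartition is a local stand-in that assumes the real class simply wraps one value, ChunkReader and simulate_scan are hypothetical names, and the tasks are run sequentially rather than in parallel.

```python
from typing import Any, Iterator, List, Sequence, Tuple


class InputPartition:
    """Local stand-in for the real InputPartition (assumed to wrap one value)."""

    def __init__(self, value: Any) -> None:
        self.value = value


class ChunkReader:
    """Hypothetical reader: each partition covers one fixed-size chunk of row ids."""

    def __init__(self, total_rows: int, chunk_size: int) -> None:
        self.total_rows = total_rows
        self.chunk_size = chunk_size

    def partitions(self) -> Sequence[InputPartition]:
        # Called once at planning time: one partition per chunk start offset.
        return [
            InputPartition(start)
            for start in range(0, self.total_rows, self.chunk_size)
        ]

    def read(self, partition: InputPartition) -> Iterator[Tuple[int]]:
        # Called once per task, with that task's partition.
        start = partition.value
        end = min(start + self.chunk_size, self.total_rows)
        for row_id in range(start, end):
            yield (row_id,)


def simulate_scan(reader: ChunkReader) -> List[List[Tuple[int]]]:
    """Mimic the planner sequentially: N partitions lead to N read() calls."""
    return [list(reader.read(p)) for p in reader.partitions()]


tasks = simulate_scan(ChunkReader(total_rows=10, chunk_size=4))
print(len(tasks))  # 3
print(tasks[-1])   # [(8,), (9,)]
```

Ten rows in chunks of four give three partitions, so three read() calls are made; the last one covers only the two remaining rows.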
Syntax
partitions()
Returns
Sequence[InputPartition]
A sequence of partitions for this data source. Each partition value must be an instance of InputPartition or a subclass of it.
Notes
All partition values must be picklable objects.
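A quick way to catch an unpicklable partition value before running a query is to try serializing it locally. The sketch below uses the standard pickle module as an approximation (the serializer Spark actually uses may differ), and is_picklable and the sample values are hypothetical.

```python
import pickle
import threading
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# Plain data (numbers, strings, tuples, dicts of those) pickles fine.
good_value = {"path": "part-0001", "offset": 0}

# Live resources such as locks, sockets, or open connections do not.
bad_value = threading.Lock()

print(is_picklable(good_value))  # True
print(is_picklable(bad_value))   # False
```

In practice this means a partition should carry a description of the work (a path, an offset, a range) rather than a live handle; the handle can be opened inside read().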
Examples
Return partitions that carry integer values:
def partitions(self):
    return [InputPartition(1), InputPartition(2), InputPartition(3)]
Return partitions that carry string values:
def partitions(self):
    return [InputPartition("a"), InputPartition("b"), InputPartition("c")]
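Return partitions that carry ranges, using a custom InputPartition subclass:
class RangeInputPartition(InputPartition):
    def __init__(self, start, end):
        self.start = start
        self.end = end

# Method of the data source reader, not of RangeInputPartition.
def partitions(self):
    return [RangeInputPartition(1, 3), RangeInputPartition(5, 10)]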