Merk
Tilgang til denne siden krever autorisasjon. Du kan prøve å logge på eller endre kataloger.
Tilgang til denne siden krever autorisasjon. Du kan prøve å endre kataloger.
Writes an iterator of PyArrow RecordBatch objects to the sink.
This method is called once on each executor to write data to the data source. It accepts an iterator of PyArrow RecordBatch objects and returns a single row representing a commit message, or None if there is no commit message.
The driver collects commit messages, if any, from all executors and passes them to the commit() method if all tasks run successfully. If any task fails, the abort() method will be called with the collected commit messages.
Syntax
write(iterator: Iterator[RecordBatch])
Parameters
| Parameter | Type | Description |
|---|---|---|
iterator |
Iterator[RecordBatch] | An iterator of PyArrow RecordBatch objects representing the input data. |
Returns
WriterCommitMessage
A serializable commit message.
Examples
from dataclasses import dataclass
@dataclass
class MyCommitMessage(WriterCommitMessage):
num_rows: int
def write(self, iterator: Iterator["RecordBatch"]) -> "WriterCommitMessage":
total_rows = 0
for batch in iterator:
total_rows += len(batch)
return MyCommitMessage(num_rows=total_rows)