Creates a WindowSpec with the frame boundaries defined, from start (inclusive) to end (inclusive).
Both start and end are relative positions from the current row. For example, 0 means "current row", -1 means the row before the current row, and 5 means the fifth row after the current row.
A row-based boundary is based on the position of the row within the partition. An offset indicates the number of rows above or below the current row where the frame starts or ends.
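As a minimal illustration of these semantics in plain Python (not the Spark API; the helper `rows_frame_sum` is invented here for demonstration), a row frame from `start` to `end` is simply the slice of offsets around each row, clamped to the partition bounds:

```python
def rows_frame_sum(values, start, end):
    """Sum each row's frame [i + start, i + end] within a single ordered
    partition, clamping the frame to the partition's first and last rows.
    Mirrors what rowsBetween(start, end) does for a sum() aggregate."""
    result = []
    for i in range(len(values)):
        lo = max(0, i + start)               # frame start, clamped to first row
        hi = min(len(values) - 1, i + end)   # frame end, clamped to last row
        result.append(sum(values[lo:hi + 1]))
    return result

# Partition "b" from the example below has ids [1, 2, 3].
# Frame from current row to current row + 1:
print(rows_frame_sum([1, 2, 3], 0, 1))  # [3, 5, 3]
```

Passing a very large negative `start` behaves like an unbounded preceding boundary, turning the sum into a running total.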
Syntax
```python
Window.rowsBetween(start, end)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| start | int | Boundary start, inclusive. The frame is unbounded if this is Window.unboundedPreceding, or any value less than or equal to -9223372036854775808. |
| end | int | Boundary end, inclusive. The frame is unbounded if this is Window.unboundedFollowing, or any value greater than or equal to 9223372036854775807. |
Returns
WindowSpec
Notes
Use Window.unboundedPreceding, Window.unboundedFollowing, and Window.currentRow to specify special boundary values rather than using integral values directly.
Examples
```python
from pyspark.sql import Window, functions as sf

df = spark.createDataFrame(
    [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])

# Calculate the sum of id from the current row to current row + 1 in each category partition.
window = Window.partitionBy("category").orderBy("id").rowsBetween(Window.currentRow, 1)
df.withColumn("sum", sf.sum("id").over(window)).sort("id", "category", "sum").show()
# +---+--------+---+
# | id|category|sum|
# +---+--------+---+
# |  1|       a|  2|
# |  1|       a|  3|
# |  1|       b|  3|
# |  2|       a|  2|
# |  2|       b|  5|
# |  3|       b|  3|
# +---+--------+---+
```