Del via


rowsBetween (Window)

Creates a WindowSpec with the frame boundaries defined, from start (inclusive) to end (inclusive).

Both start and end are relative positions from the current row. For example, 0 means "current row", -1 means the row before the current row, and 5 means the fifth row after the current row.

A row-based boundary is based on the position of the row within the partition. An offset indicates the number of rows above or below the current row where the frame starts or ends.

Syntax

Window.rowsBetween(start, end)

Parameters

Parameter Type Description
start int Boundary start, inclusive. The frame is unbounded if this is Window.unboundedPreceding, or any value less than or equal to -9223372036854775808.
end int Boundary end, inclusive. The frame is unbounded if this is Window.unboundedFollowing, or any value greater than or equal to 9223372036854775807.

Returns

WindowSpec

Notes

Use Window.unboundedPreceding, Window.unboundedFollowing, and Window.currentRow to specify special boundary values rather than using integral values directly.

Examples

from pyspark.sql import Window, functions as sf

df = spark.createDataFrame(
    [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])

# Calculate the sum of id from the current row to current row + 1 in each category partition.
window = Window.partitionBy("category").orderBy("id").rowsBetween(Window.currentRow, 1)
df.withColumn("sum", sf.sum("id").over(window)).sort("id", "category", "sum").show()
# +---+--------+---+
# | id|category|sum|
# +---+--------+---+
# |  1|       a|  2|
# |  1|       a|  3|
# |  1|       b|  3|
# |  2|       a|  2|
# |  2|       b|  5|
# |  3|       b|  3|
# +---+--------+---+