sortWithinPartitions

Returns a new DataFrame with each partition sorted by the specified column(s).

Syntax

sortWithinPartitions(*cols: Union[int, str, Column, List[Union[int, str, Column]]], **kwargs: Any)

Parameters

cols : int, str, list or Column, optional
    List of Column objects, column names, or column ordinals to sort by.
ascending : bool or list, optional, default True
    Sort ascending vs. descending. Specify a list for multiple sort orders; if a list is given, its length must equal the length of cols.

Returns

DataFrame: A new DataFrame with each partition sorted independently.
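Because the sort is applied per partition, the rows of the resulting DataFrame are not globally ordered. A minimal pure-Python sketch of the semantics (the partitions list below is illustrative stand-in data, not a Spark API):

```python
# Illustrative only: each inner list stands in for one partition's rows.
partitions = [[5, 2, 9], [3, 1]]

# sortWithinPartitions sorts each partition independently...
sorted_within = [sorted(p) for p in partitions]
print(sorted_within)  # [[2, 5, 9], [1, 3]]

# ...so concatenating the partitions does NOT yield a globally sorted result.
flat = [x for p in sorted_within for x in p]
print(flat)  # [2, 5, 9, 1, 3] -- not globally sorted
```

Use sort() / orderBy() instead when a total ordering across all partitions is required.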

Notes

A column ordinal starts from 1, unlike the 0-based `__getitem__`. A negative column ordinal sorts that column in descending order.
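The ordinal rules can be sketched in plain Python (the `sort_by_ordinal` helper below is hypothetical, written only to illustrate the rules; it is not part of the PySpark API):

```python
rows = [(2, "Alice"), (5, "Bob")]

def sort_by_ordinal(rows, ordinal):
    # Ordinals are 1-based, so ordinal 1 or -1 refers to tuple index 0.
    idx = abs(ordinal) - 1
    # A negative ordinal means sort descending.
    return sorted(rows, key=lambda r: r[idx], reverse=ordinal < 0)

print(sort_by_ordinal(rows, 1))   # [(2, 'Alice'), (5, 'Bob')]
print(sort_by_ordinal(rows, -1))  # [(5, 'Bob'), (2, 'Alice')]
```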

Examples

from pyspark.sql import functions as sf
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
df.sortWithinPartitions("age", ascending=False)
# DataFrame[age: bigint, name: string]

df.coalesce(1).sortWithinPartitions(1).show()
# +---+-----+
# |age| name|
# +---+-----+
# |  2|Alice|
# |  5|  Bob|
# +---+-----+

df.coalesce(1).sortWithinPartitions(-1).show()
# +---+-----+
# |age| name|
# +---+-----+
# |  5|  Bob|
# |  2|Alice|
# +---+-----+