Merk
Tilgang til denne siden krever autorisasjon. Du kan prøve å logge på eller endre kataloger.
Tilgang til denne siden krever autorisasjon. Du kan prøve å endre kataloger.
Returns a new DataFrame partitioned by the given partitioning expressions. The resulting DataFrame is hash partitioned.
Syntax
repartition(numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName")
Parameters
| Parameter | Type | Description |
|---|---|---|
numPartitions |
int | can be an int to specify the target number of partitions or a Column. If it is a Column, it will be used as the first partitioning column. If not specified, the default number of partitions is used. |
cols |
str or Column | partitioning columns. |
Returns
DataFrame: Repartitioned DataFrame.
Examples
from pyspark.sql import functions as sf
df = spark.range(0, 64, 1, 9).withColumn(
"name", sf.concat(sf.lit("name_"), sf.col("id").cast("string"))
).withColumn(
"age", sf.col("id") - 32
)
df.repartition(10).select(
sf.spark_partition_id().alias("partition")
).distinct().sort("partition").show()
# +---------+
# |partition|
# +---------+
# | 0|
# ...
# | 9|
# +---------+
df.repartition(7, "age").select(
sf.spark_partition_id().alias("partition")
).distinct().sort("partition").show()
# +---------+
# |partition|
# +---------+
# | 0|
# ...
# | 6|
# +---------+