persist

Sets the storage level to persist the contents of the DataFrame across operations after the first time it is computed. A new storage level can only be assigned if the DataFrame does not already have one set. If no storage level is specified, it defaults to MEMORY_AND_DISK_DESER.

Syntax

persist(storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_DESER)

Parameters

Parameter Type Description
storageLevel StorageLevel Storage level to set for persistence. Default is MEMORY_AND_DISK_DESER.

Returns

DataFrame: Persisted DataFrame.

Notes

In Spark 3.0, the default storage level changed to MEMORY_AND_DISK_DESER to match Scala.

Cached data is shared across all Spark sessions that share the same SparkContext, i.e. within one application; it is not shared between separate applications on the cluster.

Examples

df = spark.range(1)
df.persist()
# DataFrame[id: bigint]

df.explain()
# == Physical Plan ==
# InMemoryTableScan ...

from pyspark.storagelevel import StorageLevel
df.unpersist()  # release the earlier cache so the new level takes effect
df.persist(StorageLevel.DISK_ONLY)
# DataFrame[id: bigint]