drop (DataFrameNaFunctions)

Returns a new DataFrame omitting rows with null or NaN values. DataFrame.dropna and DataFrameNaFunctions.drop are aliases of each other.

Syntax

drop(how='any', thresh=None, subset=None)

Parameters

how : str, optional
    Whether to drop a row if it contains any null or NaN values ('any', the default) or only if all of its values are null or NaN ('all'). Ignored when thresh is specified.
thresh : int, optional
    If specified, drop rows that have fewer than thresh non-null, non-NaN values. Overrides how.
subset : str, tuple, or list, optional
    Column names to consider when checking for null or NaN values; other columns are ignored.

Returns

DataFrame

Examples

from pyspark.sql import Row
df = spark.createDataFrame([
    Row(age=10, height=80.0, name="Alice"),
    Row(age=5, height=float("nan"), name="Bob"),
    Row(age=None, height=None, name="Tom"),
    Row(age=None, height=float("nan"), name=None),
])

Drop rows that contain any null or NaN value (the default behavior).

df.na.drop().show()
# +---+------+-----+
# |age|height| name|
# +---+------+-----+
# | 10|  80.0|Alice|
# +---+------+-----+

Drop rows only if all of their values are null or NaN.

df.na.drop(how='all').show()
# +----+------+-----+
# | age|height| name|
# +----+------+-----+
# |  10|  80.0|Alice|
# |   5|   NaN|  Bob|
# |NULL|  NULL|  Tom|
# +----+------+-----+

Drop rows that have fewer than thresh non-null and non-NaN values.

df.na.drop(thresh=2).show()
# +---+------+-----+
# |age|height| name|
# +---+------+-----+
# | 10|  80.0|Alice|
# |  5|   NaN|  Bob|
# +---+------+-----+

Drop rows with null or NaN values in the specified columns.

df.na.drop(subset=['age', 'name']).show()
# +---+------+-----+
# |age|height| name|
# +---+------+-----+
# | 10|  80.0|Alice|
# |  5|   NaN|  Bob|
# +---+------+-----+