when (Column)

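Evaluates a list of conditions and returns one of multiple possible result expressions. Conditions are checked in the order they are chained, and the first one that holds determines the result. If otherwise() is not invoked, None (null) is returned for rows that match no condition.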
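The evaluation order can be illustrated with a small plain-Python model. This is only a sketch of the CASE WHEN semantics, not PySpark code: the helper name `case_when`, the dictionary rows, and the extra "Carol" row are hypothetical and chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Branch = Tuple[Callable[[Row], bool], Any]


def case_when(row: Row, branches: List[Branch], otherwise: Any = None) -> Any:
    """Hypothetical model of a when()...otherwise() chain for a single row."""
    for condition, value in branches:
        # The first branch whose condition holds wins; later ones are skipped.
        if condition(row):
            return value
    # No branch matched: fall back to the default (None if none was given).
    return otherwise


branches: List[Branch] = [
    (lambda r: r["age"] > 4, 1),
    (lambda r: r["age"] < 3, -1),
]
rows = [
    {"name": "Alice", "age": 2},
    {"name": "Bob", "age": 5},
    {"name": "Carol", "age": 3},  # matches neither branch
]

with_default = [case_when(r, branches, otherwise=0) for r in rows]
without_default = [case_when(r, branches) for r in rows]

print(with_default)     # [-1, 1, 0]
print(without_default)  # [-1, 1, None]
```

In this model the third row falls through both branches, so it receives the default when one is supplied and None otherwise, which mirrors the null result for unmatched rows when otherwise() is omitted.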

Syntax

when(condition, value)

Parameters

Parameter    Type      Description
condition    Column    Boolean Column expression to test
value        Any       Literal value or Column expression to return if condition is true

Returns

Column

Examples

Example 1: Using when() with conditions and values to create a new Column.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
result = df.select(df.name, sf.when(df.age > 4, 1).when(df.age < 3, -1).otherwise(0))
result.show()
# +-----+------------------------------------------------------------+
# | name|CASE WHEN (age > 4) THEN 1 WHEN (age < 3) THEN -1 ELSE 0 END|
# +-----+------------------------------------------------------------+
# |Alice|                                                          -1|
# |  Bob|                                                           1|
# +-----+------------------------------------------------------------+

Example 2: Chaining multiple when() conditions.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, "Alice"), (4, "Bob"), (6, "Charlie")], ["age", "name"])
result = df.select(
    df.name,
    sf.when(df.age < 3, "Young").when(df.age < 5, "Middle-aged").otherwise("Old")
)
result.show()
# +-------+---------------------------------------------------------------------------+
# |   name|CASE WHEN (age < 3) THEN Young WHEN (age < 5) THEN Middle-aged ELSE Old END|
# +-------+---------------------------------------------------------------------------+
# |  Alice|                                                                      Young|
# |    Bob|                                                                Middle-aged|
# |Charlie|                                                                        Old|
# +-------+---------------------------------------------------------------------------+

Example 3: Using literal values as conditions.

from pyspark.sql import functions as sf
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
result = df.select(
    df.name, sf.when(sf.lit(True), 1).otherwise(
        sf.raise_error("unreachable")).alias("when"))
result.show()
# +-----+----+
# | name|when|
# +-----+----+
# |Alice|   1|
# |  Bob|   1|
# +-----+----+