between

Vérifiez si la valeur de colonne est comprise entre les limites inférieures et supérieures (inclusives).

Syntaxe

between(lowerBound, upperBound)

Paramètres

Paramètre Type Description
lowerBound valeur ou colonne Valeur limite inférieure
upperBound valeur ou colonne Valeur liée supérieure

Retours

Colonne (booléen)

Exemples

Utilisation entre les valeurs entières :

df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
df.select(df.name, df.age.between(2, 4)).show()
# +-----+---------------------------+
# | name|((age >= 2) AND (age <= 4))|
# +-----+---------------------------+
# |Alice|                       true|
# |  Bob|                      false|
# +-----+---------------------------+

Utilisation d’une chaîne à l’autre avec des valeurs de chaîne :

df = spark.createDataFrame([("Alice", "A"), ("Bob", "B")], ["name", "initial"])
df.select(df.name, df.initial.between("A", "B")).show()
# +-----+-----------------------------------+
# | name|((initial >= A) AND (initial <= B))|
# +-----+-----------------------------------+
# |Alice|                               true|
# |  Bob|                               true|
# +-----+-----------------------------------+

Utilisation entre les valeurs float :

df = spark.createDataFrame(
    [(2.5, "Alice"), (5.5, "Bob")], ["height", "name"])
df.select(df.name, df.height.between(2.0, 5.0)).show()
# +-----+-------------------------------------+
# | name|((height >= 2.0) AND (height <= 5.0))|
# +-----+-------------------------------------+
# |Alice|                                 true|
# |  Bob|                                false|
# +-----+-------------------------------------+

Utilisation entre les valeurs de date :

import pyspark.sql.functions as sf
df = spark.createDataFrame(
    [("Alice", "2023-01-01"), ("Bob", "2023-02-01")], ["name", "date"])
df = df.withColumn("date", sf.to_date(df.date))
df.select(df.name, df.date.between("2023-01-01", "2023-01-15")).show()
# +-----+-----------------------------------------------+
# | name|((date >= 2023-01-01) AND (date <= 2023-01-15))|
# +-----+-----------------------------------------------+
# |Alice|                                           true|
# |  Bob|                                          false|
# +-----+-----------------------------------------------+

Utilisation entre les valeurs d’horodatage :

import pyspark.sql.functions as sf
df = spark.createDataFrame(
    [("Alice", "2023-01-01 10:00:00"), ("Bob", "2023-02-01 10:00:00")],
    schema=["name", "timestamp"])
df = df.withColumn("timestamp", sf.to_timestamp(df.timestamp))
df.select(df.name, df.timestamp.between("2023-01-01", "2023-02-01")).show()
# +-----+---------------------------------------------------------+
# | name|((timestamp >= 2023-01-01) AND (timestamp <= 2023-02-01))|
# +-----+---------------------------------------------------------+
# |Alice|                                                     true|
# |  Bob|                                                    false|
# +-----+---------------------------------------------------------+