Returns a new DataFrame with one value replaced by another. DataFrame.replace and DataFrameNaFunctions.replace are aliases of each other. Values for to_replace and value must have the same type and can only be numerics, booleans, or strings; value can also be None. When replacing, the new value is cast to the type of the existing column.
Syntax
replace(to_replace, value=None, subset=None)
Parameters
| Parameter | Type | Description |
|---|---|---|
| to_replace | bool, int, float, str, list, or dict | The value to be replaced. If a dict, value is ignored (and can be omitted) and to_replace must be a mapping from each value to its replacement. |
| value | bool, int, float, str, list, or None, optional | The replacement value. If a list, it must have the same length and type as to_replace. If a scalar and to_replace is a sequence, the scalar is used as the replacement for each item. |
| subset | str or list, optional | A column name or list of column names to consider. Columns in subset that do not have a matching data type are ignored. |
Returns
DataFrame
Notes
For numeric replacements, all values to be replaced must have unique floating-point representations. In case of conflicts (for example, {42: -1, 42.0: 1}), an arbitrary replacement is used.
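The plain-Python sketch below illustrates the note and needs no Spark. It assumes, as the note implies, that numeric keys are compared as double values.

Note that in Python, 42 and 42.0 are already equal dict keys. The literal {42: -1, 42.0: 1} therefore collapses to a single entry before it ever reaches replace.

find_float_collisions is a hypothetical helper for checking a mapping in advance. It is not part of PySpark.

```python
from collections import defaultdict


def find_float_collisions(mapping):
    """Hypothetical pre-check: group keys that share one double representation."""
    groups = defaultdict(list)
    for key in mapping:
        groups[float(key)].append(key)
    # Keep only the doubles that more than one key maps to.
    return {f: keys for f, keys in groups.items() if len(keys) > 1}


# 42 == 42.0 in Python, so the literal keeps one key (the first, 42)
# with the last value assigned (1).
collapsed = {42: -1, 42.0: 1}
print(collapsed)  # {42: 1}

# Two distinct Python ints that round to the same double.
big = {2**53: -1, 2**53 + 1: 1}
print(find_float_collisions(big))
# {9007199254740992.0: [9007199254740992, 9007199254740993]}
```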
Examples
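```python
from pyspark.sql import SparkSession

# Databricks notebooks predefine `spark`; getOrCreate() returns that session.
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    (10, 80, "Alice"),
    (5, None, "Bob"),
    (None, 10, "Tom"),
    (None, None, None)],
    schema=["age", "height", "name"])
```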
Replace 10 with 20 in all numeric columns (age and height); the string column name is ignored.
```python
df.na.replace(10, 20).show()
# +----+------+-----+
# | age|height| name|
# +----+------+-----+
# |  20|    80|Alice|
# |   5|  NULL|  Bob|
# |NULL|    20|  Tom|
# |NULL|  NULL| NULL|
# +----+------+-----+
```
Replace 'Alice' with null in all string columns (here, only name).
```python
df.na.replace('Alice', None).show()
# +----+------+----+
# | age|height|name|
# +----+------+----+
# |  10|    80|NULL|
# |   5|  NULL| Bob|
# |NULL|    10| Tom|
# |NULL|  NULL|NULL|
# +----+------+----+
```
Replace 'Alice' with 'A' and 'Bob' with 'B' in the name column.
```python
df.na.replace(['Alice', 'Bob'], ['A', 'B'], 'name').show()
# +----+------+----+
# | age|height|name|
# +----+------+----+
# |  10|    80|   A|
# |   5|  NULL|   B|
# |NULL|    10| Tom|
# |NULL|  NULL|NULL|
# +----+------+----+
```
Replace 10 with 18 in the age column.
```python
df.na.replace(10, 18, 'age').show()
# +----+------+-----+
# | age|height| name|
# +----+------+-----+
# |  18|    80|Alice|
# |   5|  NULL|  Bob|
# |NULL|    10|  Tom|
# |NULL|  NULL| NULL|
# +----+------+-----+
```