approxQuantile (DataFrame)

Calculates the approximate quantiles of numerical columns of a DataFrame.

Syntax

approxQuantile(col: Union[str, List[str], Tuple[str]], probabilities: Union[List[float], Tuple[float]], relativeError: float)

Parameters

col (str, list, or tuple): A single column name, or a list or tuple of column names for multiple columns.
probabilities (list or tuple of floats): A list of quantile probabilities. Each number must be a float in the range [0, 1]. For example, 0.0 is the minimum, 0.5 is the median, and 1.0 is the maximum.
relativeError (float): The relative target precision to achieve (>= 0). If set to zero, the exact quantiles are computed, which can be very expensive. Values greater than 1 are accepted but give the same result as 1.
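The PySpark docstring for this method spells out what relativeError guarantees: if the DataFrame has N elements and the quantile at probability p is requested with error err, the returned value x is a sample from the DataFrame whose exact rank satisfies floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). The sketch below is a hypothetical plain-Python helper, values_within_bound, that does not call Spark; it only lists every value satisfying that bound. It assumes 1-based ranks and drops nulls first, both assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def values_within_bound(
    data: Sequence[Optional[float]], p: float, err: float
) -> List[float]:
    """Return the values whose 1-based rank in the sorted, null-free data
    satisfies floor((p - err) * N) <= rank <= ceil((p + err) * N).

    Hypothetical helper: it only evaluates the documented error bound;
    it is not how Spark computes quantiles.
    """
    values = sorted(x for x in data if x is not None)  # nulls are ignored
    n = len(values)
    lo = math.floor((p - err) * n)  # lowest acceptable rank
    hi = math.ceil((p + err) * n)   # highest acceptable rank
    return [v for rank, v in enumerate(values, start=1) if lo <= rank <= hi]


data = [1, 2, 3, 4, 5]
# With err = 0.05 the median may be the 2nd or 3rd smallest value;
# the single-column example in Examples returns 3.0, inside this window.
print(values_within_bound(data, 0.5, 0.05))  # [2, 3]
# At err = 1 the window already spans every rank, so a larger err
# cannot widen it any further.
print(values_within_bound(data, 0.5, 1.0) == values_within_bound(data, 0.5, 2.0))  # True
```

As err shrinks toward zero the window narrows to the ranks adjacent to p * N, which is why a zero error means an exact (and more expensive) computation; at err = 1 the window covers the whole column, consistent with larger values behaving like 1.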

Returns

list: The approximate quantiles at the given probabilities, in the same order as probabilities. If col is a string, the output is a list of floats. If col is a list or tuple of strings, the output is a list of lists of floats, with one inner list per column in the same order as col.
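Because the return shape depends on the type of col, it can be convenient to key the result by column name. The sketch below uses a hypothetical helper, quantiles_by_column, that is not part of PySpark; it only reshapes a list that approxQuantile has already returned. The sample inputs are the outputs shown in Examples.

```python
from typing import Dict, List, Sequence, Union


def quantiles_by_column(
    col: Union[str, Sequence[str]],
    result: Union[List[float], List[List[float]]],
) -> Dict[str, List[float]]:
    """Key an approxQuantile result by column name (hypothetical helper).

    A str col comes with a flat list of floats; a list or tuple of names
    comes with one inner list per column, in the same order as col.
    """
    if isinstance(col, str):
        return {col: list(result)}
    if len(col) != len(result):
        raise ValueError("expected one list of quantiles per column")
    return {name: list(qs) for name, qs in zip(col, result)}


print(quantiles_by_column("values", [1.0, 3.0, 5.0]))
# {'values': [1.0, 3.0, 5.0]}
print(quantiles_by_column(("col1", "col2"), [[1.0, 3.0, 5.0], [10.0, 30.0, 50.0]]))
# {'col1': [1.0, 3.0, 5.0], 'col2': [10.0, 30.0, 50.0]}
```

In practice the second argument would be the direct result of a call such as df.approxQuantile(cols, probs, 0.05), passing the same cols to both.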

Notes

Null values in numerical columns are ignored before the calculation. If a column contains only null values, an empty list is returned for that column.
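The null-handling rules can be illustrated without Spark. The sketch below is a hypothetical helper, exact_quantiles, that is not Spark's implementation: it drops nulls, returns an empty list when nothing remains, and otherwise picks the value at 1-based rank max(1, ceil(p * N)). That nearest-rank rule is an assumption chosen because it reproduces the outputs in Examples; like approxQuantile, which returns a sample from the column, it never interpolates between values.

```python
import math
from typing import List, Optional, Sequence


def exact_quantiles(
    data: Sequence[Optional[float]], probabilities: Sequence[float]
) -> List[float]:
    """Hypothetical sketch of the null handling described in Notes.

    Nulls are dropped first; an all-null column yields []. Each quantile
    is the value at 1-based rank max(1, ceil(p * N)) (assumed rule).
    """
    values = sorted(x for x in data if x is not None)
    if not values:
        return []  # the column contained only nulls
    n = len(values)
    result = []
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability {p} is outside [0, 1]")
        rank = max(1, math.ceil(p * n))  # p = 0.0 maps to the minimum
        result.append(float(values[rank - 1]))
    return result


print(exact_quantiles([1, None, 2, 3, 4, 5], [0.0, 0.5, 1.0]))  # [1.0, 3.0, 5.0]
print(exact_quantiles([None, None], [0.5]))  # []
```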

Examples

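from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuse the active session or start one
data = [(1,), (2,), (3,), (4,), (5,)]
df = spark.createDataFrame(data, ["values"])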
quantiles = df.approxQuantile("values", [0.0, 0.5, 1.0], 0.05)
quantiles
# [1.0, 3.0, 5.0]

data = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
df = spark.createDataFrame(data, ["col1", "col2"])
quantiles = df.approxQuantile(["col1", "col2"], [0.0, 0.5, 1.0], 0.05)
quantiles
# [[1.0, 3.0, 5.0], [10.0, 30.0, 50.0]]