kde

Generates a Kernel Density Estimate (KDE) plot using Gaussian kernels.

In statistics, kernel density estimation is a non-parametric way to estimate the probability density function (PDF) of a random variable. This function uses Gaussian kernels and includes automatic bandwidth determination.
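To make the idea concrete, the following is a minimal NumPy sketch of a Gaussian kernel density estimate with a fixed bandwidth. It is an illustration of the general technique, not the code this function runs: the helper name `gaussian_kde_fixed_bandwidth` is hypothetical, and `bandwidth` is assumed to be the standard deviation of each kernel. How `bw_method` maps onto that value is not specified on this page.

```python
import numpy as np


def gaussian_kde_fixed_bandwidth(samples, points, bandwidth):
    """Evaluate a Gaussian KDE at `points` (hypothetical helper, for illustration)."""
    samples = np.asarray(samples, dtype=float)
    points = np.asarray(points, dtype=float)
    # Standardized distance from every evaluation point to every sample.
    z = (points[:, None] - samples[None, :]) / bandwidth
    # One Gaussian kernel per sample, evaluated at every point.
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale by the bandwidth so the result is a density.
    return kernels.sum(axis=1) / (samples.size * bandwidth)


samples = [5.1, 4.9, 7.0, 6.4, 5.9]
points = np.linspace(3.0, 9.0, 1000)
density = gaussian_kde_fixed_bandwidth(samples, points, bandwidth=0.3)

# Trapezoidal integration over the grid; a valid density integrates to about 1.
area = float(np.sum((density[1:] + density[:-1]) / 2.0 * np.diff(points)))
```

A smaller `bandwidth` produces a spikier curve that follows individual samples, while a larger one smooths the estimate.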

Syntax

kde(bw_method, column=None, ind=None, **kwargs)

Parameters

Parameter	Type	Description
`bw_method`	int or float	Numeric value that determines the estimator bandwidth. See `KernelDensity` in PySpark for more information.
`column`	str or list of str, optional	Column name or list of names to use for creating the KDE plot. If `None` (default), all numeric columns are used.
`ind`	list of float, NumPy array, or int, optional	Evaluation points for the estimated PDF. If `None` (default), 1000 equally spaced points are used. If a list or NumPy array, the KDE is evaluated at exactly those points. If an integer, that many equally spaced points are used.
`**kwargs`	optional	Additional keyword arguments.
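The three forms of `ind` described in the table can be sketched as follows. The helper `resolve_ind` is hypothetical, and it assumes the equally spaced points span the observed minimum and maximum of the data; the range the library actually uses may be wider.

```python
import numpy as np


def resolve_ind(values, ind=None):
    """Turn an `ind` argument into an array of evaluation points (illustrative only)."""
    values = np.asarray(values, dtype=float)
    if ind is None:
        # Documented default: 1000 equally spaced points.
        ind = 1000
    if isinstance(ind, (int, np.integer)):
        # Assumption: the grid covers exactly [min, max] of the data.
        return np.linspace(values.min(), values.max(), int(ind))
    # A list or array is used as the evaluation points directly.
    return np.asarray(ind, dtype=float)


default_points = resolve_ind([1.0, 3.0])
five_points = resolve_ind([1.0, 3.0], 5)
explicit_points = resolve_ind([1.0, 3.0], [0.0, 2.0])
```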

Returns

plotly.graph_objs.Figure

Examples

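from pyspark.sql import SparkSession
# Reuse the active Spark session or start a new one.
spark = SparkSession.builder.getOrCreate()
# Small sample dataset with three numeric columns.
data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
columns = ["length", "width", "species"]
df = spark.createDataFrame(data, columns)
# No column is given, so all numeric columns are plotted, each evaluated at 100 equally spaced points.
df.plot.kde(bw_method=0.3, ind=100)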

Last updated on 2026-04-17