kde

Generates a Kernel Density Estimate (KDE) plot using Gaussian kernels.

In statistics, kernel density estimation is a non-parametric way to estimate the probability density function (PDF) of a random variable. This function uses Gaussian kernels and includes automatic bandwidth determination.
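To make the definition above concrete, here is a minimal pure-Python sketch of a Gaussian kernel density estimate. It is an illustration only, not PySpark's implementation. The helper name `gaussian_kde` is hypothetical, and the sketch assumes the bandwidth is used directly as the standard deviation of each Gaussian kernel.

```python
import math


def gaussian_kde(samples, bandwidth, points):
    """Evaluate a Gaussian KDE at each value in `points`.

    Hypothetical helper for illustration. Assumes `bandwidth` is the
    standard deviation of every Gaussian kernel.
    """
    n = len(samples)
    # Normalizing constant so the estimate integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    densities = []
    for x in points:
        # Sum one Gaussian bump per sample, centered on that sample.
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        densities.append(norm * total)
    return densities


# Sample values taken from the "length" column used in the example on this page.
lengths = [5.1, 4.9, 7.0, 6.4, 5.9]
grid = [4.0 + 0.5 * i for i in range(9)]  # 4.0, 4.5, ..., 8.0
density = gaussian_kde(lengths, bandwidth=0.3, points=grid)

for x, d in zip(grid, density):
    print(f"{x:.1f}  {d:.4f}")
```

A smaller bandwidth gives a spikier estimate that follows individual samples, while a larger one gives a smoother curve.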

Syntax

kde(bw_method, column=None, ind=None, **kwargs)

Parameters

Parameter | Type | Description
bw_method | int or float | The method used to calculate the estimator bandwidth. See KernelDensity in PySpark for more information.
column | str or list of str, optional | Column name or list of column names to use for creating the KDE plot. If None (default), all numeric columns are used.
ind | list of float, NumPy array, or int, optional | Evaluation points for the estimated PDF. If None (default), 1000 equally spaced points are used. If a list or NumPy array, the KDE is evaluated at those points. If an integer, that many equally spaced points are used.
**kwargs | optional | Additional keyword arguments.
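The three forms of `ind` can be sketched as follows. The helper `resolve_ind` is hypothetical and is not part of PySpark. It also assumes the equally spaced grid runs from the smallest to the largest data value, and the range the library actually uses may differ (for example, it may extend past the data).

```python
def resolve_ind(values, ind=None, default_points=1000):
    """Hypothetical helper: turn `ind` into a concrete list of evaluation points.

    Assumption: an integer `ind` (or None) yields points equally spaced
    between min(values) and max(values).
    """
    if ind is None:
        # Default case described above: 1000 equally spaced points.
        ind = default_points
    if isinstance(ind, int):
        if ind < 2:
            raise ValueError("need at least 2 evaluation points")
        lo, hi = min(values), max(values)
        step = (hi - lo) / (ind - 1)
        return [lo + i * step for i in range(ind)]
    # A list or array: evaluate at exactly the points given.
    return [float(x) for x in ind]


lengths = [5.1, 4.9, 7.0, 6.4, 5.9]

default_grid = resolve_ind(lengths)               # ind=None
coarse_grid = resolve_ind(lengths, 100)           # ind as an integer
explicit_grid = resolve_ind(lengths, [5.0, 6.0])  # ind as explicit points
```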

Returns

plotly.graph_objs.Figure

Examples

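from pyspark.sql import SparkSession
# Start (or reuse) a Spark session.
spark = SparkSession.builder.getOrCreate()
# Small sample dataset: two measurements and an integer species label per row.
data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
columns = ["length", "width", "species"]
df = spark.createDataFrame(data, columns)
# Plot a KDE of all numeric columns with bw_method=0.3, evaluated at 100 equally spaced points.
df.plot.kde(bw_method=0.3, ind=100)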