Generates a Kernel Density Estimate (KDE) plot using Gaussian kernels.
In statistics, kernel density estimation is a non-parametric way to estimate the probability density function (PDF) of a random variable. This function uses Gaussian kernels and includes automatic bandwidth determination.
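To make the idea concrete, the following is a minimal pure-Python sketch of a Gaussian kernel density estimate. It is an illustration, not the PySpark implementation: the helper name `gaussian_kde` is hypothetical, and it assumes the bandwidth acts as the standard deviation of each Gaussian kernel.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x.

    Hypothetical helper for illustration only. Assumes `bandwidth`
    is the standard deviation of each Gaussian kernel.
    """
    n = len(samples)
    # Normalization so that the estimate integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump centered on each sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

lengths = [5.1, 4.9, 7.0, 6.4, 5.9]
pdf = gaussian_kde(lengths, bandwidth=0.3)
print(pdf(5.0), pdf(10.0))
```

A smaller bandwidth gives a spikier curve that follows individual samples; a larger one gives a smoother curve.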
Syntax
kde(bw_method, column=None, ind=None, **kwargs)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `bw_method` | int or float | The method used to calculate the estimator bandwidth. See KernelDensity in PySpark for more information. |
| `column` | str or list of str, optional | Column name or list of names to use for creating the KDE plot. If None (default), all numeric columns are used. |
| `ind` | list of float, NumPy array, or int, optional | Evaluation points for the estimated PDF. If None (default), 1000 equally spaced points are used. If a NumPy array, the KDE is evaluated at those points. If an integer, that many equally spaced points are used. |
| `**kwargs` | optional | Additional keyword arguments. |
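
The three forms of `ind` can be pictured with a small sketch. The helper `resolve_ind` below is hypothetical and not part of PySpark. The padding of half the data range on each side is an assumption made for illustration; the range actually used by the plot is not stated here.

```python
def resolve_ind(values, ind=None):
    """Turn `ind` into an explicit list of evaluation points.

    Hypothetical helper for illustration only.
    """
    if ind is None:
        ind = 1000  # documented default: 1000 equally spaced points
    if isinstance(ind, int):
        lo, hi = min(values), max(values)
        margin = 0.5 * (hi - lo)  # assumed padding, not confirmed by the docs
        start, stop = lo - margin, hi + margin
        if ind == 1:
            return [start]
        step = (stop - start) / (ind - 1)
        return [start + i * step for i in range(ind)]
    # A list or array of points is used as given.
    return [float(x) for x in ind]

lengths = [5.1, 4.9, 7.0, 6.4, 5.9]
points = resolve_ind(lengths, 100)
print(len(points))
```

Passing an explicit list or array is the way to control the exact evaluation range.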
Returns
plotly.graph_objs.Figure
Examples
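from pyspark.sql import SparkSession

# Reuse the active Spark session or start a new one.
spark = SparkSession.builder.getOrCreate()

# Small sample: two measurements plus a numeric species label per row.
data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
columns = ["length", "width", "species"]
df = spark.createDataFrame(data, columns)

# column is not set, so all numeric columns are plotted.
# Bandwidth 0.3, evaluated at 100 equally spaced points.
df.plot.kde(bw_method=0.3, ind=100)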