Merk
Tilgang til denne siden krever autorisasjon. Du kan prøve å logge på eller endre kataloger.
Tilgang til denne siden krever autorisasjon. Du kan prøve å endre kataloger.
Computes specified statistics for numeric and string columns. Available statistics are: count, mean, stddev, min, max, arbitrary approximate percentiles specified as a percentage (e.g., 75%).
Syntax
summary(*statistics: str)
Parameters
| Parameter | Type | Description |
|---|---|---|
statistics |
str, optional | Column names to calculate statistics by (default All columns). |
Returns
DataFrame: A new DataFrame that provides statistics for the given DataFrame.
Notes
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.
Examples
df = spark.createDataFrame(
[("Bob", 13, 40.3, 150.5), ("Alice", 12, 37.8, 142.3), ("Tom", 11, 44.1, 142.2)],
["name", "age", "weight", "height"],
)
df.select("age", "weight", "height").summary().show()
# +-------+----+------------------+-----------------+
# |summary| age| weight| height|
# +-------+----+------------------+-----------------+
# | count| 3| 3| 3|
# | mean|12.0| 40.73333333333333| 145.0|
# | stddev| 1.0|3.1722757341273704|4.763402145525822|
# | min| 11| 37.8| 142.2|
# | 25%| 11| 37.8| 142.2|
# | 50%| 12| 40.3| 142.3|
# | 75%| 13| 44.1| 150.5|
# | max| 13| 44.1| 150.5|
# +-------+----+------------------+-----------------+
df.select("age", "weight", "height").summary("count", "min", "25%", "75%", "max").show()
# +-------+---+------+------+
# |summary|age|weight|height|
# +-------+---+------+------+
# | count| 3| 3| 3|
# | min| 11| 37.8| 142.2|
# | 25%| 11| 37.8| 142.2|
# | 75%| 13| 44.1| 150.5|
# | max| 13| 44.1| 150.5|
# +-------+---+------+------+